Accurate Prediction of Chemical Shifts for Aqueous Protein Structure for "Real World" Cases using **Machine Learning**


…**machine learning** algorithm for protein chemical shift prediction that outperforms existing chemical shift calculators on realistic NMR solution data. Our UCBShift predictor implements two modules: a transfer prediction module that employs both sequence and structural alignment to select reference candidates for experimental chemical shift replication, and a redesigned **machine learning** module based on random forest regression, which utilizes more, and more carefully curated, feature-extracted data. When combined, this new predictor achieves state-of-the-art accuracy for predicting chemical shifts on a "real-world" dataset, with root-mean-square errors of 0.31 ppm for amide hydrogens, 0.19 ppm for Hα, 0.87 ppm for C, 0.81 ppm for Cα, 1.01 ppm for Cβ, and 1.83 ppm for N, exceeding the prediction accuracy of popular chemical shift predictors such as SPARTA+ and SHIFTX2.
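A minimal sketch of the random-forest regression idea behind the abstract's second module, using scikit-learn on synthetic structural features. The features (backbone dihedrals, a solvent-exposure proxy) and the toy target are illustrative assumptions, not UCBShift's actual curated feature set:

```python
# Random-forest chemical-shift regression sketch (illustrative only; the
# features and target here are synthetic, not UCBShift's curated data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
# Toy structural features: backbone dihedrals (phi, psi) and an exposure proxy.
dihedrals = rng.uniform(-180, 180, size=(n, 2))
exposure = rng.uniform(0, 1, size=(n, 1))
X = np.hstack([dihedrals, exposure])
# Toy target: a smooth function of the features standing in for a Cα shift.
y = (56.0 + 2.0 * np.sin(np.radians(X[:, 0]))
     - 1.5 * np.cos(np.radians(X[:, 1])) + X[:, 2])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:400], y[:400])
pred = model.predict(X[400:])
rmse = float(np.sqrt(np.mean((pred - y[400:]) ** 2)))
print(f"held-out RMSE: {rmse:.2f} ppm")
```

An ensemble of decision trees averages out individual-tree overfitting, which is why random forests remain a strong baseline for tabular feature-extracted data like this.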

10/10 relevant

arXiv

Merlin: Enabling **Machine Learning**-Ready HPC Ensembles


…**machine learning** (ML) techniques to analyze large-scale ensemble data. With complexities such as multi-component workflows, heterogeneous **machine** architectures, parallel file systems, and batch scheduling, care must be taken to facilitate this analysis in a high-performance computing (HPC) environment. In this paper, we present Merlin, a workflow framework to enable large ML-friendly ensembles of scientific HPC simulations. By augmenting traditional HPC with distributed compute technologies, Merlin aims to lower the barrier for scientific subject matter experts to incorporate ML into their analysis. In addition to its design and some examples, we describe how Merlin was deployed on the Sierra supercomputer at Lawrence Livermore National Laboratory to create an unprecedented benchmark inertial confinement fusion dataset of approximately 100 million individual simulations and over 24 terabytes of multi-modal physics-based scalar, vector, and hyperspectral image data.

10/10 relevant

arXiv

Combining **machine learning** and a universal acoustic feature-set yields efficient automated monitoring of ecosystems

8/10 relevant

bioRxiv

Quantum-Inspired Hamiltonian Monte Carlo for Bayesian Sampling


…**machine learning** and data science, HMC is inefficient at sampling from spiky and multimodal distributions. Motivated by the energy-time uncertainty relation from quantum mechanics, we propose a Quantum-Inspired Hamiltonian Monte Carlo algorithm (QHMC). This algorithm allows a particle to have a random mass with a probability distribution rather than a fixed mass. We prove the convergence property of QHMC in the spatial domain and in the time sequence. We further show why such a random mass can improve the performance when we sample a broad class of distributions. In order to handle the big training data sets in large-scale **machine learning**, we develop a stochastic gradient version of QHMC using the Nos\'e-Hoover thermostat, called QSGNHT, and we also provide theoretical justifications about its steady-state distributions. Finally, in the experiments, we demonstrate the effectiveness of QHMC and QSGNHT on synthetic examples, bridge regression, image denoising, and neural network pruning. The proposed QHMC and QSGNHT can indeed achieve much more stable and accurate sampling results on the test cases.
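The core twist the abstract describes — a random particle mass instead of a fixed one — can be sketched on a toy 1-D Gaussian target. This is not the authors' implementation; the log-normal mass distribution, step size, and trajectory length are arbitrary illustrative choices:

```python
# Sketch of the Quantum-Inspired HMC idea: resample the particle mass from a
# distribution at every iteration, then run a standard leapfrog trajectory.
import numpy as np

def grad_neg_log_p(x):
    # Target: standard 1-D Gaussian, so -log p(x) = x^2/2 and its gradient is x.
    return x

def qhmc_sample(n_samples, eps=0.1, n_leapfrog=20, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        m = rng.lognormal(mean=0.0, sigma=1.0)  # random mass (the QHMC twist)
        p = rng.normal(0.0, np.sqrt(m))         # momentum ~ N(0, m)
        x_new, p_new = x, p
        # Standard leapfrog integration with the sampled mass.
        p_new -= 0.5 * eps * grad_neg_log_p(x_new)
        for i in range(n_leapfrog):
            x_new += eps * p_new / m
            if i != n_leapfrog - 1:
                p_new -= eps * grad_neg_log_p(x_new)
        p_new -= 0.5 * eps * grad_neg_log_p(x_new)
        # Metropolis accept/reject on H = U(x) + p^2 / (2m).
        h_old = 0.5 * x ** 2 + 0.5 * p ** 2 / m
        h_new = 0.5 * x_new ** 2 + 0.5 * p_new ** 2 / m
        if rng.random() < np.exp(h_old - h_new):
            x = x_new
        samples.append(x)
    return np.array(samples)

draws = qhmc_sample(5000)
print(f"mean={draws.mean():.2f}, std={draws.std():.2f}")
```

Because the mass is resampled each iteration and the momentum is drawn consistently with the kinetic energy, the marginal in x remains the target distribution; the random mass only changes how aggressively trajectories explore.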

4/10 relevant

arXiv

Physics-Informed **Machine Learning** with Conditional Karhunen-Lo\`eve Expansions


…**machine learning** approach for the inversion of PDE models with heterogeneous parameters. In our approach, the space-dependent partially-observed parameters and states are approximated via Karhunen-Lo\`eve expansions (KLEs). Each of these KLEs is then conditioned on their corresponding measurements, resulting in low-dimensional models of the parameters and states that resolve observed data. Finally, the coefficients of the KLEs are estimated by minimizing the norm of the residual of the PDE model evaluated at a finite set of points in the computational domain, ensuring that the reconstructed parameters and states are consistent with both the observations and the PDE model to an arbitrary level of accuracy. In our approach, KLEs are constructed using the eigendecomposition of covariance models of spatial variability. For the model parameters, we employ a parameterized covariance model calibrated on parameter observations; for the model states, the covariance is estimated from a number of forward simulations of the PDE model corresponding to realizations of the parameters drawn from their KLE. We apply the proposed approach to identifying heterogeneous log-diffusion coefficients in diffusion equations from spatially sparse measurements of the log-diffusion coefficient and the solution of the diffusion equation. We find that the proposed approach compares favorably against state-of-the-art point estimates such as maximum a posteriori estimation and physics-informed neural networks.
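The abstract's building block — a truncated KLE constructed from the eigendecomposition of a covariance model — can be sketched in a few lines. The squared-exponential covariance, length scale, and truncation rank below are illustrative assumptions, not the paper's calibrated model:

```python
# Truncated Karhunen-Loève expansion from a covariance eigendecomposition.
import numpy as np

x = np.linspace(0.0, 1.0, 200)
ell = 0.2  # correlation length (assumed)
# Squared-exponential covariance model of spatial variability.
C = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell ** 2)

# Eigendecomposition; sort modes by decreasing eigenvalue and truncate.
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
k = 10  # truncation rank

# Draw a random field from the KLE: y = sum_i sqrt(lambda_i) * xi_i * phi_i.
rng = np.random.default_rng(1)
xi = rng.normal(size=k)
field = vecs[:, :k] @ (np.sqrt(vals[:k]) * xi)

captured = vals[:k].sum() / vals.sum()
print(f"{k} modes capture {captured:.1%} of the variance")
```

Conditioning on measurements (the "cKLE" step) and minimizing the PDE residual over the KLE coefficients are the paper's additional ingredients on top of this low-dimensional representation.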

10/10 relevant

arXiv

State estimation of surface and deep flows from sparse SSH observations of geostrophic ocean turbulence using Deep **Learning**


…**Learning** — a **machine learning** approach that extracts information only from data. Using synthetic observations taken from an idealized quasigeostrophic model of baroclinic ocean turbulence, we demonstrate that Convolutional Neural Networks with Residual **Learning** are superior for SSH reconstruction to the linear and recently developed dynamical interpolation techniques. Furthermore, the neural network can provide an accurate state estimate of unobserved deep ocean currents at mesoscales, suggesting that SSH patterns of eddies do contain substantial information about the ocean interior that is necessary for SSH prediction. Our framework is highly idealized, and several crucial improvements, such as transfer **learning** and diversification of training data, would be necessary to implement before its ultimate use with real satellite observations. Nonetheless, by providing a proof of concept, our results point to Deep **Learning** as a viable alternative to existing interpolation and, more generally, state estimation methods for satellite observations of baroclinic ocean turbulence.

4/10 relevant

EarthArXiv

**Machine Learning** for Paper Grammage Prediction Based on Sensor Measurements in Paper Mills


…**Machine** Interface (HMI). **Machine Learning** (ML) algorithms can effectively be used to resolve this tradeoff between full automation and human assistance. This paper provides an example of the industrial application of ML algorithms to help human operators save mental effort and avoid time delays and unintended mistakes for the sake of high production rates. Based on real-time sensor measurements, several ML algorithms have been tried to classify paper rolls according to paper grammage in a white paper mill. The performance evaluation shows that the AdaBoost algorithm is the best ML algorithm for this application, with classification accuracy (CA), precision, and recall of 97.1%. The generalization of the proposed approach for achieving cost-effective mill construction by reducing the total number of required physical sensors will be the subject of our future research.

10/10 relevant

Preprints.org

An Introduction to Communication Efficient Edge **Machine Learning**


…**machine learning**. One challenge faced by edge **learning** is the communication bottleneck, which is caused by the transmission of high-dimensional data from many edge devices to edge servers for **learning**. Traditional wireless techniques focusing only on efficient radio access are ineffective in tackling the challenge. Solutions should be based on a new approach that seamlessly integrates communication and computation. This has led to the emergence of a new cross-disciplinary paradigm called communication-efficient edge **learning**. The main theme in the area is to design new communication techniques and protocols for the efficient implementation of different distributed **learning** frameworks (e.g., federated learning) in wireless networks. This article provides an overview of the emerging area by introducing new design principles, discussing promising research opportunities, and providing design examples based on recent work.
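Federated learning, the example framework the abstract names, keeps raw data on the edge devices and ships only model parameters over the network. A minimal FedAvg sketch for a linear model in NumPy, with the client count, local step count, and data entirely invented for illustration:

```python
# Minimal federated averaging (FedAvg) sketch for a linear model.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Each edge device holds its own local data shard.
clients = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(0, 0.05, 50)
    clients.append((X, y))

w = np.zeros(2)  # global model kept at the edge server
for _ in range(30):  # communication rounds
    local_ws = []
    for X, y in clients:
        w_local = w.copy()
        for _ in range(10):  # local full-batch gradient steps on-device
            grad = 2 * X.T @ (X @ w_local - y) / len(y)
            w_local -= 0.05 * grad
        local_ws.append(w_local)
    # Server aggregates: only parameters cross the network, never raw data.
    w = np.mean(local_ws, axis=0)

print(f"recovered weights: {np.round(w, 2)}")
```

The communication cost per round is one model upload and download per device, which is exactly the quantity the techniques surveyed in the article try to shrink further (e.g., by quantization or over-the-air aggregation).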

9/10 relevant

arXiv

Make Thunderbolts Less Frightening -- Predicting Extreme Weather Using Deep **Learning**


**Machine learning** approaches, and especially deep **learning**, have, however, shown huge improvements in many research areas dealing with large datasets in recent years. In this work, we tackle one specific sub-problem of weather forecasting, namely the prediction of thunderstorms and lightning. We propose the use of a convolutional neural network architecture inspired by UNet++ and ResNet to predict thunderstorms as a binary classification problem based on satellite images and lightning recorded in the past. We achieve a probability of detection of more than 94% for lightning within the next 15 minutes while at the same time minimizing the false alarm ratio compared to previous approaches.

5/10 relevant

arXiv

Predicting Lake Erie Wave Heights using XGBoost


…**machine-learning** method, which can potentially provide performance comparable to numerical wave models while requiring only a small fraction of the computational cost. In this study, we applied and tested a novel **machine learning** method based on XGBoost for predicting waves in Lake Erie in 2016-2017. Buoy data from 1994 to 2017 were processed for model training and testing. We trained the model with data from 1994-2015, then used the trained model to predict 2016 and 2017 wave features. The mean absolute error of wave height is about 0.11-0.18 m and the maximum error is 1.14-1.95 m, depending on location and year. For comparison, an unstructured WW3 model was implemented in Lake Erie for simulating wind-generated waves. The WW3 results were compared with buoy data from the National Data Buoy Center in Lake Erie; the mean absolute error of wave height is about 0.12-0.48 m and the maximum error is about 1.03-2.93 m. The results show that WW3 underestimates wave-height spikes during strong wind events and that XGBoost improves prediction of these spikes. XGBoost also runs much faster than WW3: for a model-year run on a supercomputer, WW3 needs 12 hours with 60 CPUs, while XGBoost needs only 10 minutes with 1 CPU. In summary, XGBoost provided comparable performance for Lake Erie wave-height simulation at about 0.02% of the computational time of the numerical simulations.
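The regression setup — gradient-boosted trees mapping buoy measurements to wave height — can be sketched on synthetic data. This uses scikit-learn's gradient boosting as a stand-in for XGBoost, and the wind/fetch features and wave-height formula below are invented for illustration, not the study's buoy data:

```python
# Gradient-boosted wave-height regression sketch (synthetic wind/wave data;
# scikit-learn's GradientBoostingRegressor stands in for XGBoost).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
wind_speed = rng.uniform(0, 20, size=n)  # m/s
fetch = rng.uniform(10, 100, size=n)     # km, toy stand-in for fetch length
# Toy wave height loosely increasing with wind speed and fetch, plus noise.
wave_height = (0.02 * wind_speed ** 1.5 * np.sqrt(fetch / 50)
               + rng.normal(0, 0.1, n))

X = np.column_stack([wind_speed, fetch])
model = GradientBoostingRegressor(random_state=0)
model.fit(X[:1600], wave_height[:1600])
mae = float(np.mean(np.abs(model.predict(X[1600:]) - wave_height[1600:])))
print(f"mean absolute error: {mae:.2f} m")
```

Once trained, inference is a pass through a few hundred shallow trees, which is why the abstract's reported speedup over a full spectral wave model like WW3 is so large.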

4/10 relevant

arXiv