Metallicity Structure in the Milky Way Disk Revealed by Galactic HII Regions

**data** to derive the electron temperatures and metallicities for these nebulae. Since collisionally excited lines from metals (e.g., oxygen, nitrogen) are the dominant cooling mechanism in HII regions, the nebular metallicity can be inferred from the electron temperature. Including previous single-dish studies, there are now 167 nebulae with radio-determined electron temperatures and either parallax or kinematic distance determinations. The interferometric electron temperatures are systematically 10% larger than those found in previous single-dish studies, likely due to incorrect **data** **analysis** strategies, optical depth effects, and/or the observation of different gas by the interferometer. By combining the interferometer and single-dish samples, we find an oxygen abundance gradient across the Milky Way disk with a slope of -0.052 ± 0.004 dex/kpc. We also find significant azimuthal structure in the metallicity distribution. The slope of the oxygen gradient varies by a factor of ~2 when Galactocentric azimuths near 30° are compared with those near 100°. This azimuthal structure is consistent with simulations of Galactic chemodynamical evolution influenced by spiral arms.
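An abundance gradient of this kind is a straight-line fit of oxygen abundance against Galactocentric radius. A minimal sketch on synthetic nebulae — the sample size and slope come from the abstract, while the radius range, intercept, and scatter are assumptions for illustration:

```python
import numpy as np

# Toy sketch: recover a radial metallicity gradient by linear regression of
# 12 + log(O/H) against Galactocentric radius. The slope -0.052 dex/kpc and
# n = 167 come from the abstract; radius range, intercept, and 0.1 dex
# scatter are assumed values, not the paper's.
rng = np.random.default_rng(42)
n = 167
R = rng.uniform(4.0, 14.0, size=n)          # Galactocentric radii (kpc), assumed
true_slope, true_intercept = -0.052, 9.0
scatter = 0.1                               # dex, assumed intrinsic scatter
oh = true_intercept + true_slope * R + scatter * rng.normal(size=n)

slope, intercept = np.polyfit(R, oh, deg=1)  # least-squares line
```

With this sample size and scatter the fitted slope lands close to the generating value, illustrating why ~170 nebulae suffice to constrain a gradient at the quoted precision.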

4/10 relevant

arXiv

Online Size Exclusion Chromatography-Fast Photochemical Oxidation of Proteins Allows for Targeted Structural **Analysis** of Conformationally Heterogeneous Mixtures

**analysis** of proteins in a conformationally heterogeneous mixture has long been a difficult problem in structural biology, resulting in complex challenges in **data** **analysis** or complete failure of the method. In structural **analysis** by covalent labeling mass spectrometry, conformational heterogeneity will result in **data** reflecting a weighted average of all conformers, greatly complicating **data** **analysis** and potentially causing misinterpretation of results. Here, we describe a method coupling size exclusion chromatography in an HPLC format with Hydroxyl Radical Protein Footprinting (HRPF) using online Fast Photochemical Oxidation of Proteins (FPOP). Using controlled mixtures of myoglobin and apomyoglobin as a model system to allow for controllable conformational heterogeneity, we demonstrate that we can obtain HRPF footprints of both holomyoglobin and apomyoglobin as they elute off the SEC column. Comparison of online SEC-FPOP **data** for both mixture components with traditional FPOP **data** for each individual component shows that we can obtain the same footprinting pattern for each conformation in an online format with real-time FPOP. Using this method, conformations within conformationally heterogeneous mixtures can now be individually probed by SEC-FPOP, and the stability of the FPOP label allows this structural information to be retained.

4/10 relevant

bioRxiv

Flash X-ray diffraction imaging in 3D: a proposed **analysis** pipeline

**data** **analysis** pipeline for FXI experiments, which includes four steps: hit finding and preliminary filtering, pattern classification, 3D Fourier reconstruction, and post-**analysis**. We also include a recently developed bootstrap methodology in the post-**analysis** step for uncertainty **analysis** and quality control. To achieve the best possible resolution, we further suggest using background subtraction, signal windowing, and convex optimization techniques when retrieving the Fourier phases in the post-**analysis** step. As an application example, we quantified the 3D electron structure of the PR772 virus using the proposed **data**-**analysis** pipeline. The retrieved structure was above the detector-edge resolution and clearly showed the pseudo-icosahedral capsid of the PR772.
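The first pipeline step, hit finding, can be illustrated with a toy threshold rule — the frame sizes, count rates, and the median + 5σ criterion below are assumptions for illustration, not the pipeline's actual settings:

```python
import numpy as np

# Toy sketch of hit finding: flag detector frames whose integrated photon
# count stands out from the blank/background level. All numbers here are
# invented for illustration.
rng = np.random.default_rng(0)
n_frames, frame_shape = 500, (64, 64)
frames = rng.poisson(0.01, size=(n_frames, *frame_shape)).astype(float)
hit_idx = rng.choice(n_frames, size=20, replace=False)
frames[hit_idx] += rng.poisson(0.2, size=(20, *frame_shape))  # injected "hits"

totals = frames.sum(axis=(1, 2))           # photons per frame
blank_level = np.median(totals)            # robust background estimate
mad = np.median(np.abs(totals - blank_level))
sigma = 1.4826 * mad                       # robust sigma, insensitive to the hits
hits = np.flatnonzero(totals > blank_level + 5 * sigma)
```

Using the median and MAD rather than the mean and standard deviation keeps the threshold from being inflated by the hits themselves, which is why a simple 5σ cut separates the injected frames cleanly here.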

5/10 relevant

arXiv

Cumulus: a cloud-based **data** **analysis** framework for large-scale single-cell and single-nucleus RNA-seq

**data** generation is growing, so does the need for computational pipelines for scaled **analysis**. Here, we developed Cumulus, a cloud-based framework for analyzing large-scale sc/snRNA-seq datasets. Cumulus combines the power of cloud computing with improvements in algorithm implementations to achieve high scalability, low cost, user-friendliness, and integrated support for a comprehensive set of features. We benchmark Cumulus on the Human Cell Atlas Census of Immune Cells dataset of bone marrow cells and show that it substantially improves efficiency over conventional frameworks, while maintaining or improving the quality of results, enabling large-scale studies.

7/10 relevant

bioRxiv

Paths Explored, Paths Omitted, Paths Obscured: Decision Points &
Selective Reporting in End-to-End **Data** **Analysis**

**data** involves many, sometimes arbitrary, decisions across phases of **data** collection, wrangling, and modeling. As different choices can lead to diverging conclusions, understanding how researchers make analytic decisions is important for supporting robust and replicable **analysis**. In this study, we pore over nine published research studies and conduct semi-structured interviews with their authors. We observe that researchers often base their decisions on methodological or theoretical concerns, but subject to constraints arising from the data, expertise, or perceived interpretability. We confirm that researchers may experiment with choices in search of desirable results, but also identify other reasons why researchers explore alternatives yet omit findings. In concert with our interviews, we also contribute visualizations for communicating decision processes throughout an **analysis**. Based on our results, we identify design opportunities for strengthening end-to-end **analysis**, for instance via tracking and meta-**analysis** of multiple decision paths.

8/10 relevant

arXiv

Interlaboratory **Data** Variability Contributes to the Differential Principal Components of Human Primed and Naïve-like Pluripotent States in Multivariate Meta-Analysis

**data** **analyses** have revealed significant differences between various human naive-like pluripotent states derived from different laboratory protocols, leaving the criteria of human naive pluripotency unsettled. Thus, it is imperative to understand the concept of the ground or naive pluripotent state of pluripotent stem cells, which was initially established in mouse embryonic stem cells (mESCs). Putative human naive pluripotency has been proposed, largely based on comparing genome-wide transcriptomic signatures of human pluripotent stem cells (hPSCs) with human pre-implantation embryos and mESCs by several research groups. Current bioinformatics approaches, however, have inevitable conceptual biases and technological limitations, including the choices of datasets, analytic methods, and interlaboratory **data** variability. In this report, we performed a multivariate meta-**analysis** of major hPSC datasets via the combined analytic powers of percentile normalization, principal component **analysis** (PCA), t-distributed stochastic neighbor embedding (t-SNE), and SC3 consensus clustering. This rigorous bioinformatics approach significantly improved the predictive values of the current meta-analysis. Accordingly, we were able to reveal various fundamental inconsistencies between naive-like hPSCs and their human and mouse in vitro counterparts, which are likely attributed to interlaboratory protocol differences. Moreover, our meta-**analysis** failed to provide global transcriptomic markers that support the putative in vitro human naive pluripotent state, rather suggesting the existence of altered pluripotent states under current naive-like hPSC growth protocols.
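The first two analytic ingredients, percentile normalization and PCA, can be sketched in a few lines — synthetic expression values stand in for the hPSC datasets, and the downstream t-SNE and SC3 consensus clustering steps are separate tools not reproduced here:

```python
import numpy as np

# Toy sketch: percentile (rank) normalization of each cell's expression
# profile, followed by PCA via SVD. Synthetic data; an illustration of the
# two named techniques, not the paper's exact procedure.
rng = np.random.default_rng(0)
n_cells, n_genes = 60, 200
expr = rng.gamma(2.0, 2.0, size=(n_cells, n_genes))   # synthetic expression

# Replace each value by its within-cell percentile, making profiles
# comparable across labs with different scalings.
ranks = expr.argsort(axis=1).argsort(axis=1)
pct = (ranks + 0.5) / n_genes                          # values in (0, 1)

# PCA on the normalized matrix via singular value decomposition.
centered = pct - pct.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * S                                         # cells x components
```

Rank-based normalization is one simple way to blunt interlaboratory scaling differences before the components are computed, which is the kind of preprocessing the abstract credits for the improved predictive value.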

5/10 relevant

bioRxiv

Using Bayesian Model Selection to Advise Neutron Reflectometry **Analysis** from Langmuir-Blodgett Monolayers

**analysis** of neutron and X-ray reflectometry **data** is important for the study of interfacial soft matter structures. However, there is still substantial discussion regarding the analytical models that should be used to rationalise reflectometry **data**. In this work, we outline a robust and generic framework for the determination of the evidence for a particular model given experimental data, by applying Bayesian logic. We apply this framework to the study of Langmuir-Blodgett monolayers by considering three possible analytical models from a recently published investigation [Campbell *et al., J. Colloid Interface Sci*, 2018, **531**, 98]. From this, we can determine which model has the most evidence given the experimental data, and show the effect that different isotopic contrasts of neutron reflectometry will have on this. We believe that this general framework could become an important component of neutron and X-ray reflectometry **data** **analysis**, and hope others more regularly consider the relative evidence for their analytical models.
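The core computation — the evidence p(D|M) for each candidate model — can be illustrated on a toy problem. The constant-level "models" below are hypothetical stand-ins for the paper's reflectometry models:

```python
import numpy as np

# Toy sketch of Bayesian model selection via direct evidence computation,
# p(D|M) = integral of p(D|theta, M) p(theta|M) over theta, on a grid.
rng = np.random.default_rng(0)
sigma = 0.5
y = 2.0 + sigma * rng.normal(size=50)       # data generated with level 2

def log_likelihood(y, mu, sigma):
    # Gaussian log-likelihood of the data given a constant level mu
    return np.sum(-0.5 * ((y - mu) / sigma) ** 2) - len(y) * np.log(sigma * np.sqrt(2 * np.pi))

# Model A: free level with a uniform prior on [0, 5]
grid = np.linspace(0.0, 5.0, 2001)
log_like = np.array([log_likelihood(y, mu, sigma) for mu in grid])
prior = 1.0 / (grid[-1] - grid[0])
dmu = grid[1] - grid[0]
log_evidence_A = np.log(np.sum(np.exp(log_like - log_like.max())) * dmu * prior) + log_like.max()

# Model B: level fixed at 0 (no free parameters), so evidence = likelihood
log_evidence_B = log_likelihood(y, 0.0, sigma)

# Positive log Bayes factor means the data favour model A
log_bayes_factor = log_evidence_A - log_evidence_B
```

Note that the evidence for model A sits below its maximum likelihood: marginalising over the prior applies an automatic Occam penalty, which is what lets the evidence discriminate between models of different complexity.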

4/10 relevant

chemRxiv

Robust Principal Component **Analysis** Based On Maximum Correntropy Power
Iterations

**analysis** (PCA) is recognised as a quintessential **data** **analysis** technique when it comes to describing linear relationships between the features of a dataset. However, the well-known sensitivity of PCA to non-Gaussian samples and/or outliers often makes it unreliable in practice. To this end, a robust formulation of PCA is derived based on the maximum correntropy criterion (MCC) so as to maximise the expected likelihood of Gaussian distributed reconstruction errors. In this way, the proposed solution reduces to a generalised power iteration, whereby: (i) robust estimates of the principal components are obtained even in the presence of outliers; (ii) the number of principal components need not be specified in advance; and (iii) the entire set of principal components can be obtained, unlike existing approaches. The advantages of the proposed maximum correntropy power iteration (MCPI) are demonstrated through an intuitive numerical example.
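The flavour of such a generalised power iteration can be sketched as follows — a toy simplification (Gaussian-kernel weights on per-sample reconstruction errors inside an ordinary power step), not the paper's exact MCPI derivation:

```python
import numpy as np

def robust_first_pc(X, sigma=1.0, n_iter=100, seed=0):
    """Toy correntropy-weighted power iteration for the leading principal
    component. Hypothetical simplification for illustration only."""
    rng = np.random.default_rng(seed)
    X = X - np.median(X, axis=0)                # robust centering
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        proj = X @ w
        resid = X - np.outer(proj, w)           # per-sample reconstruction errors
        err = np.linalg.norm(resid, axis=1)
        g = np.exp(-err**2 / (2 * sigma**2))    # Gaussian (correntropy) weights
        w_new = (X * (g * proj)[:, None]).sum(axis=0)  # weighted power step
        w_new /= np.linalg.norm(w_new)
        w = np.sign(w_new @ w) * w_new          # fix sign between iterations
    return w

# Inliers along (1, 0) plus a few gross outliers along (0, 1): the weights
# drive the outliers' contribution toward zero, so the recovered component
# follows the inlier direction.
rng = np.random.default_rng(1)
inliers = np.outer(rng.normal(size=200), [1.0, 0.0]) + 0.05 * rng.normal(size=(200, 2))
outliers = np.outer(5 + rng.normal(size=10), [0.0, 1.0])
X = np.vstack([inliers, outliers])
w = robust_first_pc(X, sigma=0.5)
```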

4/10 relevant

arXiv

**Data** Driven Conditional Optimal Transport

**data**-driven procedure is developed to compute the optimal map between two conditional probabilities $\rho(x|z_{1},...,z_{L})$ and $\mu(y|z_{1},...,z_{L})$ depending on a set of covariates $z_{i}$. The procedure is tested on synthetic **data** from the ACIC **Data** **Analysis** Challenge 2017 and it is applied to non-uniform lightness transfer between images. Exactly solvable examples and simulations are performed to highlight the differences with ordinary optimal transport.
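In one dimension with a discrete covariate, the idea reduces to a monotone rearrangement computed separately within each covariate class. A toy sketch (the Gaussian conditionals and two-class covariate are invented for illustration, and this is not the paper's algorithm):

```python
import numpy as np

def conditional_ot_map_1d(x, y, z_x, z_y):
    """Toy sketch: for scalar samples x ~ rho(.|z) and y ~ mu(.|z) with a
    discrete covariate z, the 1D quadratic-cost optimal map is the monotone
    rearrangement applied separately within each covariate class."""
    x_mapped = np.empty_like(x, dtype=float)
    for z in np.unique(z_x):
        xs = x[z_x == z]
        ys = np.sort(y[z_y == z])
        # empirical CDF position of each x within its class...
        ranks = np.argsort(np.argsort(xs))
        q = (ranks + 0.5) / len(xs)
        # ...pushed through the inverse CDF of the matching target class
        x_mapped[z_x == z] = np.quantile(ys, q)
    return x_mapped

rng = np.random.default_rng(0)
z_x = rng.integers(0, 2, size=1000)
z_y = rng.integers(0, 2, size=1000)
x = rng.normal(loc=z_x.astype(float), scale=1.0)        # rho(x|z) = N(z, 1)
y = rng.normal(loc=3.0 * z_y.astype(float), scale=1.0)  # mu(y|z) = N(3z, 1)
x_mapped = conditional_ot_map_1d(x, y, z_x, z_y)
```

Ordinary optimal transport would pool all samples and mix the classes; conditioning on $z$ keeps each class's mass mapped onto the matching class of the target, which is the distinction the abstract highlights.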

4/10 relevant

arXiv

Using Bayesian model selection to advise neutron reflectometry **analysis**
from Langmuir-Blodgett monolayers

**analysis** of neutron and X-ray reflectometry **data** is important for the study of interfacial soft matter structures. However, there is still substantial discussion regarding the analytical models that should be used to rationalise reflectometry **data**. In this work, we outline a robust and generic framework for the determination of the evidence for a particular model given experimental data, by applying Bayesian logic. We apply this framework to the study of Langmuir-Blodgett monolayers by considering three possible analytical models from a recently published investigation [Campbell *et al., J. Colloid Interface Sci*, 2018, **531**, 98]. From this, we can determine which model has the most evidence given the experimental data, and show the effect that different isotopic contrasts of neutron reflectometry will have on this. We believe that this general framework could become an important component of neutron and X-ray reflectometry **data** **analysis**, and hope others more regularly consider the relative evidence for their analytical models.

4/10 relevant

arXiv