Publications | Yang Feng | Professor of Biostatistics @ NYU

Yang Feng's Research Cloud

For a more up-to-date list of publications, please visit my Google Scholar page (sorted by year).


2026

Variational Nonparametric Inference in Stochastic Block Models with Functional Covariates

Zuofeng Shang, Peijun Sang, Yang Feng, and Chong Jin

Journal of the American Statistical Association, 2026

We propose a functional stochastic block model whose vertices involve functional data information. This new model extends the classic stochastic block model with vector-valued nodal information, and finds applications in real-world networks whose nodal information could be functional curves. Examples include international trade data in which a network vertex (country) is associated with the annual or quarterly GDP over a certain time period, and MyFitnessPal data in which a network vertex (MyFitnessPal user) is associated with daily calorie information measured over a certain time period. Two statistical tasks will be jointly executed. First, we will detect community structures of the network vertices assisted by the functional nodal information. Second, we propose a computationally efficient variational test to examine the significance of the functional nodal information. We show that the community detection algorithms achieve weak and strong consistency, and the variational test is asymptotically chi-square with diverging degrees of freedom. As a byproduct, we propose pointwise confidence intervals for the slope function of the functional nodal information. Our methods are examined through both simulated and real datasets. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
@article{shang2026variational,
  title = {Variational Nonparametric Inference in Stochastic Block Models with Functional Covariates},
  author = {Shang, Zuofeng and Sang, Peijun and Feng, Yang and Jin, Chong},
  journal = {Journal of the American Statistical Association},
  pages = {1--13},
  year = {2026},
  publisher = {Informa UK Limited},
  doi = {10.1080/01621459.2026.2654876}
}

Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models

Ye Tian, Haolei Weng, Lucy Xia, and Yang Feng

Journal of the American Statistical Association, 2026

Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that effectively utilizes unknown similarities between related tasks and is robust against a fraction of outlier tasks from arbitrary distributions. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both the parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Additionally, iterative unsupervised multi-task and transfer learning methods may suffer from an initialization alignment problem, and two alignment algorithms are proposed to resolve the issue. Finally, we demonstrate the effectiveness of our methods through simulations and real data examples. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.
@article{tian2026robust,
  title = {Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models},
  author = {Tian, Ye and Weng, Haolei and Xia, Lucy and Feng, Yang},
  journal = {Journal of the American Statistical Association},
  pages = {1--30},
  year = {2026},
  publisher = {Informa UK Limited},
  doi = {10.1080/01621459.2026.2670031}
}

Leveraging Kappa-Lambda Signatures in a Multistage Machine Learning Pipeline for B-Cell Lymphoma Detection by Flow Cytometry

Iris Zhang, Sulov Chalise, Mikhail Roshal, Qi Gao, Menglei Zhu, and Yang Feng

The American Journal of Pathology, 2026

Flow cytometry immunophenotyping is essential for diagnosing B-cell lymphomas, but manual interpretation of high-dimensional data remains subjective, time-consuming, and prone to interoperator variability. Previous computational approaches often overlook clinically relevant principles, such as Ig light chain restriction. To address this gap, a biologically informed, three-stage machine learning pipeline that integrates Ig κ (IGK) and Ig λ (IGL) signatures to improve B-cell lymphoma detection was developed. A total of 200 peripheral blood samples (100 normal, 100 abnormal) were analyzed, comprising >15 million single-cell events characterized by 21 immunophenotypic markers. Three XGBoost models were trained sequentially: the first classified light chain expression (IGK, IGL, or nuisance), the second identified cell phenotypes using marker intensities and IGK/IGL-based neighborhood enrichment, and the third produced sample-level predictions based on aggregated cell features. The IGK/IGL classifier achieved 88.0% test accuracy [area under the receiver operating characteristic curve (AUC), 0.957], whereas the cell-level classification reached 92.9% accuracy (AUC, 0.983), with IGK/IGL enrichment as the most informative feature. Similarly, sample-level classification achieved 94.7% accuracy (AUC, 0.976), with improved performance when IGK/IGL enrichment was included. These findings demonstrate that incorporating biologically grounded features enhances both the accuracy and interpretability of automated flow cytometry analysis. This approach offers a scalable, reproducible, and clinically aligned alternative to the manual review of flow cytometry data for B-cell lymphomas.
@article{zhang2026leveraging,
  title = {Leveraging Kappa-Lambda Signatures in a Multistage Machine Learning Pipeline for B-Cell Lymphoma Detection by Flow Cytometry},
  author = {Zhang, Iris and Chalise, Sulov and Roshal, Mikhail and Gao, Qi and Zhu, Menglei and Feng, Yang},
  journal = {The American Journal of Pathology},
  volume = {196},
  number = {5},
  pages = {1158--1168},
  year = {2026},
  publisher = {Elsevier BV},
  doi = {10.1016/j.ajpath.2026.02.006}
}

Effectiveness of nicotine vape products (E-cigarettes) as a smoking cessation aid for US adults: a narrative review of findings from the population assessment of tobacco and health study

Shu Xu, Jianan Zhu, Yuxin Zhang, Jennifer Hill, Yang Feng, David Abrams, and Raymond S Niaura

Nicotine & Tobacco Research, 2026

Introduction: Controversy remains regarding whether nicotine vaping products (NVPs) are associated with cigarette cessation in observational research. Reviews have largely overlooked studies using the same data source. To address this gap, we conducted a narrative review to examine the heterogeneity in the reported association that used data from the same source, which may help to explain inconsistent findings. Methods: We identified empirical studies through PubMed and Google searches that exclusively used the Population Assessment of Tobacco and Health (PATH) Study data to examine associations between NVP use and smoking cessation among adults. Adapting Arksey and O’Malley’s approach, we extracted and summarized key study characteristics, including inclusion criteria, participant characteristics, study durations, definitions of NVP exposure and smoking outcomes, covariate adjustment, and analytic methods. We also conducted regression and regression tree analyses to examine how these characteristics were related to study findings. Results: We identified 28 articles comprising 38 analyses of NVP use and cigarette cessation. Of these, 24 studies (63.2%) reported a positive association, concluding that NVP use predicted cessation. Substantial heterogeneity existed across study characteristics. Evidence suggests that daily NVP use may promote cessation, whereas studies restricted to participants with an intention to quit were less likely to observe cessation than those including participants regardless of quit intention. Conclusions: Researchers are advised against making broad claims based on any single PATH Study analysis of NVP use and smoking cessation. Rather, multiple studies using the same data source must be carefully examined in order to synthesize evidence and assess consistency of the findings.
Implications: Whether NVPs help adult smokers quit remains controversial in observational research, partly due to heterogeneity in study characteristics across studies using the same data source. Our review of observational studies based exclusively on a single data source, an approach often overlooked, suggests that (1) daily NVP use may support smoking cessation, and (2) studies that restricted participants to those with an intention to quit were less likely to observe cessation than studies that included participants regardless of quit intention. These findings underscore the value of multiple analyses using the same data source to synthesize evidence and assess consistency.
@article{xu2026effectiveness,
  title = {Effectiveness of nicotine vape products (E-cigarettes) as a smoking cessation aid for US adults: a narrative review of findings from the population assessment of tobacco and health study},
  author = {Xu, Shu and Zhu, Jianan and Zhang, Yuxin and Hill, Jennifer and Feng, Yang and Abrams, David and Niaura, Raymond S},
  journal = {Nicotine \& Tobacco Research},
  year = {2026},
  publisher = {Oxford University Press (OUP)},
  doi = {10.1093/ntr/ntag068}
}

2025

A Latent Multilayer Graphical Model For Complex, Interdependent Systems

Martin Ondrus, Ivor Cribben, and Yang Feng

In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

@inproceedings{ondruslatent,
  title = {A Latent Multilayer Graphical Model For Complex, Interdependent Systems},
  author = {Ondrus, Martin and Cribben, Ivor and Feng, Yang},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year = {2025},
}

The effect of TERT promoter mutation on predicting meningioma outcomes: a multi-institutional cohort analysis

Karenna J Groff, Ruchit V Patel, Yang Feng, Hia S Ghosh, Miguel A Millares Chavez, and 6 more authors

The Lancet Oncology, 2025

@article{groff2025effect,
  title = {The effect of TERT promoter mutation on predicting meningioma outcomes: a multi-institutional cohort analysis},
  author = {Groff, Karenna J and Patel, Ruchit V and Feng, Yang and Ghosh, Hia S and Chavez, Miguel A Millares and O'Brien, Joseph and Chen, William C and Nitturi, Vijay and Save, Akshay V and Youngblood, Mark W and others},
  journal = {The Lancet Oncology},
  volume = {26},
  number = {9},
  pages = {1178--1190},
  year = {2025},
  publisher = {Elsevier},
  doi = {10.1016/S1470-2045(25)00422-X}
}

Associations Between Hippocampal Transverse Relaxation Time and Amyloid PET in Cognitively Normal Aging Adults

Yu Veronica Sui, Arjun V Masurkar, Timothy M Shepherd, Yang Feng, Thomas Wisniewski, Henry Rusinek, and Mariana Lazar

Journal of Magnetic Resonance Imaging, 2025

Background: Identifying early neuropathological changes in Alzheimer’s disease (AD) is important for improving treatment efficacy. Among quantitative MRI measures, transverse relaxation time (T2) has been shown to reflect tissue microstructure relevant in aging and neurodegeneration; however, findings regarding T2 changes in both normal aging and AD have been inconsistent. The association between T2 and amyloid‐beta (Aβ) accumulation, a hallmark of AD pathology, is also unclear, particularly in cognitively normal individuals who may be in preclinical stages of the disease. Purpose: To investigate longitudinal hippocampal T2 changes in a cognitively normal cohort of older adults and their association with global Aβ accumulation. Study Type: Retrospective, longitudinal. Subjects: 56 cognitively normal adults between 55 and 90 years of age (17 males and 39 females). Field Strength/Sequence: 3 Tesla; multi‐echo spin echo sequence for T2 mapping; 18F‐florbetaben positron emission tomography for Aβ measurement. Assessment: Bilateral hippocampal T2 and volume were extracted to relate to Aβ PET measurements. To understand variations in AD risk, participants were separated into Aβ‐high and Aβ‐low subgroups using a predetermined threshold. Statistical Tests: Linear mixed‐effect models and general linear models were used. A p‐value < 0.025 was considered significant to account for bilateral comparisons. Results: Older age was associated with increased T2 in the bilateral hippocampus (left: β = 0.30, right: β = 0.25) and smaller hippocampal volume on the left (β = −0.12). In the Aβ‐low subgroup, both longitudinal T2 increase rates (β = 0.65) in the left hippocampus and bilateral cross‐sectional T2 (left: β = 0.64, right: β = 0.46) were positively correlated with Aβ PET, independent of hippocampal volume.
Data Conclusion: This study provided in vivo evidence linking hippocampal T2 to Aβ accumulation in cognitively normal aging individuals, suggesting that quantitative T2 may be sensitive to microstructural changes accompanying early Aβ pathology, such as neuroinflammation, demyelination, and reduced tissue integrity. Evidence Level: 3. Technical Efficacy: Stage 2.
@article{sui2025associations,
  title = {Associations Between Hippocampal Transverse Relaxation Time and Amyloid PET in Cognitively Normal Aging Adults},
  author = {Sui, Yu Veronica and Masurkar, Arjun V and Shepherd, Timothy M and Feng, Yang and Wisniewski, Thomas and Rusinek, Henry and Lazar, Mariana},
  journal = {Journal of Magnetic Resonance Imaging},
  year = {2025},
  publisher = {Wiley Online Library},
  doi = {10.1002/jmri.70097}
}

GeoERM: Geometry-Aware Multi-Task Representation Learning on Riemannian Manifolds

Multi-Task Learning

Aoran Chen and Yang Feng

arXiv preprint arXiv:2505.02972, 2025

Multi-Task Learning (MTL) seeks to boost statistical power and learning efficiency by discovering structure shared across related tasks. State-of-the-art MTL representation methods, however, usually treat the latent representation matrix as a point in ordinary Euclidean space, ignoring its often non-Euclidean geometry, thus sacrificing robustness when tasks are heterogeneous or even adversarial. We propose GeoERM, a geometry-aware MTL framework that embeds the shared representation on its natural Riemannian manifold and optimizes it via explicit manifold operations. Each training cycle performs (i) a Riemannian gradient step that respects the intrinsic curvature of the search space, followed by (ii) an efficient polar retraction to remain on the manifold, guaranteeing geometric fidelity at every iteration. The procedure applies to a broad class of matrix-factorized MTL models and retains the same per-iteration cost as Euclidean baselines. Across a set of synthetic experiments with task heterogeneity and on a wearable-sensor activity-recognition benchmark, GeoERM consistently improves estimation accuracy, reduces negative transfer, and remains stable under adversarial label noise, outperforming leading MTL and single-task alternatives.
@article{chen2025geoerm,
  title = {GeoERM: Geometry-Aware Multi-Task Representation Learning on Riemannian Manifolds},
  author = {Chen, Aoran and Feng, Yang},
  journal = {arXiv preprint arXiv:2505.02972},
  year = {2025},
}

Consistent Estimation of the Number of Communities in Non-uniform Hypergraph Model

Network Analysis

Zuofeng Shang, Zheng Zhang, and Yang Feng

Stat, 2025

We propose an algorithm based on cross‐validation to estimate the number of communities in a general non‐uniform hypergraph model. The algorithm involves a three‐step process. Initially, it randomly divides the set of hyperedges into a training set and a testing set. Subsequently, for each candidate number of communities, we construct a spectral estimation of community labels and least square estimation of the hyperedge probabilities based on the training set. The final step involves the computation of cross‐validation scores using the testing set. The proposed algorithm is shown to be consistent when the number of vertices tends to infinity.
@article{shang2025consistent,
  title = {Consistent Estimation of the Number of Communities in Non-uniform Hypergraph Model},
  author = {Shang, Zuofeng and Zhang, Zheng and Feng, Yang},
  journal = {Stat},
  volume = {14},
  number = {2},
  pages = {e70066},
  year = {2025},
  publisher = {Wiley Online Library},
  doi = {10.1002/sta4.70066}
}

Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness

Machine Learning, Transfer Learning, Multi-task Learning

Ye Tian, Yuqi Gu, and Yang Feng

Journal of Machine Learning Research, 2025

Ye Tian won the 2025 Student Paper Award from the ASA Section on Statistical Learning and Data Science.

Representation multi-task learning (MTL) has achieved tremendous success in practice. However, the theoretical understanding of these methods is still lacking. Most existing theoretical works focus on cases where all tasks share the same representation, and claim that MTL almost always improves performance. Nevertheless, as the number of tasks grows, assuming all tasks share the same representation is unrealistic. Furthermore, empirical findings often indicate that a shared representation does not necessarily improve single-task learning performance. In this paper, we aim to understand how to learn from tasks with similar but not exactly the same linear representations, while dealing with outlier tasks. Assuming a known intrinsic dimension, we propose a penalized empirical risk minimization method and a spectral method that are adaptive to the similarity structure and robust to outlier tasks. Both algorithms outperform single-task learning when representations across tasks are sufficiently similar and the proportion of outlier tasks is small. Moreover, they always perform at least as well as single-task learning, even when the representations are dissimilar. We provide information-theoretic lower bounds to demonstrate that both methods are nearly minimax optimal in a large regime, with the spectral method being optimal in the absence of outlier tasks. Additionally, we introduce a thresholding algorithm to adapt to an unknown intrinsic dimension. We conduct extensive numerical experiments to validate our theoretical findings.
@article{tian2025learning,
  author = {Tian, Ye and Gu, Yuqi and Feng, Yang},
  journal = {Journal of Machine Learning Research},
  title = {Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness},
  volume = {26},
  number = {187},
  pages = {1--125},
  year = {2025},
  url = {https://www.jmlr.org/papers/v26/23-0902.html},
}

Semiparametric Modeling and Analysis for Longitudinal Network Data

Network Analysis, Nonparametric Statistics

Yinqiu He, Jiajin Sun, Yuang Tian, Zhiliang Ying, and Yang Feng

Annals of Statistics, 2025

We introduce a semiparametric latent space model for analyzing longitudinal network data. The model consists of a static latent space component and a time-varying node-specific baseline component. We develop a semiparametric efficient score equation for the latent space parameter by adjusting for the baseline nuisance component. Estimation is accomplished through a one-step update estimator and an appropriately penalized maximum likelihood estimator. We derive oracle error bounds for the two estimators and address identifiability concerns from a quotient manifold perspective. Our approach is demonstrated using the New York Citi Bike Dataset.
@article{he2025semiparametric,
  author = {He, Yinqiu and Sun, Jiajin and Tian, Yuang and Ying, Zhiliang and Feng, Yang},
  journal = {Annals of Statistics},
  title = {Semiparametric Modeling and Analysis for Longitudinal Network Data},
  year = {2025},
  doi = {10.1214/25-AOS2506},
}

Design-Based Causal Inference with Missing Outcomes: Missingness Mechanisms, Imputation-Assisted Randomization Tests, and Covariate Adjustment

Machine Learning, Causal Inference

Siyu Heng, Jiawei Zhang, and Yang Feng

Journal of the American Statistical Association, 2025

Siyu Heng won the 2024 IMS New Researcher Travel Award

Design-based causal inference, also known as randomization-based or finite-population causal inference, is one of the most widely used causal inference frameworks, largely due to the merit that its validity can be guaranteed by study design (e.g., randomized experiments) and does not require assuming specific outcome-generating distributions or super-population models. Despite its advantages, design-based causal inference can still suffer from other issues, among which outcome missingness is a prevalent and significant challenge. This work systematically studies the outcome missingness problem in design-based causal inference. First, we propose a general and flexible outcome missingness mechanism that can facilitate finite-population-exact randomization tests of no treatment effect. Second, under this general missingness mechanism, we propose a general framework called “imputation and re-imputation" for conducting randomization tests in design-based causal inference with missing outcomes. We prove that our framework can still ensure finite-population-exact type-I error rate control even when the imputation model was misspecified or when unobserved covariates or interference exist in the missingness mechanism. Third, we extend our framework to conduct covariate adjustment in randomization tests and construct finite-population-valid confidence regions with missing outcomes. Our framework is evaluated via extensive simulation studies and applied to a large-scale randomized experiment.
@article{heng2023design,
  author = {Heng, Siyu and Zhang, Jiawei and Feng, Yang},
  journal = {Journal of the American Statistical Association},
  title = {Design-Based Causal Inference with Missing Outcomes: Missingness Mechanisms, Imputation-Assisted Randomization Tests, and Covariate Adjustment},
  year = {2025},
  doi = {10.1080/01621459.2025.2516204},
}

2024

L1-Penalized Multinomial Regression: Estimation, Inference, and Prediction, With an Application to Risk Factor Identification for Different Dementia Subtypes

High-dimensional Statistics, Classification

Ye Tian, Henry Rusinek, Arjun V Masurkar, and Yang Feng

Statistics in Medicine, 2024

High‐dimensional multinomial regression models are very useful in practice but have received less research attention than logistic regression models, especially from the perspective of statistical inference. In this work, we analyze the estimation and prediction error of the contrast‐based L1‐penalized multinomial regression model and extend the debiasing method to the multinomial case, providing a valid confidence interval for each coefficient and p‐value of the individual hypothesis test. We also examine cases of model misspecification and non‐identically distributed data to demonstrate the robustness of our method when some assumptions are violated. We apply the debiasing method to identify important predictors in the progression into dementia of different subtypes. Results from extensive simulations show the superiority of the debiasing method compared to other inference methods.
@article{tian2024l1,
  title = {L1-Penalized Multinomial Regression: Estimation, Inference, and Prediction, With an Application to Risk Factor Identification for Different Dementia Subtypes},
  author = {Tian, Ye and Rusinek, Henry and Masurkar, Arjun V and Feng, Yang},
  journal = {Statistics in Medicine},
  year = {2024},
  publisher = {John Wiley \& Sons, Inc.},
  doi = {10.1002/sim.10263},
}

Neyman-Pearson multi-class classification via cost-sensitive learning

Machine Learning, Neyman-Pearson Classification

Ye Tian and Yang Feng

Journal of the American Statistical Association, 2024

Most existing classification methods aim to minimize the overall misclassification error rate. However, in applications such as loan default prediction, different types of errors can have varying consequences. To address this asymmetry issue, two popular paradigms have been developed: the Neyman-Pearson (NP) paradigm and the cost-sensitive (CS) paradigm. Previous studies on the NP paradigm have primarily focused on the binary case, while the multi-class NP problem poses a greater challenge due to its unknown feasibility. In this work, we tackle the multi-class NP problem by establishing a connection with the CS problem via strong duality and propose two algorithms. We extend the concept of NP oracle inequalities, crucial in binary classifications, to NP oracle properties in the multi-class context. Our algorithms satisfy these NP oracle properties under certain conditions. Furthermore, we develop practical algorithms to assess the feasibility and strong duality in multi-class NP problems, which can offer practitioners the landscape of a multi-class NP problem with various target error levels. Simulations and real data studies validate the effectiveness of our algorithms. To our knowledge, this is the first study to address the multi-class NP problem with theoretical guarantees. The proposed algorithms have been implemented in the R package npcs, which is available on CRAN.
@article{tian2024neyman,
  author = {Tian, Ye and Feng, Yang},
  journal = {Journal of the American Statistical Association},
  title = {Neyman-Pearson multi-class classification via cost-sensitive learning},
  year = {2024},
  number = {just-accepted},
  pages = {1--23},
  publisher = {Taylor \& Francis},
  doi = {10.1080/01621459.2024.2402567},
}

Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models

Machine Learning, Transfer Learning, Multi-task Learning

Ye Tian, Haolei Weng, Lucy Xia, and Yang Feng

arXiv preprint arXiv:2209.15224, 2024

Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that effectively utilizes unknown similarities between related tasks and is robust against a fraction of outlier tasks from arbitrary distributions. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Additionally, iterative unsupervised multi-task and transfer learning methods may suffer from an initialization alignment problem, and two alignment algorithms are proposed to resolve the issue. Finally, we demonstrate the effectiveness of our methods through simulations and real data examples. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.
@article{tian2024unsupervised,
  author = {Tian, Ye and Weng, Haolei and Xia, Lucy and Feng, Yang},
  journal = {arXiv preprint arXiv:2209.15224},
  title = {Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models},
  year = {2024},
}

Prognostic value of DNA methylation subclassification, aneuploidy, and CDKN2A/B homozygous deletion in predicting clinical outcome of IDH mutant astrocytomas

Kristyn Galbraith, Mekka Garcia, Siyu Wei, Anna Chen, Chanel Schroff, and 6 more authors

Neuro-oncology, 2024

Background: Isocitrate dehydrogenase (IDH) mutant astrocytoma grading, until recently, has been entirely based on morphology. The 5th edition of the Central Nervous System World Health Organization (WHO) introduces CDKN2A/B homozygous deletion as a biomarker of grade 4. We sought to investigate the prognostic impact of DNA methylation-derived molecular biomarkers for IDH mutant astrocytoma. Methods: We analyzed 98 IDH mutant astrocytomas diagnosed at NYU Langone Health between 2014 and 2022. We reviewed DNA methylation subclass, CDKN2A/B homozygous deletion, and ploidy and correlated molecular biomarkers with histological grade, progression free (PFS), and overall (OS) survival. Findings were confirmed using 2 independent validation cohorts. Results: There was no significant difference in OS or PFS when stratified by histologic WHO grade alone, copy number complexity, or extent of resection. OS was significantly different when patients were stratified either by CDKN2A/B homozygous deletion or by DNA methylation subclass (P value = .0286 and .0016, respectively). None of the molecular biomarkers were associated with significantly better PFS, although DNA methylation classification showed a trend (P value = .0534). Conclusions: The current WHO recognized grading criteria for IDH mutant astrocytomas show limited prognostic value. Stratification based on DNA methylation shows superior prognostic value for OS.
@article{galbraith2024prognostic,
  author = {Galbraith, Kristyn and Garcia, Mekka and Wei, Siyu and Chen, Anna and Schroff, Chanel and Serrano, Jonathan and Pacione, Donato and Placantonakis, Dimitris G and William, Christopher M and Faustin, Arline and others},
  journal = {Neuro-oncology},
  number = {6},
  pages = {1042--1051},
  publisher = {Oxford University Press US},
  title = {Prognostic value of DNA methylation subclassification, aneuploidy, and CDKN2A/B homozygous deletion in predicting clinical outcome of IDH mutant astrocytomas},
  volume = {26},
  doi = {10.1093/neuonc/noae009},
  year = {2024}
}

Omics feature selection with the extended SIS R package: identification of a body mass index epigenetic multi-marker in the Strong Heart Study

Machine Learning, High-dimensional Statistics, Epidemiology

Arce Domingo-Relloso, Yang Feng, Zulema Rodriguez-Hernandez, Karin Haack, Shelley A Cole, Ana Navas-Acien, Maria Tellez-Plaza, and Jose D Bermudez

American Journal of Epidemiology, 2024

The statistical analysis of omics data poses a great computational challenge given their ultra–high-dimensional nature and frequent between-features correlation. In this work, we extended the iterative sure independence screening (ISIS) algorithm by pairing ISIS with elastic-net (Enet) and 2 versions of adaptive elastic-net (adaptive elastic-net (AEnet) and multistep adaptive elastic-net (MSAEnet)) to efficiently improve feature selection and effect estimation in omics research. We subsequently used genome-wide human blood DNA methylation data from American Indian participants in the Strong Heart Study (n = 2235 participants; measured in 1989-1991) to compare the performance (predictive accuracy, coefficient estimation, and computational efficiency) of ISIS-paired regularization methods with that of a Bayesian shrinkage and traditional linear regression to identify an epigenomic multimarker of body mass index (BMI). ISIS-AEnet outperformed the other methods in prediction. In biological pathway enrichment analysis of genes annotated to BMI-related differentially methylated positions, ISIS-AEnet captured most of the enriched pathways in common for at least 2 of all the evaluated methods. ISIS-AEnet can favor biological discovery because it identifies the most robust biological pathways while achieving an optimal balance between bias and efficient feature selection. In the extended SIS R package, we also implemented ISIS paired with Cox and logistic regression for time-to-event and binary endpoints, respectively, and a bootstrap approach for the estimation of regression coefficients.
@article{domingo2024omics, author = {Domingo-Relloso, Arce and Feng, Yang and Rodriguez-Hernandez, Zulema and Haack, Karin and Cole, Shelley A and Navas-Acien, Ana and Tellez-Plaza, Maria and Bermudez, Jose D}, journal = {American Journal of Epidemiology}, pages = {kwae006}, publisher = {Oxford University Press}, title = {Omics feature selection with the extended SIS R package: identification of a body mass index epigenetic multi-marker in the Strong Heart Study}, doi = {10.1093/aje/kwae006}, year = {2024}, }
Machine collaboration

Machine Learning, High-dimensional Statistics

Qingfeng Liu, and Yang Feng

Stat, 2024

Abs arXiv Bib

We propose a new ensemble framework for supervised learning, called machine collaboration (MaC), using a collection of base machines for prediction tasks. Unlike bagging/stacking (a parallel & independent framework) and boosting (a sequential & top-down framework), MaC is a type of circular & interactive learning framework. The circular & interactive feature helps the base machines to transfer information circularly and update their structures and parameters accordingly. The theoretical result on the risk bound of the estimator from MaC reveals that the circular & interactive feature can help MaC reduce risk via a parsimonious ensemble. We conduct extensive experiments on MaC using both simulated data and 119 benchmark real datasets. The results demonstrate that in most cases, MaC performs significantly better than several other state-of-the-art methods, including classification and regression trees, neural networks, stacking, and boosting.
@article{liu2024machine, author = {Liu, Qingfeng and Feng, Yang}, journal = {Stat}, number = {1}, pages = {e661}, publisher = {Wiley Online Library}, title = {Machine collaboration}, volume = {13}, year = {2024}, }
Federated Transfer Learning with Differential Privacy

Machine Learning, Transfer Learning, Differential Privacy, Federated Learning

Mengchu Li, Ye Tian, Yang Feng, and Yi Yu

arXiv preprint arXiv:2403.11343, 2024

Abs arXiv Bib

Federated learning has emerged as a powerful framework for analysing distributed data, yet two challenges remain pivotal: heterogeneity across sites and privacy of local data. In this paper, we address both challenges within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of federated differential privacy, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy model, we study four statistical problems: univariate mean estimation, low-dimensional linear regression, high-dimensional linear regression, and M-estimation. By investigating the minimax rates and quantifying the cost of privacy, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses account for data heterogeneity and privacy, highlighting the fundamental costs associated with each factor and the benefits of knowledge transfer in federated learning.
@article{li2024federated, author = {Li, Mengchu and Tian, Ye and Feng, Yang and Yu, Yi}, journal = {arXiv preprint arXiv:2403.11343}, title = {Federated Transfer Learning with Differential Privacy}, year = {2024}, }
Yang Feng and Jiajin Sun’s contribution to the Discussion of ‘Root and community inference on the latent growth process of a network’ by Crane and Xu

Network Analysis

Yang Feng, and Jiajin Sun

Journal of the Royal Statistical Society Series B: Statistical Methodology, 2024

Abs DOI Bib

Many statistical models for networks overlook the fact that most real-world networks are formed through a growth process. To address this, we introduce the Preferential Attachment Plus Erdős–Rényi model, where we let a random network G be the union of a preferential attachment (PA) tree T and additional Erdős–Rényi (ER) random edges. The PA tree captures the underlying growth process of a network where vertices/edges are added sequentially, while the ER component can be regarded as noise. Given only one snapshot of the final network G, we study the problem of constructing confidence sets for the root node of the unobserved growth process; the root node can be patient zero in an infection network or the source of fake news in a social network. We propose inference algorithms based on Gibbs sampling that scale to networks with millions of nodes and provide theoretical analysis showing that the size of the confidence set is small if the noise level of the ER edges is not too large. We also propose variations of the model in which multiple growth processes occur simultaneously, reflecting the growth of multiple communities; we use these models to provide a new approach to community detection.
@article{feng2024yang, author = {Feng, Yang and Sun, Jiajin}, journal = {Journal of the Royal Statistical Society Series B: Statistical Methodology}, number = {4}, pages = {875--878}, publisher = {Oxford University Press US}, title = {Yang Feng and Jiajin Sun's contribution to the Discussion of `Root and community inference on the latent growth process of a network' by Crane and Xu}, volume = {86}, doi = {10.1093/jrsssb/qkae055}, year = {2024}, }
Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms

Machine Learning, High-dimensional Statistics, Federated Learning, Transfer Learning, Multi-task Learning

Ye Tian, Haolei Weng, and Yang Feng

In Forty-first International Conference on Machine Learning, 2024

Abs DOI arXiv Bib Code

While supervised federated learning approaches have enjoyed significant success, the domain of unsupervised federated learning remains relatively underexplored. Several federated EM algorithms have gained popularity in practice; however, their theoretical foundations are often lacking. In this paper, we first introduce a federated gradient EM algorithm (FedGrEM) designed for the unsupervised learning of mixture models, which supplements the existing federated EM algorithms by considering task heterogeneity and potential adversarial attacks. We present a comprehensive finite-sample theory that holds for general mixture models, then apply this general theory to specific statistical models to characterize the explicit estimation error of model parameters and mixture proportions. Our theory elucidates when and how FedGrEM outperforms local single-task learning, with insights extending to existing federated EM algorithms. This bridges the gap between their practical success and theoretical understanding. Our numerical results validate our theory and demonstrate FedGrEM’s superiority over existing unsupervised federated learning benchmarks.
@inproceedings{tian2024towards, author = {Tian, Ye and Weng, Haolei and Feng, Yang}, booktitle = {Forty-first International Conference on Machine Learning}, title = {Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms}, year = {2024}, doi = {10.5555/3692070.3694041}, }
Variational Nonparametric Inference in Functional Stochastic Block Model

Network Analysis

Zuofeng Shang, Peijun Sang, Yang Feng, and Chong Jin

arXiv preprint arXiv:2407.00564, 2024

Abs arXiv Bib

We propose a functional stochastic block model whose vertices involve functional data information. This new model extends the classic stochastic block model with vector-valued nodal information, and finds applications in real-world networks whose nodal information could be functional curves. Examples include international trade data in which a network vertex (country) is associated with the annual or quarterly GDP over a certain time period, and MyFitnessPal data in which a network vertex (MyFitnessPal user) is associated with daily calorie information measured over a certain time period. Two statistical tasks will be jointly executed. First, we will detect community structures of the network vertices assisted by the functional nodal information. Second, we propose a computationally efficient variational test to examine the significance of the functional nodal information. We show that the community detection algorithms achieve weak and strong consistency, and the variational test is asymptotically chi-square with diverging degrees of freedom. As a byproduct, we propose pointwise confidence intervals for the slope function of the functional nodal information. Our methods are examined through both simulated and real datasets.
@article{shang2024variational, author = {Shang, Zuofeng and Sang, Peijun and Feng, Yang and Jin, Chong}, journal = {arXiv preprint arXiv:2407.00564}, title = {Variational Nonparametric Inference in Functional Stochastic Block Model}, year = {2024}, }
Racial distribution of molecularly classified brain tumors

Camila S Fang, Wanyi Wang, Chanel Schroff, Misha Movahed-Ezazi, Varshini Vasudevaraja, and 6 more authors

Neuro-Oncology Advances, 2024

Abs DOI Bib

Background In many cancers, specific subtypes are more prevalent in specific racial backgrounds. However, little is known about the racial distribution of specific molecular types of brain tumors. Public data repositories lack data on many brain tumor subtypes as well as diagnostic annotation using the current World Health Organization classification. A better understanding of the prevalence of brain tumors in different racial backgrounds may provide insight into tumor predisposition and development, and improve prevention. Methods We retrospectively analyzed the racial distribution of 1709 primary brain tumors classified by their methylation profiles using clinically validated whole genome DNA methylation. Self-reported race was obtained from medical records. Our cohort included 82% White, 10% Black, and 8% Asian patients with 74% of patients reporting their race. Results There was a significant difference in the racial distribution of specific types of brain tumors. Blacks were overrepresented in pituitary adenomas (35%, P <.001), with the largest proportion of FSH/LH subtype. Whites were underrepresented at 47% of all pituitary adenoma patients (P <.001). Glioblastoma (GBM) IDH wild-type showed an enrichment of Whites, at 90% (P <.001), and a significantly smaller percentage of Blacks, at 3% (P <.001). Conclusions Molecularly classified brain tumor groups and subgroups show different distributions among the three main racial backgrounds suggesting the contribution of race to brain tumor development.
@article{fang2024racial, author = {Fang, Camila S and Wang, Wanyi and Schroff, Chanel and Movahed-Ezazi, Misha and Vasudevaraja, Varshini and Serrano, Jonathan and Sulman, Erik P and Golfinos, John G and Orringer, Daniel and Galbraith, Kristyn and others}, journal = {Neuro-Oncology Advances}, number = {1}, pages = {vdae135}, publisher = {Oxford University Press US}, title = {Racial distribution of molecularly classified brain tumors}, volume = {6}, doi = {10.1093/noajnl/vdae135}, year = {2024} }
Multi-label Random Subspace Ensemble Classification

Machine Learning, High-dimensional Statistics

Fan Bi, Jianan Zhu, and Yang Feng

Journal of Computational and Graphical Statistics, 2024

Abs DOI Bib Code

In this work, we develop a new ensemble learning framework, multi-label Random Subspace Ensemble (mRaSE), for multi-label classification. Given a base classifier (e.g., multinomial logistic regression, classification tree, K-nearest neighbors), mRaSE works by first randomly sampling a collection of subspaces, then choosing the best ones that achieve the minimum cross-validation errors and, finally, aggregating the chosen weak learners. In addition to its superior prediction performance, mRaSE also provides a model-free feature ranking depending on the given base classifier. An iterative version of mRaSE is also developed to further improve the performance. A model-free extension is pursued on the iterative version, leading to the so-called Super mRaSE, which accepts a collection of base classifiers as input to the algorithm. We show that the proposed algorithms compare favorably with state-of-the-art classification algorithms, including random forests and deep neural networks, via extensive simulation studies and two real data applications. The new algorithms are implemented in an updated version of the R package RaSEn.
@article{bi2024multi, author = {Bi, Fan and Zhu, Jianan and Feng, Yang}, journal = {Journal of Computational and Graphical Statistics}, number = {just-accepted}, pages = {1--20}, publisher = {Taylor \& Francis}, title = {Multi-label Random Subspace Ensemble Classification}, doi = {10.1080/10618600.2024.2421248}, year = {2024}, }
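The three steps named in the abstract above (sample random subspaces, keep the one with the smallest cross-validation error, aggregate the chosen weak learners) can be illustrated with a minimal Python sketch. This is not the RaSEn package or the authors' implementation: the nearest-centroid base classifier, the uniform subspace sampling, the majority vote, and all function and parameter names (`fit_subspace_ensemble`, `n_candidates`, `max_dim`, and so on) are assumptions made for illustration only.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # Toy base classifier (an assumption): one centroid per class.
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(model, X):
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def cv_error(X, y, feats, n_folds, rng):
    # K-fold cross-validation error of the base classifier on one subspace.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = nearest_centroid_fit(X[train][:, feats], y[train])
        pred = nearest_centroid_predict(model, X[test][:, feats])
        errors.append(np.mean(pred != y[test]))
    return float(np.mean(errors))

def fit_subspace_ensemble(X, y, n_learners=20, n_candidates=40,
                          max_dim=3, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    ensemble = []
    for _ in range(n_learners):
        best_err, best_feats = np.inf, None
        for _ in range(n_candidates):
            # Step 1: sample a random subspace (uniform size, uniform features).
            size = rng.integers(1, min(max_dim, p) + 1)
            feats = np.sort(rng.choice(p, size=size, replace=False))
            # Step 2: keep the candidate with the smallest CV error.
            err = cv_error(X, y, feats, n_folds, rng)
            if err < best_err:
                best_err, best_feats = err, feats
        model = nearest_centroid_fit(X[:, best_feats], y)
        ensemble.append((best_feats, model))
    return ensemble

def predict_subspace_ensemble(ensemble, X, classes):
    # Step 3: aggregate the chosen weak learners by majority vote.
    votes = np.zeros((X.shape[0], len(classes)), dtype=int)
    for feats, model in ensemble:
        pred = nearest_centroid_predict(model, X[:, feats])
        for j, c in enumerate(classes):
            votes[:, j] += (pred == c)
    return classes[votes.argmax(axis=1)]

def feature_frequency(ensemble, p):
    # Share of selected subspaces containing each feature: a simple ranking.
    counts = np.zeros(p)
    for feats, _ in ensemble:
        counts[feats] += 1
    return counts / len(ensemble)
```

The selection frequency returned by `feature_frequency` is one simple way to read off a feature ranking from the chosen subspaces; the ranking used in the paper may be defined differently.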

2023

Spectral clustering via adaptive layer aggregation for multi-layer networks

Network Analysis

Sihan Huang, Haolei Weng, and Yang Feng

Journal of Computational and Graphical Statistics, 2023

Abs arXiv Bib PDF Code

One of the fundamental problems in network analysis is detecting community structure in multi-layer networks, of which each layer represents one type of edge information among the nodes. We propose integrative spectral clustering approaches based on effective convex layer aggregations. Our aggregation methods are strongly motivated by a delicate asymptotic analysis of the spectral embedding of weighted adjacency matrices and the downstream \(k\)-means clustering, in a challenging regime where community detection consistency is impossible. In fact, the methods are shown to estimate the optimal convex aggregation, which minimizes the mis-clustering error under some specialized multi-layer network models. Our analysis further suggests that clustering using Gaussian mixture models is generally superior to the commonly used \(k\)-means in spectral clustering. Extensive numerical studies demonstrate that our adaptive aggregation techniques, together with Gaussian mixture model clustering, make the new spectral clustering remarkably competitive compared to several popularly used methods.
@article{huang2023spectral, author = {Huang, Sihan and Weng, Haolei and Feng, Yang}, journal = {Journal of Computational and Graphical Statistics}, volume = {32}, pages = {1170--1184}, publisher = {Taylor \& Francis}, title = {Spectral clustering via adaptive layer aggregation for multi-layer networks}, year = {2023}, }
Transfer learning under high-dimensional generalized linear models

Transfer Learning, High-dimensional Statistics

Ye Tian, and Yang Feng

Journal of the American Statistical Association, 2023

Abs DOI arXiv Bib PDF Code

In this work, we study the transfer learning problem under high-dimensional generalized linear models (GLMs), which aim to improve the fit on target data by borrowing information from useful source data. Given which sources to transfer, we propose a transfer learning algorithm on GLM, and derive its \(\ell_1/\ell_2\)-estimation error bounds as well as a bound for a prediction error measure. The theoretical analysis shows that when the target and source are sufficiently close to each other, these bounds could be improved over those of the classical penalized estimator using only target data under mild conditions. When we don’t know which sources to transfer, an algorithm-free transferable source detection approach is introduced to detect informative sources. The detection consistency is proved under the high-dimensional GLM transfer learning setting. We also propose an algorithm to construct confidence intervals of each coefficient component, and the corresponding theories are provided. Extensive simulations and a real-data experiment verify the effectiveness of our algorithms. We implement the proposed GLM transfer learning algorithms in a new R package glmtrans, which is available on CRAN.
@article{tian2023transfer, author = {Tian, Ye and Feng, Yang}, journal = {Journal of the American Statistical Association}, title = {Transfer learning under high-dimensional generalized linear models}, year = {2023}, number = {544}, pages = {2684--2697}, volume = {118}, publisher = {Taylor \& Francis}, doi = {10.1080/01621459.2022.2071278}, }
Variable selection for high-dimensional generalized linear model with block-missing data

High-dimensional Statistics

Yifan He, Yang Feng, and Xinyuan Song

Scandinavian Journal of Statistics, 2023

Abs DOI Bib

In modern scientific research, multiblock missing data emerges with synthesizing information across multiple studies. However, existing imputation methods for handling block‐wise missing data either focus on the single‐block missing pattern or heavily rely on the model structure. In this study, we propose a single regression‐based imputation algorithm for multiblock missing data. First, we conduct a sparse precision matrix estimation based on the structure of block‐wise missing data. Second, we impute the missing blocks with their means conditional on the observed blocks. Theoretical results about variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that compared with existing methods, the proposed imputation procedure is robust to various missing mechanisms because of the good properties of regression imputation. An application to Alzheimer’s Disease Neuroimaging Initiative data also confirms the superiority of our proposed method.
@article{he2023variable, author = {He, Yifan and Feng, Yang and Song, Xinyuan}, journal = {Scandinavian Journal of Statistics}, number = {3}, pages = {1279--1297}, title = {Variable selection for high-dimensional generalized linear model with block-missing data}, volume = {50}, doi = {10.1111/sjos.12632}, year = {2023}, }
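The imputation step described in the abstract above (fill each missing block with its mean conditional on the observed blocks, using an estimated precision matrix) can be sketched as follows. The sketch assumes the Gaussian conditional-mean formula E[x_m | x_o] = mu_m - Omega_mm^{-1} Omega_mo (x_o - mu_o) and takes the mean vector and precision matrix as given; the paper's sparse precision estimator and exact imputation rule may differ, and the function name `conditional_mean_impute` is hypothetical.

```python
import numpy as np

def conditional_mean_impute(x, mu, precision):
    """Fill NaN entries of x with their conditional mean given observed entries.

    Assumes a Gaussian-style formula with known mean `mu` and precision
    matrix `precision` (both are inputs here, not estimated).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    miss = np.isnan(x)
    if not miss.any():
        return x.copy()
    obs = ~miss
    omega_mm = precision[np.ix_(miss, miss)]  # missing-by-missing block
    omega_mo = precision[np.ix_(miss, obs)]   # missing-by-observed block
    filled = x.copy()
    # E[x_m | x_o] = mu_m - Omega_mm^{-1} Omega_mo (x_o - mu_o)
    shift = np.linalg.solve(omega_mm, omega_mo @ (x[obs] - mu[obs]))
    filled[miss] = mu[miss] - shift
    return filled
```

Applying the function row by row to a data matrix with block-wise missing entries yields a completed matrix that a penalized GLM can then be fit on; that downstream step is omitted here.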
DDAC-SpAM: A Distributed Algorithm for Fitting High-dimensional Sparse Additive Models with Feature Division and Decorrelation

High-dimensional Statistics

Yifan He, Ruiyang Wu, Yong Zhou, and Yang Feng

Journal of the American Statistical Association, 2023

Abs arXiv Bib PDF

Distributed statistical learning has become a popular technique for large-scale data analysis. Most existing work in this area focuses on dividing the observations, but we propose a new algorithm, DDAC-SpAM, which divides the features under a high-dimensional sparse additive model. Our approach involves three steps: divide, decorrelate, and conquer. The decorrelation operation enables each local estimator to recover the sparsity pattern for each additive component without imposing strict constraints on the correlation structure among variables. The effectiveness and efficiency of the proposed algorithm are demonstrated through theoretical analysis and empirical results on both synthetic and real data. The theoretical results include both the consistent sparsity pattern recovery as well as statistical inference for each additive functional component. Our approach provides a practical solution for fitting sparse additive models, with promising applications in a wide range of domains.
@article{he2023ddac, author = {He, Yifan and Wu, Ruiyang and Zhou, Yong and Feng, Yang}, journal = {Journal of the American Statistical Association}, title = {DDAC-SpAM: A Distributed Algorithm for Fitting High-dimensional Sparse Additive Models with Feature Division and Decorrelation}, year = {2023}, pages = {1--12}, publisher = {Taylor \& Francis}, }
PCABM: Pairwise Covariates-Adjusted Block Model for Community Detection

Network Analysis

Sihan Huang, Jiajin Sun, and Yang Feng

Journal of the American Statistical Association, 2023

Abs DOI arXiv Bib PDF Code

One of the most fundamental problems in network study is community detection. The stochastic block model (SBM) is a widely used model, for which various estimation methods have been developed with their community detection consistency results unveiled. However, the SBM is restricted by the strong assumption that all nodes in the same community are stochastically equivalent, which may not be suitable for practical applications. We introduce a pairwise covariates-adjusted stochastic block model (PCABM), a generalization of SBM that incorporates pairwise covariate information. We study the maximum likelihood estimates of the coefficients for the covariates as well as the community assignments. It is shown that both the coefficient estimates of the covariates and the community assignments are consistent under suitable sparsity conditions. Spectral clustering with adjustment (SCWA) is introduced to efficiently solve PCABM. Under certain conditions, we derive the error bound of community detection under SCWA and show that it is community detection consistent. In addition, we investigate model selection in terms of the number of communities and feature selection for the pairwise covariates, and propose two corresponding algorithms. PCABM compares favorably with the SBM or degree-corrected stochastic block model (DCBM) under a wide range of simulated and real networks when covariate information is accessible.
@article{huang2023pcabm, author = {Huang, Sihan and Sun, Jiajin and Feng, Yang}, journal = {Journal of the American Statistical Association}, title = {PCABM: Pairwise Covariates-Adjusted Block Model for Community Detection}, year = {2023}, pages = {1--13}, publisher = {Taylor \& Francis}, doi = {10.1080/01621459.2023.2244731}, }
A flexible quasi-likelihood model for microbiome abundance count data

Yiming Shi, Huilin Li, Chan Wang, Jun Chen, Hongmei Jiang, and 5 more authors

Statistics in Medicine, 2023

Abs DOI Bib

In this article, we present a flexible model for microbiome count data. We consider a quasi‐likelihood framework, in which we do not make any assumptions on the distribution of the microbiome count except that its variance is an unknown but smooth function of the mean. By comparing our model to the negative binomial generalized linear model (GLM) and Poisson GLM in simulation studies, we show that our flexible quasi‐likelihood method yields valid inferential results. Using a real microbiome study, we demonstrate the utility of our method by examining the relationship between adenomas and microbiota. We also provide an R package “fql” for the application of our method.
@article{shi2023flexible, author = {Shi, Yiming and Li, Huilin and Wang, Chan and Chen, Jun and Jiang, Hongmei and Shih, Ya-Chen T and Zhang, Haixiang and Song, Yizhe and Feng, Yang and Liu, Lei}, journal = {Statistics in Medicine}, number = {25}, pages = {4632--4643}, publisher = {John Wiley \& Sons, Inc. Hoboken, USA}, title = {A flexible quasi-likelihood model for microbiome abundance count data}, volume = {42}, doi = {10.1002/sim.9880}, year = {2023} }
Simulation of New York City’s Ventilator Allocation Guideline During the Spring 2020 COVID-19 Surge

Epidemiology

B Corbett Walsh, Jianan Zhu, Yang Feng, Kenneth A Berkowitz, Rebecca A Betensky, and 2 more authors

JAMA network open, 2023

Abs DOI Bib

Importance The spring 2020 surge of COVID-19 unprecedentedly strained ventilator supply in New York City, with many hospitals nearly exhausting available ventilators and subsequently seriously considering enacting crisis standards of care and implementing New York State Ventilator Allocation Guidelines (NYVAG). However, there is little evidence as to how NYVAG would perform if implemented. Objectives To evaluate the performance and potential improvement of NYVAG during a surge of patients with respect to the length of rationing, overall mortality, and worsening health disparities. Design, Setting, and Participants This cohort study included intubated patients in a single health system in New York City from March through July 2020. A total of 20 000 simulations were conducted of ventilator triage (10 000 following NYVAG and 10 000 following a proposed improved NYVAG) during a crisis period, defined as the point at which the prepandemic ventilator supply was 95% utilized. Exposures The NYVAG protocol for triage ventilators. Main Outcomes and Measures Comparison of observed survival rates with simulations of scenarios requiring NYVAG ventilator rationing. Results The total cohort included 1671 patients; of these, 674 intubated patients (mean [SD] age, 63.7 [13.8] years; 465 male [69.9%]) were included in the crisis period, with 571 (84.7%) testing positive for COVID-19. Simulated ventilator rationing occurred for 163.9 patients over 15.0 days, 44.4% (95% CI, 38.3%-50.0%) of whom would have survived if provided a ventilator while only 34.8% (95% CI, 28.5%-40.0%) of those newly intubated patients receiving a reallocated ventilator survived. While triage categorization at the time of intubation exhibited partial prognostic differentiation, 94.8% of all ventilator rationing occurred after a time trial. Within this subset, 43.1% were intubated for 7 or more days with a favorable SOFA score that had not improved. An estimated 60.6% of these patients would have survived if sustained on a ventilator. Revising triage subcategorization, proposed improved NYVAG, would have improved this alarming ventilator allocation inefficiency (25.3% [95% CI, 22.1%-28.4%] of those selected for ventilator rationing would have survived if provided a ventilator). NYVAG ventilator rationing did not exacerbate existing health disparities. Conclusions and Relevance In this cohort study of intubated patients experiencing simulated ventilator rationing during the apex of the New York City COVID-19 2020 surge, NYVAG diverted ventilators from patients with a higher chance of survival to those with a lower chance of survival. Future efforts should be focused on triage subcategorization, which improved this triage inefficiency, and ventilator rationing after a time trial, when most ventilator rationing occurred.
@article{walsh2023simulation, author = {Walsh, B Corbett and Zhu, Jianan and Feng, Yang and Berkowitz, Kenneth A and Betensky, Rebecca A and Nunnally, Mark E and Pradhan, Deepak R}, journal = {JAMA network open}, number = {10}, pages = {e2336736--e2336736}, publisher = {American Medical Association}, title = {Simulation of New York City's Ventilator Allocation Guideline During the Spring 2020 COVID-19 Surge}, volume = {6}, doi = {10.1001/jamanetworkopen.2023.36736}, year = {2023}, }

Comments on: Statistical inference and large-scale multiple testing for high-dimensional regression models

Ye Tian, and Yang Feng

Test, 2023
