Publications
Publications in reverse chronological order, generated by jekyll-scholar.

For a more up-to-date list of publications, please visit My Google Scholar Page (By Year).
2026
- Variational Nonparametric Inference in Stochastic Block Models with Functional Covariates
Zuofeng Shang, Peijun Sang, Yang Feng, and Chong Jin
Journal of the American Statistical Association, 2026
We propose a functional stochastic block model whose vertices involve functional data information. This new model extends the classic stochastic block model with vector-valued nodal information, and finds applications in real-world networks whose nodal information could be functional curves. Examples include international trade data in which a network vertex (country) is associated with the annual or quarterly GDP over a certain time period, and MyFitnessPal data in which a network vertex (MyFitnessPal user) is associated with daily calorie information measured over a certain time period. Two statistical tasks will be jointly executed. First, we will detect community structures of the network vertices assisted by the functional nodal information. Second, we propose a computationally efficient variational test to examine the significance of the functional nodal information. We show that the community detection algorithms achieve weak and strong consistency, and the variational test is asymptotically chi-square with diverging degrees of freedom. As a byproduct, we propose pointwise confidence intervals for the slope function of the functional nodal information. Our methods are examined through both simulated and real datasets. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
@article{shang2026variational, title = {Variational Nonparametric Inference in Stochastic Block Models with Functional Covariates}, author = {Shang, Zuofeng and Sang, Peijun and Feng, Yang and Jin, Chong}, journal = {Journal of the American Statistical Association}, pages = {1--13}, year = {2026}, publisher = {Informa UK Limited}, doi = {10.1080/01621459.2026.2654876} }
- Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models
Ye Tian, Haolei Weng, Lucy Xia, and Yang Feng
Journal of the American Statistical Association, 2026
Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that effectively utilizes unknown similarities between related tasks and is robust against a fraction of outlier tasks from arbitrary distributions. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both the parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Additionally, iterative unsupervised multi-task and transfer learning methods may suffer from an initialization alignment problem, and two alignment algorithms are proposed to resolve the issue. Finally, we demonstrate the effectiveness of our methods through simulations and real data examples. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.
@article{tian2026robust, title = {Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models}, author = {Tian, Ye and Weng, Haolei and Xia, Lucy and Feng, Yang}, journal = {Journal of the American Statistical Association}, pages = {1--30}, year = {2026}, publisher = {Informa UK Limited}, doi = {10.1080/01621459.2026.2670031} }
- Leveraging Kappa-Lambda Signatures in a Multistage Machine Learning Pipeline for B-Cell Lymphoma Detection by Flow Cytometry
Iris Zhang, Sulov Chalise, Mikhail Roshal, Qi Gao, Menglei Zhu, and 1 more author
The American Journal of Pathology, 2026
Flow cytometry immunophenotyping is essential for diagnosing B-cell lymphomas, but manual interpretation of high-dimensional data remains subjective, time-consuming, and prone to interoperator variability. Previous computational approaches often overlook clinically relevant principles, such as Ig light chain restriction. To address this gap, a biologically informed, three-stage machine learning pipeline that integrates Ig κ (IGK) and Ig λ (IGL) signatures to improve B-cell lymphoma detection was developed. A total of 200 peripheral blood samples (100 normal, 100 abnormal) were analyzed, comprising >15 million single-cell events characterized by 21 immunophenotypic markers. Three XGBoost models were trained sequentially: the first classified light chain expression (IGK, IGL, or nuisance), the second identified cell phenotypes using marker intensities and IGK/IGL-based neighborhood enrichment, and the third produced sample-level predictions based on aggregated cell features. The IGK/IGL classifier achieved 88.0% test accuracy [area under the receiver operating characteristic curve (AUC), 0.957], whereas the cell-level classification reached 92.9% accuracy (AUC, 0.983), with IGK/IGL enrichment as the most informative feature. Similarly, sample-level classification achieved 94.7% accuracy (AUC, 0.976), with improved performance when IGK/IGL enrichment was included. These findings demonstrate that incorporating biologically grounded features enhances both the accuracy and interpretability of automated flow cytometry analysis. This approach offers a scalable, reproducible, and clinically aligned alternative to the manual review of flow cytometry data for B-cell lymphomas.
@article{zhang2026leveraging, title = {Leveraging Kappa-Lambda Signatures in a Multistage Machine Learning Pipeline for B-Cell Lymphoma Detection by Flow Cytometry}, author = {Zhang, Iris and Chalise, Sulov and Roshal, Mikhail and Gao, Qi and Zhu, Menglei and Feng, Yang}, journal = {The American Journal of Pathology}, volume = {196}, number = {5}, pages = {1158--1168}, year = {2026}, publisher = {Elsevier BV}, doi = {10.1016/j.ajpath.2026.02.006} }
- Effectiveness of nicotine vape products (E-cigarettes) as a smoking cessation aid for US adults: a narrative review of findings from the population assessment of tobacco and health study
Shu Xu, Jianan Zhu, Yuxin Zhang, Jennifer Hill, Yang Feng, and 2 more authors
Nicotine & Tobacco Research, 2026
Introduction: Controversy remains regarding whether nicotine vaping products (NVPs) are associated with cigarette cessation in observational research. Reviews have largely overlooked studies using the same data source. To address this gap, we conducted a narrative review to examine the heterogeneity in the reported associations across studies that used data from the same source, which may help to explain inconsistent findings. Methods: We identified empirical studies through PubMed and Google searches that exclusively used the Population Assessment of Tobacco and Health (PATH) Study data to examine associations between NVP use and smoking cessation among adults. Adapting Arksey and O’Malley’s approach, we extracted and summarized key study characteristics, including inclusion criteria, participant characteristics, study durations, definitions of NVP exposure and smoking outcomes, covariate adjustment, and analytic methods. We also conducted regression and regression tree analyses to examine how these characteristics were related to study findings. Results: We identified 28 articles comprising 38 analyses of NVP use and cigarette cessation. Of these, 24 studies (63.2%) reported a positive association, concluding that NVP use predicted cessation. Substantial heterogeneity existed across study characteristics. Evidence suggests that daily NVP use may promote cessation, whereas studies restricted to participants with an intention to quit were less likely to observe cessation than those including participants regardless of quit intention. Conclusions: Researchers are advised against making broad claims based on any single PATH Study analysis of NVP use and smoking cessation. Rather, multiple studies using the same data source must be carefully examined in order to synthesize evidence and assess consistency of the findings.
Implications: Whether NVPs help adult smokers quit remains controversial in observational research, partly due to heterogeneity in study characteristics across studies using the same data source. Our review of observational studies based exclusively on a single data source (an approach often overlooked) suggests that (1) daily NVP use may support smoking cessation, and (2) studies that restricted participants to those with an intention to quit were less likely to observe cessation than studies that included participants regardless of quit intention. These findings underscore the value of multiple analyses using the same data source to synthesize evidence and assess consistency.
@article{xu2026effectiveness, title = {Effectiveness of nicotine vape products (E-cigarettes) as a smoking cessation aid for US adults: a narrative review of findings from the population assessment of tobacco and health study}, author = {Xu, Shu and Zhu, Jianan and Zhang, Yuxin and Hill, Jennifer and Feng, Yang and Abrams, David and Niaura, Raymond S}, journal = {Nicotine \& Tobacco Research}, year = {2026}, publisher = {Oxford University Press (OUP)}, doi = {10.1093/ntr/ntag068} }
2025
- A Latent Multilayer Graphical Model For Complex, Interdependent Systems
Martin Ondrus, Ivor Cribben, and Yang Feng
In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
@inproceedings{ondruslatent, title = {A Latent Multilayer Graphical Model For Complex, Interdependent Systems}, author = {Ondrus, Martin and Cribben, Ivor and Feng, Yang}, booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems}, year = {2025}, }
- The effect of TERT promoter mutation on predicting meningioma outcomes: a multi-institutional cohort analysis
Karenna J Groff, Ruchit V Patel, Yang Feng, Hia S Ghosh, Miguel A Millares Chavez, and 6 more authors
The Lancet Oncology, 2025
@article{groff2025effect, title = {The effect of TERT promoter mutation on predicting meningioma outcomes: a multi-institutional cohort analysis}, author = {Groff, Karenna J and Patel, Ruchit V and Feng, Yang and Ghosh, Hia S and Chavez, Miguel A Millares and O'Brien, Joseph and Chen, William C and Nitturi, Vijay and Save, Akshay V and Youngblood, Mark W and others}, journal = {The Lancet Oncology}, volume = {26}, number = {9}, pages = {1178--1190}, year = {2025}, publisher = {Elsevier}, doi = {10.1016/S1470-2045(25)00422-X} }
- Associations Between Hippocampal Transverse Relaxation Time and Amyloid PET in Cognitively Normal Aging Adults
Yu Veronica Sui, Arjun V Masurkar, Timothy M Shepherd, Yang Feng, Thomas Wisniewski, and 2 more authors
Journal of Magnetic Resonance Imaging, 2025
Background: Identifying early neuropathological changes in Alzheimer’s disease (AD) is important for improving treatment efficacy. Among quantitative MRI measures, transverse relaxation time (T2) has been shown to reflect tissue microstructure relevant in aging and neurodegeneration; however, findings regarding T2 changes in both normal aging and AD have been inconsistent. The association between T2 and amyloid‐beta (Aβ) accumulation, a hallmark of AD pathology, is also unclear, particularly in cognitively normal individuals who may be in preclinical stages of the disease. Purpose: To investigate longitudinal hippocampal T2 changes in a cognitively normal cohort of older adults and their association with global Aβ accumulation. Study Type: Retrospective, longitudinal. Subjects: 56 cognitively normal adults between 55 and 90 years of age (17 males and 39 females). Field Strength/Sequence: 3 Tesla; multi‐echo spin echo sequence for T2 mapping; 18F‐florbetaben positron emission tomography for Aβ measurement. Assessment: Bilateral hippocampal T2 and volume were extracted to relate to Aβ PET measurements. To understand variations in AD risk, participants were separated into Aβ‐high and Aβ‐low subgroups using a predetermined threshold. Statistical Tests: Linear mixed‐effect models and general linear models were used. A p‐value < 0.025 was considered significant to account for bilateral comparisons. Results: Older age was associated with increased T2 in the bilateral hippocampus (left: β = 0.30, right: β = 0.25) and smaller hippocampal volume on the left (β = −0.12). In the Aβ‐low subgroup, both longitudinal T2 increase rates (β = 0.65) in the left hippocampus and bilateral cross‐sectional T2 (left: β = 0.64, right: β = 0.46) were positively correlated with Aβ PET, independent of hippocampal volume.
Data Conclusion: This study provided in vivo evidence linking hippocampal T2 to Aβ accumulation in cognitively normal aging individuals, suggesting that quantitative T2 may be sensitive to microstructural changes accompanying early Aβ pathology, such as neuroinflammation, demyelination, and reduced tissue integrity. Evidence Level: 3. Technical Efficacy: Stage 2.
@article{sui2025associations, title = {Associations Between Hippocampal Transverse Relaxation Time and Amyloid PET in Cognitively Normal Aging Adults}, author = {Sui, Yu Veronica and Masurkar, Arjun V and Shepherd, Timothy M and Feng, Yang and Wisniewski, Thomas and Rusinek, Henry and Lazar, Mariana}, journal = {Journal of Magnetic Resonance Imaging}, year = {2025}, publisher = {Wiley Online Library}, doi = {10.1002/jmri.70097} }
- GeoERM: Geometry-Aware Multi-Task Representation Learning on Riemannian Manifolds
Topics: Multi-Task Learning
Aoran Chen and Yang Feng
arXiv preprint arXiv:2505.02972, 2025
Multi-Task Learning (MTL) seeks to boost statistical power and learning efficiency by discovering structure shared across related tasks. State-of-the-art MTL representation methods, however, usually treat the latent representation matrix as a point in ordinary Euclidean space, ignoring its often non-Euclidean geometry, thus sacrificing robustness when tasks are heterogeneous or even adversarial. We propose GeoERM, a geometry-aware MTL framework that embeds the shared representation on its natural Riemannian manifold and optimizes it via explicit manifold operations. Each training cycle performs (i) a Riemannian gradient step that respects the intrinsic curvature of the search space, followed by (ii) an efficient polar retraction to remain on the manifold, guaranteeing geometric fidelity at every iteration. The procedure applies to a broad class of matrix-factorized MTL models and retains the same per-iteration cost as Euclidean baselines. Across a set of synthetic experiments with task heterogeneity and on a wearable-sensor activity-recognition benchmark, GeoERM consistently improves estimation accuracy, reduces negative transfer, and remains stable under adversarial label noise, outperforming leading MTL and single-task alternatives.
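The two-step update described in this abstract (a Riemannian gradient step followed by a polar retraction) can be illustrated with a minimal sketch. It assumes the shared representation is a matrix with orthonormal columns (the Stiefel manifold), which the abstract does not state explicitly. The function names and the toy objective are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def riemannian_grad(A, G):
    """Project a Euclidean gradient G onto the tangent space at A.

    Assumes A has orthonormal columns (a point on the Stiefel manifold).
    """
    sym = (A.T @ G + G.T @ A) / 2.0
    return G - A @ sym

def polar_retraction(M):
    """Return the orthonormal polar factor of a full-column-rank matrix M."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def stiefel_step(A, G, lr=0.1):
    """One Riemannian gradient step followed by a polar retraction."""
    return polar_retraction(A - lr * riemannian_grad(A, G))

# Toy objective (an assumption for illustration): f(A) = -trace(A^T S A),
# whose Euclidean gradient is -2 S A.
rng = np.random.default_rng(0)
p, r = 8, 3
A_true = polar_retraction(rng.standard_normal((p, r)))
S = A_true @ A_true.T

A = polar_retraction(rng.standard_normal((p, r)))
for _ in range(200):
    A = stiefel_step(A, -2.0 * S @ A, lr=0.1)
```

Because the retraction is applied after every step, each iterate keeps orthonormal columns, which is the sense in which the update stays on the manifold in this sketch.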
@article{chen2025geoerm, title = {GeoERM: Geometry-Aware Multi-Task Representation Learning on Riemannian Manifolds}, author = {Chen, Aoran and Feng, Yang}, journal = {arXiv preprint arXiv:2505.02972}, year = {2025}, }
- Consistent Estimation of the Number of Communities in Non-uniform Hypergraph Model
Topics: Network Analysis
Zuofeng Shang, Zheng Zhang, and Yang Feng
Stat, 2025
We propose an algorithm based on cross‐validation to estimate the number of communities in a general non‐uniform hypergraph model. The algorithm involves a three‐step process. Initially, it randomly divides the set of hyperedges into a training set and a testing set. Subsequently, for each candidate number of communities, we construct a spectral estimation of community labels and least square estimation of the hyperedge probabilities based on the training set. The final step involves the computation of cross‐validation scores using the testing set. The proposed algorithm is shown to be consistent when the number of vertices tends to infinity.
@article{shang2025consistent, title = {Consistent Estimation of the Number of Communities in Non-uniform Hypergraph Model}, author = {Shang, Zuofeng and Zhang, Zheng and Feng, Yang}, journal = {Stat}, volume = {14}, number = {2}, pages = {e70066}, year = {2025}, publisher = {Wiley Online Library}, doi = {10.1002/sta4.70066} }
- Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness
Topics: Machine Learning, Transfer Learning, Multi-task Learning
Ye Tian, Yuqi Gu, and Yang Feng
Journal of Machine Learning Research, 2025
Ye Tian won the 2025 student paper award in the ASA Section for Statistical Learning and Data Science
Representation multi-task learning (MTL) has achieved tremendous success in practice. However, the theoretical understanding of these methods is still lacking. Most existing theoretical works focus on cases where all tasks share the same representation, and claim that MTL almost always improves performance. Nevertheless, as the number of tasks grows, assuming all tasks share the same representation is unrealistic. Furthermore, empirical findings often indicate that a shared representation does not necessarily improve single-task learning performance. In this paper, we aim to understand how to learn from tasks with similar but not exactly the same linear representations, while dealing with outlier tasks. Assuming a known intrinsic dimension, we propose a penalized empirical risk minimization method and a spectral method that are adaptive to the similarity structure and robust to outlier tasks. Both algorithms outperform single-task learning when representations across tasks are sufficiently similar and the proportion of outlier tasks is small. Moreover, they always perform at least as well as single-task learning, even when the representations are dissimilar. We provide information-theoretic lower bounds to demonstrate that both methods are nearly minimax optimal in a large regime, with the spectral method being optimal in the absence of outlier tasks. Additionally, we introduce a thresholding algorithm to adapt to an unknown intrinsic dimension. We conduct extensive numerical experiments to validate our theoretical findings.
@article{tian2025learning, author = {Tian, Ye and Gu, Yuqi and Feng, Yang}, journal = {Journal of Machine Learning Research}, title = {Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness}, volume = {26}, number = {187}, pages = {1--125}, year = {2025}, url = {https://www.jmlr.org/papers/v26/23-0902.html}, }
- Semiparametric Modeling and Analysis for Longitudinal Network Data
Topics: Network Analysis, Nonparametric Statistics
Yinqiu He, Jiajin Sun, Yuang Tian, Zhiliang Ying, and Yang Feng
Annals of Statistics, 2025
We introduce a semiparametric latent space model for analyzing longitudinal network data. The model consists of a static latent space component and a time-varying node-specific baseline component. We develop a semiparametric efficient score equation for the latent space parameter by adjusting for the baseline nuisance component. Estimation is accomplished through a one-step update estimator and an appropriately penalized maximum likelihood estimator. We derive oracle error bounds for the two estimators and address identifiability concerns from a quotient manifold perspective. Our approach is demonstrated using the New York Citi Bike Dataset.
@article{he2025semiparametric, author = {He, Yinqiu and Sun, Jiajin and Tian, Yuang and Ying, Zhiliang and Feng, Yang}, journal = {Annals of Statistics}, title = {Semiparametric Modeling and Analysis for Longitudinal Network Data}, year = {2025}, doi = {10.1214/25-AOS2506}, }
- Design-Based Causal Inference with Missing Outcomes: Missingness Mechanisms, Imputation-Assisted Randomization Tests, and Covariate Adjustment
Topics: Machine Learning, Causal Inference
Siyu Heng, Jiawei Zhang, and Yang Feng
Journal of the American Statistical Association, 2025
Siyu Heng won the 2024 IMS New Researcher Travel Award
Design-based causal inference, also known as randomization-based or finite-population causal inference, is one of the most widely used causal inference frameworks, largely due to the merit that its validity can be guaranteed by study design (e.g., randomized experiments) and does not require assuming specific outcome-generating distributions or super-population models. Despite its advantages, design-based causal inference can still suffer from other issues, among which outcome missingness is a prevalent and significant challenge. This work systematically studies the outcome missingness problem in design-based causal inference. First, we propose a general and flexible outcome missingness mechanism that can facilitate finite-population-exact randomization tests of no treatment effect. Second, under this general missingness mechanism, we propose a general framework called “imputation and re-imputation” for conducting randomization tests in design-based causal inference with missing outcomes. We prove that our framework can still ensure finite-population-exact type-I error rate control even when the imputation model is misspecified or when unobserved covariates or interference exist in the missingness mechanism. Third, we extend our framework to conduct covariate adjustment in randomization tests and construct finite-population-valid confidence regions with missing outcomes. Our framework is evaluated via extensive simulation studies and applied to a large-scale randomized experiment.
@article{heng2023design, author = {Heng, Siyu and Zhang, Jiawei and Feng, Yang}, journal = {Journal of the American Statistical Association}, title = {Design-Based Causal Inference with Missing Outcomes: Missingness Mechanisms, Imputation-Assisted Randomization Tests, and Covariate Adjustment}, year = {2025}, doi = {10.1080/01621459.2025.2516204}, }
2024
- L1-Penalized Multinomial Regression: Estimation, Inference, and Prediction, With an Application to Risk Factor Identification for Different Dementia Subtypes
Topics: High-dimensional Statistics, Classification
Ye Tian, Henry Rusinek, Arjun V Masurkar, and Yang Feng
Statistics in Medicine, 2024
High‐dimensional multinomial regression models are very useful in practice but have received less research attention than logistic regression models, especially from the perspective of statistical inference. In this work, we analyze the estimation and prediction error of the contrast‐based L1‐penalized multinomial regression model and extend the debiasing method to the multinomial case, providing a valid confidence interval for each coefficient and a p‐value for each individual hypothesis test. We also examine cases of model misspecification and non‐identically distributed data to demonstrate the robustness of our method when some assumptions are violated. We apply the debiasing method to identify important predictors in the progression into dementia of different subtypes. Results from extensive simulations show the superiority of the debiasing method compared to other inference methods.
@article{tian2024l1, title = {L1-Penalized Multinomial Regression: Estimation, Inference, and Prediction, With an Application to Risk Factor Identification for Different Dementia Subtypes}, author = {Tian, Ye and Rusinek, Henry and Masurkar, Arjun V and Feng, Yang}, journal = {Statistics in Medicine}, year = {2024}, publisher = {John Wiley \& Sons, Inc.}, doi = {10.1002/sim.10263}, }
- Neyman-Pearson multi-class classification via cost-sensitive learning
Topics: Machine Learning, Neyman-Pearson Classification
Ye Tian and Yang Feng
Journal of the American Statistical Association, 2024
Most existing classification methods aim to minimize the overall misclassification error rate. However, in applications such as loan default prediction, different types of errors can have varying consequences. To address this asymmetry issue, two popular paradigms have been developed: the Neyman-Pearson (NP) paradigm and the cost-sensitive (CS) paradigm. Previous studies on the NP paradigm have primarily focused on the binary case, while the multi-class NP problem poses a greater challenge due to its unknown feasibility. In this work, we tackle the multi-class NP problem by establishing a connection with the CS problem via strong duality and propose two algorithms. We extend the concept of NP oracle inequalities, crucial in binary classifications, to NP oracle properties in the multi-class context. Our algorithms satisfy these NP oracle properties under certain conditions. Furthermore, we develop practical algorithms to assess the feasibility and strong duality in multi-class NP problems, which can offer practitioners the landscape of a multi-class NP problem with various target error levels. Simulations and real data studies validate the effectiveness of our algorithms. To our knowledge, this is the first study to address the multi-class NP problem with theoretical guarantees. The proposed algorithms have been implemented in the R package npcs, which is available on CRAN.
@article{tian2024neyman, author = {Tian, Ye and Feng, Yang}, journal = {Journal of the American Statistical Association}, title = {Neyman-Pearson multi-class classification via cost-sensitive learning}, year = {2024}, number = {just-accepted}, pages = {1--23}, publisher = {Taylor \& Francis}, doi = {10.1080/01621459.2024.2402567}, }
- Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models
Topics: Machine Learning, Transfer Learning, Multi-task Learning
Ye Tian, Haolei Weng, Lucy Xia, and Yang Feng
arXiv preprint arXiv:2209.15224, 2024
Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that effectively utilizes unknown similarities between related tasks and is robust against a fraction of outlier tasks from arbitrary distributions. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Additionally, iterative unsupervised multi-task and transfer learning methods may suffer from an initialization alignment problem, and two alignment algorithms are proposed to resolve the issue. Finally, we demonstrate the effectiveness of our methods through simulations and real data examples. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.
@article{tian2024unsupervised, author = {Tian, Ye and Weng, Haolei and Xia, Lucy and Feng, Yang}, journal = {arXiv preprint arXiv:2209.15224}, title = {Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models}, year = {2024}, }
- Prognostic value of DNA methylation subclassification, aneuploidy, and CDKN2A/B homozygous deletion in predicting clinical outcome of IDH mutant astrocytomas
Kristyn Galbraith, Mekka Garcia, Siyu Wei, Anna Chen, Chanel Schroff, and 6 more authors
Neuro-oncology, 2024
Background: Isocitrate dehydrogenase (IDH) mutant astrocytoma grading, until recently, has been entirely based on morphology. The 5th edition of the Central Nervous System World Health Organization (WHO) introduces CDKN2A/B homozygous deletion as a biomarker of grade 4. We sought to investigate the prognostic impact of DNA methylation-derived molecular biomarkers for IDH mutant astrocytoma. Methods: We analyzed 98 IDH mutant astrocytomas diagnosed at NYU Langone Health between 2014 and 2022. We reviewed DNA methylation subclass, CDKN2A/B homozygous deletion, and ploidy and correlated molecular biomarkers with histological grade, progression free (PFS), and overall (OS) survival. Findings were confirmed using 2 independent validation cohorts. Results: There was no significant difference in OS or PFS when stratified by histologic WHO grade alone, copy number complexity, or extent of resection. OS was significantly different when patients were stratified either by CDKN2A/B homozygous deletion or by DNA methylation subclass (P value = .0286 and .0016, respectively). None of the molecular biomarkers were associated with significantly better PFS, although DNA methylation classification showed a trend (P value = .0534). Conclusions: The current WHO recognized grading criteria for IDH mutant astrocytomas show limited prognostic value. Stratification based on DNA methylation shows superior prognostic value for OS.
@article{galbraith2024prognostic, author = {Galbraith, Kristyn and Garcia, Mekka and Wei, Siyu and Chen, Anna and Schroff, Chanel and Serrano, Jonathan and Pacione, Donato and Placantonakis, Dimitris G and William, Christopher M and Faustin, Arline and others}, journal = {Neuro-oncology}, number = {6}, pages = {1042--1051}, publisher = {Oxford University Press US}, title = {Prognostic value of DNA methylation subclassification, aneuploidy, and CDKN2A/B homozygous deletion in predicting clinical outcome of IDH mutant astrocytomas}, volume = {26}, doi = {10.1093/neuonc/noae009}, year = {2024} }
- Omics feature selection with the extended SIS R package: identification of a body mass index epigenetic multi-marker in the Strong Heart Study
Topics: Machine Learning, High-dimensional Statistics, Epidemiology
Arce Domingo-Relloso, Yang Feng, Zulema Rodriguez-Hernandez, Karin Haack, Shelley A Cole, and 3 more authors
American Journal of Epidemiology, 2024
The statistical analysis of omics data poses a great computational challenge given their ultra–high-dimensional nature and frequent between-features correlation. In this work, we extended the iterative sure independence screening (ISIS) algorithm by pairing ISIS with elastic-net (Enet) and 2 versions of adaptive elastic-net (adaptive elastic-net (AEnet) and multistep adaptive elastic-net (MSAEnet)) to efficiently improve feature selection and effect estimation in omics research. We subsequently used genome-wide human blood DNA methylation data from American Indian participants in the Strong Heart Study (n = 2235 participants; measured in 1989-1991) to compare the performance (predictive accuracy, coefficient estimation, and computational efficiency) of ISIS-paired regularization methods with that of Bayesian shrinkage and traditional linear regression to identify an epigenomic multimarker of body mass index (BMI). ISIS-AEnet outperformed the other methods in prediction. In biological pathway enrichment analysis of genes annotated to BMI-related differentially methylated positions, ISIS-AEnet captured most of the enriched pathways in common for at least 2 of all the evaluated methods. ISIS-AEnet can favor biological discovery because it identifies the most robust biological pathways while achieving an optimal balance between bias and efficient feature selection. In the extended SIS R package, we also implemented ISIS paired with Cox and logistic regression for time-to-event and binary endpoints, respectively, and a bootstrap approach for the estimation of regression coefficients.
@article{domingo2024omics, author = {Domingo-Relloso, Arce and Feng, Yang and Rodriguez-Hernandez, Zulema and Haack, Karin and Cole, Shelley A and Navas-Acien, Ana and Tellez-Plaza, Maria and Bermudez, Jose D}, journal = {American Journal of Epidemiology}, pages = {kwae006}, publisher = {Oxford University Press}, title = {Omics feature selection with the extended SIS R package: identification of a body mass index epigenetic multi-marker in the Strong Heart Study}, doi = {10.1093/aje/kwae006}, year = {2024}, }
- Machine collaboration
Topics: Machine Learning, High-dimensional Statistics
Qingfeng Liu and Yang Feng
Stat, 2024
We propose a new ensemble framework for supervised learning, called machine collaboration (MaC), using a collection of base machines for prediction tasks. Unlike bagging/stacking (a parallel & independent framework) and boosting (a sequential & top-down framework), MaC is a type of circular & interactive learning framework. The circular & interactive feature helps the base machines to transfer information circularly and update their structures and parameters accordingly. The theoretical result on the risk bound of the estimator from MaC reveals that the circular & interactive feature can help MaC reduce risk via a parsimonious ensemble. We conduct extensive experiments on MaC using both simulated data and 119 benchmark real datasets. The results demonstrate that in most cases, MaC performs significantly better than several other state-of-the-art methods, including classification and regression trees, neural networks, stacking, and boosting.
@article{liu2024machine, author = {Liu, Qingfeng and Feng, Yang}, journal = {Stat}, number = {1}, pages = {e661}, publisher = {Wiley Online Library}, title = {Machine collaboration}, volume = {13}, year = {2024}, } - Federated Transfer Learning with Differential PrivacyMachine Learning, Transfer Learning, Differential Privacy, Federated LearningMengchu Li, Ye Tian, Yang Feng, and Yi YuarXiv preprint arXiv:2403.11343, 2024
Federated learning has emerged as a powerful framework for analysing distributed data, yet two challenges remain pivotal: heterogeneity across sites and privacy of local data. In this paper, we address both challenges within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of federated differential privacy, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy model, we study four statistical problems: univariate mean estimation, low-dimensional linear regression, high-dimensional linear regression, and M-estimation. By investigating the minimax rates and quantifying the cost of privacy, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses account for data heterogeneity and privacy, highlighting the fundamental costs associated with each factor and the benefits of knowledge transfer in federated learning.
@article{li2024federated, author = {Li, Mengchu and Tian, Ye and Feng, Yang and Yu, Yi}, journal = {arXiv preprint arXiv:2403.11343}, title = {Federated Transfer Learning with Differential Privacy}, year = {2024}, } - Yang Feng and Jiajin Sun’s contribution to the Discussion of ‘Root and community inference on the latent growth process of a network’ by Crane and XuNetwork AnalysisYang Feng, and Jiajin SunJournal of the Royal Statistical Society Series B: Statistical Methodology, 2024
Many statistical models for networks overlook the fact that most real-world networks are formed through a growth process. To address this, we introduce the Preferential Attachment Plus Erdős-Rényi model, where we let a random network G be the union of a preferential attachment (PA) tree T and additional Erdős-Rényi (ER) random edges. The PA tree captures the underlying growth process of a network where vertices/edges are added sequentially, while the ER component can be regarded as noise. Given only one snapshot of the final network G, we study the problem of constructing confidence sets for the root node of the unobserved growth process; the root node can be patient zero in an infection network or the source of fake news in a social network. We propose inference algorithms based on Gibbs sampling that scale to networks with millions of nodes and provide theoretical analysis showing that the size of the confidence set is small if the noise level of the ER edges is not too large. We also propose variations of the model in which multiple growth processes occur simultaneously, reflecting the growth of multiple communities; we use these models to provide a new approach to community detection.
@article{feng2024yang, author = {Feng, Yang and Sun, Jiajin}, journal = {Journal of the Royal Statistical Society Series B: Statistical Methodology}, number = {4}, pages = {875--878}, publisher = {Oxford University Press US}, title = {Yang Feng and Jiajin Sun's contribution to the Discussion of `Root and community inference on the latent growth process of a network' by Crane and Xu}, volume = {86}, doi = {10.1093/jrsssb/qkae055}, year = {2024}, } - Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM AlgorithmsMachine Learning, High-dimensional Statistics, Federated Learning, Transfer Learning, Multi-task LearningYe Tian, Haolei Weng, and Yang FengIn Forty-first International Conference on Machine Learning, 2024
While supervised federated learning approaches have enjoyed significant success, the domain of unsupervised federated learning remains relatively underexplored. Several federated EM algorithms have gained popularity in practice; however, their theoretical foundations are often lacking. In this paper, we first introduce a federated gradient EM algorithm (FedGrEM) designed for the unsupervised learning of mixture models, which supplements the existing federated EM algorithms by considering task heterogeneity and potential adversarial attacks. We present a comprehensive finite-sample theory that holds for general mixture models, then apply this general theory to specific statistical models to characterize the explicit estimation error of model parameters and mixture proportions. Our theory elucidates when and how FedGrEM outperforms local single-task learning with insights extending to existing federated EM algorithms. This bridges the gap between their practical success and theoretical understanding. Our numerical results validate our theory, and demonstrate FedGrEM’s superiority over existing unsupervised federated learning benchmarks.
@inproceedings{tian2024towards, author = {Tian, Ye and Weng, Haolei and Feng, Yang}, booktitle = {Forty-first International Conference on Machine Learning}, title = {Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms}, year = {2024}, doi = {10.5555/3692070.3694041}, } - Variational Nonparametric Inference in Functional Stochastic Block ModelNetwork AnalysisZuofeng Shang, Peijun Sang, Yang Feng, and Chong JinarXiv preprint arXiv:2407.00564, 2024
We propose a functional stochastic block model whose vertices involve functional data information. This new model extends the classic stochastic block model with vector-valued nodal information, and finds applications in real-world networks whose nodal information could be functional curves. Examples include international trade data in which a network vertex (country) is associated with the annual or quarterly GDP over a certain time period, and MyFitnessPal data in which a network vertex (MyFitnessPal user) is associated with daily calorie information measured over a certain time period. Two statistical tasks will be jointly executed. First, we will detect community structures of the network vertices assisted by the functional nodal information. Second, we propose a computationally efficient variational test to examine the significance of the functional nodal information. We show that the community detection algorithms achieve weak and strong consistency, and the variational test is asymptotically chi-square with diverging degrees of freedom. As a byproduct, we propose pointwise confidence intervals for the slope function of the functional nodal information. Our methods are examined through both simulated and real datasets.
@article{shang2024variational, author = {Shang, Zuofeng and Sang, Peijun and Feng, Yang and Jin, Chong}, journal = {arXiv preprint arXiv:2407.00564}, title = {Variational Nonparametric Inference in Functional Stochastic Block Model}, year = {2024}, } - Racial distribution of molecularly classified brain tumorsCamila S Fang, Wanyi Wang, Chanel Schroff, Misha Movahed-Ezazi, Varshini Vasudevaraja, and 6 more authorsNeuro-Oncology Advances, 2024
Background In many cancers, specific subtypes are more prevalent in specific racial backgrounds. However, little is known about the racial distribution of specific molecular types of brain tumors. Public data repositories lack data on many brain tumor subtypes as well as diagnostic annotation using the current World Health Organization classification. A better understanding of the prevalence of brain tumors in different racial backgrounds may provide insight into tumor predisposition and development, and improve prevention. Methods We retrospectively analyzed the racial distribution of 1709 primary brain tumors classified by their methylation profiles using clinically validated whole genome DNA methylation. Self-reported race was obtained from medical records. Our cohort included 82% White, 10% Black, and 8% Asian patients with 74% of patients reporting their race. Results There was a significant difference in the racial distribution of specific types of brain tumors. Blacks were overrepresented in pituitary adenomas (35%, P <.001), with the largest proportion of FSH/LH subtype. Whites were underrepresented at 47% of all pituitary adenoma patients (P <.001). Glioblastoma (GBM) IDH wild-type showed an enrichment of Whites, at 90% (P <.001), and a significantly smaller percentage of Blacks, at 3% (P <.001). Conclusions Molecularly classified brain tumor groups and subgroups show different distributions among the three main racial backgrounds suggesting the contribution of race to brain tumor development.
@article{fang2024racial, author = {Fang, Camila S and Wang, Wanyi and Schroff, Chanel and Movahed-Ezazi, Misha and Vasudevaraja, Varshini and Serrano, Jonathan and Sulman, Erik P and Golfinos, John G and Orringer, Daniel and Galbraith, Kristyn and others}, journal = {Neuro-Oncology Advances}, number = {1}, pages = {vdae135}, publisher = {Oxford University Press US}, title = {Racial distribution of molecularly classified brain tumors}, volume = {6}, doi = {10.1093/noajnl/vdae135}, year = {2024} } - Multi-label Random Subspace Ensemble ClassificationMachine Learning, High-dimensional StatisticsFan Bi, Jianan Zhu, and Yang FengJournal of Computational and Graphical Statistics, 2024
In this work, we develop a new ensemble learning framework, multi-label Random Subspace Ensemble (mRaSE), for multi-label classification. Given a base classifier (e.g., multinomial logistic regression, classification tree, K-nearest neighbors), mRaSE works by first randomly sampling a collection of subspaces, then choosing the best ones that achieve the minimum cross-validation errors and, finally, aggregating the chosen weak learners. In addition to its superior prediction performance, mRaSE also provides a model-free feature ranking depending on the given base classifier. An iterative version of mRaSE is also developed to further improve the performance. A model-free extension is pursued on the iterative version, leading to the so-called Super mRaSE, which accepts a collection of base classifiers as input to the algorithm. We show that the proposed algorithms compare favorably with state-of-the-art classification algorithms, including random forests and deep neural networks, via extensive simulation studies and two real data applications. The new algorithms are implemented in an updated version of the R package RaSEn.
@article{bi2024multi, author = {Bi, Fan and Zhu, Jianan and Feng, Yang}, journal = {Journal of Computational and Graphical Statistics}, number = {just-accepted}, pages = {1--20}, publisher = {Taylor \& Francis}, title = {Multi-label Random Subspace Ensemble Classification}, doi = {10.1080/10618600.2024.2421248}, year = {2024}, }
2023
- Spectral clustering via adaptive layer aggregation for multi-layer networksNetwork AnalysisSihan Huang, Haolei Weng, and Yang FengJournal of Computational and Graphical Statistics, 2023
One of the fundamental problems in network analysis is detecting community structure in multi-layer networks, of which each layer represents one type of edge information among the nodes. We propose integrative spectral clustering approaches based on effective convex layer aggregations. Our aggregation methods are strongly motivated by a delicate asymptotic analysis of the spectral embedding of weighted adjacency matrices and the downstream \(k\)-means clustering, in a challenging regime where community detection consistency is impossible. In fact, the methods are shown to estimate the optimal convex aggregation, which minimizes the mis-clustering error under some specialized multi-layer network models. Our analysis further suggests that clustering using Gaussian mixture models is generally superior to the commonly used \(k\)-means in spectral clustering. Extensive numerical studies demonstrate that our adaptive aggregation techniques, together with Gaussian mixture model clustering, make the new spectral clustering remarkably competitive compared to several popularly used methods.
@article{huang2023spectral, author = {Huang, Sihan and Weng, Haolei and Feng, Yang}, journal = {Journal of Computational and Graphical Statistics}, volume = {32}, pages = {1170--1184}, publisher = {Taylor \& Francis}, title = {Spectral clustering via adaptive layer aggregation for multi-layer networks}, year = {2023}, } - Transfer learning under high-dimensional generalized linear modelsTransfer Learning, High-dimensional StatisticsYe Tian, and Yang FengJournal of the American Statistical Association, 2023
In this work, we study the transfer learning problem under high-dimensional generalized linear models (GLMs), which aim to improve the fit on target data by borrowing information from useful source data. Given which sources to transfer, we propose a transfer learning algorithm on GLM, and derive its \(\ell_1/\ell_2\)-estimation error bounds as well as a bound for a prediction error measure. The theoretical analysis shows that when the target and source are sufficiently close to each other, these bounds could be improved over those of the classical penalized estimator using only target data under mild conditions. When we don’t know which sources to transfer, an algorithm-free transferable source detection approach is introduced to detect informative sources. The detection consistency is proved under the high-dimensional GLM transfer learning setting. We also propose an algorithm to construct confidence intervals of each coefficient component, and the corresponding theories are provided. Extensive simulations and a real-data experiment verify the effectiveness of our algorithms. We implement the proposed GLM transfer learning algorithms in a new R package glmtrans, which is available on CRAN.
@article{tian2023transfer, author = {Tian, Ye and Feng, Yang}, journal = {Journal of the American Statistical Association}, title = {Transfer learning under high-dimensional generalized linear models}, year = {2023}, number = {544}, pages = {2684--2697}, volume = {118}, publisher = {Taylor \& Francis}, doi = {10.1080/01621459.2022.2071278}, } - Variable selection for high-dimensional generalized linear model with block-missing dataHigh-dimensional StatisticsYifan He, Yang Feng, and Xinyuan SongScandinavian Journal of Statistics, 2023
In modern scientific research, multiblock missing data emerges with synthesizing information across multiple studies. However, existing imputation methods for handling block‐wise missing data either focus on the single‐block missing pattern or heavily rely on the model structure. In this study, we propose a single regression‐based imputation algorithm for multiblock missing data. First, we conduct a sparse precision matrix estimation based on the structure of block‐wise missing data. Second, we impute the missing blocks with their means conditional on the observed blocks. Theoretical results about variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that compared with existing methods, the proposed imputation procedure is robust to various missing mechanisms because of the good properties of regression imputation. An application to Alzheimer’s Disease Neuroimaging Initiative data also confirms the superiority of our proposed method.
@article{he2023variable, author = {He, Yifan and Feng, Yang and Song, Xinyuan}, journal = {Scandinavian Journal of Statistics}, number = {3}, pages = {1279--1297}, title = {Variable selection for high-dimensional generalized linear model with block-missing data}, volume = {50}, doi = {10.1111/sjos.12632}, year = {2023}, } - DDAC-SpAM: A Distributed Algorithm for Fitting High-dimensional Sparse Additive Models with Feature Division and DecorrelationHigh-dimensional StatisticsYifan He, Ruiyang Wu, Yong Zhou, and Yang FengJournal of the American Statistical Association, 2023
Distributed statistical learning has become a popular technique for large-scale data analysis. Most existing work in this area focuses on dividing the observations, but we propose a new algorithm, DDAC-SpAM, which divides the features under a high-dimensional sparse additive model. Our approach involves three steps: divide, decorrelate, and conquer. The decorrelation operation enables each local estimator to recover the sparsity pattern for each additive component without imposing strict constraints on the correlation structure among variables. The effectiveness and efficiency of the proposed algorithm are demonstrated through theoretical analysis and empirical results on both synthetic and real data. The theoretical results include both the consistent sparsity pattern recovery as well as statistical inference for each additive functional component. Our approach provides a practical solution for fitting sparse additive models, with promising applications in a wide range of domains.
@article{he2023ddac, author = {He, Yifan and Wu, Ruiyang and Zhou, Yong and Feng, Yang}, journal = {Journal of the American Statistical Association}, title = {DDAC-SpAM: A Distributed Algorithm for Fitting High-dimensional Sparse Additive Models with Feature Division and Decorrelation}, year = {2023}, pages = {1--12}, publisher = {Taylor \& Francis}, } - PCABM: Pairwise Covariates-Adjusted Block Model for Community DetectionNetwork AnalysisSihan Huang, Jiajin Sun, and Yang FengJournal of the American Statistical Association, 2023
One of the most fundamental problems in network study is community detection. The stochastic block model (SBM) is a widely used model, for which various estimation methods have been developed with their community detection consistency results unveiled. However, the SBM is restricted by the strong assumption that all nodes in the same community are stochastically equivalent, which may not be suitable for practical applications. We introduce a pairwise covariates-adjusted stochastic block model (PCABM), a generalization of SBM that incorporates pairwise covariate information. We study the maximum likelihood estimates of the coefficients for the covariates as well as the community assignments. It is shown that both the coefficient estimates of the covariates and the community assignments are consistent under suitable sparsity conditions. Spectral clustering with adjustment (SCWA) is introduced to efficiently solve PCABM. Under certain conditions, we derive the error bound of community detection under SCWA and show that it is community detection consistent. In addition, we investigate model selection in terms of the number of communities and feature selection for the pairwise covariates, and propose two corresponding algorithms. PCABM compares favorably with the SBM or degree-corrected stochastic block model (DCBM) under a wide range of simulated and real networks when covariate information is accessible.
@article{huang2023pcabm, author = {Huang, Sihan and Sun, Jiajin and Feng, Yang}, journal = {Journal of the American Statistical Association}, title = {PCABM: Pairwise Covariates-Adjusted Block Model for Community Detection}, year = {2023}, pages = {1--13}, publisher = {Taylor \& Francis}, doi = {10.1080/01621459.2023.2244731}, } - A flexible quasi-likelihood model for microbiome abundance count dataYiming Shi, Huilin Li, Chan Wang, Jun Chen, Hongmei Jiang, and 5 more authorsStatistics in Medicine, 2023
In this article, we present a flexible model for microbiome count data. We consider a quasi‐likelihood framework, in which we do not make any assumptions on the distribution of the microbiome count except that its variance is an unknown but smooth function of the mean. By comparing our model to the negative binomial generalized linear model (GLM) and Poisson GLM in simulation studies, we show that our flexible quasi‐likelihood method yields valid inferential results. Using a real microbiome study, we demonstrate the utility of our method by examining the relationship between adenomas and microbiota. We also provide an R package “fql” for the application of our method.
@article{shi2023flexible, author = {Shi, Yiming and Li, Huilin and Wang, Chan and Chen, Jun and Jiang, Hongmei and Shih, Ya-Chen T and Zhang, Haixiang and Song, Yizhe and Feng, Yang and Liu, Lei}, journal = {Statistics in Medicine}, number = {25}, pages = {4632--4643}, publisher = {John Wiley \& Sons, Inc. Hoboken, USA}, title = {A flexible quasi-likelihood model for microbiome abundance count data}, volume = {42}, doi = {10.1002/sim.9880}, year = {2023} } - Simulation of New York City’s Ventilator Allocation Guideline During the Spring 2020 COVID-19 SurgeEpidemiologyB Corbett Walsh, Jianan Zhu, Yang Feng, Kenneth A Berkowitz, Rebecca A Betensky, and 2 more authorsJAMA network open, 2023
Importance The spring 2020 surge of COVID-19 unprecedentedly strained ventilator supply in New York City, with many hospitals nearly exhausting available ventilators and subsequently seriously considering enacting crisis standards of care and implementing New York State Ventilator Allocation Guidelines (NYVAG). However, there is little evidence as to how NYVAG would perform if implemented. Objectives To evaluate the performance and potential improvement of NYVAG during a surge of patients with respect to the length of rationing, overall mortality, and worsening health disparities. Design, Setting, and Participants This cohort study included intubated patients in a single health system in New York City from March through July 2020. A total of 20 000 simulations were conducted of ventilator triage (10 000 following NYVAG and 10 000 following a proposed improved NYVAG) during a crisis period, defined as the point at which the prepandemic ventilator supply was 95% utilized. Exposures The NYVAG protocol for triage ventilators. Main Outcomes and Measures Comparison of observed survival rates with simulations of scenarios requiring NYVAG ventilator rationing. Results The total cohort included 1671 patients; of these, 674 intubated patients (mean [SD] age, 63.7 [13.8] years; 465 male [69.9%]) were included in the crisis period, with 571 (84.7%) testing positive for COVID-19. Simulated ventilator rationing occurred for 163.9 patients over 15.0 days, 44.4% (95% CI, 38.3%-50.0%) of whom would have survived if provided a ventilator while only 34.8% (95% CI, 28.5%-40.0%) of those newly intubated patients receiving a reallocated ventilator survived. While triage categorization at the time of intubation exhibited partial prognostic differentiation, 94.8% of all ventilator rationing occurred after a time trial. Within this subset, 43.1% were intubated for 7 or more days with a favorable SOFA score that had not improved. 
An estimated 60.6% of these patients would have survived if sustained on a ventilator. Revising triage subcategorization, proposed improved NYVAG, would have improved this alarming ventilator allocation inefficiency (25.3% [95% CI, 22.1%-28.4%] of those selected for ventilator rationing would have survived if provided a ventilator). NYVAG ventilator rationing did not exacerbate existing health disparities. Conclusions and Relevance In this cohort study of intubated patients experiencing simulated ventilator rationing during the apex of the New York City COVID-19 2020 surge, NYVAG diverted ventilators from patients with a higher chance of survival to those with a lower chance of survival. Future efforts should be focused on triage subcategorization, which improved this triage inefficiency, and ventilator rationing after a time trial, when most ventilator rationing occurred.
@article{walsh2023simulation, author = {Walsh, B Corbett and Zhu, Jianan and Feng, Yang and Berkowitz, Kenneth A and Betensky, Rebecca A and Nunnally, Mark E and Pradhan, Deepak R}, journal = {JAMA network open}, number = {10}, pages = {e2336736--e2336736}, publisher = {American Medical Association}, title = {Simulation of New York City's Ventilator Allocation Guideline During the Spring 2020 COVID-19 Surge}, volume = {6}, doi = {10.1001/jamanetworkopen.2023.36736}, year = {2023}, } - Comments on: Statistical inference and large-scale multiple testing for high-dimensional regression modelsHigh-dimensional StatisticsYe Tian, and Yang FengTest, 2023
@article{tian2023comments, author = {Tian, Ye and Feng, Yang}, journal = {Test}, number = {4}, pages = {1172--1176}, publisher = {Springer Berlin Heidelberg Berlin/Heidelberg}, title = {Comments on: Statistical inference and large-scale multiple testing for high-dimensional regression models}, volume = {32}, year = {2023}, }
2022
- A likelihood-ratio type test for stochastic block models with bounded degreesNetwork AnalysisMingao Yuan, Yang Feng, and Zuofeng ShangJournal of Statistical Planning and Inference, 2022
@article{yuan2022likelihood, author = {Yuan, Mingao and Feng, Yang and Shang, Zuofeng}, journal = {Journal of Statistical Planning and Inference}, pages = {98--119}, publisher = {North-Holland}, title = {A likelihood-ratio type test for stochastic block models with bounded degrees}, volume = {219}, year = {2022}, } - Targeting predictors via partial distance correlation with applications to financial forecastingHigh-dimensional StatisticsKashif Yousuf, and Yang FengJournal of Business & Economic Statistics, 2022
@article{yousuf2022targeting, author = {Yousuf, Kashif and Feng, Yang}, journal = {Journal of Business \& Economic Statistics}, number = {3}, pages = {1007--1019}, publisher = {Taylor \& Francis}, title = {Targeting predictors via partial distance correlation with applications to financial forecasting}, volume = {40}, year = {2022}, } - Community detection with nodal information: likelihood and its variational approximationNetwork AnalysisHaolei Weng, and Yang FengStat, 2022
@article{weng2022community, author = {Weng, Haolei and Feng, Yang}, journal = {Stat}, pages = {e428}, title = {Community detection with nodal information: likelihood and its variational approximation}, year = {2022}, } - Testing community structure for hypergraphsNetwork AnalysisMingao Yuan, Ruiqi Liu, Yang Feng, and Zuofeng ShangAnnals of Statistics, 2022
Many complex networks in the real world can be formulated as hypergraphs where community detection has been widely used. However, the fundamental question of whether communities exist or not in an observed hypergraph still remains unresolved. The aim of the present paper is to tackle this important problem. Specifically, we study when a hypergraph with community structure can be successfully distinguished from its Erdős-Rényi counterpart, and propose concrete test statistics based on hypergraph cycles when the models are distinguishable. Our contributions are summarized as follows. For uniform hypergraphs, we show that successful testing is always impossible when average degree tends to zero, might be possible when average degree is bounded, and is possible when average degree is growing. We obtain asymptotic distributions of the proposed test statistics and analyze their power. Our results for the growing-degree case are further extended to nonuniform hypergraphs in which a new test involving both edge and hyperedge information is proposed. The novel aspect of our new test is that it is provably more powerful than the classic test involving only edge information. Simulation and real data analysis support our theoretical findings. The proofs rely on Janson’s contiguity theory and a high-moments driven asymptotic normality result by Gao and Wormald.
@article{yuan2022testing, author = {Yuan, Mingao and Liu, Ruiqi and Feng, Yang and Shang, Zuofeng}, journal = {Annals of Statistics}, title = {Testing community structure for hypergraphs}, year = {2022}, number = {1}, pages = {147--169}, volume = {50}, publisher = {Institute of Mathematical Statistics}, doi = {10.1214/21-AOS2099}, } - Large-scale model selection in misspecified generalized linear modelsHigh-dimensional StatisticsEmre Demirkaya, Yang Feng, Pallavi Basu, and Jinchi LvBiometrika, 2022
Summary Model selection is crucial both to high-dimensional learning and to inference for contemporary big data applications in pinpointing the best set of covariates among a sequence of candidate interpretable models. Most existing work implicitly assumes that the models are correctly specified or have fixed dimensionality, yet both model misspecification and high dimensionality are prevalent in practice. In this paper, we exploit the framework of model selection principles under the misspecified generalized linear models presented in Lv & Liu (2014), and investigate the asymptotic expansion of the posterior model probability in the setting of high-dimensional misspecified models. With a natural choice of prior probabilities that encourages interpretability and incorporates the Kullback–Leibler divergence, we suggest using the high-dimensional generalized Bayesian information criterion with prior probability for large-scale model selection with misspecification. Our new information criterion characterizes the impacts of both model misspecification and high dimensionality on model selection. We further establish the consistency of covariance contrast matrix estimation and the model selection consistency of the new information criterion in ultrahigh dimensions under some mild regularity conditions. Our numerical studies demonstrate that the proposed method enjoys improved model selection consistency over its main competitors.
@article{demirkaya2022large, author = {Demirkaya, Emre and Feng, Yang and Basu, Pallavi and Lv, Jinchi}, journal = {Biometrika}, title = {Large-scale model selection in misspecified generalized linear models}, year = {2022}, number = {1}, pages = {123--136}, volume = {109}, publisher = {Oxford University Press}, doi = {10.1093/biomet/asab005}, } - Discussion of “Cocitation and Coauthorship Networks of Statisticians”Network AnalysisHaolei Weng, and Yang FengJournal of Business & Economic Statistics, 2022
@article{weng2022discussion, author = {Weng, Haolei and Feng, Yang}, journal = {Journal of Business \& Economic Statistics}, number = {2}, pages = {486--490}, publisher = {Taylor \& Francis}, title = {Discussion of ``Cocitation and Coauthorship Networks of Statisticians''}, volume = {40}, year = {2022}, } - Association of hyperglycemia and molecular subclass on survival in IDH-wildtype glioblastomaElisa K Liu, Varshini Vasudevaraja, Vladislav O Sviderskiy, Yang Feng, Ivy Tran, and 6 more authorsNeuro-Oncology Advances, 2022
Background Hyperglycemia has been associated with worse survival in glioblastoma. Attempts to lower glucose yielded mixed responses which could be due to molecularly distinct GBM subclasses. Methods Clinical, laboratory, and molecular data on 89 IDH-wt GBMs profiled by clinical next-generation sequencing and treated with Stupp protocol were reviewed. IDH-wt GBMs were sub-classified into RTK I (Proneural), RTK II (Classical) and Mesenchymal subtypes using whole-genome DNA methylation. Average glucose was calculated by time-weighting glucose measurements between diagnosis and last follow-up. Results Patients were stratified into three groups using average glucose: tertile one (<100 mg/dL), tertile two (100–115 mg/dL), and tertile three (>115 mg/dL). Comparison across glucose tertiles revealed no differences in performance status (KPS), dexamethasone dose, MGMT methylation, or methylation subclass. Overall survival (OS) was not affected by methylation subclass (P =.9) but decreased with higher glucose (P =.015). Higher glucose tertiles were associated with poorer OS among RTK I (P =.08) and mesenchymal tumors (P =.05), but not RTK II (P =.99). After controlling for age, KPS, dexamethasone, and MGMT status, glucose remained significantly associated with OS (aHR = 5.2, P =.02). Methylation clustering did not identify unique signatures associated with high or low glucose levels. Metabolomic analysis of 23 tumors showed minimal variation across metabolites without differences between molecular subclasses. Conclusion Higher average glucose values were associated with poorer OS in RTK I and Mesenchymal IDH-wt GBM, but not RTK II. There were no discernible epigenetic or metabolomic differences between tumors in different glucose environments, suggesting a potential survival benefit to lowering systemic glucose in selected molecular subtypes.
@article{liu2022association, author = {Liu, Elisa K and Vasudevaraja, Varshini and Sviderskiy, Vladislav O and Feng, Yang and Tran, Ivy and Serrano, Jonathan and Cordova, Christine and Kurz, Sylvia C and Golfinos, John G and Sulman, Erik P and others}, journal = {Neuro-Oncology Advances}, number = {1}, pages = {vdac163}, publisher = {Oxford University Press US}, title = {Association of hyperglycemia and molecular subclass on survival in IDH-wildtype glioblastoma}, volume = {4}, doi = {10.1093/noajnl/vdac163}, year = {2022} } - Clinical, Pathological, and Molecular Characteristics of Diffuse Spinal Cord GliomasMekka R Garcia, Yang Feng, Varshini Vasudevaraja, Kristyn Galbraith, Jonathan Serrano, and 6 more authorsJournal of Neuropathology & Experimental Neurology, 2022
Diffuse spinal cord gliomas (SCGs) are rare tumors associated with a high morbidity and mortality that affect both pediatric and adult populations. In this retrospective study, we sought to characterize the clinical, pathological, and molecular features of diffuse SCG in 22 patients with histological and molecular analyses. The median age of our cohort was 23.64 years (range 1–82) and the overall median survival was 397 days. K27M mutation was significantly more prevalent in males compared to females. Gross total resection and chemotherapy were associated with improved survival, compared to biopsy and no chemotherapy. While there was no association between tumor grade, K27M status (p = 0.366) or radiation (p = 0.772), and survival, males showed a trend toward shorter survival. K27M mutant tumors showed increased chromosomal instability and a distinct DNA methylation signature.
@article{garcia2022clinical, author = {Garcia, Mekka R and Feng, Yang and Vasudevaraja, Varshini and Galbraith, Kristyn and Serrano, Jonathan and Thomas, Cheddhi and Radmanesh, Alireza and Hidalgo, Eveline T and Harter, David H and Allen, Jeffrey C and others}, journal = {Journal of Neuropathology \& Experimental Neurology}, number = {11}, pages = {865--872}, publisher = {Oxford Academic}, title = {Clinical, Pathological, and Molecular Characteristics of Diffuse Spinal Cord Gliomas}, volume = {81}, doi = {10.1093/jnen/nlac075}, year = {2022} } - Differential Role of Hyperglycemia on Survival in IDH-wildtype Glioblastoma SubclassesElisa Liu, Varshini Vasudevaraja, Vladislav Sviderskiy, Yang Feng, Ivy Tran, and 6 more authorsIn JOURNAL OF NEUROPATHOLOGY AND EXPERIMENTAL NEUROLOGY, 2022
Background Hyperglycemia has been associated with worse survival in glioblastoma. Attempts to lower glucose yielded mixed responses which could be due to molecularly distinct GBM subclasses. Methods Clinical, laboratory, and molecular data on 89 IDH-wt GBMs profiled by clinical next-generation sequencing and treated with Stupp protocol were reviewed. IDH-wt GBMs were sub-classified into RTK I (Proneural), RTK II (Classical) and Mesenchymal subtypes using whole-genome DNA methylation. Average glucose was calculated by time-weighting glucose measurements between diagnosis and last follow-up. Results Patients were stratified into three groups using average glucose: tertile one (<100 mg/dL), tertile two (100–115 mg/dL), and tertile three (>115 mg/dL). Comparison across glucose tertiles revealed no differences in performance status (KPS), dexamethasone dose, MGMT methylation, or methylation subclass. Overall survival (OS) was not affected by methylation subclass (P =.9) but decreased with higher glucose (P =.015). Higher glucose tertiles were associated with poorer OS among RTK I (P =.08) and mesenchymal tumors (P =.05), but not RTK II (P =.99). After controlling for age, KPS, dexamethasone, and MGMT status, glucose remained significantly associated with OS (aHR = 5.2, P =.02). Methylation clustering did not identify unique signatures associated with high or low glucose levels. Metabolomic analysis of 23 tumors showed minimal variation across metabolites without differences between molecular subclasses. Conclusion Higher average glucose values were associated with poorer OS in RTKI and Mesenchymal IDH-wt GBM, but not RTKII. There were no discernible epigenetic or metabolomic differences between tumors in different glucose environments, suggesting a potential survival benefit to lowering systemic glucose in selected molecular subtypes.
@inproceedings{liu2022differential, author = {Liu, Elisa and Vasudevaraja, Varshini and Sviderskiy, Vladislav and Feng, Yang and Tran, Ivy and Serrano, Jonathan and Cordova, Christine and Kurz, Sylvia and Golfinos, John and Sulman, Erik and others}, booktitle = {JOURNAL OF NEUROPATHOLOGY AND EXPERIMENTAL NEUROLOGY}, number = {6}, organization = {OXFORD UNIV PRESS INC JOURNALS DEPT, 2001 EVANS RD, CARY, NC 27513 USA}, pages = {440--440}, title = {Differential Role of Hyperglycemia on Survival in IDH-wildtype Glioblastoma Subclasses}, volume = {81}, doi = {10.1093/noajnl/vdac163}, year = {2022} }
2021
- RaSE: Random Subspace Ensemble ClassificationHigh-dimensional Statistics, Classification, Machine LearningYe Tian, and Yang FengJournal of Machine Learning Research, 2021
We propose a flexible ensemble classification framework, Random Subspace Ensemble (RaSE), for sparse classification. In the RaSE algorithm, we aggregate many weak learners, where each weak learner is a base classifier trained in a subspace optimally selected from a collection of random subspaces. To conduct subspace selection, we propose a new criterion, ratio information criterion (RIC), based on weighted Kullback-Leibler divergence. The theoretical analysis includes the risk and Monte-Carlo variance of the RaSE classifier, establishing the screening consistency and weak consistency of RIC, and providing an upper bound for the misclassification rate of the RaSE classifier. In addition, we show that in a high-dimensional framework, the number of random subspaces needs to be very large to guarantee that a subspace covering signals is selected. Therefore, we propose an iterative version of the RaSE algorithm and prove that under some specific conditions, a smaller number of generated random subspaces are needed to find a desirable subspace through iteration. An array of simulations under various models and real-data applications demonstrate the effectiveness and robustness of the RaSE classifier and its iterative version in terms of low misclassification rate and accurate feature ranking. The RaSE algorithm is implemented in the R package RaSEn on CRAN.
@article{tian2021raseclassification, author = {Tian, Ye and Feng, Yang}, journal = {Journal of Machine Learning Research}, title = {RaSE: Random Subspace Ensemble Classification}, year = {2021}, date-modified = {2024-11-12 13:09:11 -0500}, read = {0}, } - Visceral adipose tissue in patients with COVID-19: risk stratification for severityHersh Chandarana, Bari Dane, Artem Mikheev, Myles T Taffel, Yang Feng, and 1 more authorAbdominal Radiology, 2021
@article{chandarana2021visceral, author = {Chandarana, Hersh and Dane, Bari and Mikheev, Artem and Taffel, Myles T and Feng, Yang and Rusinek, Henry}, journal = {Abdominal Radiology}, number = {2}, pages = {818--825}, publisher = {Springer US}, title = {Visceral adipose tissue in patients with COVID-19: risk stratification for severity}, volume = {46}, year = {2021} } - Model Averaging for Nonlinear Regression ModelsModel AveragingY Feng, Q Liu, Q Yao, and G ZhaoJournal of Business & Economic Statistics, 2021
@article{feng2021model, author = {Feng, Y and Liu, Q and Yao, Q and Zhao, G}, journal = {Journal of Business \& Economic Statistics}, title = {Model Averaging for Nonlinear Regression Models}, year = {2021}, } - The Interplay of Demographic Variables and Social Distancing Scores in Deep Prediction of US COVID-19 CasesEpidemiology, Machine LearningFrancesca Tang, Yang Feng, Hamza Chiheb, and Jianqing FanJournal of the American Statistical Association, 2021
With the severity of the COVID-19 outbreak, we characterize the nature of the growth trajectories of counties in the United States using a novel combination of spectral clustering and the correlation matrix. As the U.S. and the rest of the world are experiencing a severe second wave of infections, the importance of assigning growth membership to counties and understanding the determinants of the growth are increasingly evident. Subsequently, we select the demographic features that are most statistically significant in distinguishing the communities. Lastly, we effectively predict the future growth of a given county with an LSTM using three social distancing scores. This comprehensive study captures the nature of counties’ growth in cases at a very micro-level using growth communities, demographic factors, and social distancing performance to help government agencies utilize known information to make appropriate decisions regarding which potential counties to target resources and funding to.
@article{tang2021interplay, author = {Tang, Francesca and Feng, Yang and Chiheb, Hamza and Fan, Jianqing}, journal = {Journal of the American Statistical Association}, title = {The Interplay of Demographic Variables and Social Distancing Scores in Deep Prediction of US COVID-19 Cases}, year = {2021}, doi = {10.1080/01621459.2021.1901717}, } - RaSE: A Variable Screening Framework via Random Subspace EnsemblesMachine Learning, High-dimensional StatisticsYe Tian, and Yang FengJournal of the American Statistical Association, 2021
Variable screening methods have been shown to be effective in dimension reduction under the ultra-high dimensional setting. Most existing screening methods are designed to rank the predictors according to their individual contributions to the response. As a result, variables that are marginally independent but jointly dependent with the response could be missed. In this work, we propose a new framework for variable screening, Random Subspace Ensemble (RaSE), which works by evaluating the quality of random subspaces that may cover multiple predictors. This new screening framework can be naturally combined with any subspace evaluation criterion, which leads to an array of screening methods. The framework is capable of identifying signals with no marginal effect or with high-order interaction effects. It is shown to enjoy the sure screening property and rank consistency. We also develop an iterative version of RaSE screening with theoretical support. Extensive simulation studies and real-data analysis show the effectiveness of the new screening framework.
@article{tian2021rasescreening, author = {Tian, Ye and Feng, Yang}, journal = {Journal of the American Statistical Association}, title = {RaSE: A Variable Screening Framework via Random Subspace Ensembles}, year = {2021}, date-modified = {2024-11-12 13:08:50 -0500}, doi = {10.1080/01621459.2021.1938084}, } - Mediation effect selection in high-dimensional and compositional microbiome dataHigh-dimensional StatisticsHaixiang Zhang, Jun Chen, Yang Feng, Chan Wang, Huilin Li, and 1 more authorStatistics in Medicine, 2021
The microbiome plays an important role in human health by mediating the path from environmental exposures to health outcomes. The relative abundances of the high‐dimensional microbiome data have a unit‐sum restriction, rendering standard statistical methods in the Euclidean space invalid. To address this problem, we use the isometric log‐ratio transformations of the relative abundances as the mediator variables. To select significant mediators, we consider a closed testing‐based selection procedure with desirable confidence. Simulations are provided to verify the effectiveness of our method. As an illustrative example, we apply the proposed method to study the mediation effects of murine gut microbiome between subtherapeutic antibiotic treatment and body weight gain, and identify Coprobacillus and Adlercreutzia as two significant mediators.
@article{zhang2021mediation, author = {Zhang, Haixiang and Chen, Jun and Feng, Yang and Wang, Chan and Li, Huilin and Liu, Lei}, journal = {Statistics in Medicine}, number = {4}, pages = {885--896}, publisher = {Wiley Online Library}, title = {Mediation effect selection in high-dimensional and compositional microbiome data}, volume = {40}, year = {2021}, doi = {10.1002/sim.8808}, } - Imbalanced classification: A paradigm-based reviewClassificationYang Feng, Min Zhou, and Xin TongStatistical Analysis and Data Mining: The ASA Data Science Journal, 2021
A common issue for classification in scientific research and industry is the existence of imbalanced classes. When sample sizes of different classes are imbalanced in training data, naively implementing a classification method often leads to unsatisfactory prediction results on test data. Multiple resampling techniques have been proposed to address the class imbalance issues. Yet, there is no general guidance on when to use each technique. In this article, we provide a paradigm‐based review of the common resampling techniques for binary classification under imbalanced class sizes. The paradigms we consider include the classical paradigm that minimizes the overall classification error, the cost‐sensitive learning paradigm that minimizes a cost‐adjusted weighted type I and type II errors, and the Neyman–Pearson paradigm that minimizes the type II error subject to a type I error constraint. Under each paradigm, we investigate the combination of the resampling techniques and a few state‐of‐the‐art classification methods. For each pair of resampling techniques and classification methods, we use simulation studies and a real dataset on credit card fraud to study the performance under different evaluation metrics. From these extensive numerical experiments, we demonstrate under each classification paradigm, the complex dynamics among resampling techniques, base classification methods, evaluation metrics, and imbalance ratios. We also summarize a few takeaway messages regarding the choices of resampling techniques and base classification methods, which could be helpful for practitioners.
@article{feng2021imbalanced, author = {Feng, Yang and Zhou, Min and Tong, Xin}, journal = {Statistical Analysis and Data Mining: The ASA Data Science Journal}, number = {5}, pages = {383--406}, publisher = {Wiley Subscription Services, Inc., A Wiley Company Hoboken}, title = {Imbalanced classification: A paradigm-based review}, volume = {14}, year = {2021}, doi = {10.1002/sam.11538}, } - Comparison of solid tissue sequencing and liquid biopsy accuracy in identification of clinically relevant gene mutations and rearrangements in lung adenocarcinomasLawrence Hsu Lin, Douglas HR Allison, Yang Feng, George Jour, Kyung Park, and 6 more authorsModern Pathology, 2021
@article{lin2021comparison, author = {Lin, Lawrence Hsu and Allison, Douglas HR and Feng, Yang and Jour, George and Park, Kyung and Zhou, Fang and Moreira, Andre L and Shen, Guomiao and Feng, Xiaojun and Sabari, Joshua and others}, journal = {Modern Pathology}, number = {12}, pages = {2168--2174}, publisher = {Nature Publishing Group}, title = {Comparison of solid tissue sequencing and liquid biopsy accuracy in identification of clinically relevant gene mutations and rearrangements in lung adenocarcinomas}, volume = {34}, year = {2021} } - Super RaSE: Super Random Subspace Ensemble ClassificationMachine Learning, High-dimensional Statistics, ClassificationJianan Zhu, and Yang FengJournal of Risk and Financial Management, 2021
@article{zhu2021super, author = {Zhu, Jianan and Feng, Yang}, journal = {Journal of Risk and Financial Management}, number = {12}, pages = {612}, publisher = {Multidisciplinary Digital Publishing Institute}, title = {Super RaSE: Super Random Subspace Ensemble Classification}, volume = {14}, year = {2021}, } - Association of body composition parameters measured on CT with risk of hospitalization in patients with Covid-19Hersh Chandarana, Nisanard Pisuchpen, Rachel Krieger, Bari Dane, Artem Mikheev, and 3 more authorsEuropean Journal of Radiology, 2021
@article{chandarana2021association, author = {Chandarana, Hersh and Pisuchpen, Nisanard and Krieger, Rachel and Dane, Bari and Mikheev, Artem and Feng, Yang and Kambadakone, Avinash and Rusinek, Henry}, journal = {European Journal of Radiology}, pages = {110031}, publisher = {Elsevier}, title = {Association of body composition parameters measured on CT with risk of hospitalization in patients with Covid-19}, doi = {10.1016/j.ejrad.2021.110031}, volume = {145}, year = {2021} } - Targeted crisis risk control: A Neyman-Pearson approachNeyman-Pearson ClassificationYang Feng, Xin Tong, and Weining XinAvailable at SSRN, 2021
@article{feng2021targeted, author = {Feng, Yang and Tong, Xin and Xin, Weining}, journal = {Available at SSRN}, title = {Targeted crisis risk control: A Neyman-Pearson approach}, doi = {10.2139/ssrn.3945980}, year = {2021}, } - NCOG-11. ASSOCIATION OF HYPERGLYCEMIA AND TUMOR SUBCLASS ON SURVIVAL IN IDH-WILDTYPE GLIOBLASTOMAElisa Liu, Varshini Vasudevaraja, Vladislav Sviderskiy, Yang Feng, Ivy Tran, and 6 more authorsNeuro-Oncology, 2021
BACKGROUND RNA expression and DNA methylation studies have identified different subclasses of isocitrate dehydrogenase (IDH)-wildtype (wt) glioblastoma (GBM). However, the prognostic significance of molecular subclasses is unclear. Although hyperglycemia has been previously associated with worse survival, attempts to lower glucose have yielded mixed responses. The role of hyperglycemia may be confounded by molecular heterogeneity and have different impact in molecularly distinct GBM subclasses. METHODS Clinical, laboratory, and molecular data on 89 IDH-wt GBMs profiled by clinical next-generation sequencing and treated with Stupp protocol were reviewed. IDH-wt GBMs were subclassified into RTKI (Proneural), RTKII (Classical) and Mesenchymal subtypes using DNA methylation. Average glucose was calculated by time-weighting plasma glucose measurements between diagnosis and last follow-up. RESULTS Patients were stratified into three groups using average glucose: tertile one (< 100 mg/dL), tertile two (100-115 mg/dL), and tertile three (> 115 mg/dL). Comparison across glucose tertiles revealed no significant differences in Karnofsky Performance Status (KPS), dexamethasone dose, MGMT methylation, or methylation subclass. Overall survival (OS) was not affected by methylation subclass (log-rank p=0.9) but decreased with higher glucose (log-rank p=0.015). Higher glucose tertiles were associated with poorer OS among RTK I (log-rank p=0.08) and mesenchymal tumors (log-rank p=0.05), but not RTK II (log-rank p=0.99). After controlling for age, KPS, dexamethasone dose, and MGMT status, glucose remained significantly associated with survival (adjusted hazard ratio=5.2, p=0.02). DNA methylation clustering did not identify a unique signature associated with high or low glucose levels. Metabolomic analysis of 23 tumors showed minimal variation across metabolites within the cohort with no differences across molecular subclasses.
CONCLUSION Higher average glucose values were associated with poorer OS in RTKI and Mesenchymal IDH-wt GBM, but not RTKII. There were no discernible epigenetic or metabolomic differences between tumors in different glucose environments, suggesting a potential survival benefit with systemic glucose lowering in selected molecular subtypes.
@article{liu2021ncog, author = {Liu, Elisa and Vasudevaraja, Varshini and Sviderskiy, Vladislav and Feng, Yang and Tran, Ivy and Serrano, Jonathan and Cordova, Christine and Kurz, Sylvia and Golfinos, John and Sulman, Erik and others}, journal = {Neuro-Oncology}, number = {Suppl 6}, pages = {vi154}, publisher = {Oxford University Press}, title = {NCOG-11. ASSOCIATION OF HYPERGLYCEMIA AND TUMOR SUBCLASS ON SURVIVAL IN IDH-WILDTYPE GLIOBLASTOMA}, volume = {23}, doi = {10.1093/neuonc/noab196.602}, year = {2021} }
2020
- A Projection Based Conditional Dependence Measure with Applications to High-dimensional Undirected Graphical ModelsHigh-Dimensional Statistics, Graphical ModelsJianqing Fan, Yang Feng, and Lucy XiaJournal of Econometrics, 2020
@article{fan2020projection, author = {Fan, Jianqing and Feng, Yang and Xia, Lucy}, journal = {Journal of Econometrics}, title = {A Projection Based Conditional Dependence Measure with Applications to High-dimensional Undirected Graphical Models}, year = {2020}, doi = {10.1016/j.jeconom.2019.12.016}, } - On the sparsity of Mallows model averaging estimatorHigh-Dimensional Statistics, Model AveragingYang Feng, Qingfeng Liu, and Ryo OkuiEconomics Letters, 2020
@article{feng2020sparsity, author = {Feng, Yang and Liu, Qingfeng and Okui, Ryo}, journal = {Economics Letters}, pages = {108916}, publisher = {North-Holland}, title = {On the sparsity of Mallows model averaging estimator}, volume = {187}, year = {2020}, } - On the estimation of correlation in a binary sequence modelNetwork AnalysisHaolei Weng, and Yang FengJournal of Statistical Planning and Inference, 2020
@article{weng2020estimation, author = {Weng, Haolei and Feng, Yang}, journal = {Journal of Statistical Planning and Inference}, pages = {123--137}, publisher = {North-Holland}, title = {On the estimation of correlation in a binary sequence model}, volume = {207}, year = {2020}, } - Neyman-Pearson classification: parametrics and sample size requirementHigh-Dimensional Statistics, Neyman-Pearson ClassificationXin Tong, Lucy Xia, Jiacheng Wang, and Yang FengJournal of Machine Learning Research, 2020
@article{tong2020neyman, author = {Tong, Xin and Xia, Lucy and Wang, Jiacheng and Feng, Yang}, journal = {Journal of Machine Learning Research}, title = {Neyman-Pearson classification: parametrics and sample size requirement}, year = {2020}, } - Nested Model Averaging on Solution Path for High-dimensional Linear RegressionHigh-Dimensional Statistics, Model AveragingYang Feng, and Qingfeng LiuStat, 2020
We study the nested model averaging method on the solution path for a high‐dimensional linear regression problem. In particular, we propose to combine model averaging with regularized estimators (e.g., lasso, elastic net, and Sorted L‐One Penalized Estimation [SLOPE]) on the solution path for high‐dimensional linear regression. In simulation studies, we first conduct a systematic investigation on the impact of predictor ordering on the behaviour of nested model averaging, and then show that nested model averaging with lasso, elastic net and SLOPE compares favourably with other competing methods, including the infeasible lasso, elastic net, and SLOPE with the tuning parameter optimally selected. A real data analysis on predicting the per capita violent crime in the United States shows outstanding performance of the nested model averaging with lasso.
@article{feng2020nested, author = {Feng, Yang and Liu, Qingfeng}, journal = {Stat}, title = {Nested Model Averaging on Solution Path for High-dimensional Linear Regression}, year = {2020}, doi = {10.1002/sta4.317} } - Accounting for incomplete testing in the estimation of epidemic parametersEpidemiologyRebecca A Betensky, and Yang FengInternational Journal of Epidemiology, 2020
@article{betensky2020accounting, author = {Betensky, Rebecca A and Feng, Yang}, journal = {International Journal of Epidemiology}, publisher = {Cold Spring Harbor Laboratory Press}, title = {Accounting for incomplete testing in the estimation of epidemic parameters}, year = {2020}, } - Analytical performance of lateral flow immunoassay for SARS-CoV-2 exposure screening on venous and capillary blood samplesMargaret A Black, Guomiao Shen, Xiaojun Feng, Wilfredo Garcia Beltran, Yang Feng, and 6 more authorsJ Immunol Methods, 2020
@article{black2020analytical, author = {Black, Margaret A and Shen, Guomiao and Feng, Xiaojun and Beltran, Wilfredo Garcia and Feng, Yang and Vasudevaraja, Varshini and Allison, Douglas and Lin, Lawrence H and Gindin, Tatyana and Astudillo, Michael and others}, journal = {J Immunol Methods}, publisher = {Cold Spring Harbor Laboratory Press}, title = {Analytical performance of lateral flow immunoassay for SARS-CoV-2 exposure screening on venous and capillary blood samples}, doi = {10.1016/j.jim.2020.112909}, year = {2020} } - Visceral adipose tissue in patients with COVID-19Hersh Chandarana, Bari Dane, Artem Mikheev, Myles T. Taffel, Yang Feng, and 1 more authorAbdominal Radiology, 2020
@article{chandarana2020visceral, title = {Visceral adipose tissue in patients with COVID-19}, author = {Chandarana, Hersh and Dane, Bari and Mikheev, Artem and Taffel, Myles T. and Feng, Yang and Rusinek, Henry}, journal = {Abdominal Radiology}, volume = {46}, number = {2}, pages = {818--825}, year = {2020}, publisher = {Springer Science and Business Media LLC}, doi = {10.1007/s00261-020-02693-2} }
2019
- Regularization after retention in ultrahigh dimensional linear regression modelsHigh-Dimensional StatisticsHaolei Weng, Yang Feng, and Xingye QiaoStatistica Sinica, 2019
@article{weng2019regularization, author = {Weng, Haolei and Feng, Yang and Qiao, Xingye}, journal = {Statistica Sinica}, title = {Regularization after retention in ultrahigh dimensional linear regression models}, year = {2019}, } - The restricted consistency property of leave-n_v-out cross-validation for high-dimensional variable selectionHigh-Dimensional StatisticsYang Feng, and Yi YuStatistica Sinica, 2019
@article{feng2019restricted, author = {Feng, Yang and Yu, Yi}, journal = {Statistica Sinica}, pages = {1607--1630}, title = {The restricted consistency property of leave-$n_v$-out cross-validation for high-dimensional variable selection}, volume = {29}, year = {2019}, } - Likelihood adaptively modified penaltiesHigh-Dimensional StatisticsYang Feng, Tengfei Li, and Zhiliang YingApplied Stochastic Models in Business and Industry, 2019
@article{feng2019likelihood, author = {Feng, Yang and Li, Tengfei and Ying, Zhiliang}, journal = {Applied Stochastic Models in Business and Industry}, number = {2}, pages = {330--353}, publisher = {Wiley Online Library}, title = {Likelihood adaptively modified penalties}, volume = {35}, year = {2019}, } - A Kronecker product model for repeated pattern detection on 2D urban imagesComputer VisionJuan Liu, Emmanouil Z Psarakis, Yang Feng, and Ioannis StamosIEEE Transactions on Pattern Analysis and Machine Intelligence, 2019
@article{liu2019kronecker, author = {Liu, Juan and Psarakis, Emmanouil Z and Feng, Yang and Stamos, Ioannis}, journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence}, title = {A Kronecker product model for repeated pattern detection on 2D urban images}, year = {2019}, number = {9}, pages = {2266--2272}, volume = {41}, publisher = {IEEE}, }
2018
- SIS: An R Package for Sure Independence Screening in Ultrahigh Dimensional Statistical ModelsHigh-Dimensional StatisticsDiego Franco Saldana, and Yang FengJournal of Statistical Software, 2018
@article{saldana2018sis, author = {Saldana, Diego Franco and Feng, Yang}, journal = {Journal of Statistical Software}, number = {2}, pages = {1--25}, title = {SIS: An R Package for Sure Independence Screening in Ultrahigh Dimensional Statistical Models}, volume = {83}, year = {2018}, } - Penalized weighted least absolute deviation regressionHigh-Dimensional StatisticsXiaoli Gao, and Yang FengStatistics and Its Interface, 2018
@article{gao2018penalized, author = {Gao, Xiaoli and Feng, Yang}, journal = {Statistics and Its Interface}, pages = {79--89}, doi = {10.4310/SII.2018.v11.n1.a7}, title = {Penalized weighted least absolute deviation regression}, volume = {11}, year = {2018}, } - Nonparametric Independence Screening via Favored Smoothing BandwidthHigh-Dimensional StatisticsYang Feng, Yichao Wu, and Leonard StefanskiJournal of Statistical Planning and Inference, 2018
@article{feng2018nonparametric, author = {Feng, Yang and Wu, Yichao and Stefanski, Leonard}, journal = {Journal of Statistical Planning and Inference}, title = {Nonparametric Independence Screening via Favored Smoothing Bandwidth}, year = {2018}, } - Neyman-Pearson classification algorithms and NP receiver operating characteristicsHigh-Dimensional Statistics, Neyman-Pearson ClassificationXin Tong, Yang Feng, and Jingyi Jessica LiScience Advances, 2018
@article{tong2018neyman, author = {Tong, Xin and Feng, Yang and Li, Jingyi Jessica}, journal = {Science Advances}, title = {Neyman-Pearson classification algorithms and NP receiver operating characteristics}, year = {2018}, number = {2}, pages = {eaao1659}, volume = {4}, publisher = {American Association for the Advancement of Science}, } - Model selection for high-dimensional quadratic regression via regularizationHigh-Dimensional StatisticsNing Hao, Yang Feng, and Hao Helen ZhangJournal of the American Statistical Association, 2018
@article{hao2018model, author = {Hao, Ning and Feng, Yang and Zhang, Hao Helen}, journal = {Journal of the American Statistical Association}, title = {Model selection for high-dimensional quadratic regression via regularization}, year = {2018}, number = {522}, pages = {615--625}, volume = {113}, publisher = {Taylor \& Francis}, } - A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infectionSlim Fourati, Aarthi Talla, Mehrad Mahmoudian, Joshua G Burkhart, Riku Klén, and 6 more authorsNature Communications, 2018
@article{fourati2018crowdsourced, author = {Fourati, Slim and Talla, Aarthi and Mahmoudian, Mehrad and Burkhart, Joshua G and Kl{\'e}n, Riku and Henao, Ricardo and Yu, Thomas and Ayd{\i}n, Zafer and Yeung, Ka Yee and Ahsen, Mehmet Eren and others}, journal = {Nature Communications}, number = {1}, pages = {4418}, publisher = {Nature Publishing Group UK London}, title = {A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection}, volume = {9}, year = {2018} }
2017
- How many communities are there?Network AnalysisD Franco Saldana, Yi Yu, and Yang FengJournal of Computational and Graphical Statistics, 2017
@article{saldana2017many, author = {Saldana, D Franco and Yu, Yi and Feng, Yang}, journal = {Journal of Computational and Graphical Statistics}, number = {1}, pages = {171--181}, publisher = {Taylor \& Francis}, title = {How many communities are there?}, volume = {26}, year = {2017}, } - Binary switch portfolioTengfei Li, Kani Chen, Yang Feng, and Zhiliang YingQuantitative Finance, 2017
@article{li2017binary, author = {Li, Tengfei and Chen, Kani and Feng, Yang and Ying, Zhiliang}, journal = {Quantitative Finance}, number = {5}, pages = {763--780}, publisher = {Routledge}, title = {Binary switch portfolio}, volume = {17}, year = {2017} } - Post selection shrinkage estimation for high-dimensional data analysisHigh-Dimensional StatisticsXiaoli Gao, SE Ahmed, and Yang FengApplied Stochastic Models in Business and Industry, 2017
@article{gao2017post, author = {Gao, Xiaoli and Ahmed, SE and Feng, Yang}, journal = {Applied Stochastic Models in Business and Industry}, number = {2}, pages = {97--120}, publisher = {John Wiley \& Sons, Ltd}, title = {Post selection shrinkage estimation for high-dimensional data analysis}, volume = {33}, year = {2017}, } - JDINAC: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics dataHigh-Dimensional Statistics, Network AnalysisJiadong Ji, Di He, Yang Feng, Yong He, Fuzhong Xue, and 1 more authorBioinformatics, 2017
@article{ji2017jdinac, author = {Ji, Jiadong and He, Di and Feng, Yang and He, Yong and Xue, Fuzhong and Xie, Lei}, journal = {Bioinformatics}, number = {19}, pages = {3080--3087}, publisher = {Oxford University Press}, title = {JDINAC: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data}, volume = {33}, year = {2017}, } - Regularization After Marginal Learning for Ultra-High Dimensional Regression ModelsHigh-Dimensional StatisticsYang Feng, and Mengjia YuBig and Complex Data Analysis: Methodologies and Applications, 2017
@incollection{feng2017regularization, author = {Feng, Yang and Yu, Mengjia}, booktitle = {Big and Complex Data Analysis: Methodologies and Applications}, pages = {3--28}, publisher = {Springer International Publishing}, title = {Regularization After Marginal Learning for Ultra-High Dimensional Regression Models}, year = {2017}, } - Rejoinder to ‘Post-selection shrinkage estimation for high-dimensional data analysis’High-Dimensional StatisticsXiaoli Gao, S Ejaz Ahmed, Yang Feng, and othersApplied Stochastic Models in Business and Industry, 2017
@article{gao2017rejoinder, author = {Gao, Xiaoli and Ahmed, S Ejaz and Feng, Yang and others}, journal = {Applied Stochastic Models in Business and Industry}, number = {2}, pages = {131--135}, publisher = {John Wiley \& Sons}, title = {Rejoinder to `Post-selection shrinkage estimation for high-dimensional data analysis'}, volume = {33}, year = {2017}, } - Discussion on "Random-projection ensemble classification"High-Dimensional Statistics, ClassificationYang FengJournal of the Royal Statistical Society: Series B, 2017
@article{feng2017discussion, author = {Feng, Yang}, journal = {Journal of the Royal Statistical Society: Series B}, number = {4}, pages = {1011}, title = {Discussion on "Random-projection ensemble classification"}, volume = {79}, year = {2017}, }
2016
- Feature Augmentation via Nonparametrics and Selection (FANS) in high-dimensional classificationHigh-Dimensional Statistics, ClassificationJianqing Fan, Yang Feng, Jiancheng Jiang, and Xin TongJournal of the American Statistical Association, 2016
@article{fan2016feature, author = {Fan, Jianqing and Feng, Yang and Jiang, Jiancheng and Tong, Xin}, journal = {Journal of the American Statistical Association}, title = {Feature Augmentation via Nonparametrics and Selection (FANS) in high-dimensional classification}, year = {2016}, number = {513}, pages = {275--287}, volume = {111}, publisher = {Taylor \& Francis}, } - A survey on Neyman-Pearson classification and suggestions for future researchNeyman-Pearson ClassificationXin Tong, Yang Feng, and Anqi ZhaoWiley Interdisciplinary Reviews: Computational Statistics, 2016
In statistics and machine learning, classification studies how to automatically learn to make good qualitative predictions (i.e., assign class labels) based on past observations. Examples of classification problems include email spam filtering, fraud detection, and market segmentation. Binary classification, in which the class label takes one of two values, arguably has the most widely used machine learning applications. Most existing binary classification methods target the minimization of the overall classification risk and may fail to serve some real-world applications such as cancer diagnosis, where users are more concerned with the risk of misclassifying one specific class than the other. The Neyman-Pearson (NP) paradigm was introduced in this context as a novel statistical framework for handling asymmetric type I/II error priorities. It seeks classifiers with minimal type II error subject to a type I error constraint at a user-specified level. Though NP classification has the potential to be an important subfield in the classification literature, it has not received much attention in the statistics and machine learning communities. This article is a survey on the current status of the NP classification literature. To stimulate readers’ research interests, the authors also envision a few possible directions for future research in the NP paradigm and its applications. WIREs Comput Stat 2016, 8:64–81. doi: 10.1002/wics.1376
@article{tong2016survey, author = {Tong, Xin and Feng, Yang and Zhao, Anqi}, journal = {Wiley Interdisciplinary Reviews: Computational Statistics}, number = {2}, pages = {64--81}, publisher = {Wiley Online Library}, title = {A survey on Neyman-Pearson classification and suggestions for future research}, volume = {8}, doi = {10.1002/wics.1376}, year = {2016}, } - Variable selection and prediction with incomplete high-dimensional dataHigh-Dimensional StatisticsYing Liu, Yuanjia Wang, Yang Feng, and Melanie M WallThe Annals of Applied Statistics, 2016
@article{liu2016variable, author = {Liu, Ying and Wang, Yuanjia and Feng, Yang and Wall, Melanie M}, journal = {The Annals of Applied Statistics}, number = {1}, pages = {418}, publisher = {NIH Public Access}, title = {Variable selection and prediction with incomplete high-dimensional data}, volume = {10}, year = {2016}, } - Neyman-Pearson classification under high-dimensional settingsHigh-Dimensional Statistics, Neyman-Pearson ClassificationAnqi Zhao, Yang Feng, Lie Wang, and Xin TongJournal of Machine Learning Research, 2016
@article{zhao2016neyman, author = {Zhao, Anqi and Feng, Yang and Wang, Lie and Tong, Xin}, journal = {Journal of Machine Learning Research}, title = {Neyman-Pearson classification under high-dimensional settings}, year = {2016}, number = {212}, pages = {1--39}, volume = {17}, } - Tuning-parameter selection in regularized estimations of large covariance matricesHigh-Dimensional StatisticsYixin Fang, Binhuan Wang, and Yang FengJournal of Statistical Computation and Simulation, 2016
@article{fang2016tuning, author = {Fang, Yixin and Wang, Binhuan and Feng, Yang}, journal = {Journal of Statistical Computation and Simulation}, number = {3}, pages = {494--509}, publisher = {Taylor \& Francis}, title = {Tuning-parameter selection in regularized estimations of large covariance matrices}, volume = {86}, year = {2016}, }
2015
- Functional and Parametric Estimation in a Semi- and Nonparametric Model with Application to Mass-Spectrometry DataNonparametric StatisticsWeiping Ma, Yang Feng, Kani Chen, and Zhiliang YingThe International Journal of Biostatistics, 2015
@article{ma2015functional, author = {Ma, Weiping and Feng, Yang and Chen, Kani and Ying, Zhiliang}, journal = {The International Journal of Biostatistics}, number = {2}, pages = {285--303}, title = {Functional and Parametric Estimation in a Semi- and Nonparametric Model with Application to Mass-Spectrometry Data}, volume = {11}, year = {2015}, }
2014
- Apple: Approximate path for penalized likelihood estimatorsHigh-Dimensional StatisticsYi Yu and Yang FengStatistics and Computing, 2014
@article{yu2014apple, author = {Yu, Yi and Feng, Yang}, journal = {Statistics and Computing}, pages = {803--819}, publisher = {Springer US}, title = {Apple: Approximate path for penalized likelihood estimators}, volume = {24}, year = {2014}, } - Modified cross-validation for penalized high-dimensional linear regression modelsHigh-Dimensional StatisticsYi Yu and Yang FengJournal of Computational and Graphical Statistics, 2014
@article{yu2014modified, author = {Yu, Yi and Feng, Yang}, journal = {Journal of Computational and Graphical Statistics}, number = {4}, pages = {1009--1027}, publisher = {Taylor \& Francis}, title = {Modified cross-validation for penalized high-dimensional linear regression models}, volume = {23}, year = {2014}, } - Regularized principal components of heritabilityHigh-Dimensional StatisticsYixin Fang, Yang Feng, and Ming YuanComputational Statistics, 2014
@article{fang2014regularized, author = {Fang, Yixin and Feng, Yang and Yuan, Ming}, journal = {Computational Statistics}, pages = {455--465}, publisher = {Springer Berlin Heidelberg}, title = {Regularized principal components of heritability}, volume = {29}, year = {2014}, }
2013
2012
- A road to classification in high dimensional space: the regularized optimal affine discriminantHigh-Dimensional Statistics, ClassificationJianqing Fan, Yang Feng, and Xin TongJournal of the Royal Statistical Society: Series B (Statistical Methodology), 2012
@article{fan2012road, author = {Fan, Jianqing and Feng, Yang and Tong, Xin}, journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)}, title = {A road to classification in high dimensional space: the regularized optimal affine discriminant}, year = {2012}, number = {4}, pages = {745--771}, volume = {74}, publisher = {Wiley Online Library}, }
2011
- Nonparametric independence screening in sparse ultra-high-dimensional additive modelsHigh-Dimensional Statistics, Nonparametric StatisticsJianqing Fan, Yang Feng, and Rui SongJournal of the American Statistical Association, 2011
@article{fan2011nonparametric, author = {Fan, Jianqing and Feng, Yang and Song, Rui}, journal = {Journal of the American Statistical Association}, title = {Nonparametric independence screening in sparse ultra-high-dimensional additive models}, year = {2011}, pages = {544--557}, volume = {106}, publisher = {Taylor \& Francis}, }
2010
- High-dimensional variable selection for Cox’s proportional hazards modelHigh-Dimensional StatisticsJianqing Fan, Yang Feng, and Yichao WuIn Borrowing strength: Theory powering applications–a Festschrift for Lawrence D. Brown, 2010
@incollection{fan2010high, author = {Fan, Jianqing and Feng, Yang and Wu, Yichao}, booktitle = {Borrowing strength: Theory powering applications--a Festschrift for Lawrence D. Brown}, pages = {70--87}, publisher = {Institute of Mathematical Statistics}, title = {High-dimensional variable selection for Cox's proportional hazards model}, volume = {6}, year = {2010}, } - Nonparametric estimation of genewise variance for microarray dataNonparametric StatisticsJianqing Fan, Yang Feng, and Yue S NiuAnnals of Statistics, 2010
@article{fan2010nonparametric, author = {Fan, Jianqing and Feng, Yang and Niu, Yue S}, journal = {Annals of Statistics}, title = {Nonparametric estimation of genewise variance for microarray data}, year = {2010}, number = {5}, pages = {2723}, volume = {38}, publisher = {NIH Public Access}, } - High-dimensional statistical learning and nonparametric modelingHigh-Dimensional Statistics, Nonparametric StatisticsYang FengPrinceton University, 2010
@phdthesis{feng2010high, author = {Feng, Yang}, school = {Princeton University}, title = {High-dimensional statistical learning and nonparametric modeling}, year = {2010}, } - The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive modelsLeming Shi, Gregory Campbell, Wendell D Jones, Fabien Campagne, Zhining Wen, Stephen J Walker, Zhenqiang Su, Tzu-Ming Chu, Federico M Goodsaid, Lajos Pusztai, and othersNature Biotechnology, 2010
@article{shi2010microarray, author = {Shi, Leming and Campbell, Gregory and Jones, Wendell D and Campagne, Fabien and Wen, Zhining and Walker, Stephen J and Su, Zhenqiang and Chu, Tzu-Ming and Goodsaid, Federico M and Pusztai, Lajos and others}, journal = {Nature Biotechnology}, number = {8}, pages = {827}, title = {The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models}, volume = {28}, year = {2010}, }
2009
- Network exploration via the adaptive LASSO and SCAD penaltiesHigh-Dimensional Statistics, Network Analysis, Graphical ModelsJianqing Fan, Yang Feng, and Yichao WuThe Annals of Applied Statistics, 2009
Graphical models are frequently used to explore networks, such as genetic networks, among a set of variables. This is usually carried out via exploring the sparsity of the precision matrix of the variables under consideration. Penalized likelihood methods are often used in such explorations. Yet, positive-definiteness constraints of precision matrices make the optimization problem challenging. We introduce non-concave penalties and the adaptive LASSO penalty to attenuate the bias problem in the network estimation. Through the local linear approximation to the non-concave penalty functions, the problem of precision matrix estimation is recast as a sequence of penalized likelihood problems with a weighted L1 penalty and solved using the efficient algorithm of Friedman et al. (2008). Our estimation schemes are applied to two real datasets. Simulation experiments and asymptotic theory are used to justify our proposed methods.
@article{fan2009network, author = {Fan, Jianqing and Feng, Yang and Wu, Yichao}, journal = {The Annals of Applied Statistics}, title = {Network exploration via the adaptive LASSO and SCAD penalties}, year = {2009}, pages = {521--541}, publisher = {Institute of Mathematical Statistics}, } - Local quasi-likelihood with a parametric guideNonparametric StatisticsJianqing Fan, Yichao Wu, and Yang FengAnnals of Statistics, 2009
@article{fan2009local, author = {Fan, Jianqing and Wu, Yichao and Feng, Yang}, journal = {Annals of Statistics}, title = {Local quasi-likelihood with a parametric guide}, year = {2009}, number = {6B}, pages = {4153}, volume = {37}, publisher = {NIH Public Access}, } - Alignment of protein mass spectrometry data by integrated Markov chain shifting methodNonparametric StatisticsYang Feng, Weiping Ma, Zhanfeng Wang, Yaning Yang, and Zhiliang YingStatistics and its Interface, 2009
@article{feng2009alignment, author = {Feng, Yang and Ma, Weiping and Wang, Zhanfeng and Yang, Yaning and Ying, Zhiliang}, journal = {Statistics and its Interface}, number = {3}, pages = {329--340}, publisher = {International Press of Boston}, title = {Alignment of protein mass spectrometry data by integrated Markov chain shifting method}, volume = {2}, year = {2009}, } - Discussion on "Nonparametric Prediction in Measurement Error Models"Nonparametric StatisticsJianqing Fan and Yang FengJournal of the American Statistical Association, 2009
@article{fan2009discussion, author = {Fan, Jianqing and Feng, Yang}, journal = {Journal of the American Statistical Association}, number = {487}, pages = {1003--1007}, publisher = {American Statistical Association}, title = {Discussion on "Nonparametric Prediction in Measurement Error Models"}, volume = {104}, year = {2009}, doi = {10.1198/jasa.2009.tm09188}, }