Transfer Learning and Multi-task Learning
Overview
This page summarizes several of my recent works on Transfer Learning (TL) and Multi-task Learning (MTL), focusing on theoretical guarantees, algorithmic design, and applications to high-dimensional, unsupervised, and federated settings.
These works aim to answer a common question:
How can we safely and efficiently borrow information across related tasks or domains, while avoiding negative transfer and preserving privacy?
Transfer Learning under High-Dimensional Generalized Linear Models
Ye Tian & Yang Feng, Journal of the American Statistical Association (JASA), 2023
Paper | R Package: glmtrans
Summary
This work develops a principled framework for transfer learning in high-dimensional GLMs, where multiple related source datasets can inform a target task. It proposes a two-step TransGLM procedure to automatically determine which sources are informative, mitigating negative transfer.
Highlights
- Introduces a transferable source detection mechanism to adaptively select useful sources.
- Provides non-asymptotic bounds for estimation and prediction errors.
- Establishes theoretical conditions under which transfer learning yields provable benefits.
- Includes a method for valid post-transfer inference (confidence intervals).
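As a rough illustration of the source-detection idea (not the glmtrans interface), the sketch below holds out part of the target sample, pools each candidate source with the target training data, and keeps a source only if it does not worsen the target validation loss; the simulated data, Lasso penalty, and tolerance are all illustrative assumptions.

```python
# Hypothetical sketch of transferable source detection for a linear GLM.
# Not the glmtrans API; data, penalty, and tolerance are illustrative.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
p = 50
beta_target = np.zeros(p)
beta_target[:5] = 1.0

def simulate(beta, n):
    X = rng.normal(size=(n, p))
    return X, X @ beta + rng.normal(size=n)

X_t, y_t = simulate(beta_target, 100)                    # small target sample
sources = [simulate(beta_target + rng.normal(scale=s, size=p), 300)
           for s in (0.05, 0.05, 1.0)]                   # two close sources, one distant

X_tr, X_val, y_tr, y_val = train_test_split(X_t, y_t, test_size=0.3, random_state=0)
baseline = np.mean((y_val - Lasso(alpha=0.1).fit(X_tr, y_tr).predict(X_val)) ** 2)

informative = []
for k, (X_s, y_s) in enumerate(sources):
    # Pool the candidate source with the target training data, refit, and keep
    # the source only if the target validation loss does not get noticeably worse.
    fit = Lasso(alpha=0.1).fit(np.vstack([X_tr, X_s]), np.concatenate([y_tr, y_s]))
    if np.mean((y_val - fit.predict(X_val)) ** 2) <= 1.05 * baseline:
        informative.append(k)

print("sources detected as informative:", informative)
```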
Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models
Ye Tian, Haolei Weng, Lucy Xia, & Yang Feng, arXiv preprint, 2024
Paper | R Package: mtlgmm
Summary
This paper studies unsupervised MTL/TL for Gaussian Mixture Models (GMMs), proposing a robust and theoretically grounded approach that jointly learns across multiple mixture tasks.
Highlights
- A multi-task EM algorithm that learns shared latent structures across tasks.
- Robustness against outlier tasks with arbitrary distributions.
- Finite-sample guarantees and minimax-optimal estimation rates.
- Theoretical resolution of label-alignment issues across tasks.
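To make the label-alignment issue concrete, here is a toy sketch (not the mtlgmm algorithm): each task's GMM is fit separately, component labels are permuted to match a reference task, and the aligned means are averaged as a crude shared estimate. The two-component setup and simulated data are assumptions for illustration.

```python
# Toy illustration of cross-task label alignment for multi-task GMMs.
import numpy as np
from itertools import permutations
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
true_means = np.array([[-2.0, 0.0], [2.0, 0.0]])

def sample_task(n):
    z = rng.integers(0, 2, size=n)
    return true_means[z] + rng.normal(size=(n, 2))

tasks = [sample_task(200) for _ in range(4)]
fits = [GaussianMixture(n_components=2, random_state=0).fit(X) for X in tasks]

# Use the first task as the reference; permute each task's component labels so
# that its estimated means are closest (in Frobenius norm) to the reference.
ref = fits[0].means_
aligned = []
for gm in fits:
    best = min(permutations(range(2)),
               key=lambda perm: np.linalg.norm(gm.means_[list(perm)] - ref))
    aligned.append(gm.means_[list(best)])

# A crude "shared structure": average the aligned component means across tasks.
print("shared means estimate:\n", np.mean(aligned, axis=0))
```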
Towards the Theory of Unsupervised Federated Learning
Ye Tian, Haolei Weng, & Yang Feng, ICML 2024
Paper | Python package: FedGrEM
Summary
This paper provides one of the first non-asymptotic analyses of federated EM algorithms for unsupervised learning. It bridges the gap between theory and practice in federated expectation-maximization across heterogeneous clients.
Highlights
- Finite-sample convergence guarantees for federated EM under heterogeneity.
- Characterizes trade-offs between communication cost and estimation accuracy.
- Addresses label alignment and initialization issues unique to unsupervised federated setups.
- Empirically validated across distributed mixture models.
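The communication pattern can be sketched in a few lines under simplifying assumptions (two spherical components, equal mixing weights, unit variance): each client runs a local E-step, sends only sufficient statistics, and the server performs the pooled M-step. This is a generic federated-EM illustration, not the FedGrEM package.

```python
# Minimal sketch of federated EM for a two-component spherical GMM.
# Clients exchange sufficient statistics with the server, never raw data.
import numpy as np

rng = np.random.default_rng(2)
mu_true = np.array([[-2.0], [2.0]])
clients = [mu_true[rng.integers(0, 2, size=150)] + rng.normal(size=(150, 1))
           for _ in range(5)]

mu = np.array([[-0.5], [0.5]])        # shared initialization broadcast by the server

for _ in range(10):                   # communication rounds
    stats = []
    for X in clients:
        # Local E-step: responsibilities under equal weights and unit variance.
        logr = (-0.5 * (X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        r = np.exp(logr - logr.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # Local sufficient statistics for the M-step.
        stats.append((r.T @ X, r.sum(axis=0)))
    # Server aggregation: pooled M-step update of the component means.
    num = sum(s for s, _ in stats)
    den = sum(c for _, c in stats)
    mu = num / den[:, None]

print("estimated component means:", mu.ravel())
```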
Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness
Ye Tian, Yuqi Gu, & Yang Feng, Journal of Machine Learning Research (JMLR), 2025
Paper | Python package: RL-MTL-TL
Summary
This work provides a general theory for learning from similar but not identical linear representations. It develops adaptive algorithms that automatically determine how much to share across tasks, achieving robustness and minimax optimality.
Highlights
- Introduces an adaptive penalized ERM framework for shared representations.
- Characterizes regimes of beneficial versus negative transfer.
- Achieves minimax rates and adapts seamlessly to unknown similarity levels.
- Robust to outlier tasks and distributional shifts.
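A toy sketch of the shared-representation idea is given below (not the RL-MTL-TL implementation): per-task estimates are stacked, a low-dimensional subspace is extracted via SVD, and each task is refit inside that subspace, falling back to its single-task fit when sharing hurts on a held-out split. The fallback is a crude stand-in for the paper's adaptive penalty, and all data and dimensions are made up.

```python
# Toy sketch: shared low-dimensional linear representation across tasks,
# with a simple held-out guard against negative transfer.
import numpy as np

rng = np.random.default_rng(3)
p, r, T, n = 20, 2, 6, 40
A_true = np.linalg.qr(rng.normal(size=(p, r)))[0]            # shared representation
betas = [A_true @ rng.normal(size=r) for _ in range(T)]
betas[-1] = rng.normal(size=p)                               # one outlier task

def make_task(b):
    X = rng.normal(size=(n, p))
    return X, X @ b + 0.5 * rng.normal(size=n)

def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

data = [make_task(b) for b in betas]
splits = [(X[:30], y[:30], X[30:], y[30:]) for X, y in data]

# Step 1: single-task ridge estimates on each task's training split.
single = [ridge(Xtr, ytr, 1.0) for Xtr, ytr, _, _ in splits]

# Step 2: shared subspace = top-r left singular vectors of the stacked estimates.
A_hat = np.linalg.svd(np.column_stack(single), full_matrices=False)[0][:, :r]

# Step 3: refit each task inside the shared subspace; keep the single-task fit
# whenever it predicts better on the task's held-out split (protects outlier tasks).
final = []
for (Xtr, ytr, Xval, yval), b_single in zip(splits, single):
    b_shared = A_hat @ ridge(Xtr @ A_hat, ytr, 1.0)
    e_shared = np.mean((yval - Xval @ b_shared) ** 2)
    e_single = np.mean((yval - Xval @ b_single) ** 2)
    final.append(b_shared if e_shared <= e_single else b_single)

errors = [round(float(np.linalg.norm(b - bt)), 2) for b, bt in zip(final, betas)]
print("per-task estimation errors:", errors)
```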
Geometry-Aware Multi-Task Representation Learning
Aoran Chen & Yang Feng, arXiv preprint, 2025
Paper | Python package: GeoERM
Summary
This paper introduces GeoERM, a geometry-aware multi-task learning (MTL) framework that respects the intrinsic Riemannian geometry of the representation space. Rather than treating the shared representation as a point in Euclidean space, the method performs optimization on the appropriate manifold, combining:
- Riemannian gradient steps that align with the curvature of the search space,
- Polar retraction to ensure that updates remain on the manifold.
GeoERM incurs a per-iteration computational cost similar to that of Euclidean baselines while offering enhanced robustness under task heterogeneity and adversarial label noise. Experiments on synthetic and real datasets (e.g., wearable-sensor activity recognition) show improved estimation accuracy and reduced negative transfer.
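These two manifold ingredients can be illustrated with a minimal sketch on the Stiefel manifold of orthonormal p × r frames; the toy alignment objective, step size, and iteration count below are assumptions for illustration, not the GeoERM objective itself.

```python
# Sketch of one geometry-aware update pattern: Riemannian gradient step on the
# Stiefel manifold followed by a polar retraction, applied to a toy objective.
import numpy as np

rng = np.random.default_rng(4)
p, r = 10, 3
target = np.linalg.qr(rng.normal(size=(p, r)))[0]       # toy "ground truth" frame
U = np.linalg.qr(rng.normal(size=(p, r)))[0]            # current representation

def euclidean_grad(U):
    # Gradient of the toy objective f(U) = -<U, target> (alignment with target).
    return -target

def polar_retraction(A):
    # Map an arbitrary p x r matrix back to the Stiefel manifold via its polar factor.
    W, _, Vt = np.linalg.svd(A, full_matrices=False)
    return W @ Vt

step = 0.5
for _ in range(50):
    G = euclidean_grad(U)
    # Riemannian gradient: project the Euclidean gradient onto the tangent space at U.
    rgrad = G - U @ ((U.T @ G + G.T @ U) / 2)
    # Step along the tangent direction, then retract back onto the manifold.
    U = polar_retraction(U - step * rgrad)

# The learned frame should (approximately) span the target subspace.
print("subspace misalignment:", np.linalg.norm(U @ U.T - target @ target.T))
```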
Highlights
- Embeds the latent representation on its natural Riemannian manifold and operates via manifold-aware updates.
- Demonstrates resilience to heterogeneity and adversarial noise.
- Retains computational efficiency comparable to standard Euclidean MTL methods.
- Empirically outperforms leading MTL and single-task baselines.
Federated Transfer Learning with Differential Privacy
Mengchu Li, Ye Tian, Yang Feng, & Yi Yu, arXiv preprint, 2024
Paper
Summary
This study integrates federated transfer learning with differential privacy (DP) to enable knowledge transfer across domains without compromising user data confidentiality.
Highlights
- A noise-calibrated federated transfer algorithm with formal DP guarantees.
- Tight theoretical characterization of privacy-utility trade-offs.
- Demonstrates practical feasibility for privacy-sensitive health and financial data.
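As a hedged sketch of the clip-then-noise pattern underlying DP federated aggregation (not the paper's exact calibration), the snippet below clips each client update, adds Gaussian-mechanism noise scaled to the clipping norm, and averages at the server; the privacy parameters and update dimension are illustrative.

```python
# Minimal sketch of a differentially private federated aggregation step using
# the standard Gaussian mechanism (clip, perturb, then average).
import numpy as np

rng = np.random.default_rng(5)
d = 20
client_updates = [rng.normal(size=d) for _ in range(10)]

clip_norm, eps, delta = 1.0, 1.0, 1e-5
# Gaussian-mechanism noise scale for L2 sensitivity equal to clip_norm.
sigma = clip_norm * np.sqrt(2 * np.log(1.25 / delta)) / eps

private = []
for u in client_updates:
    u = u * min(1.0, clip_norm / np.linalg.norm(u))       # clip each update's L2 norm
    private.append(u + rng.normal(scale=sigma, size=d))   # perturb before release

server_update = np.mean(private, axis=0)                  # aggregate noisy updates
print("noise scale sigma =", round(sigma, 3))
```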