Nicholas Ho

Computational Biology PhD @ CMU School of Computer Science

nzh [AT] cs.cmu.edu

Hello! My name is Nicholas Ho, and I am a third-year Computational Biology PhD student in Carnegie Mellon University's School of Computer Science. I am incredibly grateful to be advised by Professor Jian Ma and Professor Eric Xing. I develop self-supervised models that learn from large biological datasets to uncover new insights across different scales of biology.

Self-supervised learning across biological scales

I believe that the task always informs the representation. In biology, because data is limited, it is critical to design pretraining objectives aligned with the downstream tasks that matter. By working on the right problems with the right inductive biases, I believe we can build generalizable and scalable models.

Feel free to reach out to me! I'm always looking for interesting problems to talk about!

Publications

Most recent publications on Google Scholar.
* indicates equal contribution.

HEIMDALL: Disentangling Tokenizer Design for Robust Transfer in Single-Cell Foundation Models

Ellie Haber*, Shahul Alam*, Nicholas Ho*, Renming Liu, Evan Trop, Shaoheng Liang, Muyu Yang, Spencer Krieger, Jian Ma.

bioRxiv, 2025.

AIDO.Tissue: Spatial Cell-Guided Pretraining for Scalable Spatial Transcriptomics Foundation Model

Jing Gong*, Yixuan Wang*, Nicholas Ho, Xingyi Cheng, Le Song, Eric Xing.

bioRxiv, 2025.

Foundation Models Improve Perturbation Response Prediction

Elijah Cole, Geert-Jan Huizing, Sohan Addagudi, Nicholas Ho, Euxhen Hasanaj, Merel Kuijs, Toby Johnstone, Maria Carilli, Alec Davi, Caleb Ellington, Christoph Feinauer, Pan Li, Romain Menegaux, Shahin Mohammadi, Yanjun Shao, Josiah Zhang, Emma Lundberg, Le Song, Ziv Bar-Joseph, Eric P. Xing.

bioRxiv, 2026.

Scaling Dense Representations for Single Cell with Transcriptome-Scale Context

Nicholas Ho, Caleb N. Ellington, Jinyu Hou, Sohan Addagudi, Shentong Mo, Tianhua Tao, Dian Li, Yonghao Zhuang, Hongyi Wang, Xingyi Cheng, Le Song, Eric P. Xing.

In NeurIPS Workshop on AI for New Drug Modalities, 2024.

Accurate and General DNA Representations Emerge from Genome Foundation Models at Scale

Caleb N. Ellington, Ning Sun, Nicholas Ho, Tianhua Tao, Sazan Mahbub, Dian Li, Yonghao Zhuang, Hongyi Wang, Le Song, Eric P. Xing.

In NeurIPS Workshop on AI for New Drug Modalities, 2024.

Learning Free Energy Pathways through Reinforcement Learning of Adaptive Steered Molecular Dynamics

Nicholas Ho, John Kevin Cava, John Vant, Ankita Shukla, Jacob Miratsky, Pavan Turaga, Ross Maciejewski, Abhishek Singharoy.

Machine Learning in Structural Biology (MLSB) Workshop at the 36th Conference on Neural Information Processing Systems (NeurIPS), 2022.

Towards Conditional Generation of Minimal Action Potential Pathways for Molecular Dynamics

John Kevin Cava, John Vant, Nicholas Ho, Ankita Shukla, Pavan Turaga, Ross Maciejewski, Abhishek Singharoy

ELLIS ML4Molecules Workshop, 2021

CyanoPATH: a knowledgebase of genome-scale functional repertoire for toxic cyanobacterial blooms

Wei Du, Gaoyang Li, Nicholas Ho, Landon Jenkins, Drew Hockaday, Jiankang Tan, Huansheng Cao

Briefings in Bioinformatics

Projects

HEIMDALL: Disentangling Tokenizer Design for Robust Transfer in Single-Cell Foundation Models
bioRxiv, 2025.
Foundation Models Improve Perturbation Response Prediction
bioRxiv, 2026.
Scaling Dense Representations for Single Cell with Transcriptome-Scale Context
In NeurIPS Workshop on AI for New Drug Modalities, 2024.
Accurate and General DNA Representations Emerge from Genome Foundation Models at Scale
In NeurIPS Workshop on AI for New Drug Modalities, 2024.
AIDO.Tissue: Spatial Cell-Guided Pretraining for Scalable Spatial Transcriptomics Foundation Model
bioRxiv, 2025.
Learning Free Energy Pathways through Reinforcement Learning of Adaptive Steered Molecular Dynamics
A first-author paper accepted to the NeurIPS Machine Learning in Structural Biology (MLSB) Workshop, 2022.
Towards Conditional Generation of Minimal Action Potential Pathways for Molecular Dynamics
A paper accepted to ELLIS ML4Molecules Workshop 2021.
Particle System Dynamics for Compromised Social Networks
This is the code and additional results for my APM541 final project on Particle System Dynamics for Compromised Social Networks.
An Interpretable Method of Learning Stochastic Game Dynamics from CMSAC
Developed a new physics-inspired framework for analyzing soccer ball dynamics by modeling underlying potential landscapes.
CyanoPATH - a knowledgebase of genome-scale functional repertoire for toxic cyanobacterial blooms
CyanoPATH is a database that curates and analyzes the common genomic functional repertoire for cyanobacteria harmful algal blooms (CyanoHABs) in eutrophic waters.
Bayesian Information Criterion from Scratch
The Bayesian Information Criterion, implemented from scratch in Python to predict change points from noisy data.
HonorCode Tutoring Website
A system that organizes service and provides tutoring for members. The website is at tutors.concordiashanghai.org.
Gibbs Sampling of a Gaussian Mixture Model from Scratch Using Python
Using the Gibbs Sampling Scheme and Metropolis-Within-Gibbs Sampling Scheme to learn parameters of Gaussian Mixture Models.
Concordia International IOT Environmental Sensors and Data Analysis
Soldered, built and programmed 30 microcontroller sensors that streamed data to an SQL database. Built a data analysis dashboard from scratch for school admins to view air quality dynamically.
Gaussian Process from Scratch with Python for simple interpolation
Just a quick and dirty implementation of a Gaussian process (GP) from scratch using Python.
Viterbi Algorithm for Hidden Markov Models
Coding up the Viterbi algorithm for decoding hidden Markov models from scratch using Python.
Interactive Assistant Winter
Project Winter is a proof of concept where several APIs are connected through Flask and VueJS and activated via parsed intents from DialogFlow.

Vitæ

  • Carnegie Mellon University August 2023 - Now
    PhD Student
    Advised by Prof. Jian Ma and Prof. Eric Xing.
  • Harvard Medical School June 2022 - May 2023
    Visiting Research Scholar
    • Accepted as a visiting research fellow at Harvard DBMI through a highly competitive application process.
    • Developed a novel machine learning architecture to deal with missing and imbalanced modalities.
  • Struct. Sys. Bio at Biodesign Institute at ASU Feb 2020 - May 2023
    Research Associate
    Worked as a Research Associate in the Singharoy Lab. Led research for my honors thesis, utilizing deep ML methods with statistical mechanical methodologies.

    • Led and wrote up a project on discovering free energy pathways with reinforcement learning
    • Developed a differentiable molecular simulator in JAX based on TorchMD
    • Helped develop novel conditional generative models for deriving free energy pathways
    • Implemented and tested several physics-informed models.
  • Carnegie Mellon University June 2021 - August 2021
    Visiting Research Scholar
    Accepted as a visiting research fellow in CMU Statistics through a highly competitive summer research program.
    • Created a novel methodology for using stochastic simulators for soccer game predictions. Our method achieved results comparable to models trained directly on scores.
  • Pichel Lab Biodesign Inst. May 2020 - May 2021
    Assistant Researcher under Ferran Garcia Pichel at Biodesign Institute
    Developed a Python plugin for QIIME 2 for analyzing the relationship between ribosomal gene copy number and size.
  • Cao Lab Biodesign Inst. August 2019 – May 2021
    Assistant Researcher
    • Conducted data engineering and analysis on genomes assembled from a metagenome in order to study the strain level variance within microbiome communities.
    • Implemented a web system that highlighted which genes are present in a pathway for particular cyanobacteria species using PFAM’s Hidden Markov Model package, JavaScript, SQL databases, shell and Python.
  • Western Tool & Supply June 2020 - July 2020
    Software Engineer, Intern
    • Implemented and trained an LSTM recurrent neural network to predict customer purchase likelihoods.
    • Integrated Bluetooth LE communication between microcontrollers and Google Chrome into their IoT system.
  • TGEN May 2020 - August 2020
    Virtual Helios Scholar
    Due to COVID-19, the Helios Scholars Program was moved to a virtual format.
  • Arizona State University August 2019 - May 2023
    B.Sc. Honors Student
    Double Major in Mathematics (4.0) and Computer Science (4.0)

Hobbies

I like to Yoyo!
2025 Yoyo Performance (Check this one out)
Thank you Martin Saveski for creating this really neat template!