Hosted by
Arc Institute

Virtual Cell Challenge

Join us in shaping the future of cellular biology through predictive modeling.

Join the challenge
Overview

Solve One of Biology’s Biggest Challenges with AI

Understanding, predicting, and ultimately programming how cells respond to internal cues and external stimuli is a fundamental challenge in biology.


Advances in single-cell technologies now enable large-scale measurements of cellular responses to genetic and chemical perturbations, fueling this exciting era of predictive cellular modeling. The Virtual Cell Challenge is a recurring, open, community-driven competition aimed at evaluating and improving computational models that predict cellular responses to genetic or chemical perturbations.

In 2025, the challenge will focus on context generalization: participants must predict the effects of perturbations in a held-out cell type—the H1 human embryonic stem cell line.

The top three models will win cash prizes of $100,000, $50,000, and $25,000.


Read more about the Challenge in Cell ↗

Why Participate?

World-class Benchmark Datasets
Access high-quality single-cell perturbation datasets specifically designed for model training and evaluation.
Significant Scientific Impact
Enable exploration of cellular behavior, accelerate drug discovery and develop new strategies for disease modeling.
Recognition and Awards
Showcase your research and compete for prestigious recognition and substantial cash prizes.
Global Community
Collaborate and compete with leading scientists, computational biologists, and AI researchers from academia, biotech, and industry.

Getting Started

01
Register Now
Sign up to access datasets and challenge details.
02
Prepare Your Model
Use provided datasets to build your predictive models.
03
Submit and Compete
Benchmark your model’s performance on our live leaderboard.

Challenge Timeline

The Virtual Cell Challenge will be an annual challenge, with new data added each year to help improve model performance. The inaugural challenge will run from June to November 2025.

June 26, 2025
Challenge announced; registration opens, datasets become available, and leaderboard submissions begin
October 27, 2025
Final test set released
Nov 3, 2025
Final submission deadline
Dec 2, 2025
Winners announced

About the Data


Arc Institute has developed a new deeply sequenced, high-cell-coverage, large-scale dataset of single-cell perturbations. We are making these data available through the Virtual Cell Challenge.

Learn more about the datasets

Arc’s Perturbation Data

Training data
High-quality perturbation and response data for genetic perturbations targeting 150 specified genes to help train your models.
Join to access
Validation set
50 genes to test your predictions and monitor your progress in real time on our live leaderboard.
Join to access
Final test set
100 genes used to compute your final score at the end of the challenge.
Available in October

Public Training Data

Arc Institute
scBaseCount
A continuously updated single-cell RNA-seq database, scBaseCount comprises over 230 million cells (and expanding), spanning 21 organisms and 72 tissues.
Tahoe
Tahoe-100
The world’s largest single-cell dataset generated and open-sourced by Tahoe, containing 100M cells from ~60,000 drug perturbation experiments.
Public Perturb-seq Datasets
Publicly available external resources, including gene expression datasets and pre-trained models recommended by us.

Evaluation

While the challenge is open, predictions submitted to the Virtual Cell Challenge will be compared against known results from the validation dataset to calculate an accuracy score. Participants’ scores will appear on the live leaderboard in real time as they are submitted.

We have designed three metrics to evaluate model performance—

  • Differential expression score to measure differential gene expression prediction accuracy
  • Perturbation discrimination score to rank predictions according to their similarity to the true perturbational effect
  • Mean Absolute Error to capture overall predictive accuracy across the entire gene expression profile

We use a combined score (Overall score) that appropriately weights each component and enforces minimum thresholds on all metrics to promote balanced performance.
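The three metrics and their combination can be sketched in code. This is an illustrative approximation only, not the challenge's official scoring implementation: the function names, the top-k formulation of the differential expression score, the rank-based discrimination score, the MAE inversion, and the equal weighting are all assumptions made for the sketch.

```python
# Illustrative sketch of the three evaluation metrics (assumed formulas,
# not the official challenge implementation).

def mean_absolute_error(pred, true):
    """Average absolute difference across the gene expression profile."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def differential_expression_score(pred_delta, true_delta, k=2):
    """Fraction of the top-k truly differentially expressed genes (by
    absolute expression change) that the model also ranks in its top k."""
    top = lambda v: set(sorted(range(len(v)), key=lambda i: -abs(v[i]))[:k])
    return len(top(pred_delta) & top(true_delta)) / k

def perturbation_discrimination(pred, true_effects, target_idx):
    """Rank the true perturbation among all candidate perturbation effects
    by distance to the prediction; 1.0 means the prediction is closest to
    its own perturbation, 0.0 means it is closest to every other one."""
    dists = [mean_absolute_error(pred, t) for t in true_effects]
    rank = sorted(dists).index(dists[target_idx]) + 1
    return 1.0 - (rank - 1) / (len(true_effects) - 1)

def overall_score(de, pdisc, mae, mae_ceiling=1.0, thresholds=(0.0, 0.0, 0.0)):
    """Equal-weight combination (weights assumed); MAE is inverted so that
    higher is better, and any component below its minimum threshold zeroes
    the score, mirroring the challenge's balanced-performance requirement."""
    parts = (de, pdisc, max(0.0, 1.0 - mae / mae_ceiling))
    if any(p < t for p, t in zip(parts, thresholds)):
        return 0.0
    return sum(parts) / len(parts)
```

A model that nails the differentially expressed genes but produces wildly scaled expression values would score well on the first metric and poorly on the third, which is exactly the imbalance the thresholds in the combined score are meant to penalize.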

In October, we will release the final test set for final submissions to the challenge. Scores against the final test set will not be revealed publicly until winners are announced in December.

Scoring framework

Prizes

The top three models submitted to the challenge will win cash prizes. Winners will be announced in December 2025 after final submissions have been evaluated.

Winning submissions will be determined by prediction accuracy on the final test set. The Virtual Cell Challenge will privately identify the top five submissions at the end of the challenge.

The top three scores will win prizes as follows —

1st
$100,000
2nd
$50,000
3rd
$25,000
Prize money is generously sponsored by
10x
Nvidia
Arc Institute