Hosted by

Arc Institute

Sponsored by

Nvidia, 10x Genomics, Ultima Genomics

Virtual Cell Challenge

Join us in shaping the future of cellular biology through predictive modeling.

Join the challenge
Overview

Solve One of Biology’s Biggest Challenges with AI

Understanding, predicting, and ultimately programming how cells respond to internal cues and external stimuli is a fundamental challenge in biology.


Advances in single-cell RNA-seq technologies now enable large-scale measurements of cellular responses to genetic and chemical perturbations, fueling this exciting era of predictive cellular modeling. The Virtual Cell Challenge is a recurring, open, community-driven challenge aimed at evaluating and improving computational models that predict cellular responses to genetic or chemical perturbations.

In 2025, the challenge will focus on context generalization: participants must predict the effects of perturbations in a held-out cell type, the H1 human embryonic stem cell line. Using new experimental data we have generated for the Challenge, you will build a model that predicts these effects and submit the results to the Challenge leaderboard.

The top three models will win prizes valued at $100,000, $50,000, and $25,000.

Join us in shaping the future of cellular biology through predictive modeling.

Read about the Challenge in Cell ↗

Why Participate?

World-class Benchmark Datasets
Access high-quality single-cell perturbation datasets specifically designed for model training and evaluation.
Significant Scientific Impact
Enable exploration of cellular behavior, accelerate drug discovery, and develop new strategies for disease modeling.
Recognition and Awards
Showcase your research and compete for prestigious recognition and substantial cash and technology prizes.
Global Community
Collaborate and compete with leading scientists, computational biologists, and AI researchers from academia, biotech, and industry.

Getting Started

01. Register Now
Sign up to access datasets and Challenge details.
02. Prepare Your Model
Use provided datasets to build your predictive models.
03. Submit and Compete
Benchmark your model’s performance on our live leaderboard.

Challenge Timeline

The Virtual Cell Challenge will be held annually, with new data added each year to help improve model performance. The inaugural challenge runs from June to November 2025.

June 26, 2025
Challenge announced; registration opens, datasets become available, and leaderboard submissions begin
October 27, 2025
Final test set released
November 3, 2025
Final submission deadline
Early December 2025
Winners announced

About the Data


Arc Institute has developed a dedicated, deeply sequenced, high-cell-coverage dataset of single-cell genetic perturbations for the Virtual Cell Challenge.

Learn more about the datasets

Arc’s Perturbation Data

Training data
High-quality perturbation and response data for genetic perturbations targeting 150 specified genes to help train your models.
Validation set
50 CRISPRi gene perturbations to test your predictions and monitor your progress in real time on our live leaderboard.
Final test set
100 CRISPRi gene perturbations against which final submissions will be scored at the end of the Challenge. Available in October.
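As a concrete starting point, here is a minimal sketch of loading data of this kind in Python, assuming the training data ships as an AnnData .h5ad file with a per-cell perturbation label. The file name, the target_gene column, and the non-targeting control label are hypothetical placeholders, not the Challenge's confirmed schema.

# Minimal sketch: loading Challenge-style single-cell perturbation data.
# Assumes an AnnData .h5ad file with a per-cell "target_gene" label; the
# file name, column name, and control label are hypothetical placeholders.
import anndata as ad

adata = ad.read_h5ad("vcc_training.h5ad")  # cells x genes expression matrix

# Separate perturbed cells from non-targeting controls.
is_control = adata.obs["target_gene"] == "non-targeting"
controls = adata[is_control]
perturbed = adata[~is_control]

print(f"{perturbed.obs['target_gene'].nunique()} distinct perturbations, "
      f"{controls.n_obs} control cells, {adata.n_vars} genes")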

Public Training Data

Arc Institute
scBaseCount
A continuously updated single-cell RNA-seq database, scBaseCount comprises data from over 300 million cells spanning 26 organisms and 72 tissues.
Tahoe
Tahoe-100M
The world’s largest single-cell dataset, generated and open-sourced by Tahoe, containing data from 100 million cells across ~60,000 drug perturbation experiments in 50 cell lines.

Evaluation

During the Challenge, submitted predictions will be compared to known results from the validation dataset to calculate a performance score based on three metrics. Participants’ scores will appear on the live leaderboard in real time as they are submitted.

We have designed three metrics to evaluate model performance—

  • Differential expression score to measure differential gene expression prediction accuracy.
  • Perturbation discrimination score to rank predicted perturbations according to their similarity to the true perturbational effect.
  • Mean absolute error (MAE) to capture overall predictive accuracy across the entire gene expression profile.

We use a combined score (overall score) that weights each component relative to a baseline: the cell-mean model of the training dataset.
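For intuition, here is a rough sketch of how metrics like these could be computed from predicted and observed expression profiles. This page does not give the Challenge's exact formulas, so the specific choices below (a Spearman-correlation DE score, a rank-based discrimination score, and a cell-mean baseline) are illustrative assumptions only.

# Illustrative sketch only; the Challenge's precise metric definitions
# may differ from the assumptions made here.
import numpy as np
from scipy.stats import spearmanr

def mae(pred, true):
    """Mean absolute error across the entire gene expression profile."""
    return float(np.mean(np.abs(pred - true)))

def de_score(pred_delta, true_delta):
    """Differential expression score: agreement between predicted and
    observed expression changes relative to control (Spearman here)."""
    rho, _ = spearmanr(pred_delta, true_delta)
    return float(rho)

def discrimination(pred, true_profiles, true_idx):
    """Perturbation discrimination: rank of the matching true perturbation
    among all perturbations by distance to the prediction (1.0 = best)."""
    dists = np.linalg.norm(true_profiles - pred, axis=1)
    rank = int(np.where(np.argsort(dists) == true_idx)[0][0])  # 0 = closest
    return 1.0 - rank / (len(true_profiles) - 1)

def cell_mean_baseline(train_expr):
    """Baseline prediction: the mean expression profile of training cells."""
    return train_expr.mean(axis=0)

An overall score can then be formed by weighting each metric after normalizing it against the same metric computed for the cell-mean baseline.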

In October, we will release the final test set for final submissions to the Challenge. Scores against the final test set will not be revealed publicly until winners are announced.

Scoring framework

Prizes

The top three models submitted to the Challenge will win a combination of cash and cloud GPU credits. Winners will be determined by a score based on the evaluation criteria described above. Final results will be announced in December 2025, after final submissions have been scored.

The top three scores will win prizes as follows, each divided equally between cash and cloud credits—

1st: $100,000
2nd: $50,000
3rd: $25,000
Prizes are generously sponsored by Nvidia, 10x Genomics, and Ultima Genomics.