Workflows

Last Updated: January 15, 2026

Overview

The AI Workspace offers multiple workflow types for analyzing single-cell RNA sequencing data. Each workflow serves different analytical needs and has specific requirements:

  • Standard Workflow: Analyze a single dataset independently
  • Co-Embed Workflow: Integrate multiple datasets into a unified embedding space
  • Comparison Workflow: Compare your data against the CELLxGENE Census reference atlas

Standard Workflow

Analyze your dataset independently without reference to external data.

  • Purpose: Single-dataset analysis and exploration
  • Data Requirements: H5AD files with ENSEMBL gene IDs
  • Available Models: PCA, scVI, TranscriptFormer
  • Organism Support: Varies by model (see Data Requirements)

Learn more about Standard Workflow →

Co-Embed Workflow

Integrate multiple datasets into a unified embedding space for comparative analysis.

  • Purpose: Combine and analyze multiple datasets together
  • Data Requirements: Two H5AD files with compatible gene IDs
  • Available Models: PCA, scVI, TranscriptFormer
  • Integration: Automatic batch correction via Harmony (PCA) or de novo training (scVI)
  • Classification: De novo label transfer from reference dataset to target dataset

Learn more about Co-Embed Workflow →

Comparison Workflow

Compare your data against the CELLxGENE Census reference atlas.

  • Purpose: Contextualize cells against comprehensive reference data
  • Data Requirements: H5AD files with human or mouse data only
  • Available Models: scVI, TranscriptFormer only
  • Gene Requirements: ENSEMBL IDs (ENSG or ENSMUSG prefixes)
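
As a pre-upload check, the gene ID requirement above can be sketched as a small filter. This is a minimal illustration, not the Workspace's own validation: the function name `check_gene_ids` and the exact pattern (prefix followed by digits, with an optional version suffix) are assumptions.

```python
import re

# Prefixes named in the Comparison Workflow requirements.
# Assumption: IDs are the prefix followed by digits; an optional version
# suffix such as ".12" is tolerated here but may not be by the Workspace.
_ENSEMBL_ID = re.compile(r"^(ENSG|ENSMUSG)\d+(\.\d+)?$")


def check_gene_ids(gene_ids):
    """Split gene IDs into (valid, invalid) by ENSG/ENSMUSG prefix."""
    valid, invalid = [], []
    for gene_id in gene_ids:
        (valid if _ENSEMBL_ID.match(gene_id) else invalid).append(gene_id)
    return valid, invalid


valid, invalid = check_gene_ids(
    ["ENSG00000141510", "ENSMUSG00000059552", "TP53"]
)
print(invalid)
```

Gene symbols such as `TP53` land in `invalid`, which makes it easy to spot a dataset that still needs its symbols mapped to ENSEMBL IDs.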

Learn more about Comparison Workflow →

Co-Embed Workflow

The Co-Embed workflow integrates multiple datasets into a unified embedding space, enabling comparative analysis across different data sources.

Co-Embed runs only when you explicitly select the "co-embed" workflow type.

Processing Steps

The workflow combines datasets through an outer-join operation that creates a dataset_source column to track the origin of each cell. The integration approach varies by model:

PCA with Co-Embed

  1. PCA Step: Performs principal component analysis on the combined dataset
    • Uses dataset_source as the batch column (automatically created from outer-join)
  2. Harmony Integration: Applies Harmony batch correction to integrate the datasets
    • Corrects for batch effects between different data sources
    • Uses dataset_source as the batch column
    • Outputs integrated embeddings in X_pca

scVI with Co-Embed

  • De Novo Training: When classification is also enabled, the workflow uses de novo scVI training instead of pre-trained inference
    • Trains a new model on the combined datasets
    • Uses dataset_source as the batch column
    • Runs for 400 epochs to ensure proper integration
    • Produces better integration quality for label transfer
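
The outer-join step that precedes both integration paths can be sketched in plain Python. This is an illustration under stated assumptions rather than the Workspace's implementation: the nested-dict input format, the `outer_join` helper, the `dataset_1` label, and the choice to fill missing genes with 0 are all hypothetical. Only the `dataset_source` column and the `dataset_2` label come from this page.

```python
def outer_join(datasets):
    """Combine {source_name: {cell_id: {gene: count}}} into one table.

    Returns (genes, rows). Each row holds the cell ID, its dataset_source,
    and one count per gene in the union of all genes.
    """
    # Outer join: keep every gene seen in any dataset.
    genes = sorted(
        {g for cells in datasets.values()
         for counts in cells.values() for g in counts}
    )
    rows = []
    for source, cells in datasets.items():
        for cell_id, counts in cells.items():
            rows.append({
                "cell_id": cell_id,
                "dataset_source": source,  # tracks the origin of each cell
                # Assumption: genes absent from a dataset are filled with 0.
                "counts": [counts.get(g, 0) for g in genes],
            })
    return genes, rows


datasets = {
    "dataset_1": {"cellA": {"ENSG01": 3, "ENSG02": 1}},
    "dataset_2": {"cellB": {"ENSG02": 5, "ENSG03": 2}},
}
genes, rows = outer_join(datasets)
```

The `dataset_source` value on each row is what a batch-aware step (Harmony or scVI training) would then use as its batch column.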

Use Cases

  • Compare cells from different experimental conditions
  • Integrate data from multiple time points
  • Combine datasets from different studies or labs
  • Transfer labels from a reference dataset to a query dataset

Cell Type Classification

Automated cell type annotation transfers cell type labels to your data using either pre-trained Census models or de novo label transfer from a reference dataset.

Classification Methods

The system uses two different approaches depending on your workflow configuration:

Standard Classification (Pre-trained Census Model)

Available for Standard and Comparison workflows when using scVI or TranscriptFormer models.

  • Method: Uses pre-trained models from the CELLxGENE Census
  • Requirements:
    • scVI or TranscriptFormer models only (PCA not supported)
    • Human or mouse data
  • Process:
    1. Generates embeddings using your selected model
    2. Applies pre-trained classifier trained on Census reference data
    3. Transfers cell type labels with confidence scores
  • Embedding Keys:
    • scVI: Uses scvi embeddings
    • TranscriptFormer: Uses transcriptformer embeddings

Co-Embed Classification (De Novo Label Transfer)

Available when combining multiple datasets in a Co-Embed workflow.

  • Method: Transfers labels from a reference dataset to a target dataset
  • Requirements:
    • Two uploaded files
    • Reference dataset must contain a cell_type column in its metadata
    • Works with PCA, scVI, and TranscriptFormer models
  • Process:
    1. Combines datasets via outer-join (creates dataset_source column)
    2. Generates integrated embeddings using your selected model
    3. Identifies reference dataset as dataset_2 (second file uploaded)
    4. Uses KNN classifier to transfer labels from reference to target cells
    5. Applies labels based on nearest neighbors in embedding space
  • Embedding Keys (automatically selected based on model):
    • PCA: Uses X_pca (Harmony-corrected embeddings)
    • scVI: Uses scvi embeddings
    • TranscriptFormer: Uses transcriptformer embeddings
  • Note: Currently requires cell_type column in reference data (future versions will support dynamic column selection)
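
Steps 4 and 5 above can be sketched as a small nearest-neighbor vote. This is a minimal sketch, not the Workspace's classifier: the `knn_transfer` helper, the default `k=3`, the Euclidean metric, and the definition of confidence as the share of agreeing neighbors are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_transfer(ref_embeddings, ref_labels, target_embeddings, k=3):
    """Return a (label, confidence) pair for each target embedding."""
    predictions = []
    for target in target_embeddings:
        # Rank reference cells by Euclidean distance to this target cell.
        ranked = sorted(
            zip(ref_embeddings, ref_labels),
            key=lambda pair: math.dist(pair[0], target),
        )
        neighbors = ranked[:k]
        votes = Counter(label for _, label in neighbors)
        label, count = votes.most_common(1)[0]
        # Assumption: confidence is the share of neighbors that agree.
        predictions.append((label, count / len(neighbors)))
    return predictions


# Toy 2-D embeddings; the reference cells carry cell_type labels.
ref_embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
ref_labels = ["T cell", "T cell", "T cell", "B cell", "B cell"]
target_embeddings = [(0.05, 0.05), (5.0, 5.1)]

predictions = knn_transfer(ref_embeddings, ref_labels, target_embeddings)
```

In practice the embeddings would come from the key listed for the selected model (`X_pca`, `scvi`, or `transcriptformer`), with reference rows taken from the cells whose `dataset_source` is `dataset_2`.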

Classification Output

Both methods produce:

  • Predicted cell type labels for each cell in your dataset
  • Confidence scores indicating prediction reliability

Co-Embed workflows additionally report integration quality metrics.

When to Use Each Method

  • Pre-trained Census: Best for single-dataset analysis when you want to leverage comprehensive reference annotations
  • De Novo Transfer: Best when you have a specific reference dataset with known cell types that you want to transfer to your query data
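
The rules in this section can be condensed into one lookup. The sketch below is illustrative only: the function name, the `"standard"` and `"comparison"` identifiers, and the returned strings are hypothetical, while the model, organism, and `cell_type` requirements are taken from the text above.

```python
CENSUS_MODELS = {"scVI", "TranscriptFormer"}
CENSUS_ORGANISMS = {"human", "mouse"}


def classification_method(workflow, model, organism=None, reference_columns=()):
    """Return the classification method for a configuration, or None."""
    if workflow == "co-embed":
        # De novo transfer works with every model, but the reference
        # dataset must carry a cell_type column.
        return "de_novo_transfer" if "cell_type" in reference_columns else None
    if workflow in {"standard", "comparison"}:
        # Pre-trained Census classification excludes PCA and is limited
        # to human or mouse data.
        if model in CENSUS_MODELS and organism in CENSUS_ORGANISMS:
            return "pretrained_census"
    return None


print(classification_method("standard", "scVI", organism="human"))
print(classification_method("co-embed", "PCA", reference_columns=["cell_type"]))
```

A `None` result means no classification method applies to that configuration, for example PCA in a Standard workflow.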

Workflow Selection

| Feature | Standard | Co-Embed | Comparison |
| --- | --- | --- | --- |
| Number of Datasets | 1 | 2 | 1 |
| Organism Support | Model-dependent | Model-dependent | Human/mouse only |
| Available Models | PCA, scVI, TranscriptFormer | PCA, scVI, TranscriptFormer | scVI, TranscriptFormer |
| Data Requirements | Raw or processed (model-dependent) | Raw or processed (model-dependent) | Raw integer counts only |
| Cell Type Classification | Pre-trained Census (scVI/TranscriptFormer, human/mouse) | De Novo Transfer (all models, requires reference with cell_type column) | Pre-trained Census (scVI/TranscriptFormer, human/mouse) |