Workflows
Last Updated: January 15, 2026
Overview
The AI Workspace offers multiple workflow types for analyzing single-cell RNA sequencing data. Each workflow serves different analytical needs and has specific requirements:
- Standard Workflow: Analyze a single dataset independently
- Co-Embed Workflow: Integrate multiple datasets into a unified embedding space
- Comparison Workflow: Compare your data against the CELLxGENE Census reference atlas
Standard Workflow
Analyze your dataset independently without reference to external data.
- Purpose: Single-dataset analysis and exploration
- Data Requirements: H5AD files with ENSEMBL gene IDs
- Available Models: PCA, scVI, TranscriptFormer
- Organism Support: Varies by model (see Data Requirements)
Learn more about Standard Workflow →
Co-Embed Workflow
Integrate multiple datasets into a unified embedding space for comparative analysis.
- Purpose: Combine and analyze multiple datasets together
- Data Requirements: Two H5AD files with compatible gene IDs
- Available Models: PCA, scVI, TranscriptFormer
- Integration: Automatic batch correction via Harmony (PCA) or de novo training (scVI)
- Classification: De novo label transfer from reference dataset to target dataset
Learn more about Co-Embed Workflow →
Comparison Workflow
Compare your data against the CELLxGENE Census reference atlas.
- Purpose: Contextualize cells against comprehensive reference data
- Data Requirements: H5AD files with human or mouse data only
- Available Models: scVI, TranscriptFormer only
- Gene Requirements: ENSEMBL IDs (ENSG or ENSMUSG prefixes)
Learn more about Comparison Workflow →
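The gene ID requirement for the Comparison workflow can be checked before upload. The following is a minimal sketch, not the Workspace's own validation. The function name infer_organism, the rule that every ID must share one prefix, and the tolerance for an optional ".N" version suffix are assumptions made for illustration. Only the ENSG and ENSMUSG prefixes come from the requirements listed above.

```python
import re

# Prefixes come from the Gene Requirements above; accepting an optional
# ".N" version suffix is an assumption about what inputs may look like.
_PATTERNS = {
    "human": re.compile(r"^ENSG\d+(\.\d+)?$"),
    "mouse": re.compile(r"^ENSMUSG\d+(\.\d+)?$"),
}


def infer_organism(gene_ids):
    """Return "human" or "mouse" if every ID matches one prefix, else None."""
    gene_ids = list(gene_ids)
    if not gene_ids:
        return None
    for organism, pattern in _PATTERNS.items():
        if all(pattern.match(g) for g in gene_ids):
            return organism
    return None  # mixed prefixes, gene symbols, or another organism
```

A None result suggests the file would not meet the Comparison workflow's gene requirements as described here, for example because it uses gene symbols instead of ENSEMBL IDs.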
Co-Embed Workflow
The Co-Embed workflow integrates multiple datasets into a unified embedding space, enabling comparative analysis across different data sources.
Co-Embed runs when you explicitly select the "co-embed" workflow type.
Processing Steps
The workflow combines datasets through an outer-join operation that creates a dataset_source column to track the origin of each cell. The integration approach varies by model:
PCA with Co-Embed
- PCA Step: Performs principal component analysis on the combined dataset
- Uses dataset_source as the batch column (automatically created from outer-join)
- Harmony Integration: Applies Harmony batch correction to integrate the datasets
- Corrects for batch effects between different data sources
- Uses dataset_source as the batch column
- Outputs integrated embeddings in X_pca
scVI with Co-Embed
- De Novo Training: When classification is also enabled, the workflow uses de novo scVI training instead of pre-trained inference
- Trains a new model on the combined datasets
- Uses dataset_source as the batch column
- Runs for 400 epochs to ensure proper integration
- Produces better integration quality for label transfer
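The outer-join that precedes both integration paths can be sketched in plain Python. This is an illustration under stated assumptions, not the Workspace's implementation: the function name outer_join, the "dataset_1" label, the placeholder gene names, and filling missing genes with 0 are all hypothetical. Only the outer-join behavior, the dataset_source column, and the dataset_2 label appear in this documentation.

```python
def outer_join(datasets):
    """Combine datasets on the union of their genes.

    `datasets` maps a source label to (gene_ids, rows), where each row is a
    list of counts aligned with gene_ids.
    Returns (genes, combined_rows, dataset_source).
    """
    # Union of genes, keeping first-seen order so the output is deterministic.
    genes, seen = [], set()
    for gene_ids, _ in datasets.values():
        for g in gene_ids:
            if g not in seen:
                seen.add(g)
                genes.append(g)

    combined, dataset_source = [], []
    for label, (gene_ids, rows) in datasets.items():
        for row in rows:
            values = dict(zip(gene_ids, row))
            # Assumption: genes absent from a dataset are filled with 0.
            combined.append([values.get(g, 0) for g in genes])
            # One entry per cell records which dataset it came from.
            dataset_source.append(label)
    return genes, combined, dataset_source


# Placeholder gene names and counts, for illustration only.
genes, X, source = outer_join({
    "dataset_1": (["geneA", "geneB"], [[1, 2], [3, 4]]),
    "dataset_2": (["geneB", "geneC"], [[5, 6]]),
})
```

The source list plays the role of the dataset_source column that PCA with Harmony and scVI use as the batch key. An AnnData-based pipeline might perform the same step with anndata.concat using join="outer" and label="dataset_source"; whether the Workspace does so is not stated here.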
Use Cases
- Compare cells from different experimental conditions
- Integrate data from multiple time points
- Combine datasets from different studies or labs
- Transfer labels from a reference dataset to a query dataset
Cell Type Classification
Automated cell type annotation transfers cell type labels to your data using either pre-trained Census models or de novo label transfer from a reference dataset.
Classification Methods
The system uses two different approaches depending on your workflow configuration:
Standard Classification (Pre-trained Census Model)
Available for Standard and Comparison workflows when using scVI or TranscriptFormer models.
- Method: Uses pre-trained models from the CELLxGENE Census
- Requirements:
- scVI or TranscriptFormer models only (PCA not supported)
- Human or mouse data
- Process:
- Generates embeddings using your selected model
- Applies pre-trained classifier trained on Census reference data
- Transfers cell type labels with confidence scores
- Embedding Keys:
- scVI: Uses scvi embeddings
- TranscriptFormer: Uses transcriptformer embeddings
Co-Embed Classification (De Novo Label Transfer)
Available when combining multiple datasets in a Co-Embed workflow.
- Method: Transfers labels from a reference dataset to a target dataset
- Requirements:
- Two uploaded files
- Reference dataset must contain a cell_type column in its metadata
- Works with PCA, scVI, and TranscriptFormer models
- Process:
- Combines datasets via outer-join (creates dataset_source column)
- Generates integrated embeddings using your selected model
- Identifies reference dataset as dataset_2 (second file uploaded)
- Uses KNN classifier to transfer labels from reference to target cells
- Applies labels based on nearest neighbors in embedding space
- Embedding Keys (automatically selected based on model):
- PCA: Uses X_pca (Harmony-corrected embeddings)
- scVI: Uses scvi embeddings
- TranscriptFormer: Uses transcriptformer embeddings
- Note: Currently requires cell_type column in reference data (future versions will support dynamic column selection)
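The KNN label transfer step can be illustrated with a small, dependency-free sketch. It is not the Workspace's classifier: the function name knn_transfer, the default k=3, the Euclidean metric, and the definition of confidence as the fraction of neighbors agreeing with the winning label are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_transfer(ref_embeddings, ref_labels, target_embeddings, k=3):
    """Give each target cell the majority label of its k nearest reference cells.

    Returns a list of (label, confidence) pairs, where confidence is the
    fraction of the k neighbors that voted for the winning label.
    """
    if not ref_embeddings or len(ref_embeddings) != len(ref_labels):
        raise ValueError("need at least one reference cell and one label per cell")
    k = min(k, len(ref_embeddings))
    results = []
    for cell in target_embeddings:
        # Rank reference cells by Euclidean distance (an assumed metric).
        nearest = sorted(
            range(len(ref_embeddings)),
            key=lambda i: math.dist(cell, ref_embeddings[i]),
        )[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        label, count = votes.most_common(1)[0]
        results.append((label, count / k))
    return results
```

In the Co-Embed workflow the reference cells would be those whose dataset_source is dataset_2, with labels read from the cell_type column, and the embeddings would come from the key listed above for the selected model.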
Classification Output
Both methods produce:
- Predicted cell type labels for each cell in your dataset
- Confidence scores indicating prediction reliability
- Integration quality metrics (for co-embed workflows)
When to Use Each Method
- Pre-trained Census: Best for single-dataset analysis when you want to leverage comprehensive reference annotations
- De Novo Transfer: Best when you have a specific reference dataset with known cell types that you want to transfer to your query data
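The choice between the two methods, together with the embedding key each one reads, can be summarized as a lookup. This is a hedged sketch: the function name classification_plan, the lowercase workflow and organism strings, and the returned method names are hypothetical. The rules themselves are taken from the requirements listed in this section.

```python
# Embedding keys as listed in this section.
EMBEDDING_KEYS = {"PCA": "X_pca", "scVI": "scvi", "TranscriptFormer": "transcriptformer"}


def classification_plan(workflow, model, organism=None, reference_columns=()):
    """Return (method, embedding_key), or (None, None) if classification is unavailable."""
    key = EMBEDDING_KEYS.get(model)
    if key is None:
        return None, None
    if workflow == "co-embed":
        # De novo transfer needs a cell_type column in the reference metadata.
        if "cell_type" in reference_columns:
            return "de novo transfer", key
        return None, None
    # Standard and Comparison: pre-trained Census classifier, no PCA,
    # human or mouse data only.
    if model in ("scVI", "TranscriptFormer") and organism in ("human", "mouse"):
        return "pre-trained Census", key
    return None, None
```

For example, a Standard workflow with PCA returns (None, None), which matches the note that PCA is not supported by the pre-trained classifier.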
Workflow Selection
| Feature | Standard | Co-Embed | Comparison |
|---|---|---|---|
| Number of Datasets | 1 | 2 | 1 |
| Organism Support | Model-dependent | Model-dependent | Human/mouse only |
| Available Models | PCA, scVI, TranscriptFormer | PCA, scVI, TranscriptFormer | scVI, TranscriptFormer |
| Data Requirements | Raw or processed (model-dependent) | Raw or processed (model-dependent) | Raw integer counts only |
| Cell Type Classification | Pre-trained Census (scVI/TranscriptFormer, human/mouse) | De Novo Transfer (all models, requires reference with cell_type column) | Pre-trained Census (scVI/TranscriptFormer, human/mouse) |
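The dataset-count and model rows of the table can be encoded as a small pre-flight check. This is a sketch only: the function name check_request and the "standard" and "comparison" keys are hypothetical, and the Workspace may validate requests differently. The "co-embed" key matches the workflow type named earlier on this page.

```python
# Dataset counts and model availability, copied from the table above.
WORKFLOWS = {
    "standard": {"n_datasets": 1, "models": {"PCA", "scVI", "TranscriptFormer"}},
    "co-embed": {"n_datasets": 2, "models": {"PCA", "scVI", "TranscriptFormer"}},
    "comparison": {"n_datasets": 1, "models": {"scVI", "TranscriptFormer"}},
}


def check_request(workflow, model, n_files, organism=None):
    """Return a list of problems; an empty list means the request fits the table."""
    spec = WORKFLOWS.get(workflow)
    if spec is None:
        return [f"unknown workflow: {workflow!r}"]
    problems = []
    if model not in spec["models"]:
        problems.append(f"{model} is not available for the {workflow} workflow")
    if n_files != spec["n_datasets"]:
        problems.append(f"{workflow} expects {spec['n_datasets']} dataset(s), got {n_files}")
    # The Comparison workflow is limited to human or mouse data.
    if workflow == "comparison" and organism not in {"human", "mouse"}:
        problems.append("comparison supports human or mouse data only")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every mismatch at once, for example both a wrong model and a wrong file count.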