Standard Workflow

Last Updated: January 27, 2025

The Standard Workflow performs comprehensive single-cell analysis on your dataset without reference data integration.

Overview

The Standard Workflow processes your data through a complete analysis pipeline:

Quality Control - Filter low-quality cells and genes
Normalization - Standardize gene expression values
Feature Selection - Identify highly variable genes
Dimensionality Reduction - Generate embeddings (PCA/scVI/TranscriptFormer)
Visualization - Create UMAP coordinates for 2D plotting
Classification - Predict cell types (optional, scVI/TranscriptFormer only)
Export - Convert results to CellxGene Explorer format

Computational Steps

1. Data Loading and Validation

Input: Your H5AD file
↓
- Validate file structure and metadata
- Check organism compatibility
- Verify gene identifier format
- Generate compatibility report
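
As an illustration of the gene identifier check, here is a minimal sketch that counts how many identifiers look like Ensembl gene IDs. The page does not say which identifier format the workflow expects, so the Ensembl pattern, the function name `summarize_gene_ids`, and the report fields are assumptions made for this example only.

```python
import re

# Assumption: Ensembl gene IDs look like ENSG00000141510 (human) or
# ENSMUSG00000059552 (mouse), optionally followed by a ".version" suffix.
ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def summarize_gene_ids(gene_ids):
    """Count how many identifiers match the assumed Ensembl pattern."""
    total = len(gene_ids)
    ensembl = sum(1 for g in gene_ids if ENSEMBL_GENE_ID.match(g))
    return {
        "total": total,
        "ensembl_like": ensembl,
        "other": total - ensembl,
        "fraction_ensembl": ensembl / total if total else 0.0,
    }


# Hypothetical mix of Ensembl IDs and gene symbols.
report = summarize_gene_ids(
    ["ENSG00000141510", "ENSMUSG00000059552.8", "TP53", "Actb"]
)
print(report)
```

A mixed result like this one (half Ensembl IDs, half symbols) is the kind of finding a compatibility report would likely flag.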

2. Quality Control (Automatic)

Raw Data
↓
- Filter cells: minimum 200 genes per cell
- Filter genes: minimum 3 cells per gene
- Remove high-mitochondrial cells (more than 20% of counts from mitochondrial genes)
- Calculate quality metrics

Note: Quality control is automatically applied to raw count data. Pre-normalized data skips this step.
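
The filtering rules above can be sketched in plain Python on a small cells-by-genes count matrix. This is an illustration, not the pipeline's actual code. It assumes the filters run in the order listed, that mitochondrial genes are recognized by an "MT-" prefix, and that the 20% threshold refers to the share of a cell's counts. In Scanpy, the first two filters correspond to `sc.pp.filter_cells` and `sc.pp.filter_genes`, though the page does not say which library is used.

```python
def quality_control(counts, gene_names, min_genes=200, min_cells=3,
                    max_mt_fraction=0.20):
    """Filter a cells x genes count matrix given as a list of rows."""
    # Step 1: keep cells that express at least `min_genes` genes.
    cells = [i for i, row in enumerate(counts)
             if sum(v > 0 for v in row) >= min_genes]

    # Step 2: keep genes detected in at least `min_cells` remaining cells.
    genes = [j for j in range(len(gene_names))
             if sum(counts[i][j] > 0 for i in cells) >= min_cells]

    # Step 3: drop cells whose mitochondrial share of counts is too high.
    # Assumption: mitochondrial genes carry an "MT-" prefix.
    mt = [j for j in genes if gene_names[j].upper().startswith("MT-")]
    kept = []
    for i in cells:
        total = sum(counts[i][j] for j in genes)
        mt_total = sum(counts[i][j] for j in mt)
        if total > 0 and mt_total / total <= max_mt_fraction:
            kept.append(i)

    matrix = [[counts[i][j] for j in genes] for i in kept]
    return matrix, kept, [gene_names[j] for j in genes]


# Toy data with lowered thresholds so the effect is visible on 4 cells.
gene_names = ["MT-CO1", "ACTB", "GAPDH", "CD3E"]
counts = [
    [1, 5, 4, 0],  # passes all filters
    [9, 1, 0, 0],  # 90% mitochondrial counts
    [0, 3, 0, 0],  # only one gene detected
    [0, 2, 6, 1],  # passes all filters
]
matrix, kept_cells, kept_genes = quality_control(
    counts, gene_names, min_genes=2, min_cells=2
)
print(kept_cells, kept_genes, matrix)
```

The default arguments mirror the thresholds listed above (200 genes, 3 cells, 20%). The toy call lowers the first two only because the example matrix is tiny.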

3. Normalization and Preprocessing

Filtered Data
↓
- Detect data type (raw counts vs normalized)
- Normalize to 10,000 counts per cell (if raw)
- Log-transform (log1p)
- Store raw counts in layers['counts']
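
The arithmetic behind these steps can be sketched as follows. The detection heuristic (all values are non-negative integers) is an assumption, since the page does not describe how raw counts are recognized. The sketch also assumes that input which is not raw has already been normalized and log-transformed, so it is returned unchanged. The `layers` dictionary stands in for the `layers` attribute of an AnnData object.

```python
import math


def looks_like_raw_counts(matrix):
    """Assumed heuristic: raw counts are non-negative whole numbers."""
    return all(v >= 0 and float(v).is_integer() for row in matrix for v in row)


def normalize_and_log(matrix, target_sum=10_000):
    """Scale each cell to `target_sum` total counts, then apply log1p."""
    layers = {}
    if not looks_like_raw_counts(matrix):
        # Assumption: pre-normalized input is left untouched.
        return matrix, layers

    # Keep an untouched copy of the raw counts.
    layers["counts"] = [list(row) for row in matrix]

    normalized = []
    for row in matrix:
        total = sum(row)
        factor = target_sum / total if total else 0.0
        normalized.append([math.log1p(v * factor) for v in row])
    return normalized, layers


# Two cells, two genes: each row is rescaled to 10,000 counts before log1p.
normalized, layers = normalize_and_log([[1, 3], [0, 5]])
print(normalized)
print(layers["counts"])
```

Undoing the log with `math.expm1` and summing a row gives back 10,000 (up to floating-point error), which is a quick way to confirm the scaling.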

4. Feature Selection

Normalized Data
↓
- Identify highly variable genes (top 3,000)
- Apply batch-aware selection if batch column provided
- Use Seurat v3 method for feature selection
- Subset data to highly variable genes
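
To give a feel for what "highly variable" means, the sketch below ranks genes by a simple dispersion score (variance divided by mean) and keeps the top `n_top`. This is deliberately not the Seurat v3 method, which fits a mean-variance trend and ranks genes by standardized variance. Batch-aware selection is also omitted. The function name and toy data are hypothetical.

```python
from statistics import mean, variance


def top_variable_genes(counts, gene_names, n_top=3000):
    """Rank genes by variance/mean across cells and return the top names."""
    scores = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in counts]
        mu = mean(column)
        # Genes with zero mean get a score of 0 to avoid dividing by zero.
        dispersion = variance(column) / mu if mu > 0 else 0.0
        scores.append((dispersion, name))

    # Highest dispersion first.
    scores.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scores[:n_top]]


# Four cells, three genes: GENE_A is constant, GENE_B varies the most.
counts = [
    [5, 0, 2],
    [5, 10, 2],
    [5, 0, 3],
    [5, 10, 3],
]
top = top_variable_genes(counts, ["GENE_A", "GENE_B", "GENE_C"], n_top=2)
print(top)
```

In Scanpy, the selection described above would typically be expressed through `sc.pp.highly_variable_genes` with `flavor="seurat_v3"`, `n_top_genes=3000`, and an optional `batch_key`, though the page does not confirm which library the workflow calls.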

Model Options

PCA (Principal Component Analysis)

Features

Linear method: Captures major sources of variation
Batch correction: Harmony integration for multi-batch datasets when a batch column is provided
Speed: Fastest processing time
Memory: Lowest memory requirements
Interpretability: Loadings show gene contributions

scVI (Single-cell Variational Inference)

Features

Generative model: Probabilistic framework for single-cell data
Batch correction: Built-in handling of batch effects
Census-trained: Leverages pre-trained models from reference data
Classification compatible: Supports cell type prediction

TranscriptFormer

An LLM adapted for cross-species single-cell transcriptomics.

Expected Results

Processing Time

PCA: 2-5 minutes for typical datasets
scVI: 2-5 minutes depending on dataset size
TranscriptFormer: approximately 1 hour; requires a GPU

Output Files

Your completed workflow provides:

Processed H5AD File

Disclaimer: Output H5AD files are not downloadable in the Alpha release.

Original data: Raw counts preserved in layers['counts']
Quality metrics: Cell and gene filtering statistics
Embeddings: Model-specific embeddings in .obsm
UMAP coordinates: 2D visualization coordinates
Classifications: Cell type predictions (if enabled)

CellxGene Explorer Link

Interactive visualization: Explore data in web browser
Gene expression: View expression patterns across cells
Embeddings: Navigate PCA/scVI/TranscriptFormer-derived UMAPs
Cell metadata: Color by batch, predicted cell types, quality metrics, etc.
Gene search: Find and highlight specific genes