Standard Workflow
Last Updated: January 27, 2025
The Standard Workflow performs comprehensive single-cell analysis on your dataset without reference data integration.
Overview
The Standard Workflow processes your data through a complete analysis pipeline:
- Quality Control - Filter low-quality cells and genes
- Normalization - Standardize gene expression values
- Feature Selection - Identify highly variable genes
- Dimensionality Reduction - Generate embeddings (PCA/scVI/TranscriptFormer)
- Visualization - Create UMAP coordinates for 2D plotting
- Classification - Predict cell types (optional, scVI/TranscriptFormer only)
- Export - Convert results to CellxGene Explorer format
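To make the Normalization and Feature Selection stages listed above more concrete, here is a minimal NumPy sketch. It assumes the parameters given under Computational Steps (10,000 counts per cell, followed by log1p) and substitutes a plain variance ranking for the Seurat v3 method, so it illustrates the idea rather than the workflow's actual implementation. The function names `normalize_log1p` and `top_variable_genes` are hypothetical.

```python
import numpy as np

def normalize_log1p(counts, target_sum=10_000):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty cells
    return np.log1p(counts / totals * target_sum)

def top_variable_genes(X, n_top=3000):
    """Return sorted column indices of the `n_top` highest-variance genes.

    Simplified stand-in: the workflow states it uses the Seurat v3 method,
    which is not reproduced here.
    """
    variances = X.var(axis=0)
    n_top = min(n_top, X.shape[1])
    order = np.argsort(variances)[::-1][:n_top]
    return np.sort(order)

# Toy data: 3 cells x 3 genes; the middle gene is never expressed.
counts = np.array([[10, 0, 90],
                   [50, 0, 50],
                   [90, 0, 10]])
normalized = normalize_log1p(counts)
selected = top_variable_genes(normalized, n_top=2)
```

Undoing the log transform with `np.expm1(normalized)` gives rows that each sum to 10,000, which is a quick way to check that the scaling behaved as intended.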
Computational Steps
1. Data Loading and Validation
Input: Your H5AD file
↓
- Validate file structure and metadata
- Check organism compatibility
- Verify gene identifier format
- Generate compatibility report
2. Quality Control (Automatic)
Raw Data
↓
- Filter cells: minimum 200 genes per cell
- Filter genes: minimum 3 cells per gene
- Remove cells with high mitochondrial content (more than 20% of counts from MT genes)
- Calculate quality metrics
Note: Quality control is applied automatically to raw count data only. Pre-normalized data skips this step.
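The thresholds above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the workflow's actual code: raw counts are detected with a simple "non-negative integers" heuristic, mitochondrial genes are recognized by an `MT-` name prefix, the 20% cutoff is read as a fraction of each cell's total counts, and cells are filtered before genes. The names `looks_like_raw_counts` and `qc_filter` are hypothetical.

```python
import numpy as np

def looks_like_raw_counts(X):
    """Heuristic (an assumption): raw counts are non-negative integers."""
    X = np.asarray(X)
    return bool(np.all(X >= 0) and np.all(np.mod(X, 1) == 0))

def qc_filter(X, gene_names, min_genes=200, min_cells=3, max_mt_frac=0.20):
    """Filter a dense cells x genes count matrix.

    Returns (filtered matrix, kept-cell mask, kept-gene mask).
    """
    X = np.asarray(X, dtype=float)
    # Assumption: mitochondrial genes carry an "MT-" prefix (human symbols).
    is_mt = np.array([g.upper().startswith("MT-") for g in gene_names])

    genes_per_cell = (X > 0).sum(axis=1)
    total = X.sum(axis=1)
    mt_frac = np.divide(X[:, is_mt].sum(axis=1), total,
                        out=np.zeros_like(total), where=total > 0)

    # Keep cells with enough detected genes and acceptable MT fraction.
    keep_cells = (genes_per_cell >= min_genes) & (mt_frac <= max_mt_frac)
    # Count gene detections among the surviving cells only.
    cells_per_gene = (X[keep_cells] > 0).sum(axis=0)
    keep_genes = cells_per_gene >= min_cells
    return X[keep_cells][:, keep_genes], keep_cells, keep_genes

# Toy example with lowered thresholds so the effect is visible.
genes = ["MT-CO1", "ACTB", "GAPDH", "CD3E"]
X = np.array([[0, 4, 3, 1],   # kept
              [0, 0, 0, 9],   # dropped: only 1 gene detected
              [1, 5, 4, 0],   # kept: 10% MT
              [6, 2, 2, 0]])  # dropped: 60% MT
if looks_like_raw_counts(X):
    filtered, keep_cells, keep_genes = qc_filter(X, genes, min_genes=2, min_cells=2)
```

With the documented defaults (`min_genes=200`, `min_cells=3`, `max_mt_frac=0.20`), the same function applies the thresholds listed in this step.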
3. Normalization and Preprocessing
Filtered Data
↓
- Detect data type (raw counts vs normalized)
- Normalize to 10,000 counts per cell (if raw)
- Log-transform (log1p)
- Store raw counts in layers['counts']
4. Feature Selection
Normalized Data
↓
- Identify highly variable genes (top 3,000)
- Apply batch-aware selection if batch column provided
- Use Seurat v3 method for feature selection
- Subset data to highly variable genes
Model Options
PCA (Principal Component Analysis)
Features
- Linear method: Captures major sources of variation
- Batch correction: Harmony integration for multi-batch datasets if batch column is provided
- Speed: Fastest processing time
- Memory: Lowest memory requirements
- Interpretability: Loadings show gene contributions
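As a rough illustration of the "linear method" and "loadings" points above, the sketch below computes PCA with a singular value decomposition in NumPy. It is a generic textbook formulation, not the workflow's implementation, and it leaves out the Harmony batch-correction step entirely. The function name `pca_svd` and the toy data are hypothetical.

```python
import numpy as np

def pca_svd(X, n_components=2):
    """PCA of a cells x genes matrix via SVD of the mean-centered data.

    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # cell embeddings
    loadings = Vt[:n_components].T                    # genes x components
    explained = (S ** 2) / (S ** 2).sum()
    return scores, loadings, explained[:n_components]

# Toy data: 50 cells x 6 genes, with gene 0 given much larger variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
X[:, 0] *= 10
scores, loadings, explained = pca_svd(X, n_components=2)

# The gene with the largest absolute loading on PC1 drives that component.
top_gene_pc1 = int(np.argmax(np.abs(loadings[:, 0])))
```

Each column of `loadings` holds one weight per gene, which is what makes the components interpretable: here the first component is dominated by the gene that was given the largest variance.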
scVI (Single-cell Variational Inference)
Features
- Generative model: Probabilistic framework for single-cell data
- Batch correction: Built-in handling of batch effects
- Census-trained: Leverages pre-trained models from reference data
- Classification compatible: Supports cell type prediction
TranscriptFormer
An LLM adapted for cross-species single-cell transcriptomics. Learn more about TranscriptFormer
Expected Results
Processing Time
- PCA: 2-5 minutes for typical datasets
- scVI: 2-5 minutes depending on dataset size
- TranscriptFormer: ~1 hour, requires GPU
Output Files
Your completed workflow provides:
Processed H5AD File
Disclaimer: Output H5AD files are not downloadable in the Alpha release.
- Original data: Preserved in appropriate layers
- Quality metrics: Cell and gene filtering statistics
- Embeddings: Model-specific embeddings in .obsm
- UMAP coordinates: 2D visualization coordinates
- Classifications: Cell type predictions (if enabled)
CellxGene Explorer Link
- Interactive visualization: Explore data in web browser
- Gene expression: View expression patterns across cells
- Embeddings: Navigate PCA/scVI/TranscriptFormer-derived UMAPs
- Cell metadata: Color by batch, predicted cell types, quality metrics, etc.
- Gene search: Find and highlight specific genes