
Standard Workflow

Last Updated: January 27, 2025

The Standard Workflow performs comprehensive single-cell analysis on your dataset without reference data integration.

Overview

The Standard Workflow processes your data through a complete analysis pipeline:

  1. Quality Control - Filter low-quality cells and genes
  2. Normalization - Standardize gene expression values
  3. Feature Selection - Identify highly variable genes
  4. Dimensionality Reduction - Generate embeddings (PCA/scVI/TranscriptFormer)
  5. Visualization - Create UMAP coordinates for 2D plotting
  6. Classification - Predict cell types (optional, scVI/TranscriptFormer only)
  7. Export - Convert results to CellxGene Explorer format

Computational Steps

1. Data Loading and Validation

Input: Your H5AD file
- Validate file structure and metadata
- Check organism compatibility
- Verify gene identifier format
- Generate compatibility report
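The identifier check can be illustrated with a small heuristic. This is a hypothetical sketch, not the platform's validator. It assumes that Ensembl gene IDs (ENSG... for human, ENSMUSG... for mouse, optionally versioned) are the preferred format and that the identifiers arrive as a list of strings. In an H5AD file these would typically be `adata.var_names`. The 90% threshold is also an assumption.

```python
import re
from collections import Counter

# Hypothetical patterns: Ensembl gene IDs for human (ENSG) and mouse (ENSMUSG),
# each followed by 11 digits and an optional version suffix such as ".15".
ENSEMBL_PATTERNS = {
    "human": re.compile(r"^ENSG\d{11}(\.\d+)?$"),
    "mouse": re.compile(r"^ENSMUSG\d{11}(\.\d+)?$"),
}


def detect_id_format(gene_ids, min_fraction=0.9):
    """Classify gene identifiers as Ensembl (per organism) or symbols/mixed.

    Returns the inferred format and the fraction of IDs consistent with it.
    min_fraction is an assumed cutoff, not a documented platform setting.
    """
    total = len(gene_ids)
    if total == 0:
        return {"format": "empty", "fraction": 0.0}

    counts = Counter()
    for gid in gene_ids:
        for organism, pattern in ENSEMBL_PATTERNS.items():
            if pattern.match(gid):
                counts[organism] += 1
                break

    if counts:
        organism, n = counts.most_common(1)[0]
        if n / total >= min_fraction:
            return {"format": f"ensembl_{organism}", "fraction": n / total}

    # Otherwise treat the identifiers as gene symbols (e.g. "TP53") or a mix.
    return {"format": "symbol_or_mixed", "fraction": 1 - sum(counts.values()) / total}
```

A report built on this might flag `symbol_or_mixed` inputs for conversion before analysis.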

2. Quality Control (Automatic)

Input: Raw count data
- Filter cells: keep cells with at least 200 detected genes
- Filter genes: keep genes detected in at least 3 cells
- Remove cells with high mitochondrial content (more than 20% of counts from mitochondrial (MT) genes)
- Calculate quality metrics

Note: Quality control is automatically applied to raw count data. Pre-normalized data skips this step.
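The three QC thresholds can be expressed as a dependency-free sketch on a small dense count matrix (cells × genes, stored as lists of lists). This sketch rests on several assumptions. Mitochondrial genes are identified by an `MT-` name prefix, which is the human convention, matched case-insensitively so mouse `mt-` genes also match. The MT percentage is computed from counts. Cell filters are applied before the gene filter. The platform's exact rules are not documented here. In scanpy, the comparable calls are `sc.pp.calculate_qc_metrics`, `sc.pp.filter_cells(adata, min_genes=200)`, and `sc.pp.filter_genes(adata, min_cells=3)`.

```python
def quality_control(counts, gene_names, min_genes=200, min_cells=3,
                    max_pct_mt=20.0, mt_prefix="MT-"):
    """Apply the documented QC thresholds to a dense cells x genes count matrix.

    Returns (filtered_counts, kept_gene_names, metrics); metrics holds per-cell
    QC statistics for every input cell. The MT prefix rule is an assumption.
    """
    prefix = mt_prefix.upper()
    mt_mask = [name.upper().startswith(prefix) for name in gene_names]

    # Per-cell metrics: detected genes, total counts, percent of counts from MT genes.
    metrics = []
    for row in counts:
        total = sum(row)
        mt_total = sum(v for v, is_mt in zip(row, mt_mask) if is_mt)
        metrics.append({
            "n_genes": sum(1 for v in row if v > 0),
            "total_counts": total,
            "pct_counts_mt": 100.0 * mt_total / total if total else 0.0,
        })

    # Cell filter: enough detected genes and no more than max_pct_mt MT counts.
    kept_cells = [
        row for row, m in zip(counts, metrics)
        if m["n_genes"] >= min_genes and m["pct_counts_mt"] <= max_pct_mt
    ]

    # Gene filter: keep genes detected in at least min_cells of the kept cells.
    keep_gene = [
        sum(1 for row in kept_cells if row[j] > 0) >= min_cells
        for j in range(len(gene_names))
    ]
    filtered = [[v for v, k in zip(row, keep_gene) if k] for row in kept_cells]
    kept_names = [g for g, k in zip(gene_names, keep_gene) if k]
    return filtered, kept_names, metrics
```

On real data, the defaults match the documented thresholds. The tiny example below would need smaller values, for example `min_genes=2`.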

3. Normalization and Preprocessing

Input: Filtered data
- Detect data type (raw counts vs normalized)
- Normalize to 10,000 counts per cell (if raw)
- Log-transform (log1p)
- Store raw counts in layers['counts']
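A minimal sketch of this step on a dense cells × genes matrix follows. The "raw counts" test is an assumed heuristic (all values are non-negative whole numbers), and this sketch assumes pre-normalized input skips both scaling and log1p. The platform's actual detection rule is not documented. For raw input, the sketch keeps a copy of the counts as the `counts` layer, scales each cell to 10,000 total counts, and applies log1p. This is the same transformation as `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)` in scanpy.

```python
import copy
import math


def looks_like_raw_counts(matrix):
    """Assumed heuristic: raw counts are non-negative whole numbers."""
    return all(v >= 0 and float(v).is_integer() for row in matrix for v in row)


def normalize_log1p(matrix, target_sum=10_000):
    """Return (X, layers); X is library-size normalized and log1p-transformed if raw."""
    layers = {}
    if not looks_like_raw_counts(matrix):
        # Already normalized: values are returned unchanged (assumed behaviour).
        return [list(row) for row in matrix], layers

    layers["counts"] = copy.deepcopy(matrix)  # preserve the raw counts
    X = []
    for row in matrix:
        total = sum(row)
        scale = target_sum / total if total else 0.0  # empty cells stay at zero
        X.append([math.log1p(v * scale) for v in row])
    return X, layers
```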

4. Feature Selection

Input: Normalized data
- Identify highly variable genes (top 3,000)
- Apply batch-aware selection if a batch column is provided
- Use the Seurat v3 method for feature selection
- Subset data to highly variable genes
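The Seurat v3 method ranks genes by the variance of their standardized counts. Each gene's expected variance comes from a fit of log variance against log mean across all genes. Standardized values are clipped at √N, where N is the number of cells. In scanpy this is `sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)`, and that flavor expects count data (e.g. `layer="counts"`). The sketch below follows the same recipe on raw counts but swaps Seurat's loess fit for an ordinary least-squares line to stay dependency-free. It also omits the batch-aware ranking. Treat it as an illustration of the idea, not a reproduction of the platform's selection.

```python
import math


def _linear_fit(xs, ys):
    """Least-squares line y = a + b*x (simplified stand-in for Seurat's loess)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def seurat_v3_like_hvg(counts, n_top_genes=3000):
    """Rank genes by clipped standardized variance; returns (top_indices, scores)."""
    n_cells, n_genes = len(counts), len(counts[0])
    cols = [[row[j] for row in counts] for j in range(n_genes)]
    means = [sum(c) / n_cells for c in cols]
    variances = [
        sum((v - m) ** 2 for v in c) / (n_cells - 1) for c, m in zip(cols, means)
    ]

    # Fit log10(variance) against log10(mean) over non-constant genes.
    varying = [j for j in range(n_genes) if variances[j] > 0]
    a, b = _linear_fit(
        [math.log10(means[j]) for j in varying],
        [math.log10(variances[j]) for j in varying],
    )

    scores = [0.0] * n_genes  # constant genes keep a score of 0
    for j in varying:
        expected_sd = math.sqrt(10 ** (a + b * math.log10(means[j])))
        clip = means[j] + expected_sd * math.sqrt(n_cells)  # caps z at sqrt(N)
        z = [(min(v, clip) - means[j]) / expected_sd for v in cols[j]]
        scores[j] = sum(zi * zi for zi in z) / (n_cells - 1)

    ranked = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return ranked[:n_top_genes], scores
```

The subsetting step then keeps only the columns whose indices are returned.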

Model Options

PCA (Principal Component Analysis)

Features

  • Linear method: Captures major sources of variation
  • Batch correction: Harmony integration for multi-batch datasets when a batch column is provided
  • Speed: Fastest processing time
  • Memory: Lowest memory requirements
  • Interpretability: Loadings show gene contributions
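To make the point about loadings concrete, here is a small dependency-free sketch. It centers a cells × genes matrix and finds the first principal component by power iteration on the gene-by-gene covariance matrix. The loading vector holds one weight per gene, so the largest absolute weights identify the genes that drive that component. This is an illustration only. Production tools such as scanpy's `sc.tl.pca` use optimized SVD solvers, and Harmony correction is not shown.

```python
import math


def first_pc(matrix, n_iter=200):
    """Return (loadings, scores) for the first principal component."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_cells for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in matrix]

    # Gene-by-gene sample covariance matrix.
    cov = [
        [sum(r[i] * r[k] for r in centered) / (n_cells - 1) for k in range(n_genes)]
        for i in range(n_genes)
    ]

    # Power iteration converges to the eigenvector with the largest eigenvalue
    # (assuming the start vector is not orthogonal to it).
    v = [1.0 / math.sqrt(n_genes)] * n_genes
    for _ in range(n_iter):
        w = [sum(cov[i][k] * v[k] for k in range(n_genes)) for i in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]

    # Project each centered cell onto the component to get its PC1 score.
    scores = [sum(r[j] * v[j] for j in range(n_genes)) for r in centered]
    return v, scores


def top_loading_genes(loadings, gene_names, k=2):
    """Genes with the largest absolute loading on the component."""
    order = sorted(range(len(loadings)), key=lambda j: abs(loadings[j]), reverse=True)
    return [gene_names[j] for j in order[:k]]
```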

scVI (Single-cell Variational Inference)

Features

  • Generative model: Probabilistic framework for single-cell data
  • Batch correction: Built-in handling of batch effects
  • Census-trained: Uses models pre-trained on CellxGene Census reference data
  • Classification compatible: Supports cell type prediction
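The "generative model" bullet refers to scVI's probabilistic model of raw counts. A neural decoder outputs a mean expression μ, an inverse dispersion θ, and a zero-inflation probability π. Training maximizes the likelihood of the observed counts under a zero-inflated negative binomial (ZINB), which is the default `gene_likelihood` in scvi-tools. The sketch below evaluates only that count likelihood. The encoder, decoder, and batch covariates are omitted, and the parameter values in the usage lines are arbitrary.

```python
import math


def nb_log_pmf(x, mu, theta):
    """log P(x) under a negative binomial with mean mu > 0 and inverse dispersion theta."""
    log_theta_mu = math.log(theta + mu)
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * (math.log(theta) - log_theta_mu)
        + x * (math.log(mu) - log_theta_mu)
    )


def zinb_log_pmf(x, mu, theta, pi):
    """log P(x) under a zero-inflated NB with extra zeros at probability pi."""
    if x == 0:
        # A zero comes either from the inflation component or from the NB itself.
        return math.log(pi + (1 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1 - pi) + nb_log_pmf(x, mu, theta)


# Usage: log-likelihood of one gene's counts across five cells (arbitrary parameters).
gene_counts = [0, 0, 3, 1, 0]
log_likelihood = sum(zinb_log_pmf(x, mu=1.2, theta=2.0, pi=0.1) for x in gene_counts)
```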

TranscriptFormer

An LLM-style transformer model adapted for cross-species single-cell transcriptomics.

Expected Results

Processing Time

  • PCA: 2-5 minutes for typical datasets
  • scVI: 2-5 minutes depending on dataset size
  • TranscriptFormer: ~1 hour, requires GPU

Output Files

Your completed workflow provides:

Processed H5AD File

Disclaimer: Output H5AD files will not be downloadable for the Alpha release

  • Original data: Raw counts preserved in layers['counts']
  • Quality metrics: Cell and gene filtering statistics
  • Embeddings: Model-specific embeddings in .obsm
  • UMAP coordinates: 2D visualization coordinates
  • Classifications: Cell type predictions (if enabled)

CellxGene Explorer Visualization

  • Interactive visualization: Explore data in a web browser
  • Gene expression: View expression patterns across cells
  • Embeddings: Navigate PCA/scVI/TranscriptFormer-derived UMAPs
  • Cell metadata: Color by batch, predicted cell types, quality metrics, etc.
  • Gene search: Find and highlight specific genes