A computational pathology pipeline for quantifying tissue fibrosis from whole-slide images (WSIs) using foundation model embeddings.
This pipeline extracts patch-level features from H&E-stained WSIs using vision foundation models (e.g., UNI2-h), clusters tissue morphologies, and computes a fibrosis composite score that integrates:
- Tissue-level features: cluster proportions from unsupervised patch clustering
- Cell-level features: nuclear morphometry (eccentricity, area, solidity) via Cellpose segmentation
- Spatial features: Moran's I, hotspot detection, neighborhood enrichment
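The README does not say how the three feature groups are weighted or combined into one score. A common way to build such a composite is to z-score each per-slide feature across slides, orient each feature so that higher means more fibrotic, and take a weighted mean. The sketch below does exactly that. The function name, feature names, weights and sign conventions are all hypothetical, and this is not presented as the pipeline's actual formula.

```python
import pandas as pd


def composite_score(features: pd.DataFrame, weights: dict, signs: dict) -> pd.Series:
    """Hypothetical composite: weighted mean of sign-oriented per-feature z-scores.

    features: one row per slide, one column per feature
    weights:  feature name -> non-negative weight (illustrative values)
    signs:    feature name -> +1 if higher means more fibrotic, -1 otherwise
    """
    cols = list(weights)
    x = features[cols].astype(float)
    # Standardize each feature across slides; a zero-variance feature becomes all zeros.
    std = x.std(ddof=0).replace(0.0, 1.0)
    z = (x - x.mean()) / std
    # Orient features so that a larger value always points toward "more fibrosis".
    z = z * pd.Series(signs, dtype=float)[cols]
    w = pd.Series(weights, dtype=float)[cols]
    return (z * w).sum(axis=1) / w.sum()


# Toy per-slide table (synthetic numbers, not study data).
slides = pd.DataFrame(
    {
        "fibrotic_cluster_prop": [0.05, 0.20, 0.35],  # tissue-level
        "mean_eccentricity": [0.60, 0.70, 0.80],      # cell-level
        "morans_i": [0.10, 0.30, 0.50],               # spatial
    },
    index=["slide_a", "slide_b", "slide_c"],
)
score = composite_score(
    slides,
    weights={"fibrotic_cluster_prop": 0.5, "mean_eccentricity": 0.25, "morans_i": 0.25},
    signs={"fibrotic_cluster_prop": 1, "mean_eccentricity": 1, "morans_i": 1},
)
print(score.round(3))
```

Z-scoring puts features with very different units, such as proportions, eccentricities and Moran's I, on a common scale before they are weighted. The signs mapping makes it possible to include features that decrease with fibrosis.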
The pipeline supports batch correction (ComBat), multi-cohort statistical testing, permutation/bootstrap validation, supervised transfer learning, and cross-modal integration with single-cell RNA-seq data.
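The sketch below shows the general permutation/bootstrap technique, not the API of the package's stats.py. It runs a two-sided permutation test on a difference in group means and computes a percentile bootstrap confidence interval, using NumPy alone. The function names, defaults and synthetic scores are assumptions for illustration.

```python
import numpy as np


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for the difference in means between groups a and b."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    n_a = len(a)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffle group labels under the null
        diff = perm[:n_a].mean() - perm[n_a:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)


def bootstrap_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the difference in means, resampling each group independently."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        diffs[i] = rng.choice(a, size=len(a)).mean() - rng.choice(b, size=len(b)).mean()
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi


# Synthetic per-slide scores for two hypothetical cohorts (illustration only).
rng = np.random.default_rng(42)
infected = rng.normal(1.0, 0.5, size=12)
control = rng.normal(0.0, 0.5, size=12)
diff, p = permutation_test(infected, control, n_perm=2000)
lo, hi = bootstrap_ci(infected, control, n_boot=2000)
print(f"diff={diff:.2f}, p={p:.4f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Permutation tests make no normality assumption, which suits the small slide counts typical of cohort studies. The bootstrap interval gives an effect-size range to report alongside the p-value.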
```
pathofib/                      # Pip-installable Python package
├── patch_extraction.py        # Tissue segmentation + patching
├── feature_extraction.py      # Foundation model feature extraction
├── clustering.py              # PCA + K-means clustering pipeline
├── cell_analysis.py           # Cellpose segmentation + morphometry
├── spatial_analysis.py        # Moran's I, Gi*, neighborhood enrichment
├── supervised.py              # Annotation-based supervised classification
├── stats.py                   # MWU, permutation, bootstrap, effect sizes
├── batch_correction.py        # ComBat wrapper (calls R/sva)
├── visualization.py           # Heatmaps, cluster overlays, plots
├── interpretation.py          # Discriminative patch extraction
├── stain_normalization.py     # Macenko/Reinhard normalization
└── config.py                  # PipelineConfig dataclass
applications/                  # Study-specific implementations
├── mouse_lung_covid/          # SARS-CoV-2 mouse lung study
└── _template/                 # Template for new studies
```
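The spatial_analysis.py module lists Moran's I. For reference, global Moran's I for per-patch values x_i with spatial weights w_ij is I = (N / W) · Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z_i = x_i − x̄ and W = Σ_i Σ_j w_ij. The minimal sketch below uses binary rook (4-neighbour) weights on integer patch-grid coordinates. That neighbourhood choice and the function name are assumptions, and the package may define weights differently.

```python
import numpy as np


def morans_i(coords, values):
    """Global Moran's I with binary rook (4-neighbour) weights on integer patch-grid coordinates.

    coords: (N, 2) array of (row, col) patch grid positions
    values: length-N per-patch values, e.g. 1 if the patch is in a fibrotic cluster
    """
    cells = [(int(r), int(c)) for r, c in np.asarray(coords)]
    index = {cell: i for i, cell in enumerate(cells)}
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    num, w_sum = 0.0, 0.0
    for i, (r, c) in enumerate(cells):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            j = index.get((r + dr, c + dc))
            if j is not None:  # w_ij = 1 for grid neighbours, 0 otherwise
                num += z[i] * z[j]
                w_sum += 1.0
    denom = float((z ** 2).sum())
    if w_sum == 0.0 or denom == 0.0:
        return float("nan")  # undefined without neighbours or without variance
    return (len(z) / w_sum) * (num / denom)


# Toy 6x6 grids (synthetic): one half "fibrotic" vs. a checkerboard.
rows, cols = np.mgrid[0:6, 0:6]
grid = np.column_stack([rows.ravel(), cols.ravel()])
half = (grid[:, 1] < 3).astype(float)                    # clustered pattern
checker = ((grid[:, 0] + grid[:, 1]) % 2).astype(float)  # alternating pattern
print(morans_i(grid, half), morans_i(grid, checker))
```

The clustered half-and-half grid gives a positive I (0.8 here). The checkerboard, where every neighbour differs, gives I = -1. This is the contrast Moran's I is meant to capture between spatially aggregated and dispersed tissue patterns.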
```bash
# Clone
git clone https://github.com/princello/pathology-fibrosis-pipeline.git
cd pathology-fibrosis-pipeline

# Install (editable mode recommended)
pip install -e .

# For R-based ComBat batch correction
# R >= 4.0 with sva package: install.packages("BiocManager"); BiocManager::install("sva")
```

- Python >= 3.9
- PyTorch >= 2.0
- OpenSlide (system library + openslide-python)
- Cellpose >= 3.0
- scikit-learn, scipy, pandas, numpy, matplotlib, seaborn
- Optional: R + sva (for ComBat batch correction)
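Cellpose produces an integer label mask in which 0 is background and 1..N mark individual segmented nuclei. The nuclear morphometry described above (area, eccentricity, solidity) can then be measured per nucleus. The sketch below does this with scikit-image's `regionprops_table`. scikit-image is an assumed extra dependency that does not appear in the list above, and cell_analysis.py may compute these features differently. The synthetic mask only stands in for real Cellpose output.

```python
import numpy as np
import pandas as pd
from skimage.measure import regionprops_table


def nuclear_morphometry(label_mask: np.ndarray) -> pd.DataFrame:
    """Per-nucleus area, eccentricity and solidity from an integer label mask (0 = background)."""
    props = regionprops_table(
        np.asarray(label_mask, dtype=np.int32),
        properties=("label", "area", "eccentricity", "solidity"),
    )
    return pd.DataFrame(props)


# Toy mask standing in for a Cellpose output (synthetic): a square and an elongated "nucleus".
mask = np.zeros((40, 40), dtype=np.int32)
mask[5:15, 5:15] = 1   # 10x10 square: equal second moments, eccentricity 0
mask[20:24, 5:35] = 2  # 4x30 bar: elongated, eccentricity close to 1
morph = nuclear_morphometry(mask)
print(morph)
```

Per-nucleus rows like these can be aggregated per patch or per slide, for example as mean eccentricity, to feed the cell-level features.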
```python
from pathofib import PipelineConfig, PatchClusteringPipeline

# Configure
config = PipelineConfig(
    slides_dir="/path/to/svs/files",
    features_dir="/path/to/features",
    output_dir="/path/to/results",
    k_values=[5, 8, 10],
    n_pca=100,
)

# Run clustering
pipeline = PatchClusteringPipeline(config)
pipeline.load_features()
pipeline.fit_pca()
pipeline.cluster()
pipeline.save_results()
```

See docs/getting_started.md for the full walkthrough and applications/_template/ for adapting the pipeline to your own study.
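The clustering step (PCA followed by K-means for each value in `k_values`) and the tissue-level features derived from it (per-slide cluster proportions) can be illustrated with scikit-learn directly. The sketch below is not `PatchClusteringPipeline`'s internals. The function name is hypothetical, reporting a silhouette score to compare values of k is an assumption, and the random matrix stands in for real foundation-model patch embeddings.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score


def cluster_patches(features, slide_ids, k_values=(5, 8, 10), n_pca=100, seed=0):
    """PCA-reduce patch embeddings, run K-means per k, and return per-slide cluster proportions."""
    n_comp = min(n_pca, features.shape[0], features.shape[1])
    reduced = PCA(n_components=n_comp, random_state=seed).fit_transform(features)
    results = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(reduced)
        # Fraction of each slide's patches assigned to each cluster (rows sum to 1).
        props = pd.crosstab(
            pd.Series(slide_ids, name="slide"),
            pd.Series(labels, name="cluster"),
            normalize="index",
        )
        results[k] = {
            "labels": labels,
            "proportions": props,
            "silhouette": silhouette_score(reduced, labels),
        }
    return results


# Synthetic stand-in: 600 patches with 256-dim embeddings from 3 slides.
rng = np.random.default_rng(0)
features = rng.normal(size=(600, 256))
slide_ids = np.repeat(["slide_a", "slide_b", "slide_c"], 200)
results = cluster_patches(features, slide_ids, k_values=(5, 8, 10), n_pca=100)
for k, res in results.items():
    print(k, round(float(res["silhouette"]), 3))
```

Fitting PCA once and reusing the reduced matrix for every k keeps the k sweep cheap. The per-slide proportion table is the kind of tissue-level input a composite score can consume.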
- Copy the template:
  ```bash
  cp -r applications/_template/ applications/my_study/
  ```
- Edit config.py with your paths, cohort definitions, and parameters
- Run feature extraction on your WSIs
- Execute the pipeline scripts
See applications/_template/README.md for detailed instructions.
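The template's config.py holds paths, cohort definitions and parameters, but its exact contents are not shown here. The following is a hypothetical shape for such a file. It uses a dataclass with paths, a slide-to-cohort mapping and clustering parameters, plus a helper that groups slides by cohort for downstream statistics. Every class, field and method name is illustrative and should not be read as the template's actual layout.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class StudyConfig:
    """Hypothetical per-study settings; field names are illustrative only."""

    slides_dir: Path
    features_dir: Path
    output_dir: Path
    # Slide ID -> cohort label, used to define comparison groups.
    cohorts: dict[str, str] = field(default_factory=dict)
    k_values: list[int] = field(default_factory=lambda: [5, 8, 10])
    n_pca: int = 100

    def slides_by_cohort(self) -> dict[str, list[str]]:
        """Group slide IDs by cohort label, sorted by slide ID within each group."""
        groups: dict[str, list[str]] = {}
        for slide, cohort in sorted(self.cohorts.items()):
            groups.setdefault(cohort, []).append(slide)
        return groups


cfg = StudyConfig(
    slides_dir=Path("/path/to/svs/files"),
    features_dir=Path("/path/to/features"),
    output_dir=Path("/path/to/results"),
    cohorts={"s1": "infected", "s2": "control", "s3": "infected"},
)
print(cfg.slides_by_cohort())
```

Keeping cohort membership in one mapping means every downstream comparison reads its groups from the same source.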
The applications/mouse_lung_covid/ directory contains a complete 9-step analysis of SARS-CoV-2 infection in humanized mouse lungs (67 slides, 307K patches). This study demonstrates the full pipeline including ComBat batch correction, fibrosis quantification, and cross-modal validation with snRNA-seq. See applications/mouse_lung_covid/README.md.
If you use this pipeline, please cite:
```bibtex
@software{pathology_fibrosis_pipeline,
  author = {Wang, Zicheng},
  title  = {pathology-fibrosis-pipeline: Computational Pathology for Tissue Fibrosis Quantification},
  url    = {https://github.com/princello/pathology-fibrosis-pipeline}
}
```
MIT License. See LICENSE.