BioNeighbor — Bioinformatics & Assay Simulation Epic
Purpose
BioNeighbor is a research-only bioinformatics exploration platform inspired by collaborative filtering.
It helps users discover relationships between molecules, targets, diseases, and assays using similarity,
neighbor analysis, and explainable inference — not clinical prediction.
This document defines a complete epic suitable for GitHub issues or AI-assisted implementation.
Core Principles
- Research and hypothesis generation only
- No clinical, dosing, or treatment claims
- Fully explainable similarity and inference
- Clear data provenance
- Offline-capable where possible
- Kotlin Multiplatform–friendly architecture
Primary App Sections
Main navigation:
Drugs | Molecules | Targets | Diseases | Assays | Similarity
All sections share the same underlying bioinformatics graph.
1. Target Bioinformatics
1.1 Target Profiles
Each biological target includes:
- Target name(s)
- Gene symbol(s)
- Protein name
- Protein family (GPCR, kinase, enzyme, transporter, ion channel)
- Biological function summary
- Associated diseases
- Known ligands
- Known assays
- Pathway membership
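The profile fields above could be carried by a small immutable record. The sketch below is one possible shape, not a fixed schema: every field name and the ID format ("mol:1", "assay:7") are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetProfile:
    """One biological target. Field names are illustrative assumptions."""
    names: tuple[str, ...]
    gene_symbols: tuple[str, ...]
    protein_name: str
    protein_family: str  # e.g. "GPCR", "kinase", "enzyme", "transporter", "ion channel"
    function_summary: str = ""
    # Cross-references are stored as ID sets so they can be reused for similarity.
    disease_ids: frozenset[str] = frozenset()
    ligand_ids: frozenset[str] = frozenset()
    assay_ids: frozenset[str] = frozenset()
    pathway_ids: frozenset[str] = frozenset()


# Hypothetical example record; the IDs are placeholders, not real identifiers.
example_target = TargetProfile(
    names=("Beta-2 adrenergic receptor",),
    gene_symbols=("ADRB2",),
    protein_name="Beta-2 adrenergic receptor",
    protein_family="GPCR",
    ligand_ids=frozenset({"mol:1", "mol:2"}),
    assay_ids=frozenset({"assay:7"}),
)
```

Keeping the cross-references as sets of IDs rather than nested objects keeps the record cheap to snapshot for offline use.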
1.2 Target Classification
For each target, known ligands are categorized by mechanism of interaction:
- Agonists
- Antagonists
- Inhibitors
- Modulators (allosteric, partial, inverse)
Each category includes:
- Known ligand examples
- Typical structural features of ligands
- Common assay types validating the interaction
This classification is descriptive, not predictive.
1.3 Target Neighbors (Target Similarity)
Enable discovery of biologically similar targets based on:
- Protein family membership
- Shared ligands
- Shared assays
- Pathway overlap
- Sequence similarity (when available)
Use cases:
- Off-target discovery
- Drug repurposing hypotheses
- Polypharmacology research
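One way to combine the similarity criteria listed above is a weighted sum of per-criterion overlaps. The sketch below uses Jaccard overlap for the set-valued criteria and an exact match for protein family. The weights are arbitrary assumptions, and sequence similarity is left out because it is only sometimes available. The per-criterion breakdown is returned alongside the score so the result stays explainable.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Assumed weights; they sum to 1.0 so the final score stays in [0, 1].
WEIGHTS = {"family": 0.2, "ligands": 0.4, "assays": 0.2, "pathways": 0.2}


def target_similarity(t1: dict, t2: dict, weights: dict = WEIGHTS):
    """Return (score, breakdown) for two targets given as plain dicts."""
    breakdown = {
        "family": 1.0 if t1["family"] == t2["family"] else 0.0,
        "ligands": jaccard(t1["ligands"], t2["ligands"]),
        "assays": jaccard(t1["assays"], t2["assays"]),
        "pathways": jaccard(t1["pathways"], t2["pathways"]),
    }
    score = sum(weights[name] * value for name, value in breakdown.items())
    return score, breakdown


# Hypothetical targets with placeholder IDs.
target_a = {"family": "GPCR", "ligands": {"m1", "m2", "m3"},
            "assays": {"a1"}, "pathways": {"p1", "p2"}}
target_b = {"family": "GPCR", "ligands": {"m2", "m3", "m4"},
            "assays": {"a2"}, "pathways": {"p1"}}
score, breakdown = target_similarity(target_a, target_b)
```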
2. Ligand–Target Interaction Bioinformatics
2.1 Interaction Evidence
For each molecule–target pair:
- Interaction type (agonist, antagonist, inhibitor, modulator)
- Assay count
- Measurement types (IC50, EC50, Ki, Kd)
- Species tested
- Experimental system (cell-free, cell-based, in vivo)
- Confidence score based on data volume only
No efficacy or safety claims are made.
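A confidence score driven by data volume alone could be as simple as a saturating function of the assay count. The sketch below is an assumption, not a prescribed formula: the logarithmic scaling and the saturation point of 50 assays are both illustrative choices. It deliberately ignores potency values so the score cannot be read as an efficacy signal.

```python
import math


def volume_confidence(assay_count: int, saturation: int = 50) -> float:
    """Map an assay count to [0, 1] on a log scale.

    `saturation` is an assumed cutoff: at or above this many assays
    the score is capped at 1.0.
    """
    if assay_count <= 0:
        return 0.0
    return min(1.0, math.log1p(assay_count) / math.log1p(saturation))
```

With this shape, going from 1 to 5 assays raises the score more than going from 41 to 45, which matches the intuition that early evidence is worth the most.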
2.2 Evidence Aggregation
Aggregate interaction evidence across:
- Multiple assays
- Multiple publications
- Multiple species
Conflicting results must be surfaced, not hidden.
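Aggregation with conflicts kept visible might look like the following sketch. The record keys (molecule, target, interaction_type, species, source) are assumed names. A pair is flagged as conflicting whenever more than one interaction type has been reported for it, and the per-type counts are kept rather than reduced to a majority vote.

```python
from collections import Counter, defaultdict


def aggregate_evidence(records):
    """Group evidence rows by (molecule, target) and summarize each group."""
    grouped = defaultdict(list)
    for row in records:
        grouped[(row["molecule"], row["target"])].append(row)

    summary = {}
    for pair, rows in grouped.items():
        type_counts = Counter(row["interaction_type"] for row in rows)
        summary[pair] = {
            "assay_count": len(rows),
            "species": sorted({row["species"] for row in rows}),
            "sources": sorted({row["source"] for row in rows}),
            "interaction_types": dict(type_counts),
            # Surface disagreement instead of hiding it behind a majority vote.
            "conflict": len(type_counts) > 1,
        }
    return summary


# Hypothetical evidence rows with placeholder IDs.
rows = [
    {"molecule": "m1", "target": "t1", "interaction_type": "antagonist",
     "species": "human", "source": "pub:1"},
    {"molecule": "m1", "target": "t1", "interaction_type": "agonist",
     "species": "rat", "source": "pub:2"},
    {"molecule": "m2", "target": "t1", "interaction_type": "inhibitor",
     "species": "human", "source": "pub:1"},
]
evidence = aggregate_evidence(rows)
```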
3. Chemical Structure & Bond Features
3.1 Molecular Features
Extract and store:
- Molecular fingerprints (ECFP-style or equivalent)
- Functional groups
- Ring systems
- Aromaticity
- Charge distribution
- Hydrogen bond donors/acceptors
Used for:
- Molecule similarity
- Neighbor discovery
- Explainability
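Fingerprint-based similarity and neighbor discovery can be sketched with the Tanimoto coefficient over sets of "on" bits. The example assumes fingerprints have already been computed elsewhere and are stored as sets of bit indices; no cheminformatics library is used here. Returning the shared bits with each neighbor gives a starting point for explaining why two molecules were matched.

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def nearest_neighbors(query_id: str, fingerprints: dict, k: int = 3):
    """Return up to k (similarity, molecule_id, shared_bits) tuples, best first."""
    query = fingerprints[query_id]
    scored = []
    for mol_id, fp in fingerprints.items():
        if mol_id == query_id:
            continue
        scored.append((tanimoto(query, fp), mol_id, sorted(query & fp)))
    # Highest similarity first; ties broken by molecule ID for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]


# Hypothetical fingerprints; the bit indices are placeholders.
fingerprints = {
    "m1": frozenset({1, 2, 3, 4}),
    "m2": frozenset({2, 3, 4, 5}),
    "m3": frozenset({9}),
}
neighbors = nearest_neighbors("m1", fingerprints, k=2)
```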
3.2 Chemical Bond Pattern Analysis
Analyze bond-level patterns across ligands for a target:
- Conserved scaffolds
- Substituent variability
- Recurrent bond motifs
Purpose:
- Explain similarity scores
- Support medicinal chemistry reasoning
4. Pathway Bioinformatics
4.1 Pathway Mapping
Targets are mapped into pathways:
- Signaling cascades
- Enzymatic chains
- Regulatory feedback loops
Users can explore:
- Upstream regulators
- Downstream effects
- Multi-target intervention zones
4.2 Multi-Target Neighborhoods
Identify:
- Molecules hitting multiple targets
- Targets affected by similar molecule sets
Use cases:
- CNS research
- Oncology
- Side-effect exploration
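Both kinds of neighborhood can be derived from a flat list of molecule-target hits. The sketch below is a minimal illustration under assumed inputs: hits are (molecule_id, target_id) pairs, and the thresholds min_targets and min_shared are hypothetical parameters.

```python
from collections import defaultdict
from itertools import combinations


def multi_target_molecules(hits, min_targets: int = 2):
    """Molecules that hit at least `min_targets` distinct targets."""
    targets_by_molecule = defaultdict(set)
    for molecule, target in hits:
        targets_by_molecule[molecule].add(target)
    return {m: sorted(t) for m, t in targets_by_molecule.items()
            if len(t) >= min_targets}


def targets_sharing_molecules(hits, min_shared: int = 2):
    """Target pairs that are hit by at least `min_shared` common molecules."""
    molecules_by_target = defaultdict(set)
    for molecule, target in hits:
        molecules_by_target[target].add(molecule)
    pairs = {}
    for a, b in combinations(sorted(molecules_by_target), 2):
        shared = molecules_by_target[a] & molecules_by_target[b]
        if len(shared) >= min_shared:
            pairs[(a, b)] = sorted(shared)
    return pairs


# Hypothetical hit list with placeholder IDs.
hits = [("m1", "t1"), ("m1", "t2"), ("m2", "t1"), ("m2", "t2"), ("m3", "t3")]
promiscuous = multi_target_molecules(hits)
linked_targets = targets_sharing_molecules(hits)
```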
5. Disease Bioinformatics
5.1 Disease-Centered Navigation
Support the following navigation flow:
Disease → Associated Genes → Targets → Known Ligands → Neighbor Molecules
This enables disease-first exploration rather than chemistry-first exploration.
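The navigation flow above amounts to a layered walk over the shared graph. The following sketch assumes the graph is available as one adjacency dictionary per hop; the layer names (disease_genes, gene_targets, target_ligands, ligand_neighbors) are invented for this example.

```python
def disease_first_walk(disease: str, edges: dict) -> dict:
    """Follow Disease -> Genes -> Targets -> Known ligands -> Neighbor molecules.

    `edges` maps an assumed layer name to an adjacency dict {source_id: set_of_ids}.
    """
    def expand(ids, layer):
        reached = set()
        for node in ids:
            reached |= edges[layer].get(node, set())
        return reached

    genes = expand({disease}, "disease_genes")
    targets = expand(genes, "gene_targets")
    ligands = expand(targets, "target_ligands")
    # Neighbor molecules exclude the known ligands so only new candidates remain.
    neighbors = expand(ligands, "ligand_neighbors") - ligands
    return {
        "genes": sorted(genes),
        "targets": sorted(targets),
        "known_ligands": sorted(ligands),
        "neighbor_molecules": sorted(neighbors),
    }


# Hypothetical graph with placeholder IDs.
edges = {
    "disease_genes": {"d1": {"g1", "g2"}},
    "gene_targets": {"g1": {"t1"}, "g2": {"t2"}},
    "target_ligands": {"t1": {"m1"}, "t2": {"m1", "m2"}},
    "ligand_neighbors": {"m1": {"m2", "m3"}, "m2": {"m4"}},
}
walk = disease_first_walk("d1", edges)
```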
6. Assay Bioinformatics
6.1 Assay Catalog
Each assay entry includes:
- Assay name
- Assay type (binding, functional, reporter, enzymatic)
- Target(s)
- Readout type
- Measurement units
- Biological system
- Known limitations
6.2 Assay Neighbors
Assays are considered similar based on:
- Target overlap
- Measurement type
- Biological context
- Detection technology
6.3 Assay Simulation (Research-Only)
Assay simulation is exploratory and non-predictive.
Level 1 — Statistical Replay
- Sample historical assay distributions
- Add noise
- Display confidence intervals
Level 2 — Mechanism-Aware Simulation
- Incorporate interaction type assumptions
- Adjust expected readout behavior
- Model assay sensitivity limits
Level 3 — Hypothesis Stress Testing
- Cross-assay consistency checks
- Sensitivity analysis
- Failure mode visualization
All outputs are labeled:
“Computational hypothesis — not experimental data”
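Level 1 (Statistical Replay) could be sketched as resampling historical readouts, adding Gaussian noise, and reporting an empirical percentile interval. The noise model, the default noise_sd, the draw count, and the use of a 2.5 to 97.5 percentile range as a stand-in for the "confidence intervals" bullet are all assumptions of this example. The output carries the required label.

```python
import random
import statistics

# The label required by the text above (\u2014 is the dash used in that label).
LABEL = "Computational hypothesis \u2014 not experimental data"


def statistical_replay(historical, n_draws: int = 1000,
                       noise_sd: float = 0.1, seed: int = 0) -> dict:
    """Resample historical readouts with added noise and summarize the draws."""
    if not historical:
        raise ValueError("historical readouts must not be empty")
    rng = random.Random(seed)  # seeded so a replay is reproducible
    draws = sorted(rng.choice(historical) + rng.gauss(0.0, noise_sd)
                   for _ in range(n_draws))
    low = draws[int(0.025 * n_draws)]
    high = draws[int(0.975 * n_draws) - 1]
    return {
        "label": LABEL,
        "median": statistics.median(draws),
        "interval_95": (low, high),
        "n_draws": n_draws,
    }


# Hypothetical historical readouts (unitless placeholders).
replay = statistical_replay([6.1, 6.4, 5.9, 6.8, 6.2])
```

Fixing the seed keeps a replay reproducible, which makes it easier to compare runs when the historical data or noise settings change.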
7. Safe Inference of New Targets & Antagonists
7.1 Inference Conditions
Hypotheses may be suggested when:
- Strong structural similarity exists
- Targets share ligands or pathways
- Assay patterns overlap significantly
All inferred results must:
- Be labeled “Hypothesis”
- Show supporting neighbors
- Avoid clinical or therapeutic claims
7.2 Explainability Requirements
Every inference must answer:
- Which neighbors influenced this?
- Which features overlap?
- What source data supports it?
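The three explainability questions can be enforced at the data level by refusing to construct a hypothesis that cannot answer them. The record below is a sketch with assumed field names; it rejects any inference that lacks supporting neighbors, overlapping features, or source records, and it always carries the "Hypothesis" label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """An inferred relationship plus the evidence needed to explain it."""
    statement: str
    supporting_neighbors: tuple[str, ...]   # which neighbors influenced this?
    overlapping_features: tuple[str, ...]   # which features overlap?
    source_records: tuple[str, ...]         # what source data supports it?
    label: str = "Hypothesis"

    def __post_init__(self):
        required = ("supporting_neighbors", "overlapping_features", "source_records")
        missing = [name for name in required if not getattr(self, name)]
        if missing:
            raise ValueError(f"hypothesis lacks explainability fields: {missing}")


# Hypothetical example with placeholder IDs.
example_hypothesis = Hypothesis(
    statement="m3 may interact with t1",
    supporting_neighbors=("m1", "m2"),
    overlapping_features=("shared ring system",),
    source_records=("assay:7",),
)
```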
8. Data Sources & Retrieval Methods
8.1 UniProt
Data:
- Protein sequences
- Functional annotations
Access:
- REST API
- Bulk downloads (FASTA, TSV)
Size:
- Tens of GB for the full dataset
- Targeted subsets recommended
Offline snapshots:
8.2 IUPHAR / Guide to Pharmacology
Data:
- Curated ligand–target interactions
Access:
Size:
Notes:
- High-quality curated antagonist/agonist data
8.3 Reactome
Data:
- Pathways
- Target participation
Access:
- REST API
- Bulk downloads (JSON, BioPAX)
Size:
Offline snapshots:
8.4 KEGG (Optional)
Data:
Access:
Notes:
- Optional or link-only integration
8.5 PubChem BioAssay
Data:
- Assays
- Bioactivity results
Access:
- REST API
- FTP bulk downloads
Size:
- Hundreds of GB for the full dataset
- Filtered subsets strongly recommended
Offline snapshots:
8.6 ChEMBL (When Reachable)
Data:
- Molecules
- Assays
- Activities
Access:
- REST API (currently unstable)
- PostgreSQL database dumps
Size:
Strategy:
- Cached mirrors
- Graceful degradation if unavailable
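The cached-mirror strategy with graceful degradation can be sketched independently of any particular endpoint. In the example below, `fetch` is a hypothetical zero-argument callable standing in for whatever client performs the live request; no real ChEMBL URL or client API is assumed. A successful fetch refreshes the cache, a failed fetch falls back to the last cached copy, and the caller is always told which source was used.

```python
import json
import tempfile
from pathlib import Path


def load_with_fallback(fetch, cache_path: Path) -> dict:
    """Try a live fetch; on failure fall back to the cached copy, if any."""
    try:
        data = fetch()
    except Exception as exc:  # a real client would catch narrower error types
        if cache_path.exists():
            return {"data": json.loads(cache_path.read_text()),
                    "source": "cache", "error": str(exc)}
        return {"data": None, "source": "unavailable", "error": str(exc)}
    cache_path.write_text(json.dumps(data))  # refresh the local mirror
    return {"data": data, "source": "live", "error": None}


def failing_fetch():
    """Stand-in for an unreachable service."""
    raise ConnectionError("service unreachable")


# Usage with stand-in fetchers and a temporary cache file.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "activities.json"
    cold = load_with_fallback(failing_fetch, cache)          # no cache yet
    live = load_with_fallback(lambda: {"activities": [1, 2]}, cache)
    degraded = load_with_fallback(failing_fetch, cache)      # served from cache
```

Reporting the source alongside the data lets the interface show when results come from a snapshot, in keeping with the provenance principle.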
9. Architecture Notes
- Kotlin Multiplatform core models
- Optional Python preprocessing pipelines
- Client-side similarity computation where feasible
- Precomputed neighbor graphs
- Snapshot-based datasets for offline use
10. Ethics & Safety
- No dosing information
- No synthesis instructions
- No treatment advice
- No medical claims
- Clear research-only disclaimers
Guiding Principle
BioNeighbor does not discover drugs.
It helps humans discover biological neighborhoods and testable ideas.