Epic: Bioinformatics & Assay Simulation

# BioNeighbor — Bioinformatics & Assay Simulation Epic

## Purpose

BioNeighbor is a research-only bioinformatics exploration platform inspired by collaborative filtering.
It helps users discover relationships between molecules, targets, diseases, and assays using similarity,
neighbor analysis, and explainable inference — not clinical prediction.

This document defines a complete epic suitable for GitHub issues or AI-assisted implementation.

---

## Core Principles

- Research and hypothesis generation only
- No clinical, dosing, or treatment claims
- Fully explainable similarity and inference
- Clear data provenance
- Offline-capable where possible
- Kotlin Multiplatform–friendly architecture

---

## Primary App Sections

Main navigation:

Drugs | Molecules | Targets | Diseases | Assays | Similarity

All sections share the same underlying bioinformatics graph.

---

## 1. Target Bioinformatics

### 1.1 Target Profiles

Each biological target includes:
- Target name(s)
- Gene symbol(s)
- Protein name
- Protein family (GPCR, kinase, enzyme, transporter, ion channel)
- Biological function summary
- Associated diseases
- Known ligands
- Known assays
- Pathway membership
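
As a rough illustration, a profile with the fields above could be held in a record like the one below. This is only a sketch for a Python preprocessing step: the class name `TargetProfile`, the helper `profile_gaps`, the field names, and the `target:` / `mol:` identifiers are all assumptions rather than part of any agreed schema.

```python
from dataclasses import dataclass

# Protein families named in the list above; other values are flagged, not rejected.
KNOWN_FAMILIES = {"GPCR", "kinase", "enzyme", "transporter", "ion channel"}


@dataclass(frozen=True)
class TargetProfile:
    """One target node in the shared graph (all field names are illustrative)."""
    target_id: str
    names: tuple
    gene_symbols: tuple
    protein_name: str
    protein_family: str
    function_summary: str = ""
    disease_ids: frozenset = frozenset()
    ligand_ids: frozenset = frozenset()
    assay_ids: frozenset = frozenset()
    pathway_ids: frozenset = frozenset()


def profile_gaps(profile):
    """Return the fields that are still empty so missing data can be shown, not hidden."""
    collection_fields = ("names", "gene_symbols", "disease_ids",
                         "ligand_ids", "assay_ids", "pathway_ids")
    gaps = [name for name in collection_fields if not getattr(profile, name)]
    if profile.protein_family not in KNOWN_FAMILIES:
        gaps.append("protein_family")
    return gaps


# Illustrative record; the identifiers are placeholders, not real database keys.
example = TargetProfile(
    target_id="target:example-1",
    names=("Dopamine D2 receptor",),
    gene_symbols=("DRD2",),
    protein_name="D(2) dopamine receptor",
    protein_family="GPCR",
    ligand_ids=frozenset({"mol:example-a"}),
)
print(profile_gaps(example))
```

Keeping the record immutable is one way to make snapshots safe to share between sections of the app; a Kotlin data class could mirror the same shape.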

---

### 1.2 Target Classification

For each target, known ligands are grouped by mechanism of interaction:

- Agonists
- Antagonists
- Inhibitors
- Modulators (allosteric, partial, inverse)

Each category includes:
- Known ligand examples
- Typical structural features of ligands
- Common assay types validating the interaction

This classification is descriptive, not predictive.

---

### 1.3 Target Neighbors (Target Similarity)

Enable discovery of biologically similar targets based on:
- Protein family membership
- Shared ligands
- Shared assays
- Pathway overlap
- Sequence similarity (when available)

Use cases:
- Off-target discovery
- Drug repurposing hypotheses
- Polypharmacology research
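
One possible way to combine the similarity criteria listed above is a weighted sum of per-criterion scores that keeps every component visible for explanation. The sketch below assumes targets are plain dictionaries; the weights, the function names, and the example identifiers are hypothetical, and sequence similarity is left out because it is only available for some targets.

```python
# Assumed weights; they would need tuning and are not specified by this epic.
WEIGHTS = {"family": 0.2, "ligands": 0.4, "assays": 0.2, "pathways": 0.2}


def jaccard(a, b):
    """Overlap of two sets as |A and B| / |A or B|, with 0.0 for two empty sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def target_similarity(t1, t2, weights=WEIGHTS):
    """Return (score, parts) so the UI can show why two targets are neighbors."""
    parts = {
        "family": 1.0 if t1["family"] == t2["family"] else 0.0,
        "ligands": jaccard(t1["ligands"], t2["ligands"]),
        "assays": jaccard(t1["assays"], t2["assays"]),
        "pathways": jaccard(t1["pathways"], t2["pathways"]),
    }
    score = sum(weights[name] * value for name, value in parts.items())
    return score, parts


# Placeholder targets for illustration only.
target_a = {"family": "GPCR", "ligands": {"m1", "m2", "m3"},
            "assays": {"a1"}, "pathways": {"p1"}}
target_b = {"family": "GPCR", "ligands": {"m2", "m3", "m4"},
            "assays": {"a1"}, "pathways": {"p2"}}

score, parts = target_similarity(target_a, target_b)
print(round(score, 3), parts)
```

Returning the per-criterion breakdown alongside the total keeps the score explainable, which matters more here than the exact weighting.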

---

## 2. Ligand–Target Interaction Bioinformatics

### 2.1 Interaction Evidence

For each molecule–target pair:
- Interaction type (agonist, antagonist, inhibitor, modulator)
- Assay count
- Measurement types (IC50, EC50, Ki, Kd)
- Species tested
- Experimental system (cell-free, cell-based, in vivo)
- Confidence score based on data volume only

No efficacy or safety claims are made.
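
A minimal sketch of an evidence record and a volume-only confidence score follows. The saturating formula `n / (n + half_point)` and the constant `half_point=5` are assumptions chosen for illustration; the epic only requires that the score depend on data volume and nothing else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionEvidence:
    """Evidence for one molecule-target pair (field names are illustrative)."""
    molecule_id: str
    target_id: str
    interaction_type: str         # "agonist", "antagonist", "inhibitor", "modulator"
    assay_count: int
    measurement_types: frozenset  # subset of {"IC50", "EC50", "Ki", "Kd"}
    species: frozenset
    systems: frozenset            # "cell-free", "cell-based", "in vivo"


def volume_confidence(assay_count, half_point=5):
    """Score in [0, 1) that grows with assay count; half_point assays give 0.5."""
    if assay_count < 0:
        raise ValueError("assay_count must be non-negative")
    return assay_count / (assay_count + half_point)


evidence = InteractionEvidence(
    molecule_id="mol:example-a",
    target_id="target:example-1",
    interaction_type="antagonist",
    assay_count=20,
    measurement_types=frozenset({"Ki", "IC50"}),
    species=frozenset({"human", "rat"}),
    systems=frozenset({"cell-free", "cell-based"}),
)
print(volume_confidence(evidence.assay_count))
```

Because the score ignores measured values entirely, it says how much data exists, not how strong or reliable the interaction is.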

---

### 2.2 Evidence Aggregation

Aggregate interaction evidence across:
- Multiple assays
- Multiple publications
- Multiple species

Conflicting results must be surfaced, not hidden.
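
The aggregation rule above could be sketched as a grouping step that records which sources support each interaction type and flags pairs where the types disagree. The record keys (`molecule`, `target`, `type`, `source`, `species`) and the example values are hypothetical.

```python
from collections import defaultdict


def aggregate(records):
    """Group evidence per molecule-target pair and flag conflicting interaction types."""
    grouped = defaultdict(
        lambda: {"types": defaultdict(set), "species": set(), "sources": set()}
    )
    for record in records:
        entry = grouped[(record["molecule"], record["target"])]
        entry["types"][record["type"]].add(record["source"])
        entry["species"].add(record["species"])
        entry["sources"].add(record["source"])

    summary = {}
    for pair, entry in grouped.items():
        summary[pair] = {
            # Each reported type keeps its supporting sources for provenance.
            "types": {t: sorted(srcs) for t, srcs in entry["types"].items()},
            "species": sorted(entry["species"]),
            "source_count": len(entry["sources"]),
            "conflict": len(entry["types"]) > 1,
        }
    return summary


# Placeholder records for illustration only.
records = [
    {"molecule": "mol:x", "target": "target:y", "type": "antagonist",
     "source": "pub:1", "species": "human"},
    {"molecule": "mol:x", "target": "target:y", "type": "antagonist",
     "source": "pub:2", "species": "rat"},
    {"molecule": "mol:x", "target": "target:y", "type": "agonist",
     "source": "pub:3", "species": "human"},
]
summary = aggregate(records)
print(summary[("mol:x", "target:y")])
```

The conflict flag does not pick a winner; it only tells the interface to show both readings with their sources.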

---

## 3. Chemical Structure & Bond Features

### 3.1 Molecular Features

Extract and store:
- Molecular fingerprints (ECFP-style or equivalent)
- Functional groups
- Ring systems
- Aromaticity
- Charge distribution
- Hydrogen bond donors/acceptors

Used for:
- Molecule similarity
- Neighbor discovery
- Explainability
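
To show how fingerprints can serve both similarity and explainability, the sketch below compares two fingerprints stored as sets of on-bit indices using the Tanimoto coefficient and reports which bits are shared. The bit sets and the `bit_labels` mapping are made up for the example; in practice a cheminformatics toolkit would generate the fingerprints, and hashed bits may not map cleanly to a single substructure.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)


def explain_pair(bits_a, bits_b, bit_labels):
    """Return the score plus human-readable names for the shared bits."""
    shared = sorted(bits_a & bits_b)
    return {
        "score": tanimoto(bits_a, bits_b),
        "shared_features": [bit_labels.get(bit, f"bit {bit}") for bit in shared],
    }


# Hypothetical fingerprints and labels for illustration only.
fingerprint_a = {1, 2, 3, 4}
fingerprint_b = {3, 4, 5, 6}
bit_labels = {3: "aromatic ring", 4: "hydrogen bond donor"}

explanation = explain_pair(fingerprint_a, fingerprint_b, bit_labels)
print(explanation)
```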

---

### 3.2 Chemical Bond Pattern Analysis

Analyze bond-level patterns across ligands for a target:
- Conserved scaffolds
- Substituent variability
- Recurrent bond motifs

Purpose:
- Explain similarity scores
- Support medicinal chemistry reasoning
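
One simple way to separate conserved from variable patterns is to count how many of a target's ligands contain each motif. The sketch assumes motifs have already been extracted per ligand by an upstream step; the motif names, ligand identifiers, and the `conserved_at=0.8` cutoff are hypothetical.

```python
from collections import Counter


def motif_recurrence(ligand_motifs, conserved_at=0.8):
    """Split motifs into conserved and variable by the fraction of ligands containing them."""
    ligand_count = len(ligand_motifs)
    # set() ensures a motif counts once per ligand even if listed twice.
    counts = Counter(m for motifs in ligand_motifs.values() for m in set(motifs))
    conserved, variable = {}, {}
    for motif, count in counts.items():
        fraction = count / ligand_count
        bucket = conserved if fraction >= conserved_at else variable
        bucket[motif] = fraction
    return conserved, variable


# Placeholder ligands for one target, for illustration only.
ligand_motifs = {
    "mol:a": ["piperidine ring", "amide bond", "fluoro substituent"],
    "mol:b": ["piperidine ring", "amide bond"],
    "mol:c": ["piperidine ring", "amide bond", "methoxy substituent"],
    "mol:d": ["piperidine ring"],
}
conserved, variable = motif_recurrence(ligand_motifs)
print(conserved)
print(variable)
```

The per-motif fractions can be shown next to a similarity score so a reader sees which shared patterns drive it.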

---

## 4. Pathway Bioinformatics

### 4.1 Pathway Mapping

Targets are mapped into pathways:
- Signaling cascades
- Enzymatic chains
- Regulatory feedback loops

Users can explore:
- Upstream regulators
- Downstream effects
- Multi-target intervention zones
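
Upstream and downstream exploration can be sketched as a breadth-first search over directed "regulates" edges, with a visited set so feedback loops terminate. The node names and the edge list are placeholders, and treating the overlap of two downstream sets as a multi-target zone is only one possible reading of that bullet.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build downstream and upstream adjacency maps from (regulator, regulated) pairs."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for source, destination in edges:
        downstream[source].add(destination)
        upstream[destination].add(source)
    return downstream, upstream


def reachable(adjacency, start):
    """All nodes reachable from start, excluding start itself; safe on cycles."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor != start and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Placeholder pathway with a feedback loop (C regulates A).
edges = [("A", "B"), ("B", "C"), ("D", "B"), ("C", "A")]
downstream, upstream = build_graph(edges)

downstream_of_a = reachable(downstream, "A")
upstream_of_c = reachable(upstream, "C")
# Nodes downstream of both A and D: one candidate definition of a shared zone.
shared_zone = reachable(downstream, "A") & reachable(downstream, "D")
print(sorted(downstream_of_a), sorted(upstream_of_c), sorted(shared_zone))
```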

---

### 4.2 Multi-Target Neighborhoods

Identify:
- Molecules hitting multiple targets
- Targets affected by similar molecule sets

Use cases:
- CNS research
- Oncology
- Side-effect exploration
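
Both lookups above can be derived from a single molecule-to-targets mapping, as in the sketch below. The identifiers, the `min_targets=2` cutoff, and the `min_jaccard=0.5` threshold are assumptions for illustration.

```python
from itertools import combinations


def multi_target_molecules(molecule_targets, min_targets=2):
    """Molecules with evidence against at least min_targets targets."""
    return {mol: sorted(targets)
            for mol, targets in molecule_targets.items()
            if len(targets) >= min_targets}


def targets_with_similar_molecule_sets(molecule_targets, min_jaccard=0.5):
    """Target pairs whose ligand sets overlap, with the shared molecules as evidence."""
    by_target = {}
    for mol, targets in molecule_targets.items():
        for target in targets:
            by_target.setdefault(target, set()).add(mol)

    pairs = []
    for a, b in combinations(sorted(by_target), 2):
        shared = by_target[a] & by_target[b]
        score = len(shared) / len(by_target[a] | by_target[b])
        if score >= min_jaccard:
            pairs.append((a, b, score, sorted(shared)))
    return pairs


# Placeholder data for illustration only.
molecule_targets = {
    "m1": {"T1", "T2"},
    "m2": {"T1", "T2"},
    "m3": {"T2", "T3"},
    "m4": {"T3"},
}
promiscuous = multi_target_molecules(molecule_targets)
similar_targets = targets_with_similar_molecule_sets(molecule_targets)
print(promiscuous)
print(similar_targets)
```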

---

## 5. Disease Bioinformatics

### 5.1 Disease-Centered Navigation

Support navigation flow:

Disease → Associated Genes → Targets → Known Ligands → Neighbor Molecules

This enables disease-first exploration rather than chemistry-first.
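
The navigation flow could be walked as a chain of lookups over the shared graph, roughly as sketched here. The dictionary layout (`disease_genes`, `gene_targets`, `target_ligands`), the `neighbor_index` of precomputed molecule neighbors, and all identifiers are hypothetical.

```python
def explore_from_disease(disease, graph, neighbor_index):
    """Follow Disease -> Genes -> Targets -> Known Ligands -> Neighbor Molecules."""
    steps = []
    for gene in sorted(graph["disease_genes"].get(disease, ())):
        for target in sorted(graph["gene_targets"].get(gene, ())):
            ligands = sorted(graph["target_ligands"].get(target, ()))
            # Neighbors of any known ligand, minus the known ligands themselves.
            neighbors = {n for ligand in ligands
                         for n in neighbor_index.get(ligand, ())} - set(ligands)
            steps.append({
                "gene": gene,
                "target": target,
                "known_ligands": ligands,
                "neighbor_molecules": sorted(neighbors),
            })
    return steps


# Placeholder graph for illustration only.
graph = {
    "disease_genes": {"disease:d1": {"GENE1"}},
    "gene_targets": {"GENE1": {"target:t1"}},
    "target_ligands": {"target:t1": {"mol:a", "mol:b"}},
}
neighbor_index = {"mol:a": ["mol:b", "mol:c"], "mol:b": ["mol:d"]}

steps = explore_from_disease("disease:d1", graph, neighbor_index)
print(steps)
```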

---

## 6. Assay Bioinformatics

### 6.1 Assay Catalog

Each assay entry includes:
- Assay name
- Assay type (binding, functional, reporter, enzymatic)
- Target(s)
- Readout type
- Measurement units
- Biological system
- Known limitations

---

### 6.2 Assay Neighbors

Assays are considered similar based on:
- Target overlap
- Measurement type
- Biological context
- Detection technology

---

### 6.3 Assay Simulation (Research-Only)

Assay simulation is exploratory and non-predictive.

Level 1 — Statistical Replay
- Sample historical assay distributions
- Add noise
- Display confidence intervals

Level 2 — Mechanism-Aware Simulation
- Incorporate interaction type assumptions
- Adjust expected readout behavior
- Model assay sensitivity limits

Level 3 — Hypothesis Stress Testing
- Cross-assay consistency checks
- Sensitivity analysis
- Failure mode visualization

All outputs are labeled:
“Computational hypothesis — not experimental data”
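
As an illustration of Level 1 only, the sketch below resamples historical readouts, adds Gaussian noise, and reports a 95% percentile interval of the simulated values together with the required label. The noise model, `noise_sd`, the number of draws, and the example readouts are assumptions; the epic does not specify which kind of interval is meant.

```python
import random
import statistics

LABEL = "Computational hypothesis — not experimental data"


def statistical_replay(historical, n_draws=1000, noise_sd=0.1, seed=0):
    """Resample historical readouts with added noise and summarize the simulated spread."""
    rng = random.Random(seed)  # seeded so a given replay can be reproduced
    draws = [rng.choice(historical) + rng.gauss(0.0, noise_sd)
             for _ in range(n_draws)]
    # n=40 yields cut points every 2.5%; the first and last bound the central 95%.
    cuts = statistics.quantiles(draws, n=40)
    return {
        "label": LABEL,
        "mean": statistics.fmean(draws),
        "interval_95": (cuts[0], cuts[-1]),
        "n_historical": len(historical),
    }


# Placeholder readouts (for example on a log scale) for illustration only.
historical_readouts = [6.1, 6.4, 5.9, 6.3, 6.0]
result = statistical_replay(historical_readouts)
print(result)
```

Carrying the label inside the result object, rather than adding it at display time, makes it harder for a simulated value to be shown without it.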

---

## 7. Safe Inference of New Targets & Antagonists

### 7.1 Inference Conditions

Hypotheses may be suggested when:
- Strong structural similarity exists
- Targets share ligands or pathways
- Assay patterns overlap significantly

All inferred results must:
- Be labeled “Hypothesis”
- Show supporting neighbors
- Avoid clinical or therapeutic claims

---

### 7.2 Explainability Requirements

Every inference must answer:
- Which neighbors influenced this?
- Which features overlap?
- What source data supports it?
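
One way to enforce these requirements is to refuse to render any hypothesis that cannot answer all three questions. The class name `Hypothesis`, its fields, and the example content are hypothetical; this is a sketch of the gating idea, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    statement: str
    neighbors: tuple             # which neighbors influenced this
    overlapping_features: tuple  # which features overlap
    sources: tuple               # what source data supports it
    label: str = "Hypothesis"


def missing_explanations(hypothesis):
    """Names of the explainability questions this hypothesis cannot answer."""
    answers = {
        "neighbors": hypothesis.neighbors,
        "overlapping_features": hypothesis.overlapping_features,
        "sources": hypothesis.sources,
    }
    return [name for name, value in answers.items() if not value]


def render(hypothesis):
    """Format a hypothesis for display, refusing any that lack an explanation."""
    missing = missing_explanations(hypothesis)
    if missing:
        raise ValueError("unexplained hypothesis, missing: " + ", ".join(missing))
    return "\n".join([
        f"[{hypothesis.label}] {hypothesis.statement}",
        "  Neighbors: " + ", ".join(hypothesis.neighbors),
        "  Overlapping features: " + ", ".join(hypothesis.overlapping_features),
        "  Sources: " + ", ".join(hypothesis.sources),
    ])


# Placeholder content for illustration only.
explained = Hypothesis(
    statement="mol:c may interact with target:t1",
    neighbors=("mol:a", "mol:b"),
    overlapping_features=("aromatic ring",),
    sources=("assay:1",),
)
unexplained = Hypothesis("mol:d may interact with target:t1", (), (), ())
print(render(explained))
print(missing_explanations(unexplained))
```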

---

## 8. Data Sources & Retrieval Methods

### 8.1 UniProt

Data:
- Protein sequences
- Functional annotations

Access:
- REST API
- Bulk downloads (FASTA, TSV)

Size:
- Tens of GB for the full dataset
- Targeted subsets recommended

Offline snapshots:
- Yes

---

### 8.2 IUPHAR / Guide to Pharmacology

Data:
- Curated ligand–target interactions

Access:
- REST API

Size:
- MB-scale

Notes:
- High-quality curated antagonist/agonist data

---

### 8.3 Reactome

Data:
- Pathways
- Target participation

Access:
- REST API
- Bulk downloads (JSON, BioPAX)

Size:
- A few GB

Offline snapshots:
- Yes

---

### 8.4 KEGG (Optional)

Data:
- Pathways
- Disease maps

Access:
- API (license-sensitive)

Notes:
- Optional or link-only integration

---

### 8.5 PubChem BioAssay

Data:
- Assays
- Bioactivity results

Access:
- REST API
- FTP bulk downloads

Size:
- Hundreds of GB for the full dataset
- Filtered subsets strongly recommended

Offline snapshots:
- Partial

---

### 8.6 ChEMBL (When Reachable)

Data:
- Molecules
- Assays
- Activities

Access:
- REST API (unreliable at the time of writing)
- PostgreSQL database dumps

Size:
- ~30–40 GB

Strategy:
- Cached mirrors
- Graceful degradation if unavailable
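
The cache-and-degrade strategy can be sketched independently of any particular source: try the live fetch, refresh the local copy on success, and fall back to the cached copy (or an explicit "unavailable" status) on failure. The function name `load_with_fallback`, the file name, and the stand-in fetchers are assumptions; no real endpoint is called here.

```python
import json
import tempfile
from pathlib import Path


def load_with_fallback(fetch, cache_path):
    """Return (data, status) where status is "live", "cache", or "unavailable"."""
    try:
        data = fetch()
    except OSError:  # covers connection failures and timeouts
        if cache_path.exists():
            return json.loads(cache_path.read_text(encoding="utf-8")), "cache"
        return None, "unavailable"
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data, "live"


def unreachable_source():
    """Stand-in for a fetch against a source that is currently down."""
    raise ConnectionError("source unreachable")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "activities.json"
    first = load_with_fallback(lambda: {"records": 3}, cache)   # source up
    second = load_with_fallback(unreachable_source, cache)      # source down, cache present
    cache.unlink()
    third = load_with_fallback(unreachable_source, cache)       # source down, no cache

print(first, second, third)
```

Returning the status alongside the data lets the interface tell users when they are looking at a cached snapshot, which supports the provenance principle.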

---

## 9. Architecture Notes

- Kotlin Multiplatform core models
- Optional Python preprocessing pipelines
- Client-side similarity computation where feasible
- Precomputed neighbor graphs
- Snapshot-based datasets for offline use
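
A precomputed neighbor graph might be built in a preprocessing step and shipped as a JSON snapshot, roughly as sketched below. The top-`k` cutoff, the rounding, the Jaccard similarity, and the example feature sets are assumptions; any similarity function with the same call shape could be passed in.

```python
import heapq
import json


def jaccard(a, b):
    """Set overlap as |A and B| / |A or B|, with 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def precompute_neighbors(features, similarity, k=2):
    """For each item, keep its k most similar other items with non-zero score."""
    graph = {}
    for item, item_features in features.items():
        scored = ((similarity(item_features, other), name)
                  for name, other in features.items() if name != item)
        top = heapq.nlargest(k, scored)
        graph[item] = [{"id": name, "score": round(score, 4)}
                       for score, name in top if score > 0]
    return graph


# Placeholder feature sets for illustration only.
features = {"m1": {1, 2, 3}, "m2": {2, 3, 4}, "m3": {3, 4, 5}, "m4": {9}}
neighbor_graph = precompute_neighbors(features, jaccard)
snapshot = json.dumps(neighbor_graph, sort_keys=True)
print(snapshot)
```

This all-pairs loop is quadratic in the number of items, so it only suits small or pre-filtered subsets; larger datasets would need an approximate neighbor search.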

---

## 10. Ethics & Safety

- No dosing information
- No synthesis instructions
- No treatment advice
- No medical claims
- Clear research-only disclaimers

---

## Guiding Principle

BioNeighbor does not discover drugs.
It helps humans discover biological neighborhoods and testable ideas.
