Team Project · Python · Claude API · Data Pipeline

PatientPunk

Millions of patients document their treatment experiences online. PatientPunk aggregates those reports into a structured research database, turning patient self-reports into queryable, analysis-ready data for researchers studying understudied conditions.

★ First place: Human Enhancement Hackathon

Organized by Biopunk Lab, a San Francisco community biology space focused on democratizing access to biotechnology, the hackathon challenged teams to build at the intersection of biotech and human health. PatientPunk won for its approach to turning patient-generated data into structured research signal.

GitHub

The pipeline ingests posts from Reddit and patient forums, normalizes qualitative markers into structured records, uses Claude to extract symptoms and map them to medical ontologies (MeSH, SNOMED), and exposes the results as CSV, SQL, and a REST API. For conditions like ME/CFS and long COVID, patient testimony is often the only available signal about real-world treatment outcomes.
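As a rough illustration of the normalize-then-store stages described above, here is a minimal sketch using only the Python standard library. The record fields, the `treatment_reports` table name, and the helper functions are hypothetical and are not taken from the PatientPunk codebase; the Claude extraction step is omitted entirely, so the records are assumed to arrive already extracted.

```python
import csv
import io
import sqlite3
from dataclasses import asdict, dataclass, fields


@dataclass
class TreatmentReport:
    """Hypothetical structured record for one patient-reported treatment."""
    post_id: str
    source: str
    condition: str
    treatment: str
    sentiment: str  # assumed vocabulary, e.g. "positive", "negative", "mixed"


def normalize_text(raw: str) -> str:
    """Collapse runs of whitespace and lowercase a free-text marker."""
    return " ".join(raw.split()).lower()


def store_reports(conn: sqlite3.Connection, reports: list[TreatmentReport]) -> None:
    """Create the (hypothetical) table if needed and insert the records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS treatment_reports ("
        "post_id TEXT, source TEXT, condition TEXT, treatment TEXT, sentiment TEXT)"
    )
    conn.executemany(
        "INSERT INTO treatment_reports "
        "VALUES (:post_id, :source, :condition, :treatment, :sentiment)",
        [asdict(r) for r in reports],
    )
    conn.commit()


def export_csv(conn: sqlite3.Connection) -> str:
    """Dump the table as CSV text, with a header row, ordered by post_id."""
    columns = [f.name for f in fields(TreatmentReport)]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(
        conn.execute(
            f"SELECT {', '.join(columns)} FROM treatment_reports ORDER BY post_id"
        )
    )
    return buf.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_reports(
        conn,
        [
            TreatmentReport(
                "p1", "reddit", "long covid",
                normalize_text("  Low-Dose   Naltrexone "), "positive",
            )
        ],
    )
    print(export_csv(conn))
```

Keeping one flat record type across the SQL and CSV outputs means every export shares the same column order, which is one simple way to keep downstream analyses reproducible.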

Python · SQLite · OpenRouter · Jupyter · Reddit scraper · MeSH / SNOMED

My contributions

Scrapers

Built the Reddit ingestion scrapers that pull and normalize patient posts at scale.

Demographic analysis

Designed and implemented the demographic analysis pipeline.

Analysis pipeline

Architected the SQL-to-results pipeline: LLM analysis constrained by both guidelines and guardrails, with a focus on reproducible tests and results.
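One common way to put rails around LLM analysis is to validate every model response against a fixed schema before it reaches the results tables. The sketch below assumes the model is asked to return a JSON object; the key names, the allowed sentiment vocabulary, and the `validate_extraction` function are hypothetical and not drawn from the project itself.

```python
import json

# Assumed closed vocabulary; anything outside it is rejected rather than guessed.
ALLOWED_SENTIMENTS = {"positive", "negative", "mixed", "neutral"}
REQUIRED_KEYS = {"treatment", "sentiment"}


def validate_extraction(raw_response: str) -> dict:
    """Parse a model response and return a cleaned record, or raise ValueError."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    treatment = data["treatment"]
    if not isinstance(treatment, str) or not treatment.strip():
        raise ValueError("treatment must be a non-empty string")

    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data['sentiment']!r}")

    # Return only the validated fields so stray keys never reach the database.
    return {"treatment": treatment.strip().lower(), "sentiment": data["sentiment"]}
```

Rejecting malformed responses outright, instead of repairing them, keeps reruns deterministic: the same input either yields the same record or fails the same check.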

Claude research skill

Wrote the Claude skill that instructs the model how to turn research questions into reproducible, publication-quality Jupyter notebooks, including chart standards, statistical requirements, and reporting conventions.

Sample figures

Autogenerated by the Claude research skill from raw patient data. No hand-tuning: chart standards, axis labels, and statistical overlays are all produced by the skill spec.

Analysis notebooks

Eight research questions across three patient communities, covering 10,000+ treatment reports. Findings are preliminary: they are meant to demonstrate pipeline capability, not to support clinical conclusions.

Treatment Overview

r/covidlonghaulers

LDN, magnesium, and electrolytes ranked highest by community sentiment. SSRIs ranked lowest.

POTS Preliminary

r/covidlonghaulers

POTS subgroup tries twice as many treatments and reports consistently worse outcomes than the broader cohort.

POTS Treatment Strategy

r/covidlonghaulers

4–6 concurrent treatments is the empirical sweet spot. Core stack: electrolytes, magnesium, LDN, antihistamines.

Fatigue Treatments

r/covidlonghaulers

LDN and magnesium lead outcomes for fatigue specifically. Findings consistent with the broader treatment analysis.

PSSD Harm Profile

r/PSSD

Sertraline and paroxetine associated with worst outcomes. Microdosing psilocybin helps; full-dose hurts. Same compound, opposite outcomes by dose.

PSSD Recovery

r/PSSD

Antihistamines show the strongest recovery signal, suggesting a neuroinflammatory mechanism.

Abortion Experience

r/abortion

Support system predicts emotional outcomes better than medical method. Co-occurring relief and guilt is the dominant reported pattern.

Negative Predictors

r/covidlonghaulers

Analysis of factors that predict worse treatment outcomes across the long COVID cohort.
