Team Project · Python · Claude API · Data Pipeline
PatientPunk
Millions of patients document their treatment experiences online. PatientPunk aggregates those reports into a structured research database, turning patient self-reports into queryable, analysis-ready data for researchers studying understudied conditions.
Organized by Biopunk Lab, a San Francisco community biology space focused on democratizing access to biotechnology, the hackathon challenged teams to build at the intersection of biotech and human health. PatientPunk won for its approach to turning patient-generated data into structured research signal.
The pipeline ingests posts from Reddit and patient forums, normalizes qualitative markers into structured records, uses Claude to extract symptoms and map them to medical ontologies (MeSH, SNOMED), and outputs the data as CSV files, SQL tables, and a REST API. For conditions like ME/CFS and long COVID, patient testimony is often the only available signal about real-world treatment outcomes.
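A minimal sketch of the record-and-export end of that pipeline, assuming a flat one-row-per-report schema. The Claude extraction step is left out. `TreatmentReport`, `MESH_SYNONYMS`, `map_symptom`, `to_csv`, `to_sqlite`, and the -1/0/+1 outcome scale are hypothetical names for illustration, and the hand-written synonym table stands in for real MeSH lookups rather than reproducing them.

```python
import csv
import io
import sqlite3
from dataclasses import asdict, dataclass, fields
from typing import Optional

# Hypothetical synonym table: patient phrasing -> MeSH descriptor name.
# A real pipeline would query the full MeSH/SNOMED vocabularies instead.
MESH_SYNONYMS = {
    "fatigue": "Fatigue",
    "exhaustion": "Fatigue",
    "tiredness": "Fatigue",
    "headache": "Headache",
    "racing heart": "Tachycardia",
    "pots": "Postural Orthostatic Tachycardia Syndrome",
}


@dataclass
class TreatmentReport:
    """One structured row: a single treatment mentioned in a single post."""
    post_id: str
    treatment: str
    symptom_raw: str
    symptom_mesh: Optional[str]  # None when the phrase has no mapping
    outcome: int  # hypothetical scale: -1 worse, 0 no change, +1 better


def map_symptom(raw: str) -> Optional[str]:
    """Look up the MeSH descriptor for a free-text symptom phrase."""
    return MESH_SYNONYMS.get(raw.strip().lower())


def to_csv(reports: list[TreatmentReport]) -> str:
    """Serialize records to CSV text (None becomes an empty field)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(TreatmentReport)])
    writer.writeheader()
    writer.writerows(asdict(r) for r in reports)
    return buf.getvalue()


def to_sqlite(reports: list[TreatmentReport], conn: sqlite3.Connection) -> None:
    """Append records to a `reports` table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports (post_id TEXT, treatment TEXT, "
        "symptom_raw TEXT, symptom_mesh TEXT, outcome INTEGER)"
    )
    conn.executemany(
        "INSERT INTO reports VALUES (?, ?, ?, ?, ?)",
        [(r.post_id, r.treatment, r.symptom_raw, r.symptom_mesh, r.outcome) for r in reports],
    )
    conn.commit()


# Toy records, not real data.
reports = [
    TreatmentReport("p1", "LDN", "Exhaustion", map_symptom("Exhaustion"), 1),
    TreatmentReport("p2", "magnesium", "brain fog", map_symptom("brain fog"), 0),
]
print(to_csv(reports))
```

Keeping unmapped phrases as `None` rather than dropping them lets a reviewer see which patient vocabulary the ontology mapping still misses.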
My contributions
Scrapers
Built the Reddit ingestion scrapers that pull and normalize patient posts at scale.
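The normalization half of a scraper like this can be sketched without any network code. This assumes the input is Reddit's JSON listing shape (`data.children[].data` with `id`, `title`, `selftext`, `created_utc`, `subreddit`); fetching, authentication, and rate limiting are out of scope, and `normalize_post` / `normalize_listing` are hypothetical names. Not storing author handles is a design choice of this sketch, not something the project states.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Reddit replaces the body of removed/deleted posts with these placeholders.
REMOVED_MARKERS = {"[removed]", "[deleted]"}


def normalize_post(child: dict) -> Optional[dict]:
    """Flatten one Reddit listing child into a minimal record, or None if empty."""
    data = child.get("data", child)
    title = (data.get("title") or "").strip()
    body = (data.get("selftext") or "").strip()
    if body in REMOVED_MARKERS:
        body = ""
    # Collapse runs of whitespace/newlines so downstream prompts see clean text.
    text = re.sub(r"\s+", " ", f"{title}\n{body}").strip()
    if not text:
        return None
    created = datetime.fromtimestamp(float(data["created_utc"]), tz=timezone.utc)
    return {
        "post_id": data["id"],
        "subreddit": data.get("subreddit", ""),
        "created": created.date().isoformat(),
        "text": text,  # author handle deliberately not stored
    }


def normalize_listing(listing: dict) -> list[dict]:
    """Normalize every post in a listing, dropping empties and duplicate IDs."""
    seen = set()
    records = []
    for child in listing.get("data", {}).get("children", []):
        rec = normalize_post(child)
        if rec is None or rec["post_id"] in seen:
            continue
        seen.add(rec["post_id"])
        records.append(rec)
    return records


# Toy listing: one real post, one duplicate ID, one removed post with no title.
sample = {"data": {"children": [
    {"kind": "t3", "data": {"id": "a1", "subreddit": "covidlonghaulers", "title": "LDN update",
                            "selftext": "Month 3:\n\nfatigue  better", "created_utc": 1700000000.0}},
    {"kind": "t3", "data": {"id": "a1", "subreddit": "covidlonghaulers", "title": "LDN update",
                            "selftext": "duplicate", "created_utc": 1700000000.0}},
    {"kind": "t3", "data": {"id": "a2", "subreddit": "covidlonghaulers", "title": "",
                            "selftext": "[removed]", "created_utc": 1700000100.0}},
]}}
records = normalize_listing(sample)
print(records)
```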
Demographic analysis
Designed and implemented the demographic analysis pipeline.
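Many posters self-describe with shorthand like "34F" or "M29". A minimal sketch of pulling age and sex from that convention and tallying age bands, assuming the shorthand is the only demographic signal used. The regex, the 13 to 99 plausibility filter, and the function names are hypothetical, not the pipeline's actual rules.

```python
import re
from collections import Counter
from typing import Optional

# Matches compact self-descriptions such as "34F", "(29m)" or "M41".
# Word boundaries keep doses like "50mg" from matching.
AGE_SEX = re.compile(
    r"\b(?:(?P<age1>\d{2})(?P<sex1>[mf])|(?P<sex2>[mf])(?P<age2>\d{2}))\b",
    re.IGNORECASE,
)


def extract_age_sex(text: str) -> Optional[tuple[int, str]]:
    """Return (age, "F"/"M") from the first shorthand match, else None."""
    m = AGE_SEX.search(text)
    if m is None:
        return None
    age = int(m.group("age1") or m.group("age2"))
    sex = (m.group("sex1") or m.group("sex2")).upper()
    if not 13 <= age <= 99:  # hypothetical plausibility filter
        return None
    return age, sex


def age_band(age: int) -> str:
    lo = (age // 10) * 10
    return f"{lo}-{lo + 9}"


def demographic_table(posts: list[str]) -> Counter:
    """Count posts per (age band, sex), keeping non-reporters as 'unknown'."""
    counts: Counter = Counter()
    for text in posts:
        found = extract_age_sex(text)
        key = (age_band(found[0]), found[1]) if found else ("unknown", "unknown")
        counts[key] += 1
    return counts


print(demographic_table(["34F here", "M41 lurker", "no info"]))
```

Counting non-reporters explicitly keeps the denominator honest: posters who state their demographics may not be representative of the whole community.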
Analysis pipeline
Architected the SQL-to-results pipeline: LLM-driven analysis constrained by both guidelines and guardrails, with a focus on reproducible tests and results.
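One way to implement a guardrail for model-written SQL, sketched under assumptions: the database is SQLite, and the rail is SQLite's authorizer hook (exposed in Python as `Connection.set_authorizer`) permitting only read actions. `run_guarded` and the SHA-256 fingerprint of query plus rows are hypothetical additions for checking that a rerun reproduces the same result; the project description doesn't specify the actual database or rails.

```python
import hashlib
import json
import sqlite3

# Authorizer action codes that a read-only analysis query needs.
# getattr fallbacks are the numeric codes from the SQLite C API.
ALLOWED_ACTIONS = {
    sqlite3.SQLITE_SELECT,
    sqlite3.SQLITE_READ,
    getattr(sqlite3, "SQLITE_FUNCTION", 31),   # COUNT(), AVG(), ...
    getattr(sqlite3, "SQLITE_RECURSIVE", 33),  # WITH RECURSIVE
}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    return sqlite3.SQLITE_OK if action in ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


def allow_all(*_args):
    return sqlite3.SQLITE_OK


def run_guarded(conn: sqlite3.Connection, sql: str) -> tuple[list, str]:
    """Execute model-written SQL read-only; return rows and a result fingerprint."""
    conn.set_authorizer(read_only_authorizer)
    try:
        rows = conn.execute(sql).fetchall()
    finally:
        conn.set_authorizer(allow_all)  # restore normal access for trusted code
    payload = json.dumps({"sql": sql, "rows": rows}, default=str)
    return rows, hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Usage with a toy table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (treatment TEXT, outcome INTEGER)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [("treatment_a", 1), ("treatment_a", 0), ("treatment_b", -1)],
)
conn.commit()
query = "SELECT treatment, AVG(outcome) FROM reports GROUP BY treatment ORDER BY treatment"
rows, fingerprint = run_guarded(conn, query)
print(rows, fingerprint[:12])
```

The authorizer is enforced by SQLite when the statement is prepared, so it can't be sidestepped by unusual casing or comments the way a keyword blocklist can. The fingerprint is only stable if the query fixes its row order (here, `ORDER BY`).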
Wrote the Claude skill that instructs the model how to turn research questions into reproducible, publication-quality Jupyter notebooks, including chart standards, statistical requirements, and reporting conventions.
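The skill spec itself isn't reproduced here, but conventions like these can also be checked mechanically after generation. A minimal sketch, assuming notebooks are saved in the standard `.ipynb` (nbformat 4) JSON layout; `REQUIRED_SECTIONS`, `lint_notebook`, and `lint_file` are hypothetical stand-ins for whatever reporting conventions the skill actually enforces.

```python
import json

# Hypothetical reporting conventions; the real skill spec is not shown here.
REQUIRED_SECTIONS = ["Research question", "Methods", "Results", "Limitations"]


def cell_text(cell: dict) -> str:
    """nbformat stores `source` as either a string or a list of strings."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def lint_notebook(nb: dict) -> list[str]:
    """Return a list of convention violations (empty means the notebook passes)."""
    cells = nb.get("cells", [])
    headings = {
        line.lstrip("#").strip().lower()
        for cell in cells
        if cell.get("cell_type") == "markdown"
        for line in cell_text(cell).splitlines()
        if line.startswith("#")
    }
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s.lower() not in headings]

    # A clean "restart and run all" numbers non-empty code cells 1, 2, 3, ... in order.
    counts = [
        c.get("execution_count")
        for c in cells
        if c.get("cell_type") == "code" and cell_text(c).strip()
    ]
    if any(n is None for n in counts):
        problems.append("unexecuted code cell")
    elif counts != list(range(1, len(counts) + 1)):
        problems.append("cells not run top-to-bottom in a fresh kernel")
    return problems


def lint_file(path: str) -> list[str]:
    with open(path, encoding="utf-8") as fh:
        return lint_notebook(json.load(fh))
```

The execution-count check is a cheap proxy for reproducibility: a notebook whose cells were run out of order can show figures that a fresh run would not produce.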
Sample figures
Autogenerated by the Claude research skill from raw patient data. No hand-tuning: chart standards, axis labels, and statistical overlays are all produced by the skill spec.
Analysis notebooks
Eight research questions across three patient communities, covering 10,000+ treatment reports. Findings are preliminary and meant to demonstrate pipeline capability; they are not clinical conclusions.
Treatment Overview
r/covidlonghaulers · LDN, magnesium, and electrolytes ranked highest by community sentiment. SSRIs ranked lowest.
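How sentiment was aggregated into a ranking isn't described here. One common choice, sketched as an assumption, is to label each report positive or not and rank treatments by the lower bound of the Wilson score interval, so a treatment with two glowing reports doesn't outrank one with 80 positive reports out of 100. `wilson_lower`, `rank_treatments`, and the toy inputs are hypothetical.

```python
import math
from collections import defaultdict


def wilson_lower(positive: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a proportion (95% by default)."""
    if n == 0:
        return 0.0
    p = positive / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def rank_treatments(reports, min_n: int = 1):
    """reports: iterable of (treatment, is_positive). Returns [(treatment, score, n)], best first."""
    tally = defaultdict(lambda: [0, 0])  # treatment -> [positive count, total count]
    for treatment, positive in reports:
        tally[treatment][0] += int(positive)
        tally[treatment][1] += 1
    scored = [(t, wilson_lower(pos, n), n) for t, (pos, n) in tally.items() if n >= min_n]
    return sorted(scored, key=lambda row: row[1], reverse=True)


# Toy input: 80/100 positive reports versus 2/2.
reports = [("treatment_a", True)] * 80 + [("treatment_a", False)] * 20 + [("treatment_b", True)] * 2
for treatment, score, n in rank_treatments(reports):
    print(f"{treatment}: lower bound {score:.3f} (n={n})")
```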
POTS Preliminary
r/covidlonghaulers · The POTS subgroup tries twice as many treatments and reports consistently worse outcomes than the broader cohort.
POTS Treatment Strategy
r/covidlonghaulers · 4–6 concurrent treatments is the empirical sweet spot. Core stack: electrolytes, magnesium, LDN, antihistamines.
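A sketch of a stack-size analysis that could back a claim like this, assuming each user is reduced to the set of treatments they report running at once plus a single numeric outcome score. The bucket edges, `outcome_by_stack_size`, `core_stack`, and the toy users are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def stack_bucket(k: int) -> str:
    """Hypothetical bucket edges for the number of concurrent treatments."""
    if k <= 3:
        return "1-3"
    if k <= 6:
        return "4-6"
    return "7+"


def outcome_by_stack_size(users):
    """users: iterable of (set_of_treatments, outcome_score). Returns {bucket: (mean, n)}."""
    groups = defaultdict(list)
    for treatments, outcome in users:
        if treatments:  # users reporting no treatments are skipped
            groups[stack_bucket(len(treatments))].append(outcome)
    return {b: (mean(v), len(v)) for b, v in sorted(groups.items())}


def core_stack(users, bucket: str, top: int = 4):
    """Most common treatments among users whose stack falls in `bucket`."""
    counts = Counter()
    for treatments, _ in users:
        if treatments and stack_bucket(len(treatments)) == bucket:
            counts.update(treatments)
    return [name for name, _ in counts.most_common(top)]


# Toy users with placeholder treatment names.
users = [
    ({"t1"}, -0.5),
    ({"t1", "t2"}, 0.0),
    ({"t1", "t2", "t3", "t4"}, 0.5),
    ({"t1", "t2", "t3", "t4", "t5"}, 0.7),
    ({"t1", "t2", "t3", "t4", "t5", "t6", "t7"}, -0.2),
]
print(outcome_by_stack_size(users), core_stack(users, "4-6"))
```

Because the data is observational, users who stack more treatments may also differ in severity, so a bucket with the best mean outcome is a correlation, not a dosing recommendation.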
Fatigue Treatments
r/covidlonghaulers · LDN and magnesium lead outcomes for fatigue specifically. Findings consistent with the broader treatment analysis.
PSSD Harm Profile
r/PSSD · Sertraline and paroxetine are associated with the worst outcomes. Microdosed psilocybin helps; full doses hurt. Same compound, opposite outcomes by dose.
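A dose-dependent reversal like this is also why stratifying matters: pooling microdose and full-dose reports for the same compound can average two opposite effects toward zero. A sketch, assuming each report carries a compound, a dose label, and a -1/0/+1 outcome (all hypothetical encodings, with toy data), that computes both views and flags compounds whose dose strata disagree in sign.

```python
from collections import defaultdict
from statistics import mean


def pooled_and_stratified(reports):
    """reports: iterable of (compound, dose_label, outcome). Returns two dicts of means."""
    pooled, strata = defaultdict(list), defaultdict(list)
    for compound, dose, outcome in reports:
        pooled[compound].append(outcome)
        strata[(compound, dose)].append(outcome)
    return (
        {c: mean(v) for c, v in pooled.items()},
        {k: mean(v) for k, v in strata.items()},
    )


def dose_sign_flips(strata):
    """Compounds with at least one helpful and one harmful dose stratum."""
    by_compound = defaultdict(list)
    for (compound, _dose), m in strata.items():
        by_compound[compound].append(m)
    return sorted(
        c for c, means in by_compound.items()
        if any(m > 0 for m in means) and any(m < 0 for m in means)
    )


# Toy reports with placeholder compound names.
reports = [
    ("compound_x", "microdose", 1), ("compound_x", "microdose", 1), ("compound_x", "microdose", 0),
    ("compound_x", "full dose", -1), ("compound_x", "full dose", -1), ("compound_x", "full dose", 0),
    ("compound_y", "standard", -1), ("compound_y", "standard", -1),
]
pooled, strata = pooled_and_stratified(reports)
print(pooled["compound_x"], dose_sign_flips(strata))
```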
PSSD Recovery
r/PSSD · Antihistamines show the strongest recovery signal, suggesting a neuroinflammatory mechanism.
Abortion Experience
r/abortion · Support system predicts emotional outcomes better than medical method does. The dominant reported pattern is relief and guilt presenting together.
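The statistic used to compare predictors isn't named here. One simple option, shown as an assumption, is the correlation ratio η² (between-group sum of squares over total sum of squares), computed once per categorical factor against an emotional-outcome score; the factor with the larger η² explains more of the outcome variance. `eta_squared` and the toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def eta_squared(pairs):
    """Correlation ratio for (category, outcome) pairs: SS_between / SS_total."""
    values = [v for _, v in pairs]
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    groups = defaultdict(list)
    for category, v in pairs:
        groups[category].append(v)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in groups.values())
    return ss_between / ss_total


# Toy records: (support_system, method, outcome score).
records = [
    ("strong", "medication", 2), ("strong", "procedure", 2),
    ("weak", "medication", -1), ("weak", "procedure", -2),
]
support_eta = eta_squared([(s, y) for s, _, y in records])
method_eta = eta_squared([(m, y) for _, m, y in records])
print(round(support_eta, 3), round(method_eta, 3))
```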
Negative Predictors
r/covidlonghaulers · Analysis of factors that predict worse treatment outcomes across the long COVID cohort.
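For a binary factor (for example, whether a user reports a given comorbidity) against a binary worse-outcome flag, a standard screening statistic is the odds ratio from a 2×2 table with a Woolf log-scale confidence interval. A sketch under that assumption; the factor encoding, `odds_ratio`, `negative_predictors`, and the toy counts are hypothetical, and the method the notebook actually used isn't stated.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Woolf CI for a 2x2 table.

    a: factor present, worse outcome    b: factor present, not worse
    c: factor absent, worse outcome     d: factor absent, not worse
    Adds 0.5 to every cell (Haldane correction) if any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(ratio)
    return ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


def negative_predictors(tables):
    """Factors whose CI lies entirely above 1 (worse outcomes), strongest first."""
    flagged = []
    for factor, counts in tables.items():
        ratio, lo, hi = odds_ratio(*counts)
        if lo > 1:
            flagged.append((factor, round(ratio, 2), round(lo, 2), round(hi, 2)))
    return sorted(flagged, key=lambda row: row[1], reverse=True)


# Toy 2x2 counts per factor: (a, b, c, d).
tables = {"factor_a": (30, 20, 20, 30), "factor_b": (12, 13, 11, 14)}
print(negative_predictors(tables))
```

Screening many factors this way inflates false positives, so a multiple-comparison correction (for example, Bonferroni on the CI level) would be needed before calling any factor a predictor.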