
Team Project · Python · Claude API · Data Pipeline

PatientPunk

Millions of patients document their treatment experiences online. PatientPunk aggregates those reports into a structured research database, turning patient self-reports into queryable, analysis-ready data for researchers studying understudied conditions.

First place: Human Enhancement Hackathon

Organized by Biopunk Lab, a San Francisco community biology space focused on democratizing access to biotechnology, the hackathon challenged teams to build at the intersection of biotech and human health. PatientPunk won for its approach to turning patient-generated data into structured research signal.


The pipeline ingests posts from Reddit and patient forums, normalizes qualitative markers into structured records, uses Claude to extract symptoms and map them to medical ontologies (MeSH, SNOMED), and outputs the results as CSV, SQL, and a REST API. For conditions like ME/CFS and long COVID, patient testimony is often the only available signal about real-world treatment outcomes.

Python · SQLite · OpenRouter · Jupyter · Reddit scraper · MeSH / SNOMED
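As a rough illustration of the ontology-mapping and CSV-export stages, here is a minimal sketch that assumes Claude has already returned a list of symptom phrases for each post. The page says Claude performs the mapping itself; this sketch swaps in a plain lookup table (`SYMPTOM_MAP`, hypothetical, with illustrative MeSH-style labels) so the shape of the output records is visible. The field names `post_id`, `treatment`, and `symptoms` are also assumptions.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Hypothetical lookup from patient phrasing to canonical ontology labels.
# Illustrative only; a real mapping would be built from MeSH/SNOMED releases.
SYMPTOM_MAP = {
    "brain fog": "Cognitive Dysfunction",
    "exhaustion": "Fatigue",
    "tired all the time": "Fatigue",
    "headaches": "Headache",
}


def map_symptoms(phrases: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Map extracted phrases to canonical labels; unmapped phrases are kept for review."""
    mapped, unmapped = [], []
    for phrase in phrases:
        key = " ".join(phrase.lower().split())  # case- and whitespace-insensitive lookup
        label = SYMPTOM_MAP.get(key)
        if label is None:
            unmapped.append(phrase)
        elif label not in mapped:  # several phrasings can collapse to one label
            mapped.append(label)
    return mapped, unmapped


def to_csv(records: Iterable[dict]) -> str:
    """Write one CSV row per post, with mapped labels joined by semicolons."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["post_id", "treatment", "symptoms", "unmapped"])
    writer.writeheader()
    for rec in records:
        mapped, unmapped = map_symptoms(rec["symptoms"])
        writer.writerow({
            "post_id": rec["post_id"],
            "treatment": rec["treatment"],
            "symptoms": ";".join(mapped),
            "unmapped": ";".join(unmapped),
        })
    return buf.getvalue()
```

Keeping unmapped phrases in their own column, rather than discarding them, leaves a trail for extending the mapping later.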

My contributions

Scrapers

Built the Reddit ingestion scrapers that pull and normalize patient posts at scale.
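The scrapers themselves are not shown here. A minimal sketch of the normalize step, assuming each raw post is a dict carrying the `id`, `title`, `selftext`, `created_utc`, and `subreddit` fields that Reddit's JSON listings expose (fetching is out of scope, and the output field names are hypothetical):

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional

# Bodies Reddit substitutes for moderator-removed or user-deleted posts.
PLACEHOLDER_BODIES = {"[removed]", "[deleted]"}


def normalize_post(raw: dict) -> Optional[dict]:
    """Flatten one raw post into a record, or return None if it has no usable text."""
    body = (raw.get("selftext") or "").strip()
    if body in PLACEHOLDER_BODIES:
        body = ""
    # Join title and body, collapsing runs of whitespace and newlines to single spaces.
    text = " ".join(f"{raw.get('title') or ''} {body}".split())
    if not text:
        return None
    created = datetime.fromtimestamp(float(raw["created_utc"]), tz=timezone.utc)
    return {
        "post_id": raw["id"],
        "subreddit": (raw.get("subreddit") or "").lower(),
        "created_at": created.isoformat(),
        "text": text,
    }


def normalize_batch(raw_posts: Iterable[dict]) -> List[dict]:
    """Normalize posts, dropping empty ones and duplicates by post id (first copy wins)."""
    seen = set()
    records = []
    for raw in raw_posts:
        record = normalize_post(raw)
        if record is None or record["post_id"] in seen:
            continue
        seen.add(record["post_id"])
        records.append(record)
    return records
```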

Demographic analysis

Designed and implemented the demographic analysis pipeline.
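The page does not say how demographics are derived. One plausible approach for Reddit text, sketched here as a hypothetical heuristic and not the project's method, is to parse the age/sex tags users often write about themselves, such as "34F" or "(28M)", and bucket them into age bands. It will produce false positives (a distance like "50m" reads as a 50-year-old man) and only captures binary sex tags.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical heuristic: matches self-descriptions like "34F", "(28M)" or "F34".
# Requiring the digits and letter to be adjacent cuts some, not all, false positives.
AGE_SEX = re.compile(
    r"\b(?:(?P<age1>\d{2})(?P<sex1>[mf])|(?P<sex2>[mf])(?P<age2>\d{2}))\b",
    re.IGNORECASE,
)


def extract_age_sex(text: str) -> Optional[Tuple[int, str]]:
    """Return the first plausible (age, sex) self-report in text, or None."""
    for m in AGE_SEX.finditer(text):
        age = int(m.group("age1") or m.group("age2"))
        sex = (m.group("sex1") or m.group("sex2")).upper()
        if 13 <= age <= 99:  # assumed plausibility bounds
            return age, sex
    return None


def summarize_demographics(texts: Iterable[str]) -> Counter:
    """Count posts by (age band, sex); posts without a self-report count as unknown."""
    counts = Counter()
    for text in texts:
        found = extract_age_sex(text)
        if found is None:
            counts[("unknown", "unknown")] += 1
        else:
            age, sex = found
            counts[(f"{age // 10 * 10}s", sex)] += 1
    return counts
```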

Analysis pipeline

Architected the SQL-to-results pipeline: LLM analysis constrained by both guidelines and guardrails, with a focus on reproducible tests and results.
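The page does not spell out what the guardrails are. A minimal sketch of one kind of hard guardrail, assuming the model is asked to return JSON with a `treatment` and an `outcome` label drawn from a fixed set (the keys, labels, and table name are all hypothetical): any response that fails validation is rejected before it reaches SQLite, so downstream SQL only ever aggregates well-formed rows.

```python
import json
import sqlite3
from typing import Iterable, List, Optional, Tuple

# Hypothetical label set; the project's actual categories are not documented here.
ALLOWED_OUTCOMES = {"improved", "no_change", "worsened", "unclear"}
REQUIRED_KEYS = {"treatment", "outcome"}


def validate_extraction(raw: str) -> Optional[dict]:
    """Guardrail: accept a model response only if it is JSON with the expected keys and labels."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    treatment, outcome = data["treatment"], data["outcome"]
    if not isinstance(treatment, str) or not treatment.strip():
        return None
    if not isinstance(outcome, str) or outcome not in ALLOWED_OUTCOMES:
        return None
    return {"treatment": treatment.strip().lower(), "outcome": outcome}


def load_validated(conn: sqlite3.Connection, responses: Iterable[Tuple[str, str]]) -> List[str]:
    """Insert validated rows; return the post ids whose responses were rejected."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extractions "
        "(post_id TEXT PRIMARY KEY, treatment TEXT NOT NULL, outcome TEXT NOT NULL)"
    )
    rejected = []
    for post_id, raw in responses:
        record = validate_extraction(raw)
        if record is None:
            rejected.append(post_id)
            continue
        conn.execute(
            "INSERT OR REPLACE INTO extractions VALUES (?, ?, ?)",
            (post_id, record["treatment"], record["outcome"]),
        )
    conn.commit()
    return rejected


def outcome_counts(conn: sqlite3.Connection, treatment: str) -> dict:
    """Aggregate validated outcomes for one treatment with a plain SQL query."""
    rows = conn.execute(
        "SELECT outcome, COUNT(*) FROM extractions WHERE treatment = ? "
        "GROUP BY outcome ORDER BY outcome",
        (treatment.lower(),),
    )
    return dict(rows.fetchall())
```

Returning the rejected ids, rather than silently dropping them, is one way to keep reruns auditable: the same inputs should yield the same rejections and the same counts.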

Claude research skill

Wrote the Claude skill that instructs the model how to turn research questions into reproducible, publication-quality Jupyter notebooks, including chart standards, statistical requirements, and reporting conventions.

Sample figures

Autogenerated by the Claude research skill from raw patient data. No hand-tuning: chart standards, axis labels, and statistical overlays are all produced by the skill spec.

Figures: Sensitivity analysis · Line chart with error margins · Wilson score interval
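One of the sample figures is labeled "Wilson score interval". That is a standard confidence interval for a proportion (for example, the share of reports describing improvement on a treatment), and it behaves better than the normal approximation when counts are small or the proportion is near 0 or 1. A minimal standard-library sketch, not the project's own code:

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple:
    """Wilson score interval for the binomial proportion successes / n.

    center = (p + z^2 / 2n) / (1 + z^2 / n)
    half   = z / (1 + z^2 / n) * sqrt(p(1 - p) / n + z^2 / 4n^2)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)
```

For instance, `wilson_interval(0, 10)` gives roughly (0, 0.28), whereas the normal-approximation interval collapses to (0, 0) when no successes are observed.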

Analysis notebooks

Eight research questions across three patient communities, covering 10,000+ treatment reports. Findings are preliminary: they are meant to demonstrate pipeline capability, not to support clinical conclusions.