HypatiaX: LLM-Guided Symbolic Discovery
Welcome to the complete HypatiaX tutorial series! Learn how to combine large language models with symbolic regression to discover scientific equations from data with near-perfect extrapolation.
🎯 What is HypatiaX?
HypatiaX is a groundbreaking framework that combines:
- Large Language Models (LLMs) for intelligent initialization
- Symbolic Regression for mathematically rigorous discovery
- Multi-layer Validation for ensuring correctness
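A toy pipeline can show how these three components fit together. This is a minimal sketch under stated assumptions, not HypatiaX's implementation. The "LLM" is a hard-coded stub (`propose_candidates` is a hypothetical name). The three validation layers (finite outputs, in-range R² ≥ 0.9999, out-of-range relative error ≤ 10⁻⁶) and their thresholds were chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float, float]  # (current I, resistance R, voltage V)
Formula = Callable[[float, float], float]


def propose_candidates() -> Dict[str, Formula]:
    """Stub for the LLM step: a fixed list of candidate formulas (hypothetical)."""
    return {
        "I + R": lambda i, r: i + r,
        "I * R * (1 + 1e-4 * I)": lambda i, r: i * r * (1 + 1e-4 * i),
        "I * R": lambda i, r: i * r,
    }


def r2(y: List[float], y_hat: List[float]) -> float:
    """Coefficient of determination."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def validate(f: Formula, train: List[Point], test: List[Point]) -> Tuple[bool, str]:
    """Toy three-layer cascade; any layer can reject the candidate."""
    preds = [f(i, r) for i, r, _ in train]
    if not all(math.isfinite(p) for p in preds):  # layer 1: numerical sanity
        return False, "non-finite output"
    if r2([v for _, _, v in train], preds) < 0.9999:  # layer 2: in-range fit
        return False, "poor fit"
    rel = [abs(f(i, r) - v) / abs(v) for i, r, v in test]  # layer 3: extrapolation
    if max(rel) > 1e-6:
        return False, "fails extrapolation"
    return True, "accepted"


# Training data with I, R in [1, 5]; test data far outside that range (V = I * R).
train = [(i, r, i * r) for i in (1.0, 2.0, 3.0, 4.0, 5.0) for r in (1.0, 3.0, 5.0)]
test = [(i, r, i * r) for i in (50.0, 100.0) for r in (20.0, 80.0)]

results = {name: validate(f, train, test) for name, f in propose_candidates().items()}
accepted = [name for name, (ok, _) in results.items() if ok]
print(accepted)  # ['I * R']
```

The nearly correct candidate `I * R * (1 + 1e-4 * I)` passes the in-range check and is rejected only by the extrapolation layer. This is the failure mode that layer is designed to catch.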
Key results from the JMLR paper:
- ✅ 95.8% success rate on 131 scientific equations
- ✅ Median extrapolation error < 10⁻¹² (near the floating-point precision limit)
- ✅ Complete statistical separation from neural networks (Mann-Whitney U=0, p<10⁻⁶)
- ✅ New state of the art on the Feynman SR Benchmark: 96.7% exact recovery at R²>0.9999 (Hybrid DeFi, +17.4 pp over AI Feynman 2.0)
- ✅ Mean discovery time: 390 seconds per equation
Tutorial Series
Tutorial 1: Environment Setup and First Discovery
Time: 15 minutes | Difficulty: Beginner
Learn to install HypatiaX and discover your first equation (Ohm's law). Includes:
- Installation and verification
- Your first formula discovery
- Extrapolation validation
- Understanding the core advantage: symbolic vs neural
What you'll discover:
# HypatiaX: Median error 2.34e-13 (near floating-point precision!)
# Neural Network: Median error 12.47 (1,247% error!)
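The gap between these two numbers comes from how each model behaves outside its training range. The following is a hedged, self-contained illustration, not the tutorial's code. It assumes the form V = k·I has already been discovered, with an assumed fixed resistance of 10 Ω. It estimates k on currents in [0.1, 1] A and evaluates on [10, 100] A. The neural network is replaced by a deliberately simple black-box stand-in, `np.interp`, which holds the last training value constant beyond the data. It is not the network used in the tutorial.

```python
import numpy as np

R_OHMS = 10.0  # assumed fixed resistance for this toy example


def ohm(current):
    """Ground truth: V = I * R."""
    return current * R_OHMS


def median_rel_error(y_true, y_pred):
    return float(np.median(np.abs(y_pred - y_true) / np.abs(y_true)))


rng = np.random.default_rng(0)
i_train = np.sort(rng.uniform(0.1, 1.0, 200))  # training currents (A)
v_train = ohm(i_train)
i_test = np.linspace(10.0, 100.0, 50)          # far outside the training range
v_test = ohm(i_test)

# Symbolic route: the form V = k * I is known, so only k is estimated
# (least squares through the origin).
k = np.dot(i_train, v_train) / np.dot(i_train, i_train)
v_symbolic = k * i_test

# Black-box stand-in: np.interp returns the last training value for every
# input beyond the data, so it cannot follow the linear trend.
v_interp = np.interp(i_test, i_train, v_train)

err_symbolic = median_rel_error(v_test, v_symbolic)
err_interp = median_rel_error(v_test, v_interp)
print(f"symbolic:      {err_symbolic:.2e}")
print(f"interpolation: {err_interp:.2e}")
```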
Tutorial 2: Running Benchmark Experiments
Time: 45 minutes (active) + 3–8 hours (compute) | Difficulty: Intermediate
Reproduce the three benchmark evaluations from the JMLR paper:
| Benchmark | Equations | Primary metric | Section |
|---|---|---|---|
| Core 15 | 15 across 4 domains | Extrapolation error (%) | §6.4 |
| DeFi Extrapolation | 73 test cases (66 standard) | R²>0.99 at fixed n=66 | §6.5 |
| Feynman SR | 30-equation subset | Recovery rate at R²>0.9999 | §5.8 |
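The metrics in the table rest on the standard coefficient of determination, R² = 1 − SS_res/SS_tot, applied at different thresholds. The sketch below is illustrative and assumes that "fixed n=66" means the success rate is always divided by the 66 standard cases, so crashed or timed-out runs count as failures. That reading is an assumption. The 29/30 example shows how a 96.7% recovery rate arises on a 30-equation subset. The scores themselves are made up.

```python
from typing import List, Optional, Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def recovery_rate(scores: List[Optional[float]], threshold: float, n: int) -> float:
    """Share of n cases whose R^2 exceeds `threshold`.

    A None score marks a run that crashed or timed out. It counts as a
    failure because the denominator stays fixed at n.
    """
    hits = sum(1 for s in scores if s is not None and s > threshold)
    return hits / n


# Hypothetical Feynman-style campaign: 29 of 30 equations clear R^2 > 0.9999.
feynman_scores = [1.0] * 29 + [0.998]
print(f"{recovery_rate(feynman_scores, 0.9999, n=30):.1%}")  # 96.7%

# Hypothetical DeFi-style campaign: a crashed run still counts against n = 66.
defi_scores = [0.995] * 60 + [0.97] * 5 + [None]
print(f"{recovery_rate(defi_scores, 0.99, n=66):.1%}")  # 90.9%
```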
Key experiments:
- Run individual benchmark campaigns
- Compare discovery systems (Pure Symbolic, Hybrid, Pure LLM)
- Parallel execution for faster results
- Checkpoint/resume functionality
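Checkpoint/resume and parallel execution can be combined in a few lines. This sketch rests on several assumptions. The checkpoint file name, the `discover` placeholder, and the `run_campaign` helper are hypothetical and not part of HypatiaX's API. Results are written to a JSON file after every completed equation, so a restarted campaign skips work that is already done. Threads keep the example self-contained. CPU-heavy discovery runs would more typically use processes.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, Iterable

CHECKPOINT = "campaign_checkpoint.json"  # hypothetical file name


def load_checkpoint(path: str) -> Dict[str, dict]:
    """Return finished results from a previous run, or an empty dict."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {}


def save_checkpoint(path: str, done: Dict[str, dict]) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(done, fh, indent=2)
    os.replace(tmp, path)  # atomic rename: never leaves a half-written file


def discover(equation_id: str) -> dict:
    """Placeholder for one discovery run (hypothetical)."""
    return {"equation": equation_id, "r2": 1.0}


def run_campaign(equation_ids: Iterable[str], path: str = CHECKPOINT,
                 workers: int = 4) -> Dict[str, dict]:
    done = load_checkpoint(path)
    pending = [e for e in equation_ids if e not in done]  # resume: skip finished
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(discover, e): e for e in pending}
        for fut in as_completed(futures):  # consumed in the main thread only
            done[futures[fut]] = fut.result()
            save_checkpoint(path, done)  # persist after every finished equation
    return done
```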
Tutorial 3: Statistical Analysis and Publication Figures
Time: 45 minutes | Difficulty: Intermediate
Generate all 13 publication-quality figures and reproduce the statistical analyses, including:
- Figure 2: Success rate comparison across domains
- Figure 3: Arrhenius equation extrapolation failure
- Figure 6: Validation cascade breakdown
- Figure 9: Five-system unified comparison
- Figures 11–13: DeFi extrapolation benchmark (new in v2)
Statistical validation:
Mann-Whitney U test: U=0, p<10⁻⁶
Cohen's d: 0.95 (pooled; see the paper's note on degenerate distributions)
95% CI for neural error: [1,087%, 1,456%]
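Both statistics can be computed directly from their definitions. This minimal sketch uses made-up error values, not data from the paper. For the symbolic sample, U counts the symbolic/neural pairs in which the symbolic error is the larger one, so U=0 means complete separation. For p-values, `scipy.stats.mannwhitneyu` computes the same statistic with exact or asymptotic significance tests.

```python
import math
import statistics
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for x: pairs (x_i, y_j) with x_i > y_j, ties counted as 1/2.

    U = 0 means every value in x is below every value in y (complete separation).
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def cohens_d_pooled(x: Sequence[float], y: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(y) - statistics.mean(x)) / math.sqrt(pooled_var)


# Hypothetical relative errors: tiny symbolic errors, widely spread neural errors.
symbolic = [1e-13, 2e-13, 3e-13]
neural = [0.5, 1.0, 30.0]

print(mann_whitney_u(symbolic, neural))  # 0.0
print(round(cohens_d_pooled(symbolic, neural), 2))  # 0.88
```

The toy data shows how complete separation (U=0) can coexist with a Cohen's d below 1. The measure d divides the mean gap by the pooled standard deviation, and a single large neural error inflates that deviation. This is offered as intuition about rank-based versus variance-based measures, not as the paper's analysis.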
Tutorial 4: Custom Applications and Extensions
Time: 45 minutes | Difficulty: Advanced
Apply HypatiaX to your own scientific problems:
Six complete real-world examples:
- Materials Science: Discover the Hall-Petch relationship (yield stress)
- Environmental Science: CO₂ sequestration formulas
- Scikit-learn Integration: Drop-in replacement for ML pipelines
- Multi-Equation Systems: Discover coupled ODEs (predator-prey)
- Production Deployment: REST API + Docker
- Custom Operators: Add domain-specific mathematical functions
Production-ready code included!
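In the materials-science example, the Hall-Petch relation links yield stress to grain size d as σ_y = σ_0 + k·d^(-1/2). The sketch below is hypothetical and is not HypatiaX's estimator. It fixes this functional form in advance and fits only σ_0 and k by linear least squares. The aim is to show what a scikit-learn-style `fit`/`predict` interface around a symbolic formula can look like. Symbolic regression, by contrast, searches for the form itself. The class name and the synthetic parameters are illustrative.

```python
import numpy as np


class HallPetchRegressor:
    """Fits sigma_y = sigma_0 + k / sqrt(d) with the fit/predict convention (hypothetical)."""

    def fit(self, X, y):
        d = np.asarray(X, dtype=float).ravel()  # grain size, single feature
        A = np.column_stack([np.ones_like(d), d ** -0.5])
        coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
        self.sigma0_, self.k_ = float(coef[0]), float(coef[1])
        return self  # returning self allows chaining, as in scikit-learn

    def predict(self, X):
        d = np.asarray(X, dtype=float).ravel()
        return self.sigma0_ + self.k_ * d ** -0.5

    def equation(self) -> str:
        return f"sigma_y = {self.sigma0_:.4g} + {self.k_:.4g} / sqrt(d)"


# Synthetic data from made-up parameters (sigma_0 = 70, k = 600); not measurements.
d_train = np.linspace(1.0, 100.0, 40).reshape(-1, 1)
y_train = 70.0 + 600.0 / np.sqrt(d_train.ravel())

model = HallPetchRegressor().fit(d_train, y_train)
print(model.equation())  # sigma_y = 70 + 600 / sqrt(d)
print(model.predict([[0.25]]))  # finer grains than any seen in training
```

To work with scikit-learn utilities such as `clone` or `cross_val_score`, a real estimator would typically subclass `sklearn.base.BaseEstimator` and `RegressorMixin`. Both are omitted here to keep the example dependency-light.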
Quick Start Path
New to symbolic discovery? Follow this path:
Week 1: Tutorial 1 (15 min)
→ Install and verify
→ Discover Ohm's law
→ Understand extrapolation
Week 2: Tutorial 2 (8–12 hours)
→ Run Core 15 and DeFi benchmarks
→ Compare with baselines
→ Understand discovery paths
Week 3: Tutorial 3 (2 hours)
→ Generate all 13 figures
→ Validate statistics (v2 corrected)
→ Create publication materials
Week 4: Tutorial 4 (varies)
→ Apply to your domain
→ Customize and extend
→ Deploy in production
What You'll Learn
Core Concepts:
- Health factor calculations in DeFi
- Liquidation thresholds and zombie positions
- Symbolic vs neural extrapolation
- Multi-layer validation cascades
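In lending protocols such as Aave, a position's health factor is computed by multiplying each collateral's value by its liquidation threshold, summing the results, and dividing by the value of the debt. The position can be liquidated once this ratio falls below 1. This minimal sketch assumes that Aave-style definition. The tutorial's DeFi formulas may differ in detail, and zombie positions are not modeled. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Collateral:
    value_usd: float
    liquidation_threshold: float  # e.g. 0.80: 80% of the value backs the debt


def health_factor(collateral: List[Collateral], debt_usd: float) -> float:
    """Aave-style HF = sum(value_i * threshold_i) / debt (assumed definition)."""
    if debt_usd <= 0:
        return float("inf")  # no debt, nothing to liquidate
    weighted = sum(c.value_usd * c.liquidation_threshold for c in collateral)
    return weighted / debt_usd


position = [Collateral(10_000.0, 0.80), Collateral(5_000.0, 0.70)]
hf = health_factor(position, debt_usd=9_000.0)
print(f"HF = {hf:.3f}, liquidatable: {hf < 1.0}")  # HF = 1.278, liquidatable: False

# A 40% drop in the first asset's price pushes the position below 1.
after_drop = [Collateral(6_000.0, 0.80), Collateral(5_000.0, 0.70)]
print(f"HF = {health_factor(after_drop, 9_000.0):.3f}")  # HF = 0.922
```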
Technical Skills:
- Python symbolic regression (PySR)
- Julia backend integration
- LLM-guided initialization
- Production deployment patterns
Research Methods:
- Benchmark design and execution
- Statistical validation (Mann-Whitney U, effect sizes)
- Publication-quality visualization
- Reproducibility best practices
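One common reproducibility habit is to fix random seeds and save a small manifest of the software environment alongside every run. In this sketch, `seed_everything` and `run_manifest` are hypothetical helper names, and the package list is an assumption. PySR's `PySRRegressor` has its own `random_state` and `deterministic` options, which global seeding does not replace. Check the PySR documentation for the settings your version requires.

```python
import json
import platform
import random
import sys
from importlib import metadata

import numpy as np


def seed_everything(seed: int) -> None:
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)


def run_manifest(seed: int, packages=("numpy", "pysr", "sympy")) -> dict:
    """Record what is needed to rerun an experiment under the same conditions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed
    return {
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


seed_everything(42)
print(json.dumps(run_manifest(42), indent=2))
```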
🎯 Prerequisites
Minimum requirements:
- Python 3.8+
- 4GB RAM
- Basic command line knowledge
- Understanding of mathematical formulas
Optional (for LLM features):
- Anthropic API key (for LLM-guided initialization, which gives a 73% speedup)
- 8GB+ RAM for parallel execution
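A quick pre-flight check against these requirements can be scripted with the standard library. In this sketch, `check_environment` is a hypothetical helper, and the Python 3.8 and 4 GB thresholds come from the list above. `ANTHROPIC_API_KEY` is the variable the official Anthropic Python SDK reads by default. Whether HypatiaX reads the key the same way is an assumption.

```python
import os
import sys


def check_environment(min_python=(3, 8), min_ram_gb=4.0) -> dict:
    """Compare this machine against the tutorial's minimum requirements."""
    report = {"python_ok": sys.version_info >= min_python}
    # The official Anthropic Python SDK looks for the key in this variable.
    report["llm_enabled"] = bool(os.environ.get("ANTHROPIC_API_KEY"))
    try:
        total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        report["ram_gb"] = round(total / 1024 ** 3, 1)
        report["ram_ok"] = total / 1024 ** 3 >= min_ram_gb
    except (AttributeError, ValueError, OSError):
        # os.sysconf does not exist on Windows; report RAM as unknown.
        report["ram_gb"] = None
        report["ram_ok"] = None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key:12} {value}")
```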
What's Included
Each tutorial provides:
- ✅ Complete working code (copy-paste ready)
- ✅ Real examples (not toy problems)
- ✅ Expected outputs (verify your results)
- ✅ Troubleshooting (common issues solved)
- ✅ Quick reference (commands at a glance)
Key Advantages
Why HypatiaX?
1. Near-Perfect Extrapolation
Symbolic (HypatiaX): Median error < 10⁻¹²
Neural Networks: Median error 1,231%
Complete statistical separation: U=0, p<10⁻⁶
2. Mathematical Rigor
- Discovers exact symbolic formulas
- Not black-box approximations
- Interpretable and verifiable
3. Domain-Agnostic
- Physics, chemistry, biology
- Economics, finance
- Any domain with mathematical relationships
4. Production-Ready
- API deployment examples
- Docker containerization
- Scikit-learn integration
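A REST endpoint around a discovered formula needs very little code. This standard-library sketch is illustrative only. The `/predict` route, the JSON field names, and the choice of the Arrhenius form k = A·exp(−E_a/(R·T)) as the "discovered" model are assumptions, not HypatiaX's deployment API. The request logic lives in a pure function so it can be unit-tested without starting a server.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

GAS_CONSTANT = 8.314  # J/(mol*K)


def discovered_model(x: dict) -> float:
    """Hypothetical discovered formula: Arrhenius rate k = A * exp(-Ea / (R * T))."""
    return x["A"] * math.exp(-x["Ea"] / (GAS_CONSTANT * x["T"]))


def handle_predict(body: bytes) -> Tuple[int, bytes]:
    """Pure request handler: JSON in, (HTTP status, JSON bytes) out."""
    try:
        payload = json.loads(body)
        y = discovered_model(payload)
    except (ValueError, KeyError, TypeError, ArithmeticError) as exc:
        return 400, json.dumps({"error": str(exc)}).encode()
    return 200, json.dumps({"prediction": y}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, out = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)


def serve(port: int = 8000) -> None:
    """Start a blocking server; nothing runs on import."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve()` starts a blocking server on port 8000, which a Docker image could use as its entry point. Production deployments more commonly use a framework such as FastAPI or Flask behind a dedicated application server.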
Additional Resources
Paper & Code:
- JMLR Paper (2026)
- GitHub Repository
- Full Documentation
Community:
- Discussions
- Report Issues
- Academic Citations
Related Work:
- PySR (Python Symbolic Regression)
- SymbolicRegression.jl (Julia backend)
- AI Feynman (physics-inspired discovery)
Citation
If you use HypatiaX in your research:
@article{bonetchaple2026hypatiax,
title={Why Extrapolation Breaks Na{\"i}ve Analytical Discovery},
author={Bonet Chaple, Ruperto Pedro},
journal={Journal of Machine Learning Research},
year={2026},
volume={27},
pages={1--47}
}
Getting Started
Ready to begin? Start with Tutorial 1: Environment Setup and First Discovery!
Or jump to:
- Tutorial 2: Benchmark Experiments (if already installed)
- Tutorial 3: Analysis & Figures (if experiments complete)
- Tutorial 4: Custom Applications (if ready to build)
💡 Support
Need help?
- Check the troubleshooting sections in each tutorial
- Ask questions in GitHub Discussions
- Report bugs via GitHub Issues
Let's discover some equations! 🧪🔬✨