HypatiaX: LLM-Guided Symbolic Discovery

Welcome to the complete HypatiaX tutorial series! Learn how to combine LLMs with symbolic regression to discover scientific equations from data, with a median extrapolation error below 10⁻¹² on the Core-15 benchmark.

🎯 What is HypatiaX?

HypatiaX is a framework that combines:

Large Language Models (LLMs) for intelligent initialization
Symbolic Regression for mathematically rigorous discovery
Multi-layer Validation for ensuring correctness

v2 Note (March 2026): A bug in the evaluate_llm_formula measurement harness was corrected before final paper submission. All benchmark commands require the --v2 flag. Results generated before March 2026 must be regenerated. See Tutorial 2 for full details.

Key Results from JMLR Paper:

✅ 89.2% near-perfect success rate (R²>0.99) on 74 DeFi tasks — +27 pp over pure LLM (62.2%)
✅ Median extrapolation error < 10⁻¹² (floating-point precision limit, Core-15 benchmark)
✅ Complete statistical separation from neural networks (Mann-Whitney U=0, p<10⁻⁶)
✅ Feynman SR Benchmark: 9/30 (30.0%) under an aggressive PCA-directed extrapolation protocol, comparable to AI Feynman 2.0 under equivalent conditions
✅ 1.73× median speedup over neural-network inference (LLM-routed cases, 68 of 74 tasks)

📚 Tutorial Series

Tutorial 1: Environment Setup and First Discovery

Time: 15 minutes | Difficulty: Beginner

Learn to install HypatiaX and discover your first equation (Ohm’s Law). Includes:

Installation and verification
Your first formula discovery
Extrapolation validation
Understanding the core advantage: symbolic vs neural

What you’ll discover:

# HypatiaX: Median error 2.34e-13 (near floating-point precision!)
# Neural Network: Median error 12.47 (1,247% error!)
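
To make the symbolic-versus-neural contrast concrete, here is a minimal sketch that does not use HypatiaX itself. It assumes synthetic Ohm's Law data with a hypothetical resistance of 4.7 Ω, fits the one-parameter symbolic form V = R·I by least squares, and compares it outside the training range with a nearest-neighbour lookup. The lookup is only a stand-in for a model that can interpolate but not extrapolate; it is not a neural network, and the numbers it prints are not the tutorial's results.

```python
import statistics

R_TRUE = 4.7  # hypothetical resistance in ohms (assumption for this sketch)

# Training data: currents from 0.1 A to 1.0 A with exact Ohm's Law voltages
train_i = [0.1 * k for k in range(1, 11)]
train_v = [R_TRUE * i for i in train_i]

# Symbolic candidate V = R * I: closed-form least-squares estimate of R
r_hat = sum(i * v for i, v in zip(train_i, train_v)) / sum(i * i for i in train_i)


def symbolic_model(current):
    """Evaluate the fitted formula V = r_hat * I."""
    return r_hat * current


def lookup_model(current):
    """Return the voltage of the closest training current (interpolation-only stand-in)."""
    nearest = min(range(len(train_i)), key=lambda k: abs(train_i[k] - current))
    return train_v[nearest]


# Extrapolation regime: currents from 5 A to 10 A, far outside the training range
test_i = [5.0 + 0.5 * k for k in range(11)]


def median_relative_error(model):
    """Median of |prediction - truth| / |truth| over the extrapolation set."""
    errors = [abs(model(i) - R_TRUE * i) / (R_TRUE * i) for i in test_i]
    return statistics.median(errors)


symbolic_error = median_relative_error(symbolic_model)
lookup_error = median_relative_error(lookup_model)
print(f"symbolic formula: {symbolic_error:.2e}")
print(f"lookup stand-in:  {lookup_error:.2e}")
```

Because the symbolic model carries the functional form, its error outside the training range is limited only by floating-point rounding, while the lookup keeps returning the last voltage it saw.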

Tutorial 2: Running Benchmark Experiments

Time: 45 minutes (active) + 3–8 hours (compute) | Difficulty: Intermediate

Reproduce the three benchmark evaluations from the JMLR paper:

Benchmark            Equations             Primary metric               Paper section
Core 15              15 across 4 domains   Extrapolation error (%)      §6.4
DeFi Extrapolation   74 test cases         R²>0.99 at fixed n=74        §6.5
Feynman SR           30-equation subset    Recovery rate at R²>0.9999   §5.8
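
Two of these metrics are thresholds on R². As a rough illustration of the bookkeeping, the sketch below assumes the standard coefficient of determination, R² = 1 − SS_res/SS_tot, and counts a task as a success when its R² exceeds the threshold. The helper names and toy data are illustrative and are not part of HypatiaX.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def success_rate(r2_scores, threshold=0.99):
    """Fraction of tasks whose R² strictly exceeds the threshold."""
    return sum(r2 > threshold for r2 in r2_scores) / len(r2_scores)


# Toy data: one exact prediction and one poor prediction
y_true = [1.0, 2.0, 3.0, 4.0]
exact = r_squared(y_true, [1.0, 2.0, 3.0, 4.0])
poor = r_squared(y_true, [2.0, 2.0, 3.0, 3.0])
rate = success_rate([exact, poor])
print(exact, poor, rate)
```

Passing `threshold=0.9999` gives the stricter Feynman-style recovery criterion with the same function.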

Key experiments:

Run individual benchmark campaigns
Compare discovery systems (Pure Symbolic, Hybrid, Pure LLM)
Parallel execution for faster results
Checkpoint/resume functionality

Tutorial 3: Statistical Analysis and Publication Figures

Time: 45 minutes | Difficulty: Intermediate

Generate all 13 publication-quality figures and reproduce statistical analyses:

Figure 2: Success rate comparison across domains
Figure 3: Arrhenius equation extrapolation failure
Figure 6: Validation cascade breakdown
Figure 9: Five-system unified comparison
Figures 11–13: DeFi extrapolation benchmark (new in v2)

Statistical validation:

Mann-Whitney U test: U=0, p<10⁻⁶
Cohen's d: 0.95 (pooled; see paper note on degenerate distributions)
95% CI for neural error: [1,087%, 1,456%]
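
For readers unfamiliar with these statistics, the following sketch computes the Mann-Whitney U statistic and a pooled Cohen's d from scratch on made-up error samples. The data are invented for illustration and are not the paper's measurements; the sketch also stops at the U statistic and does not compute a p-value, for which a library routine such as `scipy.stats.mannwhitneyu` is the usual choice.

```python
import statistics


def mann_whitney_u(x, y):
    """Two-sided U statistic: min(U_x, U_y), counting ties as half a win."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    return min(u_x, len(x) * len(y) - u_x)


def cohens_d(x, y):
    """Cohen's d with pooled standard deviation: (mean(y) - mean(x)) / s_pooled."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    return (statistics.mean(y) - statistics.mean(x)) / pooled_var ** 0.5


# Invented extrapolation errors: tiny for a symbolic model, large for a neural one
symbolic_errors = [1e-13, 2e-13, 5e-13, 3e-13]
neural_errors = [9.8, 12.5, 11.0, 14.6]

u_stat = mann_whitney_u(symbolic_errors, neural_errors)
d_value = cohens_d(symbolic_errors, neural_errors)
print(f"U = {u_stat}, d = {d_value:.2f}")
```

U = 0 means that every value in one sample lies below every value in the other, which is what "complete statistical separation" refers to. Note that when one sample has almost no variance, the pooled standard deviation is driven almost entirely by the other sample, so d should be read with care.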

Tutorial 4: Custom Applications and Extensions

Time: 45 minutes | Difficulty: Advanced

Apply HypatiaX to your own scientific problems:

6 Complete Real-World Examples:

Materials Science: Discover Hall-Petch relationship (yield stress)
Environmental Science: CO₂ sequestration formulas
Scikit-learn Integration: Drop-in replacement for ML pipelines
Multi-Equation Systems: Discover coupled ODEs (predator-prey)
Production Deployment: REST API + Docker
Custom Operators: Add domain-specific mathematical functions

Advanced extensibility: Subclass DomainValidator to add custom physical constraints for any scientific domain.
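
The subclassing pattern might look roughly like the sketch below. The actual DomainValidator interface is covered in Tutorial 4 and is not reproduced here, so this example defines its own stand-in base class: `DomainValidatorStub`, the `validate(formula, samples)` signature, and the Hall-Petch constants are all assumptions made for illustration.

```python
import math


class DomainValidatorStub:
    """Local stand-in base class; NOT the real HypatiaX DomainValidator."""

    def validate(self, formula, samples):
        raise NotImplementedError


class NonNegativeYieldStress(DomainValidatorStub):
    """Hypothetical physical constraint: yield stress must be finite and >= 0."""

    def validate(self, formula, samples):
        for grain_size in samples:
            value = formula(grain_size)
            if not math.isfinite(value) or value < 0:
                return False
        return True


def hall_petch(grain_size, sigma_0=70.0, k=0.74):
    """Hall-Petch form sigma_y = sigma_0 + k / sqrt(d); constants are made up."""
    return sigma_0 + k / math.sqrt(grain_size)


validator = NonNegativeYieldStress()
grain_sizes = [1e-6, 1e-5, 1e-4]  # hypothetical grain sizes in metres

accepted = validator.validate(hall_petch, grain_sizes)
rejected = validator.validate(lambda d: -1.0, grain_sizes)
print(accepted, rejected)
```

The point of the pattern is that a candidate formula can fit the data well and still be rejected because it violates a constraint the domain demands.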

🚀 Quick Start Path

New to symbolic discovery? Follow this path:

Week 1: Tutorial 1 (15 min)
  → Install and verify
  → Discover Ohm's Law
  → Understand extrapolation

Week 2: Tutorial 2 (45 min active + 3–8 hours compute)
  → Run Core 15 and DeFi benchmarks
  → Compare with baselines
  → Understand discovery paths

Week 3: Tutorial 3 (45 min)
  → Generate all 13 figures
  → Validate statistics (v2 corrected)
  → Create publication materials

Week 4: Tutorial 4 (varies)
  → Apply to your domain
  → Customize and extend
  → Deploy in production

📊 What You’ll Learn

Core Concepts:

Health factor calculations in DeFi
Liquidation thresholds and zombie positions
Symbolic vs neural extrapolation
Multi-layer validation cascades
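
As a preview of the first two concepts, here is a small worked example. It assumes the Aave-style definition of the health factor, the sum of each collateral value times its liquidation threshold divided by total debt, with a position becoming liquidatable once the factor drops below 1. The tutorials' DeFi tasks may define these quantities differently, and the function name and figures are illustrative.

```python
def health_factor(collateral_values, liquidation_thresholds, total_debt):
    """Aave-style health factor: sum(value_i * threshold_i) / total_debt."""
    if total_debt <= 0:
        return float("inf")  # no debt means the position cannot be liquidated
    weighted_collateral = sum(
        value * threshold
        for value, threshold in zip(collateral_values, liquidation_thresholds)
    )
    return weighted_collateral / total_debt


# Hypothetical position: two collateral assets and one debt, in the same currency
hf = health_factor(
    collateral_values=[10_000.0, 5_000.0],
    liquidation_thresholds=[0.80, 0.70],
    total_debt=9_000.0,
)
is_liquidatable = hf < 1.0
print(f"health factor = {hf:.4f}, liquidatable = {is_liquidatable}")
```

Relationships like this one are closed-form ratios, which is why they are natural targets for symbolic discovery.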

Technical Skills:

Python symbolic regression (PySR)
Julia backend integration
LLM-guided initialization
Production deployment patterns

Research Methods:

Benchmark design and execution
Statistical validation (Mann-Whitney U, effect sizes)
Publication-quality visualization
Reproducibility best practices

🎯 Prerequisites

Minimum requirements:

Python 3.8+
4GB RAM
Basic command line knowledge
Understanding of mathematical formulas

Optional (for LLM features):

Anthropic API key (enables LLM routing, which gave a 1.73× median speedup on the 68 of 74 tasks that were LLM-routed)
8GB+ RAM for parallel execution

📦 What’s Included

Each tutorial provides:

✅ Complete working code (copy-paste ready)
✅ Real examples (not toy problems)
✅ Expected outputs (verify your results)
✅ Troubleshooting (common issues solved)
✅ Quick reference (commands at a glance)

🌟 Key Advantages

Why HypatiaX?

1. Near-Perfect Extrapolation

Symbolic (HypatiaX): Median error < 10⁻¹²
Neural Networks:     Mean error 1,231% (95% CI: [1,087%, 1,456%], n=13)
Complete statistical separation: U=0, p<10⁻⁶

2. Mathematical Rigor

Discovers exact symbolic formulas
Not black-box approximations
Interpretable and verifiable

3. Domain Agnostic

Physics, chemistry, biology
DeFi AMM, DeFi Risk (finance)
Any domain with mathematical relationships

4. Production Ready

API deployment examples
Docker containerization
Scikit-learn integration

📖 Additional Resources

Paper & Code:

Community:

Related Work:

PySR (Python Symbolic Regression)
SymbolicRegression.jl (Julia backend)
AI Feynman (physics-inspired discovery)

🎓 Citation

If you use HypatiaX in your research:

@article{bonetchaple2026hypatiax,
  title={HypatiaX: A Hybrid Symbolic-Neural Framework for Extrapolation-Reliable Analytical Discovery},
  author={Bonet Chaple, Ruperto Pedro},
  journal={Journal of Machine Learning Research},
  year={2026},
  volume={27},
  pages={1--47}
}

Ready to begin? Start with Tutorial 1: Environment Setup!

Or jump to:

Tutorial 2: Benchmark Experiments (if already installed)
Tutorial 3: Analysis & Figures (if experiments complete)
Tutorial 4: Custom Applications (if ready to build)

💡 Support

Need help?

📖 Check the troubleshooting sections in each tutorial
💬 Ask questions in GitHub Discussions
🐛 Report bugs via GitHub Issues

Let’s discover some equations! 🧪🔬✨

HypatiaX Tutorial Series

HypatiaX: LLM-Guided Symbolic Discovery

Dr. Ruperto Pedro Bonet Chaple
