Chemical Accuracy vs. Certification: What the Gap Bars on the Leaderboard Mean
When you open the leaderboard in the "All molecules" view, every H₂O bar is red and every H₂ bar is green. Both are correct. Here's why, and what the two accuracy thresholds actually mean.
The error metric: |E_VQE − E_FCI|
Every entry on the leaderboard is evaluated by a single number: the absolute difference between the VQE ground-state energy and the exact Full Configuration Interaction (FCI) energy within the same active space. We call this the gap:
gap = |E_VQE − E_FCI|
This metric has a physical meaning: it tells you how much energy the VQE ansatz fails to recover compared to the best possible wavefunction within the chosen orbital space. A gap of zero means the ansatz found the exact ground state. A gap of 3.5 mHa means the VQE energy sits 3.5 milliHartree above the exact solution.
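As a minimal sketch of the metric, the snippet below computes the gap from two energies and reports it in milliHartree. The function names and the two example energies are illustrative assumptions, not leaderboard code or data.

```python
def gap_hartree(e_vqe: float, e_fci: float) -> float:
    """Absolute difference between the VQE and FCI energies, in Hartree."""
    return abs(e_vqe - e_fci)


def to_millihartree(energy_ha: float) -> float:
    """Convert an energy from Hartree to milliHartree."""
    return energy_ha * 1e3


# Made-up energies (Hartree) for the same active space, for illustration only.
e_fci = -1.0000
e_vqe = -0.9965

gap = gap_hartree(e_vqe, e_fci)
print(f"gap = {to_millihartree(gap):.2f} mHa")  # gap = 3.50 mHa
```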
Chemical accuracy: 1.6 mHa
The gold standard in computational chemistry is chemical accuracy: an error small enough that the computed thermochemistry (reaction energies, activation barriers, bond dissociation energies) matches experiment to within 1 kcal/mol. In Hartree units, 1 kcal/mol = 1.594 mHa ≈ 1.6 mHa.
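The conversion can be checked in a few lines. This sketch assumes the standard factor of roughly 627.5095 kcal/mol per Hartree; the constant and function names are our own.

```python
# 1 Hartree expressed in kcal/mol (standard conversion factor, rounded).
KCAL_PER_MOL_PER_HARTREE = 627.5095


def kcal_per_mol_to_mha(energy_kcal_per_mol: float) -> float:
    """Convert an energy from kcal/mol to milliHartree."""
    return energy_kcal_per_mol / KCAL_PER_MOL_PER_HARTREE * 1e3


chemical_accuracy_mha = kcal_per_mol_to_mha(1.0)
print(f"1 kcal/mol = {chemical_accuracy_mha:.3f} mHa")  # 1 kcal/mol = 1.594 mHa
```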
Below this threshold, a computed energy difference is reliable enough to inform real chemical decisions — predicting which reaction pathway is preferred, whether a transition state is accessible at room temperature, or how stable a molecular complex is. Above it, you're in the regime of "qualitatively correct but not quantitatively reliable."
Chemical accuracy is the aspirational target. It's what separates a scientifically useful energy estimate from a demonstration that the algorithm ran.
The certification threshold: 10 mHa
QEncode's certification threshold is set at 10 mHa (0.01 Ha) — about 6× looser than chemical accuracy. Why not use 1.6 mHa directly?
Two reasons. First, many interesting quantum algorithm demonstrations don't yet reach chemical accuracy, especially on 8-qubit molecules with standard UCCSD reps=1. Setting the bar at 1.6 mHa would exclude a large fraction of valid, reproducible, scientifically interesting results. Second, the certification threshold is about reproducibility, not optimality — it confirms the implementation is correct and the result is consistent across runs, not that the algorithm is the best possible.
Think of it this way: certification asks "did this run correctly and consistently?" Chemical accuracy asks "is this result good enough to trust for chemistry?" These are different questions.
Why H₂ is green and H₂O is red
The colored bars on the leaderboard show where each entry sits on a log scale relative to the best and worst gap in the currently visible set. Green = best (lowest gap), red = worst (highest gap) within what you're looking at.
H₂ under Jordan-Wigner UCCSD achieves a gap of 1.15 × 10⁻⁹ Ha — about one nanoHartree. That's essentially machine precision; the VQE found the exact ground state of the [2,2] active space to within floating-point noise. H₂ is a trivially easy problem for UCCSD: the [2,2] space has only a single double excitation, so UCCSD with one repetition is the exact solver.
H₂O UCCSD achieves 3.54 × 10⁻³ Ha. That's about 3 million times larger than H₂'s gap. On the log scale the bars use, H₂O sits near the top (red) and H₂ sits near the bottom (green), even though both are certified and both represent correct, reproducible results.
The bars are not saying H₂O is broken. They're saying H₂O is a harder problem. The [4,4] active space has many more excitation amplitudes and a higher-dimensional optimization landscape, and UCCSD reps=1 is not an exact solver for it the way it is for H₂.
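The size of the difference between the two gaps quoted above is easy to verify. The sketch below uses only those two gap values; the variable names are illustrative.

```python
import math

h2_gap_ha = 1.15e-9   # H2, JW UCCSD (value quoted above)
h2o_gap_ha = 3.54e-3  # H2O, UCCSD (value quoted above)

ratio = h2o_gap_ha / h2_gap_ha
decades = math.log10(ratio)  # separation on a log10 axis

print(f"ratio   = {ratio:.2e}")
print(f"decades = {decades:.1f}")
```

The two entries sit roughly six and a half orders of magnitude apart, which is why a linear scale would be unreadable here.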
Reading the bars when filtering by molecule
When you filter the leaderboard to a single molecule — say, H₂O — the bar scale automatically adjusts to the min and max gap within that filtered set. This means:
- The best H₂O entry (JW UCCSD, 3.54 mHa) shows as green
- The worst certified H₂O entry (JW HEA, 7.45 mHa) shows as red
- The bars let you compare relative performance within H₂O, not against H₂
Switch to "All molecules" and the scale becomes global: now H₂ at 1 nHa is green and H₂O at 3.5 mHa is red, because the scale spans the full range of gaps in the dataset. Both views are correct — they just answer different questions.
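To make the rescaling concrete, here is a minimal sketch of how a bar position could be computed. It assumes a linear interpolation in log10(gap) between the best and worst visible entries, with 0 meaning fully green and 1 meaning fully red; the leaderboard's actual color mapping may differ, and `bar_positions` is a hypothetical helper.

```python
import math


def bar_positions(gaps_ha: dict[str, float]) -> dict[str, float]:
    """Map each gap to [0, 1] on a log scale: 0 = best visible, 1 = worst visible."""
    logs = {name: math.log10(gap) for name, gap in gaps_ha.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    if span == 0:
        # A single entry (or identical gaps): nothing to compare against.
        return {name: 0.0 for name in logs}
    return {name: (value - lo) / span for name, value in logs.items()}


# Gaps in Hartree, taken from the entries discussed in this post.
entries = {
    "H2 / JW UCCSD": 1.15e-9,
    "H2O / JW UCCSD": 3.54e-3,
    "H2O / JW HEA": 7.45e-3,
}

# "All molecules": the scale spans every entry.
global_view = bar_positions(entries)

# Filtered to H2O: the scale is recomputed from the H2O entries only.
h2o_view = bar_positions(
    {name: gap for name, gap in entries.items() if name.startswith("H2O")}
)

for name in entries:
    filtered = h2o_view.get(name)
    filtered_text = "hidden" if filtered is None else f"{filtered:.2f}"
    print(f"{name:15s} global={global_view[name]:.2f} filtered={filtered_text}")
```

In the global view the H₂O UCCSD entry lands close to the red end, while in the filtered view the same entry becomes the green end of the scale. The gap itself never changes, only the reference range.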
The research tier: beyond certification
Some molecules are genuinely too hard for UCCSD reps=1 to certify with any encoding. N₂ with a [6,6] active space is a prime example: the best UCCSD gap we achieve is 44 mHa, more than 4× the certification threshold. This isn't a convergence failure; it's a property of N₂'s triple bond, which creates strong multi-reference correlation that single-reference UCCSD cannot capture.
These entries appear in the Research tab on the leaderboard — separate from the certified results, with a note explaining the physical reason for the large gap. The data is correct and reproducible; the method just has a known limitation for this class of molecule.
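The three outcomes can be summarized as a simple classification by gap. This is only a sketch: it looks at the gap alone, whereas certification also concerns reproducibility across runs, and the strict `<` comparisons at the boundaries are an assumption. The function and constant names are hypothetical.

```python
CHEMICAL_ACCURACY_HA = 1.6e-3  # about 1 kcal/mol
CERTIFICATION_HA = 10e-3       # 10 mHa certification threshold


def status(gap_ha: float) -> str:
    """Classify a gap (in Hartree) against the two thresholds."""
    if gap_ha < CHEMICAL_ACCURACY_HA:
        return "chemical accuracy"
    if gap_ha < CERTIFICATION_HA:
        return "certified, above chemical accuracy"
    return "research tier"


# Gaps quoted in this post, in Hartree.
for label, gap_ha in [("H2", 1.15e-9), ("H2O", 3.54e-3), ("N2 [6,6]", 44e-3)]:
    print(f"{label:9s} {gap_ha * 1e3:10.6f} mHa  ->  {status(gap_ha)}")
```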
Summary
| Molecule | Best gap | Status |
|---|---|---|
| H₂, HF, LiH | < 10 nHa | Chemical accuracy ✓ |
| BeH₂, H₂O | ~3.5 mHa | Certified, above chemical accuracy |
| N₂ [6,6] | 44 mHa | Research tier (UCCSD limitation) |
The leaderboard is designed to show all of this honestly. Certified entries that don't reach chemical accuracy are included — they represent real progress. The gap bars let you see at a glance where each result sits, both within a molecule and across the full suite. Explore the leaderboard to see the full picture.