Chemical Accuracy vs. Certification: What the Gap Bars on the Leaderboard Mean

The error metric: |E_VQE − E_FCI|

Every entry on the leaderboard is evaluated by a single number: the absolute difference between the VQE ground-state energy and the exact Full Configuration Interaction (FCI) energy within the same active space. We call this the gap:

gap = |E_VQE − E_FCI|

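This metric has a physical meaning: it tells you how much energy the VQE ansatz fails to recover compared with the best possible wavefunction within the chosen orbital space. Because VQE is variational, a noise-free VQE energy for the same active-space Hamiltonian and electron count cannot fall below E_FCI, so in practice the gap measures how far above the exact answer the VQE energy lands. A gap of zero means the ansatz found the exact ground state. A gap of 3.5 mHa means the VQE energy is 3.5 millihartree above the exact solution.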
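A minimal sketch of how the gap might be computed, assuming both energies are already available in hartree. The function name `vqe_gap`, the `noise_tol` tolerance, and the sample energies are hypothetical illustrations, not QEncode's code or leaderboard data. The sanity check encodes the variational argument: a noise-free VQE energy for the same Hamiltonian and electron count should not lie below E_FCI.

```python
# Hypothetical helper; name, tolerance and sample energies are illustrative only.
HA_TO_MHA = 1_000.0  # 1 Ha = 1000 mHa


def vqe_gap(e_vqe: float, e_fci: float, noise_tol: float = 1e-8) -> float:
    """Return |E_VQE - E_FCI| in hartree.

    Raises ValueError if E_VQE lies more than `noise_tol` below E_FCI, which a
    noise-free variational energy for the same active space should not do.
    """
    if e_vqe < e_fci - noise_tol:
        raise ValueError(f"E_VQE is {e_fci - e_vqe:.3e} Ha below E_FCI")
    return abs(e_vqe - e_fci)


# Made-up energies that differ by 3.5 mHa.
gap = vqe_gap(e_vqe=-75.0000, e_fci=-75.0035)
print(f"gap = {gap:.2e} Ha = {gap * HA_TO_MHA:.2f} mHa")
```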

Chemical accuracy: 1.6 mHa

The gold standard in computational chemistry is chemical accuracy: an error small enough that the computed thermochemistry (reaction energies, activation barriers, bond dissociation energies) matches experiment to within 1 kcal/mol. In Hartree units, 1 kcal/mol = 1.594 mHa ≈ 1.6 mHa.
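The 1.594 mHa figure is a plain unit conversion. A quick check, using the standard conversion factor 1 Ha ≈ 627.509 kcal/mol:

```python
# Standard conversion factor: 1 hartree ≈ 627.509474 kcal/mol.
HARTREE_IN_KCAL_PER_MOL = 627.509474

chem_acc_ha = 1.0 / HARTREE_IN_KCAL_PER_MOL  # 1 kcal/mol expressed in hartree
print(f"1 kcal/mol = {chem_acc_ha * 1e3:.3f} mHa")  # prints 1.594 mHa
```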

Below this threshold, a computed energy difference is reliable enough to inform real chemical decisions — predicting which reaction pathway is preferred, whether a transition state is accessible at room temperature, or how stable a molecular complex is. Above it, you're in the regime of "qualitatively correct but not quantitatively reliable."

Chemical accuracy is the aspirational target. It's what separates a scientifically useful energy estimate from a demonstration that the algorithm ran.

The certification threshold: 10 mHa

QEncode's certification threshold is set at 10 mHa (0.01 Ha) — about 6× looser than chemical accuracy. Why not use 1.6 mHa directly?

Two reasons. First, many interesting quantum algorithm demonstrations don't yet reach chemical accuracy, especially for 8-qubit active spaces with standard UCCSD reps=1. Setting the bar at 1.6 mHa would exclude a large fraction of valid, reproducible, scientifically interesting results. Second, the certification threshold is about reproducibility, not optimality: it confirms that the implementation is correct and that the result is consistent across runs, not that the algorithm is the best possible.

Think of it this way: certification asks "did this run correctly and consistently?" Chemical accuracy asks "is this result good enough to trust for chemistry?" These are different questions.

Gap < 1.6 mHa — chemical accuracy (excellent)

1.6 mHa – 10 mHa — certified but not chemically accurate

Gap > 10 mHa — not certified (excluded from main leaderboard)
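The three bands can be written as a small classification function. This is a sketch: `gap_tier` is a hypothetical name, and treating a gap of exactly 1.6 mHa or exactly 10 mHa as "certified" is an assumption, since the boundary handling isn't specified here. The sample gaps are the values quoted in this article.

```python
CHEM_ACC_HA = 1.6e-3  # chemical accuracy (1 kcal/mol, rounded)
CERT_HA = 10e-3       # certification threshold


def gap_tier(gap_ha: float) -> str:
    """Map a gap in hartree to one of the three leaderboard bands.

    Assumption: boundary values (exactly 1.6 or 10 mHa) count as certified.
    """
    if gap_ha < CHEM_ACC_HA:
        return "chemical accuracy"
    if gap_ha <= CERT_HA:
        return "certified"
    return "not certified"


for label, gap in [("H2 JW UCCSD", 1.15e-9), ("H2O JW UCCSD", 3.54e-3),
                   ("H2O JW HEA", 7.45e-3), ("N2 [6,6] UCCSD", 44e-3)]:
    print(f"{label:15s} {gap * 1e3:10.6f} mHa  {gap_tier(gap)}")
```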

Why H₂ is green and H₂O is red

The colored bars on the leaderboard show where each entry sits on a log scale relative to the best and worst gap in the currently visible set. Green = best (lowest gap), red = worst (highest gap) within what you're looking at.
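One plausible way to produce such bars, assuming the position is a linear interpolation in log10(gap) between the smallest and largest visible gap, and that the color runs linearly from green to red. The functions `bar_position` and `bar_color` and the RGB palette are hypothetical; the leaderboard's actual rendering may differ. The visible gaps are the three values quoted in this article.

```python
import math


def bar_position(gap: float, visible_gaps: list[float]) -> float:
    """Log-scale position of `gap`: 0 = smallest visible gap, 1 = largest."""
    lo = math.log10(min(visible_gaps))
    hi = math.log10(max(visible_gaps))
    if hi == lo:  # one entry, or all gaps equal: nothing to compare against
        return 0.0
    return (math.log10(gap) - lo) / (hi - lo)


def bar_color(t: float) -> str:
    """Linear green-to-red blend as a hex RGB string (hypothetical palette)."""
    t = min(max(t, 0.0), 1.0)
    return f"#{round(255 * t):02x}{round(255 * (1 - t)):02x}00"


visible = [1.15e-9, 3.54e-3, 7.45e-3]  # H2 JW UCCSD, H2O JW UCCSD, H2O JW HEA
for g in visible:
    t = bar_position(g, visible)
    print(f"{g:.2e} Ha -> position {t:.2f}, color {bar_color(t)}")
```

Working in log10 is what lets a 1 nHa entry and a few-mHa entry share one bar: on a linear scale every H₂ entry would collapse onto zero.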

H₂ under Jordan-Wigner UCCSD achieves a gap of 1.15 × 10⁻⁹ Ha, about one nanohartree. That is numerically exact for all practical purposes: the VQE found the ground state of the [2,2] active space to within numerical noise, roughly six orders of magnitude below chemical accuracy. H₂ is a trivially easy problem for UCCSD: the [2,2] space has only a single double excitation, so UCCSD with one repetition is an exact solver for it.
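Why one repetition suffices for H₂ can be checked numerically. In minimal-basis H₂ the single excitations vanish by symmetry, so the ground state lives in the two-dimensional space spanned by the Hartree-Fock determinant |HF⟩ and the doubly excited determinant |D⟩, and the UCC double-excitation operator rotates |HF⟩ into cos θ|HF⟩ + sin θ|D⟩. Minimizing over θ therefore reaches the lowest eigenvalue of any real symmetric 2×2 Hamiltonian exactly. In this sketch the matrix elements are made-up illustrative numbers, not the leaderboard's H₂ Hamiltonian, and the minimum is found in closed form rather than with an optimizer.

```python
import numpy as np

# Made-up 2x2 Hamiltonian (hartree) in the basis {|HF>, |D>}; real and symmetric.
H = np.array([[-1.83, 0.18],
              [0.18, -0.25]])


def ansatz_energy(theta: float) -> float:
    """Energy of the one-parameter state cos(theta)|HF> + sin(theta)|D>."""
    psi = np.array([np.cos(theta), np.sin(theta)])
    return float(psi @ H @ psi)


# E(theta) = a + b*cos(2*theta) + c*sin(2*theta), so the minimum a - sqrt(b^2 + c^2)
# is reached where (cos 2theta, sin 2theta) points along -(b, c).
a = (H[0, 0] + H[1, 1]) / 2
b = (H[0, 0] - H[1, 1]) / 2
c = H[0, 1]
theta_opt = 0.5 * np.arctan2(-c, -b)

e_vqe = ansatz_energy(theta_opt)
e_fci = float(np.linalg.eigvalsh(H)[0])  # exact lowest eigenvalue ("FCI" for this toy)
print(f"gap = {abs(e_vqe - e_fci):.1e} Ha")
```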

H₂O under JW UCCSD achieves 3.54 × 10⁻³ Ha (3.54 mHa), about 3 million times larger than H₂'s gap. On the log scale the bars use, H₂O therefore sits near the red end and H₂ at the green end, even though both are certified and both represent correct, reproducible results.

The bars are not saying H₂O is broken. They're saying H₂O is a harder problem. The [4,4] active space has many more excitation amplitudes, a higher-dimensional optimization landscape, and UCCSD reps=1 is not the exact solver for it the way it is for H₂.
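To make "many more excitation amplitudes" concrete, the sketch below counts spin-conserving single and double excitations for a closed-shell active space [n_electrons, n_orbitals], without spin adaptation or point-group screening. The helper `uccsd_excitation_counts` is hypothetical, and real UCCSD implementations may generate a somewhat different set of excitations depending on their options.

```python
from math import comb


def uccsd_excitation_counts(n_electrons: int, n_orbitals: int) -> dict[str, int]:
    """Count spin-conserving singles and doubles for a closed-shell active space."""
    if n_electrons % 2:
        raise ValueError("sketch assumes an even (closed-shell) electron count")
    n_occ = n_electrons // 2        # occupied spatial orbitals per spin
    n_virt = n_orbitals - n_occ     # virtual spatial orbitals per spin
    singles = 2 * n_occ * n_virt    # alpha->alpha plus beta->beta
    same_spin = 2 * comb(n_occ, 2) * comb(n_virt, 2)  # alpha-alpha plus beta-beta
    opposite_spin = (n_occ * n_virt) ** 2             # one alpha and one beta hop
    doubles = same_spin + opposite_spin
    return {"qubits_jw": 2 * n_orbitals, "singles": singles,
            "doubles": doubles, "total": singles + doubles}


for label, ne, no in [("H2 [2,2]", 2, 2), ("H2O [4,4]", 4, 4), ("N2 [6,6]", 6, 6)]:
    print(label, uccsd_excitation_counts(ne, no))
```

Under these assumptions H₂'s [2,2] space gives 3 amplitudes (two singles, which vanish by symmetry for H₂, plus the single double excitation), H₂O's [4,4] gives 26, and N₂'s [6,6] gives 117.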

Reading the bars when filtering by molecule

When you filter the leaderboard to a single molecule — say, H₂O — the bar scale automatically adjusts to the min and max gap within that filtered set. This means:

The best H₂O entry (JW UCCSD, 3.54 mHa) shows as green
The worst certified H₂O entry (JW HEA, 7.45 mHa) shows as red
The bars let you compare relative performance within H₂O, not against H₂

Switch to "All molecules" and the scale becomes global: now H₂ at about 1 nHa is green, and the H₂O entries, in the millihartree range, sit at or near the red end, because the scale spans the full range of gaps in the dataset. Both views are correct; they just answer different questions.
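The rescaling under a filter follows directly from recomputing the minimum and maximum over the visible subset. A self-contained sketch using the three gaps quoted in this article, assuming a log scale where 0 is the green end and 1 the red end; `log_positions` is a hypothetical name.

```python
import math

# Certified gaps quoted in this article (hartree); the real leaderboard has more entries.
entries = {
    ("H2", "JW UCCSD"): 1.15e-9,
    ("H2O", "JW UCCSD"): 3.54e-3,
    ("H2O", "JW HEA"): 7.45e-3,
}


def log_positions(subset: dict) -> dict:
    """Position of each entry on a log scale over `subset`: 0 = green, 1 = red."""
    lo = math.log10(min(subset.values()))
    hi = math.log10(max(subset.values()))
    span = (hi - lo) or 1.0  # avoid dividing by zero for a one-entry subset
    return {k: (math.log10(v) - lo) / span for k, v in subset.items()}


h2o_only = {k: v for k, v in entries.items() if k[0] == "H2O"}
print(log_positions(h2o_only)[("H2O", "JW UCCSD")])            # 0.0: green under the H2O filter
print(round(log_positions(entries)[("H2O", "JW UCCSD")], 2))  # 0.95: near red in the global view
```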

The research tier: beyond certification

Some molecules are genuinely too hard for UCCSD reps=1 to certify under any encoding. N₂ with a [6,6] active space is a prime example: the best UCCSD gap we achieve is 44 mHa, more than 4× the certification threshold. This isn't a convergence failure; it reflects N₂'s triple bond, which produces strong multi-reference correlation that single-reference UCCSD at reps=1 cannot fully capture.

These entries appear in the Research tab on the leaderboard — separate from the certified results, with a note explaining the physical reason for the large gap. The data is correct and reproducible; the method just has a known limitation for this class of molecule.

Summary

Molecule	Best gap	Status
H₂, HF, LiH	< 10 nHa	Chemical accuracy ✓
BeH₂, H₂O	~3.5 mHa	Certified, above chemical accuracy
N₂ [6,6]	44 mHa	Research tier (UCCSD limitation)

The leaderboard is designed to show all of this honestly. Certified entries that don't reach chemical accuracy are included — they represent real progress. The gap bars let you see at a glance where each result sits, both within a molecule and across the full suite. Explore the leaderboard to see the full picture.