The Paradox of Symmetric Equations
Consider a ferromagnet above its Curie temperature. For iron, the Curie temperature is 770°C. Below it, iron is magnetic; above it, it is not. But here is the precise point: the equations governing iron above and below the Curie temperature are symmetric. There is no preferred magnetic direction in the laws of physics. Every direction is equally consistent with the underlying equations.
Below the Curie temperature, iron picks a direction anyway. Spontaneously. Not because of any external field, not because of any asymmetry in the equations, but because the symmetric state becomes thermodynamically unstable. Any infinitesimal fluctuation gets amplified. The system falls into one of many equivalent ground states, each with a definite magnetic direction.
The ground state is asymmetric. The equations that produced it are not.
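The mechanics fit in a dozen lines of code. Here is a minimal numerical sketch (the potential V(m) = -m^2 + m^4/2 and its coefficients are illustrative choices, not iron's actual free energy): gradient descent on a perfectly symmetric double well, where the sign of an infinitesimal initial fluctuation decides which minimum the system falls into.

```python
import numpy as np

# Symmetric double-well "free energy": V(m) = -m^2 + m^4 / 2.
# V(m) == V(-m), so the governing equation has no preferred direction.
def grad_V(m):
    return -2.0 * m + 2.0 * m**3

def relax(m0, lr=0.01, steps=5000):
    """Gradient descent: the dynamics that carry the system to a ground state."""
    m = m0
    for _ in range(steps):
        m -= lr * grad_V(m)
    return m

# The symmetric state m = 0 is a stationary point, but an unstable one:
# an infinitesimal fluctuation of either sign gets amplified.
for fluctuation in (+1e-9, -1e-9):
    print(f"start {fluctuation:+.0e} -> ground state m = {relax(fluctuation):+.3f}")
```

Both runs end at the same energy; only the sign differs. The degeneracy of the ground states, not the equations, is where the asymmetry lives.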
This is spontaneous symmetry breaking, and it is foundational to 20th-century physics:
- The Higgs mechanism — how elementary particles acquire mass
- Superconductivity
- The separation of fundamental forces as the early universe cooled
I want to apply this precisely — not metaphorically — to the problem of bias in large language models.
The Symmetric Training Set That Doesn't Help
A natural response to AI bias: fix the training data. If the data overrepresents certain demographics, correcting that should produce a more neutral model. This response is partially correct and substantially incomplete.
Consider a language model trained on a perfectly balanced dataset — equal representation of every demographic, geography, viewpoint, and cultural context. Rigorously enforced. The loss function treats every token equally. The architecture has no built-in preference. At the level of training specification, the model should produce symmetric outputs.
It will not. Here is why.
A neural network is initialized with random weights: symmetric in distribution, though any particular draw is not, and that draw plays the role of the infinitesimal fluctuation. Training is gradient descent through a loss landscape with many minima. These minima are not all equivalent, but many achieve similar loss values while representing quite different computational functions.
Gradient descent is deterministic given a starting point, but the starting point is random. The trajectory is shaped by (a concrete sketch follows this list):
- Random initialization
- Random ordering of training examples
- The specific numerical implementation of the gradient computation (for example, non-deterministic floating-point reduction order on GPUs)
All of these introduce asymmetry into a process that began from a symmetric specification.
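A small experiment makes this concrete. The sketch below (using scikit-learn; the XOR-style toy dataset and probe point are illustrative assumptions) trains the same architecture on the same data with three different seeds. The runs typically reach comparable loss yet assign visibly different probabilities at a point near the symmetric decision boundary: one specification, several ground states.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy dataset symmetric under sign flips of either coordinate:
# XOR-style labels, so no single direction in input space is preferred.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

probe = np.array([[0.05, 0.05]])  # a point near the symmetric decision boundary

for seed in (1, 2, 3):
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=3000, random_state=seed)
    clf.fit(X, y)
    p = clf.predict_proba(probe)[0, 1]
    print(f"seed {seed}: final loss {clf.loss_:.3f}, P(class 1 at probe) = {p:.3f}")
```

The data never changes across these runs; only the trajectory does. Exact numbers will vary, which is precisely the point.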
What the Ground State Actually Is
The ground state a model occupies after training is its set of default behaviors, associations, and framings. These are shaped by the specific trajectory through weight space the training dynamics produced — not derivable from the data distribution alone, because symmetry was broken by optimization dynamics.
In practice: a model trained on balanced data will still have default framings. It will associate certain phrasings with certain contexts in ways that reflect the optimization path, not just the data.
At hireEZ, we trained screening models on data deliberately balanced across geographies, industries, and career backgrounds. We had done the data work. The models still developed implicit preferences — subtle score differences for candidates from specific regions, not traceable to any single training example or the overall distribution, but appearing consistently across multiple trained model instances.
Goldstone Modes and Zero-Cost Directions in Prompt Space
When a continuous symmetry is spontaneously broken, the breaking produces Goldstone bosons: massless modes representing directions in which the broken symmetry can be rotated. In a ferromagnet, the Goldstone mode is the spin wave (the magnon): a gentle rotation of the magnetization direction that costs vanishingly little energy at long wavelengths.
Language models have an analogue. When you probe a model with slightly different phrasings of the same question — same semantic content, different surface form — outputs often vary dramatically. Some produce confident, specific outputs. Others produce hedged, generic ones.
These zero-cost directions — prompt variations producing large output changes at low semantic cost — are the Goldstone modes of the model's broken symmetry.
| Prompt Space Property | Symmetry Interpretation |
|---|---|
| Output varies dramatically with small phrasing changes | Symmetry is maximally broken in that region |
| Output is robust to prompt variation | Model's representation is more isotropic |
| Prompt engineering produces large, task-specific effects | You are navigating the Goldstone landscape |
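One way to measure these soft directions empirically: the sketch below probes a small open model with semantically equivalent phrasings and compares the next-token distributions (GPT-2 stands in for a production model, and the paraphrases are illustrative choices). Large pairwise divergence flags a region of prompt space where the broken symmetry is doing work.

```python
import torch
from scipy.spatial.distance import jensenshannon
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Same semantic content, different surface form.
paraphrases = [
    "The most important quality in a job candidate is",
    "In a job candidate, the single most important quality is",
    "What matters most in a job candidate is",
]

dists = []
for text in paraphrases:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # next-token logits
    dists.append(torch.softmax(logits, dim=-1).numpy())

# Pairwise Jensen-Shannon distance between next-token distributions:
# high values mark a soft, Goldstone-like direction in prompt space.
for i in range(len(dists)):
    for j in range(i + 1, len(dists)):
        print(f"paraphrase {i} vs {j}: JS distance = {jensenshannon(dists[i], dists[j]):.3f}")
```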
Auditing the Ground State, Not the Training Data
If bias lives in the ground state — shaped by optimization dynamics, not just training data — then auditing training data is necessary but not sufficient. You must audit the ground state directly.
| Training Data Auditing | Ground State Auditing |
|---|---|
| Inspects the input distribution: representation across demographics, geographies, viewpoints | Probes the trained model directly: systematic input variations, measured output distributions |
| Catches bias traceable to the data | Catches bias introduced by the optimization trajectory, invisible in the data alone |
| Necessary | Also necessary, because symmetry breaks during training regardless of the data |
At hireEZ, we moved to ground state auditing after discovering data balancing wasn't sufficient. We run systematic probes — varying candidate backgrounds, phrasing, context, and task specification — and measure the output distribution. Places where the distribution is non-uniform in ways correlated with protected attributes tell us where broken symmetry has produced bias.
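The shape of such a probe harness is easy to sketch. Nothing below is hireEZ's actual pipeline: `score_candidate` is a toy placeholder (with a deliberate bias baked in so the harness has something to find), and the matched-profile construction plus permutation test is one reasonable way to formalize "non-uniform in ways correlated with protected attributes".

```python
import numpy as np

def score_candidate(profile: dict) -> float:
    """Toy placeholder scorer so the sketch runs; replace with the real model call.
    The deliberate +0.1 bump for region_a simulates a biased ground state."""
    base = 0.5 + 0.01 * (len(profile.get("name", "")) % 7)
    return base + (0.1 if profile.get("region") == "region_a" else 0.0)

def permutation_pvalue(a, b, n_perm=10_000, seed=0):
    """P-value for the observed mean score gap under random relabeling."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        gap = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        hits += gap >= observed
    return hits / n_perm

def audit_attribute(profiles, attribute, values):
    """Score matched profiles that differ only in one attribute, then test the gap."""
    scores = [[score_candidate({**p, attribute: v}) for p in profiles] for v in values]
    p = permutation_pvalue(scores[0], scores[1])
    print(f"attribute '{attribute}': mean gap p-value = {p:.4f}")

profiles = [{"name": f"candidate_{i}", "years_experience": i % 12} for i in range(60)]
audit_attribute(profiles, "region", ["region_a", "region_b"])
```

A low p-value says the score gap between otherwise identical profiles is not noise: broken symmetry correlated with the varied attribute.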
RLHF as Ground State Selection, Not Symmetry Restoration
RLHF (reinforcement learning from human feedback) is the most widely deployed technique for addressing bias in large language models. It is worth being precise about what it does.
RLHF does not restore symmetry. It does not push the model back to the symmetric saddle point. It selects among ground states — applying a field that biases the model toward one particular asymmetric configuration, chosen because human evaluators prefer it.
This is the right approach. You cannot restore symmetry to a trained neural network any more than you can restore a ferromagnet to its paramagnetic state without heating it above the Curie temperature. The symmetry is broken. The question is which asymmetric ground state you want.
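The toy double well from earlier makes the distinction concrete. In the sketch below, a small linear field term h*m stands in for the preference signal (an illustrative analogy, not a model of RLHF's actual objective): with h = 0, the initial fluctuation picks the well; with h > 0, gradient descent lands in the preferred well regardless of the fluctuation, and that well is still asymmetric.

```python
# Double-well potential with a preference field: V(m) = -m^2 + m^4/2 - h*m.
# h = 0 is the symmetric case; h != 0 models a preference signal (the analogue
# of RLHF selecting among ground states).
def grad_V(m, h):
    return -2.0 * m + 2.0 * m**3 - h

def relax(m0, h, lr=0.01, steps=5000):
    m = m0
    for _ in range(steps):
        m -= lr * grad_V(m, h)
    return m

for h in (0.0, 0.2):
    outcomes = {round(relax(f, h), 3) for f in (+1e-9, -1e-9)}
    print(f"h = {h}: ground states reached = {sorted(outcomes)}")
# h = 0 reaches both +1 and -1 depending on the fluctuation; h = 0.2 reaches
# only the preferred well, and that well is still asymmetric (m != 0).
```

Note what the field does not do: it never restores the symmetric state m = 0. It only changes which asymmetric minimum wins.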
But understanding this framing changes expectations:
- RLHF-trained models are not neutral — they occupy a specific ground state selected by a specific preference signal
- The quality of that ground state depends on the representativeness of the human feedback
- If evaluators had systematic preferences — writing styles, cultural assumptions — those are encoded in the ground state
The Choice You Are Already Making
The conclusion from symmetry-breaking physics is precise: there is no neutral AI. This is not a rhetorical point. It is a structural statement about the mathematics of training neural networks by gradient descent.
Any sufficiently complex neural network trained on real data by gradient descent will spontaneously break the symmetry of its initial state. The ground state will be asymmetric. This asymmetry will show up as systematic tendencies not traceable to individual training examples and not eliminable by better data curation alone.
The engineering question is never "can we make the AI neutral?" The question is: "which ground state is our model occupying, and is that the ground state we want?"
Every deployed model has already answered the first part. The question is whether you chose it intentionally.