Jun 18, 2026
Researchers Detail Limitations and Refinements of SWAVE Complex-Valued Language Model

Researchers have published a detailed analysis of SWAVE, a 169.26-million-parameter complex-valued recurrent language model, documenting its three-phase development and identifying both critical structural failures and successful architectural choices. The study identifies and resolves a failure mode called cos-domination collapse and extracts six transferable engineering principles for complex-valued recurrent training.

Quick Facts
Who: research team
What: Developed SWAVE complex-valued recurrent language model
When: submitted June 16, 2026
- Identified cos-domination collapse failure mode in Resonance Head
- Replaced Resonance Head with untied head from PAM architecture
- Found four multi-scale retention concepts non-load-bearing
- Replaced ComplexGatedUnit with squared-ReLU channel mixer
Researchers have published a detailed retrospective analysis of SWAVE, a complex-valued recurrent language model containing 169.26 million parameters, examining its development across three phases and identifying both structural challenges and successful design choices. The model was designed with three core premises: that representing language using complex waves rather than real-valued numbers enables richer information encoding, that a Cayley-parameterised unitary transition mathematically prevents state decay or explosion, and that rotating hidden states preserve signal integrity over long contexts.
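The second premise can be illustrated with a small sketch. It assumes a diagonal (per-channel) transition, which the article does not specify; the function names and parameter values are hypothetical. In the scalar case the Cayley transform maps any real parameter a to (1 - ia)/(1 + ia), a complex number of modulus exactly 1, so repeated application rotates the state without shrinking or growing it.

```python
def cayley_unit(a: float) -> complex:
    """Scalar Cayley transform: maps any real a to a complex number of modulus 1."""
    return (1 - 1j * a) / (1 + 1j * a)

def rotate_state(state, params, steps):
    """Apply the per-channel unitary transition `steps` times with no new input."""
    weights = [cayley_unit(a) for a in params]
    for _ in range(steps):
        state = [w * h for w, h in zip(weights, state)]
    return state

def norm(state):
    """Euclidean norm of a complex state vector."""
    return sum(abs(h) ** 2 for h in state) ** 0.5

# Hypothetical unconstrained parameters and initial hidden state (illustration only).
params = [-3.0, 0.1, 0.5, 20.0]
h0 = [1 + 2j, -0.5j, 3 + 0j, 0.25 - 1j]
h1000 = rotate_state(h0, params, 1000)
print(norm(h0), norm(h1000))  # the two norms agree up to rounding error
```

Because the unit-modulus constraint holds for every real value of the parameter, an optimizer can update it freely without the transition ever leaving the unitary set.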
The investigation identified a critical failure mode termed "cos-domination collapse," in which the Resonance Head component structurally admitted imaginary-channel collapse as a global loss minimum. The team resolved the flaw by replacing the Resonance Head with an untied head that uses independent real and imaginary embedding tables, a design taken from the Phase-Associative Memory (PAM) architecture. With this change, training ran stably for 200,000 steps and reached a best-step perplexity of 22.0 at step 89,861.
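One way to picture an untied head is sketched below. This is an assumption about the general shape of such a head, not the formula used in SWAVE or PAM: each vocabulary entry gets one row in a real table and a separate row in an imaginary table, and the logit is the sum of the two real-valued dot products. All names and numbers are hypothetical.

```python
def untied_logits(h, table_re, table_im):
    """Score each vocabulary entry from a complex hidden state.

    Assumed form (not taken from the paper):
    logit[v] = sum_d Re(h[d]) * table_re[v][d] + Im(h[d]) * table_im[v][d]
    """
    return [
        sum(z.real * r + z.imag * i for z, r, i in zip(h, row_re, row_im))
        for row_re, row_im in zip(table_re, table_im)
    ]

# Hypothetical 2-dimensional hidden state and 3-entry vocabulary.
h = [1 + 2j, -1 + 0.5j]
table_re = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # weights for the real parts
table_im = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]  # independent weights for the imaginary parts
logits = untied_logits(h, table_re, table_im)
print(logits)  # [1.5, 1.0, 0.0]
```

In this sketch the imaginary table shares no parameters with the real table, so the contribution of the imaginary channel is learned on its own terms.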
Throughout development, ComplexNorm and the Wave Propagation Scan proved essential to all three phases and were retained in the final architecture. However, several components were found non-load-bearing through controlled evaluation. The four multi-scale retention concepts showed no measurable improvement and were removed, while the ComplexGatedUnit was superseded by a more parameter-efficient real-valued squared-ReLU channel mixer. ProtectGatedScan was reframed from a learned behavior to a structural prior.
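A squared-ReLU channel mixer can be sketched as a two-layer feed-forward block whose activation is max(x, 0) squared. The version below works on a plain real vector with hypothetical weights; the article does not describe how SWAVE feeds its complex state into the mixer, so that step is left out.

```python
def squared_relu(x: float) -> float:
    """Squared ReLU: zero for negative inputs, x**2 otherwise."""
    r = max(x, 0.0)
    return r * r

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def channel_mixer(x, w_in, w_out):
    """Expand, apply squared ReLU elementwise, then project back."""
    hidden = [squared_relu(v) for v in matvec(w_in, x)]
    return matvec(w_out, hidden)

# Hypothetical weights: 2 channels expanded to 3, then projected back to 2.
w_in = [[1.0, -1.0], [0.5, 0.5], [-2.0, 0.0]]
w_out = [[1.0, 1.0, 1.0], [0.0, 2.0, -1.0]]
y = channel_mixer([3.0, 1.0], w_in, w_out)
print(y)  # [8.0, 8.0]
```

The block uses two weight matrices and no gating branch, which is one plausible reading of why the article calls it more parameter-efficient than a gated unit.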
The research yields a formal characterization of cos-domination collapse, a parallel scan implementation with a log-space backward pass for numerical stability, and six transferable engineering principles for complex-valued recurrent model training. The authors also propose a plan-to-code traceability methodology designed to catch structural divergences that conventional test suites typically miss. The model was trained on the FineWeb-Edu dataset using two NVIDIA H100 NVL GPUs.
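The idea behind a parallel scan can be sketched for a generic linear recurrence h_t = a_t * h_{t-1} + b_t with h starting at zero. This recurrence form is an assumption, since the article does not give SWAVE's exact update, and the sketch covers only the forward pass, not the log-space backward pass. Composing two steps is associative, so the sequence can be reduced in about log2(n) rounds instead of n sequential steps.

```python
import cmath

def combine(left, right):
    """Compose two steps of h -> a*h + b, with `left` applied first."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)

def sequential_scan(a, b):
    """Reference implementation: one step at a time."""
    h, out = 0j, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out

def parallel_style_scan(a, b):
    """Hillis-Steele inclusive scan; each round could run in parallel."""
    elems = list(zip(a, b))
    shift = 1
    while shift < len(elems):
        elems = [
            combine(elems[i - shift], elems[i]) if i >= shift else elems[i]
            for i in range(len(elems))
        ]
        shift *= 2
    return [b_t for _, b_t in elems]

# Hypothetical inputs: unit-modulus transitions and arbitrary complex drive terms.
a = [cmath.exp(1j * 0.3 * t) for t in range(7)]
b = [complex(t, -t) for t in range(7)]
seq = sequential_scan(a, b)
par = parallel_style_scan(a, b)
print(max(abs(x - y) for x, y in zip(seq, par)))  # difference is at rounding level
```

The list comprehension here still runs serially; on an accelerator each round of `combine` calls would execute concurrently.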
Why This Matters
The study offers practitioners concrete guidance on complex-valued neural network design, an area less well charted than conventional real-valued approaches. The formal account of cos-domination collapse and its fix shows how an output head can structurally admit a degenerate solution and how to design around it. The six engineering principles and the plan-to-code traceability methodology are intended to make similar models easier to reproduce and debug for researchers exploring alternative numerical representations in language modeling.
Timeline & Sources
Jun 16, 2026
Wire: Research paper submitted to arXiv
Jun 18, 2026
Wire: Research paper published and announced