We are building quantum diffusion models — generative systems that encode structured data as quantum states and learn to denoise them in Hilbert space.

The goal is generative models that learn structure natively rather than memorizing it through repetition at scale. Solo founder lab, San Francisco.


Research Directions

Quantum Diffusion

Replacing the classical Gaussian noise schedule with a quantum-parameterized denoising process. Data is encoded as quantum states; the model learns to denoise in Hilbert space rather than memorizing structure through scale.
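A minimal sketch of one quantum-analog forward noising step, using a depolarizing channel as a stand-in for the Gaussian schedule (the channel choice and the pure-Python density-matrix representation are illustrative assumptions, not this lab's actual noise process):

```python
import math

def depolarize(rho, p):
    """One noising step: mix a density matrix toward the maximally
    mixed state, rho' = (1 - p) * rho + p * I / d."""
    d = len(rho)
    return [[(1 - p) * rho[i][j] + (p / d if i == j else 0.0)
             for j in range(d)]
            for i in range(d)]

def pure_state_density(amps):
    """rho = |psi><psi| for a statevector of complex amplitudes."""
    return [[a * b.conjugate() for b in amps] for a in amps]

# One qubit prepared in |+> = (|0> + |1>) / sqrt(2)
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
rho = pure_state_density(plus)

# Apply five noising steps; purity Tr(rho^2) decays from 1 toward 1/d
for _ in range(5):
    rho = depolarize(rho, p=0.2)

purity = sum(rho[i][j] * rho[j][i] for i in range(2) for j in range(2)).real
```

The denoiser's job is the reverse map: given the noised state and the step index, recover the earlier state, exactly as a classical diffusion model predicts the noise it must subtract.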

Hybrid Classical-Quantum Pipelines

Classical CNN encoders compress visual data onto qubit statevectors. A quantum module models the latent distribution. A classical decoder reconstructs the output. The goal: orders-of-magnitude less data and energy than classical diffusion.
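The encoder-to-qubit handoff can be sketched as amplitude encoding: the classical latent vector is zero-padded to length 2^n and normalized onto the unit sphere so it is a valid statevector. A toy sketch with illustrative names (a real pipeline would use a quantum SDK's embedding layer):

```python
import math

def amplitude_encode(features, n_qubits):
    """Map a classical feature vector to a normalized 2**n_qubits
    statevector (amplitude encoding): pad with zeros, then rescale
    so the squared amplitudes sum to 1."""
    dim = 2 ** n_qubits
    if len(features) > dim:
        raise ValueError("need more qubits for this many features")
    amps = list(features) + [0.0] * (dim - len(features))
    norm = math.sqrt(sum(a * a for a in amps))
    if norm == 0.0:
        raise ValueError("cannot encode the zero vector")
    return [a / norm for a in amps]

# A 6-dim CNN latent fits into 3 qubits (2**3 = 8 amplitudes)
state = amplitude_encode([3.0, 1.0, 4.0, 1.0, 5.0, 9.0], n_qubits=3)
total = sum(a * a for a in state)  # ~1.0: a valid quantum state
```

Note the compression this buys: n qubits carry 2^n amplitudes, which is where the hoped-for data and energy savings originate.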

Barren Plateau Mitigation

Global cost functions on parameterized circuits suffer barren plateaus: gradients vanish exponentially with qubit count, and the landscape becomes effectively flat beyond ~6 qubits. We are developing local cost functions, layerwise pre-training, and adaptive noise schedules calibrated to measured saturation depth d* ≈ 1.5n.
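The local-vs-global distinction can be illustrated directly on a statevector: a global cost rewards only the all-zeros bitstring (a single amplitude), while a local cost averages single-qubit |0⟩ probabilities, so partial progress still registers. A toy sketch, not the lab's training code:

```python
def global_cost(state, n):
    """1 - P(all n qubits measure |0>): depends on one amplitude."""
    return 1.0 - abs(state[0]) ** 2

def local_cost(state, n):
    """1 - mean over qubits of P(qubit k measures |0>); each term
    marginalizes the statevector over the other qubits."""
    total = 0.0
    for k in range(n):
        p0 = sum(abs(a) ** 2 for i, a in enumerate(state)
                 if not (i >> k) & 1)  # bit k of index i is 0
        total += p0
    return 1.0 - total / n

# n qubits, one flipped to |1>, the rest already in |0>
n = 8
state = [0.0] * (2 ** n)
state[1] = 1.0  # basis state |00000001>
g = global_cost(state, n)  # 1.0: saturated, no gradient signal
l = local_cost(state, n)   # 1/n: still reflects how close we are
```

Seven of eight qubits are correct, yet the global cost is maximal; the local cost reports exactly one qubit's worth of error, which is the property that keeps its gradients measurable as n grows.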

Quantum Generative Priors

Parameterized quantum circuits as inductive biases for generative models. Quantum states are constrained to the unit sphere in ℂ^(2^n), structured by complex phases and entanglement geometry in ways classical latents are not.
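One concrete consequence of that geometry: entanglement entropy separates quantum latents in a way no classical coordinate does. A two-qubit sketch comparing a product state to a Bell state, using closed-form 2×2 eigenvalues (illustrative, pure Python):

```python
import math

def reduced_entropy_2q(state):
    """Von Neumann entropy (in bits) of qubit 0's reduced density
    matrix, for a 2-qubit statevector indexed as |q1 q0> with qubit 0
    the low bit. 0 for product states, 1 for maximal entanglement."""
    psi = state
    # Trace out qubit 1: rho0[i][j] = sum_k psi[2k+i] * conj(psi[2k+j])
    r00 = abs(psi[0]) ** 2 + abs(psi[2]) ** 2
    r11 = abs(psi[1]) ** 2 + abs(psi[3]) ** 2
    r01 = psi[0] * psi[1].conjugate() + psi[2] * psi[3].conjugate()
    # Eigenvalues of a 2x2 Hermitian matrix from trace and determinant
    tr, det = r00 + r11, r00 * r11 - abs(r01) ** 2
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    eigs = [(tr + disc) / 2, (tr - disc) / 2]
    return -sum(p * math.log2(p) for p in eigs if p > 1e-12)

s = 1 / math.sqrt(2)
product = [s, s, 0.0, 0.0]  # |0> (x) |+>: unentangled, entropy 0
bell    = [s, 0.0, 0.0, s]  # (|00> + |11>)/sqrt(2): entropy 1 bit
```

Both vectors sit on the same unit sphere, but only the Bell state's correlations survive tracing out a qubit; that entanglement structure is the inductive bias classical latent spaces lack.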

Selected Work

Quantum Diffusion Models: First Experiments

Experiment · Mar 2026 · Released

We trained a quantum denoising circuit on systems from 4 to 16 qubits, characterised the quantum noise process across all scales, and ran the denoiser at 10 qubits. Full empirical data: noise schedule characterisation via OTOC and entanglement entropy, barren plateau analysis, generalisation gap, and a path toward a working generative model. Preliminary results show over 90% reduction in required training data versus classical baselines.

LayerSkip for Mixture of Experts (MoE) Architecture

Technical Report · May 2025 · Released

We integrate Meta Research's LayerSkip early-exit framework into a Mixture-of-Experts architecture, combining width-wise sparsity (MoE expert routing) with depth-wise sparsity (layer dropout and early exit). Trained on WikiText-2 with a 12-layer, 8-expert model. Preliminary results show 25–35% inference time reductions while maintaining comparable perplexity. Analysis of exit layer patterns reveals that tokens requiring complex reasoning (proper nouns) exit at deeper layers (10–11), while common words and repeated phrases exit early (5–7). Co-authored with Nicholas Papciak at Georgia Tech.