QQ COGNITION PILOT · WANG–BUSEMEYER 2013 EMPIRICAL DATA · IBM FEZ EXECUTION

QPC-QQ-v3: Polycontextural Reproduction of the Wang–Busemeyer Question-Order Effect

A discriminating K-ablation against published empirical Clinton–Gore joint distributions, executed end-to-end on ibm_fez. The first QPC pilot to produce a quantitative architectural claim with hardware-archived statistical significance.

Plain English

What this test is: Reproduce Wang–Busemeyer question-order effects on IBM Fez — K=4 polycontextural contexts vs K=1 control, same 16 qubits, depth, and shots.
Why we did it: Only the number of contextures changes. If K=4 wins, the effect tracks polycontextural blocking, not generic 16-qubit noise.
Headline result: K=4 beats K=1 on held-out joint prediction · bootstrap p < 0.0005 · 18 IBM job IDs archived.
What we claim, and what we do not: We claim architectural discrimination on real hardware. We do not claim to re-prove classical impossibility (W&B 2013 already did).

Reader's guide

Start here for the narrative, the K-ablation table, and the bootstrap discrimination result. Follow the chain below if you audit outcomes via job IDs.

This report — executive summary: task, data contract, simulator pre-flight, IBM Fez K-ablation (16 qubits, 18 jobs), bootstrap discrimination at p<0.0005, interpretation boundary.
Machine-readable evidence: JSON written alongside the run scripts (not hosted here), stored as the QQ v3 evidence bundle in the QQ workspace folder. Include copies in the submission bundle if a reviewer requires downloads.
Pilot iteration history: v1 (initial run, K=1 control too degenerate), v2 (K=1 control fixed; QQ-residual metric did not discriminate within quantum), v3 (this report — divergence-based metric, simulator pre-flight verified before hardware).

Task and goal

The task asks whether QPC's polycontextural architecture — multiple coexisting contextures with their own quantum-logical states — reproduces the empirical structure of contextual cognitive data better than a faithful non-polycontextural control of equal quantum resources. The empirical target is the Clinton–Gore question-order experiment of Wang & Busemeyer (2013), in which a 1997 Gallup poll of 1,002 respondents found a robust, replicable order effect: the joint answer distribution differs depending on whether the Clinton question is asked first or second. Wang & Busemeyer prove that this empirical structure cannot arise from any single Kolmogorov probability space (no Bayesian or Markov model satisfies the QQ equality the data satisfies). Classical impossibility is therefore established by theorem and empirical replication; this pilot does not re-prove it.
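For reference, the QQ equality says that the probability of giving different answers to the two questions is the same in both question orders, so it can be written as a residual that vanishes when the equality holds. The sketch below is a minimal illustration and not the pilot's code: the function name `qq_residual`, the dictionary layout, and the example probabilities are assumptions chosen for illustration and are not the values in Wang & Busemeyer's Table 1.

```python
def qq_residual(p_ab, p_ba):
    """QQ residual q for two yes/no questions A and B.

    p_ab maps (answer_to_A, answer_to_B) -> probability when A is asked first.
    p_ba maps (answer_to_B, answer_to_A) -> probability when B is asked first.
    The QQ equality states q = 0.
    """
    # Probability of answering the two questions differently, per order.
    disagree_ab = p_ab[("y", "n")] + p_ab[("n", "y")]
    disagree_ba = p_ba[("y", "n")] + p_ba[("n", "y")]
    return disagree_ab - disagree_ba


# Hypothetical joints, for illustration only (not the published data).
example_ab = {("y", "y"): 0.50, ("y", "n"): 0.05, ("n", "y"): 0.15, ("n", "n"): 0.30}
example_ba = {("y", "y"): 0.55, ("y", "n"): 0.18, ("n", "y"): 0.02, ("n", "n"): 0.25}

q = qq_residual(example_ab, example_ba)
```

In this made-up example the two joints differ (an order effect is present) while the residual is zero up to floating-point error, which is the pattern the QQ equality describes.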

What this pilot does claim:

Demonstrate that a 16-qubit polycontextural QPC circuit reproduces the empirical Clinton–Gore joint-distribution shape on real quantum hardware.
Show via K-ablation that the polycontextural blocking — not generic 16-qubit quantum behaviour — is the active ingredient, with bootstrap-significant separation between K=1 and K=4 at equal qubit count, depth, and shot budget.
Archive all 18 IBM Runtime job IDs for hardware reproducibility.

IBM hardware pilot (ibm_fez)

Run archived as QQ v3 evidence bundle. Jobs were routed through an open-instance IBM Quantum Platform account, using the SamplerV2 primitive on a Heron R2 device. The QPC noise reducer was enabled: each circuit was run 3 times and counts were averaged across runs.
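The averaging step can be sketched as follows. This is only the plain per-bitstring average over repeated runs; it is not the QPC noise reducer, whose internals this report does not describe. The function name `average_counts` and the counts are hypothetical.

```python
from collections import Counter


def average_counts(runs):
    """Average per-bitstring counts over repeated runs of one circuit."""
    total = Counter()
    for counts in runs:
        total.update(counts)  # adds counts bitstring by bitstring
    return {bitstring: n / len(runs) for bitstring, n in total.items()}


# Hypothetical counts for three 4096-shot runs of one circuit.
runs = [
    {"00": 2100, "01": 500, "10": 700, "11": 796},
    {"00": 2050, "01": 520, "10": 730, "11": 796},
    {"00": 2090, "01": 480, "10": 730, "11": 796},
]
averaged = average_counts(runs)
```

Averaging keeps the per-circuit shot total unchanged, so the averaged counts can still be normalised by the shot count to give a distribution.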

Metric	Value
Architecture (depth 8–16)	16q / 4c
K=4 TV-mean to empirical	0.251
K=1 − K=4 TV gap	+0.0505
Bootstrap one-sided p	< 0.0005
Shots per circuit	4096
Sampler jobs (3 K × 2 orders × 3 runs)	18
Total wall-clock	2,155 s
Run timestamp	2026-05-08 22:34 UTC

K-ablation: divergence to empirical Clinton–Gore joints

K	TV_AB	TV_BA	TV_mean	KL_mean
1 (faithful non-polycontextural control)	0.2815	0.3216	0.3015	0.4317
2 (intermediate transjunctional structure)	0.2672	0.2491	0.2582	0.3296
4 (full polycontextural)	0.2479	0.2541	0.2510	0.3115

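Lower TV / KL means closer to the empirical Wang–Busemeyer joint distribution. The order-averaged values (TV_mean and KL_mean) improve monotonically across K. The per-order values are not uniformly monotone: on the BA order, K=2 (0.2491) is marginally closer than K=4 (0.2541). Parameters are held fixed across K, so the improvement in the order-averaged fit tracks the added polycontextural blocking rather than retuning.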
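The two divergences in the table can be sketched as below. This is a minimal illustration under stated assumptions, not the pilot's scoring code: the KL direction (empirical relative to model), the natural logarithm, the `eps` guard, and all function names are assumptions. The last lines only check that the tabulated TV_mean values are consistent with averaging the two per-order values.

```python
import math


def total_variation(p, q):
    """Total-variation distance over the union of outcomes in p and q."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0) (an assumption of this sketch)."""
    return sum(
        p[k] * math.log(p[k] / max(q.get(k, 0.0), eps))
        for k in p
        if p[k] > 0
    )


def order_mean(d_ab, d_ba):
    """Average a per-order divergence over the AB and BA question orders."""
    return 0.5 * (d_ab + d_ba)


# Per-order TV values copied from the K-ablation table.
tv_mean_k1 = order_mean(0.2815, 0.3216)
tv_mean_k4 = order_mean(0.2479, 0.2541)
tv_gap = tv_mean_k1 - tv_mean_k4
```

Up to rounding of the tabulated four-decimal values, these reproduce the reported TV_mean of 0.2510 for K=4 and the K=1 − K=4 gap of about 0.0505.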

Bootstrap discrimination (n = 2000)

Each replicate resamples per-circuit counts from the multinomial defined by the observed shots, recomputes both TV_mean values, and records the K=1 − K=4 difference.

Metric	Mean K=1	Mean K=4	Mean diff	95% CI of diff	One-sided p	Significant @95%?
Total-variation distance	0.3015	0.2510	+0.0505	[+0.0359, +0.0655]	0.0000	YES
KL divergence	0.4319	0.3119	+0.1200	[+0.0888, +0.1520]	0.0000	YES

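The tabulated one-sided p-value of 0.0000 is the fraction of the 2000 bootstrap replicates in which K=1 fit the empirical Clinton–Gore joints at least as well as K=4: none did. Because the resolution of 2000 replicates is 1/2000, the result is reported as p < 0.0005 rather than as exactly zero. The 95% confidence intervals on the difference do not cross zero on either metric.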
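The resampling procedure described in this section can be sketched as follows, using only the standard library. This is an illustration under stated assumptions rather than the pilot's analysis code: it assumes counts have already been reduced to the four joint answer outcomes, uses a percentile confidence interval, and all names, counts, and target probabilities are hypothetical.

```python
import random


def total_variation(p, q):
    """Total-variation distance over the union of outcomes in p and q."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def resample(counts, rng):
    """One multinomial replicate with the same shot total as `counts`."""
    outcomes = list(counts)
    shots = int(round(sum(counts.values())))
    draws = rng.choices(outcomes, weights=[counts[o] for o in outcomes], k=shots)
    return {o: draws.count(o) / shots for o in outcomes}


def tv_mean(counts, target, rng):
    """Order-averaged TV between resampled model joints and target joints."""
    return 0.5 * sum(
        total_variation(target[order], resample(counts[order], rng))
        for order in ("AB", "BA")
    )


def bootstrap_gap(counts_k1, counts_k4, target, n_boot=2000, seed=0):
    """Return (mean diff, percentile 95% CI, one-sided p) for TV(K=1) - TV(K=4)."""
    rng = random.Random(seed)
    diffs = sorted(
        tv_mean(counts_k1, target, rng) - tv_mean(counts_k4, target, rng)
        for _ in range(n_boot)
    )
    lo = diffs[int(0.025 * n_boot)]
    hi = diffs[int(0.975 * n_boot) - 1]
    # Fraction of replicates in which K=1 fits at least as well as K=4.
    p_one_sided = sum(d <= 0 for d in diffs) / n_boot
    return sum(diffs) / n_boot, (lo, hi), p_one_sided


# Hypothetical targets and counts, for illustration only.
target = {
    "AB": {"yy": 0.50, "yn": 0.05, "ny": 0.15, "nn": 0.30},
    "BA": {"yy": 0.55, "yn": 0.18, "ny": 0.02, "nn": 0.25},
}
counts_k1 = {
    "AB": {"yy": 100, "yn": 100, "ny": 100, "nn": 100},
    "BA": {"yy": 100, "yn": 100, "ny": 100, "nn": 100},
}
counts_k4 = {
    "AB": {"yy": 200, "yn": 20, "ny": 60, "nn": 120},
    "BA": {"yy": 220, "yn": 72, "ny": 8, "nn": 100},
}

mean_diff, ci, p_value = bootstrap_gap(counts_k1, counts_k4, target, n_boot=200, seed=1)
```

When no replicate has a non-positive difference, the empirical p-value is 0 and is best read as an upper bound of 1/n_boot, which is 0.0005 for the 2000 replicates used in the report.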

Hardware archive

Field	Value
Backend	`ibm_fez` (Heron R2)
Mode	SamplerV2 on open-instance Runtime; readout: proprietary
QPC noise reducer	Enabled (`[internal flag]`); 3 runs per circuit aggregated; matrix readout mitigation applicable at 16 qubits
First / last job ID	`d7v5p2jack5s73bf13jg` (K=1 AB run 1) … `d7v5q2nmrars73d7prsg` (K=4 BA run 3)
Full job list	18 IDs in `ibm_job_ids_all`
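
The job count follows from the ablation grid. The enumeration below is a small sketch; the label format and the K-major ordering are assumptions, chosen because they are consistent with the first and last job labels in the table above.

```python
from itertools import product

K_VALUES = (1, 2, 4)     # contexture counts in the ablation
ORDERS = ("AB", "BA")    # question orders
RUNS = (1, 2, 3)         # repeated runs per circuit

# One label per Sampler job, K varying slowest and run index fastest.
job_labels = [
    f"K={k} {order} run {run}"
    for k, order, run in product(K_VALUES, ORDERS, RUNS)
]
```

Under this ordering the list has 18 entries, starting at `K=1 AB run 1` and ending at `K=4 BA run 3`.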

The headline result

On ibm_fez quantum hardware, the QPC polycontextural architecture (K=4) reproduces the empirical Wang–Busemeyer Clinton–Gore joint-distribution shape strictly better than a faithful non-polycontextural control (K=1) of equal qubit count, depth, and shot budget — bootstrap-significant at p<0.0005 on both total-variation and KL-divergence metrics, with parameters fit only from order-blind marginals.

This is the first QPC pilot that produces a quantitative architectural claim grounded in a controlled comparison against published empirical human data, with hardware-archived statistical significance, on a problem class where the classical limit is theorem-level rather than computational.

Interpretation boundary

What this pilot does not claim, and what readers should not infer from it:

Not a claim about human cognition. The empirical Clinton–Gore data exhibits a structure that classical Kolmogorov models provably cannot reproduce. We show that a polycontextural quantum architecture reproduces that structure better than a flat one. Whether brains do anything QPC-like is outside the scope of this pilot.
Not a perfect fit. TV(K=4) ≈ 0.25 means K=4 is closer to the empirical target than K=1 but is still a meaningful distance from it. This is by design: φ_AB and ψ were not tuned to the empirical joints. A separate parameter-optimisation study could push TV(K=4) much lower, but that would test a different claim ("QPC can be tuned to fit empirical data") rather than the architectural claim tested here ("polycontextural blocking does the work, not parameter tuning").
Not a uniqueness claim. Some other quantum architecture might do as well or better. We show that polycontextural blocking does the work within QPC, in a controlled comparison.
Not a binary verdict. The K-ablation reports continuous statistical evidence (effect size + confidence interval + p-value), not a pass/fail threshold. Within-quantum architectural claims do not have a sharp theoretical bound that gives binary verdicts; this is the right form of evidence for the claim.

Pilot iteration history

Three iterations were required to produce a defensible result. We document the trajectory because the methodological lessons are part of the evidence.

v1. Initial pilot using QQ-residual as the discrimination metric. K=1 satisfied QQ trivially because the K=1 control was a near-uniform distribution. The hardware run completed (18 jobs on ibm_fez, archived as QQ v1 evidence bundle) but the K-ablation could not discriminate.
v2. Redesigned K=1 as a faithful non-polycontextural control carrying the empirical marginals; raised the coupling and phase parameters to produce a stronger architectural signal. Simulator pre-flight revealed that QQ-residual is a quantum-vs-classical population metric, not a within-quantum architectural discriminator: both K=1 and K=4 satisfied QQ, so no discrimination was possible. Hardware execution was skipped on the basis of this pre-flight finding.
v3. Replaced QQ-residual with TV / KL divergence against published Wang–Busemeyer empirical joints. Simulator pre-flight passed (p=0.0000). Hardware execution proceeded; this report.

Data and source artifacts

Empirical target data: Wang & Busemeyer 2013, Topics in Cognitive Science 5(4):689–710, Table 1. PDF available at https://jbusemey.pages.iu.edu/quantum/QuestOrdEff.pdf.

Numbers in this page come from:

QQ v3 evidence bundle — main hardware archive (this run)
QQ v1 evidence bundle — v1 hardware archive (preserved as iteration record)
