QPC HSBC Fraud Pilot — Data and Sources

Workspace Layout

Path	Purpose
`~/Desktop/Competition 2/data/`	Kaggle IEEE-CIS source CSV files
`~/Desktop/Competition 2/Cursor/`	Canonical scripts, JSON outputs, concept notes
`~/Desktop/Competition 2/venv/`	Python environment used for all runs

Input Data Files

File	Role
`train_transaction.csv`	Primary transaction records (contains `isFraud`)
`train_identity.csv`	Identity/device join data by `TransactionID`
`test_transaction.csv`	Kaggle test package file (not used for labeled pilot metrics)
`test_identity.csv`	Kaggle test identity package file
`sample_submission.csv`	Kaggle template file
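The `TransactionID` join described in the table can be sketched as follows. This is a minimal, standard-library illustration and not the pilot's actual loader: the helper name `left_join_identity` is hypothetical, the sample rows are made up, and a left join is assumed because not every transaction in the IEEE-CIS data has an identity row.

```python
import csv
import io


def left_join_identity(transaction_rows, identity_rows, key="TransactionID"):
    """Attach identity fields to each transaction row.

    Transactions without a matching identity row are kept unchanged,
    which is the behaviour of a left join on `key`.
    """
    identity_by_id = {row[key]: row for row in identity_rows}
    joined = []
    for row in transaction_rows:
        merged = dict(row)
        for name, value in identity_by_id.get(row[key], {}).items():
            if name != key:
                merged[name] = value
        joined.append(merged)
    return joined


# Tiny in-memory stand-ins for train_transaction.csv and train_identity.csv
# (hypothetical values, only to show the shape of the join).
transactions = list(csv.DictReader(io.StringIO(
    "TransactionID,isFraud,TransactionDT\n1,0,100\n2,1,200\n")))
identity = list(csv.DictReader(io.StringIO(
    "TransactionID,DeviceType\n2,mobile\n")))

joined = left_join_identity(transactions, identity)
```

With the real files, the two `StringIO` objects would be replaced by open file handles on the CSVs under the `data/` directory.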

Primary Artifacts (Generated Numbers)

Artifact	Contains
`cursor_hsbc_pcqrc_fd_tuning.json`	Architecture sweep (8q/12q/16q), seed screening, main simulator metrics
`cursor_hsbc_matched_subset_xgb.json`	Matched-slice XGBoost baseline (same MI/split protocol)
`cursor_hsbc_pcqrc_fd_ablation.json`	K-ablation at fixed 12 qubits/depth (K=1,2,3,4,6)
`cursor_hsbc_pcqrc_fd_ibm.json`	IBM Fez run, job IDs, capped-slice hardware metrics
`cursor_hsbc_classical_baseline.json`	Full-split classical reference metrics

Run Commands (Reproducibility)

Always run from `~/Desktop/Competition 2/Cursor` with the shared venv:

cd ~/Desktop/Competition\ 2/Cursor
../venv/bin/python cursor_hsbc_pcqrc_fd_tuning.py
../venv/bin/python cursor_hsbc_matched_subset_xgb.py
../venv/bin/python cursor_hsbc_pcqrc_fd_ablation.py
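For unattended reruns, the same three commands could be driven from a small Python wrapper. This is a sketch under assumptions: the wrapper itself, `build_commands`, and `run_all` are hypothetical names that do not exist in the workspace; only the directory and script names come from the commands above.

```python
import subprocess
from pathlib import Path

# Paths taken from the commands above; adjust if the workspace lives elsewhere.
WORKDIR = Path.home() / "Desktop" / "Competition 2" / "Cursor"
PYTHON = WORKDIR.parent / "venv" / "bin" / "python"
SCRIPTS = [
    "cursor_hsbc_pcqrc_fd_tuning.py",
    "cursor_hsbc_matched_subset_xgb.py",
    "cursor_hsbc_pcqrc_fd_ablation.py",
]


def build_commands(python=PYTHON, scripts=SCRIPTS):
    """Return one argv list per script, in the documented order."""
    return [[str(python), script] for script in scripts]


def run_all(workdir=WORKDIR):
    """Run each script from the Cursor directory and stop on the first failure."""
    for cmd in build_commands():
        subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `run_all()` executes the scripts sequentially; `check=True` raises `subprocess.CalledProcessError` if any script exits with a non-zero status, so a failed tuning run does not silently feed later steps.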

IBM hardware pilot (archived May 2026): test ROC-AUC 0.7717 and PR-AUC 0.1750 on the capped 48/12/48 train/val/test slice, run on ibm_fez in max-qubits mode (156q/12c/d6, readout z+ctxpool):

unset QISKIT_IBM_INSTANCE
export QISKIT_IBM_INSTANCE=open-instance
cd ~/Desktop/Competition\ 2/Cursor
../venv/bin/python cursor_hsbc_pcqrc_fd_ibm.py \
  --mode ibm --backend ibm_fez \
  --max-qubits-mode \
  --depth 6 \
  --seed 1000 --seed-sweep 1 \
  --readout z+ctxpool \
  --runs-per-batch 1 --use-qpc-noise-reducer \
  --batch-size 4 --shots 2048 \
  --cap-train 48 --cap-val 12 --cap-test 48

Do not type a literal `...` on the command line. For a cheaper smoke test, lower the caps or shots; `--ibm-fast-defaults` is an alternative preset.

IBM dependency: ../venv/bin/pip install -U 'qiskit-ibm-runtime>=0.46'

IBM credentials: quantum token in QISKIT_IBM_TOKEN or ~/.ibm_quantum_token; IAM key optional on some accounts via QISKIT_IBM_IAM_API_KEY. Instance routing is documented in README_CURSOR_WORKSPACE.md (open-instance, CRN file precedence, aggregate fallback).
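The token lookup can be pictured with the sketch below. It is an illustration, not the HSBC script's code: `resolve_ibm_token` is a hypothetical helper, and checking the environment variable before the file is an assumption, since the text names both locations without stating their order.

```python
import os
from pathlib import Path


def resolve_ibm_token(env=None, token_file=None):
    """Return the IBM Quantum token from the environment or the token file.

    Assumption: QISKIT_IBM_TOKEN is checked first, then ~/.ibm_quantum_token.
    """
    env = os.environ if env is None else env
    if token_file is None:
        token_file = Path.home() / ".ibm_quantum_token"
    token_file = Path(token_file)

    token = env.get("QISKIT_IBM_TOKEN", "").strip()
    if token:
        return token
    if token_file.is_file():
        token = token_file.read_text().strip()
        if token:
            return token
    raise RuntimeError(
        "No IBM token found in QISKIT_IBM_TOKEN or " + str(token_file)
    )
```

Stripping whitespace guards against a trailing newline in the token file, a common cause of authentication failures that look like a bad token.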

Artifact: cursor_hsbc_pcqrc_fd_ibm.json contains metrics and full job-id arrays (ibm_job_ids_all).
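To pull the job IDs back out for an audit, something like the following may be enough. It assumes `ibm_job_ids_all` is a top-level key holding either a flat list or a list of per-batch lists; the artifact's exact layout is not documented here, so check the file before relying on this. The helper names are hypothetical.

```python
import json
from pathlib import Path


def extract_job_ids(payload):
    """Collect job IDs from a parsed artifact, flattening one level of nesting."""
    ids = []
    for item in payload.get("ibm_job_ids_all", []):
        if isinstance(item, list):
            ids.extend(str(job_id) for job_id in item)
        else:
            ids.append(str(item))
    return ids


def load_job_ids(artifact_path="cursor_hsbc_pcqrc_fd_ibm.json"):
    """Read the artifact from disk and return its job IDs."""
    payload = json.loads(Path(artifact_path).read_text())
    return extract_job_ids(payload)
```

Run from the `Cursor` directory, `load_job_ids()` returns the IDs as plain strings that can be looked up on the IBM Quantum dashboard.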

156-qubit / noise-reducer stack: --max-qubits-mode matches Fez logical width; readout mode z+ctxpool keeps feature vectors manageable. qpc_noise_reducer.py is an optional in-repo Python helper loaded when you pass --use-qpc-noise-reducer; it supports count aggregation across --runs-per-batch repeats (readout-error mitigation inside it is only practical for smaller widths).
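Count aggregation across repeats amounts to summing the per-bitstring counts of each repeated run before computing any readout feature. The sketch below shows that idea only; it is not the code in qpc_noise_reducer.py, and the function names are hypothetical.

```python
from collections import Counter


def aggregate_counts(repeats):
    """Sum bitstring counts over repeated runs of the same batch."""
    total = Counter()
    for counts in repeats:
        total.update(counts)  # adds counts key by key
    return dict(total)


def to_probabilities(counts):
    """Normalise aggregated counts into empirical probabilities."""
    shots = sum(counts.values())
    if shots == 0:
        return {}
    return {bitstring: n / shots for bitstring, n in counts.items()}


# Two hypothetical repeats of a 2-qubit circuit.
merged = aggregate_counts([{"00": 3, "11": 1}, {"00": 2, "01": 4}])
probs = to_probabilities(merged)
```

Pooling counts this way increases the effective shot count per circuit, which reduces sampling noise but does nothing for systematic readout error.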

IBM Platform Pitfalls (Honest Run Log)

Teams often lose hours here until routing is stable:

Instance mismatch: a quantum.ibm.com token combined with the wrong CRN, or with the wrong paid vs open quota, produces “invalid instance” errors or an empty backend list.
Env vs files: in the HSBC script, QISKIT_IBM_INSTANCE overrides ~/.ibm_quantum_instance_crn, so a stale CRN file can confuse debugging.
Open vs paid: a paid instance can hit its time limit while open-instance minutes remain; route explicitly (open-instance) when you intend to spend open-plan time.
SDK channel: use the Runtime channel ibm_quantum_platform, not the deprecated channel names.
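The env-over-file precedence from the list above can be made visible when debugging with a helper that also reports where the value came from. This is a sketch, not the script's implementation: `resolve_instance` and `service_kwargs` are hypothetical names, and how the script maps a value such as open-instance onto an actual instance is not reproduced here.

```python
import os
from pathlib import Path


def resolve_instance(env=None, crn_file=None):
    """Return (instance, source), letting QISKIT_IBM_INSTANCE win over the CRN file."""
    env = os.environ if env is None else env
    if crn_file is None:
        crn_file = Path.home() / ".ibm_quantum_instance_crn"
    crn_file = Path(crn_file)

    value = env.get("QISKIT_IBM_INSTANCE", "").strip()
    if value:
        return value, "env:QISKIT_IBM_INSTANCE"
    if crn_file.is_file():
        value = crn_file.read_text().strip()
        if value:
            return value, "file:" + str(crn_file)
    return None, "unset"


def service_kwargs(token, instance):
    """Build keyword arguments that could be passed to QiskitRuntimeService."""
    kwargs = {"channel": "ibm_quantum_platform", "token": token}
    if instance is not None:
        kwargs["instance"] = instance
    return kwargs
```

Printing the `source` element before connecting shows at a glance whether a stale CRN file or an exported variable is steering the run.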

The successful archived pilot used open-instance, after working through pinned discovery quirks, and executed 27 Sampler batches at max logical width; the job IDs are in cursor_hsbc_pcqrc_fd_ibm.json.

Protocol Notes for Reviewers

Time-respecting split by TransactionDT (train earlier than test).
MI feature selection fit only on train-fit rows; no val/test leakage.
Scaler and angle percentile mapping fit on train-fit rows only.
Matched classical comparison uses identical 1800/500 slice contract.
IBM run uses capped rows (48/12/48) for cost-control and queue practicality; max-qubits mode uses scalable z+ctxpool readout at device logical width.
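The first three notes share one pattern: order by time, split, then fit every preprocessing step on the training rows alone. A minimal standard-library sketch of that pattern follows; the split fractions, the helper names, and the sample values are assumptions for illustration, and the pilot's MI selection and angle mapping are not reproduced.

```python
from statistics import mean, pstdev


def time_split(rows, train_frac=0.6, val_frac=0.2, time_key="TransactionDT"):
    """Split rows chronologically so train precedes val, which precedes test."""
    ordered = sorted(rows, key=lambda row: row[time_key])
    n_train = int(len(ordered) * train_frac)
    n_val = int(len(ordered) * val_frac)
    return (
        ordered[:n_train],
        ordered[n_train:n_train + n_val],
        ordered[n_train + n_val:],
    )


def fit_standardizer(train_rows, feature):
    """Fit mean and standard deviation on train rows only; return a scaling function."""
    values = [row[feature] for row in train_rows]
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero on constant features
    return lambda x: (x - mu) / sigma


# Hypothetical rows, deliberately out of time order.
rows = [
    {"TransactionDT": dt, "TransactionAmt": amt}
    for dt, amt in [
        (500, 20.0), (100, 10.0), (900, 55.0), (300, 30.0), (700, 40.0),
        (200, 15.0), (1000, 60.0), (400, 25.0), (800, 45.0), (600, 35.0),
    ]
]

train, val, test = time_split(rows)
scale = fit_standardizer(train, "TransactionAmt")
val_scaled = [scale(row["TransactionAmt"]) for row in val]
```

Because `scale` closes over statistics computed from `train` only, applying it to `val` or `test` cannot leak information from later transactions into the fitted parameters.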

Interpretation boundary: capped IBM metrics are evidence of hardware execution and workflow, not direct evidence of production ROI or superiority vs full-data classical systems.
