2026 Watchlist
The dated signals, scenario trajectories, and falsifiers that will tell you whether the framework holds.
In 60 seconds. A framework that does not specify how it could be wrong is not a framework. This page lists the dated signals over the next 12 to 18 months that will test the analysis on this site. Five scenario trajectories. Two levers that move the system this quarter. One per-organization deadline: 90 days from the day your team adopted AI-assisted drafting. One thesis falsifier: if AI volume rises in your organization but the unanchored-claim rate does not, the framework is wrong for your context.
A framework that does not specify how it could be wrong is not a framework. It is a marketing claim. This page lists the concrete signals over the next twelve to eighteen months that test the analysis on this site. Each scenario carries an observable. Each observable resolves one way or the other.
Read it as a stress test, not as a forecast you can outsource. The framework is directional. Timing is uncertain. The point of this page is to make the uncertainty visible.
The decision window: 90 days from your AI adoption date
There is one date worth treating as a deadline. It is not a calendar date. It is the day your team adopted AI-assisted drafting, formally or informally.
The 90-day window starts then. After that, the new cadence is normalized. Gates installed later feel like sabotage rather than quality control. The retrofit cost rises sharply once the new cadence is calendared, staffed, and expected by clients.
If your team has been using AI-assisted drafting for more than a quarter, the 90-day window has already closed and you are in the period where retrofit cost rises. Install the gate now.
A soft heuristic, not a structural law
The 90-day window is a control-start deadline, not a maturity claim. The doctrine treats it as the point where a release gate should be live on at least one named workflow class. Full enterprise rollout takes longer, and the window itself stretches or compresses with conditions the doctrine does not set.
Conditions that compress the window: pre-approved cloud marketplace paths, written delegated authority for the sponsor before kickoff, a funded cost center already in place, an active risk committee with budget rights, prior AI policy artifacts the team can extend rather than draft from scratch.
Conditions that extend the window: sector-specific model risk validation (finance, healthcare, insurance), EU works-council consultation requirements, cross-border data deployments, dispute resolution between Legal, Procurement, and Security on data-handling terms. Federal benchmark for scale: OMB M-24-10 issued in March 2024 set a December 2024 implementation horizon, an eight-month rollout for a government-wide governance retrofit. Plan against the conditions you actually face, not against a single calendar.
The no-regret action. Require every decision-grade brief to include:
- A numbered list of core claims.
- A source or computation for each claim.
- Explicit assumptions named, not assumed.
- At least one observation that would make the conclusion wrong.
Start with one product line. Scale once the gate proves it can hold under cycle-time pressure.
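The four requirements above can be checked mechanically before a brief receives a decision-grade label. The following is a minimal sketch, not the doctrine's own tooling: the `Brief` and `Claim` types, their field names, and `gate_failures` are hypothetical names introduced for illustration, and the example brief is invented. The check confirms only that each element is present, not that the sources are sound.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    number: int
    text: str
    source: str = ""  # citation, dataset, or computation backing the claim


@dataclass
class Brief:
    title: str
    claims: list[Claim] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    falsifiers: list[str] = field(default_factory=list)


def gate_failures(brief: Brief) -> list[str]:
    """Return the reasons a brief fails the gate; an empty list means it passes."""
    failures = []
    if not brief.claims:
        failures.append("no numbered core claims")
    # Claims must be numbered 1..n in order so reviewers can cite them.
    expected = list(range(1, len(brief.claims) + 1))
    if [c.number for c in brief.claims] != expected:
        failures.append("claims are not numbered 1..n in order")
    for claim in brief.claims:
        if not claim.source.strip():
            failures.append(f"claim {claim.number} has no source or computation")
    if not brief.assumptions:
        failures.append("no explicit assumptions listed")
    if not brief.falsifiers:
        failures.append("no observation that would make the conclusion wrong")
    return failures


# Hypothetical brief: claim 2 is unsourced and no falsifier is named.
draft = Brief(
    title="Q3 pricing brief",
    claims=[
        Claim(1, "Churn rose two points", source="billing export, Q2 vs Q3"),
        Claim(2, "Price is the main driver"),
    ],
    assumptions=["Billing export is complete"],
)
for reason in gate_failures(draft):
    print(reason)
```

A gate of this shape is cheap enough to run on every draft, which matters if it has to hold under cycle-time pressure. Judging whether a cited source actually supports its claim remains a reviewer task.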
The two levers active this quarter
The system has two enforcement points with enough friction to alter the trajectory inside ninety days. Everything else is downstream of these.
Release gate (0 to 60 days)
Editorial and production teams formally require assumption registers, boundary conditions, and claim-source maps before any output carries a decision-grade label.
Procurement artifacts (0 to 90 days)
At least one procurement cycle includes structural transparency artifacts as acceptance criteria, not just deliverable templates.
Five scenario trajectories
The framework predicts five plausible futures. They are not mutually exclusive; the market can split across them. Each has a named mechanism, a named observable, and a named falsifier.
1. Polished Flood, Thin Spine (base case)
Trajectory: The market drifts into endless glossy deliverables whose core claims lack traceable support.
Mechanism: Cheap drafting pushes volume up. Fixed reviewer capacity skims rather than tests. Throughput incentives treat "reads well" as "is solid."
Observable: Review cycles lengthening, style checks substituting for evidence checks, release gates optimized for speed rather than structural validity.
Falsifier: Independent sampling shows stable anchoring of core claims despite higher AI volume. If unanchored-claim rates stay at pre-AI baselines while output rises, the drift is not happening.
2. Audit Islands, Narrative Ocean (serious possibility)
Trajectory: A minority of producers build verification artifacts. They form islands of trust. The rest of the market becomes a narrative ocean.
Mechanism: Some producers sell to high-downside buyers willing to pay for traceability. Everyone else competes on speed and polish.
Observable: Visible split in deliverable standards. Some firms ship assumption registers and claim-source maps. Others ship slides only.
Falsifier: Verification artifacts become table stakes across most major producers within twelve months. Fragmentation gives way to a new market norm.
3. Procurement of Proof (plausible, buyer-led)
Trajectory: Purchasing departments start buying traceability, not just slides. The market shifts from brand trust to auditability.
Mechanism: Buyers require claim-source maps and assumption registers as acceptance criteria. Vendors fund verification capacity to protect revenue.
Observable: At least one major procurement cycle includes structural transparency artifacts as acceptance criteria within ninety days.
Falsifier: Buyers keep selecting vendors primarily on reputation and turnaround. Procurement does not pull the system toward proof.
4. Compliance Theater Stack (plausible, regulator-led)
Trajectory: Badges and disclosures stack on top of the same thin verification layer. Compliance gets read as truth-testing.
Mechanism: Regulators demand visible action about AI-generated content. Rules focus on labeling AI use and content provenance (who made this, with what tool), not on whether the claim is true. Organizations optimize for the check-the-box audit.
Observable: New review boards, policies, or labels reference quality or governance but do not specify enforceable artifacts (no assumption registers, no claim-source maps, no discriminating tests).
Falsifier: Regulators bind certification to measurable accuracy claims, or impose real liability for false analytical claims. The trajectory changes character.
5. Validator's Edge (low probability, high impact)
Trajectory: Verification gets cheap enough that the bottleneck flips. Teams scale output without hollowing it out.
Mechanism: Validator tools cut reviewer-minutes per claim by automatically tracing assertions, surfacing missing premises, and flagging conflicts. The cost curve for verification finally bends.
Observable: At least one buyer pays a premium or extends timeline specifically for validated outputs within ninety days. Organizations that build or buy genuine validation capacity gain a measurable competitive edge.
Falsifier: Validator tools fail on false negatives, or raise review time through noise. Organizations do not trust them for decision-grade work.
Internal scenario signals
The regulatory signals below are external and lag. Three signals tell you which scenario your own organization is drifting toward, independent of enforcement timing. Instrument these inside your own pipeline.
Two-Speed Governance
Signal. Approved-channel logs stay flat while procurement records, survey results, or unsanctioned tool spend show rising AI use.
What to track. The gap between employee-reported AI use and approved-channel logs. Quarterly procurement records against the approved vendor list. Survey responses to "what AI tool did you use on this brief?"
Shadow Productivity Trap
Signal. Output volume rises but claim-to-evidence maps and reviewer records do not. The throughput gain is visible. The verification trail behind it is not.
What to track. Analytical output count per quarter against retained-evidence count per quarter. The share of released documents with a complete claim-to-evidence map.
Inline Friction Revolt
Signal. Retry rates, copy-paste flows to unmanaged tools, or exception requests rise after controls go live. Users are routing around the sanctioned path because it is slower than the unsanctioned one.
What to track. Override request rate. Retry rate at the gate. The share of decision-grade-eligible documents that reach the decision-maker through a non-lane path.
These three signals are owned, not waited for. They are observable in the first operating cycle of the lane and do not depend on a regulator, a competitor failure, or a public incident to resolve.
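As a rough illustration, the three signals can be reduced to a handful of ratios computed once per quarter. This sketch assumes a hypothetical `QuarterRecord` that your own logs, surveys, and gate records would have to populate; the field names and the numbers in the example are invented, and the ratios are one plausible reading of the "what to track" items above rather than a fixed definition.

```python
from dataclasses import dataclass


@dataclass
class QuarterRecord:
    surveyed: int                 # employees who answered the AI-use survey
    reported_ai_users: int        # of those, how many reported using AI on a brief
    approved_channel_users: int   # distinct users seen in approved-channel logs
    headcount: int                # employees eligible to use the approved channel
    outputs_released: int         # analytical outputs released this quarter
    outputs_with_full_map: int    # released outputs with a complete claim-to-evidence map
    gate_submissions: int         # documents submitted to the gate
    override_requests: int        # override or exception requests filed
    non_lane_deliveries: int      # eligible documents that reached the decision-maker off-lane
    decision_grade_eligible: int  # all decision-grade-eligible documents


def ratio(numerator: int, denominator: int) -> float:
    """Safe division: an empty denominator yields 0.0 rather than an error."""
    return numerator / denominator if denominator else 0.0


def scenario_signals(q: QuarterRecord) -> dict[str, float]:
    return {
        # Two-Speed Governance: reported use minus approved-channel use.
        "two_speed_gap": ratio(q.reported_ai_users, q.surveyed)
        - ratio(q.approved_channel_users, q.headcount),
        # Shadow Productivity Trap: share of output with a full evidence map.
        "evidence_map_share": ratio(q.outputs_with_full_map, q.outputs_released),
        # Inline Friction Revolt: overrides and off-lane deliveries.
        "override_request_rate": ratio(q.override_requests, q.gate_submissions),
        "bypass_share": ratio(q.non_lane_deliveries, q.decision_grade_eligible),
    }


# Hypothetical quarter, for illustration only.
quarter = QuarterRecord(
    surveyed=100, reported_ai_users=60,
    approved_channel_users=30, headcount=120,
    outputs_released=80, outputs_with_full_map=20,
    gate_submissions=50, override_requests=10,
    non_lane_deliveries=6, decision_grade_eligible=40,
)
for name, value in scenario_signals(quarter).items():
    print(f"{name}: {value:.2f}")
```

A single quarter says little on its own. The signals described above are trends, so the useful comparison is the same four numbers across consecutive quarters.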
What would prove this framework wrong
One observation, if true, breaks the entire analysis on this site: Independent audits showing that AI-assisted volume rose, but the rate of unanchored core claims in released work did not rise.
Define "unanchored" narrowly: a core claim lacks a traceable source, a checkable computation, or a named primary witness.
Three observable proxies:
- Stable rejection rates for missing sourcing
- Stable correction or retraction rates after publication
- Stable client-reported accuracy scores over time
If those metrics stay at pre-AI baselines while volume climbs, the verification-gap story is wrong. Existing review gates are absorbing the shock. The framework should be revised, not defended.
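One way to make "stay at pre-AI baselines" concrete is to compare the unanchored-claim rate in a baseline sample against a post-adoption sample. The sketch below uses a pooled two-proportion z-test, which is a choice made here for illustration and not something the framework prescribes. The function names and the counts in the example are hypothetical.

```python
import math


def two_proportion_z(x_base: int, n_base: int, x_post: int, n_post: int) -> float:
    """z statistic for the rise in rate from baseline to post-adoption (pooled SE)."""
    pooled = (x_base + x_post) / (n_base + n_post)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_base + 1 / n_post))
    if se == 0:
        return 0.0  # no variation in either sample, so no detectable difference
    return (x_post / n_post - x_base / n_base) / se


def upper_tail_p(z: float) -> float:
    """One-sided p-value P(Z > z) under the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def rate_rose(x_base: int, n_base: int, x_post: int, n_post: int,
              alpha: float = 0.05) -> bool:
    """True if the post-adoption rate is detectably higher than baseline."""
    return upper_tail_p(two_proportion_z(x_base, n_base, x_post, n_post)) < alpha


# Hypothetical audit: 12 of 200 sampled core claims unanchored before adoption,
# 30 of 200 after adoption.
z = two_proportion_z(12, 200, 30, 200)
print(f"z = {z:.2f}, one-sided p = {upper_tail_p(z):.4f}")
print("rate rose:", rate_rose(12, 200, 30, 200))
```

A result of `False` is not proof that the rate is stable. With small samples a real rise can go undetected, so the reading only counts against the framework when the samples are large enough to have caught a rise of the size that matters to you.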
A frank acknowledgment
The proxies above are hard to operationalize in practice. Most analytical work is quietly superseded rather than formally retracted. Client-reported accuracy is produced by clients with motivated reasoning. Rejection rates require a published rejection policy most firms do not have. The framework's falsifier is genuine in form and operationally weak in current practice.
What would make the falsifier observable:
- A sampling protocol. Random selection of decision-grade outputs from a defined population, with disclosed inclusion criteria.
- A baseline period. Pre-AI-adoption window of at least four quarters, with the adoption date logged explicitly per team.
- A defect taxonomy. Unanchored claim, missing assumption, unspecified mechanism, unfalsifiable prediction, phantom precision. Each defect with a definition and a set of examples.
- Inter-rater reliability targets. Multiple coders per sample, Cohen's kappa above 0.7 before any rate is published.
- Survivorship-bias treatment. Include outputs that were withdrawn or downgraded, not only those that shipped.
Until that data exists for a representative sample, the falsifier is a methodology in waiting. The framework predicts the rate rises. The framework is currently unable to prove it. This is the most honest the page can be about its own grounding.
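The inter-rater target above can be checked with a few lines of code. Cohen's kappa for two coders is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each coder's label frequencies. The sketch below is a plain-Python version for two coders; the function name, the `no_defect` label, and the sample codings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters who coded the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty sample")
    n = len(rater_a)
    # Observed agreement: share of items where the two codes match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical codings of ten sampled claims; the raters disagree on one item.
coder_1 = ["unanchored_claim"] * 4 + ["no_defect"] * 6
coder_2 = ["unanchored_claim"] * 3 + ["no_defect"] * 7

kappa = cohens_kappa(coder_1, coder_2)
publishable = kappa > 0.7  # threshold taken from the target above
print(f"kappa = {kappa:.3f}, publishable = {publishable}")
```

With more than two coders per sample, pairwise kappa is one option and a multi-rater statistic is another. Which to use is a protocol decision this sketch does not settle.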
Regulatory signals
Most of these signals are already live as of May 2026; the EU AI Act high-risk dates in the table are still ahead. Track them for implementation severity, waiver applications, and the first material enforcement actions. The pattern: when these regimes show teeth, the buyer-side correction this framework predicts accelerates.
| Signal | Date in force | What to watch for | What it tells you |
|---|---|---|---|
| SR 26-2 Revised Guidance on Model Risk Management | April 17, 2026 | First material enforcement action against a major bank for inadequate model validation. Caveat: SR 26-2 explicitly excludes generative and agentic AI from scope. Use as proxy for general model-validation enforcement direction, not as direct AI verification regulation. | The procurement-of-proof dynamic is propagating from banking, even with gen AI out of scope. Watch the next iteration of the guidance for AI inclusion. |
| GENIUS Act implementation | July 18, 2025 onward | First OCC enforcement against a stablecoin issuer for inadequate attestation; whether BDO attestations for federally supervised stablecoins are substantive or perfunctory | Federal coalition enforcing a regulated perimeter around private dollar liabilities, even if not yet around analytical reasoning |
| NIST AI RMF in federal procurement | RMF 1.0 published Jan 2023; GenAI Profile published Jul 2024 | Federal agencies citing AI RMF as a procurement requirement under FedRAMP, GSA schedules, or agency-specific acquisition rules | Federal procurement adopting AI risk frameworks is a leading indicator of broader enterprise procurement moves |
| EU AI Act high-risk provisions | High-risk standalone systems: December 2, 2027. High-risk systems integrated into products: August 2, 2028. Dates per the AI Act omnibus political agreement. | First regulator-level fine against an AI verification provider for inadequate transparency, oversight, or risk management | European enforcement typically leads U.S. enforcement by 12-18 months in adjacent domains |
Indicators worth instrumenting now
If you want to run the framework's tests on your own pipeline, these are the measures to collect. They distinguish a real verification deficit from a phantom one.
Throughput indicators
- Documents per reviewer per period
- Average review lag from draft to signoff
- Output volume change since AI-assisted drafting was adopted
Quality indicators
- Structural-defect rate per document
- Share of documents with at least one coded defect
- Factual-error rate (separate from structural)
Traceability indicators
- Attribution time from defect flag to evidence path
- Correction latency from detection to documented fix
- Post-publication reversal rate
Incentive indicators
- What gets rewarded in performance reviews (speed, satisfaction, accuracy)
- Whether analysts are scored on forecast or claim accuracy
- Whether postmortems are run on released analysis
Routing and bypass indicators
- Router recall on a labeled high-stakes sample (the share the router catches before any selective-skip rule fires)
- Bypass rate (documents or users that reach the decision-maker through a non-lane path)
- Override request rate, and override latency against email-and-paste latency under the same deadline pressure
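Two of the routing measures above reduce to short computations once the underlying records exist. The sketch below assumes a hypothetical labeled sample of document IDs and hypothetical latency logs in minutes. The function names and example values are invented for illustration, and using the median for the latency comparison is a choice made here, not one the framework specifies.

```python
from statistics import median


def router_recall(labeled_high_stakes: set[str], routed_to_lane: set[str]) -> float:
    """Share of labeled high-stakes documents that the router sent to the lane."""
    if not labeled_high_stakes:
        raise ValueError("labeled sample is empty")
    return len(labeled_high_stakes & routed_to_lane) / len(labeled_high_stakes)


def override_is_slower(override_minutes: list[float],
                       paste_minutes: list[float]) -> bool:
    """True if the sanctioned override path has the higher median latency."""
    return median(override_minutes) > median(paste_minutes)


# Hypothetical labeled sample: the router misses document d3.
high_stakes = {"d1", "d2", "d3", "d4"}
routed = {"d1", "d2", "d4", "d9"}
print(f"router recall: {router_recall(high_stakes, routed):.2f}")

# Hypothetical latencies under the same deadline pressure, in minutes.
print("override slower than email-and-paste:",
      override_is_slower([45, 60, 30], [5, 10, 8]))
```

If the override path is consistently slower than email-and-paste, rising bypass is the expected outcome, so the latency comparison is worth reading alongside the bypass rate.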
Four pivots, one falsifier
The 90-day window starts on your team's AI adoption date, not on the calendar. The two levers above (release gate, procurement artifact mandate) are the only enforcement points with enough short-term friction to alter the trajectory. The five scenarios each carry a named observable, and more than one can resolve simultaneously; the market can fragment.
The thesis falsifier can be run against your own pipeline if you can measure unanchored-claim rates over time. The framework predicts the rate rises with AI volume. If yours does not, the framework is wrong for your context. The watchlist exists so you can find that out, not so you can outsource the conclusion.
An epistemic note
This framework, including this Watchlist, is directional rather than precise. The core mechanism (drafting costs collapse faster than verification capacity scales) is well-supported by available evidence. The timing, the actor sequencing, and the relative probability of each scenario are less well-grounded.
Treat the scenarios as a way to stress-test your process, not as a timer with an alarm you can set. The framework's own grounding rate sits below where you would want it for a definitive forecast. The point of publishing it openly is that it can be contested, refined, and corrected.
The doctrine improves when it is contested. Substantive disagreements through the repository are welcome.