
Why Your AI Activity Is Not Showing Up in Operating Results: A Five-Dimension Diagnostic

Most AI maturity models score the technology, not the operating system around it. The BRAGI five-dimension framework (Baseline, Revenue, Adoption, Governance, Intelligence) scores where AI work actually lifts revenue, cuts cost, increases speed, reduces risk, and compounds capability. What the dimensions are and what a high score in each looks like.

A pattern shows up in almost every mid-market AI engagement. The company has activity. The activity has budget. The activity has good people behind it. And the activity does not show up in the operating numbers. Revenue does not move. Cost does not drop. Cycle times stay where they were. The risk profile gets murkier, not clearer.

The diagnosis is almost never "the technology is wrong". The diagnosis is "the operating system around the technology was never built". Most AI maturity models do not surface this because they score the technology stack rather than the operating system. They measure pilots launched, models deployed, tools in use, and people trained. None of those metrics predict whether AI work shows up in the P&L.

This post lays out the five operating dimensions that do predict P&L movement, what each one measures, and what a high score looks like. It is the framework used inside every BRAGI Assessment.

What most AI maturity models score (and miss)

The standard AI maturity model has been around since the analytics-maturity wave of 2018-2020. It scores the technology gradient: from awareness, to experimentation, to deployment, to scale, to optimisation. The scoring is internally consistent. It misses the part that matters.

What it misses: whether the AI activity is connected to a commercial outcome, whether the operating model around the activity can absorb and scale it, whether governance is keeping pace, and whether the work is compounding into reusable capability or burning fresh budget every cycle.

A company can score "deployment-stage" on a standard maturity model and still produce zero operating impact, because deployment without operating connection produces activity without outcome. The five-dimension scorecard exists to score the operating connection, not the deployment stage.

The five dimensions

The framework is built around the operating outcomes mid-market boards actually ask about. Each dimension has a 0-to-5 scoring rubric. The composite score predicts whether AI work will show up in the next four quarters of P&L.

| Dimension | What it scores | What a 5 looks like |
|---|---|---|
| Baseline | Whether AI activity is anchored to a measured starting point on revenue, cost, speed, risk, and capability | Every AI workstream has a named baseline number, a measurement method, and a quarterly review cadence |
| Revenue | Whether AI is producing measurable lift on the top line (qualified pipeline, conversion, retention, expansion) | Named revenue workstreams with attributed lift, sponsor-level review, and an honest assessment of what would have happened without the AI work |
| Adoption | Whether the people who are supposed to use the AI actually use it, and whether usage produces a behavior change | More than 60 percent of intended users use the AI tools weekly, with measurable behavior change in the underlying process |
| Governance | Whether the AI work has appropriate guardrails, escalation paths, and risk classification (including EU AI Act exposure) | Named AI governance owner, documented review cadence, vendor exposure map, regulatory classification per tool, incident response path |
| Intelligence | Whether the work is compounding into reusable IP, workflow patterns, vendor intelligence, and executive briefings that get better with each engagement | Documented operating patterns, a working knowledge base, named pattern owners, and an executive briefing cadence that surfaces learning across functions |

A high composite score (above 18 out of 25) is associated with AI activity that moves the P&L. A low composite score (below 10) is associated with AI activity that consumes budget without producing operating change, regardless of how mature the technology stack is.

What the dimensions measure that others do not

The five-dimension framework is opinionated. The opinions rest on what produces operating movement in mid-market companies with annual revenue between USD 50M and 500M.

Baseline scores the measurement discipline, not the measurement itself. Most AI workstreams are scoped without a starting number. When the project ends and someone asks "did it work", there is no anchor for a credible answer. A high baseline score means every workstream has a number to beat, a method for measuring against it, and a forum where the measurement is honestly reviewed.

Revenue is the dimension most companies underscore. Most AI work is scoped against efficiency or cost reduction because those are easier to measure. Revenue is harder, slower, and more political. A high revenue score requires that at least one AI workstream has a sponsor whose own bonus moves with the revenue number, which forces the measurement discipline up the stack.

Adoption is the dimension that decouples from training spend. Most companies spend on AI training and then assume adoption follows. It does not. The IBM 2026 CEO Study found that 86 percent of CEOs say employees have the skills, but only 25 percent of the workforce uses AI regularly. The gap is not skills. The gap is workflow design and management cadence. A high adoption score requires both.
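
The usage half of the Adoption bar is simple arithmetic, and it is easy to get wrong by counting everyone who logs in. A minimal sketch, assuming usage logs can be reduced to the set of user IDs active in a given week; the function and variable names are hypothetical, and the 60 percent threshold is taken from the Adoption row of the dimension table.

```python
# Hypothetical helper names; the 60 percent bar is the Adoption "what a 5 looks like" threshold.
ADOPTION_BAR = 0.60


def weekly_adoption_rate(intended_users: set[str], weekly_active: set[str]) -> float:
    """Share of intended users who used the AI tools during the week.

    Users outside the intended population are ignored, so usage by
    non-target staff cannot inflate the rate.
    """
    if not intended_users:
        raise ValueError("intended_users must not be empty")
    return len(intended_users & weekly_active) / len(intended_users)


def clears_usage_bar(rate: float) -> bool:
    """True when weekly usage is strictly above the 60 percent bar."""
    return rate > ADOPTION_BAR


intended = {"ana", "ben", "chen", "dev"}
active = {"ana", "ben", "eve"}  # "eve" is not an intended user
rate = weekly_adoption_rate(intended, active)
print(rate, clears_usage_bar(rate))  # 0.5 False
```

Clearing the usage bar is necessary but not sufficient: a 5 also requires measurable behavior change in the underlying process, which a weekly login count cannot show on its own.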

Governance is the dimension that surprises most boards. Most mid-market companies do not have a vendor exposure map, a named AI governance owner, or a documented review cadence. With the EU AI Act's main obligations applying from 2 August 2026, the governance dimension shifts from "good practice" to "operational prerequisite" for any company with EU customer exposure.

Intelligence is the dimension that determines whether AI work compounds. Most engagements produce one-off output: a model, a tool, a deployment. The intelligence dimension scores whether the engagement also produces reusable artifacts (workflow patterns, vendor briefings, governance templates, executive briefings) that compound across future cycles. Without it, every engagement starts from zero.

How the scoring rubric works in practice

Each dimension has a 0-to-5 rubric. The rubric is concrete, not subjective. A representative example for the Revenue dimension:

  • 0: No AI workstream has a named revenue outcome
  • 1: At least one workstream is named against a revenue outcome but no measurement is in place
  • 2: At least one workstream has a measurement method but no baseline number
  • 3: At least one workstream has a baseline, a measurement method, and a sponsor
  • 4: Multiple workstreams, sponsor-level review cadence, and attribution discipline
  • 5: Multiple workstreams with named lift, honest counterfactual analysis, and clear handoff into business-as-usual

The other four dimensions follow the same pattern. The rubric is designed so that scoring is reproducible across reviewers, which is the property most subjective maturity models lose.
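
Reproducibility follows from treating the rubric as a ladder of observable checks rather than a judgment call. A minimal sketch of the Revenue rubric, assuming each rung requires every rung below it and that evidence is recorded at company level rather than per workstream; the `RevenueEvidence` fields are hypothetical names, not part of the BRAGI methodology.

```python
from dataclasses import dataclass


@dataclass
class RevenueEvidence:
    """Observable facts a reviewer records (hypothetical field names)."""
    named_workstreams: int = 0  # workstreams named against a revenue outcome
    has_measurement_method: bool = False
    has_baseline: bool = False
    has_sponsor: bool = False
    sponsor_review_cadence: bool = False
    attribution_discipline: bool = False
    named_lift: bool = False
    counterfactual_analysis: bool = False
    bau_handoff: bool = False  # clear handoff into business-as-usual


def score_revenue(e: RevenueEvidence) -> int:
    """Return the highest rung (0-5) reached without skipping a rung."""
    rungs = [
        e.named_workstreams >= 1,           # 1: named revenue outcome
        e.has_measurement_method,           # 2: measurement method in place
        e.has_baseline and e.has_sponsor,   # 3: baseline and sponsor
        (e.named_workstreams >= 2
         and e.sponsor_review_cadence
         and e.attribution_discipline),     # 4: multiple, reviewed, attributed
        (e.named_lift
         and e.counterfactual_analysis
         and e.bau_handoff),                # 5: lift, counterfactual, handoff
    ]
    score = 0
    for met in rungs:
        if not met:
            break  # a missing rung caps the score, whatever is true above it
        score += 1
    return score


example = RevenueEvidence(
    named_workstreams=1,
    has_measurement_method=True,
    has_baseline=True,
    has_sponsor=True,
)
print(score_revenue(example))  # 3
```

Two reviewers who record the same evidence get the same score, and a claimed counterfactual analysis cannot lift the score past a missing baseline.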

Composite score interpretation

| Composite (out of 25) | What it predicts |
|---|---|
| 21-25 | AI work is operationally embedded. Focus shifts to scale and compounding. |
| 16-20 | AI work is producing movement but not yet compounding. Focus on intelligence and governance to scale. |
| 11-15 | Activity-heavy, outcome-light. Most mid-market companies score here. Focus on baseline and revenue to connect activity to outcomes. |
| 6-10 | High structural risk. Likely zombie pilots, missing governance, no revenue accountability. Focus on baseline first, then revenue. |
| 0-5 | Pre-engagement. AI conversation has not yet produced a workable operating shape. Focus on commercial framing before any tool decision. |

For context, the median mid-market score across BRAGI engagements is in the 11-15 band. The companies that get to 21-25 within 12 to 18 months are the ones that show up in next year's "AI advantage" CEO narratives.
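
Reading a composite is a sum followed by a threshold lookup. A minimal sketch, assuming an unweighted sum of the five 0-to-5 scores (which the 25-point scale implies); the function names are hypothetical and the band readings are abbreviated from the interpretation table.

```python
DIMENSIONS = ("Baseline", "Revenue", "Adoption", "Governance", "Intelligence")

# (lowest composite in band, abbreviated reading), highest band first
BANDS = [
    (21, "Operationally embedded: focus on scale and compounding"),
    (16, "Movement, not yet compounding: focus on intelligence and governance"),
    (11, "Activity-heavy, outcome-light: focus on baseline and revenue"),
    (6, "High structural risk: focus on baseline first, then revenue"),
    (0, "Pre-engagement: focus on commercial framing before any tool decision"),
]


def composite(scores: dict[str, int]) -> int:
    """Unweighted sum of the five dimension scores (0-25)."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 5:
            raise ValueError(f"{name} must be scored 0 to 5, got {scores[name]}")
    return sum(scores[name] for name in DIMENSIONS)


def interpret(total: int) -> str:
    """Return the reading for the band that contains the composite."""
    if not 0 <= total <= 25:
        raise ValueError("composite must be between 0 and 25")
    for floor, reading in BANDS:
        if total >= floor:
            return reading
    raise AssertionError("unreachable: the lowest band starts at 0")


scores = {"Baseline": 2, "Revenue": 1, "Adoption": 3, "Governance": 2, "Intelligence": 4}
total = composite(scores)
print(total, "->", interpret(total))
# 12 -> Activity-heavy, outcome-light: focus on baseline and revenue
```

The composite tells you which band you are in; the individual dimension scores, not the total, tell you where to act.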

How this connects to the buying decision

For a mid-market company evaluating AI advisory engagements, the five-dimension scorecard is a way to scope what you are actually buying. It prevents three common scoping mistakes.

Buying a tool when the gap is baseline. Most "we need an AI for X" requests are actually "we have not measured X" requests in disguise. A tool deployed on top of a missing baseline produces movement that cannot be attributed. The framework surfaces this before procurement.

Buying training when the gap is workflow. Most "we need to upskill our team on AI" requests are actually "we have not redesigned the workflow" requests. Training without workflow redesign produces engaged learners who go back to the old process. The framework surfaces this before the training budget is committed.

Buying governance when the gap is sponsor. Most "we need an AI governance committee" requests are actually "no senior leader has owned this" requests. A governance committee without an executive sponsor produces meetings and minutes, not decisions. The framework surfaces this before the committee charter is signed.

The scorecard does not eliminate the need for tools, training, or governance. It scopes what gets bought against the dimension where the actual gap sits, which is usually not where the original request pointed.

What this means in practice

For a mid-market company with AI activity in motion but uneven results, three near-term moves.

  1. Score the current state honestly. Each of the five dimensions, 0 to 5. The score does not need to be precise. It needs to surface the lowest-scoring dimension, because that is where the operating gap usually sits.
  2. Pick the lowest dimension as the next 90-day focus. Most companies try to improve all five at once. The pattern that produces movement is sequential: fix the lowest dimension first, then re-score, then move to the next-lowest.
  3. Anchor sponsorship at the right level. Each dimension has a natural sponsor altitude. Baseline and Revenue are CEO or COO. Adoption is COO or function head. Governance is CIO or CRO. Intelligence is whoever owns the operating practice (often the CFO in mid-market). Without a sponsor at the right altitude, the dimension does not move.
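
Steps 2 and 3 reduce to picking the minimum score and looking up its sponsor altitude. A minimal sketch using the sponsor altitudes listed in step 3; the names are hypothetical, and the tie-break (the dimension listed earlier in Baseline, Revenue, Adoption, Governance, Intelligence order wins) is an assumption the framework does not state, chosen because it matches the "baseline first, then revenue" advice for low scores.

```python
DIMENSIONS = ("Baseline", "Revenue", "Adoption", "Governance", "Intelligence")

# Sponsor altitudes from step 3.
SPONSOR_ALTITUDE = {
    "Baseline": "CEO or COO",
    "Revenue": "CEO or COO",
    "Adoption": "COO or function head",
    "Governance": "CIO or CRO",
    "Intelligence": "owner of the operating practice (often the CFO)",
}


def next_focus(scores: dict[str, int]) -> tuple[str, str]:
    """Lowest-scoring dimension for the next 90 days, plus its sponsor altitude.

    min() returns the first minimum it meets, so ties go to the dimension
    listed earlier in DIMENSIONS (an assumption, not part of the framework).
    """
    lowest = min(DIMENSIONS, key=lambda name: scores[name])
    return lowest, SPONSOR_ALTITUDE[lowest]


print(next_focus({"Baseline": 2, "Revenue": 1, "Adoption": 3, "Governance": 2, "Intelligence": 4}))
# ('Revenue', 'CEO or COO')
```

The sketch deliberately returns one dimension rather than a full ordering: after each 90-day cycle you re-score and call it again, because moving one dimension can change which one is lowest.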

The companies that score themselves on this framework annually, pick the lowest dimension as the focus, and assign the right-altitude sponsor are the companies whose AI activity starts showing up in operating numbers within two to four quarters.

How Bragi helps

The BRAGI Assessment is the formal version of this scoring framework, run over four weeks with the company's leadership team and underlying data. The output is the scored baseline across all five dimensions, the lowest-dimension diagnosis, and a 90-day operating plan to move the lowest dimension first.

For companies that want to self-assess before committing to the full assessment, a shorter scored self-assessment is available at /assess.

Take the next step

If your AI activity is not yet showing up in operating numbers, the highest-leverage first step is the scored diagnostic. It produces a number on each of the five dimensions and an honest read on where the operating gap actually sits.


Sources

  • BRAGI five-dimension scoring framework (proprietary methodology)
  • IBM Institute for Business Value, 2026 CEO Study (86 percent skills vs 25 percent regular usage finding)
  • BCG Nordic AI research, late 2025 (4 percent meaningful value capture baseline)
  • EU AI Act, Regulation (EU) 2024/1689, main obligations enforcement date 2 August 2026