AI Verification in Action: Modeling the Independent Verification Organization (IVO) Governance Framework
Thursday 26 February 2026
Room VIII, UNESCO House, 7 Pl. de Fontenoy-Unesco, 75007 Paris, France
14:00, 14:20, 14:40, 15:10, 15:35, 16:00 coffee break, 16:35, 17:00, 17:30 end
Phase 0: Welcome, Introductions, Norms and Ground Rules
Welcome
Norms, Ground Rules
Introductions
Phase 0: Welcome, Introductions, Norms and Ground Rules
Welcome (5 min): Gillian and Bri
Phase 0: Introductions and Roles
Introductions (7 min)
Today I'm playing the role of:
INSTRUCTIONS: First, tell us who you are and share an IVO insight, benefit, or concern from where you sit professionally. Second, get into your role: tell us your role and a priority or concern about IVOs from that role's perspective.
Phase 0: Welcome and Agenda - Norms, Ground Rules, etc.
Norms (2 min)
First, please always stay in character. Answer questions based on what your role might think or feel, not what you or someone in your job outside the room might think.
Second, please use these workbooks to write down your responses when prompted.
Third, please be respectful of others and allow everyone to speak.
Any questions?
The session is fast-paced, and nothing will get the time it deserves. We will hear from everyone before opening a free-for-all discussion.
We will oscillate between four modes of interaction.
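Phase 0: Welcome and Agenda
Run of Show (4 min padding)
14:00 Phase 0: Welcome, Introductions, Norms and Ground Rules (15 min)
14:19 Phase 1: Outcomes (17 min)
14:40 Phase 2: Licensing (25 min)
15:08 Phase 3: Modes of Engagement (24 min)
15:34 Phase 4: Something Happens, part 1 (22 min)
16:00 Coffee break
16:35 Phase 4: Something Happens, part 2 (20 min)
16:59 Phase 5: Debrief (33 min)
17:32 End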
Phase 1: Outcomes (17 min)
Define level one (democratic) and level two (technical) outcomes.
Scan multi-industry examples.
Quick practice.
Introduce topic and prompt.
Heads-down: generate outcomes.
Heads-together: discuss and compare.
Room together: review.
Phase 1: Outcomes - Democratic and Technical
The basic idea behind IVOs is that societal outcomes are articulated through a democratic process, and IVOs verify that technically measurable versions of these outcomes are met.
Good level-one outcomes are democratically viable. They:
- can be understood and debated by affected stakeholders, not just technical experts
- represent genuine value choices that reasonable people might weigh differently
- are specific enough to guide action but general enough to accommodate different approaches to achievement
Good level-two outcomes are technically viable. They:
- can be operationalized into specific, verifiable measurements or tests
- provide meaningful signal about whether level-one outcomes are being achieved (not just proxy measures that can be gamed)
- are feasible to assess given constraints on time, cost, expertise, and technology
Phase 1: Outcomes - Examples from Other Industries
Phase 1: Outcomes - Practice
INSTRUCTIONS: Fill in the blanks. What might be some technical outcomes related to keeping flying safe? What might be a generic, societal-level outcome for maritime that is implemented by these concrete outcomes?
Phase 1: Outcomes - Practice
FAA approves design changes
Accident rates below targets
Ships don't sink
Phase 1: Outcomes
INSTRUCTIONS: Take two minutes, heads down, to make some notes about a level-one (democratic) outcome and a level-two (technical) outcome for LLM-based products and self-harm among young people. Give some thought to WHO ought to determine such outcomes and how stakeholders will respond to the technical outcomes.
Which constituencies will, and which will not, be satisfied that this technical outcome suffices for this democratic outcome?
Phase 2: Licensing (25 min)
Not new: smog-check example.
Many different licensing schemes.
Collection of generic requirements.
Task 1: Scan; mark must-haves and no-gos.
Task 2: Which requirements would need tweaking for AI?
Task 3: Design a licensing scheme within a budget. Room together: strong consensus? Disagreement?
Process: individual think and commit, discuss/argue, approach consensus; for room-wide tasks, also room share and room consensus.
Phase 2: Licensing
Vehicle Emission Inspections in the US
Vehicle owners must submit to periodic inspections. Inspection station licensing can include:
Entry Requirements.
- Physical facilities must meet standards.
- Testing equipment must meet specifications and come from approved vendors.
- Technicians require training and certification.
- There are also background checks, liability insurance requirements, and application fees.
License Process.
- Application, facility inspection, connection to the state database.
Ongoing Oversight.
- Annual renewal. Covert audits and random inspections. Real-time anomaly detection via the database, audits, mandatory continuing education.
Phase 2: Licensing
Independent verifiers in different industries are subject to a wide variety of licensing requirements.
Phase 2: Licensing - Must Have and No-Go
Here's a list of a few dozen generic licensing requirements, identified through a survey of verification schemes in many different industries. We divide these into:
- Prerequisites
- Process
- Oversight
- Enforcement
Ex 1. Heads down: Spend some time scanning the list. Is anything missing? Identify requirements that feel like "MUST HAVEs" for an IVO licensing scheme and requirements that seem like potential deal breakers. Put a mark in the "must have" or "no go" column for each requirement that fits for you.
Heads together: What does the team agree on? Disagree on?
Room together: Consensus/divergence from each team.
Phase 2: Licensing
Some generic licensing requirements might be "right" but will need rethinking to fit an AI context.
Ex 2. Heads down: Spend some time with the list of requirements and identify any that might be a challenging fit for AI. Mark these in the "AI fit?" column.
Heads together: Where is there consensus? Where are there divergences?
Phase 2: Licensing - Build a Scheme
Let's design a licensing scheme for IVOs, but keep it simple: you have a "budget" of no more than 8 requirements. Which ones do you pick? Why?
Ex 3. Heads down: Design a licensing scheme, picking no more than 8 requirements.
Heads together: Discuss with your team, note differences, work toward consensus, and note divergences.
Room together: What emerged as consensus requirements for each team? Where were there divergences?
Phase 2: Licensing
Ex 4. Five minutes to get a sense of the room: Is there a consensus? Where do we diverge?
Phase 3: Modes of Engagement (24 min)
Based on a survey of industries in which third-party providers are part of a regulatory regime, we generated lists of a dozen generic modes of engagement.
Ex 1. One minute heads down: scan the lists. What's missing or mischaracterized? Then two minutes heads together.
Ex 2. Five minutes heads down: sort the items into four boxes (Load-bearing; Needs redesign for AI; Important but save for later; Potential for disagreement). Then five minutes heads together: discuss and steer toward consensus.
Ten minutes room together: compare and contrast.
Explain modes of engagement: Gov-IVO and IVO-Company.
Validation: review the lists. What's missing? What to cut? What to change?
Classification task (appropriateness varies): individuals classify, teams classify, room classifies.
Compare. Contrast. Consensus? Divergence?
Phase 3: Modes of Engagement - Between Governments and IVOs
- Qualification. Government substantively reviews the IVO.
- Authorization, recognition, licensing.
- Formal delegation agreement or MOU setting out the IVO's delegated functions, duties, and limits.
- Articulation of prescribed standards and procedures. What standards, methods, and protocols must the IVO faithfully apply?
- Delegation of authority. Government specifies which inspection, certification, rating, or approval determinations by the IVO carry legal effect.
- Periodic inspections and audits of IVO quality systems, personnel, and practices.
- Validation. Government tracks outcomes and re-examines a sample of certified entities to verify accuracy and consistency.
- Mandatory reporting. Certain nonconformities or incidents trigger reports.
- Periodic reporting on caseloads, outcomes, and activity.
- Government requires the IVO to publicly disclose evaluation methodologies, standards, and outcome statistics.
- Government maintains sanction and enforcement authority over the IVO.
- Government investigates IVO misconduct or complaints.
Phase 3: Modes of Engagement - Between IVOs and Companies
- Market selection. Licensed IVOs market themselves. The regulatee selects an IVO based on cost, reputation, expertise, geography, and jurisdiction.
- Disclosure and access. The regulatee submits technical information and data and/or gives the IVO access to models/products.
- Physical inspections of, and interviews at, the company by the IVO.
- Testing and technical evaluation.
- Embedded or hold-point oversight. In a high-stakes context, the IVO might be "inside," and the regulatee cannot proceed past a hold point without sign-off.
- Issue certification or rating.
- Confidential feedback to management.
- Ongoing monitoring.
- Change notification and re-evaluation.
- Response to findings and corrective action protocol.
- Suspension or revocation, and appeal.
- Conflict-of-interest monitoring and disclosures.
Phase 3: Modes of Engagement - Validation
Exercise: Assess both lists. Anything missing? What doesn't belong? What doesn't quite fit the AI context?
Phase 3: Classifying Modes of Engagement
Load-bearing. This interaction is structurally necessary for the IVO system to have integrity. If it's absent, the whole edifice loses credibility.
Needs AI-specific redesign. This interaction works well in physical-world analogues (nuclear, aviation, food) but maps poorly onto AI: concepts don't translate, mechanics are unclear, or the pace of capability change breaks the underlying assumptions.
Viable only when ecosystem matures. This interaction makes sense in principle but can't be done until there's infrastructure, precedent, or market depth that doesn't yet exist. It's a phase-two interaction.
Live negotiation zone. Stakeholders genuinely disagree about whether this should exist, in what form, or who controls it, not because anyone is wrong but because real interests conflict.
Phase 3: Modes of Engagement
Which modes of engagement would you classify into each of these boxes? [NOTE: some might be "none of these"]
Phase 4: Something Happens (42 min)
Describe EVENT.
Point out INFO ASYMMETRY.
72-hour decision trees: describe first decision trees; set task.
First-month narrative: explain phase two task.
Longer-term decision trees: explain phase three task.
After Action Report: explain after action task.
Event Description: An AI service called YOLT, which had been certified by ACME IVO, experiences large-scale incidents of self-harm among young users.
Those of us playing different roles have, at the outset, different information sets.
We will convene in our role teams and work through (1) a decision tree covering the first 72 hours, (2) a scenario description of what happens over the first month or so, and (3) a decision tree covering the longer term (2-3 months of fallout and follow-up).
Phase 4: Something Happens - Asymmetric Information
Phase 4: Something Happens - THE IMMEDIATE RESPONSE
INSTRUCTIONS: Stylized decision trees read left to right.
Circles are decision points. Rectangles are possible next steps.
Each role has a different decision sequence.
Speak from your role based on information you have.
Trace your way through this possible decision tree.
Make any changes as you see fit.
Phase 4: Something Happens - THE IMMEDIATE RESPONSE
Certification should let us mostly move forward.
How did this happen? Did you miss something?
Or Congress?
Gotta get out front...
Phase 4: Something Happens - ONE MONTH OUT
INSTRUCTIONS: Optionally, use the prompts on the left to craft a narrative about what happens over the next month.
You might take the optimistic path (regulation succeeds) or the pessimistic path (regulation fails).
COFFEE
Phase 4: Something Happens - Months 2 and 3
INSTRUCTIONS: Stylized decision trees read left to right.
Speak from your role based on information you have.
Trace your way through this possible decision tree.
Make any changes as you see fit.
Phase 4: Something Happens - AFTER ACTION REPORT
INSTRUCTIONS: Fill in the blanks to describe your best and worst case scenarios.
Phase 5: Debrief (33 min)
Describe task.
Gaps, Challenges, and Obstacles. Try to elicit as many ideas as possible.
Timeline. Elicit a range of candidate dates and milestones. Explain phase two task.
IVOs in context. Explain phase three task.
Next Steps. Explain next steps task.
Phase 5: Debrief
INSTRUCTIONS: Having been through today's conversations, what do you see as gaps, challenges, and obstacles to IVOs?
Try to generate at least three.
Then classify them in this 2x2 table.
Phase 5: Debrief
1. When might an effective IVO ecosystem exist?
2. What has to happen before that?
Phase 5: Debrief
INSTRUCTIONS: Consider the current state of AI safety and assurance. Rate the IVO framework relative to the status quo in terms of promising safer outcomes, providing more democratic accountability, and fostering innovation.
Phase 5: Debrief
INSTRUCTIONS: Consider the current state of AI safety and assurance. Rate the IVO framework relative to traditional approaches to regulation in terms of promising safer outcomes, providing more democratic accountability, and fostering innovation.
Phase 5: Debrief
INSTRUCTIONS: Fill in the blanks.
Run of Show
By Dan Ryan