AI Verification in Action: Modeling the Independent Verification Organization (IVO) Governance Framework

Thursday 26 February 2026

Room VIII, UNESCO House, 7 Pl. de Fontenoy-Unesco, 75007 Paris, France

Timeline: 14:00, 14:20, 14:40, 15:10, 15:35, 16:00 coffee break, 16:35, 17:00, 17:30 end

Phase 0: Welcome, Introductions, Norms and Ground Rules

Welcome (5 min)

Introductions (5 min)

Norms, Ground Rules (5 min)

Phase 0: Welcome, Introductions, Norms and Ground Rules

Welcome: Gillian and Bri (5 min)

Phase 0: Introductions and Roles

Introductions

Timing (min): 7, 2

Today I'm playing the role of:

INSTRUCTIONS: First, tell us who you are and share an IVO insight, benefit, or concern from where you sit professionally. Second, get into your role: tell us your role and a priority or concern about IVOs from that role's perspective.

Phase 0: Welcome and Agenda - Norms, Ground Rules, etc.

Norms

Timing (min): 2

First, please always stay in character. Answer questions based on what your role might think or feel, not what you or someone in your job outside of the room might think.

Second, please use these workbooks to write down your responses when prompted.

Third, please be respectful of others and allow everyone to speak.

Any questions?

Fast-paced. Nothing will get the time it deserves. Hear from everyone before free-for-all discussion.

We will oscillate between four modes of interaction.

Phase 0: Welcome and Agenda

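Timing (min): 1

Run of Show (4m padding)

14:00 Phase 0: Welcome, Introductions, Norms and Ground Rules (15 min)
14:19 Phase 1: Outcomes (17 min)
14:40 Phase 2: Licensing (25 min)
15:08 Phase 3: Modes of Engagement (24 min)
15:34 Phase 4: Something Happens, before coffee (22 min)
16:00 coffee break
16:35 Phase 4: Something Happens, after coffee (20 min)
16:59 Phase 5: Debrief (33 min)
17:32 end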

Phase 1: Outcomes

Phase total: 17 min. Segment timers (min): 5, 1, 2, 2, 4, 2, 1, 1.

  • Define. Level one (democratic) and level two (technical).
  • Review.
  • Introduce topic and prompt.
  • Room together.
  • Quick practice.
  • Scan. Multi-industry examples.
  • Heads-down: generate outcomes.
  • Heads-together: discuss and compare.

Phase 1: Outcomes - Democratic and Technical

Timing (min): 1

The basic idea behind IVOs is that societal outcomes are articulated by a democratic process and IVOs verify that technically measurable versions of these outcomes are met.

Good level-one outcomes are democratically viable.

They...

  • ...can be understood and debated by affected stakeholders, not just technical experts

  • ...represent genuine value choices that reasonable people might weigh differently

  • ...are specific enough to guide action but general enough to accommodate different approaches to achievement

Good level-two outcomes are technically viable.

They...

  • ...can be operationalized into specific, verifiable measurements or tests

  • ...provide meaningful signal about whether level-one outcomes are being achieved (not just proxy measures that can be gamed)

  • ...are feasible to assess given constraints on time, cost, expertise, and technology

Phase 1: Outcomes - Examples from Other Industries

Timing (min): 1

Phase 1: Outcomes - Practice

Timing (min): 1

INSTRUCTIONS: Fill in the blanks. What might be some technical outcomes related to keeping flying safe? What might be a generic, societal-level outcome for maritime that is implemented by these concrete outcomes?

Phase 1: Outcomes - Practice

Timing (min): 2

Example answers. Flying, technical outcomes: FAA approves design changes; accident rates below targets. Maritime, societal-level outcome: ships don't sink.

Phase 1: Outcomes

Timing (min): 5, 2, 4, 1

INSTRUCTIONS: Take two minutes, heads down, to make some notes about a level-one outcome and a level-two technical outcome for LLM-based products and self-harm by young people. Give some thought to WHO ought to be determining such outcomes and how stakeholders will respond to these technical outcomes.

Which constituencies will or will not be satisfied that this technical outcome suffices for this democratic outcome?

Phase 2: Licensing

Phase total: 25 min. Segment timers (min): 5, 2, 3, 2, 2, 3, 2, 3, 1, 1, 1.

  • Not new. SMOG example.
  • Many different licensing schemes.
  • Collection of generic requirements.
  • Task 1: Scan, must have, no go.
  • Task 2: Which ones would need tweaking for AI?
  • Task 3: Design a licensing scheme within a budget. Room together: strong consensus? Disagreement?
  • Format: individual think and commit, discuss/argue, approach consensus, room share, room consensus.

Phase 2: Licensing

Vehicle Emission Inspections in the US

Vehicle owners must submit to periodic inspections.

Inspection station licensing can include:

Entry Requirements. Physical facilities must meet standards. Testing equipment must meet specifications and come from approved vendors. Technicians require training and certification. There are also background checks, liability insurance requirements, and application fees.

License Process. Application, facility inspection, connection to state database.

Ongoing Oversight. Annual renewal. Covert audits and random inspections. Real-time anomaly detection via database, audits, mandatory continuing education.

Timing (min): 1

Phase 2: Licensing

Independent verifiers in different industries are subject to all manner of licensing requirements.

Timing (min): 1

Phase 2: Licensing - Must Have and No-Go

Here is a list of generic licensing requirements. From a survey of many different industry verification schemes, we identified a few dozen. We divide these into

  • Prerequisites
  • Process
  • Oversight
  • Enforcement

Ex 1. Heads down: Spend some time scanning the list. Is anything missing? Identify requirements that feel like "MUST HAVEs" for an IVO licensing scheme and requirements that seem like potential deal breakers. Put marks in the "must have" and "no go" columns for any that meet these criteria for you.

Heads together: What does the team agree on? Disagree on?

Room together: Consensus/divergence from each team.

Timing (min): 2, 3, 1

Phase 2: Licensing

Some generic licensing requirements might be "right" but will need some rethinking in order to fit them into an AI context.

Ex 2. Heads down: Spend some time with the list of requirements and identify any that might be a challenging fit for AI. Mark these in the "AI fit?" column.

Heads together: Where is there consensus? Where are there divergences?

Timing (min): 2, 3, 1

Phase 2: Licensing - Build a Scheme

Let's design a licensing scheme for IVOs. But we want to keep it simple. You have a "budget" of no more than 8 requirements. Which ones do you pick? Why?

Ex 3. Heads down: Design a licensing scheme, picking no more than 10 requirements.

Heads together: Discuss with your team, note differences, work toward consensus, note divergences.

Room together: What emerged as consensus requirements for each team? Where were there divergences?

Timing (min): 2, 3, 1

Phase 2: Licensing

Ex 4. Five minutes to get a sense of the room.

Is there a consensus? Where do we diverge?

Phase 3: Modes of Engagement

Based on a survey of industries where third-party providers are part of a regulatory regime, we generated a list of a dozen generic rules of engagement.

Ex 1. One minute heads down: scan the lists. What's missing or mischaracterized? Two minutes heads together.

Ex 2. Five minutes heads down: sort the lists into four boxes: Load-bearing; Needs redesign for AI; Important but save for later; Potential for disagreement. Five minutes heads together: discuss and steer toward consensus.

Ten minutes room together: compare, contrast.

Phase total: 24 min. Segment timers (min): 2, 4, 1, 4, 4, 5, 2, 1, 1.

  • Explain Modes of Engagement.
  • Gov-IVO.
  • IVO-Company.
  • Validation. Review lists. What's missing? What to cut? Change?
  • Appropriateness varies. Classification task.
  • Individuals classify.
  • Teams classify.
  • Room classifies.
  • Compare. Contrast. Consensus? Divergence?

Phase 3: Modes of Engagement - Between Governments and IVOs

  1. Qualification. Government substantively reviews IVO.

  2. Authorization, recognition, licensing

  3. Formal delegation agreement or MOU about IVO's delegated functions, duties, limits.

  4. Articulation of prescribed standards and procedures. What standards, methods, protocols must IVO faithfully apply?

  5. Delegation of authority. Government specifies which inspection, certification, rating, or approval determinations by IVO carry legal effect.

  6. Periodic inspection and audits of IVO quality systems, personnel, practices.

  7. Validation. Government tracks outcomes and re-examines sample of certified entities to verify accuracy and consistency.

  8. Mandatory reporting. Some nonconformity or incidents trigger reports.

  9. Periodic reporting on caseloads, outcomes, activity.

  10. Government requires IVO to publicly disclose evaluation methodologies, standards, and outcome statistics.

  11. Government maintains sanction and enforcement authority over IVO.

  12. Government investigates IVO misconduct or complaints.

Timing (min): 1

Phase 3: Modes of Engagement - Between IVOs and Companies

  1. Market selection. Licensed IVOs market themselves. Regulatee selects based on cost, reputation, expertise, geography, jurisdiction.

  2. Disclosure and Access. Regulatee submits technical information and data and/or provides IVO access to models/products.

  3. Physical inspections and interviews by IVO of company.

  4. Testing and Technical Evaluation.

  5. Embedded or hold-point oversight. In a high-stakes context, the IVO might be "inside," and the regulatee cannot proceed past a hold point without signoff.

  6. Issue certification or rating.

  7. Confidential feedback to management.

  8. Ongoing monitoring.

  9. Change notification and re-evaluation.

  10. Response to findings and corrective action protocol.

  11. Suspension or revocation and appeal.

  12. Conflict of interest monitoring/disclosures.

Timing (min): 1

Phase 3: Modes of Engagement - Validation

Timing (min): 2, 4

Exercise: Assess both lists. Anything missing? What doesn't belong? What doesn't quite fit the AI context?

Phase 3: Classifying Modes of Engagement

Load-bearing. This interaction is structurally necessary for the IVO system to have integrity. If it's absent, the whole edifice loses credibility.

Needs AI-specific redesign. This interaction works well in physical-world analogues (nuclear, aviation, food) but maps poorly onto AI: concepts don't translate, mechanics are unclear, or pace of capability change breaks the underlying assumption.

Viable only when ecosystem matures. This interaction makes sense in principle but can't be done until there's infrastructure, precedent, or market depth that doesn't yet exist. It's a phase-two interaction.

Live negotiation zone. Stakeholders genuinely disagree about whether this should exist, in what form, or who controls it. Not because anyone is wrong, but because real interests conflict.

Timing (min): 2

Phase 3: Modes of Engagement

Timing (min): 4, 4, 4

Which modes of engagement would you classify into each of these boxes?

[NOTE: some might be "none of these"]

Phase 4: Something Happens

Phase total: 42 min. Segment timers (min): 5, 2, 5, 2, 2, 5, 2, 5, 5, 1, 1, 1, 1, 5.

  • Describe EVENT.
  • Point out INFO ASYMMETRY.
  • Describe first decision trees.
  • Set task.
  • After Action Report.
  • Explain phase two task.
  • Explain phase three task.
  • Explain after action task.
  • 72 hours decision trees.
  • First month narrative.
  • Longer term decision trees.

Timing (min): 2

Event Description: An AI service called YOLT, which had been certified by ACME IVO, experiences large-scale incidents of self-harm among young users.

Those of us playing different roles have, at the outset, different information sets.

We will convene in our role teams and work through (1) a decision tree covering the first 72 hours; (2) a scenario description of what happens over the first month or so; and (3) a decision tree covering the longer term (2-3 month fallout and follow-up).

Phase 4: Something Happens - Asymmetric Information

Phase 4: Something Happens - THE IMMEDIATE RESPONSE

Timing (min): 1

INSTRUCTIONS: Stylized decision trees read left to right. Circles are decision points. Rectangles are possible next steps.

Each role has a different decision sequence. Speak from your role based on the information you have. Trace your way through this possible decision tree. Make any changes as you see fit.

Phase 4: Something Happens - THE IMMEDIATE RESPONSE

[Figure: example decision tree. Recoverable labels: "Certification should let us mostly move forward"; "How did this happen? Did you miss something?"; "Or congress?"; "Gotta get out front..."]

Timing (min): 1

Phase 4: Something Happens - THE IMMEDIATE RESPONSE

Timing (min): 2, 5

Phase 4: Something Happens - ONE MONTH OUT

Timing (min): 2, 5, 5, 1

INSTRUCTIONS: Optionally, use the prompts on the left to craft a narrative about what happens over the next month. You might take the optimistic (regulation succeeds) or the pessimistic (regulation fails) path.

COFFEE

Phase 4: Something Happens - Month 2 and 3

Timing (min): 2, 5, 1

INSTRUCTIONS: Stylized decision trees read left to right. Speak from your role based on the information you have. Trace your way through this possible decision tree. Make any changes as you see fit.

Phase 4: Something Happens - AFTER ACTION REPORT

Timing (min): 1, 1, 5, 5

INSTRUCTIONS: Fill in the blanks to describe your best and worst case scenarios.

Phase 5: Debrief

Phase total: 33 min. Segment timers (min unless marked): 3, 2, 3, 2, 1, 2, 1, 2, 3, 30s, 1, 2, 5, 3, 30s, 30s.

  • Describe Task.
  • Gaps, Challenges, and Obstacles. Try to elicit as many ideas as possible.
  • Time Line. Elicit a range of dates and candidate milestones.
  • Next Steps.
  • IVOs in context.
  • Explain phase two task.
  • Explain phase three task.
  • Explain next steps task.

Phase 5: Debrief

Timing (min): 2, 3, 2, 5

INSTRUCTIONS: Having been through today's conversations, what do you see as gaps, challenges, and obstacles to IVOs? Try to generate at least three. Then classify them in this 2x2 table.

Phase 5: Debrief

Timing (min unless marked): 3, 1, 2, 30s

1. When might an effective IVO ecosystem exist?

2. What has to happen before that?

Phase 5: Debrief

Timing (min unless marked): 3, 1, 2, 30s

INSTRUCTIONS: Consider the current state of AI safety and assurance. Rate the IVO framework relative to the status quo in terms of promising safer outcomes, more democratic accountability, and fostering innovation.

Phase 5: Debrief

Timing (min unless marked): 3, 1, 2, 30s

INSTRUCTIONS: Consider the current state of AI safety and assurance. Rate the IVO framework relative to traditional approaches to regulation in terms of promising safer outcomes, more democratic accountability, and fostering innovation.

Phase 5: Debrief

Timing (min unless marked): 3, 1, 2, 30s

INSTRUCTIONS: Fill in the blanks.

Run of Show

By Dan Ryan
