Deterrence
HMIA 2025

"Readings"
Reading George Herbert Mead The Generalized Other
Activity: TBD
In a community of agents, some measure of alignment can be achieved if agents respond to misaligned behavior. Responses range from arresting the behavior in the individual instance (restraining the agent causing the disruption) to, over time, responses so consistent and certain that agents learn them and let them condition their choice of actions. We call that deterrence.
Deterrence assumes that agents will sometimes cause harm, fail in their roles, or break rules, but that alignment can be facilitated by predictable responses to these failures. Mechanisms of liability, malpractice, and sanctioning rule violations reduce the impunity attached to misalignment.
Outline
- Under what conditions must one say "excuse me" in a social situation?
- What are some phrases that you might use to accuse a friend of malpractice?
- Consider: "Machine systems operating in shared or high-stakes environments must follow formal policy constraints. Rule enforcement may include disabling behaviors, limiting capabilities, or escalating violations for review. Arresting rule-breaking can restore alignment, but punishment-based deterrence, per se, may be problematic in the machine realm since systems lack intentionality and may respond unpredictably to externally imposed penalties." We can see this in reward functions, but is "deterrence" what we are seeing in RLHF? It shares surface-level similarities but is missing some of the internals. (See the enforcement sketch after this outline.)
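A minimal sketch of the "rule enforcement" side of this question: an agent's proposed action is checked against formal policy constraints, and the system responds by allowing, limiting, blocking, or escalating for review. The rule names, severity levels, and thresholds here are illustrative assumptions, not drawn from any particular system in the readings.

```python
# Hypothetical sketch: rule enforcement for a machine agent in a shared environment.
# Policy names, actions, and severity thresholds are assumptions for illustration.

from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class Response(Enum):
    ALLOW = auto()
    LIMIT = auto()       # permit, but with reduced capability
    BLOCK = auto()       # disable the behavior outright
    ESCALATE = auto()    # log and hand off for review


@dataclass
class PolicyRule:
    name: str
    violates: Callable[[dict], bool]   # action -> True if the rule is broken
    severity: int                      # 1 = minor, 2 = serious, 3 = severe


def enforce(action: dict, rules: list[PolicyRule]) -> Response:
    """Check an action against formal policy constraints and choose a response."""
    worst = max((r.severity for r in rules if r.violates(action)), default=0)
    if worst >= 3:
        return Response.ESCALATE
    if worst == 2:
        return Response.BLOCK
    if worst == 1:
        return Response.LIMIT
    return Response.ALLOW


# Example: a shared-environment rule against exceeding a speed cap.
rules = [PolicyRule("speed_cap", lambda a: a.get("speed", 0) > 30, severity=2)]
print(enforce({"speed": 45}, rules))   # Response.BLOCK
print(enforce({"speed": 10}, rules))   # Response.ALLOW
```

Note that nothing here "deters" anything: the system simply arrests the rule-breaking action, which is part of what the outline question asks us to distinguish.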
PRE-CLASS
Video: Linked Title [3m21s]
CLASS
So where do we stand on a dog that learns not to bark because it is hit whenever it barks? Is that deterrence?
Brings up functional, cognitive, and moral dimensions of deterrence.
For behaviorism and control theory: operant conditioning via punishment. Functional deterrence.
But no concept of rule violation or ethical boundary.
For humans, deterrence means the behavior changes, the agent knows why, and the agent anticipates punishment. The dog has an associative link between barking and the anticipated punishment. Does an RLHF-trained system?
Is the dog deterred, or just conditioned? What’s the difference — and does it matter for how we train AI systems?
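A minimal sketch of the "just conditioned" reading, under assumed parameters (the action names, rewards, and learning rates are illustrative): a simple value-learning agent stops "barking" purely because barking is reliably penalized. There is no representation of a rule, a reason, or an ethical boundary, only value estimates shaped by punishment, which is the functional deterrence the behaviorist reading describes.

```python
# Assumed setup for illustration: a two-action agent that learns to stop "barking"
# because barking is penalized. It has no concept of rule violation -- only values.

import random

ACTIONS = ["bark", "stay_quiet"]
values = {a: 0.0 for a in ACTIONS}   # learned value estimates per action
alpha, epsilon = 0.1, 0.1            # learning rate, exploration rate


def reward(action: str) -> float:
    # Barking brings a small gain (1.0) but a reliable punishment (-3.0); quiet is neutral.
    return (1.0 - 3.0) if action == "bark" else 0.0


random.seed(0)
for step in range(500):
    if random.random() < epsilon:
        action = random.choice(ACTIONS)            # occasional exploration
    else:
        action = max(ACTIONS, key=values.get)      # otherwise act greedily
    values[action] += alpha * (reward(action) - values[action])

print(values)   # "bark" ends up with a negative value; the agent stops choosing it
```

Whether this counts as deterrence, or only as conditioning, is exactly the question above.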
Resources
Author. YYYY. "Linked Title" (info)
HMIA 2025 Deterrence
By Dan Ryan