"Readings"
Reading: George Herbert Mead, "The Generalized Other"
Activity: TBD
PRE-CLASS
CLASS
In a community of agents, some measure of alignment can be achieved if agents respond to misaligned behavior. In the individual instance, the response can arrest the behavior (restraining the agent causing the disruption), but over time a consistent and certain response can be learned by agents and can condition their choice of actions. We call that deterrence.
Deterrence assumes that agents will sometimes cause harm, fail in their roles, or break rules, but that alignment can be facilitated by predictable responses to these failures. Mechanisms such as liability, malpractice claims, and sanctions for rule violations reduce the impunity attached to misalignment.
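One way to see why "consistent and certain" matters is a toy expected-payoff model. This is an illustrative sketch, not something taken from the reading: the function names and the parameters `gain`, `penalty`, and `p_response` are hypothetical, and the model assumes an agent that simply weighs expected costs against benefits.

```python
def expected_payoff(gain: float, penalty: float, p_response: float) -> float:
    """Expected payoff of a misaligned action (toy model, hypothetical names).

    gain:        what the agent gets from the misaligned action
    penalty:     cost imposed on the agent if the community responds
    p_response:  probability that a response actually follows (certainty)
    """
    return gain - p_response * penalty


def is_deterred(gain: float, penalty: float, p_response: float) -> bool:
    """Assumption: the agent refrains when the expected payoff is not positive."""
    return expected_payoff(gain, penalty, p_response) <= 0.0


# Same penalty, different certainty of response.
print(is_deterred(gain=1.0, penalty=5.0, p_response=0.1))  # rare response
print(is_deterred(gain=1.0, penalty=5.0, p_response=0.5))  # reliable response
```

Under these assumptions the first call prints False and the second prints True: the penalty is identical, and only the certainty of the response changes whether the action pays off. That is the sense in which predictable responses reduce impunity.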
PRE-CLASS
Video: Linked Title [3m21s]
CLASS
So where do we stand on a dog that learns not to bark because it is hit whenever it barks? Is that deterrence?
The example brings up the functional, cognitive, and moral dimensions of deterrence.
For behaviorism and control theory, this is operant conditioning via punishment: functional deterrence.
But the dog has no concept of rule violation or ethical boundary.
For humans, deterrence means that behavior changes, the agent knows why, and the agent anticipates punishment. The dog has learned to associate barking with the anticipated punishment. An RLHF-trained model does not?
Is the dog deterred, or just conditioned? What’s the difference, and does it matter for how we train AI systems?
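The purely functional reading of the dog example can be sketched as a small value-learning loop. This is a toy illustration under stated assumptions, not a model of real dogs or of RLHF: the function name `train_by_punishment`, the starting values, and the reward magnitudes are all hypothetical. The point is what the learner stores, which is one number per action and nothing that represents a rule or a reason.

```python
def train_by_punishment(trials: int = 10, learning_rate: float = 0.5):
    """Toy operant conditioning: punish every bark, update a learned value."""
    # Learned value per action; the dog starts out liking to bark.
    values = {"bark": 1.0, "quiet": 0.0}
    bark_pleasure, punishment = 1.0, 3.0  # hypothetical magnitudes
    history = []
    for _ in range(trials):
        # Pick whichever action currently has the higher learned value.
        action = max(values, key=values.get)
        # Barking is punished every time (consistent and certain response).
        outcome = bark_pleasure - punishment if action == "bark" else 0.0
        # Move the stored value toward the outcome just experienced.
        values[action] += learning_rate * (outcome - values[action])
        history.append(action)
    return values, history


values, history = train_by_punishment()
print(history)  # sequence of chosen actions
print(values)   # the only thing the learner "knows"
```

With these settings the learner barks once, is punished, and stays quiet for the remaining trials. Behavior has changed, but the learner holds only a lowered value for "bark"; whether that counts as deterrence or only as conditioning is the question posed above.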