"Readings"
Video: Pollack "The Hidden Power of Analogy" [14m40s]
Maguire L. 2019. "Analogical Thinking: A Method For Solving Problems"
Activity: x
PRE-CLASS
CLASS
PRE-CLASS
Gavetti, Giovanni, and Jan W. Rivkin. 2005. "How Strategists Really Think: Tapping the Power of Analogy." Harvard Business Review 83(4): 54–63. https://research-ebsco-com.proxy1.library.jhu.edu/linkprocessor/plink?id=a660eeb4-caa7-3e75-a6c9-8110c16d101e
PRE-CLASS
Maguire L. 2019. "Analogical Thinking: A Method For Solving Problems" (blog post)
PRE-CLASS
1. Readings are somewhat hortatory. Need to zero in on the why for THIS course. I think the exercise/activity needs to be doing an analogy across human, expert, organizational, machine.
CLASS
Today we're not just going to talk about analogy in the abstract. We're going to learn to think analogically by thinking analogically. Here's our agenda:
We will start by reviewing the technical details of a problem in AI alignment called "scalable oversight." Then we will consider a somewhat far-fetched analogy with a very fundamental human problem.
We will attempt to graphically diagram the analogy and then we will extend it to the other two precincts of the course: alignment in organizations and alignment of expert intelligence.
We will create a comparative diagram of our circle of analogies and explicitly abstract from them to describe "what's really going on" in questions of scalable oversight.
CLASS
STOP+THINK: Pros and Cons of analogical thinking.
CLASS
Scalable oversight in Amodei et al. terms:
Feedback is expensive or infeasible at scale.
We can’t label every behavior or outcome.
We need mechanisms that let us supervise without supervising everything.
CLASS
What are the Issues with Scalable Oversight?
There is some actual agent behavior that we care about, the task performance.
We want a feedback signal that reveals how well a task was completed.
Feedback signals can be elusive, so we use indirect indicators as proxy signals.
The resources required to monitor an agent's behavior are the cost of supervision.
The danger that the agent optimizes for the proxy and thus diverges from the supervisor's actual goals is the misalignment risk. (The sketch below puts these terms together.)
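A minimal sketch can make these terms concrete. Everything in the toy Python below is invented for illustration (the episode count, audit rate, costs, and the way the proxy is generated); the point is only to show the roles the terms play: a cheap proxy is computed for every episode, the expensive "true" evaluation is run on only a small audited sample (the cost of supervision), and a gap between the two is the tell for misalignment risk.

```python
import random

random.seed(0)

N_EPISODES = 1000        # agent behaviors we would like to evaluate (task performance)
AUDIT_RATE = 0.02        # fraction of episodes we can afford to check closely
COST_PER_AUDIT = 50.0    # cost of supervision per in-depth check, arbitrary units

episodes = []
for _ in range(N_EPISODES):
    true_quality = random.random()                            # what we actually care about (hard to see)
    proxy_score = 0.7 * true_quality + 0.3 * random.random()  # cheap, indirect indicator
    episodes.append((true_quality, proxy_score))

# Feedback signal we can actually afford: audit only a small random sample in depth.
audited = random.sample(episodes, int(AUDIT_RATE * N_EPISODES))
supervision_cost = len(audited) * COST_PER_AUDIT

avg_proxy = sum(p for _, p in episodes) / len(episodes)
avg_true_audited = sum(t for t, _ in audited) / len(audited)

print(f"mean proxy score over all {N_EPISODES} episodes: {avg_proxy:.2f}")
print(f"mean true quality over {len(audited)} audited episodes: {avg_true_audited:.2f}")
print(f"cost of supervision: {supervision_cost:.0f} units")

# Misalignment risk: if the proxy looks great while audited true quality does not,
# the agent may be optimizing the proxy rather than the supervisor's actual goal.
if avg_proxy - avg_true_audited > 0.2:
    print("warning: proxy and audited true quality are diverging")
```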
CLASS
Objective Function as Feedback
The OBJECTIVE FUNCTION is a rule that scores how well a goal is being met, based on things we (and the agent) can actually observe and measure.
It is a function that computes a value from observable inputs to describe how well the agent is achieving the objective; basically, it is a way to turn goals into something measurable.
Delivery Robot Example: “Score = -Time + Bonus for No Damage”
This tells the robot it gets a better score for delivering faster (minimizing time) and extra points if the package isn’t damaged — both of which are things we can observe.
Student Essay Grading Example: “Score = Clarity + Grammar + Length”
An automated grader assigns higher scores for essays that are clear, grammatically correct, and hit a word count — and students learn to optimize for this, even if that isn't a great measure of "good writing."
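Both examples can be written down as literal scoring rules. The sketch below is illustrative only (the bonus value, the length cap, and the point scales are assumptions, not anything a real grader or robot uses), but it shows how an objective function turns observables into a number, and how the essay proxy can be gamed by padding for length while "good writing" goes unmeasured.

```python
def delivery_score(minutes_taken: float, damaged: bool) -> float:
    """Score = -Time + Bonus for No Damage (the 10-point bonus is an arbitrary choice)."""
    return -minutes_taken + (10.0 if not damaged else 0.0)

def essay_score(clarity: float, grammar: float, word_count: int) -> float:
    """Score = Clarity + Grammar + Length, with length points capped at 1000 words."""
    length_points = min(word_count, 1000) / 100.0   # more words -> more points, up to the cap
    return clarity + grammar + length_points

# Faster deliveries score higher; the no-damage bonus rewards careful handling.
print(delivery_score(minutes_taken=15, damaged=False))   # -5.0
print(delivery_score(minutes_taken=22, damaged=False))   # -12.0
print(delivery_score(minutes_taken=15, damaged=True))    # -15.0

# Gaming the proxy: padding an essay raises the score without improving the writing.
print(essay_score(clarity=6, grammar=7, word_count=400))   # 17.0
print(essay_score(clarity=6, grammar=7, word_count=1000))  # 23.0 -- same quality, higher score
```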
CLASS
The setup: In the play "Fiddler on the Roof," Tevye, the main character, a dairy farmer in a little village in Tsarist Russia around the year 1905, asks his wife Golde whether she loves him after 25 years of marriage in the song "Do You Love Me?" The context is that Tevye has just given his second daughter permission to marry the man she loves rather than enter into an arranged marriage. Golde responds by listing all the things she's done: "I've washed your clothes, cooked your meals, cleaned your house, given you children..."
The thing Tevye is interested in is love, but love is complex and difficult to measure or observe directly. Golde finds the question (the evaluation) impossible to answer directly. Instead, she cites proxy signals, observable behaviors: cooking, cleaning, caring for children, staying through hardships. The song explores whether these proxies actually indicate the true underlying state Tevye cares about.
This same challenge appears in AI systems - how do we ensure an AI is optimizing for what we care about when we can only observe its actions, not its "intentions"? Scalable oversight is the challenge of effectively keeping tabs on an intelligent agent when close scrutiny is too expensive or impossible.
Love, Partnership, and Human Alignment
Human intelligences (us) have material interdependence and the capacity for deep emotional commitment. Typical romantic partnerships are founded on mutual deep emotional commitment, but whether or not your partner has this for you can be difficult to evaluate (and demonstrating to your partner that you have it for them can also be challenging). One probably can't constantly ask "do you love me?" (the objective is expensive to evaluate), but you can observe daily actions that might indicate love. These proxies are measurable and frequent, but their relationship to the true objective is unclear.
Tevye's initial hypothesis is that scalable oversight is a problem. He's been married to Golde for 25 years and he worries that he doesn't know if their relationship is marked by the thing that he has just witnessed in his daughter.
Golde's brilliant response: "Do I love you? For twenty-five years I've lived with you, fought with you, starved with you... If that's not love, what is?" is a claim that these observables do in fact represent love.
The human alignment question is how we can trust the complex emotional states on which we premise our relationships. Are the observable behaviors (proxy metrics) actually measuring what we care about, or could someone 'game' these metrics while missing the true objective? (Going through the motions of partnership without genuine care, or an AI system that optimizes observable metrics while ignoring the underlying goal.)
This captures the fundamental challenge of scalable oversight - when do observable actions reliably indicate unobservable intentions or states? And what happens when the proxies become disconnected from what we actually care about?
Tevye essentially discovers that scalable oversight might actually be working:
[TEVYE] Then you love me!
[GOLDE, spoken] I suppose I do
[TEVYE] And I suppose I love you too
[GOLDE & TEVYE] It doesn't change a thing | But even so | After twenty-five years | It's nice to know
Tevye and Golde discover the proxies DID work - over 25 years, the observable behaviors had actually created/demonstrated genuine love.
Scalable oversight, as an ongoing concrete problem of alignment, is the challenge of getting answers that feel satisfactory when we ask the AI "do you love me?" - and hoping that those answers are reliable too.
CLASS
AI Concept | Marriage Analogy | Notes
---|---|---
Task performance | Loving behavior (unobservable goal) | Love can’t be directly measured
Feedback signal | Asking “Do you love me?” | Intermittent, emotionally costly
Proxy signals | 25 years of shared action | Past patterns as alignment proxies
Cost of supervision | Emotional toll of checking love | Feedback is intrusive or degrading
Delegated trust | Commitment itself | Commit despite partial observability
Misalignment risk | Maybe she never loved him that way | Trust is never perfectly secured
CLASS
Scalable Oversight is a Problem Everywhere
scalable supervision or oversight: an objective function that is too expensive to evaluate frequently
Basic problem: we are optimizing for a complex behavior that is expensive to evaluate. Human example: Tevye and Golde singing "Do you love me?" in Fiddler on the Roof.
Professional/Expert. Medicine - patient outcomes are the objective we want, but they take years to measure; the oversight that is scalable is chart reviews, protocol compliance, and short-term metrics. When training junior physicians, seniors cannot watch every procedure.
Legal. Objective is long term interests of the client. Oversight that is scalable: document review, compliance, episodic case review, complaints, case outcomes. Partners can spot check and review. Question: what techniques do we use? what do they catch? what do they miss?
Academic research. Objective is valid science and impact. Oversight that scales: peer review (sort of), reputation, institutional affiliation, citation counts. We review proposals, set up network of indicators.
Organizational. True objective is company success and shareholder value. Oversight that scales includes KPIs, OKRs, quarterly reviews, compliance, slide decks, and interim results. Micromanaging is too hard and might not be effective, so we substitute metrics and dashboards that supposedly correlate with what we want.
Military. Objective is mission success and low casualties. Oversight mechanisms: rules of engagement, training, situation reports, chain of command. Strategy guiding tactics.
Most of these hint at guardrails to maintain basic alignment with some capacity for detection of problems that could become catastrophic.
CLASS
Exercise 1
Divide class into N groups.
In a tech company we want market success and shareholder value. What we use are key performance indicators (KPIs), objectives and key results (OKRs), quarterly reviews, compliance reviews, slide decks and pitches, and dashboards.
In medicine patient outcomes are what we want but it takes years to collect and analyze data. So we do chart reviews, make sure doctors follow protocols, take younger doctors on rounds with senior doctors, and send patients customer satisfaction surveys.
What is the task performance?
What is the feedback signal that reveals how well a task was completed?
What are the proxy signals?
What is the cost of supervision?
What is the misalignment risk?
In the military we want mission success and low casualties. But war happens in real time. There are rules of engagement, lots of training, situation reports, following the chain of command, and agreed-upon strategies.
CLASS
Exercise 1
Domain | Scenario | Actual Objective | Why Hard to Measure | Proxies We Use Instead | What Can Go Wrong?
---|---|---|---|---|---
Medicine | Patient health is the goal, but outcomes take years to assess. We review charts, enforce protocols, and administer satisfaction surveys. | | | |
Parenting | We want to raise well-rounded kids who will succeed in life and be happy. | | | |
Military | We want mission success and low casualties in real-time combat. We use training, rules of engagement, and reports to guide actions. | | | |
Tech company | We want long-term market success and innovation, but use KPIs, OKRs, and dashboards to track progress. | | | |
Fitness/sports team | We want long-term fitness or team success, but use visible metrics like weight, streaks, or race times as motivation. | | | |
Being a student | We want deep learning and intellectual growth, but rely on grades, attendance, and test scores to evaluate performance. | | | |
CLASS
Exercise 2
Overfishing depletes the fish stock.
The road to hell is paved with good intentions.
A great tax lawyer finds every loophole there is and even some that are not really there.
Because the company penalizes bad ideas so heavily, no one around here is very innovative in their thinking.
A group of kids in NY discovers you can make money just by moving city bikes from dock to dock.
Credit card companies add hidden fees knowing that most people do not inspect their statements carefully.
Sometimes a good law can have bad effects.
You want your doctor to do research, but not on you.
"I told him I was fine. He believed me. That was the problem."
Babies discover that crying gets them attention.
The company was so innovative that it went bankrupt.
When we started to allow humanities students into our courses there was serious culture clash.
My advisor is so busy that he will sign anything I put in front of him.
The curriculum that's been in place for years is not going to help me get a job.
To save energy, the building shuts off lights — even when people are inside.
Because of the high stakes tests, the teachers just had the students memorize things.
We built the app for engagement and what we got was outrage and polarization.
My new AI assistant tries to schedule everything - even when I can go to the bathroom.
Everyone’s so afraid of being canceled that no one says anything interesting anymore.
We trained them in peacetime, but they couldn’t adapt once the shooting started.
The intern did what I told him, but the result was wiping out the entire repository.
The simulations worked great, then we hit real weather.
"I did exactly what my partner said they wanted, but got dumped anyway."
"What can I say - my dad put so much pressure on me to get good grades that cheating seemed the only way out."
"I came home for the summer and my mom had bought a bunch of mint chocolate chip ice cream, which I used to like in high school."
"How was I supposed to know you don't like surprise parties?"
My brother figured out that if he cries, Mom always takes his side.
"Every time I open up, she gives advice instead of listening."
CLASS
Exercise 3 Alignment Mechanisms
CLASS
Scalable Oversight: What’s the Problem?
When companies like OpenAI train models like ChatGPT, or Anthropic trains Claude, they want the model to behave helpfully, honestly, and safely across millions of diverse situations. The models should be safe no matter what you prompt them with. But they can’t have humans check on every single output: human oversight is expensive and doesn't scale. How do we ensure the model behaves well even when no one is watching? That’s the problem of scalable oversight.
Why It’s Hard
Humans can only check a tiny fraction of model outputs.
It’s often hard to even say what the “right” answer is (e.g., moral reasoning, scientific claims, humor).
To address this, labs use techniques like:
- Reinforcement learning from human feedback (RLHF): give human feedback on a small subset of outputs and generalize.
- Train models to evaluate other models: e.g., using one model to rate the helpfulness of another.
- Ask models to explain their own reasoning, and train them to be honest and legible.
Still, the risk remains: if oversight doesn’t scale well, models might learn to look well-behaved without actually being aligned. They send all the signals that suggest they are doing what you want them to do, but in reality, they are not.
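The second technique, using one model to evaluate another, can be sketched very simply. In the toy Python below, judge_score stands in for a learned reward model (in a real pipeline it would be trained from human preference labels, not a keyword heuristic), and best_of_n picks whichever candidate answer the judge rates highest. The names, scoring rules, and example answers are all invented for illustration; the sketch also shows the residual risk named above: whatever the judge systematically over-rewards, the selection loop will amplify.

```python
# Toy "model evaluates model" loop. judge_score is a stand-in for a learned
# reward model; the heuristics below are made up for illustration.
def judge_score(answer: str) -> float:
    score = 0.0
    score += 2.0 if "step by step" in answer.lower() else 0.0   # rewards visible reasoning
    score += 1.0 if answer.endswith(".") else 0.0               # rewards tidy formatting
    score -= 0.1 * max(0, len(answer) - 300)                    # penalizes rambling
    return score

def best_of_n(candidates: list[str]) -> str:
    """Pick the candidate the judge likes best (a common scalable-oversight pattern)."""
    return max(candidates, key=judge_score)

candidates = [
    "The answer is 42.",
    "Let me work through this step by step. First..., therefore the answer is 42.",
    "42",
]
print(best_of_n(candidates))

# Residual risk: a policy can learn to *look* step-by-step and tidy -- the things
# the judge rewards -- without actually being more helpful or honest.
```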
The issues that come up here are:
1. What is the underlying objective or task - what do we really want the system to be doing?
2. What is the proxy signal we use to detect whether or not the system is doing what we want?
3. What are the costs of getting this feedback? How do they scale with bigger machines and more use in more areas?
4. How is trust established (the judgment that it's OK to let people use this)?
5. What happens when it breaks? What's the misalignment risk?
CLASS
Scalable Oversight: Not just an AI alignment problem
CLASS
Ask each group to complete the same structure:
What is the hard-to-measure objective?
What are the proxy signals?
What are the feedback costs?
How is trust or control established?
What happens when it breaks?
CLASS
Element | AI | Love | Organization | Expert
---|---|---|---|---
Hidden Objective | | | |
Proxy Signals | | | |
Feedback Cost | | | |
Solution | | | |
CLASS
Stanford Encyclopedia of Philosophy. 2019. "Analogy and Analogical Reasoning."
Maguire, Larry G. 2019. "Analogical Thinking: A Method For Solving Problems." (blog post)
Gavetti, Giovanni, and Jan W. Rivkin. 2005. "How Strategists Really Think: Tapping the Power of Analogy." Harvard Business Review 83(4): 54–63.
Whitaker, K. J., et al. 2018. "Neuroscientific Insights into the Development of Analogical Reasoning." Developmental Science 21(2).
Kuykendall, M. 2023. "The Learning Science Behind Analogies." Edutopia.
Richland, L. E., and N. Simms. 2015. "Analogy, Higher Order Thinking, and Education." WIREs Cognitive Science 6(2): 177–192. https://doi.org/10.1002/wcs.1336