Guest Lecture!
https://cahootery.herokuapp.com/section/5fcea466558eb10004f28682
[Chart: TEMP over TIME, comparing SET TEMP and ACTUAL TEMP; the GAP between them is marked, along with where the furnace is ON and where the furnace switches off but some heat continues to flow.]
Balancing (B) loops produce systems that oscillate or settle down.
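A hedged sketch of the balancing loop in the thermostat/furnace example; the temperatures, heating rate, and heat-loss rate below are made-up illustration values, not from the lecture. The residual heat after switch-off is what makes the system oscillate around the set point rather than land on it exactly.

```python
# Toy balancing-loop (B loop) simulation of the thermostat/furnace example.
# All numbers are made-up illustration values.
set_temp = 20.0          # SET TEMP
actual = 15.0            # ACTUAL TEMP in a cold house
residual = 0.0           # heat still flowing after the furnace switches off

for t in range(30):      # TIME steps
    gap = set_temp - actual                # GAP drives the loop
    furnace_on = gap > 0                   # furnace ON while below the set point
    heat = 1.5 if furnace_on else 0.0
    actual += heat + residual - 0.8        # furnace heat + residual heat - heat loss
    residual = 0.5 * heat                  # some heat keeps flowing next step
    print(f"t={t:2d}  temp={actual:5.2f}  furnace={'ON' if furnace_on else 'off'}")
```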
[Diagram: a cold Canadian house in winter, with T and H units marked in bedroom 1 and bedroom 2.]
domain: states × actions (4 × 4)
range: rewards and punishments, costs and benefits
Actions (each cell gives the resulting next state)
Current State | A1 | A2 | A3 | A4 |
S1 | S1 | S1 | S2 | S4 |
S2 | S2 | S1 | S2 | S4 |
S3 | S4 | S2 | S1 | S2 |
S4 | S2 | S4 | S2 | S1 |
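A minimal sketch, assuming we encode this deterministic table as a Python dict of dicts; the state and action names are the ones in the table.

```python
# The deterministic table above as a dict of dicts: next_state[state][action].
next_state = {
    "S1": {"A1": "S1", "A2": "S1", "A3": "S2", "A4": "S4"},
    "S2": {"A1": "S2", "A2": "S1", "A3": "S2", "A4": "S4"},
    "S3": {"A1": "S4", "A2": "S2", "A3": "S1", "A4": "S2"},
    "S4": {"A1": "S2", "A2": "S4", "A3": "S2", "A4": "S1"},
}

print(next_state["S3"]["A2"])  # -> "S2", read straight off the table
```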
domain: states × actions × nextStates (4 × 4 × 4 = 64)
range: rewards and punishments, costs and benefits
1 | 2 |
3 | 4 |
Actions (each cell lists probability,nextState pairs)
Current State | U | D | L | R |
S1 | 0.85,S1 0.05,S2 0.05,S3 0.05,S4 | 0.85,S3 0.05,S1 0.05,S2 0.05,S3 | 0.85,S1 0.05,S2 0.05,S3 0.05,S4 | 0.85,S2 0.05,S1 0.05,S3 0.05,S4 |
S2 | 0.85,S2 0.05,S1 0.05,S3 0.05,S4 | 0.85,S4 0.05,S1 0.05,S2 0.05,S3 | 0.85,S1 0.05,S2 0.05,S3 0.05,S4 | 0.85,S2 0.05,S1 0.05,S3 0.05,S4 |
S3 | 0.85,S1 0.05,S2 0.05,S3 0.05,S4 | 0.85,S3 0.05,S1 0.05,S2 0.05,S3 | 0.85,S3 0.05,S1 0.05,S2 0.05,S3 | 0.85,S4 0.05,S1 0.05,S2 0.05,S3 |
S4 | 0.85,S2 0.05,S1 0.05,S2 0.05,S3 | 0.85,S4 0.05,S1 0.05,S2 0.05,S3 | 0.85,S3 0.05,S1 0.05,S3 0.05,S4 | 0.85,S4 0.05,S1 0.05,S2 0.05,S3 |
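For the stochastic version, each (state, action) pair maps to a distribution over next states. A minimal sketch of the S1 row, with sampling; nothing beyond the table values above comes from the slides.

```python
import random

# The S1 row of the stochastic table: each action maps to
# (probability, next_state) pairs. Other rows would follow the same pattern.
P_S1 = {
    "U": [(0.85, "S1"), (0.05, "S2"), (0.05, "S3"), (0.05, "S4")],
    "D": [(0.85, "S3"), (0.05, "S1"), (0.05, "S2"), (0.05, "S3")],
    "L": [(0.85, "S1"), (0.05, "S2"), (0.05, "S3"), (0.05, "S4")],
    "R": [(0.85, "S2"), (0.05, "S1"), (0.05, "S3"), (0.05, "S4")],
}

def sample_next_state(row, action):
    """Draw one next state according to the listed probabilities."""
    probs, states = zip(*row[action])
    return random.choices(states, weights=probs, k=1)[0]

print(sample_next_state(P_S1, "U"))   # usually "S1", occasionally S2, S3, or S4
```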
I am in state X: should I do action Y? The likely result is Z; what are the expected rewards in the life trajectories that follow from there? But I also have to consider that action Y could instead lead to states Z', Z'', etc.
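That question is a one-step lookahead: weight each possible next state by its probability and add its immediate reward to a discounted estimate of the value of life from there. A sketch with invented rewards, values, and discount factor (none are given in the slides):

```python
# One-step lookahead: expected return of doing action a in state s,
# averaging over the possible next states Z', Z'', ...
# P, R, and V below are invented illustration values.
P = {"S1": {"R": [(0.85, "S2"), (0.05, "S1"), (0.05, "S3"), (0.05, "S4")]}}
R = {"S1": 0.0, "S2": 1.0, "S3": -1.0, "S4": 0.0}    # hypothetical rewards
V = {"S1": 0.0, "S2": 0.5, "S3": -0.5, "S4": 0.2}    # hypothetical value estimates

def q_value(s, a, gamma=0.9):
    """Probability-weighted sum of immediate reward plus discounted future value."""
    return sum(p * (R[s2] + gamma * V[s2]) for p, s2 in P[s][a])

print(q_value("S1", "R"))
```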
Actions (each cell gives the resulting next state)
Current State | Play a video game | Read | Read and take notes | Clean bedroom | Sit Exam |
Haven't Even Started | Haven't Even Started | 1 page read, low comprehension | 1 page read, high comprehension | Haven't Even Started | Did poorly on exam |
1 page read, low comprehension | 1 page read, low comprehension | 2 pages read, low comprehension | 2 pages read, mixed comprehension | 1 page read, low comprehension | Did poorly on exam |
1 page read, high comprehension | 1 page read, high comprehension | 2 pages read, mixed comprehension | 2 pages read, high comprehension | 1 page read, high comprehension | Did so-so on exam |
2 pages read, low comprehension | 2 pages read, low comprehension | 2 pages read, mixed comprehension | 2 pages read, mixed comprehension | 2 pages read, low comprehension | Did poorly on exam |
2 pages read, mixed comprehension | 2 pages read, mixed comprehension | 2 pages read, mixed comprehension | 2 pages read, high comprehension | 2 pages read, mixed comprehension | Did so-so on exam |
2 pages read, high comprehension | 2 pages read, high comprehension | 2 pages read, high comprehension | 2 pages read, high comprehension | 2 pages read, high comprehension | Did well on exam |
Did well on exam | - | - | - | - | - |
Did so-so on exam | - | - | - | - | - |
Did poorly on exam | - | - | - | - | - |
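The studying example has the same lookup structure as the gridworld table. A small sketch that walks one plan through it; only a few rows are transcribed here, the rest are assumed omitted.

```python
# A few rows of the studying example, same dict-of-dicts shape as before.
T = {
    "Haven't Even Started": {
        "Read": "1 page read, low comprehension",
        "Read and take notes": "1 page read, high comprehension",
        "Sit Exam": "Did poorly on exam",
    },
    "1 page read, high comprehension": {
        "Read and take notes": "2 pages read, high comprehension",
        "Sit Exam": "Did so-so on exam",
    },
    "2 pages read, high comprehension": {
        "Sit Exam": "Did well on exam",
    },
}

state = "Haven't Even Started"
for action in ["Read and take notes", "Read and take notes", "Sit Exam"]:
    state = T[state][action]
print(state)   # -> "Did well on exam"
```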
“It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers.... They would be able to converse with each other to sharpen their wits. At some stage therefore, we should have to expect the machines to take control.”
Alan Turing, 1951.
Integrate AI: The Ethics of Artificial Intelligence (2000)
1. If you want the future to look different from the past, you need to design systems with that in mind.
2. Be clear about what proxies do and don’t optimize.
3. When you deal with abstractions and groupings, you run the risk of treating humans unethically.
4. Beware of correlations that mask sensitive data behind benign proxies.
5. Context is key for explainability and transparency.
6. Privacy is about appropriate data flows that conform to social norms and expectations.
7. 'Govern the optimizations. Patrol the results.'
8. Ask communities and customers what matters to them.
AI at Google: Our Principles (2018)
1. Be socially beneficial.
2. Avoid creating or reinforcing unfair bias.
3. Be built and tested for safety.
4. Be accountable to people.
5. Incorporate privacy design principles.
6. Uphold high standards of scientific excellence.
7. Be made available for uses that accord with these principles.
Summaries from https://aiethicslab.com/big-picture/
Google's principles also have a section "AI applications we will not pursue":
In addition to the above objectives, we will not design or deploy AI in the following application areas:
As our experience in this space deepens, this list may evolve.
First Law: A robot may not injure a human being or, through inaction, allow a human being to come to harm.
Second Law: A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
Third Law: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Asimov, 1942
Specification Gaming - Gaming the Specs
The machine's only objective is to maximize the realization of human preferences.
The machine is initially uncertain about what those preferences are.
The ultimate source of information about human preferences is human behavior. (Preferences "are all-encompassing; they cover everything you might care about, arbitrarily far into the future.")
Russell 2020
[Feedback-loop diagram: SET point, ERROR, OUTPUT.]
The What If We Succeed Problem
"As Norbert Wiener put it in his 1964 book God and Golem,7
In the past, a partial and inadequate view of human purpose has been relatively innocuous only because it has been accompanied by technical limitations. . . . This is only one of the many places where human impotence has shielded us from the full destructive impact of human folly."
Accuracy
Validity
Precision
Reliability
Bias
Fairness
Bounded Rationality
when a machine learning model used to inform decisions affects people differently on the basis of categories that are not acceptable as reasons for differential treatment in a given community
COMPAS: Race & Recidivism prediction
Word Embeddings reproduce race/gender stereotypes
NIST Idemia: ID matching
Bias is the tendency of a statistic to overestimate or underestimate a parameter
HR Example
Many sources of error are subtypes of "sampling error": when the cases we collect data on do not actually "represent" the universe we intend to study.
https://achievement.org/achiever/marvin-minsky-ph-d/#gallery
The capacity of a system to interpret and learn from its environment and to use those learnings to achieve goals and carry out tasks.
27.3 The Ethics of AI ... 986
27.3.1 Lethal autonomous weapons ... 987
27.3.2 Surveillance, security, and privacy ... 990
27.3.3 Fairness and bias ... 992
27.3.4 Trust and transparency ... 996
27.3.5 The future of work ... 998
27.3.6 Robot rights ... 1000
27.3.7 AI Safety ... 1001
Russell & Norvig 4th ed 2020
"Reward functions describe how the agent ought to behave. In other words, they have normative content, stipulating what you want"
In "gridworld"
RF: S → R
RF: S × A → R
etc.
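A minimal sketch of those two signatures in Python; the specific states, actions, and reward numbers are invented for illustration.

```python
# RF: S -> R      reward depends only on the state you land in
def reward_state(s):
    return {"S1": 0.0, "S2": 0.0, "S3": 0.0, "S4": 1.0}[s]   # e.g. treat S4 as the goal

# RF: S x A -> R  reward depends on the state and the action taken in it
def reward_state_action(s, a):
    step_cost = -0.1                                  # small cost for every move
    bonus = 1.0 if (s, a) == ("S3", "R") else 0.0     # made-up bonus transition
    return step_cost + bonus
```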
Machine intelligence with the capacity to learn any task that a human being can
If humans can build an AGI, then an AGI would be able to improve upon itself even more quickly than humans could.
Imagine a machine that will do perfectly what it is told to do. Alignment is being able to tell it what we want it to do.
The problem: we don't know how to map "what we care about" (our "values") to the objective function that we give to an algorithm.
A machine can see the whole thing at once. A big enough machine could base its next move on all the books ever printed.
As Alfred North Whitehead wrote in 1911, “Civilization advances by extending the number of important operations which we can perform without thinking about them.”
"The main missing piece of the puzzle is a method for constructing the hierarchy of abstract actions in the first place. For example, is it possible to start from scratch with a robot that knows only that it can send various electric currents to various motors and have it discover for itself the action of standing up?"
"What we want is for the robot to discover for itself that standing up is a thing—a useful abstract action, one that achieves the precondition (being upright) for walking or running or shaking hands or seeing over a wall and so forms part of many abstract plans for all kinds of goals."
"I believe this capability is the most important step needed to reach human-level AI."
Russell p 87ff
"...of machines is that they are not human. This puts them at an intrinsic disadvantage when trying to model and predict one particular class of objects: humans."
..."acquiring a human-level or superhuman understanding of humans will take them longer than most other capabilities."
Russell p 97ff
"For privacy advocates like Mr. Stanley of the A.C.L.U., the concern is that increasingly powerful technology will be used to target parts of the community — or strictly enforce laws that are out of step with social norms."
First Law
A robot may not injure a human being or, through inaction, allow a human being to come to harm.
Second Law
A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
Third Law
A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Eliezer Yudkowsky: "coherent extrapolated volition" (CEV), where the AI's meta-goal would be something like "achieve that which we would have wished the AI to achieve if we had thought about the matter long and hard."
Altruistic
Uncertain
Watch All of Us
Problem: training an agent on human values and preferences is hard
Option: a coach during training that labels good and bad behavior. Expensive. Difficult for complex tasks. Errors when bad behavior looks good.
In RL, the environment "contains" rewards and the agent learns a policy to maximize its reward.
To deliberately train an agent, a human has to formulate those rewards.
A "reward function" maps states x actions to rewards.
Human observes snippets and offers critique.
Then the agent uses RL to learn a behavioral policy.
Critique is data for algorithm learning unknown reward function.
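A hedged sketch of that loop, loosely in the spirit of learning a reward model from pairwise human preferences; the features, labels, and update rule here are toy assumptions, not the algorithm from the lecture. The human labels which of two behavior snippets is better, those labels become training data for the unknown reward function, and the agent then maximizes the learned reward with ordinary RL.

```python
import numpy as np

# Toy snippets: each is a feature vector summarizing a short clip of behavior.
# Human critique: for each pair (a, b), label 1 if the human preferred a, else 0.
pairs = [((np.array([1.0, 0.0]), np.array([0.0, 1.0])), 1),
         ((np.array([0.2, 0.9]), np.array([0.9, 0.1])), 0)]

w = np.zeros(2)          # parameters of the (initially unknown) reward model

def reward(x):
    """Learned reward assigned to a snippet."""
    return w @ x

# Fit w so preferred snippets score higher (a Bradley-Terry-style preference model).
for _ in range(500):
    for (xa, xb), label in pairs:
        p_a = 1.0 / (1.0 + np.exp(-(reward(xa) - reward(xb))))
        w += 0.1 * (label - p_a) * (xa - xb)          # gradient of the log-likelihood

print(w)   # the agent would then use ordinary RL to maximize this learned reward
```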