Shared Meaning

HMIA 2025


Alignment via Shared Meaning

"Readings"

Marx: The Production of Consciousness

Activity: TBD

PRE-CLASS

CLASS

Mead: Play and Generalized Other

Karl Marx - "The Production of Consciousness" 1845

Marx is a MATERIALIST.  For him, real human activity gives rise to social relations from which ideas emerge.

This is the younger, more humanist and philosophical (as opposed to political) Marx. He focused on the human effects of capitalism, especially "alienation": the estrangement of workers from what they make, from the process of labor, from other people, and from their own human potential. Marx was deeply engaged with German philosophy (especially Hegel), French socialism, and English political economy. The early works emphasize the conditions for human flourishing under and beyond capitalism.

What distinguishes humans is that they PRODUCE their means of life.

Productive Activity

BOTH what + how

Form of Life

Social Structure

Ideas

"It is not consciousness that determines life, but life that determines consciousness."

For an IDEALIST like Hegel, the ideas of the day (Zeitgeist) generate institutions and relations in society.

If how people think arises partly from how they work, then at least a part of their consciousness will be shared. This yields solidarity around their productive activity.

Humans are part individual, part social. A part of our mental content is shared with others in our society or group.

But how does the social part get into our heads?

Marx's answer is "by working at productive activity together." Sharing history and culture is not just conceptual; for Marx it is real material activity.

children at play

teens at play

game

generalized other

institutions

conversation of gestures

Principles

self

Marx

Durkheim

Mead

Marx

S1 S2 S3
S4 S5 S6
S7 S8 S9

N, S, E, W

Human Programs the Robot

Robot Learns from Trial and Error

Robot Learns from Working with Human

Robot Learns from Watching Human

Reinforcement Learning
 

Agent: chooses actions from a set $A = \{a\}$.

Environment:

  • has a set of states $S = \{s\}$,

  • dynamics described by a transition function $T : S \times A \to S$,

  • reward signal defined by $R : S \times A \times S \to \mathbb{R}$.

Policy: the agent's strategy, $\pi : S \to A$.

Typically we write this as a Markov Decision Process (MDP) tuple*:

$\mathcal{M} = \langle S, A, T, R \rangle$
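To make the tuple concrete, here is a minimal sketch of an MDP as plain Python, using a hypothetical 3×3 grid world with moves N, S, E, W. The goal cell, the reward values, the discount factor `GAMMA`, and the helper names `value_iteration` and `greedy_policy` are all assumptions made for illustration; value iteration is one common way to obtain a policy, not the only one.

```python
SIZE = 3
STATES = range(SIZE * SIZE)  # S: cells 0..8, row by row (S1..S9 on a 3x3 grid)
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}  # A
GOAL = 8      # assumed goal: the bottom-right cell
GAMMA = 0.9   # assumed discount factor


def T(s, a):
    """Transition T : S x A -> S. Deterministic; walls and the goal hold the agent in place."""
    if s == GOAL:
        return s
    row, col = divmod(s, SIZE)
    d_row, d_col = ACTIONS[a]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < SIZE and 0 <= new_col < SIZE:
        return new_row * SIZE + new_col
    return s


def R(s, a, s_next):
    """Reward R : S x A x S -> real. +1 for entering the goal, -0.1 per other step."""
    if s == GOAL:
        return 0.0
    return 1.0 if s_next == GOAL else -0.1


def value_iteration(sweeps=50):
    """Repeatedly back up the best one-step return to estimate state values."""
    V = [0.0] * len(STATES)
    for _ in range(sweeps):
        V = [max(R(s, a, T(s, a)) + GAMMA * V[T(s, a)] for a in ACTIONS)
             for s in STATES]
    return V


def greedy_policy(V):
    """Policy pi : S -> A that picks the action with the best backed-up value."""
    return {s: max(ACTIONS, key=lambda a: R(s, a, T(s, a)) + GAMMA * V[T(s, a)])
            for s in STATES}


V = value_iteration()
pi = greedy_policy(V)
```

Here `pi` plays the role of $\pi : S \to A$: a lookup from each state to the action the agent takes there.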

Inverse Reinforcement Learning


Same state, action, and transition structure.

Reward signal is now unknown (but still $R : S \times A \times S \to \mathbb{R}$).

Robot observes (dataset $D$) state-action sequences ("expert trajectories") and infers $\hat{R}$, assuming that the expert's policy, $\pi : S \to A$, is optimal.

Typically we write this as a Markov Decision Process (MDP) tuple*:

$\mathcal{M}_{\mathrm{IRL}} = \langle S, A, T, {?}R, D \rangle$

* For simplicity, we are omitting the discount factor, $\gamma$.
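The inference step can be sketched in a few lines. This is a deliberately simplified illustration, not a standard IRL algorithm: it assumes a hypothetical five-cell corridor, a small set of candidate rewards (one "goal cell" each), and picks the candidate under which the most expert steps are optimal. The names `make_reward`, `q_values`, `infer_reward`, and the discount `GAMMA` are assumptions for this example.

```python
STATES = range(5)                            # hypothetical corridor: cells 0..4
ACTIONS = {"L": -1, "stay": 0, "R": +1}
GAMMA = 0.9                                  # assumed discount factor


def T(s, a):
    """Known transition: move one cell (or stay); walls hold the agent in place."""
    return min(max(s + ACTIONS[a], 0), len(STATES) - 1)


def make_reward(goal):
    """Candidate reward: 1 whenever the move ends in the goal cell, else 0."""
    return lambda s, a, s_next: 1.0 if s_next == goal else 0.0


def q_values(R, sweeps=100):
    """Value iteration under reward R, returned as Q[s][a]."""
    V = [0.0] * len(STATES)
    for _ in range(sweeps):
        V = [max(R(s, a, T(s, a)) + GAMMA * V[T(s, a)] for a in ACTIONS)
             for s in STATES]
    return [{a: R(s, a, T(s, a)) + GAMMA * V[T(s, a)] for a in ACTIONS}
            for s in STATES]


def infer_reward(D):
    """Pick the candidate goal under which the most expert steps are optimal."""
    def agreement(goal):
        Q = q_values(make_reward(goal))
        return sum(Q[s][a] >= max(Q[s].values()) - 1e-9
                   for trajectory in D for s, a in trajectory)
    return max(STATES, key=agreement)


# D: expert trajectories as (state, action) pairs; this expert heads for cell 3.
D = [[(0, "R"), (1, "R"), (2, "R")], [(4, "L")]]
inferred_goal = infer_reward(D)
R_hat = make_reward(inferred_goal)           # the robot's estimate of the reward
```

Note how much work the optimality assumption does: every demonstrated step is treated as evidence about the reward, so a careless or mistaken expert step would count against the true reward just as strongly.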

Cooperative Inverse Reinforcement Learning (CIRL)

Two-player cooperative game (like flying a plane together)

$G = \langle S, A_H, A_A, T, R, \Theta \rangle$

States: $S = \{s\}$; Actions: $A_H$ (human), $A_A$ (agent); Transition: $T : S \times A_H \times A_A \to S$

Reward: $R : S \times A_H \times A_A \times S \to \mathbb{R}$, known to the human but not to the agent

Parameter space: $\Theta$ = hidden human preferences (reward parameters).

Agent's policy: $\pi_A : S \to A_A$ must both act and infer $\Theta$.

Human's role: both acts in the world and conveys reward information through choices.
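The "infer $\Theta$" half of the agent's job can be sketched as a Bayesian belief update. This is not a CIRL solver; it only illustrates how human choices can carry reward information. Everything specific here is hypothetical: the two-element parameter space, the action names, the noisily rational (softmax) model of the human with rationality `beta`, and the rule that the agent acts on its most probable guess.

```python
import math

THETAS = ["likes_coffee", "likes_tea"]              # hypothetical parameter space
HUMAN_ACTIONS = ["reach_for_mug", "reach_for_teapot"]


def reward(theta, human_action):
    """Hypothetical reward: 1 if the human's action fits the hidden preference."""
    fits = {("likes_coffee", "reach_for_mug"), ("likes_tea", "reach_for_teapot")}
    return 1.0 if (theta, human_action) in fits else 0.0


def human_likelihood(human_action, theta, beta=2.0):
    """P(human action | theta) for a noisily rational human (softmax over reward)."""
    weights = {a: math.exp(beta * reward(theta, a)) for a in HUMAN_ACTIONS}
    return weights[human_action] / sum(weights.values())


def update_belief(belief, human_action):
    """Bayes' rule: posterior over theta after observing one human action."""
    unnormalized = {th: p * human_likelihood(human_action, th)
                    for th, p in belief.items()}
    total = sum(unnormalized.values())
    return {th: p / total for th, p in unnormalized.items()}


def agent_action(belief):
    """Agent acts on its current most probable guess about theta."""
    best = max(belief, key=belief.get)
    return "brew_coffee" if best == "likes_coffee" else "boil_water_for_tea"


belief = {theta: 0.5 for theta in THETAS}           # uniform prior over theta
for observed in ["reach_for_mug", "reach_for_mug"]:
    belief = update_belief(belief, observed)
chosen = agent_action(belief)
```

Because the human model is noisy rather than perfectly optimal, a single off-preference action lowers the agent's confidence without ruling a preference out entirely.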

Norbert Wiener said it well: if you create a machine you can't turn off, you better be sure you put in the right purpose.

Alignment is important: it means giving robots the right objectives and getting them to make the right trade-offs.

Inverse Reinforcement Learning

robot observes human

robot infers R(behavior)

BUT

1. We don't actually want robots to want what we want; rather, they should want us to get what we want.

2. IRL assumes the human's (H) behavior is optimal, but the best way to learn might be from non-optimal human behavior.


NEXT: Hierarchy

HMIA 2025 Shared Meaning

By Dan Ryan
