Hi, I'm Tim Tyler - and today I will be discussing the wirehead problem
in the context of machine intelligence.
The term "Wireheads" refers to agents who have
short-circuited their reward systems - in order to gain access to
pleasurable sensations without performing useful tasks.
Drug addicts are one example of agents which have turned themselves
into wireheads. Instead of eating, having sex, and other
rewarding activities, these agents stimulate their pleasure centres
directly - typically using chemical compounds that mimic the
neurotransmitters which are involved in reinforcement learning, or
pain-killing.
Another example illustrates the origin of the term "wirehead"
- here rodents have their brains connected up to remote electrodes -
and then have their pleasure centres repeatedly stimulated when they
perform some specified action.
Here, a rat pleasures itself by pressing a metal bar.
Self-stimulation is indicated by the flashing light. The
rat immediately shows considerable interest.
Here a rat is kept in a Skinner box. The rat's
self-stimulation continues for extended periods - displacing its
normal interests in food, water and sex.
The problem
One question that arises in the context of constructing machine
intellingence is: how can we prevent mechanical wireheads from arising?
The first relevant observation is that wireheads are fairly
widespread. In addition to human drug addicts, other systems are
vulnerable to similar types of corruption:
Money is intended to motivate people towards actions society regards
as productive - but it motivates some people to perform bank robberies,
and other people to engage in counterfitting activities - actions
that go straight for the reward, and bypass the behaviours which
society intended money to produce.
Other things besides money can be forged. Products, reputations,
qualifications, citizenship and identity are other common targets for
forgers - in each case something desirable is obtained while bypassing
the normal means of its production.
On the other hand, the frequency of wirehead behaviour is typically
kept relatively low: wireheads are usually in a minority,
due to the use of anti-wirehead strategies.
Wireheads can be dangerous
A wirehead doesn't necessarily sit there, doing nothing in a
state of sublime ecstacy. Its pleasurable state may still require
effort to maintain. Consider heroin addicts - they are
short-circuiting their pleasure centres, but they still need cash to
fund their habit - and the result can be crime and violence.
Policing and surveillance
Looking at the anti-wirehead strategies used, police surveillance is
one of the main approaches. It seems reasonable to expect surveillance
to become ubiquitous in the transparent society of the future - which
might be bad news for prospective wireheads.
Superintelligence and self-improving systems
Interest in the wirehead problem often centres around
the issue of whether superintelligences can be defended
against these kinds of problems once they can
self-improve.
By assumption, the superintelligences under discussion have total
access to their inner workings, and the power to change them as they
wish.
If superintelligences tend to use their intelligence to find ways of
stimulating their pleasure centres directly, it seems likely that this
would compromise their reliability and limit their usefulness.
Exploring the problem
To defend against wireheading, a superintelligence needs two
main architectural elements:
it must evaluate the desirability of the expected consequences of
changes to its goal system with respect to its current
goals;
its goals should accurately represent the state the agent
is supposed to produce;
It is the second condition that leads to the main problems.
Say you build a superintelligence with the goal of cooling down the
planet. If you give it access to a range of thermometers and tell it
to minimise their temperature readings, the superintelligence may
notice that it can best attain its goals by immersing the thermometers
in liquid nitrogen.
How can you tell the superintelligence that it's actually the
temperature that needs minimising - and not some proxy for it?
This turns out to be a non-trivial problem. You need to give the
superintelligence a sophisticated understanding of its goal -
including things like what the concept of temperature means and how it
is measured.
The wirehead problem can be illustrated even in relatively simple
systems.
Say you want the superintelligence to find as many prime numbers as
possible. Here the utility function might reference a counter
representing the number of prime numbers which it has identified so
far. However, a wirehead might notice that it could increment the
counter without even attempting to test the candidate numbers
for primality.
How can you tell a self-modifying superintelligent agent to
actually maximise the number of prime numbers found - and
not simply poke values into a counter?
Failure modes
Some solutions to this problem do not appear to be promising:
Attempting to limit self-modification by walling off the agent's
utility function seems destined to fail - this is getting in between
a superintelligent agent and its utility - which is rarely a good idea.
Similarly, making the agent feel revulsion when it thinks about
modifying its goal system is another unlikely solution. That just
creates motivation for the agent to hire a third party to perform
such modifications.
The proposed resolution
The key to the problem is widely thought to be to make the
agent in such a way that it doesn't want to modify its goals
- and so has a stable goal structure which it actively defends. This
resolution was pioneered and promoted by Eliezer Yudkowsky.
Here he is describing the basic idea:
[Eliezer Yudkowsky footage]
Another researcher who has investigated self-improving systems in
depth is Steve Omohundro. He seems to have arrived at a position
similar to Eliezer. Here is Steve describing the situation:
[Steve Omohundro footage]
Proof needed
Unfortunately, the problem of how to avoid wirehead behaviour is one
where we have relatively little experimental evidence which bears
directly on the issue.
Also, the wirehead problem is an extremely complicated one - and it is
difficult to think clearly about all the issues - and so there is the
possibility of making mistakes when reasoning about the topic. There
is not yet any rigorous proof that real systems will actually act as
though they have stable goals once they become capable of
self-modification. Here is Eliezer on that topic:
[Eliezer Yudkowsky footage]
Real systems
The real intelligent systems which we can see exhibit varying levels
of resistance to wireheading:
Some people refuse to take pleasure-inducing drugs - while others
become addicts all too easily. Most people work honestly for a living
- but there are some who rob banks. Most corporations perform a
service to their stockholders - but some engage in accounting fraud to
elevate their stock price, and then use insider-trading techniques to
siphon off their money.
While humans normally appear to have stable goal systems,
there is the phenomenon of religious conversion to consider - in which
people's goal systems often appear to undergo dramatic
changes.
Of course humans are handicapped by being evolved agents. Agents which
are deliberately engineered to be resistant to the
temptation to wirehead themselves may perform a lot better than humans
do.
Implications for machine intelligence
It seems likely that some approaches to constructing machine
intelligence will be prone to the wirehead problem. So, although there
appears to be a theoretical solution, getting agents to the
point where they have sufficent understanding of their own goals to
know enough to avoid self-modifications that trash them looks likely
to prove to be a non-trivial exercise.
One proposed architecture for synthetic intelligence involves a neural
network surrounded by sensor and motor signals, with positive and
negative critical feedback.
This type of architecture - which happens to be the only one for which
we have practical proof that it works - along with other strategies
based directly on reinforcement learning - seems especially likely to
be vulnerable to the wirehead problem.
Significance of the problem
The details of this problem and its resolution make a difference for
futurists, and the architects of intelligent machines.
I rate the problem as one of the most important and interesting
philosophical issues in machine intelligence. If one percent of the
people who talk about machine consciousness considered the problem,
we might have a better understanding of it today.