The Economics of Artificial Intelligence (ed. Ajay Agrawal)
getting eaten by one of the adversarial "ghosts." The Maluuba researchers were able to build a system that learned how to master the game, achieving the highest possible score and surpassing human performance.
A common misunderstanding of AI imagines that, in a system like Maluuba's, the player of the game is a deep neural network. That is, the system works by swapping out the human joystick operator for an artificial DNN "brain." That is not how it works. Instead of a single DNN that is tied to the Ms. Pac-Man avatar (which is how the human player experiences the game), the Maluuba system is broken down into 163 component ML tasks. As illustrated on the right panel of figure 2.2, the engineers have assigned a distinct DNN routine to each cell of the board. In addition, they have DNNs that track the game characters: the ghosts and, of course, Ms. Pac-Man herself. The direction that the AI system sends Ms. Pac-Man at any point in the game is then chosen through consideration of the advice from each of these ML components. Recommendations from components close to Ms. Pac-Man's current board position are weighted more heavily than those from remote locations. Hence, you can think of the ML algorithm assigned to each square on the board as having a simple task to solve: when Ms. Pac-Man crosses over this location, which direction should she go next?
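The proximity-weighted voting described above can be sketched in a few lines. This is a hypothetical illustration, not Maluuba's actual code: the `choose_move` function, the exponential distance decay, and the toy advice table are all my own assumptions.

```python
import math

# Each board location contributes a preference over the four moves;
# advice is weighted by proximity to Ms. Pac-Man's current position.
MOVES = ["up", "down", "left", "right"]

def choose_move(advice, pacman_pos, decay=0.5):
    """advice maps board position (x, y) -> {move: estimated value};
    closer components get exponentially larger weights."""
    totals = {m: 0.0 for m in MOVES}
    px, py = pacman_pos
    for (x, y), prefs in advice.items():
        dist = abs(x - px) + abs(y - py)      # Manhattan distance
        weight = math.exp(-decay * dist)      # nearby advice dominates
        for move, value in prefs.items():
            totals[move] += weight * value
    return max(totals, key=totals.get)

# A cell adjacent to Ms. Pac-Man mildly prefers "left"; a remote cell
# strongly prefers "up", but its advice is heavily discounted.
advice = {
    (1, 0): {"left": 1.0},
    (9, 9): {"up": 5.0},
}
result = choose_move(advice, (0, 0))  # "left": nearby advice wins
```

The design point is that no single component needs a global view; the aggregation rule turns many local recommendations into one action.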
Learning to play a video or board game is a standard way for AI firms to demonstrate their current capabilities. The Google DeepMind system AlphaGo (Silver et al. 2016), which was constructed to play the fantastically complex board game "go," is the most prominent of such demonstrations. The system was able to surpass human capability, beating the world champion, Lee Sedol, four matches to one at a live-broadcast event in Seoul, South Korea, in March 2016. Just as Maluuba's system broke Ms. Pac-Man into a number of composite tasks, AlphaGo succeeded by breaking go into an even larger number of ML problems: "value networks" that evaluate different board positions and "policy networks" that recommend moves. The key point here is that while the composite ML tasks can be attacked with relatively generic DNNs, the full combined system is constructed in a way that is highly specialized to the structure of the problem at hand.
In figure 2.1, the first listed pillar of AI is domain structure. This is the structure that allows you to break a complex problem into composite tasks that can be solved with ML. The reason that AI firms choose to work with games is that such structure is explicit: the rules of the game are codified. This exposes the massive gap between playing games and a system that could replace humans in a real-world business application. To deal with the real world, you need to have a theory as to the rules of the relevant game. For example, if you want to build a system that can communicate with customers, you might proceed by mapping out customer desires and intents in
Fig. 2.2 Screenshots of the Maluuba system playing Ms. Pac-Man. Notes: On the left, we see the game board that contains a maze for Ms. Pac-Man and the ghosts. On the right, the authors have assigned arrows showing the current direction for Ms. Pac-Man that is advised by different locations on the board, each corresponding to a distinct deep neural network. The full video is at https://youtu.be/zQyWMHFjewU.
The Technological Elements of Artificial Intelligence 65
such a way that allows different dialog-generating ML routines for each. Or, for any AI system that deals with marketing and prices in a retail environment, you need to be able to use the structure of an economic demand system to forecast how changing the price on a single item (which might, say, be the job of a single DNN) will affect optimal prices for other products and the behavior of your consumers (who might themselves be modeled with DNNs).
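To make the demand-system point concrete, here is a minimal sketch of a log-linear demand system with cross-price elasticities. The two-good setup, the elasticity values, and the function name are illustrative assumptions of mine, not anything from the chapter.

```python
import math

# Toy log-linear demand system: log q_i = a_i + sum_j e_ij * log p_j,
# where e_ij are own- and cross-price elasticities. Changing one
# price shifts predicted demand for every good in the system.
def demand(prices, alpha, elasticity):
    return [
        math.exp(alpha[i] + sum(elasticity[i][j] * math.log(p)
                                for j, p in enumerate(prices)))
        for i in range(len(prices))
    ]

alpha = [2.0, 2.0]
# Goods 0 and 1 are substitutes: raising p0 raises demand for good 1.
elasticity = [[-1.5, 0.3],
              [0.4, -1.2]]

q_base = demand([1.0, 1.0], alpha, elasticity)
q_bump = demand([1.2, 1.0], alpha, elasticity)  # raise only p0
```

Raising the first price lowers its own demand and raises demand for the substitute, which is exactly the cross-item effect an AI pricing system would need the demand structure to anticipate.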
The success or failure of an AI system is defined in a specific context, and you need to use the structure of that context to guide the architecture of your AI. This is a crucial point for businesses hoping to leverage AI and economists looking to predict its impact. As we will detail below, machine learning in its current form has become a general purpose technology (Bresnahan 2010). These tools are going to get cheaper and faster over time, due to innovations in the ML itself and above and below in the AI technology stack (e.g., improved software connectors for business systems above, and improved computing hardware like GPUs below). Machine learning has the potential to become a cloud-computing commodity.2 In contrast, the domain knowledge necessary to combine ML components into an end-to-end AI solution will not be commoditized. Those who have expertise that can break complex human business problems into ML-solvable components will succeed in building the next generation of business AI, that which can do more than just play games.
In many of these scenarios, social science will have a role to play. Science is about putting structure and theory around phenomena that are observationally incredibly complex. Economics, as the social science closest to business, will often be relied upon to provide the rules for business AI. And since ML-driven AI relies upon measuring rewards and parameters inside its context, econometrics will play a key role in bridging between the assumed system and the data signals used for feedback and learning. The work will not translate directly. We need to build systems that allow for a certain margin of error in the ML algorithms. Those economic theories that apply for only a very narrow set of conditions—for example, at a knife's edge equilibrium—will be too unstable for AI. This is why we mention relaxations and heuristics in figure 2.1. There is an exciting future here where economists can contribute to AI engineering, and both AI and economics advance as we learn what recipes do or do not work for business AI.
Beyond ML and domain structure, the third pillar of AI in figure 2.1 is data generation. I am using the term "generation" here, instead of a more passive term like "collection," to highlight that AI systems require an active strategy to keep a steady stream of new and useful information flowing into the composite learning algorithms. In most AI applications there will
2. Amazon, Microsoft, and Google are all starting to offer basic ML capabilities like transcription and image classification as part of their cloud services. The prices for these services are low and mostly matched across providers.
66 Matt Taddy
be two general classes of data: fixed-size data assets that can be used to train the models for generic tasks, and data that is actively generated by the system as it experiments and improves performance. For example, in learning how to play Ms. Pac-Man the models could be initialized on a bank of data recording how humans have played the game. This is the fixed-size data asset. Then this initialized system starts to play the game of Ms. Pac-Man. Recalling that the system is broken into a number of ML components, as more games are played each component is able to experiment with possible moves in different scenarios. Since all of this is automated, the system can iterate through a massive number of games and quickly accumulate a wealth of experience.
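The two data classes can be sketched as a warm start on a fixed asset followed by ongoing updates from self-generated play. Everything here (the `ComponentLearner` class, the running-average update, the toy state names) is a hypothetical illustration of the pattern, not the Maluuba system.

```python
import random

random.seed(0)

# A component keeps a running-average outcome estimate per state.
class ComponentLearner:
    def __init__(self):
        self.value = {}   # state -> running-average outcome
        self.count = {}

    def update(self, state, outcome):
        n = self.count.get(state, 0) + 1
        old = self.value.get(state, 0.0)
        self.value[state] = old + (outcome - old) / n  # incremental mean
        self.count[state] = n

learner = ComponentLearner()

# 1. Warm start on the fixed-size asset of recorded human play.
human_log = [("corridor", 0.8), ("corridor", 0.6), ("junction", 0.2)]
for state, outcome in human_log:
    learner.update(state, outcome)

# 2. Active generation: the system plays on, feeding its own
#    automated experience back into the same estimates.
for _ in range(100):
    learner.update("junction", random.random())
```

The fixed asset sets the initial estimates; the self-generated stream then dominates as automated play accumulates far more observations than the human log.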
For business applications, we should not underestimate the advantage of having large data assets to initialize AI systems. Unlike board or video games, real-world systems need to be able to interpret a variety of extremely subtle signals. For example, any system that interacts with human dialog must be able to understand the general domain language before it can deal with specific problems. For this reason, firms that have large banks of human interaction data (e.g., social media or a search engine) have a large technological advantage in conversational AI systems. However, this data just gets you started. The context-specific learning starts happening when, after this "warm start," the system begins interacting with real-world business events.
The general framework of ML algorithms actively choosing the data that they consume is referred to as reinforcement learning (RL).3 It is a hugely important aspect of ML-driven AI, and we have a dedicated section on the topic. In some narrow and highly structured scenarios, researchers have built "zero-shot" learning systems where the AI is able to achieve high performance after starting without any static training data. For example, in subsequent research, Google DeepMind has developed the AlphaGoZero (Silver et al. 2017) system that uses zero-shot learning to replicate their earlier AlphaGo success. Noting that the RL is happening on the level of individual ML tasks, we can update our description of AI as being composed of many RL-driven ML components.
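A minimal flavor of reinforcement learning is the epsilon-greedy bandit, in which the algorithm actively chooses which action to sample next, trading off exploration against exploiting the current best estimate. This is a generic textbook sketch, not the method behind AlphaGoZero; the reward values and parameter settings are my own assumptions.

```python
import random

random.seed(1)

def epsilon_greedy(true_means, steps=5000, eps=0.1):
    """Actively choose actions; learn their mean rewards online."""
    est = [0.0] * len(true_means)   # estimated mean reward per action
    n = [0] * len(true_means)       # times each action was chosen
    for _ in range(steps):
        if random.random() < eps:
            a = random.randrange(len(true_means))               # explore
        else:
            a = max(range(len(true_means)), key=lambda i: est[i])  # exploit
        reward = true_means[a] + random.gauss(0.0, 0.1)  # noisy feedback
        n[a] += 1
        est[a] += (reward - est[a]) / n[a]               # incremental mean
    return est, n

est, n = epsilon_greedy([0.2, 0.8, 0.5])
```

The key property is that the data the learner sees depends on its own past choices, which is exactly what distinguishes RL from learning on a fixed data asset.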
As a complement to the work on reinforcement learning, there is a lot of research activity around AI systems that can simulate "data" to appear as though it came from a real-world source. This has the potential to accelerate system training, replicating the success that the field has had with video and board games where experimentation is virtually costless (just play the game, nobody loses money or gets hurt). Generative adversarial networks (GANs; Goodfellow et al. 2014) are schemes where one DNN is simulating data and another is attempting to discern which data is real and which is simulated.
3. This is an old concept in statistics. In previous iterations, parts of reinforcement learning have been referred to as the sequential design of experiments, active learning, and Bayesian optimization.
For example, in an image-tagging application one network will generate captions for the image while the other network attempts to discern which captions are human versus machine generated. If this scheme works well enough, then you can build an image tagger while minimizing the number of dumb captions you need to show humans while training.
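The adversarial idea can be shown with a deliberately tiny example, far simpler than the image-captioning setting: here the "real" data is the constant 5.0, the generator emits a single number, and the discriminator is a two-parameter logistic classifier, with the two updated by alternating gradient steps. All of this is my own toy construction; in this setup the generator's output should drift from 0 toward the real data as training alternates.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

real = 5.0        # the "true" data is just the constant 5.0
g = 0.0           # the generator's single output
a, b = 0.0, 0.0   # discriminator D(x) = sigmoid(a*x + b)
lr_d, lr_g = 0.05, 0.2

for _ in range(3000):
    d_real = sigmoid(a * real + b)
    d_fake = sigmoid(a * g + b)
    # Discriminator step: push D(real) up and D(fake) down
    # (gradient of -log D(real) - log(1 - D(fake))).
    a -= lr_d * (-(1.0 - d_real) * real + d_fake * g)
    b -= lr_d * (-(1.0 - d_real) + d_fake)
    # Generator step: move g toward where the discriminator
    # says "real" (gradient of -log D(g)).
    d_fake = sigmoid(a * g + b)
    g += lr_g * a * (1.0 - d_fake)
```

The adversarial structure is the point: neither player minimizes a fixed loss, and the generator improves only because the discriminator keeps sharpening the boundary between real and simulated data.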
And finally, AI is pushing into physical spaces. For example, the Amazon Go concept promises a frictionless shopping checkout experience where cameras and sensors determine what you've taken from the shelves and charge you accordingly. These systems are as data intensive as any other AI application, but they have the added need to translate information from a physical to a digital space. They need to be able to recognize and track both objects and individuals. Current implementations appear to rely on a combination of object-based data sources via sensor and device networks (i.e., the IoT or Internet of Things), and video data from surveillance cameras. The sensor data has the advantage in that it is well structured and tied to objects, but the video data has the flexibility to look in places and at objects that you did not know to tag in advance. As computer vision technology advances, and as the camera hardware adapts and decreases in cost, we should see a shift in emphasis toward unstructured video data. We have seen similar patterns in AI development, for example, as use of raw conversation logs increases with improved machine reading capability. This is the progress of ML-driven AI toward general purpose forms.
2.3 General Purpose Machine Learning
The piece of AI that gets the most publicity—so much so that it is often confused with all of AI—is general purpose machine learning. Regardless of this slight overemphasis, it is clear that the recent rise of deep neural networks (DNNs; see section 2.5) is a main driver behind growth in AI. These DNNs have the ability to learn patterns in speech, image, and video data (as well as in more traditional structured data) faster, and more automatically, than ever before. They provide new ML capabilities and have completely changed the workflow of an ML engineer. However, this technology should be understood as a rapid evolution of existing ML capabilities rather than as a completely new object.
Machine learning is the field that thinks about how to automatically build robust predictions from complex data. It is closely related to modern statistics, and indeed many of the best ideas in ML have come from statisticians (the lasso, trees, forests, etc.). But whereas statisticians have often focused on model inference—on understanding the parameters of their models (e.g., testing on individual coefficients in a regression)—the ML community has been more focused on the single goal of maximizing predictive performance. The entire field of ML is calibrated against "out-of-sample" experiments that evaluate how well a model trained on one data set will predict new data.
And while there is a recent push to build more transparency into machine learning, wise ML practitioners will avoid assigning structural meaning to the parameters of their fitted models. These models are black boxes whose purpose is to do a good job in predicting a future that follows the same patterns as in past data.
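The out-of-sample discipline described above can be sketched as a simple train/test split: fit on one portion of the data, score on held-out data the model never saw. The data-generating process and the one-parameter model here are invented for illustration.

```python
import random

random.seed(2)

# Simulated data: y is roughly 2*x plus noise.
data = [(x, 2.0 * x + random.gauss(0.0, 0.5)) for x in range(100)]
random.shuffle(data)
train, test = data[:80], data[80:]  # hold out 20% for evaluation

# Fit a least-squares slope through the origin on training data ONLY.
slope = sum(x * y for x, y in train) / sum(x * x for x, y in train)

def mse(pairs):
    # Mean squared prediction error of the fitted slope.
    return sum((y - slope * x) ** 2 for x, y in pairs) / len(pairs)

in_sample, out_of_sample = mse(train), mse(test)
```

The number that matters for ML is `out_of_sample`: performance on data that played no role in fitting, which is what the validation routines mentioned later in this chapter report.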
Prediction is easier than model inference. This has allowed the ML community to quickly push forward and work with larger and more complex data. It also facilitated a focus on automation: developing algorithms that will work on a variety of different types of data with little or no tuning required. We have seen an explosion of general purpose ML tools in the past decade—tools that can be deployed on messy data and automatically tuned for optimal predictive performance.
The specific ML techniques used include high-dimensional ℓ1-regularized regression (Lasso), tree algorithms and ensembles of trees (e.g., Random Forests), and neural networks. These techniques have found application in business problems under such labels as "data mining" and, more recently, "predictive analytics." Driven by the fact that many policy and business questions require more than just prediction, practitioners have added an emphasis on inference and incorporated ideas from statistics. Their work, combined with the demands and abundance of big data, coalesced to form the loosely defined field of data science. More recently, as the field matures and as people recognize that not everything can be explicitly A/B tested, data scientists have discovered the importance of careful causal analysis. One of the most currently active areas of data science is combining ML tools with the sort of counterfactual inference that econometricians have long studied, hence now merging the ML and statistics material with the work of economists. See, for example, Athey and Imbens (2016), Hartford et al. (2017), and the survey in Athey (2017).
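As one concrete instance of these techniques, here is a minimal coordinate-descent lasso in plain Python. The soft-thresholding update is the standard textbook form for the objective (1/2n)·||y − Xw||² + λ·||w||₁; the toy data and function names are my own, and a production implementation would of course use an optimized library.

```python
def soft_threshold(rho, lam):
    # The soft-thresholding operator at the heart of the lasso:
    # shrinks rho toward zero by lam, setting small values exactly to 0.
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, iters=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # Correlation of feature j with the partial residual
            # (the fit excluding feature j's own contribution).
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * w[k]
                      for k in range(p) if k != j))
                      for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w

# Toy data: y depends only on feature 0; feature 1 is noise.
X = [[1.0, 0.1], [2.0, -0.2], [3.0, 0.1], [4.0, -0.1]]
y = [2.0, 4.0, 6.0, 8.0]
w = lasso_cd(X, y, lam=0.1)
```

The ℓ1 penalty sets the irrelevant coefficient exactly to zero rather than merely shrinking it, which is why the lasso doubles as an automatic variable-selection tool.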
The push of ML into the general area of business analytics has allowed companies to gain insight from high-dimensional and unstructured data. This is only possible because the ML tools and recipes have become robust and usable enough that they can be deployed by nonexperts in computer science or statistics. That is, they can be used by people with a variety of quantitative backgrounds who have domain knowledge for their business use case. Similarly, the tools can be used by economists and other social scientists to bring new data to bear on scientifically compelling research questions. Again: the general usability of these tools has driven their adoption across disciplines. They come packaged as quality software and include validation routines that allow the user to observe how well their fitted models will perform in future prediction tasks.
The latest generation of ML algorithms, especially the deep learning technology that has exploded since around 2012 (Krizhevsky, Sutskever, and Hinton 2012), has increased the level of automation in the process of fitting and applying prediction models. This new class of ML is the general