
… getting eaten by one of the adversarial “ghosts.” The Maluuba researchers were

  able to build a system that learned how to master the game, achieving the

  highest possible score and surpassing human performance.

  A common misunderstanding of AI imagines that, in a system like

  Maluuba’s, the player of the game is a deep neural network. That is, the

system works by swapping out the human joystick operator for an artificial

  DNN “brain.” That is not how it works. Instead of a single DNN that is tied

to the Ms. Pac-Man avatar (which is how the human player experiences the

  game), the Maluuba system is broken down into 163 component ML tasks.

As illustrated on the right panel of figure 2.2, the engineers have assigned

  a distinct DNN routine to each cell of the board. In addition, they have

  DNNs that track the game characters: the ghosts and, of course, Ms. Pac-

Man herself. The direction that the AI system sends Ms. Pac-Man at any

  point in the game is then chosen through consideration of the advice from

  each of these ML components. Recommendations from the components

that are close to Ms. Pac-Man's current board position are weighted more

  strongly than those of currently remote locations. Hence, you can think of

  the ML algorithm assigned to each square on the board as having a simple

task to solve: when Ms. Pac-Man crosses over this location, which direction

  should she go next?
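As a rough illustration of this division of labor, the sketch below gives each board cell a routine that recommends a move, then blends those recommendations with weights that decay in the cell's distance from Ms. Pac-Man. The weighting rule and the advice format are invented stand-ins for exposition, not Maluuba's actual design.

```python
import math
from collections import defaultdict

MOVES = ["up", "down", "left", "right"]

def choose_direction(cell_advice, pacman_pos):
    """Blend per-cell recommendations, weighting nearby cells most heavily.

    cell_advice maps a board cell (x, y) to that cell's recommended move
    scores, e.g. {"up": 0.1, "left": 0.9}. The exponential distance decay
    is an illustrative choice, not Maluuba's exact rule.
    """
    totals = defaultdict(float)
    px, py = pacman_pos
    for (x, y), scores in cell_advice.items():
        weight = math.exp(-math.hypot(x - px, y - py))  # nearer cells count more
        for move, score in scores.items():
            totals[move] += weight * score
    return max(MOVES, key=lambda m: totals[m])

# Tiny usage example with two made-up component recommendations.
advice = {(1, 1): {"left": 0.9, "up": 0.1},
          (5, 5): {"right": 1.0}}
print(choose_direction(advice, pacman_pos=(1, 2)))  # "left": the nearby cell dominates
```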

Learning to play a video or board game is a standard way for AI firms

  to demonstrate their current capabilities. The Google DeepMind system

  AlphaGo (Silver et al. 2016), which was constructed to play the fantastically

  complex board game “go,” is the most prominent of such demonstrations.

  The system was able to surpass human capability, beating the world cham-

pion, Lee Sedol, four matches to one at a live-broadcast event in Seoul, South Korea, in March 2016. Just as Maluuba's system broke Ms. Pac-Man

  into a number of composite tasks, AlphaGo succeeded by breaking Go into

  an even larger number of ML problems: “value networks” that evaluate

different board positions and “policy networks” that recommend moves.

  The key point here is that while the composite ML tasks can be attacked

  with relatively generic DNNs, the full combined system is constructed in a

  way that is highly specialized to the structure of the problem at hand.
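To see how the two network types might fit together, here is a deliberately crude sketch: score each legal move by blending the policy network's prior with the value network's evaluation of the position the move produces. The real AlphaGo combines these signals inside a Monte Carlo tree search; this one-step lookahead, with hypothetical callables standing in for the networks, only conveys the division of labor.

```python
def choose_move(board, legal_moves, policy_net, value_net, apply_move, lam=0.5):
    # policy_net(board): prior probability for each candidate move.
    # value_net(b): estimated chance of winning from board b.
    # apply_move(board, move): the board that results from playing move.
    # All callables here are hypothetical stand-ins, not the actual AlphaGo networks.
    priors = policy_net(board)
    def score(move):
        return lam * priors[move] + (1 - lam) * value_net(apply_move(board, move))
    return max(legal_moves, key=score)

# Toy demonstration with fake networks on a meaningless "board."
demo_policy = lambda b: {"A": 0.7, "B": 0.3}
demo_value = lambda b: 0.5
print(choose_move("x", ["A", "B"], demo_policy, demo_value,
                  apply_move=lambda b, m: b + m))  # prints "A"
```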

In figure 2.1, the first listed pillar of AI is domain structure. This is the structure that allows you to break a complex problem into composite tasks

that can be solved with ML. The reason that AI firms choose to work with games is that such structure is explicit: the rules of the game are codified.

  This exposes the massive gap between playing games and a system that

could replace humans in a real-world business application. To deal with the

  real world, you need to have a theory as to the rules of the relevant game.

  For example, if you want to build a system that can communicate with cus-

  tomers, you might proceed by mapping out customer desires and intents in

Fig. 2.2 Screenshots of the Maluuba system playing Ms. Pac-Man

Notes: On the left, we see the game board that contains a maze for Ms. Pac-Man and the ghosts. On the right, the authors have assigned arrows showing the current direction for Ms. Pac-Man that is advised by different locations on the board, each corresponding to a distinct deep neural network. The full video is at https://youtu.be/zQyWMHFjewU.


such a way that allows different dialog-generating ML routines for each. Or,

  for any AI system that deals with marketing and prices in a retail environ-

  ment, you need to be able to use the structure of an economic demand system

  to forecast how changing the price on a single item (which might, say, be the

job of a single DNN) will affect optimal prices for other products and behav-

  ior of your consumers (who might themselves be modeled with DNNs).
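To make the demand-system idea concrete, a standard log-log specification is sketched below; its cross-price elasticities are exactly the structure that propagates one DNN's price change to the rest of the system. This is a generic textbook form, not the particular model any production AI uses.

```latex
% q_i: quantity sold of item i;  p_j: price of item j.
% \beta_{ii} is the own-price elasticity; \beta_{ij} (i \neq j) are
% cross-price elasticities linking one item's price to demand
% for every other item.
\log q_i = \alpha_i + \sum_{j} \beta_{ij} \log p_j + \varepsilon_i
```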

The success or failure of an AI system is defined in a specific context,

  and you need to use the structure of that context to guide the architecture

  of your AI. This is a crucial point for businesses hoping to leverage AI and

  economists looking to predict its impact. As we will detail below, machine

  learning in its current form has become a general purpose technology (Bres-

  nahan 2010). These tools are going to get cheaper and faster over time, due

  to innovations in the ML itself and above and below in the AI technology

  stack (e.g., improved software connectors for business systems above, and

improved computing hardware like GPUs below). Machine learning has

the potential to become a cloud-computing commodity.2 In contrast, the

  domain knowledge necessary to combine ML components into an end-

  to-end AI solution will not be commoditized. Those who have expertise

that can break complex human business problems into ML-solvable compo-

  nents will succeed in building the next generation of business AI, that which

  can do more than just play games.

  In many of these scenarios, social science will have a role to play. Science

  is about putting structure and theory around phenomena that are obser-

vationally incredibly complex. Economics, as the social science closest to

  business, will often be relied upon to provide the rules for business AI. And

since ML-driven AI relies upon measuring rewards and parameters inside its

  context, econometrics will play a key role in bridging between the assumed

  system and the data signals used for feedback and learning. The work will

  not translate directly. We need to build systems that allow for a certain mar-

  gin of error in the ML algorithms. Those economic theories that apply for

  only a very narrow set of conditions—for example, at a knife’s edge equilib-

  rium—will be too unstable for AI. This is why we mention relaxations and

heuristics in figure 2.1. There is an exciting future here where economists

  can contribute to AI engineering, and both AI and economics advance as

  we learn what recipes do or do not work for business AI.

Beyond ML and domain structure, the third pillar of AI in figure 2.1 is

  data generation. I am using the term “generation” here, instead of a more

  passive term like “collection,” to highlight that AI systems require an active

strategy to keep a steady stream of new and useful information flowing

  into the composite learning algorithms. In most AI applications there will

2. Amazon, Microsoft, and Google are all starting to offer basic ML capabilities like transcription and image classification as part of their cloud services. The prices for these services are low and mostly matched across providers.


be two general classes of data: fixed-size data assets that can be used to

  train the models for generic tasks, and data that is actively generated by the

  system as it experiments and improves performance. For example, in learn-

ing how to play Ms. Pac-Man the models could be initialized on a bank of data recording how humans have played the game. This is the fixed-size data asset. Then this initialized system starts to play the game of Ms. Pac-Man.

  Recalling that the system is broken into a number of ML components, as

  more games are played each component is able to experiment with possible

moves in different scenarios. Since all of this is automated, the system can

  iterate through a massive number of games and quickly accumulate a wealth

  of experience.

  For business applications, we should not underestimate the advantage

  of having large data assets to initialize AI systems. Unlike board or video

games, real-world systems need to be able to interpret a variety of extremely

  subtle signals. For example, any system that interacts with human dialog

  must be able to understand the general domain language before it can deal

with specific problems. For this reason, firms that have large banks of human

  interaction data (e.g., social media or a search engine) have a large techno-

  logical advantage in conversational AI systems. However, this data just gets

you started. The context-specific learning starts happening when, after this

“warm start,” the system begins interacting with real-world business events.

  The general framework of ML algorithms actively choosing the data that

  they consume is referred to as reinforcement learning (RL).3 It is a hugely

important aspect of ML-driven AI, and we have a dedicated section on the

  topic. In some narrow and highly structured scenarios, researchers have

built “zero-shot” learning systems where the AI is able to achieve high

  performance after starting without any static training data. For example, in

  subsequent research, Google DeepMind has developed the AlphaGoZero

(Silver et al. 2017) system that uses zero-shot learning to replicate their ear-

  lier AlphaGo success. Noting that the RL is happening on the level of indi-

  vidual ML tasks, we can update our description of AI as being composed

of many RL-driven ML components.
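To make “components choosing their own data” concrete, the sketch below implements an epsilon-greedy bandit, one of the simplest RL recipes: the learner mostly exploits its current reward estimates but occasionally experiments, so every action it takes also generates a fresh training observation. The three-action Gaussian-reward setup is purely illustrative.

```python
import random

true_rewards = [1.0, 1.5, 0.5]  # unknown to the learner; it sees only noisy samples
estimates = [0.0, 0.0, 0.0]     # the learner's running reward estimates
counts = [0, 0, 0]
epsilon = 0.1                   # fraction of steps spent experimenting

for step in range(10_000):
    if random.random() < epsilon:
        action = random.randrange(3)              # explore: generate new data
    else:
        action = estimates.index(max(estimates))  # exploit current knowledge
    reward = random.gauss(true_rewards[action], 1.0)  # noisy feedback from the world
    counts[action] += 1
    # incremental mean update: each interaction becomes a training point
    estimates[action] += (reward - estimates[action]) / counts[action]

print([round(e, 2) for e in estimates])  # approaches [1.0, 1.5, 0.5]
```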

  As a complement to the work on reinforcement learning, there is a lot of

  research activity around AI systems that can simulate “data” to appear as

though it came from a real-world source. This has the potential to accelerate

system training, replicating the success that the field has had with video and

  board games where experimentation is virtually costless ( just play the game,

  nobody loses money or gets hurt). Generative adversarial networks (GANs;

  Goodfellow et al. 2014) are schemes where one DNN is simulating data and

  another is attempting to discern which data is real and which is simulated.

  3. This is an old concept in statistics. In previous iterations, parts of reinforcement learning have been referred to as the sequential design of experiments, active learning, and Bayesian optimization.


For example, in an image-tagging application one network will generate

  captions for the image while the other network attempts to discern which

  captions are human versus machine generated. If this scheme works well

  enough, then you can build an image tagger while minimizing the number

  of dumb captions you need to show humans while training.
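Formally, the scheme of Goodfellow et al. (2014) is a two-player minimax game between a generator G and a discriminator D; the value function below is the standard one from that paper.

```latex
% D(x): the discriminator's probability that x is real data.
% G(z): the generator's simulated sample from noise z.
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```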

And finally, AI is pushing into physical spaces. For example, the Amazon

  Go concept promises a frictionless shopping checkout experience where

  cameras and sensors determine what you’ve taken from the shelves and

  charge you accordingly. These systems are as data intensive as any other AI

  application, but they have the added need to translate information from a

  physical to a digital space. They need to be able to recognize and track both

  objects and individuals. Current implementations appear to rely on a combi-

nation of object-based data sources via sensor and device networks (i.e., the

  IoT or Internet of Things), and video data from surveillance cameras. The

  sensor data has the advantage in that it is well structured and tied to objects,

but the video data has the flexibility to look in places and at objects that you

  did not know to tag in advance. As computer vision technology advances,

  and as the camera hardware adapts and decreases in cost, we should see

  a shift in emphasis toward unstructured video data. We have seen similar

  patterns in AI development, for example, as use of raw conversation logs

  increases with improved machine reading capability. This is the progress of

ML-driven AI toward general purpose forms.

  2.3 General Purpose Machine Learning

  The piece of AI that gets the most publicity—so much so that it is often

  confused with all of AI—is general purpose machine learning. Regardless

  of this slight overemphasis, it is clear that the recent rise of deep neural net-

  works (DNNs; see section 2.5) is a main driver behind growth in AI. These

  DNNs have the ability to learn patterns in speech, image, and video data (as

  well as in more traditional structured data) faster, and more automatically,

  than ever before. They provide new ML capabilities and have completely

changed the workflow of an ML engineer. However, this technology should

  be understood as a rapid evolution of existing ML capabilities rather than

  as a completely new object.

Machine learning is the field that thinks about how to automatically build

  robust predictions from complex data. It is closely related to modern statis-

  tics, and indeed many of the best ideas in ML have come from statisticians

(the lasso, trees, forests, etc.). But whereas statisticians have often focused on model inference—on understanding the parameters of their models (e.g.,

testing on individual coefficients in a regression)—the ML community has

  been more focused on the single goal of maximizing predictive performance.

The entire field of ML is calibrated against “out-of-sample” experiments

  that evaluate how well a model trained on one data set will predict new data.


  And while there is a recent push to build more transparency into machine

  learning, wise ML practitioners will avoid assigning structural meaning to

the parameters of their fitted models. These models are black boxes whose

  purpose is to do a good job in predicting a future that follows the same pat-

  terns as in past data.
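A minimal sketch of that out-of-sample discipline, using scikit-learn on simulated data (the dataset and model choices here are illustrative, not drawn from the text): fit on one random subset, then score on held-out observations the model never saw.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Simulated data stands in for whatever the business problem supplies.
X, y = make_regression(n_samples=1000, n_features=20, noise=5.0, random_state=0)

# Hold out a test set: the model is judged only on data it did not train on.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("out-of-sample R^2:", model.score(X_test, y_test))
```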

  Prediction is easier than model inference. This has allowed the ML com-

  munity to quickly push forward and work with larger and more complex

  data. It also facilitated a focus on automation: developing algorithms that

will work on a variety of different types of data with little or no tuning

  required. We have seen an explosion of general purpose ML tools in the past

  decade—tools that can be deployed on messy data and automatically tuned

for optimal predictive performance.

The specific ML techniques used include high-dimensional ℓ1 regularized regression (Lasso), tree algorithms and ensembles of trees (e.g., Random

  Forests), and neural networks. These techniques have found application in

  business problems under such labels as “data mining” and, more recently,

  “predictive analytics.” Driven by the fact that many policy and business

  questions require more than just prediction, practitioners have added an

  emphasis on inference and incorporated ideas from statistics. Their work,

  combined with the demands and abundance of big data, coalesced together

to form the loosely defined field of data science. More recently, as the field

matures and as people recognize that not everything can be explicitly A/B

  tested, data scientists have discovered the importance of careful causal anal-

  ysis. One of the most currently active areas of data science is combining

  ML tools with the sort of counterfactual inference that econometricians

  have long studied, hence now merging the ML and statistics material with

  the work of economists. See, for example, Athey and Imbens (2016), Hart-

  ford et al. (2017), and the survey in Athey (2017).

  The push of ML into the general area of business analytics has allowed

companies to gain insight from high-dimensional and unstructured data.

  This is only possible because the ML tools and recipes have become robust

  and usable enough that they can be deployed by nonexperts in computer

  science or statistics. That is, they can be used by people with a variety of

  quantitative backgrounds who have domain knowledge for their business

  use case. Similarly, the tools can be used by economists and other social

scientists to bring new data to bear on scientifically compelling research

  questions. Again: the general usability of these tools has driven their adop-

  tion across disciplines. They come packaged as quality software and include

validation routines that allow the user to observe how well their fitted models

  will perform in future prediction tasks.
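As an example of such packaged validation routines, scikit-learn's LassoCV tunes the lasso penalty by built-in cross-validation, and cross_val_score reports fold-by-fold out-of-sample fit; again an illustrative sketch on simulated data rather than a recipe from the text.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=50, n_informative=5,
                       noise=2.0, random_state=1)

# LassoCV chooses the regularization penalty via internal cross-validation.
lasso = LassoCV(cv=5, random_state=1)

# Five train/test folds, each scored on data held out from fitting.
scores = cross_val_score(lasso, X, y, cv=5)
print("per-fold out-of-sample R^2:", scores.round(3))
```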

  The latest generation of ML algorithms, especially the deep learning

  technology that has exploded since around 2012 (Krizhevsky, Sutskever,

  and Hinton 2012), has increased the level of automation in the process of

fitting and applying prediction models. This new class of ML is the general