
… getting eaten by one of the adversarial “ghosts.” The Maluuba researchers were

  able to build a system that learned how to master the game, achieving the

  highest possible score and surpassing human performance.

  A common misunderstanding of AI imagines that, in a system like

  Maluuba’s, the player of the game is a deep neural network. That is, the

system works by swapping out the human joystick operator for an artificial

  DNN “brain.” That is not how it works. Instead of a single DNN that is tied

to the Ms. Pac-Man avatar (which is how the human player experiences the

  game), the Maluuba system is broken down into 163 component ML tasks.

As illustrated on the right panel of figure 2.2, the engineers have assigned

  a distinct DNN routine to each cell of the board. In addition, they have

  DNNs that track the game characters: the ghosts and, of course, Ms. Pac-

Man herself. The direction that the AI system sends Ms. Pac-Man at any

  point in the game is then chosen through consideration of the advice from

  each of these ML components. Recommendations from the components

that are close to Ms. Pac-Man's current board position are weighted more

  strongly than those of currently remote locations. Hence, you can think of

  the ML algorithm assigned to each square on the board as having a simple

task to solve: when Ms. Pac-Man crosses over this location, which direction

  should she go next?
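As a rough illustration of this division of labor, the sketch below gives each board cell a routine that recommends a move, then blends those recommendations with weights that decay in the cell's distance from Ms. Pac-Man. The weighting rule and the advice format are invented stand-ins for exposition, not Maluuba's actual design.

```python
import math
from collections import defaultdict

MOVES = ["up", "down", "left", "right"]

def choose_direction(cell_advice, pacman_pos):
    """Blend per-cell recommendations, weighting nearby cells most heavily.

    cell_advice maps a board cell (x, y) to that cell's recommended move
    scores, e.g. {"up": 0.1, "left": 0.9}. The exponential distance decay
    is an illustrative choice, not Maluuba's exact rule.
    """
    totals = defaultdict(float)
    px, py = pacman_pos
    for (x, y), scores in cell_advice.items():
        weight = math.exp(-math.hypot(x - px, y - py))  # nearer cells count more
        for move, score in scores.items():
            totals[move] += weight * score
    return max(MOVES, key=lambda m: totals[m])

# Tiny usage example with two made-up component recommendations.
advice = {(1, 1): {"left": 0.9, "up": 0.1},
          (5, 5): {"right": 1.0}}
print(choose_direction(advice, pacman_pos=(1, 2)))  # "left": the nearby cell dominates
```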

Learning to play a video or board game is a standard way for AI firms

  to demonstrate their current capabilities. The Google DeepMind system

  AlphaGo (Silver et al. 2016), which was constructed to play the fantastically

  complex board game “go,” is the most prominent of such demonstrations.

  The system was able to surpass human capability, beating the world cham-

pion, Lee Sedol, four matches to one at a live-broadcast event in Seoul, South Korea, in March 2016. Just as Maluuba's system broke Ms. Pac-Man

  into a number of composite tasks, AlphaGo succeeded by breaking Go into

  an even larger number of ML problems: “value networks” that evaluate

different board positions and “policy networks” that recommend moves.

  The key point here is that while the composite ML tasks can be attacked

  with relatively generic DNNs, the full combined system is constructed in a

  way that is highly specialized to the structure of the problem at hand.
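To see how the two network types might fit together, here is a deliberately crude sketch: score each legal move by blending the policy network's prior with the value network's evaluation of the position the move produces. The real AlphaGo combines these signals inside a Monte Carlo tree search; this one-step lookahead, with hypothetical callables standing in for the networks, only conveys the division of labor.

```python
def choose_move(board, legal_moves, policy_net, value_net, apply_move, lam=0.5):
    # policy_net(board): prior probability for each candidate move.
    # value_net(b): estimated chance of winning from board b.
    # apply_move(board, move): the board that results from playing move.
    # All callables here are hypothetical stand-ins, not the actual AlphaGo networks.
    priors = policy_net(board)
    def score(move):
        return lam * priors[move] + (1 - lam) * value_net(apply_move(board, move))
    return max(legal_moves, key=score)

# Toy demonstration with fake networks on a meaningless "board."
demo_policy = lambda b: {"A": 0.7, "B": 0.3}
demo_value = lambda b: 0.5
print(choose_move("x", ["A", "B"], demo_policy, demo_value,
                  apply_move=lambda b, m: b + m))  # prints "A"
```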

In figure 2.1, the first listed pillar of AI is domain structure. This is the structure that allows you to break a complex problem into composite tasks

that can be solved with ML. The reason that AI firms choose to work with games is that such structure is explicit: the rules of the game are codified.

  This exposes the massive gap between playing games and a system that

could replace humans in a real-world business application. To deal with the

  real world, you need to have a theory as to the rules of the relevant game.

  For example, if you want to build a system that can communicate with cus-

  tomers, you might proceed by mapping out customer desires and intents in

Fig. 2.2 Screenshots of the Maluuba system playing Ms. Pac-Man

Notes: On the left, we see the game board that contains a maze for Ms. Pac-Man and the ghosts. On the right, the authors have assigned arrows showing the current direction for Ms. Pac-Man that is advised by different locations on the board, each corresponding to a distinct deep neural network. The full video is at https://youtu.be/zQyWMHFjewU.


such a way that allows different dialog-generating ML routines for each. Or,

  for any AI system that deals with marketing and prices in a retail environ-

  ment, you need to be able to use the structure of an economic demand system

  to forecast how changing the price on a single item (which might, say, be the

job of a single DNN) will affect optimal prices for other products and behav-

  ior of your consumers (who might themselves be modeled with DNNs).
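To make the demand-system idea concrete, a standard log-log specification is sketched below; its cross-price elasticities are exactly the structure that propagates one DNN's price change to the rest of the system. This is a generic textbook form, not the particular model any production AI uses.

```latex
% q_i: quantity sold of item i;  p_j: price of item j.
% \beta_{ii} is the own-price elasticity; \beta_{ij} (i \neq j) are
% cross-price elasticities linking one item's price to demand
% for every other item.
\log q_i = \alpha_i + \sum_{j} \beta_{ij} \log p_j + \varepsilon_i
```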

The success or failure of an AI system is defined in a specific context,

  and you need to use the structure of that context to guide the architecture

  of your AI. This is a crucial point for businesses hoping to leverage AI and

  economists looking to predict its impact. As we will detail below, machine

  learning in its current form has become a general purpose technology (Bres-

  nahan 2010). These tools are going to get cheaper and faster over time, due

  to innovations in the ML itself and above and below in the AI technology

  stack (e.g., improved software connectors for business systems above, and

improved computing hardware like GPUs below). Machine learning has

the potential to become a cloud-computing commodity.2 In contrast, the

  domain knowledge necessary to combine ML components into an end-

  to-end AI solution will not be commoditized. Those who have expertise

that can break complex human business problems into ML-solvable compo-

  nents will succeed in building the next generation of business AI, that which

  can do more than just play games.

  In many of these scenarios, social science will have a role to play. Science

  is about putting structure and theory around phenomena that are obser-

vationally incredibly complex. Economics, as the social science closest to

  business, will often be relied upon to provide the rules for business AI. And

since ML-driven AI relies upon measuring rewards and parameters inside its

  context, econometrics will play a key role in bridging between the assumed

  system and the data signals used for feedback and learning. The work will

  not translate directly. We need to build systems that allow for a certain mar-

  gin of error in the ML algorithms. Those economic theories that apply for

  only a very narrow set of conditions—for example, at a knife’s edge equilib-

  rium—will be too unstable for AI. This is why we mention relaxations and

heuristics in figure 2.1. There is an exciting future here where economists

  can contribute to AI engineering, and both AI and economics advance as

  we learn what recipes do or do not work for business AI.

Beyond ML and domain structure, the third pillar of AI in figure 2.1 is

  data generation. I am using the term “generation” here, instead of a more

  passive term like “collection,” to highlight that AI systems require an active

strategy to keep a steady stream of new and useful information flowing

  into the composite learning algorithms. In most AI applications there will

2. Amazon, Microsoft, and Google are all starting to offer basic ML capabilities like transcription and image classification as part of their cloud services. The prices for these services are low and mostly matched across providers.


be two general classes of data: fixed-size data assets that can be used to

  train the models for generic tasks, and data that is actively generated by the

  system as it experiments and improves performance. For example, in learn-

ing how to play Ms. Pac-Man the models could be initialized on a bank of data recording how humans have played the game. This is the fixed-size data asset. Then this initialized system starts to play the game of Ms. Pac-Man.

  Recalling that the system is broken into a number of ML components, as

  more games are played each component is able to experiment with possible

moves in different scenarios. Since all of this is automated, the system can

  iterate through a massive number of games and quickly accumulate a wealth

  of experience.

  For business applications, we should not underestimate the advantage

  of having large data assets to initialize AI systems. Unlike board or video

games, real-world systems need to be able to interpret a variety of extremely

  subtle signals. For example, any system that interacts with human dialog

  must be able to understand the general domain language before it can deal

with specific problems. For this reason, firms that have large banks of human

  interaction data (e.g., social media or a search engine) have a large techno-

  logical advantage in conversational AI systems. However, this data just gets

you started. The context-specific learning starts happening when, after this

“warm start,” the system begins interacting with real-world business events.

  The general framework of ML algorithms actively choosing the data that

  they consume is referred to as reinforcement learning (RL).3 It is a hugely

important aspect of ML-driven AI, and we have a dedicated section on the

  topic. In some narrow and highly structured scenarios, researchers have

built “zero-shot” learning systems where the AI is able to achieve high

  performance after starting without any static training data. For example, in

  subsequent research, Google DeepMind has developed the AlphaGoZero

(Silver et al. 2017) system that uses zero-shot learning to replicate their ear-

  lier AlphaGo success. Noting that the RL is happening on the level of indi-

  vidual ML tasks, we can update our description of AI as being composed

of many RL-driven ML components.
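To make “components choosing their own data” concrete, the sketch below implements an epsilon-greedy bandit, one of the simplest RL recipes: the learner mostly exploits its current reward estimates but occasionally experiments, so every action it takes also generates a fresh training observation. The three-action Gaussian-reward setup is purely illustrative.

```python
import random

true_rewards = [1.0, 1.5, 0.5]  # unknown to the learner; it sees only noisy samples
estimates = [0.0, 0.0, 0.0]     # the learner's running reward estimates
counts = [0, 0, 0]
epsilon = 0.1                   # fraction of steps spent experimenting

for step in range(10_000):
    if random.random() < epsilon:
        action = random.randrange(3)              # explore: generate new data
    else:
        action = estimates.index(max(estimates))  # exploit current knowledge
    reward = random.gauss(true_rewards[action], 1.0)  # noisy feedback from the world
    counts[action] += 1
    # incremental mean update: each interaction becomes a training point
    estimates[action] += (reward - estimates[action]) / counts[action]

print([round(e, 2) for e in estimates])  # approaches [1.0, 1.5, 0.5]
```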

  As a complement to the work on reinforcement learning, there is a lot of

  research activity around AI systems that can simulate “data” to appear as

though it came from a real-world source. This has the potential to accelerate

system training, replicating the success that the field has had with video and

  board games where experimentation is virtually costless ( just play the game,

  nobody loses money or gets hurt). Generative adversarial networks (GANs;

  Goodfellow et al. 2014) are schemes where one DNN is simulating data and

  another is attempting to discern which data is real and which is simulated.

  3. This is an old concept in statistics. In previous iterations, parts of reinforcement learning have been referred to as the sequential design of experiments, active learning, and Bayesian optimization.


For example, in an image-tagging application one network will generate

  captions for the image while the other network attempts to discern which

  captions are human versus machine generated. If this scheme works well

  enough, then you can build an image tagger while minimizing the number

  of dumb captions you need to show humans while training.
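Formally, the scheme of Goodfellow et al. (2014) is a two-player minimax game between a generator G and a discriminator D; the value function below is the standard one from that paper.

```latex
% D(x): the discriminator's probability that x is real data.
% G(z): the generator's simulated sample from noise z.
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```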

And finally, AI is pushing into physical spaces. For example, the Amazon

  Go concept promises a frictionless shopping checkout experience where

  cameras and sensors determine what you’ve taken from the shelves and

  charge you accordingly. These systems are as data intensive as any other AI

  application, but they have the added need to translate information from a

  physical to a digital space. They need to be able to recognize and track both

  objects and individuals. Current implementations appear to rely on a combi-

nation of object-based data sources via sensor and device networks (i.e., the

  IoT or Internet of Things), and video data from surveillance cameras. The

  sensor data has the advantage in that it is well structured and tied to objects,

but the video data has the flexibility to look in places and at objects that you

  did not know to tag in advance. As computer vision technology advances,

  and as the camera hardware adapts and decreases in cost, we should see

  a shift in emphasis toward unstructured video data. We have seen similar

  patterns in AI development, for example, as use of raw conversation logs

  increases with improved machine reading capability. This is the progress of

ML-driven AI toward general purpose forms.

  2.3 General Purpose Machine Learning

  The piece of AI that gets the most publicity—so much so that it is often

  confused with all of AI—is general purpose machine learning. Regardless

  of this slight overemphasis, it is clear that the recent rise of deep neural net-

  works (DNNs; see section 2.5) is a main driver behind growth in AI. These

  DNNs have the ability to learn patterns in speech, image, and video data (as

  well as in more traditional structured data) faster, and more automatically,

  than ever before. They provide new ML capabilities and have completely

changed the workflow of an ML engineer. However, this technology should

  be understood as a rapid evolution of existing ML capabilities rather than

  as a completely new object.

Machine learning is the field that thinks about how to automatically build

  robust predictions from complex data. It is closely related to modern statis-

  tics, and indeed many of the best ideas in ML have come from statisticians

(the lasso, trees, forests, etc.). But whereas statisticians have often focused on model inference—on understanding the parameters of their models (e.g.,

testing on individual coefficients in a regression)—the ML community has

  been more focused on the single goal of maximizing predictive performance.

The entire field of ML is calibrated against “out-of-sample” experiments

  that evaluate how well a model trained on one data set will predict new data.


  And while there is a recent push to build more transparency into machine

  learning, wise ML practitioners will avoid assigning structural meaning to

the parameters of their fitted models. These models are black boxes whose

  purpose is to do a good job in predicting a future that follows the same pat-

  terns as in past data.
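A minimal sketch of that out-of-sample discipline, using scikit-learn on simulated data (the dataset and model choices here are illustrative, not drawn from the text): fit on one random subset, then score on held-out observations the model never saw.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Simulated data stands in for whatever the business problem supplies.
X, y = make_regression(n_samples=1000, n_features=20, noise=5.0, random_state=0)

# Hold out a test set: the model is judged only on data it did not train on.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("out-of-sample R^2:", model.score(X_test, y_test))
```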

  Prediction is easier than model inference. This has allowed the ML com-

  munity to quickly push forward and work with larger and more complex

  data. It also facilitated a focus on automation: developing algorithms that

will work on a variety of different types of data with little or no tuning

  required. We have seen an explosion of general purpose ML tools in the past

  decade—tools that can be deployed on messy data and automatically tuned

for optimal predictive performance.

The specific ML techniques used include high-dimensional ℓ1 regularized regression (Lasso), tree algorithms and ensembles of trees (e.g., Random

  Forests), and neural networks. These techniques have found application in

  business problems under such labels as “data mining” and, more recently,

  “predictive analytics.” Driven by the fact that many policy and business

  questions require more than just prediction, practitioners have added an

  emphasis on inference and incorporated ideas from statistics. Their work,

  combined with the demands and abundance of big data, coalesced together

to form the loosely defined field of data science. More recently, as the field

matures and as people recognize that not everything can be explicitly A/B

  tested, data scientists have discovered the importance of careful causal anal-

  ysis. One of the most currently active areas of data science is combining

  ML tools with the sort of counterfactual inference that econometricians

  have long studied, hence now merging the ML and statistics material with

  the work of economists. See, for example, Athey and Imbens (2016), Hart-

  ford et al. (2017), and the survey in Athey (2017).

  The push of ML into the general area of business analytics has allowed

companies to gain insight from high-dimensional and unstructured data.

  This is only possible because the ML tools and recipes have become robust

  and usable enough that they can be deployed by nonexperts in computer

  science or statistics. That is, they can be used by people with a variety of

  quantitative backgrounds who have domain knowledge for their business

  use case. Similarly, the tools can be used by economists and other social

scientists to bring new data to bear on scientifically compelling research

  questions. Again: the general usability of these tools has driven their adop-

  tion across disciplines. They come packaged as quality software and include

validation routines that allow the user to observe how well their fitted models

  will perform in future prediction tasks.
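As an example of such packaged validation routines, scikit-learn's LassoCV tunes the lasso penalty by built-in cross-validation, and cross_val_score reports fold-by-fold out-of-sample fit; again an illustrative sketch on simulated data rather than a recipe from the text.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=50, n_informative=5,
                       noise=2.0, random_state=1)

# LassoCV chooses the regularization penalty via internal cross-validation.
lasso = LassoCV(cv=5, random_state=1)

# Five train/test folds, each scored on data held out from fitting.
scores = cross_val_score(lasso, X, y, cv=5)
print("per-fold out-of-sample R^2:", scores.round(3))
```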

  The latest generation of ML algorithms, especially the deep learning

  technology that has exploded since around 2012 (Krizhevsky, Sutskever,

  and Hinton 2012), has increased the level of automation in the process of

fitting and applying prediction models. This new class of ML is the general