The reach of AI — Jane X. Wang

Humans’ unique form of intelligence originated, many tens of millennia ago, with our hands, powerful and dextrous. But our reach ended at our fingertips, with what we could physically touch and construct.

Then came advances in metallurgy, physics, chemistry, and materials science, which all extended that reach in the form of tools, machines, ballistics, and firepower. Through language, and later electricity and the telephone, we could obtain information from an ever wider sphere, but our ability to translate that information into the outcomes we wanted was still limited.

The history of human civilization has been a progression toward ever greater power to influence our environment. Our survival tactic is not to adapt to our surroundings, but rather to adapt our surroundings to ourselves (the specific term for this in the evolutionary sciences is “niche construction”). Doing this requires being able to exert specialized influence with surgical precision. Think about the exacting requirements of synthesizing medicine, or the amazingly intricate feats of engineering needed to fly hundreds of us across vast distances through the stratosphere.

Machinery, technology, computers, and the like have all progressively increased our ability to influence our surroundings, but this process has mainly been limited by what we can logically think through or design step by step in our minds. Like machines, traditional computer programs are very brittle and literal, and contain a multitude of points of failure, because they were constructed to do exactly one thing. Remove just one component from a car, and it can fail entirely; remove one line of code from software, and the whole thing can crash rather ungracefully.

Deep learning, machine learning, AI (whatever you want to call it) is the next step in increasing our reach. We no longer need to specify exactly what we want our machine or program to do. We just give it the capacity to map appropriately between the input information we feed it and the output decision that would be useful to us. In theory, if we had enough examples and enough capacity, we could get a deep learning model to construct any of those same machines or programs that a human previously had to painstakingly construct, bit by bit. This is, of course, a much harder problem, because it’s much more general, just as it is far harder to construct a calculator than it is to calculate 2 + 2 = 4. But the calculator can do much more than evaluate that one equation.
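
To make the contrast a little more concrete, here is a toy sketch in Python. It is only an illustration under strong assumptions, not a description of how any real system is built: the function names (add_by_rule, make_examples, learn_adder, predict) are hypothetical, the “model” is just two weights, and it is fit with plain gradient descent on a small grid of example sums. The first function is the hand-specified program; the rest recover the same behavior from examples alone.

```python
def add_by_rule(a, b):
    """Hand-specified program: the rule is written out explicitly."""
    return a + b


def make_examples():
    """Build (a, b, a + b) training examples on a small integer grid.

    The grid range is an arbitrary choice made for this illustration.
    """
    return [(a, b, a + b) for a in range(-3, 4) for b in range(-3, 4)]


def learn_adder(examples, steps=200, lr=0.05):
    """Fit y ~ w1 * a + w2 * b to the examples by gradient descent.

    Nothing here says "add the inputs"; the weights are adjusted only
    to reduce the mean squared error on the examples.
    """
    w1, w2 = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        g1 = g2 = 0.0
        for a, b, y in examples:
            err = (w1 * a + w2 * b) - y  # prediction error on this example
            g1 += err * a
            g2 += err * b
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
    return w1, w2


def predict(weights, a, b):
    """Apply the learned weights to a new pair of inputs."""
    w1, w2 = weights
    return w1 * a + w2 * b
```

Calling learn_adder(make_examples()) should return weights very close to (1.0, 1.0), so predict agrees with add_by_rule even on inputs outside the training grid. The learned version is far more work for this one task, which is the point of the calculator comparison: the payoff only appears when the same learning procedure is pointed at mappings nobody knows how to write down by hand.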

What does it mean to have “reach” within our environment? All tools, machines, and software give us reach, by transforming one state of our environment into a different, more desirable state. In image classification, a deep learning model transforms a set of unlabeled images into more useful labeled images. In reinforcement learning, a model learns a policy, which maps a given situation to the actions expected to yield high reward.

Taking this as the end goal, we can imagine that the most powerful AI models of the future will be able to directly transform current states of the world into whatever is desired by the user, in a manner that isn’t programmed, but rather is learned by a model given large amounts of training data and only minor human guidance. It would conceivably be indistinguishable from magic. Imagine being able to generate music or paintings from just a few descriptive words, or to compose a song by humming a few melodies. Some current large-scale generative models already appear to be making huge strides in this direction.

Of course, a critical, unavoidable challenge on both the ethical and practical sides will be deciding how exactly to specify those target states and the training data, and who the users should be. If the most useful AI models require the least human specification and guidance, then what assumptions should they make about what is most useful? What dangers and biases will be inherent in making these assumptions?