Chicago Public Schools Introduction to Philosophy Reflection Paper


Humanities

Chicago Public Schools

Description

Links will be attached!

Reflections are papers of 500+ words that focus on how the material covered since the previous reflection (for Reflection 1, since the start of the course) has affected your ideas about the general group of topics covered. The general topics are color-coded in the schedule above. The sources of that effect include the readings, class discussions, lectures, etc.

You don’t need to address every reading in the group, but they are all related, so you will probably end up doing that anyway. The purpose of the assignment is for you to reflect on your own intellectual positions, how they work, how they deal with the kinds of problems brought up in the readings, and how your ideas change over the course of the semester.

The point here is for you to think about your thinking. What do you think about these kinds of issues? Why? How is that similar/different from the way you thought about them before? Why did you change your mind? What idea strengthened/weakened your original position? What are some implications of that strengthening/weakening?

Whether your positions have changed or not, the influx of new information, new problems and solutions, etc. means that your ideas have changed, because you are looking at the field of ethics with a more discerning eye. Even if your ideas stay exactly the same as before, the new information has either strengthened your position, or has given you a better way of addressing the various problems.

Unformatted Attachment Preview

sort of future do you want? Should we develop lethal autonomous weapons? What would you like to happen with job automation? What career advice would you give today’s kids? Do you prefer new jobs replacing the old ones, or a jobless society where everyone enjoys a life of leisure and machine-produced wealth? Further down the road, would you like us to create Life 3.0 and spread it through our cosmos? Will we control intelligent machines or will they control us? Will intelligent machines replace us, coexist with us or merge with us? What will it mean to be human in the age of artificial intelligence? What would you like it to mean, and how can we make the future be that way? The goal of this book is to help you join this conversation. As I mentioned, there are fascinating controversies where the world’s leading experts disagree. But I’ve also seen many examples of boring pseudo-controversies in which people misunderstand and talk past each other. To help ourselves focus on the interesting controversies and open questions, not on the misunderstandings, let’s start by clearing up some of the most common misconceptions. There are many competing definitions in common use for terms such as “life,” “intelligence” and “consciousness,” and many misconceptions come from people not realizing that they’re using a word in two different ways. To make sure that you and I don’t fall into this trap, I’ve put a cheat sheet in table 1.1 showing how I use key terms in this book. Some of these definitions will only be properly introduced and explained in later chapters. Please note that I’m not claiming that my definitions are better than anyone else’s—I simply want to avoid confusion by being clear on what I mean. You’ll see that I generally go for broad definitions that avoid anthropocentric bias and can be applied to machines as well as humans. Please read the cheat sheet now, and come back and check it later if you find yourself puzzled by how I use one of its words—especially in chapters 4–8. 
Terminology Cheat Sheet

Life: Process that can retain its complexity and replicate
Life 1.0: Life that evolves its hardware and software (biological stage)
Life 2.0: Life that evolves its hardware but designs much of its software (cultural stage)
Life 3.0: Life that designs its hardware and software (technological stage)
Intelligence: Ability to accomplish complex goals
Artificial Intelligence (AI): Non-biological intelligence
Narrow intelligence: Ability to accomplish a narrow set of goals, e.g., play chess or drive a car
General intelligence: Ability to accomplish virtually any goal, including learning
Universal intelligence: Ability to acquire general intelligence given access to data and resources
[Human-level] Artificial General Intelligence (AGI): Ability to accomplish any cognitive task at least as well as humans
Human-level AI: AGI
Strong AI: AGI
Superintelligence: General intelligence far beyond human level
Civilization: Interacting group of intelligent life forms
Consciousness: Subjective experience
Qualia: Individual instances of subjective experience
Ethics: Principles that govern how we should behave
Teleology: Explanation of things in terms of their goals or purposes rather than their causes
Goal-oriented behavior: Behavior more easily explained via its effect than via its cause
Having a goal: Exhibiting goal-oriented behavior
Having purpose: Serving goals of one's own or of another entity
Friendly AI: Superintelligence whose goals are aligned with ours
Cyborg: Human-machine hybrid
Intelligence explosion: Recursive self-improvement rapidly leading to superintelligence
Singularity: Intelligence explosion
Universe: The region of space from which light has had time to reach us during the 13.8 billion years since our Big Bang

Table 1.1: Many misunderstandings about AI are caused by people using the words above to mean different things. Here's what I take them to mean in this book. (Some of these definitions will only be properly introduced and explained in later chapters.)

In addition to confusion over terminology, I've also seen many AI conversations get derailed by simple misconceptions. Let's clear up the most common ones.

Timeline Myths

The first one regards the timeline from figure 1.2: how long will it take until machines greatly supersede human-level AGI? Here, a common misconception is that we know the answer with great certainty. One popular myth is that we know we'll get superhuman AGI this century. In fact, history is full of technological over-hyping. Where are those fusion power plants and flying cars we were promised we'd have by now? AI too has been repeatedly over-hyped in the past, even by some of the founders of the field: for example, John McCarthy (who coined the term "artificial intelligence"), Marvin Minsky, Nathaniel Rochester and Claude Shannon wrote this overly optimistic forecast about what could be accomplished during two months with stone-age computers: "We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College… An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer." On the other hand, a popular counter-myth is that we know we won't get superhuman AGI this century.

Chapter 2: Matter Turns Intelligent

"Hydrogen…, given enough time, turns into people."
Edward Robert Harrison, 1995 One of the most spectacular developments during the 13.8 billion years since our Big Bang is that dumb and lifeless matter has turned intelligent. How could this happen and how much smarter can things get in the future? What does science have to say about the history and fate of intelligence in our cosmos? To help us tackle these questions, let’s devote this chapter to exploring the foundations and fundamental building blocks of intelligence. What does it mean to say that a blob of matter is intelligent? What does it mean to say that an object can remember, compute and learn? What Is Intelligence? My wife and I recently had the good fortune to attend a symposium on artificial intelligence organized by the Swedish Nobel Foundation, and when a panel of leading AI researchers were asked to define intelligence, they argued at length without reaching consensus. We found this quite funny: there’s no agreement on what intelligence is even among intelligent intelligence researchers! So there’s clearly no undisputed “correct” definition of intelligence. Instead, there are many competing ones, including capacity for logic, understanding, planning, emotional knowledge, self-awareness, creativity, problem solving and learning. In our exploration of the future of intelligence, we want to take a maximally broad and inclusive view, not limited to the sorts of intelligence that exist so far. That’s why the definition I gave in the last chapter, and the way I’m going to use the word throughout this book, is very broad: intelligence = ability to accomplish complex goals This is broad enough to include all above-mentioned definitions, since understanding, self-awareness, problem solving, learning, etc. are all examples of complex goals that one might have. It’s also broad enough to subsume the Oxford Dictionary definition—“the ability to acquire and apply knowledge and skills”—since one can have as a goal to apply knowledge and skills. Because there are many possible goals, there are many possible types of intelligence. By our definition, it therefore makes no sense to quantify intelligence of humans, non-human animals or machines by a single number such as an IQ.*1 What’s more intelligent: a computer program that can only play chess or one that can only play Go? There’s no sensible answer to this, since they’re good at different things that can’t be directly compared. We can, however, say that a third program is more intelligent than both of the others if it’s at least as good as them at accomplishing all goals, and strictly better at at least one (winning at chess, say). It also makes little sense to quibble about whether something is or isn’t intelligent in borderline cases, since ability comes on a spectrum and isn’t necessarily an all-or-nothing trait. What people have the ability to accomplish the goal of speaking? Newborns? No. Radio hosts? Yes. But what about toddlers who can speak ten words? Or five hundred words? Where would you draw the line? I’ve used the deliberately vague word “complex” in the definition above, because it’s not very interesting to try to draw an artificial line between intelligence and non-intelligence, and it’s more useful to simply quantify the degree of ability for accomplishing different goals. Figure 2.1: Intelligence, defined as ability to accomplish complex goals, can’t be measured by a single IQ, only by an ability spectrum across all goals. 
Each arrow indicates how skilled today’s best AI systems are at accomplishing various goals, illustrating that today’s artificial intelligence tends to be narrow, with each system able to accomplish only very specific goals. In contrast, human intelligence is remarkably broad: a healthy child can learn to get better at almost anything. To classify different intelligences into a taxonomy, another crucial distinction is that between narrow and broad intelligence. IBM’s Deep Blue chess computer, which dethroned chess champion Garry Kasparov in 1997, was only able to accomplish the very narrow task of playing chess—despite its impressive hardware and software, it couldn’t even beat a four-year-old at tic-tac-toe. The DQN AI system of Google DeepMind can accomplish a slightly broader range of goals: it can play dozens of different vintage Atari computer games at human level or better. In contrast, human intelligence is thus far uniquely broad, able to master a dazzling panoply of skills. A healthy child given enough training time can get fairly good not only at any game, but also at any language, sport or vocation. Comparing the intelligence of humans and machines today, we humans win hands-down on breadth, while machines outperform us in a small but growing number of narrow domains, as illustrated in figure 2.1. The holy grail of AI research is to build “general AI” (better known as artificial general intelligence, AGI) that is maximally broad: able to accomplish virtually any goal, including learning. We’ll explore this in detail in chapter 4. The term “AGI” was popularized by the AI researchers Shane Legg, Mark Gubrud and Ben Goertzel to more specifically mean human-level artificial general intelligence: the ability to accomplish any goal at least as well as humans.1 I’ll stick with their definition, so unless I explicitly qualify the acronym (by writing “superhuman AGI,” for example), I’ll use “AGI” as shorthand for “human-level AGI.”*2 Although the word “intelligence” tends to have positive connotations, it’s important to note that we’re using it in a completely value-neutral way: as ability to accomplish complex goals regardless of whether these goals are considered good or bad. Thus an intelligent person may be very good at helping people or very good at hurting people. We’ll explore the issue of goals in chapter 7. Regarding goals, we also need to clear up the subtlety of whose goals we’re referring to. Suppose your future brand-new robotic personal assistant has no goals whatsoever of its own, but will do whatever you ask it to do, and you ask it to cook the perfect Italian dinner. If it goes online and researches Italian dinner recipes, how to get to the closest supermarket, how to strain pasta and so on, and then successfully buys the ingredients and prepares a succulent meal, you’ll presumably consider it intelligent even though the original goal was yours. In fact, it adopted your goal once you’d made your request, and then broke it into a hierarchy of subgoals of its own, from paying the cashier to grating the Parmesan. In this sense, intelligent behavior is inexorably linked to goal attainment. Figure 2.2: Illustration of Hans Moravec’s “landscape of human competence,” where elevation represents difficulty for computers, and the rising sea level represents what computers are able to do. It’s natural for us to rate the difficulty of tasks relative to how hard it is for us humans to perform them, as in figure 2.1. 
But this can give a misleading picture of how hard they are for computers. It feels much harder to multiply 314,159 by 271,828 than to recognize a friend in a photo, yet computers creamed us at arithmetic long before I was born, while human-level image recognition has only recently become possible. This fact that low-level sensorimotor tasks seem easy despite requiring enormous computational resources is known as Moravec’s paradox, and is explained by the fact that our brain makes such tasks feel easy by dedicating massive amounts of customized hardware to them—more than a quarter of our brains, in fact. I love this metaphor from Hans Moravec, and have taken the liberty to illustrate it in figure 2.2: Computers are universal machines, their potential extends uniformly over a boundless expanse of tasks. Human potentials, on the other hand, are strong in areas long important for survival, but weak in things far removed. Imagine a “landscape of human competence,” having lowlands with labels like “arithmetic” and “rote memorization,” foothills like “theorem proving” and “chess playing,” and high mountain peaks labeled “locomotion,” “handeye coordination” and “social interaction.” Advancing computer performance is like water slowly flooding the landscape. A half century ago it began to drown the lowlands, driving out human calculators and record clerks, but leaving most of us dry. Now the flood has reached the foothills, and our outposts there are contemplating retreat. We feel safe on our peaks, but, at the present rate, those too will be submerged within another half century. I propose that we build Arks as that day nears, and adopt a seafaring life!2 During the decades since he wrote those passages, the sea level has kept rising relentlessly, as he predicted, like global warming on steroids, and some of his foothills (including chess) have long since been submerged. What comes next and what we should do about it is the topic of the rest of this book. As the sea level keeps rising, it may one day reach a tipping point, triggering dramatic change. This critical sea level is the one corresponding to machines becoming able to perform AI design. Before this tipping point is reached, the sea-level rise is caused by humans improving machines; afterward, the rise can be driven by machines improving machines, potentially much faster than humans could have done, rapidly submerging all land. This is the fascinating and controversial idea of the singularity, which we’ll have fun exploring in chapter 4. Computer pioneer Alan Turing famously proved that if a computer can perform a certain bare minimum set of operations, then, given enough time and memory, it can be programmed to do anything that any other computer can do. Machines exceeding this critical threshold are called universal computers (aka Turing-universal computers); all of today’s smartphones and laptops are universal in this sense. Analogously, I like to think of the critical intelligence threshold required for AI design as the threshold for universal intelligence: given enough time and resources, it can make itself able to accomplish any goal as well as any other intelligent entity. For example, if it decides that it wants better social skills, forecasting skills or AI-design skills, it can acquire them. If it decides to figure out how to build a robot factory, then it can do so. In other words, universal intelligence has the potential to develop into Life 3.0. 
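For readers who want to see Turing's "bare minimum set of operations" made concrete, here is a minimal sketch in Python. The simulator and the little unary-addition machine below are invented for illustration; they are not from the book, only an example of the kind of device Turing proved to be universal.

```python
# A minimal Turing-machine simulator: a head that can only read/write one symbol at a
# time and move left or right, driven by a rule table. With a suitable table, such a
# machine can carry out any computation. The example machine (a unary adder) is illustrative.

def run_turing_machine(rules, tape, state="start", blank=" ", max_steps=10_000):
    """rules: {(state, symbol): (new_symbol, move, new_state)}; move is -1, 0 or +1."""
    tape = dict(enumerate(tape))          # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, blank)
        new_symbol, move, state = rules[(state, symbol)]
        tape[head] = new_symbol
        head += move
    return "".join(tape[i] for i in sorted(tape)).strip()

# Unary addition: "111+11" -> "11111" (turn '+' into '1', then erase the final '1').
adder = {
    ("start", "1"): ("1", +1, "start"),
    ("start", "+"): ("1", +1, "seek_end"),
    ("seek_end", "1"): ("1", +1, "seek_end"),
    ("seek_end", " "): (" ", -1, "erase"),
    ("erase", "1"): (" ", 0, "halt"),
}

print(run_turing_machine(adder, "111+11"))   # prints 11111, i.e., 3 + 2 = 5
```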
The conventional wisdom among artificial intelligence researchers is that intelligence is ultimately all about information and computation, not about flesh, blood or carbon atoms. This means that there’s no fundamental reason why machines can’t one day be at least as intelligent as us. But what are information and computation really, given that physics has taught us that, at a fundamental level, everything is simply matter and energy moving around? How can something as abstract, intangible and ethereal as information and computation be embodied by tangible physical stuff? In particular, how can a bunch of dumb particles moving around according to the laws of physics exhibit behavior that we’d call intelligent? If you feel that the answer to this question is obvious and consider it plausible that machines might get as intelligent as humans this century—for example because you’re an AI researcher—please skip the rest of this chapter and jump straight to chapter 3. Otherwise, you’ll be pleased to know that I’ve written the next three sections specially for you. What Is Memory? If we say that an atlas contains information about the world, we mean that there’s a relation between the state of the book (in particular, the positions of certain molecules that give the letters and images their colors) and the state of the world (for example, the locations of continents). If the continents were in different places, then those molecules would be in different places as well. We humans use a panoply of different devices for storing information, from books and brains to hard drives, and they all share this property: that their state can be related to (and therefore inform us about) the state of other things that we care about. What fundamental physical property do they all have in common that makes them useful as memory devices, i.e., devices for storing information? The answer is that they all can be in many different long-lived states—long-lived enough to encode the information until it’s needed. As a simple example, suppose you place a ball on a hilly surface that has sixteen different valleys, as in figure 2.3. Once the ball has rolled down and come to rest, it will be in one of sixteen places, so you can use its position as a way of remembering any number between 1 and 16. This memory device is rather robust, because even if it gets a bit jiggled and disturbed by outside forces, the ball is likely to stay in the same valley that you put it in, so you can still tell which number is being stored. The reason that this memory is so stable is that lifting the ball out of its valley requires more energy than random disturbances are likely to provide. This same idea can provide stable memories much more generally than for a movable ball: the energy of a complicated physical system can depend on all sorts of mechanical, chemical, electrical and magnetic properties, and as long as it takes energy to change the system away from the state you want it to remember, this state will be stable. This is why solids have many long-lived states, whereas liquids and gases don’t: if you engrave someone’s name on a gold ring, the information will still be there years later because reshaping the gold requires significant energy, but if you engrave it in the surface of a pond, it will be lost within a second as the water surface effortlessly changes its shape. The simplest possible memory device has only two stable states (figure 2.3). 
We can therefore think of it as encoding a binary digit (abbreviated “bit”), i.e., a zero or a one. The information stored by any more complicated memory device can equivalently be stored in multiple bits: for example, taken together, the four bits shown in figure 2.3 can be in 2 × 2 × 2 × 2 = 16 different states 0000, 0001, 0010, 0011,…, 1111, so they collectively have exactly the same memory capacity as the more complicated 16-state system. We can therefore think of bits as atoms of information—the smallest indivisible chunk of information that can’t be further subdivided, which can combine to make up any information. For example, I just typed the word “word,” and my laptop represented it in its memory as the 4-number sequence 119 111 114 100, storing each of those numbers as 8 bits (it represents each lowercase letter by a number that’s 96 plus its order in the alphabet). As soon as I hit the w key on my keyboard, my laptop displayed a visual image of a w on my screen, and this image is also represented by bits: 32 bits specify the color of each of the screen’s millions of pixels. Figure 2.3: A physical object is a useful memory device if it can be in many different stable states. The ball on the left can encode four bits of information labeling which one of 24 = 16 valleys it’s in. Together, the four balls on the right also encode four bits of information—one bit each. Since two-state systems are easy to manufacture and work with, most modern computers store their information as bits, but these bits are embodied in a wide variety of ways. On a DVD, each bit corresponds to whether there is or isn’t a microscopic pit at a given point on the plastic surface. On a hard drive, each bit corresponds to a point on the surface being magnetized in one of two ways. In my laptop’s working memory, each bit corresponds to the positions of certain electrons, determining whether a device called a micro-capacitor is charged. Some kinds of bits are convenient to transport as well, even at the speed of light: for example, in an optical fiber transmitting your email, each bit corresponds to a laser beam being strong or weak at a given time. Engineers prefer to encode bits into systems that aren’t only stable and easy to read from (as a gold ring), but also easy to write to: altering the state of your hard drive requires much less energy than engraving gold. They also prefer systems that are convenient to work with and cheap to mass-produce. But other than that, they simply don’t care about how the bits are represented as physical objects—and nor do you most of the time, because it simply doesn’t matter! If you email your friend a document to print, the information may get copied in rapid succession from magnetizations on your hard drive to electric charges in your computer’s working memory, radio waves in your wireless network, voltages in your router, laser pulses in an optical fiber and, finally, molecules on a piece of paper. In other words, information can take on a life of its own, independent of its physical substrate! Indeed, it’s usually only this substrateindependent aspect of information that we’re interested in: if your friend calls you up to discuss that document you sent, she’s probably not calling to talk about voltages or molecules. This is our first hint of how something as intangible as intelligence can be embodied in tangible physical stuff, and we’ll soon see how this idea of substrate independence is much deeper, including not only information but also computation and learning. 
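The byte-level encoding described above (the word "word" stored as 119 111 114 100, each number being 96 plus the letter's place in the alphabet) is easy to check for yourself. This is just standard ASCII, and the snippet below is a quick illustration, not code from the book:

```python
# Check the encoding: each lowercase letter is stored as 96 plus its position in the
# alphabet, and each such number fits in 8 bits.
for ch in "word":
    n = ord(ch)                                  # the number the laptop stores
    print(ch, n, format(n, "08b"), 96 + "abcdefghijklmnopqrstuvwxyz".index(ch) + 1)
# w 119 01110111 119
# o 111 01101111 111
# r 114 01110010 114
# d 100 01100100 100

# Four two-state devices (bits) can hold 2*2*2*2 = 16 distinct states, exactly
# matching the 16-valley surface of figure 2.3:
print(2 ** 4)   # 16
```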
Because of this substrate independence, clever engineers have been able to repeatedly replace the memory devices inside our computers with dramatically better ones, based on new technologies, without requiring any changes whatsoever to our software. The result has been spectacular, as illustrated in figure 2.4: over the past six decades, computer memory has gotten half as expensive roughly every couple of years. Hard drives have gotten over 100 million times cheaper, and the faster memories useful for computation rather than mere storage have become a whopping 10 trillion times cheaper. If you could get such a “99.99999​99999​9% off” discount on all your shopping, you could buy all real estate in New York City for about 10 cents and all the gold that’s ever been mined for around a dollar. For many of us, the spectacular improvements in memory technology come with personal stories. I fondly remember working in a candy store back in high school to pay for a computer sporting 16 kilobytes of memory, and when I made and sold a word processor for it with my high school classmate Magnus Bodin, we were forced to write it all in ultra-compact machine code to leave enough memory for the words that it was supposed to process. After getting used to floppy drives storing 70kB, I became awestruck by the smaller 3.5-inch floppies that could store a whopping 1.44MB and hold a whole book, and then my firstever hard drive storing 10MB—which might just barely fit a single one of today’s song downloads. These memories from my adolescence felt almost unreal the other day, when I spent about $100 on a hard drive with 300,000 times more capacity. Figure 2.4: Over the past six decades, computer memory has gotten twice as cheap roughly every couple of years, corresponding to a thousand times cheaper roughly every twenty years. A byte equals eight bits. Data courtesy of John McCallum, from http://www.jcmit.net/​memoryprice.htm. What about memory devices that evolved rather than being designed by humans? Biologists don’t yet know what the first-ever life form was that copied its blueprints between generations, but it may have been quite small. A team led by Philipp Holliger at Cambridge University made an RNA molecule in 2016 that encoded 412 bits of genetic information and was able to copy RNA strands longer than itself, bolstering the “RNA world” hypothesis that early Earth life involved short self-replicating RNA snippets. So far, the smallest memory device known to be evolved and used in the wild is the genome of the bacterium Candidatus Carsonella ruddii, storing about 40 kilobytes, whereas our human DNA stores about 1.6 gigabytes, comparable to a downloaded movie. As mentioned in the last chapter, our brains store much more information than our genes: in the ballpark of 10 gigabytes electrically (specifying which of your 100 billion neurons are firing at any one time) and 100 terabytes chemically/biologically (specifying how strongly different neurons are linked by synapses). Comparing these numbers with the machine memories shows that the world’s best computers can now out-remember any biological system—at a cost that’s rapidly dropping and was a few thousand dollars in 2016. The memory in your brain works very differently from computer memory, not only in terms of how it’s built, but also in terms of how it’s used. Whereas you retrieve memories from a computer or hard drive by specifying where it’s stored, you retrieve memories from your brain by specifying something about what is stored. 
Each group of bits in your computer’s memory has a numerical address, and to retrieve a piece of information, the computer specifies at what address to look, just as if I tell you “Go to my bookshelf, take the fifth book from the right on the top shelf, and tell me what it says on page 314.” In contrast, you retrieve information from your brain similarly to how you retrieve it from a search engine: you specify a piece of the information or something related to it, and it pops up. If I tell you “to be or not,” or if I google it, chances are that it will trigger “To be, or not to be, that is the question.” Indeed, it will probably work even if I use another part of the quote or mess things up somewhat. Such memory systems are called auto-associative, since they recall by association rather than by address. In a famous 1982 paper, the physicist John Hopfield showed how a network of interconnected neurons could function as an auto-associative memory. I find the basic idea very beautiful, and it works for any physical system with multiple stable states. For example, consider a ball on a surface with two valleys, like the one-bit system in figure 2.3, and let’s shape the surface so that the x-coordinates of the two minima where the ball can come to rest are x = √2 ≈ 1.41421 and x = π ≈ 3.14159, respectively. If you remember only that π is close to 3, you simply put the ball at x = 3 and watch it reveal a more exact π-value as it rolls down to the nearest minimum. Hopfield realized that a complex network of neurons provides an analogous landscape with very many energy-minima that the system can settle into, and it was later proved that you can squeeze in as many as 138 different memories for every thousand neurons without causing major confusion. What Is Computation? We’ve now seen how a physical object can remember information. But how can it compute? A computation is a transformation of one memory state into another. In other words, a computation takes information and transforms it, implementing what mathematicians call a function. I think of a function as a meat grinder for information, as illustrated in figure 2.5: you put information in at the top, turn the crank and get processed information out at the bottom—and you can repeat this as many times as you want with different inputs. This information processing is deterministic in the sense that if you repeat it with the same input, you get the same output every time. Figure 2.5: A computation takes information and transforms it, implementing what mathematicians call a function. The function f (left) takes bits representing a number and computes its square. The function g (middle) takes bits representing a chess position and computes the best move for White. The function h (right) takes bits representing an image and computes a text label describing it. Although it sounds deceptively simple, this idea of a function is incredibly general. Some functions are rather trivial, such as the one called NOT that inputs a single bit and outputs the reverse, thus turning zero into one and vice versa. The functions we learn about in school typically correspond to buttons on a pocket calculator, inputting one or more numbers and outputting a single number —for example, the function x2 simply inputs a number and outputs it multiplied by itself. Other functions can be extremely complicated. 
For instance, if you’re in possession of a function that would input bits representing an arbitrary chess position and output bits representing the best possible next move, you can use it to win the World Computer Chess Championship. If you’re in possession of a function that inputs all the world’s financial data and outputs the best stocks to buy, you’ll soon be extremely rich. Many AI researchers dedicate their careers to figuring out how to implement certain functions. For example, the goal of machine-translation research is to implement a function inputting bits representing text in one language and outputting bits representing that same text in another language, and the goal of automatic-captioning research is inputting bits representing an image and outputting bits representing text describing it (figure 2.5). Figure 2.6: A so-called NAND gate takes two bits, A and B, as inputs and computes one bit C as output, according to the rule that C = 0 if A = B = 1 and C = 1 otherwise. Many physical systems can be used as NAND gates. In the middle example, switches are interpreted as bits where 0 = open, 1= closed, and when switches A and B are both closed, an electromagnet opens the switch C. In the rightmost example, voltages (electrical potentials) are interpreted as bits where 1 = five volts, 0 = zero volts, and when wires A and B are both at five volts, the two transistors conduct electricity and the wire C drops to approximately zero volts. In other words, if you can implement highly complex functions, then you can build an intelligent machine that’s able to accomplish highly complex goals. This brings our question of how matter can be intelligent into sharper focus: in particular, how can a clump of seemingly dumb matter compute a complicated function? Rather than just remain immobile as a gold ring or other static memory device, it must exhibit complex dynamics so that its future state depends in some complicated (and hopefully controllable/programmable) way on the present state. Its atom arrangement must be less ordered than a rigid solid where nothing interesting changes, but more ordered than a liquid or gas. Specifically, we want the system to have the property that if we put it in a state that encodes the input information, let it evolve according to the laws of physics for some amount of time, and then interpret the resulting final state as the output information, then the output is the desired function of the input. If this is the case, then we can say that our system computes our function. As a first example of this idea, let’s explore how we can build a very simple (but also very important) function called a NAND gate*3 out of plain old dumb matter. This function inputs two bits and outputs one bit: it outputs 0 if both inputs are 1; in all other cases, it outputs 1. If we connect two switches in series with a battery and an electromagnet, then the electromagnet will only be on if the first switch and the second switch are closed (“on”). Let’s place a third switch under the electromagnet, as illustrated in figure 2.6, such that the magnet will pull it open whenever it’s powered on. If we interpret the first two switches as the input bits and the third one as the output bit (with 0 = switch open, and 1 = switch closed), then we have ourselves a NAND gate: the third switch is open only if the first two are closed. There are many other ways of building NAND gates that are more practical—for example, using transistors as illustrated in figure 2.6. 
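In software, the same input-output behavior as the switch-and-electromagnet construction takes one line. A small sketch for illustration:

```python
# A NAND gate and its truth table, mirroring the circuit described above.
def NAND(a: int, b: int) -> int:
    """Output 0 only when both inputs are 1; otherwise output 1."""
    return 0 if (a == 1 and b == 1) else 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", NAND(a, b))
# 0 0 -> 1
# 0 1 -> 1
# 1 0 -> 1
# 1 1 -> 0
```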
In today’s computers, NAND gates are typically built from microscopic transistors and other components that can be automatically etched onto silicon wafers. There’s a remarkable theorem in computer science that says that NAND gates are universal, meaning that you can implement any well-defined function simply by connecting together NAND gates.*4 So if you can build enough NAND gates, you can build a device computing anything! In case you’d like a taste of how this works, I’ve illustrated in figure 2.7 how to multiply numbers using nothing but NAND gates. MIT researchers Norman Margolus and Tommaso Toffoli coined the name computronium for any substance that can perform arbitrary computations. We’ve just seen that making computronium doesn’t have to be particularly hard: the substance just needs to be able to implement NAND gates connected together in any desired way. Indeed, there are myriad other kinds of computronium as well. A simple variant that also works involves replacing the NAND gates by NOR gates that output 1 only when both inputs are 0. In the next section, we’ll explore neural networks, which can also implement arbitrary computations, i.e., act as computronium. Scientist and entrepreneur Stephen Wolfram has shown that the same goes for simple devices called cellular automata, which repeatedly update bits based on what neighboring bits are doing. Already back in 1936, computer pioneer Alan Turing proved in a landmark paper that a simple machine (now known as a “universal Turing machine”) that could manipulate symbols on a strip of tape could also implement arbitrary computations. In summary, not only is it possible for matter to implement any well-defined computation, but it’s possible in a plethora of different ways. As mentioned earlier, Turing also proved something even more profound in that 1936 paper of his: that if a type of computer can perform a certain bare minimum set of operations, then it’s universal in the sense that given enough resources, it can do anything that any other computer can do. He showed that his Turing machine was universal, and connecting back more closely to physics, we’ve just seen that this family of universal computers also includes objects as diverse as a network of NAND gates and a network of interconnected neurons. Indeed, Stephen Wolfram has argued that most non-trivial physical systems, from weather systems to brains, would be universal computers if they could be made arbitrarily large and long-lasting. Figure 2.7: Any well-defined computation can be performed by cleverly combining nothing but NAND gates. For example, the addition and multiplication modules above both input two binary numbers represented by 4 bits, and output a binary number represented by 5 bits and 8 bits, respectively. The smaller modules NOT, AND, XOR and + (which sums three separate bits into a 2-bit binary number) are in turn built out of NAND gates. Fully understanding this figure is extremely challenging and totally unnecessary for following the rest of this book; I’m including it here just to illustrate the idea of universality—and to satisfy my inner geek. This fact that exactly the same computation can be performed on any universal computer means that computation is substrate-independent in the same way that information is: it can take on a life of its own, independent of its physical substrate! 
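To see the universality claim in miniature, here is a sketch that builds the familiar logic gates, and a one-bit adder, out of nothing but NAND. The half-adder is a standard textbook construction chosen for illustration, not the specific wiring of figure 2.7:

```python
# Everything below is composed from a single primitive: NAND.
def NAND(a, b):
    return 0 if (a and b) else 1

def NOT(a):     return NAND(a, a)
def AND(a, b):  return NOT(NAND(a, b))
def OR(a, b):   return NAND(NOT(a), NOT(b))
def XOR(a, b):  return AND(OR(a, b), NAND(a, b))

def half_adder(a, b):
    """Add two bits: returns (sum bit, carry bit), built from NAND gates only."""
    return XOR(a, b), AND(a, b)

print(half_adder(1, 1))   # (0, 1): 1 + 1 = binary 10
```

Chaining adders like this one is how the addition and multiplication modules of figure 2.7 arise; the point is only that a single gate type suffices.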
So if you’re a conscious superintelligent character in a future computer game, you’d have no way of knowing whether you ran on a Windows desktop, a Mac OS laptop or an Android phone, because you would be substrateindependent. You’d also have no way of knowing what type of transistors the microprocessor was using. I first came to appreciate this crucial idea of substrate independence because there are many beautiful examples of it in physics. Waves, for instance: they have properties such as speed, wavelength and frequency, and we physicists can study the equations they obey without even needing to know what particular substance they’re waves in. When you hear something, you’re detecting sound waves caused by molecules bouncing around in the mixture of gases that we call air, and we can calculate all sorts of interesting things about these waves—how their intensity fades as the square of the distance, such as how they bend when they pass through open doors and how they bounce off of walls and cause echoes —without knowing what air is made of. In fact, we don’t even need to know that it’s made of molecules: we can ignore all details about oxygen, nitrogen, carbon dioxide, etc., because the only property of the wave’s substrate that matters and enters into the famous wave equation is a single number that we can measure: the wave speed, which in this case is about 300 meters per second. Indeed, this wave equation that I taught my MIT students about in a course last spring was first discovered and put to great use long before physicists had even established that atoms and molecules existed! This wave example illustrates three important points. First, substrate independence doesn’t mean that a substrate is unnecessary, but that most of its details don’t matter. You obviously can’t have sound waves in a gas if there’s no gas, but any gas whatsoever will suffice. Similarly, you obviously can’t have computation without matter, but any matter will do as long as it can be arranged into NAND gates, connected neurons or some other building block enabling universal computation. Second, the substrate-independent phenomenon takes on a life of its own, independent of its substrate. A wave can travel across a lake, even though none of its water molecules do—they mostly bob up and down, like fans doing “the wave” in a sports stadium. Third, it’s often only the substrateindependent aspect that we’re interested in: a surfer usually cares more about the position and height of a wave than about its detailed molecular composition. We saw how this was true for information, and it’s true for computation too: if two programmers are jointly hunting a bug in their code, they’re probably not discussing transistors. We’ve now arrived at an answer to our opening question about how tangible physical stuff can give rise to something that feels as intangible, abstract and ethereal as intelligence: it feels so non-physical because it’s substrateindependent, taking on a life of its own that doesn’t depend on or reflect the physical details. In short, computation is a pattern in the spacetime arrangement of particles, and it’s not the particles but the pattern that really matters! Matter doesn’t matter. In other words, the hardware is the matter and the software is the pattern. This substrate independence of computation implies that AI is possible: intelligence doesn’t require flesh, blood or carbon atoms. 
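For the curious, the "famous wave equation" referred to above is the standard one from physics (the excerpt itself doesn't spell it out); in one space dimension it reads:

```latex
% One-dimensional wave equation: the only property of the substrate that appears
% is the single number v, the wave speed (roughly 300 m/s for sound in air).
\frac{\partial^2 u}{\partial t^2} = v^2 \, \frac{\partial^2 u}{\partial x^2}
```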
Because of this substrate independence, shrewd engineers have been able to repeatedly replace the technologies inside our computers with dramatically better ones, without changing the software. The results have been every bit as spectacular as those for memory devices. As illustrated in figure 2.8, computation keeps getting half as expensive roughly every couple of years, and this trend has now persisted for over a century, cutting the computer cost a whopping million million million (1018) times since my grandmothers were born. If everything got a million million million times cheaper, then a hundredth of a cent would enable you to buy all goods and services produced on Earth this year. This dramatic drop in costs is of course a key reason why computation is everywhere these days, having spread from the building-sized computing facilities of yesteryear into our homes, cars and pockets—and even turning up in unexpected places such as sneakers. Why does our technology keep doubling its power at regular intervals, displaying what mathematicians call exponential growth? Indeed, why is it happening not only in terms of transistor miniaturization (a trend known as Moore’s law), but also more broadly for computation as a whole (figure 2.8), for memory (figure 2.4) and for a plethora of other technologies ranging from genome sequencing to brain imaging? Ray Kurzweil calls this persistent doubling phenomenon “the law of accelerating returns.” Figure 2.8: Since 1900, computation has gotten twice as cheap roughly every couple of years. The plot shows the computing power measured in floating-point operations per second (FLOPS) that can be purchased for $1,000.3 The particular computation that defines a floating point operation corresponds to about 105 elementary logical operations such as bit flips or NAND evaluations. All examples of persistent doubling that I know of in nature have the same fundamental cause, and this technological one is no exception: each step creates the next. For example, you yourself underwent exponential growth right after your conception: each of your cells divided and gave rise to two cells roughly daily, causing your total number of cells to increase day by day as 1, 2, 4, 8, 16 and so on. According to the most popular scientific theory of our cosmic origins, known as inflation, our baby Universe once grew exponentially just like you did, repeatedly doubling its size at regular intervals until a speck much smaller and lighter than an atom had grown more massive than all the galaxies we’ve ever seen with our telescopes. Again, the cause was a process whereby each doubling step caused the next. This is how technology progresses as well: once technology gets twice as powerful, it can often be used to design and build technology that’s twice as powerful in turn, triggering repeated capability doubling in the spirit of Moore’s law. Something that occurs just as regularly as the doubling of our technological power is the appearance of claims that the doubling is ending. Yes, Moore’s law will of course end, meaning that there’s a physical limit to how small transistors can be made. But some people mistakenly assume that Moore’s law is synonymous with the persistent doubling of our technological power. Contrariwise, Ray Kurzweil points out that Moore’s law involves not the first but the fifth technological paradigm to bring exponential growth in computing, as illustrated in figure 2.8: whenever one technology stopped improving, we replaced it with an even better one. 
When we could no longer keep shrinking our vacuum tubes, we replaced them with transistors and then integrated circuits, where electrons move around in two dimensions. When this technology reaches its limits, there are many other alternatives we can try—for example, using three-dimensional circuits and using something other than electrons to do our bidding. Nobody knows for sure what the next blockbuster computational substrate will be, but we do know that we’re nowhere near the limits imposed by the laws of physics. My MIT colleague Seth Lloyd has worked out what this fundamental limit is, and as we’ll explore in greater detail in chapter 6, this limit is a whopping 33 orders of magnitude (1033 times) beyond today’s state of the art for how much computing a clump of matter can do. So even if we keep doubling the power of our computers every couple of years, it will take over two centuries until we reach that final frontier. Although all universal computers are capable of the same computations, some are more efficient than others. For example, a computation requiring millions of multiplications doesn’t require millions of separate multiplication modules built from separate transistors as in figure 2.6: it needs only one such module, since it can use it many times in succession with appropriate inputs. In this spirit of efficiency, most modern computers use a paradigm where computations are split into multiple time steps, during which information is shuffled back and forth between memory modules and computation modules. This computational architecture was developed between 1935 and 1945 by computer pioneers including Alan Turing, Konrad Zuse, Presper Eckert, John Mauchly and John von Neumann. More specifically, the computer memory stores both data and software (a program, i.e., a list of instructions for what to do with the data). At each time step, a central processing unit (CPU) executes the next instruction in the program, which specifies some simple function to apply to some part of the data. The part of the computer that keeps track of what to do next is merely another part of its memory, called the program counter, which stores the current line number in the program. To go to the next instruction, simply add one to the program counter. To jump to another line of the program, simply copy that line number into the program counter—this is how so-called “if” statements and loops are implemented. Today’s computers often gain additional speed by parallel processing, which cleverly undoes some of this reuse of modules: if a computation can be split into parts that can be done in parallel (because the input of one part doesn’t require the output of another), then the parts can be computed simultaneously by different parts of the hardware. The ultimate parallel computer is a quantum computer. Quantum computing pioneer David Deutsch controversially argues that “quantum computers share information with huge numbers of versions of themselves throughout the multiverse,” and can get answers faster here in our Universe by in a sense getting help from these other versions.4 We don’t yet know whether a commercially competitive quantum computer can be built during the coming decades, because it depends both on whether quantum physics works as we think it does and on our ability to overcome daunting technical challenges, but companies and governments around the world are betting tens of millions of dollars annually on the possibility. 
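To make the stored-program idea sketched a few paragraphs above concrete, here is a toy machine in Python: memory holds the data, a list of instructions plays the role of the program, stepping to the next instruction adds one to the program counter, and a jump simply overwrites the counter. The instruction set is invented for illustration and is not any real architecture:

```python
# A toy stored-program machine with an explicit program counter.
def run(program, memory):
    pc = 0                                   # the program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "add":                      # memory[x] += memory[y]
            x, y = args
            memory[x] += memory[y]
        elif op == "dec":                    # memory[x] -= 1
            memory[args[0]] -= 1
        elif op == "jump_if_positive":       # if memory[x] > 0, copy target into pc
            x, target = args
            if memory[x] > 0:
                pc = target
                continue
        pc += 1                              # default: move to the next instruction
    return memory

# Multiply memory[0] by memory[1] via repeated addition (result accumulates in memory[2]).
program = [
    ("add", 2, 0),                 # line 0: result += a
    ("dec", 1),                    # line 1: b -= 1
    ("jump_if_positive", 1, 0),    # line 2: loop back to line 0 while b > 0
]
print(run(program, {0: 6, 1: 7, 2: 0})[2])   # 42
```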
Although quantum computers cannot speed up run-of-the-mill computations, clever algorithms have been developed that may dramatically speed up specific types of calculations, such as cracking cryptosystems and training neural networks. A quantum computer could also efficiently simulate the behavior of quantum-mechanical systems, including atoms, molecules and new materials, replacing measurements in chemistry labs in the same way that simulations on traditional computers have replaced measurements in wind tunnels. What Is Learning? Although a pocket calculator can crush me in an arithmetic contest, it will never improve its speed or accuracy, no matter how much it practices. It doesn’t learn: for example, every time I press its square-root button, it computes exactly the same function in exactly the same way. Similarly, the first computer program that ever beat me at chess never learned from its mistakes, but merely implemented a function that its clever programmer had designed to compute a good next move. In contrast, when Magnus Carlsen lost his first game of chess at age five, he began a learning process that made him the World Chess Champion eighteen years later. The ability to learn is arguably the most fascinating aspect of general intelligence. We’ve already seen how a seemingly dumb clump of matter can remember and compute, but how can it learn? We’ve seen that finding the answer to a difficult question corresponds to computing a function, and that appropriately arranged matter can calculate any computable function. When we humans first created pocket calculators and chess programs, we did the arranging. For matter to learn, it must instead rearrange itself to get better and better at computing the desired function—simply by obeying the laws of physics. To demystify the learning process, let’s first consider how a very simple physical system can learn the digits of π and other numbers. Above we saw how a surface with many valleys (see figure 2.3) can be used as a memory device: for example, if the bottom of one of the valleys is at position x = π ≈ 3.14159 and there are no other valleys nearby, then you can put a ball at x = 3 and watch the system compute the missing decimals by letting the ball roll down to the bottom. Now, suppose that the surface is made of soft clay and starts out completely flat, as a blank slate. If some math enthusiasts repeatedly place the ball at the locations of each of their favorite numbers, then gravity will gradually create valleys at these locations, after which the clay surface can be used to recall these stored memories. In other words, the clay surface has learned to compute digits of numbers such as π. Other physical systems, such as brains, can learn much more efficiently based on the same idea. John Hopfield showed that his above-mentioned network of interconnected neurons can learn in an analogous way: if you repeatedly put it into certain states, it will gradually learn these states and return to them from any nearby state. If you’ve seen each of your family members many times, then memories of what they look like can be triggered by anything related to them. Neural networks have now transformed both biological and artificial intelligence, and have recently started dominating the AI subfield known as machine learning (the study of algorithms that improve through experience). Before delving deeper into how such networks can learn, let’s first understand how they can compute. 
A neural network is simply a group of interconnected neurons that are able to influence each other’s behavior. Your brain contains about as many neurons as there are stars in our Galaxy: in the ballpark of a hundred billion. On average, each of these neurons is connected to about a thousand others via junctions called synapses, and it’s the strengths of these roughly hundred trillion synapse connections that encode most of the information in your brain. We can schematically draw a neural network as a collection of dots representing neurons connected by lines representing synapses (see figure 2.9). Real-world neurons are very complicated electrochemical devices looking nothing like this schematic illustration: they involve different parts with names such as axons and dendrites, there are many different kinds of neurons that operate in a wide variety of ways, and the exact details of how and when electrical activity in one neuron affects other neurons is still the subject of active study. However, AI researchers have shown that neural networks can still attain human-level performance on many remarkably complex tasks even if one ignores all these complexities and replaces real biological neurons with extremely simple simulated ones that are all identical and obey very simple rules. The currently most popular model for such an artificial neural network represents the state of each neuron by a single number and the strength of each synapse by a single number. In this model, each neuron updates its state at regular time steps by simply averaging together the inputs from all connected neurons, weighting them by the synaptic strengths, optionally adding a constant, and then applying what’s called an activation function to the result to compute its next state.*5 The easiest way to use a neural network as a function is to make it feedforward, with information flowing only in one direction, as in figure 2.9, plugging the input to the function into a layer of neurons at the top and extracting the output from a layer of neurons at the bottom. Figure 2.9: A network of neurons can compute functions just as a network of NAND gates can. For example, artificial neural networks have been trained to input numbers representing the brightness of different image pixels and output numbers representing the probability that the image depicts various people. Here each artificial neuron (circle) computes a weighted sum of the numbers sent to it via connections (lines) from above, applies a simple function and passes the result downward, each subsequent layer computing higher-level features. Typical facerecognition networks contain hundreds of thousands of neurons; the figure shows merely a handful for clarity. The success of these simple artificial neural networks is yet another example of substrate independence: neural networks have great computational power seemingly independent of the low-level nitty-gritty details of their construction. Indeed, George Cybenko, Kurt Hornik, Maxwell Stinchcombe and Halbert White proved something remarkable in 1989: such simple neural networks are universal in the sense that they can compute any function arbitrarily accurately, by simply adjusting those synapse strength numbers accordingly. In other words, evolution probably didn’t make our biological neurons so complicated because it was necessary, but because it was more efficient—and because evolution, as opposed to human engineers, doesn’t reward designs that are simple and easy to understand. 
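The neuron model just described (weighted sum of inputs, an optional added constant, then an activation function, applied layer by layer from top to bottom) fits in a few lines of code. The sketch below uses random placeholder weights rather than a trained network, purely to show the forward pass:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feedforward(x, layers):
    """layers: list of (weight_matrix, bias_vector) pairs, applied top to bottom."""
    for W, b in layers:
        x = sigmoid(W @ x + b)    # weighted sum + constant, then activation
    return x

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), rng.normal(size=4)),   # 3 inputs -> 4 hidden neurons
    (rng.normal(size=(1, 4)), rng.normal(size=1)),   # 4 hidden -> 1 output neuron
]
print(feedforward(np.array([0.2, 0.5, 0.9]), layers))   # a single number between 0 and 1
```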
When I first learned about this, I was mystified by how something so simple could compute something arbitrarily complicated. For example, how can you compute even something as simple as multiplication, when all you’re allowed to do is compute weighted sums and apply a single fixed function? In case you’d like a taste of how this works, figure 2.10 shows how a mere five neurons can multiply two arbitrary numbers together, and how a single neuron can multiply three bits together. Although you can prove that you can compute anything in theory with an arbitrarily large neural network, the proof doesn’t say anything about whether you can do so in practice, with a network of reasonable size. In fact, the more I thought about it, the more puzzled I became that neural networks worked so well. For example, suppose that we wish to classify megapixel grayscale images into two categories, say cats or dogs. If each of the million pixels can take one of, say, 256 values, then there are 256^1,000,000 possible images, and for each one, we wish to compute the probability that it depicts a cat. This means that an arbitrary function that inputs a picture and outputs a probability is defined by a list of 256^1,000,000 probabilities, that is, way more numbers than there are atoms in our Universe (about 10^78). Yet neural networks with merely thousands or millions of parameters somehow manage to perform such classification tasks quite well. How can successful neural networks be “cheap,” in the sense of requiring so few parameters? After all, you can prove that a neural network small enough to fit inside our Universe will epically fail to approximate almost all functions, succeeding merely on a ridiculously tiny fraction of all computational tasks that you might assign to it.

Figure 2.10: How matter can multiply, using not NAND gates as in figure 2.7 but neurons. The key point doesn’t require following the details, and is that not only can neurons (artificial or biological) do math, but multiplication requires many fewer neurons than NAND gates. Optional details for hard-core math fans: Circles perform summation, squares apply the function σ, and lines multiply by the constants labeling them. The inputs are real numbers (left) and bits (right). The multiplication becomes arbitrarily accurate as a → 0 (left) and c → ∞ (right). The left network works for any function σ(x) that’s curved at the origin (with second derivative σ″(0) ≠ 0), which can be proven by Taylor expanding σ(x). The right network requires that the function σ(x) approaches 0 and 1 when x gets very small and very large, respectively, which is seen by noting that uvw = 1 only if u + v + w = 3. (These examples are from a paper I wrote with my students Henry Lin and David Rolnick, “Why Does Deep and Cheap Learning Work So Well?,” which can be found at http://arxiv.org/abs/1608.08225.) By combining together lots of multiplications (as above) and additions, you can compute any polynomials, which are well known to be able to approximate any smooth function.

I’ve had lots of fun puzzling over this and related mysteries with my student Henry Lin. One of the things I feel most grateful for in life is the opportunity to collaborate with amazing students, and Henry is one of them.
When he first walked into my office to ask whether I was interested in working with him, I thought to myself that it would be more appropriate for me to ask whether he was interested in working with me: this modest, friendly and bright-eyed kid from Shreveport, Louisiana, had already written eight scientific papers, won a Forbes 30-Under-30 award, and given a TED talk with over a million views—and he was only twenty!

A year later, we wrote a paper together with a surprising conclusion: the question of why neural networks work so well can’t be answered with mathematics alone, because part of the answer lies in physics. We found that the class of functions that the laws of physics throw at us and make us interested in computing is also a remarkably tiny class because, for reasons that we still don’t fully understand, the laws of physics are remarkably simple. Moreover, the tiny fraction of functions that neural networks can compute is very similar to the tiny fraction that physics makes us interested in!

We also extended previous work showing that deep-learning neural networks (they’re called “deep” if they contain many layers) are much more efficient than shallow ones for many of these functions of interest. For example, together with another amazing MIT student, David Rolnick, we showed that the simple task of multiplying n numbers requires a whopping 2^n neurons for a network with only one layer, but takes only about 4n neurons in a deep network. This helps explain not only why neural networks are now all the rage among AI researchers, but also why we evolved neural networks in our brains: if we evolved brains to predict the future, then it makes sense that we’d evolve a computational architecture that’s good at precisely those computational problems that matter in the physical world.

Now that we’ve explored how neural networks work and compute, let’s return to the question of how they can learn. Specifically, how can a neural network get better at computing by updating its synapses?

In his seminal 1949 book, The Organization of Behavior: A Neuropsychological Theory, the Canadian psychologist Donald Hebb argued that if two nearby neurons were frequently active (“firing”) at the same time, their synaptic coupling would strengthen so that they learned to help trigger each other—an idea captured by the popular slogan “Fire together, wire together.” Although the details of how actual brains learn are still far from understood, and research has shown that the answers are in many cases much more complicated, it’s also been shown that even this simple learning rule (known as Hebbian learning) allows neural networks to learn interesting things. John Hopfield showed that Hebbian learning allowed his oversimplified artificial neural network to store lots of complex memories by simply being exposed to them repeatedly. Such exposure to information to learn from is usually called “training” when referring to artificial neural networks (or to animals or people being taught skills), although “studying,” “education” or “experience” might be just as apt. The artificial neural networks powering today’s AI systems tend to replace Hebbian learning with more sophisticated learning rules with nerdy names such as “backpropagation” and “stochastic gradient descent,” but the basic idea is the same: there’s some simple deterministic rule, akin to a law of physics, by which the synapses get updated over time.
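To see the Hebbian idea in action before moving on, here is a minimal Hopfield-style sketch (mine, not the exact model from the book): the coupling between two units is strengthened whenever they are active together in a stored pattern, and the trained network then restores a stored memory from a corrupted cue, using the threshold update mentioned in note *5.

```python
# Minimal Hopfield-style demo of Hebbian learning ("fire together, wire
# together"): couplings between units that are co-active in the stored
# patterns are strengthened, and the network then recalls a stored pattern
# from a corrupted cue. A toy sketch, not the exact model from the book.
import numpy as np

def hebbian_train(patterns):
    """Build a symmetric coupling matrix W from +/-1 patterns via Hebb's rule."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)      # strengthen couplings between co-active units
    np.fill_diagonal(W, 0)       # no self-coupling
    return W / len(patterns)

def recall(W, state, steps=10):
    """Repeatedly set every unit to the sign of its weighted input.

    This is the threshold update from note *5: sigma(x) = -1 if x < 0, else 1.
    """
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1, -1)
    return state

patterns = np.array([
    [1, 1, 1, 1, -1, -1, -1, -1],   # "memory" A
    [1, -1, 1, -1, 1, -1, 1, -1],   # "memory" B
])
W = hebbian_train(patterns)

cue = patterns[0].copy()
cue[:2] *= -1                       # corrupt two units of memory A
print(recall(W, cue))               # recovers [ 1  1  1  1 -1 -1 -1 -1]
```

Nothing here is mysterious: a fixed, deterministic rule for updating couplings and states is enough for the network to store patterns and later retrieve them from partial information.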
As if by magic, this simple rule can make the neural network learn remarkably complex computations if training is performed with large amounts of data. We don’t yet know precisely what learning rules our brains use, but whatever the answer may be, there’s no indication that they violate the laws of physics.

Just as most digital computers gain efficiency by splitting their work into multiple steps and reusing computational modules many times, so do many artificial and biological neural networks. Brains have parts that are what computer scientists call recurrent rather than feedforward neural networks, where information can flow in multiple directions rather than just one way, so that the current output can become input to what happens next. The network of logic gates in the microprocessor of a laptop is also recurrent in this sense: it keeps reusing its past information, and lets new information input from a keyboard, trackpad, camera, etc., affect its ongoing computation, which in turn determines information output to, say, a screen, loudspeaker, printer or wireless network. Analogously, the network of neurons in your brain is recurrent, letting information input from your eyes, ears and other senses affect its ongoing computation, which in turn determines information output to your muscles.

The history of learning is at least as long as the history of life itself, since every self-reproducing organism performs interesting copying and processing of information—behavior that has somehow been learned. During the era of Life 1.0, however, organisms didn’t learn during their lifetime: their rules for processing information and reacting were determined by their inherited DNA, so the only learning occurred slowly at the species level, through Darwinian evolution across generations.

About half a billion years ago, certain gene lines here on Earth discovered a way to make animals containing neural networks, able to learn behaviors from experiences during life. Life 2.0 had arrived, and because of its ability to learn dramatically faster and outsmart the competition, it spread like wildfire across the globe. As we explored in chapter 1, life has gotten progressively better at learning, and at an ever-increasing rate. A particular ape-like species grew a brain so adept at acquiring knowledge that it learned how to use tools, make fire, speak a language and create a complex global society. This society can itself be viewed as a system that remembers, computes and learns, all at an accelerating pace as one invention enables the next: writing, the printing press, modern science, computers, the internet and so on. What will future historians put next on that list of enabling inventions? My guess is artificial intelligence.

As we all know, the explosive improvements in computer memory and computational power (figure 2.4 and figure 2.8) have translated into spectacular progress in artificial intelligence—but it took a long time until machine learning came of age. When IBM’s Deep Blue computer overpowered chess champion Garry Kasparov in 1997, its major advantages lay in memory and computation, not in learning. Its computational intelligence had been created by a team of humans, and the key reason that Deep Blue could outplay its creators was its ability to compute faster and thereby analyze more potential positions. When IBM’s Watson computer dethroned the human world champion in the quiz show Jeopardy!, it too relied less on learning than on custom-programmed skills and superior memory and speed.
The same can be said of most early breakthroughs in robotics, from legged locomotion to self-driving cars and self-landing rockets. In contrast, the driving force behind many of the most recent AI breakthroughs has been machine learning. Consider figure 2.11, for example. It’s easy for you to tell what it’s a photo of, but programming a function that inputs nothing but the colors of all the pixels of an image and outputs an accurate caption such as “A group of young people playing a game of frisbee” had eluded all the world’s AI researchers for decades. Yet a team at Google led by Ilya Sutskever did precisely that in 2014. Input a different set of pixel colors, and it replies “A herd of elephants walking across a dry grass field,” again correctly. How did they do it? Deep Blue–style, by programming handcrafted algorithms for detecting frisbees, faces and the like? No, by creating a relatively simple neural network with no knowledge whatsoever about the physical world or its contents, and then letting it learn by exposing it to massive amounts of data. AI visionary Jeff Hawkins wrote in 2004 that “no computer can…see as well as a mouse,” but those days are now long gone.

Figure 2.11: “A group of young people playing a game of frisbee”—that caption was written by a computer with no understanding of people, games or frisbees.

Just as we don’t fully understand how our children learn, we still don’t fully understand how such neural networks learn, and why they occasionally fail. But what’s clear is that they’re already highly useful and are triggering a surge of investments in deep learning. Deep learning has now transformed many aspects of computer vision, from handwriting transcription to real-time video analysis for self-driving cars. It has similarly revolutionized the ability of computers to transform spoken language into text and translate it into other languages, even in real time—which is why we can now talk to personal digital assistants such as Siri, Google Now and Cortana. Those annoying CAPTCHA puzzles, where we need to convince a website that we’re human, are getting ever more difficult in order to keep ahead of what machine-learning technology can do.

In 2015, Google DeepMind released an AI system using deep learning that was able to master dozens of computer games like a kid would—with no instructions whatsoever—except that it soon learned to play better than any human. In 2016, the same company built AlphaGo, a Go-playing computer system that used deep learning to evaluate the strength of different board positions and defeated the world’s strongest Go champion. This progress is fueling a virtuous circle, bringing ever more funding and talent into AI research, which generates further progress.

We’ve spent this chapter exploring the nature of intelligence and its development up until now. How long will it take until machines can out-compete us at all cognitive tasks? We clearly don’t know, and need to be open to the possibility that the answer may be “never.” However, a basic message of this chapter is that we also need to consider the possibility that it will happen, perhaps even in our lifetime. After all, matter can be arranged so that when it obeys the laws of physics, it remembers, computes and learns—and the matter doesn’t need to be biological. AI researchers have often been accused of overpromising and under-delivering, but in fairness, some of their critics don’t have the best track record either.
Some keep moving the goalposts, effectively defining intelligence as that which computers still can’t do, or as that which impresses us. Machines are now good or excellent at arithmetic, chess, mathematical theorem proving, stock picking, image captioning, driving, arcade game playing, Go, speech synthesis, speech transcription, translation and cancer diagnosis, but some critics will scornfully scoff “Sure—but that’s not real intelligence!” They might go on to argue that real intelligence involves only the mountaintops in Moravec’s landscape (figure 2.2) that haven’t yet been submerged, just as some people in the past used to argue that image captioning and Go should count—while the water kept rising.

Assuming that the water will keep rising for at least a while longer, AI’s impact on society will keep growing. Long before AI reaches human level across all tasks, it will give us fascinating opportunities and challenges involving issues such as bugs, laws, weapons and jobs. What are they and how can we best prepare for them? Let’s explore this in the next chapter.

THE BOTTOM LINE:

• Intelligence, defined as ability to accomplish complex goals, can’t be measured by a single IQ, only by an ability spectrum across all goals.

• Today’s artificial intelligence tends to be narrow, with each system able to accomplish only very specific goals, while human intelligence is remarkably broad.

• Memory, computation, learning and intelligence have an abstract, intangible and ethereal feel to them because they’re substrate-independent: able to take on a life of their own that doesn’t depend on or reflect the details of their underlying material substrate.

• Any chunk of matter can be the substrate for memory as long as it has many different stable states.

• Any matter can be computronium, the substrate for computation, as long as it contains certain universal building blocks that can be combined to implement any function. NAND gates and neurons are two important examples of such universal “computational atoms.”

• A neural network is a powerful substrate for learning because, simply by obeying the laws of physics, it can rearrange itself to get better and better at implementing desired computations.

• Because of the striking simplicity of the laws of physics, we humans only care about a tiny fraction of all imaginable computational problems, and neural networks tend to be remarkably good at solving precisely this tiny fraction.

• Once technology gets twice as powerful, it can often be used to design and build technology that’s twice as powerful in turn, triggering repeated capability doubling in the spirit of Moore’s law. The cost of information technology has now halved roughly every two years for about a century, enabling the information age.

• If AI progress continues, then long before AI reaches human level for all skills, it will give us fascinating opportunities and challenges involving issues such as bugs, laws, weapons and jobs—which we’ll explore in the next chapter.

*1 To see this, imagine how you’d react if someone claimed that the ability to accomplish Olympic-level athletic feats could be quantified by a single number called the “athletic quotient,” or AQ for short, so that the Olympian with the highest AQ would win the gold medals in all the sports.

*2 Some people prefer “human-level AI” or “strong AI” as synonyms for AGI, but both are problematic. Even a pocket calculator is a human-level AI in the narrow sense.
The antonym of “strong AI” is “weak AI,” but it feels odd to call narrow AI systems such as Deep Blue, Watson, and AlphaGo “weak.”

*3 NAND is short for NOT AND: an AND gate outputs 1 only if its first input is 1 and its second input is 1, and NAND outputs the exact opposite.

*4 I’m using “well-defined function” to mean what mathematicians and computer scientists call a “computable function,” i.e., a function that could be computed by some hypothetical computer with unlimited memory and time. Alan Turing and Alonzo Church famously proved that there are also functions that can be described but aren’t computable.

*5 In case you like math, two popular choices of this activation function are the so-called sigmoid function σ(x) ≡ 1/(1 + e^(−x)) and the ramp function σ(x) = max{0, x}, although it’s been proven that almost any function will suffice as long as it’s not linear (a straight line). Hopfield’s famous model uses σ(x) = −1 if x < 0 and σ(x) = 1 if x ≥ 0. If the neuron states are stored in a vector, then the network is updated by simply multiplying that vector by a matrix storing the synaptic couplings and then applying the function σ to all elements.

THE SELF

The idea of the self is a kind of initial stepping-stone idea used by every ideology, religion, and way of understanding the world. When we say “the self,” we are really speaking about what it means to be an individual. That is, what kind of a thing is a human being, and what are the rules, limits, and other similar features of being human.

Why is this an important question? You make assumptions, justifications, explanations, and expectations depending on how you understand what it means to be human. Think of it this way: for the most part, we think of people as responsible for their behavior, and expect them to conform to some basic social norms. But we don’t include young children in that group of “people.” Why not? Because we believe that the kind of individual that a child is, is not the same as the kind of individual that an adult is. And because of that difference in kind, we automatically have different assumptions about children, different kinds of justifications for their behavior, different explanations about what is happening, and different expectations.

Similarly, people who have very different ideas about what the self is will have very different assumptions, justifications, explanations, and expectations about individuals. If you believe that all ideas like loyalty are really a way of getting short-term gains – and will be abandoned as soon as a better opportunity comes along – then your perception of what people do and why will be very different from someone who builds their life on the idea that loyalty is paramount.

But why focus on the individual? Well, any kind of society is ultimately made up of individuals. A society, or social order, is just an idea of how those individuals should be arranged and how they should relate to each other. So, you can’t talk about society (at least not coherently) unless you understand the building blocks of that society – that is, individuals. Any social system that gets its building blocks wrong is going to collapse – sooner or later. This is because the building blocks are the parts out of which we construct the society. If you don’t understand how your building blocks work, you don’t understand what they can and cannot do. To use a construction analogy, bricks are a great building material, but you cannot build a skyscraper out of bricks. Why not?
Because the weight of the entire stack of bricks rests on the bottom brick, and that means that you can only build as high as the bottom brick can bear. If you go any higher, the whole thing will collapse. Now, if you don’t understand that bricks have limits, and keep on building up, sooner or later those limits (whether you know about them or not) will reveal themselves, and the building will crumble.

Communism, for example, had that problem. It assumed that the people would take care of shared possessions in the same way they obviously took care of their personal possessions. The whole idea behind the Communist utopia was based on this idea of the self. It turns out, it was a bad idea – because that’s just not how people work. As a result, we had a number of failed communist experiments in the 20th century – with the death toll above 100 million people killed by the good intentions of their state.

How do we understand this “self”? The first step in understanding your idea of the self is to recognize that there are a number of competing models. How come? Because the idea of the self is a metaphysical idea – not something we can weigh and measure. As a result, there is more than one way of understanding the issue. We’re not concerned with what you believe, but with what it means to believe that. That is, we’re interested in the kinds of big implications that your idea of the self carries along with it.

For our purposes, we will be looking at three big questions about the self. These are the kinds of questions that will have a very deep impact on any other ideas you have. First, we will look at the question of free will. In this section, we will discuss whether or not you have free will – the freedom to make choices – or if you are something closer to a mindless machine who only has the illusion of freedom. All of ethics, right and wrong, etc. rest completely on this issue. If you have no free will, you cannot ever be responsible for anything.

Second, we will look at the question of mind and body. Are mind and body the same? Are they different? If the mind is not part of the body, how does it control the body? If it is part of the body, then it must be mechanical – and that results in the loss of free will. Both the first and the second questions are critical for understanding our own position on consciousness and ethics.

Finally, we will look at AI (artificial intelligence). Can an AI ever become conscious? If it could, would it become a living thing? If it can become conscious, and because computers are getting better than us at many things, should we think of computers in the same way we think of people – with rights, responsibilities, etc.? If a machine can become conscious, can we think of ourselves as anything but machines? If we’re machines, then the mind is the body, and there is no free will.

You can see how the topics are connected, and how critical they are for every other topic we will be talking about. They have major implications for ethics, major implications for society, etc. This is why the topic of the self comes first. In the rest of this reading, we will consider one more reason why the self is a crucial topic for our consideration.
The Oracle at Delphi – a person considered sacred in Greek religious and social life, and one believed to be connected directly to the gods – had a sign above the temple that said, “Know Thyself.” While this is a starting point of philosophy, it should also be a starting point of every person’s understanding of the world. Why? To know yourself means to understand: who you are, what you believe, why you believe it, the extent of your knowledge and your ignorance, the justification for your beliefs, the meaning of your ideas, the implications of your beliefs, how the world is (according to you), why it is the way it is, how you should behave, why you should behave that way, the consequences of behaving correctly and incorrectly, what you should and should not value, and what ideas are so important that you should sacrifice everything else to make them real. In other words, knowing yourself is the foundation on which much of your metaphysical outlook depends.

If you have ever seen or heard of a person that claims to be… something (good, religious, atheist, materialistic, nihilist, hipster, etc.), but whose behavior or other ideas contradict that claim – congratulations! You have found a person who does not know themselves. This kind of hypocrisy (say one thing, do the opposite) is an easy mistake to make, because we don’t really understand what it is to be that something (we mostly just claim belonging to whatever is around us), and we have no idea whether we really are that thing or not. And so, as soon as we are faced with a kind of problem that requires thinking, or taking a stand on principle, we do the wrong/easy thing – according to our own claim of what we are. Unfortunately, while the hypocrisy is easy to understand, its consequences are rather massively problematic.

Why is this self-ignorant hypocrisy a problem? As noted in our earlier readings, knowledge and an orderly world are crucial for our survival and success. The orderliness of the world means that we can rely on it, that we can make predictions about the future, and that we can orient ourselves for success. Of course, nothing is perfect, but a high degree of order ensures that we can navigate life as individuals and as a society in a stable and useful way. Knowledge is the way that we understand that orderliness, the way we learn to navigate from our present to the future. Again, this is done on an individual level, as well as a social one. But when we have individual and social lack of self-understanding, we get individual and social hypocrisy.

On the individual level, you get unreliable people. When they are entrusted with a task, they betray that trust. When you rely on them, they disappoint. When you expect the truth, you get lies and confusion. In other words, they become the source of chaos that spreads like wildfire. The damage they do is, quite often, incalculable. On a social level, organizations and agencies either fail, or worse, do the kind of harm that is worse than if they did nothing. Social, legal, and national efforts become highly counterproductive and harmful. Again, these become the sources of chaos which penetrate the society – making it less orderly, less stable, less predictable.

For example, Conservatives in the US have historically stood on the “family values” platform. This is not a problem by itself; in fact, traditional family values have been a fairly core concept in Western civilization.
However, it’s easy to claim to be pro “family values.” It is a lot more difficult to make your actions match your words. And every time that a conservative gets caught cheating on their spouse, or pushing their mistress to have an abortion, it’s not merely Bob the conservative who is seen as a problem. Instead, the actions of an individual reflect highly negatively on the entire conservative ideology. It does not take long before anyone with a “family values” platform is seen as disingenuous.

The thing is, if you had asked Bob about his position – and he answered truthfully – before he screwed up, he would have likely been entirely honest in his claim to believe in family values. But, failing to actually understand himself, Bob could not understand what the idea of family values should mean to him: what kinds of behaviors he needed to embody, what kinds of sacrifices he needed to make. And so his claim to be pro “family values” was merely a parroting of an idea that sounded good – as long as he did not think about it. In this way, Bob causes an incredible amount of harm by his failure to know himself.

The problems of trust, reliability, truth, and order are also there for the individual. If you are ignorant of your own self, your own actions become unpredictable to you. What will you do if X happens? How will you react? How will your actions and reactions affect your mental state? Physical state? Your job? Your family? Your spouse? Your kids? Will you be a beacon of stability and hope, or will you drag everyone around you into chaos? Plenty of seemingly happy and stable people are only that way because they’ve not experienced anything resembling hardship in their lives. And when hardship comes – and it will – they turn positively genocidal.

On a social level, when a series of people who fail to know themselves are involved in a project, the odds of it going sideways go through the roof. What you have is a bunch of people who don’t understand themselves, acting as the executors of some social policy. For example, most people are unlikely to think of themselves as thieves. But that’s because they’ve never been in a situation where there is so much money flying around that $100,000 is not likely to be missed by anyone.

Think of the 2007–8 housing crisis that collapsed the markets. For years, banks had been handing out money to every schmuck who walked in through the door. They did this because they would then bundle up thousands of those bad loans and sell them as super-secure package deals. They got paid; the bad loans were someone else’s problem. Now, we say “banks,” but that’s not accurate. Banks are really just people with jobs. So, Steve knew that he was doing a bad thing when he signed off on a very bad loan. Joe knew that he was doing a bad thing when he packaged bad loans, pretended they were great, and lied to his customers. Stacy knew that she was doing the wrong thing when she incentivized Steve and Joe to do the wrong thing, to cheat and lie, in order to make more money. All three (tens of thousands of people, really) were doing what they knew for a fact to be wrong, in order to make money. They also knew that doing the right thing would likely cost them their jobs.

And that’s where the failure to know yourself enters the picture. No one thinks they’re a bad person. But they’re unlikely to have ever examined themselves and understood what it means to claim to be good.
They have no idea what they should do in a critical situation; no idea what the sacrifice might be to do the right thing. And so, it is unsurprising when, at a critical moment, they do exactly the wrong thing. Besides the ignorance, it should be noted that people are incredibly good at justifying their actions to themselves, despite knowing full well that they’re doing the wrong thing.

Without understanding the idea of the self, you cannot understand yourself. Without understanding yourself, you cannot meaningfully adhere to anything like an ethical system. Without an ethical system, you are little more than an unpredictable potential for destruction, waiting for an opportunity to collapse yourself, and everything around you, into pure chaos. For these reasons, and others, it is crucial that we understand what we mean by “the self,” for both ourselves and humanity.

Explanation & Answer


Intellectual Thinking Positions

“Intellectual” describes something related to the mind: the extent to which one can think critically and reach sound decisions. It describes intensive reasoning and deep thinking. An intellectual person engages in critical thinking, research, and reflection in order to address specific issues, including academic work. It is difficult to judge how intellectually a person thinks from the outside, since observers can only notice the outward signs of intellectual thinking. However, every person can feel from the inside their position as far as...

