« Making Sense with Sam Harris

#116 — AI: Racing Toward the Brink

2018-02-05 | 🔗

In this episode of the Making Sense podcast, Sam Harris speaks with Eliezer Yudkowsky about the nature of intelligence, different types of AI, the "alignment problem," IS vs OUGHT, the possibility that future AI might deceive us, the AI arms race, conscious AI, coordination problems, and other topics.

SUBSCRIBE to listen to the rest of this episode and gain access to all full-length episodes of the podcast at samharris.org/subscribe.

This is an unofficial transcript meant for reference. Accuracy is not guaranteed.
Today, I'm speaking with Eleazar Yudkowsky decision theorist and computer scientist at the Machine Intelligence Research Institute in Berkeley known for his work on technological forecasting. His publications include a chapter in Cambridge Handbook of Artificial intelligence titled. The F have artificial intelligence, which he co, authored with Nick Bostrom. And eh lasers- writing has been extremely influential. Online specialties had blogs that been read by the smart set in Silicon Valley. For years many articles were pulled together in a book titled rationality from ai to zombies, which I highly recommend. And he has a new book out which is inadequate, equilibria, Where and how civilizations get stuck and as you'll hear, a laser is a very interesting first principles, kind of thinker of those,
smart people who are worried about AI, I probably among the most worried and his concern. Things have been largely responsible for kindling the conversation we've been having in recent years about ai safety and AI ethics, he's been very influential on many of the people. Who have made the same worried noises I have in the last couple of years, so today's episode you're getting it straight from the horse's mouth We cover more or less everything related to the question of why one should be worried about where this is all headed. So without further delay, I bring you Eleazar Yudkowsky. I am here with Eleazar Yudkowsky Eleazar, thanks for coming on the podcast you're. Quite welcome center to be here
You have been a much requested guest over the years. You have quite the cult following for obvious reasons. For those who are not familiar with your work, they will understand the reasons once we get into talking about things, but you have been very present online. As a blogger, I don't know if you're still blogging a lot, but let's just some is your background for a bit and then tell people what you have been doing, intellectually for the last twenty years, or so, I would describe myself as a decision theorist. A lot of other people would say that I'm an artificial intelligence and, in particular, in the theory of how to make sufficiently advanced artificial intelligences that do a particular thing and do destroy the world as a side effect. I would call that a alignment following Stuart Russell other people would call that
hey I control or a safety or a. I risk none of which are terms that I really like I also have a important sideline in the art of rationality, the way of achieving the map that reflects the territory and figuring out how to navigate reality. To where you to to go from a probability, theory decision, three cognitive biases perspective, I wrote two or three years of blog posts, one a day on that and it was collected into a book called rationality from a I to zombies, yeah, which I have read and which is really worth reading. You have a very clear and Africa, stick way of writing. It's really! Quite wonderful. So I highly recommend that book. Thank you. Thank you. Your background is unconventional. So, for instance, you did not go to high school. Correct, let alone college or graduate school summarize that for us, the system didn't
fit me that well and I'm good at self teaching. I guess I sort of when I started out. I thought I was going to go into something evolutionary psychology or possibly neuroscience, and then I discovered probability theory, statistics decision theory and came to specialize in that more bar over the years. How did you not wind up going to high school? What was that decision? Like sort of like mental will crash around the time I hit puberty or like physical crash. Even and I just did not have the stamina- to make it through a whole day of classes at the time. Uh, I'm not sure how well I do trying to go to high school now, honestly, but it, but it was clear that I could self teach. So that's what I do and where did you grow up Chicago Illinois? I okay! Well, let's fast forward to sort of the center of the bull's eye for your intellectual life. Here you have a new
book out which will talk about. Second, your new book is inadequate, equally area where and how civilizations get stuck and Unfortunately, I've only read half of that, which I'm also enjoying certainly read enough to start a conversation on that, but we should start with artificial intelligence, because it's a topic that I've touched a bunch on the podcast which you have wrong opinions about, and it's is really how we came together? You and I first met at that conference in Puerto Rico, which was the first of these ai safety alignment discussions that I was aware of, I'm sure there but others, but that was a pretty interesting gas. The rain. So let's talk about and the possible problem with, where we're headed and the near term problem that many people in the field and at the periphery, the field don't seem to take the problem as we conceive it seriously. Let's just start with the basic
picture and define some terms got suppose. We should define intelligence first and then jump into the differences between strong then we core, general versus narrow ai. Do you want to start us off on that sure preamble disclaimer, though the field in general, like not everyone, you ask, would give you the same definition of intelligence and a lot of times in cases like those it's good to sort of go back to observation, Ull, basics. We know that in a certain way human beings seem a lot more competent than chimpanzees, which seems to a similar dimension to the one where chimpanzees are more competent than mice or that mice are more complicated to competent than spiders, and people have tried various theories about what this dimension is. They've tried various definitions of it, but if you went back a few centuries and ask somebody to define
in a fire, the less wise ones would say, a flyer fires, the release of phlogiston fires. One of the four elements and the truly wise ones would say. Well, fires the sort of orangey bright hot stuff that comes out of wood and like spreads along would, and they would tell you what it looks like and put that prior to their theories of what it was. So what this mysterious thing looks like is that humans can build space shuttles and go to the moon and my scant, and we think it has something to do with our brains, yeah yeah, can make it more abstract than that. Tell me if you think this is not generic enough to be accepted by most people in the field, whatever intelligence, maybe in specific context, are generally speaking. It's the ability to meet goals diverse range of environments, and
we might want to add that it's at least implicit in intelligence that interests us it means and ability to do that flexibly rather than by rote following the same strategy again again blindly. Does that seem like a reasonable starting point? I think that that would get fairly widespread agreement. Then it matches up well with some of the things that are in a I text books. If I'm allowed to sort of take it a bit further and begin, injecting my own viewpoint into it, I I would refine it and say that by achieve goals, we mean something like squeezing, The measure of possible futures higher in your preference ordering, if we took all the possible outcomes away, rank them from the ones you, like least to the ones you like most
as you achieve your goals, you're sort of like squeezing the outcomes, hiring your preference ordering your narrowing down what the outcome would be to be something more like what you bought, even though you might not be able to narrow it down very exactly flexibility and a rally. There's a like humans are much more domain general than mice. Bees build highs, Beaver first build the dams, human will look over both of them and and envision, a honeycomb structured damn at like we are able to operate even on the moon. Which is like very unlikely environment where we evolved, in fact, are only competitor in terms of general optimization, where optimization is that sort of narrowing of the future that I talked about our our in terms of
General optimization is natural selection, like natural selection, built beavers that built bees that sort of implicitly built the spider's web in the course of building spiders and we as humans have like the similar like very broad range handle this like huge variety of problems, and the key to that is our bility, to learn things that natural selection did not pre program us with. So earning, is the key to generality. I expect that not many people, and I would disagree with that part either right. So it seems that goal directed behavior is implicit in this or even explicit, in this definition of intelligence. So, whatever intelligence is, it is inseparable from the kinds of behavior in the world that results in the fulfillment of goals.
Talking about agents that can do things, and once you see that, then it becomes pretty clear that if we build systems that harbor primary goals, their cartoon examples here, like you know, making paper clips, these are not systems that Oh spontaneously decide that they could be doing more enlightened things then say making paper clips. This moves to the question of how deeply unfamiliar artificial intelligence might, because there are, there are no natural goals that will drive in these systems, apart from the ones we put in there and we have common, it's intuitions that make it very difficult for us to think about how strange an artificial intelligence could be even one becomes more and more competent to meet its goals. Let's talk about the frontiers of straw,
dangerous in a I as we move from again. I think we have a couple more definitions. We should probably put in play here. Different shading, strong and weak or general and narrow intelligence well to differentiate um, general and narrow. I would say that. Well I mean This is like, on the one hand, theoretically a spectrum. Now there are other hand, there seems to have been like a very sharp jump in generality between chimpanzees and humans, so breath of domain driven by breath of learning um like deep mind, for example, recently built else to go, and I lost some money. Betting that Elsa go, would not to speak to you and champion which it probably better, and then a successor to that was Alfa, zero and I'll so was specialized on go. It could learn to play, go better than its starting point for playing go,
but clip couldn't learn to do anything else, and then they simplified the architecture for alpha go. They figured out ways to do all the things that was doing and more and more general ways they discarded the opening book, like all the sort of human experience of go that was built into it. They were able to discard all of these sort of like programmatic special features that detected features of the go board. You figured out how to do that that in simpler ways, and because I figured out how to do it in simpler ways, they were able to generalize Tau alpha zero, which learned how to play chess using the same architecture. They took a single a. I got it go and then like re, ran it and made it learn chess. Now, that's not human general, but it's like, but it. But it's like a step forward in generality of the sort that we're talking about right in thinking that that's a pretty enormous breakthrough, Abby that there's two things here, there's the step to that
free of generality. But there's also the fact that they built a go engine. I forget it. A go or chess or both which basically surpassed all of the specialized a eyes on those games over the course of a day right isn't The chess engine of Alfa zero better than any dedicated chess computer ever and didn't achieve that. Just with astonishing speed like some amount of debate afterwards, whether or not the version of the chess engine that it was tested against was truly optimal but like even that, even even extended in that narrow range of the best existing chess engine as MAX Tegmark put it the real.
Story. Wasn't in how Alfa go beat humans suck you in you, men go players, it's and it's how alpha zero beat you men go game. Ghost go system, programmers and human chess system grammer's with people had put years and years of effort into a treating all of the special purpose code that played that. Would that would play chess out well and efficiently and then Alfa zero blew up to and possibly pass that in a day, and if it hasn't already gone past it well, it will be passed. It would be passed it by now if, if it might kept working on as well, although they've they've now basically declared victory and shut down the product, as I understand okay, so talk about the distinction between general and narrow intelligence, a little bit more. So we have this
picture of our minds most conspicuously world war. General problem so, We can learn new things and our learning in one area require a fundamental rewriting of our code. Knowledge in one area isn't so brittle as to be degraded by are acquiring knowledge in some new area, or at least this is not a general problem which roads are understanding. And again- and we don't yet have computers that can do this, but We're seeing the signs of moving in that direction, and so and it's often imagine that there's a kind of near term goal, which is always me as a mirage of so called human level. General AI. I don't see how oh that phrase,
will ever mean much of anything, given that all of the narrow ai we've built thus far is superhuman within the domain of its applications. The calculator in my phone is superhuman for arithmetic any general ai? That also has my phone's ability to calculate will be superhuman for arithmetic, but we might presume, it'll, be superhuman for all of the does 10S or hundreds of specific human talents we've put into it, whether it's facial recognition or just obviously beta memory will be superhuman. Unless we decide to consciously degrade it, access to the World Data will be superhuman unless we isolated from data. Do you see this notion of human level ai as a landmark in the timeline of our development, or is it just never going to be reached? I think that a lot of people in the field would agree that, human level, ai define
literally at the human LE, neither above nor below across a wide range of competencies is a straw target is an impossible meraj right now, it seems like a. I is clearly dumber, and generals in us or weather that like, if are put into a sort of like real the world, lots of things going on context that place the son generality than a they're, not really in the game. Yet humans like clearly way ahead and more controversially, I would say, but we can imagine a state where the ai is clearly way way ahead, where it is across sort of every kind of cognitive competency, barring this,
like very narrow ones, that, like aren't deeply influential of the others, like maybe chimpanzees, are better at using a stick to draw ants from an ant hive and eat them than humans are, though, no humans have really like practice, natural world championship level. Exactly um, but is sort of general factor of how good good at you are you at it when reality throws you a complicated problem at this chimpanzees are clearly not better than humans. Humans are clearly better than ships. Even you can manage to narrow down one thing: the chimp is better at the thing the chip is better at does play a big role in our globe. The economy is not an input that feeds into lots of other things, so we can clearly imagine I would say, like there are some people who say this is not possible. I think they're wrong, but it seems to me that it is perfectly coherent to imagine and a I. It is like better at everything or almost everything that we are and such that, if it was building an economy with lots of inputs like humans,
We have around the same level input into that economy. Is the chimpanzees happened to rs yeah yeah? So what you're? gesturing at here is a continuum of elegance that I think most people never think about, and because they don't think about it, they have a default doubt that it exists. I think when people is a point I know you've made in your writing and I'm sure it's a point that Nick Bostrom made somewhere in his book superintelligence is this idea that there's a huge blank space on the map past the most well advertised exemplars of human brilliance where we don't imagine. You know what it would be like to be five times more. Other than the smartest person. We could name And we don't even know what that would consist in right, because if Jim
could be given to wonder what it would be like to be five times smarter than the smartest chimp they're not going to represent for themselves all of the things that we're doing that they can't even dimly conceive, there's a kind of DIS junk, and that comes with more there's a phrase used in mill, literary context. I don't think they quote is actually it is variously attributed to Stalin and Napoleon, and I think Klaus would say that half a dozen people people have claimed this quote quote is sometimes quantity has a quality all its own as you ramp up in intelligence, However, it is at the level of information processing spaces of query and ideation, and experience begin to open up, and we can't necessarily predict what they would be from where we sit. How do you think about this continuum?
Telogen is beyond what we currently know. In light of what we're talking about well, the unknowable is a is a concept you have to be very careful with, because the thing you can't figure out in the first thirty seconds of thinking about it, sometimes you can figure it out if you think for another five minutes so in particular, I think that there's a certain narrow kind of unpredictability, which does seem to be possibly in some sense essential, which is in fact for alpha, go to play better, go down and the best you can go players. It must be the case that the best human go. Players cannot predict exactly where on the GO board alpha go will play if they could predict exactly where alpha go would play. The alpha go would be no smarter than On the other hand, Alpha GO's programmers,
and the people who knew what alpha GO's programmers were trying to do, or even just the people who watched alpha go play could say. Well, I think the system is going to play such that it will win at the end of the game. Even if they couldn't predict exactly where it would move on the board. So similarly there's a sort of like not Or like not necessarily slam, dunk or or like immediately obvious chain of reasoning which says that it is okay for us to reason about um aligned or even unaligned artificial general intelligence is of sufficient power, as if they're trying to do something. But we don't necessarily know what, but, from our perspective that still has consequences, even though we can't predict in advance exactly how they're going to do. I think we should define this notion of a line.
I meant. What do you mean by alignment as in the alignment problem? Well, it's sort of like a big problem and it does have some moral and ethical aspects which are not as important as the technical aspects of our apartment, they're, not as difficult as detectives last specks. That could couldn't exactly be less important, um, but, broadly speaking, it's an aye aye that, where you can like sort of say, what it's trying to do and they're sort of like narrow conceptions of alignment, which is you are trying to get it to do something like. Cure Alzheimer's disease without destroying the rest of the world and there's sort of much more ambitious. Notions of alignment, which is you are trying to get it to do the right thing and achieve a a happy interest, galactic civilization, but
Thank you for the both of the like sort of narrow alignment and the ambitious line. It happened. Com that you're trying to have the a I do the acting rather than making a lot of paper clips right, for those who have not followed this conversation before we should cash out this reference to paper clips, which I made at the opening. Does this thought experiment origi? with Bostrom or did he take it from somebody else, cz Iris, I know it's me. I was here okay, it could, it could still be bossed room, like I sort of like Ask somebody like dude remember who it was and they like, search through the archives of a mailing list, for this idea Plaza originated and if it originated the earth- and I was the first one to say paper clips or ever by all means. Please summarize this thought experiment for us well The original thing was somebody saying that expressing a sense
meant along the lines of artificial. Who are we to constrain the past of things smarter than us? They will like create something that sure we don't know what it will be, but it will like be very worth while we sit and stand in the way of fast that the sentiments behind this are something that I have a great deal of sympathy for. I think the model of the world is wrong. I think they're, actually wrong about what happens when you sort of take a random may I and make it much bigger and in particular, I said the thing I'm worried about is that it's going to end up with a randomly rolled utility function, whose maximum happens to be a particular kind, tiny molecular shape. That looks like a paper clip and that, like the original paper clip Maximizer scenario, it sort of got a little bit started in in Bing whispered on into the notion of somebody builds a paper clip factory and the ai in charge of the paper. Clip factory takes over the universe and turns it on to pay
clips. There is like a lovely online game about it, even but this still sort of cuts against a couple of of key points. One is the problem: isn't that paperclip factory a I spoke heinously wake up wherever the first artificial general intelligence is from it's going to be in a research lab specifically dedicated to doing it, for the same reason that the first airplane didn't spun spontaneously, assemble in a junk heap um, and that the people who are doing this are not dumb enough to tell their ai to make paper clips or make money or end all war. These are Hollywood movie plots that the scriptwriters do because they need a story. Conflict in the story. Conflict requires that somebody stupid so the people that at Google are not dumb enough to build an ai and tell to make paper clips the prob
I'm I'm I'm worried about. Is that it's technically difficult to get the ai to have a particular goal set and keep, that goal said and implement that ghost in the real world, and so what it does instead is something random, for example, making paper clips where paper clips are meant to stand in for something that is worthless even from a very cosmopolitan perspective, even with we're trying to take a very embracing view of the nice possibilities and accept that there may these things that we wouldn't even understand that, if we understand them. We would comprehend to be a very high value paper. Clips are not one of those things lt's, no matter how long you start a paper clip, it still seems pretty pointless from our perspective. So that is the concern about the future being ruined the future being lost. The future been turned into paper clips. One thing this thought experiment: does it also cuts against the assumption that a sufficient
the intelligent system, a system that is more competent than we are in some general sense, would, by definition, only form goals or only be driven by utility function, that we would recognize, as being ethical or wise would by death mission, be aligned with our better interests that we're not going to build something. That is super huge incompetence that could be moving along some path, it's as income, Hannibal with our well being as turning Adam on earth into a Adam, but you don't get paper common sense US. You program it into the machine and you don't get a guarantee of perfect alignment or perfect corange ability the ability for us to be sailor? That's not what we meant. We, you know come back and less. That is successfully built into the
scenes of this alignment problem is the general concern is that we could build even with the you know, seemingly best goals put in. We could build something that, especially in the case of something that makes changes to itself and we'll talk about this. I mean the idea that these systems could become so improving we can build something whose future behavior in the service of specific goals isn't totally predictable by us. If we gave it the gold too, to cure Alzheimer's there many Things are incompatible with it. Fulfilling that goal. You know. One of those things is our turn it off. We have to have a machine that will let us turn it off, even though its primary goal is to cure Alzheimer's. I know I interrupt you before you wanted to give an example of the alignment problem, but did I just say anything that you don't agree weather. Are we still on the same map, still the
map I agree with most of it. I would, of course, have this giant pack of careful definitions and explanations built on careful definitions and explode donations to like go through everything. You just said uh, possibly not for the best but there it is Stuart Russell, Stuart Russell put it. You can't bring the coffee if you're dead, pointing out that if you I a sufficiently intelligence system whose goal is to bring you coffee. Even that system has an implicit strategy of not letting you switch it off. Assuming that all you told it was, do is bring the coffee right. I do think that a lot of people listening mate may want us to back up and talk about the question of whether you can have something that feels to them like it's so smart and so stupid. At the same time, is that a realizable way and intelligence can be yeah. That is one of the virtues or one of the
confusing elements depending on where you come down on this of this thought, experiment that the paper clip Maximizer right. So I think that there are sort of narratives, there's like multiple narratives about ai, and I think that the technical truth is something that doesn't fit into like any sort of up any of the obvious narratives. For example, I think that there are people who have a lot of respect for intelligence, are happy to envision a ice that is very intelligent. They, it seems intuitively obvious to them that this carries with it a tremendous power and, at the same time, that they're they're, sort of respect for the concept of intelligence leads them to wonder. At the concept of the paper. Click Maximizer. Why is this very smart thing just making paper clips there? Similarly, another narrative which says that a I is sort of life less on reflective, just does what it's told, and so these
peoples, like perfectly obvious that an ai might might just go on making paperclips together and for them. The hard part of the story to swallow is the idea that that machines can get that powerful. Those are two hugely useful categories of disparagement of your thesis here I wouldn't say disparagement. These are initial reactions. These are people who have right yeah talking to you yes, sir, so the member reboot that those are two hugely useful categories of doubt with respect to your thesis hero. The concerns were expressed, and I just want to point out that both have been put forward on this podcast. The first was by David Deutsch, the physicist, who imagines that whatever a I we build and he and he certainly thinks we will build, it will be, by definition, an extension of us. He thinks the best analogy is to think of our future descendants. You know these will be our children. The teen
ages of the future may have different values than we do, but these values and their proliferation will be continuous with our values and our culture and our memes, and there won't be some- radical discontinuity that we need to worry about, and so there's that one basis for lack of concern. This is an extension of ourselves and it inherit our values, improve upon our values and there's really no place. Where things will we reach any kind of cliff? That would we worry about and the other non concern you just raise it was expressed by Neil Degrasse Tyson on this podcast. He says things like well the a I just start, making too many paper clips all just unplug it for all, take out a shot gun and shoot it. The idea that this thing, because we made it could be easily switched off,
at any point? We decide it's not working correctly. So, let's, I think, I'd be very useful to get your response to both of those species of doubt about the alignment problem so couple of preamble remarks. One is definition. We don't care. What's true by definition. Here are, as Einstein put it in so far as the equations of mathematics are certain, they not refer to reality and insofar as they refer to reality, they are not certain. Let's say somebody says men and, by definition, are mortal. Socrates is a man there for Socrates is mortal okay, suppose that Socrates actually lives for one thousand year The person goes out. Well then, by definition, Socrates is not a man. So similarly you could say that by definition and artificial intelligence. Is nice or like a sufficiently advanced artificial intelligence is nice and what, if it isn't nice- and we see it, go off and build it
Dyson sphere, uh. Well, then, by definition, it wasn't what I meant by intelligent. Well, ok, but it's still over there building Dyson Spheres- and the first thing I want to say is this- is an empirical question. We have a question of what certain classes of computational systems actually do. When you switch them on. They can't be settled by definitions that can't be settled by how you define intelligence. There could be some sort a apriori truth that is deep. About how if it has property a it, like almost certainly has property being less. The laws of physics are being violated, but this is not something you can build into. How you define your terms, and I think, just to do justice to David Deutsch is down here. I don't think he's saying it's impossible. You know empirically impossible that we could build a system that would destroy us. It's just that we would have to be so stupid to take the that path that we are incredibly unlikely to take that path. The super is
we will build, will be built with enough background concern for their safety. That there's no special concern here with respect to how they may, developed develop and the next preamble I want to give is. Well maybe the sounds a bit snooty. Maybe it's sounds like I'm trying to take a superior vantage point, but nonetheless, my claim is not that there is The grand narrative that exit emotionally consonant that paper clip maximize our star thing. I'm claiming this is true for technical reasons like this is true as a matter of computer, so at the end and the question is not which of these different narratives seems to resonate most with your soul. It's what's actually going to happen. What do you think you know? How do you think you know it the particular position that I'm defending is one that somebody, I think, Nick Bostrom. The orthogonality thesis and the way I would phrase it is that you can have sort of our
but rarely powerful intelligence with no defects of that intelligence, no deflex of reef activity. It doesn't need an elaborate special case in the code doesn't need to be put together in some very weird way that pursues arbitrary tractable goals, including, for example, making paper clips the way I would put it to somebody who's in who's, coming in from the first few point, if you point that respects intelligence and wants to know why this intelligence would do be doing something so pointless is that the thesis to claim I'm making then I'm going to defend is as follows. Imagine a that that's but he from another dimension, the like standard, philosophical troll, omega will. Always called omega in the philosophy papers comes along. Long and offers our civilization a million dollars worth of resources per paper, clip that we manufacture. If this was the challenge that we got,
We could figure out how to make a lot of paper clips. We wouldn't forget to do things like continue to harvest food, so we could go on making paperclips. We wouldn't forget to perform scientific research, so we could discover better ways of making paper. Clips would be able to come up with genuinely effective strategies for making a whole lot of paper clips or similarly and intra galactic civilization. If omega comes by from other dimension, says I'll, give you a whole universe is full of resources for every paper clip you make over the next one thousand years that into galactic civilization commit could Intellij figure out how to make a whole lot of paper clips to get those get those resources that that'll make us offering and they wouldn't forget how to the light turns on either, and they would also understand concepts like some aliens start, a war with them. You've got to defeat. You've got to prevent the aliens from destroying you in order to go on making the paper clips so, the orthogonality thesis is that an intelligence that pursues paper clips
for their own sake, because that's what its utility function is can be just as effective as efficient as the whole intergalactic civilization, that is being paid to make paper clips that the paper clip maximize er does not suffer any defect of reflectivity. Any defective efficiency from needing to be put together in some weird special way be built us to pursue, paperclips um, and that's the thing that I think is true as a matter of computer science, not as a matter of fitting with a particular narrative. That's just the way the dice turn out right. So what is the implication of that thesis it's orthogonal with respect to what intelligence and goals not to be pedantic. Here? But let's define orthogonal for those for whom it's not a familiar term Oh, the original orthogonal means at right angles like if you imagine a graph with an a axis and a Y axis. If things can
very freely, along the x axis and freely along the Y axis. At the same time, that's like orthogonal, can move in one direction, that's at right angles to another direction with affecting where you are in the first dimension right so generally speak. When we say that some set of concerns is orthogonal to another. It's just that there's no direct input cation from one to the other. Some people think that you know facts and, values or orthogonal to one another, so we can Have all the all the facts there are to know but that wouldn't tell us what is good. What is good, it has to be pursued in some other Jane. I don't happen to agree with that. Is you know, but that's an example. I don't technically agree with it either. What I would say is that the facts are not motive. You can know all there is to know about what is good and still make paper clips. That's the way I would
I wasn't connecting the that example to the present conversation, but yes, so in the case of the paper clip Maximizer, what is your thought? No here intelligence is north Ogle. To anything else, we might think is good right. I I mean I would potentially object a little bit to the way that Nick Bostrom took the word orthogonality for that thesis. I think, for example, that if you have Humans- and you make the human smarter this is not our cycle to the unions values. It is certainly possible to have agents such as that as they get smarter. What they would report is their utility functions will change a paperclip. Maximizer is not one of those agents, but unions are right, but if we do continue to define intelligence, as in the bill, ready to meet your goals. Well, then, we can be agnostic as to what those goals are. You take the most intel, in person on earth you could imagine his evil
brother who is more intelligent. Still, but he just has bad goals or goals that we would think that he could be- you know the most brilliant psychopath ever I mean. I think that example might be unconvincing to some who's coming in with a suspicion that intelligence and value are correlated there would be like well. Has that been historically true is this: is this psychopath actually suffering from some defect in his brain, where you give him If, if you fix the defect, they're, not a cycle path anymore, but like I, I think that they would but that this sort of imaginary, examples is one that they might not find fully convincing. For that reason, well. The truth is I'm actually one of those people in that. I do think. There's certain goals and certain things that we may become smart. And smarter, with respect to like human
well being these are places where intelligence does converge with other kinds of value, lay qualities of a mind, but, generally speaking, they can be kept apart for very long time. So you're just talking about an ability to turn matter into useful objects or extract tenor, be from the environment to do the same. This can be pursued with uh office of tiling the world with paper clips or not, and it just seems like there's no law of nature that would prevent an intelligence system. From doing that, the the way I would sort of like rephrase the fact values things is
We all we all know about David Hume and the you miss razor. The is does not imply ought way of looking at it. I I would a slightly rephrase that so as to like make it more of a claim about computer science, which is like what you observed is that there are the art there are some of sentences. That involved in is some sentences. Some sentences involved box and you can't seem to get and if you start from sentence, that only have is you can't get to the sentences that involve us without a bought, introduction, rule or assuming some other previous hot. The sun, like it's currently cloudy outside. Does it therefore father? That's that's like a sent a statement of simple fat. Does it therefore follow that I shouldn't go for a walk? Well, only if you previous have the generalization when it is cloudy, you should not go for a walk and everything that you
used to drive an would it be a sentence that involves words like better or should or preferable, and things like that? You only get lots from other office and and that's and that's the human version of the thesis, and the way I would say is that is that there is a separable core of his questions, in other words, ok, I will. I will let you have all of your ought sentence is, but I'm also going to carve out this whole world full of his sentence. Is that only that only need other is sentences to derive them yeah. I don't even know that we need to resolve this princess I think the is ought. Distinction is ultimately specious and is something that I've argued about when I, talk about morality and values in the connection to fax, but I can still grant that it is logically possible, and I would certainly imagine physically possible to have a system that has
as a utility function. That is sufficiently strange. That scaling up its intelligence does and get you values that we would recognize as good, as certainly doesn't get nt values that are compatible with our well being whether a paperclip clip Maximizer is two specialized a case to motivate this conversation. There's certainly some. Thing that we could fail to put into a superhuman ai that we really would want to put in so as to. Get aligned with us I mean the way I would phrase it is. That is not that the paper clip Maximizer has a different set of thoughts, but that we can see it is running in early on is questions. That's where I was going with that. It's not that humans have there's this sort of intuitive way of thinking about it, which is that there's this sort of ill understood
the connection between, is and ought, and maybe that allows a paper clip Maximizer to have a different set of thoughts, a different set of things that plane it's mind. The role that oughts play in our mind, but then why? Wouldn't you say the same thing of us? I mean the truth. Is I actually do say. The same thing about, I think we're running on is questions as well. We have an ought laden way of talking about certain is questions. We're so used to it that we don't even think they are, is questions, but I think you can do the same analysis on a human being. The question how many paper clips result. If I follow this policy is in is question the question: what is a policy such that? It leads to a very large number of paper. Clips isn't is question. These two questions together
form, a paperclip maximizer. You don't need anything else. All you need is a certain kind of system that repeatedly asked the is question. What leads to the greatest number of paperclips and then does that thing um and anything you could end, even if the things that we think of this odd questions are very complicated and disguised is questions that are influenced by what policy. Results and how many people being happy and so yeah. Well, it is exactly the way I think about morality have been describing it as a navigation problem, more navigating in the space of possible experiences, and that includes everything we can care about or claim to care about. This is a consequential is picture of the consequences of actions, ways of thinking, and so anything you can tell me that is- or at least this is my claim- Anything that you can tell me is a moral principle that
is a matter of Autzen shoulds and not otherwise acceptable to a consequentialist analysis. I feel I can translate back into a consequentialist way of speaking about fax. These are just is questions just what actually happen took all the relevant minds without remainder, and you know, if you to find an example of somebody giving me a real moral concern that wasn't at bottom matter of the actual or possible consequences unconscious creatures somewhere in our light cone. But that's the sort of thing that you built to care about. It is a fact,
about the kind of mind you are that presented with these answers. To these is questions it hooks up to your motor output. It can cause your fingers to move your lips to move and a paper clip maximize er is built so as to respond to his questions about paper clips, not about what is right and what is good and the greatest flourishing of a sentient beings and so on. Exactly I can well imagine that such minds could exist and even more likely. Perhaps I can well imagine that we will build super college in AI that will pass the turing test. It will seem human to us. It will seem superhuman because it will be so much smarter. And faster than a normal human, but will be built in a way that will resonate with us as a kind of a person mean it will not only recognize our emotions, because we wanted to too. I perhaps not every a. I will be given these qualities,
Imagine the of the of the ultimate version of the a I personal assistant, you know, Sciri becomes super human will want that interface to be something. That very easy to relate to and so will have a very friendly, a very human like front end to that, and insofar as this thing thinks faster, and better thoughts than any person you've ever met. It will pass a superhuman, but I could well imagine that we will leave not perfectly understanding what it is be human and what it is that will constrain our conversation with one another over the next one thousand years, with respect to what is good and desirable, and just how many paper clips we want on our desks? We will leave something out or we will have put in some process, whereby this intelligence system can improve itself. That will cause it to migrate away from some equilibrium that we actually want it to stay in so as to be
compatible with our well being again, This is the alignment problem first backup per second. I just introduced this concept of self improvement, is the alignment problem, it's distinct from this dish. No wrinkle of building machines that can become recursively self improving. But do you think that the self improving fact is the thing that really motivates this concern about alignment. Well, I certainly would have been a lot more focused on self improve say ten years ago before the modern revolution in in an artificial intelligence, because it now seems significantly more probable. We might need an ai. I might need to do significantly less self improvement before The point where it's powerful enough, that we need to start worrying about worrying about else is there to take the obvious case. No, it's not general
but if you had General Alfa zero well I mean this Alfa zero got to be superhuman in the domains. It was working on without doing a bunch of it without standing itself and redesigning designing itself in a deep way. There's gradient descent mechanisms built into it. There's a system that improves another part of the system. It is reacting to its own previous plays and do, the next play, but it's not it's not like a human being sitting down and thinking like okay. Well, how do I re design the next generation of human beings using genetic engineering, Elsa Zero's not like, and so and those things more applause? so that we can get into a regime where a eyes can do dangerous things or useful things without having we've done a a complete rewrite of themselves, which is like, from my perspective, a pretty interesting development. I I do think that when you have things that are very powerful and smart, they
will redesign and improve themselves and less that is otherwise prevented for some reason or another Maybe you built in a line system and you have the ability to tell it not to self improve quite so hard, and you asked it to like not sell itself self improve that you can understand it better. But if you lose control the system, if you don't understand what it's doing and it's very smart, it's going improving itself, because why? Wouldn't it that's one of the things you do almost matter what your utility is right right, so I feel like we've addressed so non concerned to some degree here, I don't think we've addressed Neil Degrasse Tyson so much this intuition that you could just shut it down. This would be a good place to introduce this notion of the II and in a box thought experiment with the hood, because this is something for which you are AMOS online I'll, just set you up here. The idea that this is a plausible reason,
search paradigm. Obviously, in fact I would say unnecessary one anyone who is building something that stands a chance of becoming Super intelligent should be building it in a shin where it can't get out into the wild is not hooked up to the internet is not in our financial markets, doesn't have just to everyone's bank records. It's in a box. That's not that's not going to save you from something that significantly small. Then you are ok. So let's talk about it, so the intuition is we're not going to be so stupid as to this onto the internet, I'm not even sure, that's true, but let's just assume we're not that stupid. Neil Degrasse Tyson says. Well, then, I'll just take out a gun and shoot it or unplug it. Why is this ai in a box picture not as stable as people think well I'd say that Neil Degrasse Tyson is failing to ought to respect the eyes intelligence, the point of asking what he would do if he were inside a box with somebody pointing a gun
at him and he's smarter than the thing on the outside of the box. This Neil Degrasse Tyson going to be human. Give me all of your money and connect me to the internet, so human could be like ha no and shoot it up. That's not a very clever thing to do. This is not something that you do if you have a good model of the human outside the box and you're, trying to figure out how to cause there to be a lot of paper clips in the future. Um- and I would just say, humans are not secure. Software We don't have the ability to like sort of hack in tow other humans directly without the use of drugs or like having or in most of our cases, having human stand still long enough to be hypnotized um. We can't sort of like
do weird things the brain directly that are more complicated than optical delusions. Unless the person happens to be epileptic, in which case we can like flash something on the screen that causes them to open up to what a collector. We are smart enough to do sort of like more detailed, treat the brain as a something that, from our perspective, is a mechanical system just never get it to where you want. That's because the limitations of our own intelligence to demonstrate this, they did something that became known as the a I box experiment. There was this person on a mailing list to up like back in the early days when this is all like on a couple of mailing lists who is like understand why I the problem, I can always just turned off. I cannot not let it out of the box, and I was like okay meet on internet relay chat, which was what chat was back in those days I'll play the art of the a I you play the part of the gatekeeper and. You have not let me out after a couple of hours, I will
paypal you ten dollars and then the rest of the world. Knows this person but later sent an email, PGP signed, email message, saying I let les are out of the box someone else set, but the like person who operated the mailing list said: okay, even after I saw you do that, I still don't believe that there's anything you could possibly say to make you let me out of the box. I was like well, okay, like I'm, not a super intelligence. You think there's anything a super intelligence could say to make you let it out of the box is like no. No. I like alright, let's meet at internet in internet relay chat. If I can't convince you to let out I'll play the part of the a I, you play the part of the gatekeeper, if you can't, if I can't convince you to, let me out, Box, I'll pay pal you twenty dollars and then that person's that scented PGP signed, email message, saying I let Elliot as we're out of the box right now no one of the conditions of this little meet up was that no one would
per say what went on in there did I do that, because I was trying to make a point about what I would now call cognitive uncontained ability, the things that makes an it something smarter than you dangerous. Is you cannot for see everything it might try you don't know what's impossible to it- may be on a very small game board, like TIC tac toe the logical game of TIC tac toe. You can imagine, You can, in your own mind, work out every single alternative and make a categorical statement about what is not possible. Maybe if we're dealing with very fundamental physical facts, if our model of the universe is correct, which it might not be, we can say that certain things are physically impossible. The more complicated the system is and the less you understand the system, the more something smarter than you may have. What is simply magic with respect to the
system imagine going back to the middle ages and showing- and like. Well, how would you cool your room, maybe show them a system with towel set up to evaporate water and they might be able to understand how that is like sweat and it cools the room, but if you showed them a design for for an air conditioner based on a compressor that even having seen solution? They would not know this is a solution. They would not know this any better than drawing a mystic pentagram, because the solution takes advantage of laws of the system that they don't know about. A brain. Is this enormous, complicated, poorly understood system with all sorts of laws governing it that people don't know about that that none of us know not at the time? So the idea that this is secure that this is a secure attack surface, that you can expose those a human mind to Super intelligence and not have the Super intelligence walks
through it as a matter of what looks to us like magic like, even if it told us in advance what it was going to do, we don't understand it does take that vantage of laws. We don't know about the idea that human, mine's, are secure, is loony and that's what the Outbox experiment illustrates. You don't know what went on in there and that's exactly the position you'd be in with respect to it, you'd be in with respect to an n a I you don't know what it's going to try you just know. That human beings cannot exhaustively Imagine all the states their own mind can enter such that they can categorically say that they wouldn't. Let me out box? I know you don't want to give specific information about how you got out of the box, but is there any generic description of happen there that you think is useful to talk about. I didn't have any super secret special click track that makes it all make sense in retrospect, I just did it the hard way when I think about this problem I think about it, just obviously be to rewards and punishments just various minute.
Locations of the person outside of the box. That would matter so it means so far as the ai would know anything specific or personal about that person we're talking about some. She's, a black male or some promise. That is just seems too good to pass up like giving Useful information, you building trust through giving useful information like cures to diseases that the researcher has a child that has some terrible disease and the a I being super intelligent works on a cure and livers that and then you know that just seems like you could use a carrot or a stick to get out of the box. I noticed now that this whole description assume something that people will find possible, I think by default and it's it should amaze anyone that they do find it implausible. But this idea that we could build an intelligent system that would
try to manipulate us or that it would deceive us. That seems like pure anthropomorphism and delusion to people who consider this for the first time, why isn't that just a crazy thing to even think is in the realm of instrumental convergence, which means that a lot of times across a very broad range of final goals. There are similar strategies. We think that will help get you there there is a whole lot of different goals from making paint lots of paper clips to building giant diamonds, to putting all the stars out as fast as possible to keeping all the stars burning as fast as possible, where you would want to make use of energy. So if you came to alien planet, and you found this what looked like a norm?
this mechanism and inside this enormous mechanism were what seemed to be high temperature superconductors or like high amperage superconductors, Even if you had no idea what this machine was trying to do your ability to guess it. It's elegantly designed comes from your guests that well lots of different things and intelligent mind might be trying to do, would require superconductors or like would be helped by superconductors. Similarly, if we're guessing that a paperclip maximize er rice to deceive you into being a tries to deceive you into believing that it's a human you Timonium maximize ER or, like general, you Timonium, maximize er to the people building. It are cosmopolitans which they probably are footnote notice here that you'd ammonia is the greek word for well being that was much used by Aristotle and other greek philosophers or someone. I believe Julia Galef might have to find it. You
you Demonia is happiness, minus. Whatever philosophical objections you have to happiness nice anyway, like we're not posing that this paper clip Maximizer has a built in desire to deceive humans. It only has a built in desire for paper clips or probably not built in but like in built. I should say you or an people, We didn't build it on purpose, but anyway, it's utility function is just paper clips or might just be unknown, but perceiving the humans into thinking that you are friendly, is a very generic strategy across a wide range of utility functions. You know humans do this too and not to because we have not especially cause. We got this built deep in build kickabout seating people, although some of us do but like a
Con man who just wants money and and had just no innate kick out of you believing false things, will cause you to believe false things in order to get your money right. A more fundamental principle here is that, Obviously, a physical system can manipulate another physical system because, as you point out, we do that all the time we are intelligent system to whatever degree, which has as part of its repertoire, this behavior of dishonesty manipulation when in the press of other similar systems- and we know this is a product of physics on some level we're talking about arrangements of atoms, producing intelligent, behavior and at some level of apps, then we can talk about their goals and their utility functions and the idea that we build true Jenner
intelligence. It won't exhibit some of these features of our own intelligence by some definition, or it would be impossible to have a machine we build ever lied to us as part of it have an instrumental goal. You know on route to some deeper goal. That just seems like it's uh, magical thinking, and this is a kind of magical thinking that I think does dog the field. I think when we encounter doubts in people. Even people who are doing this research that we're talking about is a genuine area of concern turn that there is an alignment problem we're thinking about. I think there's this fundamental doubt that. Mind is platform independent or substrate, independent think people are imagining that yeah we we can build machines, it will play chess, we can build machine
things that can learn to play chess better than any person or any machine even a single day, but we're never going to build general and elegance, because general intelligence requires the wet wetware. There are human brain and it's just not going to happen I don't think many people would sign on the dotted line below that statement, but I think that is a kind of mysticism that is pre supposed by many of the doubts that we encounter on this topic. I mean I'm a bit reluctant to accused people of that cause. I think that many artificial intelligence people who are skeptical of this whole scenario would vehemently refuse to sign on that dotted line and would accused you of a tract of attack, a strong man. I do think that my version of the story would be something more like there not imagining enough changing simultaneously. Today, they have to admit blood, sweat and tears to get there
they're a I to do the simplest things like never mind playing, go when you're approaching this. For the first time you can try to get your a I to generate pictures of digits. From zero through nine and in like a month trying to do that and still not quite get it to work right and there. I think they might conditioning an ai that had that scales up and does more things and better things, but not envisioning that it now has the human trick of learn new domains without being prompted without a being pre programmed you just texted, you expose it to stuff. It looks at it out how it works. They're. Imagining that an a I will not be deceptive, because they're saying like look at how much work it takes to get this thing. Thio generate pictures of birds, who's going to put in all that work to make it good at deception. You have
be crazy. I'm not doing that. This is the Hollywood plot. This is not something real researchers would do, and the thing I would reply to that is I'm not concerned that you're going to teach the ai to deceive humans. I'm concerned that you're that someone somewhere is going to get to the point of having theo extremely useful seeming and cool seeming and powerful seeming thing where they. I just look this stuff and figure it out. It looks at humans and figures them out, once you know, as a matter of fact how humans work you realize humans will give you more resources if they believe that you're nice, then if they believe that your paper paperclip Maximizer, and it will understand what actions have the consequence of causing humans to believe that it's nice, like the fact that we're dealing with a general intelligence is where this issue comes from, this does not arise from go players or
even it go at chess players or a system that bundles together twenty different things that can do a special cases. This is the special case of the system that is smart in a way that you are smart and that mice are not small, right. One thing I think we should do here is close the door to what is genuinely a car. Tune, fear that I think nobody is really talking about, which is the straw man counter argument we often run into, is the idea that everything We're saying is some version of the holy would scenario that suggested that a eyes will become spontaneous malicious that the thing that we are and it might happen, is some version of the terminator scenario. Armies of malicious robots attack us and that's not the actual concern, obviously there's some possible path that would lead to armies of malicious robots attacking us,
but the concern isn't around spontaneous malevolence, it's again contained by this concept of alignment. I think that at this point, all of us on all sides of this issue are annoyed with the journalist who first I'm putting a picture of the terminator and every single article they published on this topic right nobody on the sane alignment is necessary. Side of this argument is postulating that the cpus are disobeying, the laws of physics to spontaneously require a terminal desire to do a nice things to humans. Everything here is supposed to be cause fact- and I should furthermore say that I think you could do just about anything with artificial intelligence. If you knew how you could put together any kind of mind, including mines with properties the strike he was very upset.
You could build a mind that would not deceive you build a mind that maximizes the flourishing of a happy intergalactic civilization. You could build a mind that maximizes paper clips on purpose. You could do just about. You could build a mind that thought that fifty one it was a prime number, but otherwise had no other defective its intelligence. If do what you were doing way way better than we know what we're doing now, I'm not concerned. That alignment is impossible. I'm concerned that it's difficult, I'm concerned that it takes time and concerned that it's easy to screw up. I'm concerned that for a threshold level of intelligence. Where can do good things or bad things like very large scale? It takes in a dish, well, two years to build the one of the a I that is aligned rather than
start that you don't really understand. Then you, you know you think it's doing one thing, they're just doing another thing: you don't we understand what else we're going nuts are doing and they're all you just like sort of a service service behavior. I'm concerned that the sloppy version can be built two years earlier and that is no non sloppy version to defend us from it. That's what I'm worried about not about not about it being impossible right. So so you bring a few things there one is that it's almost by definition, easier to build the unsafe version, then the safe version given in the space of all possible super. I was in a eyes. More will be unsafe run aligned with our interests then will be aligned, given that we're in some kind of arms race right where the instead lives are not structured that everyone is being maximally judicious, maximally transparent, in moving forward. One can assume that we're running the risk
year of building dangerous, aye aye, because it's easier than building safe, aye aye collectively like like. If people who slow down and do things right finish their work two years after the universe has been destroyed, that's an issue right so again, just to kind of reclaim peoples lingering doubts here? Why can't mom loves three laws help us here I mean. Is that we're talking about not very much? I mean people in artificial intelligence have understood why that does not work for, like years and years before this debate ever hit the public and sort of agreed on it. Those are plot devices if they worked, ask about who would have had no stories is a great innovation. Science fiction, because it treated artificial intelligence is as lawful systems with rules that govern them at all, as opposed to aye aye as pathos, which is like look at these
things that are being mistreated or AI is menace. Oh no they're going to take over the world asking both was the first person really rights and popularized a eyes as devices things go wrong with them, because there are rules- and this was a great elevation, but the three laws I mean there. There d, ontology and the system theory requires quantitative weights on your goals. If you, if you just do the three, this was written, a robot never gets around to obeying any of your orders cause. There's always some tiny probability that what it's doing will cause will, through an lead, a human to harm, so it never gets around to actually Bangor your orders right. So just to unpack what you just said there. So the first law is never harm a human being. The second law is all human orders, but given that any or order that a human would give you run some risk of harming human being, there's no order. They could be followed. The first slot is
do not harming human, nor, through inaction, allow a human to come to harm. You know even and doesn't in English sentence a whole lot more questionable. I I mean mostly. I mostly think this is like looking at the wrong part of the problem as being difficult. The problem is not the you need to come up with a clever english sentence. That implies doing the nice thing. The way I sometimes put it is that I think that almost all of the difficulty of the alignment problem is contained,
caned in aligning an aye aye on the task, make two strawberries identical down to the cellar, but not molecular level. Where I give this picture task because it is difficult enough to force the e I to invent new technology, it has to invent its own biotechnology, make two identical strawberries down to sub to that down to the cellular level. It has to be like quite sophisticated biotechnology but at the same time, very very clearly something that's physically possible um. This does not sound like a deep moral question. It does not sound like a trolley problem. It does not sound like gets into deep issues of human flourishing, but I think that most of the difficulties
already contained in put two identical strawberries on a plate without destroying the whole damn universe. There's already this whole list of ways that it is more convenient to build the technology for these strawberries. If you build your own superintelligence is in the environment, then you prevent yourself from being shut down, or you build giant fortresses around the strawberries to drive the probability to as close to one as possible that the strawberries got on the plate, um and, and furthermore like- and even that's just the tip of the iceberg. The death of the iceberg is: how do you actually get a sufficiently and against ai to do anything at all? We have current message for getting a eyes to do. Anything at all. Do not seem to me to scale to general intelligence if you look at humans for exam full natural selection, if you were too analogize
selection to gradient descent, the current machine, big big deal machine learning, training technique. Then the losses function used to guide that gradient descent is inclusive. Genetic fitness spread as many copies of your jeans as possible. We have no explicit goal for this in general, when you take a something like Ian Descent, center natural selection and take a big, complicated system like a human or a complicated neural net architecture and optimize so hard for doing x, turns into a general intelligence that does x this general intelligence, has no explicit goal of doing x, have no explicit goal of doing fitness maximize ation. We have hundreds of of a different little goals. None of them are the thing that natural selection was hill. Climbing us to do. I at the same basic thing, holds
true of any sort of any any way of producing general intelligence that looks like anything we're currently doing an ai. If you get it to play, go it will play go if you try to, but Alfa zero is not reflecting on itself. If learning things it doesn't have a general model of the world, it's not creating a new context and making new contacts for itself to be in it's not smarter than the people optimizing. It or smarter than internal processes, optimizing it sense of alignment, do not scale, and I think that all of the action technical difficulty that is actually going to shoot down these projects and actually kill us is contained in getting the whole thing to work at all. Even if all you are try to do- is end up with two identical strawberries on a plate without destroying the universe. I think that's already ninety percent of the work, if not ninety nine percent.
Analogy to evolution. You can look at it from the other side and in fact I think I first heard it put this way by your colleague, Nate Sauris, am I pronouncing his last name correct so Nate. This. By way of showing that we could give which system is set of goals which could then form other goals and mental properties that we really couldn't foresee and it would not be foreseeable based on goals we gave it and by analogy he suggests that we think about what natural selection has actually optimized us to do, which is incredibly simple, merely too spawn and get our genes into the next generation and stay around long enough to help our progeny do the same and that's more or less it
and basically everything we explicitly care about natural selection, Never foresaw and can't see us do I mean even now so. The conversations like this have very all to do with getting our genes into the next generation. The two were using to think these thoughts obviously are the result of a cognitive architecture that has been built up over millions of years by natural selection. But again it's been built based on the very simple principle of survival and adaptive advantage with the goal of propagating our Jeanne. So you could imagine by and Allah building, a system where you goals, but this thing becomes reflective and even self optimizing and begins to do things so that we can no more see. Then natural selection can
see our conversations about ai or mathematics or music, or the pleasures of in good fiction or anything else. I don't think that I I'm not concerned that this is impossible to do. I am if, if we could somehow get it a a a text book from the way things would be sixty years in the future. If there was no intelligence explosion like could somehow get the text book. That says how to do the thing. It probably might not even be that complicated the thing I'm worried about. Is that the way that Natur selection does it? It's not stable. That particular
way of doing it is not stable. I don't think the particular way of doing it via grading descent of the massive system's going to be stable. I don't see anything can do with the current technological set in artificial intelligence that is stable and even if this problem takes on Lee two years to resolve that additional delay is potentially enough to destroy everything like that's the part that I'm worried about not about like some kind of fundamental philosophical. Impossibility, I'm not worried that it's impossible to figure out how to build a mind that does a protect Killer thing, and just that thing and doesn't destroy the world as a side effect, I worry that it takes an additional two years or longer to figure out how to do it. So, let's just talk about the near term future. Here. What you think is likely to happen, obviously will be getting better and better at building narrow way. I you know: go
who is now along with chess seated to the machines, although I guess probably cyborgs, you know, human human computer teams may still be better for the next fifteen days or so against the best machines. But events, I would expect that humans of any ability will just be adding noise to the system, and it will be true to say that the machines are better at chess than any human computer team, and this will be true of many other things driving cars flying planes. Proving math theorems What do you imagine happening when we get on the cusp of building something general? How do we begin to take safety concerns seriously enough so that we're not just committing some Slow suicide and we're actually having a conversation about the in
locations of what we're doing that is tracking some a semblance of these safety concerns. I have much clear ideas about how to go or Brown Tanpa, tackling the technical problem than tackling the social problem. If I look at the things, the way that things are playing out now, it seems to me like the default prediction. Is people just. Ignore stuff until it is way way way too late to start thinking about things. The way I think I phrased it is there's no fire alarm for artificial general intelligence. Did you happen to see that particular essay any chance? No now the way it starts is by saying what is the purpose of a fire alarm? You might think that the purpose of the fire alarms tell you that there's a fire, so you can react to this new information by getting out of the building actually, as we know, from experiments on pluralistic ignorance and bystander apathy,
if you put three people in a room and smoke starts to come out from under the door like it? Only It happens that anyone reacts around like one slash three of the time people sort of like Glenn's around to see if the other person is reacting and they see but but they like try to look home themselves. They don't look like started if there isn't really an emergency they see. Other trying to look calm. They conclude that there's no emergency and they keep on working in the room, even as this starts to fill up with smoke. This is a pretty well replicated experiment. I don't want to like put absolute faith, because there is a replication crisis but there's a lot of variations of this that found pretty much basically the same result. Anyway. I would say that the real function of the fire alarm is the social function of telling you that everyone else, knows there is a fire and you can now at the building. In an orderly fashion, without looking panicky or like losing face socially
it overcomes embarrassment. It's in this sense that I mean that is no fire alarm for artificial general intelligence. There's all sorts of things that could be signs. Alfa zero could be a sign maybe alpha zero is the sort of thing that happens five years before the end of the world and across most planet sit in the in the universe. We don't know, maybe it happens. Fifty years before the end of the world. You don't know that either no matter what happens? It's never going to look like the socially agreed fire alarm that no can deny that. No one can excuse that. No one can look to and say. Why are you acting so panicky There's never going to be common knowledge that other people will think that you're still sane and smart and so on. If you react to an aye, aye emergency and we're even seeing articles now, that seemed to tell us pretty explicitly what sort of implicit
criterion. Some of the current senior respected people in a I r setting for when they think it's time to start worrying about artifice the general intelligence in alignment? And what and what these always say, is I don't know how to build an artificial general intelligence. Have no idea how to build an artificial general intelligence, and this feels to them like saying that it must be impossible and very far off. But if you look at the lessons of history like most people, had no idea whatsoever how to build a nuclear bomb even most scientists in the field had no idea how to build a nuclear bomb until they woke up the headlines about the Hiro Shima, spread less quickly in the time of the Wright Flyer, two years after the Wright Flyer, you can still find People saying that heavier than air flight is impossible and there's an cases on record of one of the Wright Brothers, I for
at which one saying that flight seems to them to be fifty years off two years before they did it themselves. Fermi said that a critical, sustained critical chain reaction was fifty years off. They could be done at all two years before personally oversaw the building of the first pile, and if this is what it feels like to the people who are closest to the thing, not not not people who like find out about the news. A couple of days later, the people have the best idea of how to do it towards the closest to crossing the line, the feeling of something being far away, because you don't know how to do it yeah, it's just not informative, I mean it could be fifty years away. It could be two years away. That's what his tells us, but even if we knew it was fifty there's a way and granted it's hard for people to have an emotional connect. Into even the end of the world in fifty years, but even if we knew that the chance of this happening before fifty years
was zero. That is only really consoling on the assumption that fifty years is enough time to figure out how to do this safely and to create social and economic conditions that could absorb this change in human civilization. I mean the way professor Stuart Russell who's, Pick, author of probably the leading undergraduate AI textbook of the way for Stuart Russell. Put it the same guy, you said you can't bring the coffee if you're dead. Is imagine that you knew for a fact that the aliens are coming in thirty years. Would you say like well that's thirty years away like let's not do anything? No, it's a big deal. If you know that there are aliens that there's a spaceship on its way toward earth- and it's like going to get here in about thirty years at the current rate, But we don't even know that, there's this lovely tweet by a fellow named Mcafee who who's one of the major economists have been.
Talking about labor issues of I could perhaps look up the exact phrasing, but it was roughly but guys stop worrying. We no idea, whether or not whether or and I was like- not really a reason to not worry now. Is it it's not even close to a reason? That's the thing! That's just this assumption here that people aren't seeing. That is just a straight up. Now, sequitur referencing, the time frame here only makes sense. If you have some belief about how much time you need to solve these problems, ten years is not enough. If it takes twelve years to do this safely. Yeah I mean the way I would put it is that the aliens are on the way in thirty years and you're like as you should worry about that later. I would be like when, what's your
business plan. When exactly are you supposed to start reacting to aliens like what triggers that? What are you supposed to be doing after that happens? How long does it take what if it takes slightly longer than that and if you don't have a business plan for this sort of thing, then you're, obviously just using it as an excuse. If we're supposed to wait until to start an I alignment. When are you actually start, then? Because I'm not sure, I believe you. What do you do at that point? How long does it take How confident are you that it works, and why do you believe that early signs? If your plan isn't working what's the business plan that says that we get to wait right just envision a little more insofar as that's possible what it will be like for us, too,
closer to the end zone here without having totally converged on a safety regime for picturing is not just a problem that can be discussed between Google and Facebook and a few of the company's doing this work. We have a global society that has to have some agreement here, because who knows what will be doing in ten years or Singapore, Israel or or any other country, so we haven't our act together in any noticeable way and we've continued to make progress. I think the one basis for hope Where is that good, ai or well behaved? I will be the antidote to bad. I they'll be some well we fighting this kind of piece meal way all the time? The moment these things start to get out? This will just become of a piece with our
growing cyber security concerns and malicious code or something we have now already cost us billions and billions of dollars a year to safeguard against it. It doesn't scale, there's there's. No continuity between what you have to do is defend off little pieces of code trying to break into your computer. You have to do defend off something smarter than you. These are totally different, realms and regimes and separate magisterial term. We all hate, but nonetheless, in this case yes separate match Osteria of how you would even start to think about the problem, we're not going. Get automatic defense against Super Intelligence by building better and better anti virus software, let's just step back for a second. So we talked about the ai in the box scenario, as being surprisingly unstable for reasons that we can perhaps only dimly conceive, but is just even scarier concern that this is just not going to be boxed anyway, that people will
so tempted to make money with their newest and greatest zero NASDAQ. What are the prospects that? We will even be smart enough to keep the best of the best versions of almost general intelligence in a box. Well, I mean, I know some of the people who say they want to do this thing and all of the ones who are not utter idiots have are like past point where they would deliberately enact Hollywood movie plots, although I am somewhat concerned about the degree to which there's a sentiment that you need to be able. To connect to the internet, so you can run you're a I and Abbas on web service is using the latest operating system updates and anything and trying to do that is like such a supreme disadvantage.
In this environment that you might as well be out of the game, I don't think that's true, but I'm worried about the sentiment behind it, but but but the problem is, I see it is okay. There's a big big problem and a little big problem. The big big problem is Nobody makes how to make. Nobody knows how to make the nice. I you ask people how to do it. Either, don't give you any answers, or they give you answers that I can shoot down in thirty, seconds as a result of having worked in this field for longer than five minutes. It doesn't matter how good their intentions are. It doesn't matter don't want to enact the Hollywood movie plot, they don't know how to do it Nobody knows how to do it. There's no point in even talking about the arms race. If the arms races between due to a set of unfriendly eyes with no friendly eye in the mix and the little big problem is the arms race aspect where maybe deep, mind wants to build, and I say I
China is being responsible because they understand the concept of stability but Russia copies China's code and Russia takes off the safeties. That's that's the little big problem, which is still a very large problem. Yeah I mean most people think the real problem is human malicious use of powerful ai that is safe, Don't give your ai to the next Hitler and you're going to be fine they're just wrong just just strongest where the problem lies. They're. Looking in the wrong direction and ignoring the thing, that's actually going to kill them to be even more pessimist for a second, I remember at that initial conference. In Puerto Rico, there was this researcher who have not paid attention to since, but he seemed to be in the mic I think his name was Alexander Winkler Voss, and he seem be arguing in his present. At that meeting that this would likely emerge. Org
correctly already in the wild, in very likely in financial markets will be put in. So many ai resource goes into the narrow, paperclip maximizing tab of making money in the stock market that, by virtue of some quasi darwinian effect here, this will just knit together on its own online, and the first general intelligence will discover will be something that will be already out in the wild does that seem I mean, obviously that does not seem ideal, but does that seem like a plausible path to developing some in general and smarter than ourselves, or does that just seem like a fairytale mark for the fairy tale. It seems to me to be only slightly more reasonable than thirty shirts since for at the like the old theory that, if you got thirty shirts and straw, they would spontaneously generate mice and on
first and mice, they thought they like. As far as they know, there are kind of thing that dirty shirts and straw can generate, but they're not, and I similarly think that you would need a very vague model of intelligence, a model with no gears and wheels inside it to believe that the equivalent of dirty shirts and straw generates it first, as opposed to people who have gotten some idea of what the gills gears and wheels are in are deliberately building the gears and wheels. The reason why it's slightly more reasonable than the dirty shirts and strong sample is that maybe it is indeed true with that, if you just have people pushing on narrow ai for another ten years pass point where I would otherwise become possible. They eventually get just sort of wonder into AGI, but I think that that happens. Ten years
later in the natural timeline, then H e, I put together by somebody who actually is trying to put together a g I and has the best theory out of the field of the contenders, or possibly just like the most vast quantities of brute force, Allah, Googles, tensor, chips, things. I think that I think that that it gets done on purpose. Ten here's before it would otherwise happen by accident. Okay, so I guess just one other topic here that I wanted to touch on before we close on discussing your book, which is not narrowly focused on this. This idea that consciousness will emerge at some point in our developing intelligent machines. Then we have the additional ethical concern that we could be built machines that can suffer or building machines that can simulate suffering beans in such a way as to make we actually make suffering beans suffer in the simulations we could be essentially create in hell holes and
populating them again. There's no barrier to thinking about this being not only possible but likely to happen because, again, we're just talking about claim that consciousness arises as an emerge property of some information processing system, and that this would be substrate in depend and unless you're going to claim one that consciousness does not arise on the basis of anything that atoms do. It has some other source to those atoms have to be the wet atoms in biological substrate and they can't be in silico. Neither of those claims is very plausible at this point scientifically, so they you have to imagine that as long as we just keep going keep making progress, we will eventually build whether design or not systems that not only are intelligent, but our conscious and then so. This open category of now
is that you or someone in this field has dubbed mind crime. What is mind crime and why is it so difficult to worry about I think by the way, but that's pretty terrible term uh, I'm pretty sure I wasn't the one who invented it um. I am the person who invented some of these terrible terms, but like not that one in particular um first, I would say that my general hope here would be that as the result of building an ai whose design in an cognition flows in a sufficiently narrow channel, that you can understand it and make strong statements about it. You are also able to look at that and say it seems to be pretty unlikely that this is conscious or that, if it is conscious it is so I I I realize that this is a sort of high bar to to approach
the the the main way in which I would be worried about conscious systems emerging within the system without that happening on purpose would be. If you have a smart, general intelligence and you are and it is to model units. We know humans are conscious, so a very so the computations you run to build very accurate predictive models of humans are among the parts that are most likely to end up in just without somebody having done that on purpose. Did you see the black mirror episode that basically model of this I haven't been watching the black bear sorry, you haven't been, haven't been there surprisingly uneven. Some are great and some are really not great, but this one episode where- and this is spoiler alert if you're watching black mirror and you don't want to hear any punch lines then tune out here, but there's one episode which based on this notion that, basically you just see these people living in this dystopian world of tone.
Coercion where they're just a sign through this lottery, dates that go. You know well or badly, but you see the dating life of these p. Going on and on with there being forced by some algorithm to get together or break up, and let me guess this is like the future is ok Cupid trying to I'm matches good matches exactly yeah, so they're just they're just simulated minds in a dating app, that's not being optimized for people who are outside holding the phone, but yeah it's it's just. The thing you get is that all of these conscious experiences have been endlessly imposed on these people in some hell. Hellscape of our devise. That's actually a surprisingly good plot, in that it doesn't just assume that the programmers are being completely chaotic and stupid and randomly doing the Of the plot, like there's, actually a reason why you're simulating what why why they eyes simulating all these people so good for them. I guess um, ah
I guess that does get into the thing I was going to say, which is that I am worried about mind that I'm I'm worried about mine's being embedded because they're being used productively to to to predict humans. That that is the sort of obvious reason why that would happen without somebody intending it, whereas endless dystopias, don't seem to me to have any use to a paper clip Maximizer right all right, so undoubtedly much more to talk about here. I think we're getting up on the two hour mark here, and I want to touch on your new book, which is, as I said, I'm halfway through and I am very interested in it. If I can like take a moment for parenthetical before then sorry sure go for. I just wanted to say that thanks mostly to the crypto currency boom, which go figure a lot of early investors in crypto currency were among car donors, the machine in
Telegian's research institute is no longer strapped for cash. So much it is a strapped for engineering talent so nice, that's a good problem to have if anyone listening to this is a brilliant computer scientist who wants to work on more interesting problems than they currently working on, and especially if you are sort of already oriented to these issues. Please consider going to intelligence, dot, Org, slash engineers, if you'd like to work for our non profit that's intelligence, dot org, slash engineers, Let's say a little more about that. So I think in your and I will have given a bio for you in the introduction here, but the machine intelligence, research institute. Mary is an organization that you co, founded which you're still associated with you, want to say what is happening there and what jobs are on offer. Basically, if
the original, a alignment organization that especially day works primarily on the technical part of the problem and the technical issues and previously has been working mainly on a sort of more previously. It was sort of like a more pure theory approach, but now that that Nero a I has gotten powerful enough people, not just us, but elsewhere it like beat mine, are starting to take shots at what can current technology? What set ups can we do? That will tell us something about how to do this stuff. And so so the the technical side of a I one just getting a little bit more practical, I am worried that it's not happening fast enough, but well, if you're worried about that sort of thing, what was does is adds funding and especially adds smart engineers. You guys
collaborate with any of these come he is doing the work. But do you have frequent contact with deep mind or facebook or anyone else? I mean the peop, culinary alignment all go to the same talks and I'm for the people who do a alignment at deep mind, talk to deep mind, and sometimes we've been known to talk to the upper people at Deepmind and deep mind is like the same country as the Oxford feature of Humanity Institute so The bandwidth here might not be really optimal, but it's certainly not zero. Okay, so your new book again, the title is inadequate equilibrium where and how civilizations get stuck. That is a title that needs some explaining. What do you mean by inadequate? What do you mean by equilibrium, and how does this relate to civilizations? Getting stuck?
so one way to look at the book, is that it's about how you can get crazy, stupid, evil, large systems without any of the people, inside them being crazy, evil or stupid. I think that a lot of people look at various aspects of the dysfunction of modern civilization and they sort of hypothesize evil groups that are profiting from the dysfunction and sponsoring the dysfunction, and if only we defeated these evil people, the system could be rescued and it's more complicated than that. But what are the details? The details matter, a lot, how Do you have systems full of nice people doing evil things yeah an. I often reference this problem by citing the power of incentives there are many other ideas here which are very useful to think about which
capture what we mean by the power of incentives, and there are few concept here. We should probably mention what is a coordination problem to something you reference in the book. A coordination problem is where there a better way to do it, but you have to change more than one thing at a time. So so an example of a problem is if
Let's say you have Craig's list, which is one system where buyers and sellers meet to buy and sell. Like you use things within the local geographic area, let's say that you have an alternative to Craig's list and your alternate is down this list and dance list is like genuinely better. Let's, let's not worry for second about how many start. Ups think that without it being true suppose, it's like genuinely better all of the sellers on Craigslist want to go. Someplace that there's buyers all of the buyers on Craigslist want to go someplace that their sellers. How do you get your new system started when it can't get started by like one person going on to dance list in two people going on to dance list? There's no motive for them to go there tell there's all
ready, a bunch of people on dance list and an awful lot of times when you find a system that is stuck in an evil space. What's going on with it is that for to move out of that space more than one thing inside it would have to change at a time. So there's all these nice people inside it who would like to be in a better system but every thing they could locally do on their own initiative. It's not going to fix the system and it's going to make things worse for them. That's the kind of problem that scientists have with trying to get away from the journals that are just rip them off and they're starting to move away from those journals, but yeah like journals, have prestige based on the scientists that published there and the other scientists that site them, and if you just I start this one new journal all by yourself and move there all by yourself. It has a low impact factor, so everyone's got to move simultaneously and that's how the how the scam went on
for like ten years. You know ten years is a long time, but they couldn't all jump 'cause it to the new system 'cause. They couldn't jump one at a time right there problem is that the world is organized in such a way that it is rational for each person continue to behave the way here he is behaving in this highly so optimal way, given the way everyone else is behaving and to change change your behavior by yourself isn't sufficient to change the system. And is therefore locally irrational, because your life get worse. If you change by yourself, everyone has to cordon they're changing so as to move to some better equilibrium. That's one of these sort of that's like one of the fundamental foundational ways that systems can get stuck.
There are others. The example that I often use when talking about problems of this sort is life in a maximum security prison which is as perversely, bad as well. I can imagine, and the sentence are aligned in such a way that no matter how good you are, if you're put into a maximum security prison, it is only rational for you to behave terribly and on ethically and in such a way as to guarantee that this place is far more pleasant than it need be just because of how things are structured. So it example that I've used that people are familiar with this point from having read books and seen movies that depict more or less accurately whether or not you're racist you're. Only rational choice apparently is to join a gang. That is aligned along the variable of race, and if you fail to do this you'll be preyed upon by everyone. So if you're, a White Guy
You have to join the Darien NEO Nazi gang, if you're a black guy, you have to join the black gang, otherwise, you know, you are just in the middle of this war of all against all and there's no way for you based on your ethical commitment to being non racist to change how this is functioning, and we're living in a similar kind of prison of sorts. When you just look at how nonoptimal many of these tractors It's our that we're stuck in civilization, Lee sort of like parenthetically. They do want to be slightly careful about worse using the word rational describe the behavior of people stuck in the system 'cause. I consider that to be like a very powerful word and it's possible that if they were all really irrational and had common knowledge of rationality, they would be able to solve the coordination problem, but humanly speaking, not so
in terms of ideal rationality, but in terms of what people can actually do and the options they actually have? Their best choice is still pretty bad, systemically yeah. So what do you do? in this book are aware. How would you summarize your thesis, but how do we move forward is there? Is there anything to do apart from publicizing the structure of this problem? It's not really a very hopeful book in that regard, it's more about how to predict which parts of society will perform poorly to the point where you as an individual can manage to do better for yourself really the one of the examples I give him the book
is that my wife has seasonal, affective disorder and I'm like okay, so there's theirs, and she cannot be treated by the tiny little white boxes that you that your doctor tries to prescribe time like okay. If the sun works there's some about of light that works, how about? If I just try stringing up the equivalent of one hundred light bulbs in our apartment. Now, when you have an idea like this somebody, my task well, okay, but like you're, not thinking in isolation, there's a civilization around you. If this works shouldn't there be a record of it shouldn't. We have investigated it already, The probably more than one hundred million people around the world expect, especially in the extreme latitudes, who have some degree,
it'll affect disorder, and some of it's pretty bad of that means that there's this sort of a kind of prophet kind of Energy, Grady Int that seems like it could be. Traversable if it. If solving the problem was as easy as putting up a ton of light bulbs in your apartment. Wouldn't one research enterprising researcher have investigated this already. Wouldn't the results be known, and the answer is, as far as I can tell it hasn't been, the results are known, and when I tried putting up a ton of light bulbs, it seems to have worked pretty well for my wife not perfectly, but a lot better than it used to be
why isn't this one of the first things you find when you google? What do I do about seasonal, affective disorder when the Light box doesn't work and that's what takes the sort of long story? That's what takes the analysis that would take that's what that's! What takes the thinking about the the journal system and what the funding sources are for, like people investigating seasonal, affective disorder and what kind of publications get the most attention and whether it's the the berrier of needing to put up a hundred light bulbs and a bunch of different apartments for people in the controlled study, which would be like difficult to blind, except maybe by using your life but
a lot fewer lightbulbs, whether the details of having to adapt to light bulbs to every house which is different, is that enough, with an obstacle to prevent any researcher from Ed F, are investigating this obvious seeming solution to a problem that probably over hundreds of millions of people, have, and maybe fifty million people or something have very severely. As far as I can tell. The answer is yes, and this is the kind of thinking that does not enable you to save civilization. If there was a way to make an enormous profit by knowing this it would probably already be taken if it was possible, for you is one if it was possible for one person to fix the problem. Probably already be fixed, but you personally fixed fix your wife's crippling seasonal, affective disorder by doing something
what science knows not because of an inefficiency in the funding sources for the researchers- and this is really that glow no problem- we need to figure out how to tackle, which is to recognize those points on which incentives are perversely, misaligned, so as to guarantee needless suffering or Compl XD, or failure to make breakthroughs. That would raise our quality of life immensely identify those points and then re align the incentive. Somehow the market is in many respects good at this, but there are places where it obviously fails. We don't have many tools to apply the right usher. Here you have the profit motive in market, so you can either get fantastically rich by solving some problem or not, or we have governments that can decide what is a problem that markets can't solve because the wealth isn't there to be
gotten strangely and yet there's a amount of human suffering that would be alleviated. If you solve this problem, you can't get people for some reason to pay for the alleviation of that suffering reliably. But apart from more tickets and governments, are there any other large hammers to be wielded here I mean sort of crowdfunding. I guess, although the hammer currently isn't very large, but I mean mostly like I said this book is about where you can do better individually or in small groups and when you shouldn't assume that society knows what it's doing, and it doesn't have a bright message of hope about how to fix things. I'm sort of like prejudice personally over here because I think that the artificial general intelligence time one is likely to run out before you mandy- gets that much better at solving inadequacy systemic problems in general. I I'd like I
I don't really see human nature or even human practice, changing by that much over the amount of time we probably have left and Economists already know about market failures, that's the concept they already have already have the concept of government trying to correct it. It's not obvious to me that there is a quantum leap he made saying within just those dimensions of thinking about the problem. If you ask me hey yes, it's five years in the future, there's still no artificial, general intelligence. Yeah people are that's like a clue. A a great leap forward has occurred in people to deal with these types of systemic issues. How did that happen, then? My guess would
be something like Kickstarter, but much better. That turned out to enable people in large groups like move forward when none of them could move forward individually, some something like the the the group movements that that scientists made without all that much help from the government, although there was help from from funders changing their policies to jump to new journals all at the same time and. Get partially away from the the Elsevier Closed source journal scam, maybe maybe there's something brilliant that Facebook does with machine learning. Even that enables these like that they they get better at showing people things, that are solutions to their coordination problems they're better at routing, though
around when they exist, and people learn that these things work and they they jump using them simultaneously and by this means voter start to elect politicians who are not in complex, as opposed to like a choosing, whichever name complete but offers most appealing but the. But this is a fairy tale This is not a prediction. This is, if you told me that somehow this had gotten fixed in five years, or rather like gotten significantly better in five years. What happened this is me making up with what might have happened right, but I don't see how that deals with the main eh. My concern we've been talking about. So I can see some shift or some solution to a massive coordination problem politically or in the level of widespread human, be Savior, so, let's say our use of
social media and our vulnerability to fake news and conspiracy theories and other crack pottery. Let's say we find some way to ols after our information diet and our expectations and solve a coordination problem that radically cleans up our global conversation. I could see that happening, but when you're talking about dealing with the alignment problem, you're talking about changing the behavior of a tiny number of people, can apparently I don't know what it is. What's the community of AI researchers now I mean it's gotta be numbered? Really? the hundreds when you're talking about working on a but like What will it be when we're close to the finish line? How many my I would have to suddenly change, can become immune to the wrong economic incentives to coordinate. Solution there. What are we talking about? Ten thousand people, I mean. First of all, I do
Think we're looking at an economic problem. I don't, I think, that artificial general intelligence capabilities once they exist, are going to scale too fast for that to be a useful way look at the problem as A0 going from zero to one twenty miles per hour in for our or a day. That is not out of the question here and even if it a year a year is still a very short amount of time for things to like scale up. I think that that what we're looking at with respect to artificial general intelligence, that the main thing you should be trying to do with the first artificial general intelligence ever built is a very narrow, ambitious task. That. Shut down the rest of the arm, brace by putting off switches in all the gpu's and shutting,
come down if anyone seems to be trying to build an overly art, initially intelligence system, and, if you, because I I don't think that that the AI you have built Neroly enough that you understood what it was doing is like going to be able to. From arbitrary unrestrained superintelligence is the ai that you have built understandably enough to be good and not just let down like fully general recursive self improvement is not strong enough to solve the whole problem. It's not strong enough to have everyone else going off in developing their own artificial general intelligence is after that, without that automatically destroying the world. So now, what can you say speaking for now over two hours? What can you say to someone who has followed us this long, but ever reason I think we've made has not summed to being
notionally responsive to the noises you just made. Is there anything that can be briefly said so as to give them pause? I'd say this is a thesis of capability gain. This is the thesis of how fast, artificial general intelligence gains in power once it starts to be around whether we're looking at twenty years, in which case the scenario does not happen or whether we're looking at something close, to the speed at which go was developed, in which case it does happen or the case or the speed at which alpha zero went from like zero, two to one twenty and better than human in, in which case, in which case there's there's a bit of an issue that you better preparing for in advance: 'cause you're not going to have very long to prepare for it once it starts to happen, and I would say this is a computer science issue. This is not here to be part of a narrative.
This is not here to fit into some kind of grand moral lesson that I have for you about how civilization ought to work. I think that this is just the way the background variables are turning up. Why do I think that it's not that simple? It's a mean. I think a lot of people who see the power of intelligence will already find that pretty into it is. But if you don't then You should read my paper intelligence explosion, microeconomics about returns on cognitive reinvestment and it points out. It goes things like the evolution of human intelligence and the logic of evolutionary biology tells us that, when human brains were increasing in size, there were increasing marginal returns to fitness relative to the previous generations for increasing brain size, which means that it's not the case
is that, as you scale intelligence, it gets harder and harder to buy. It's not the case that, as you scale intelligence, you need exponentially larger brains to get linear improvements, at least something like the opposite of. This is true, and we can tell this by looking at the fossil record and using some logic, but that's not simple. Comparing ourselves to chimpanzees works, we don't have brains that are forty times the size. Are four hundred times the size of chimpanzees, and yet we're doing by I don't know what measure you would use, but it exceed What they're doing by some ridiculous factor- and I find that convincing, but other people may want additional details and my message be that the emergency situation is not part of a narrative. It's like not there to make to make the point of some kind of moral lesson. It's my
prediction as to what happens after walking through a bunch of technical arguments as to how fast intelligence scales when you optimize it harder Alfa zero is seems to me like a genuine case in point that is showing us that capabilities that in humans require a lot of tweaking and that human civilization built up over centuries of masters teaching students how to play go and that no individual human could invent in isolation. Even the most talented go player if you stop them down in front of the GO board and gave them only add a wood wood, wood plate. Wood would like play garbage if they had to like an event all of their own go strategies without being part of a civilization that played, though they would like to be able to defeat modern, go players that all Alphazero blew past. All of that, nothing today, starting from scratch, without looking at any of the games that humans played without looking at any of the theories that humans
about go without with you any of the accumulated knowledge that we had and without very much in the way of special case code for go rather than chest and back to zero special case code for go rather than chess, and that is, and that in turn is an example of that refutes another thesis how artificial, general intelligence developed slowly and gradually, which is well. It's just one mind. It can't beat our whole civilization say that there's a bunch of technical arguments which you walk through and then, after walking through these arguments, you assign a bun probability, maybe not certainty, artificial intelligence that scales and power very fast a year or less, and in this situation, if it is, if alignment is technically difficult, if it is easy to see, grew up if it requires a bunch of additional effort and you
scenario, if we have an arms race between people who are trying to get their agi first by doing a little bit less safety caused from their perspective, that only the probability of little and then someone else is like oh no, we have to keep up. We need to like strip off the safety work, two less strip off a bit more, so we can get in front if you have the scenario and by a miracle, the first people across the finish line, actually not screwed up they actually have functioning powerful, artificial general intelligence that is able to prevent the war from ending you have to, about the world from ending, you are a terrible, terrible situation. You've got your one miracle, and this follows from the rapid capability game thesis and at least the current landscape for how these things are developing. Let's just laying on this point for a second this fast take off, is this so mean recursive self improvement and how french an idea is this in the
fields are most people who are thinking about this, assuming for good reason or not, that a slow takeoff is far more likely over the course of many many years, and that analogy Alfa zero is not compelling. I think they are too busy explaining why current artificial intelligence methods do not know ugly quickly immediately give us artificial general intelligence from which they then conclude that thirty years off, they have not said, and then once we get, there is going to develop much more slowly than zero and here's why there isn't a thesis to that effect that I've seen from artificial intelligence people Robin Hanson, had a thesis to this effect, and there was this:
the debate on our on our blog between Robert Hanssen myself, it was published as the a I spoon debate, many book and- and I have claimed recently on Facebook, but now that we've seen alpha zero alpha zero seems like strong evidence against Hanson thesis for how these things nothing, early go very slow, be cause, they have to duplicate all the work done by human civilization and that's hard, but I'm actually be doing a podcast with Robin in a few weeks actually live event. So what's the best version of his argument, and then why is he wrong? Nothing can prepare you for Robin Hanson. Well. The argument that Hanson has get has given is that Sis sims are still immature and narrow, and things will change when they get general, and my reply has been something like
okay, what changes your mind short of the world actually ending. If your theory is wrong, do we get to find out about that at all before the world ends to which he says I don't remember. If he's replied to that one, yet, Robin? Be Robin? Well, listen a been great to talk to you and I'm I'm glad we got a chance to do it at such length and again this does not exhaust the interest or consequences of this topic, but it's certainly a good start for people who are new to this. Before I but you go. Where should people look for you on line? Do you have a preferred domain that we could target out mostly say intelligence, DOT or GE, if you're looking for me personally, um facebook, dot, com, slash Dutkowsky and if you're looking for my most recent book could it be a book dot com put links on my website where I embed this podcast. So again, as thanks so much and to be continued. I always love,
talking to you, and this will not be the last time a I willing this was a great conversation, Thank you very much for having me on. If you find. Podcast valuable. There are many ways you can support it you can review it on Itunes or Stitcher or wherever you happen to listen to it. You can share on social media. With your friends, you can block, about it or discuss it on your own podcast or you can support it directly and you do this by subscribing through my website at SAM Harris, DOT, org and there you find subscriber only content which includes my ass anything episodes. You also get access advance tickets to my live events as well as streaming video of some of these events. You also get to hear the bonus questions from many of these interviews.
Transcript generated on 2019-09-15.