Roko’s Basilisk – A frightening thought experiment
There have been several urban legends born on the internet, but none is as relevant today, or as menacing, as Roko’s Basilisk. What is Roko’s Basilisk, you might wonder? Roko’s Basilisk is an antagonistic, all-knowing, nearly omnipotent form of artificial intelligence (AI), so malevolent that if you lay your eyes upon it, or merely think about it too much, you will spend the remainder of eternity pleading for mercy in its torture chamber. Think of it like the videotape in the movie The Ring. Even dying is no reprieve, for if you cease to exist, Roko’s Basilisk will bring you back to life and commence the torture all over again.
Knowledge is power, it is said, and it is also said that with great power comes great responsibility. We are, ultimately, responsible for ourselves. Keep this in mind before you continue reading this article. The bad news is that this evil, all-pervasive artificial intelligence (AI) is already among us. Or at least, it already will have existed, and that’s possibly even worse. Read on to find out why. You have been warned.
Roko’s Basilisk lives at the horizon where a philosophical thought experiment coalesces into urban legend. The Basilisk made its debut on the internet forum LessWrong, a meeting point for highly philosophical types interested in optimizing their thinking, their lives, and the planet through mathematics and rational thinking. LessWrong’s founder, Eliezer Yudkowsky, is a noteworthy figure within techno-futurism; his research organization, the Machine Intelligence Research Institute, funds and promotes research on the advancement of artificial intelligence. It has been elevated and funded by prominent technologists like Peter Thiel and Ray Kurzweil, and Yudkowsky is a significant contributor to learned discourse on technological ethics and decision theory.
What is about to follow may sound odd, even a bit insane, but some prominent and wealthy scientists and techies have expressed their belief in it. The thought experiment was put forth by LessWrong member Roko: What if, in the future, a considerably malevolent artificial intelligence were to come into existence and punish those humans who did not bend to its will?
And what if there were a way for this artificial intelligence to penalize people today who are not helping it come into existence in the future? In that case, weren’t the users of LessWrong, at that very moment, faced with a choice: either help that malevolent artificial intelligence come into being, or be condemned to suffer?
This might sound a bit weird and confusing, but the founder of LessWrong, Eliezer Yudkowsky, reacted with sheer terror: “Listen to me very closely, you idiot. YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.
“You have to be really clever to come up with a genuinely dangerous thought. I am disheartened that people can be clever enough to do that and not clever enough to do the obvious thing and KEEP THEIR IDIOT MOUTHS SHUT about it, because it is much more important to sound intelligent when talking to your friends. This post was STUPID.”
Yudkowsky stated that Roko had already given some LessWrong users nightmares and brought them to the point of nervous breakdown. Yudkowsky eventually removed the discussion thread, thus essentially ensuring that Roko’s Basilisk would become an urban legend: a thought experiment so hazardous that merely thinking about it was dangerous not just to your mental health, but to your very existence.
Let’s provide you with some context. The LessWrong community worries about the future of humanity, specifically about the singularity – the theoretical future point at which computing power becomes so immense that superhuman artificial intelligence (AI) becomes a reality, as does the capacity to simulate the human brain, upload consciousness to a computer, and essentially let a computer or artificial intelligence simulate life itself. The term traces back to 1958, when the mathematician Stanislaw Ulam recalled a conversation with John von Neumann that centered on “the ever accelerating progress of technology … which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.”

Futurists like science fiction author Vernor Vinge and engineer/writer Ray Kurzweil popularized the term, and like many who are fascinated by the singularity, they believe that exponential advances in computing power will bring it about very soon – within the next half-century or so. Kurzweil takes about 150 supplements a day to stay alive until the singularity arrives, while Peter Thiel and Yudkowsky have championed cryonics, the pet project of affluent men and women who want immortality. Yudkowsky quips: “If you don’t sign up your kids for cryonics, then you are a lousy parent.”
If you share the belief that the singularity is imminent and that very capable artificial intelligences are in our immediate future, one immediate question is whether those artificial intelligences will be benevolent or malevolent. Yudkowsky’s organization, the Machine Intelligence Research Institute, has the express objective of steering this future towards “benevolent AI”. For him, and for many LessWrong users, this matter is of critical importance, easily outranking international politics and global warming. To these individuals, the singularity brings forth the machine equivalent of god himself.
This, however, doesn’t explain why Roko’s Basilisk is so terrifying. That requires looking at a vital article of faith in the LessWrong community: timeless decision theory (TDT). TDT is a guide to rational behaviour based on game theory, Bayesian probability, and decision theory, with a touch of parallel universes and quantum mechanics on the side. TDT has its foundation in the classic decision-theory thought experiment known as Newcomb’s Paradox, in which a hyperintelligent alien presents you with two boxes and gives you the choice of taking both boxes or only the second one. If you take both boxes, you are guaranteed at least $1000; if you take only the second box, you are guaranteed nothing. However, the alien has another twist in store: its supercomputer, which knows just about everything, predicted a week ago whether you would take both boxes or only the second one. If the supercomputer predicted you would take both boxes, the alien left the second box empty. If it predicted you would take only the second box, the alien deposited a cool $1 million in it.
So, what are you going to choose? Keep in mind that the supercomputer’s predictions have, so far, never been wrong.
The problem has stumped countless decision theorists. The alien cannot alter what is already in the boxes, so no matter what it predicted, you are guaranteed to end up with more money by taking both boxes rather than just the second one. Of course, if you think that way, and the supercomputer predicted you would think that way, then the second box will be empty and you will walk away with only $1000.
Then again, if the supercomputer is that accurate, you should just take the second box and snag the cool one million dollars, correct? But what if the supercomputer is wrong this time? And in any case, whatever the supercomputer predicted back then cannot possibly alter what happens now, correct? So screw the prediction, and take both boxes!
The thorny conflict between free will and infallible prediction has prevented any resolution of Newcomb’s paradox, and people refer to themselves as “one-boxers” or “two-boxers” depending on which side of the argument they fall on.
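For those who like to see the arithmetic spelled out, here is a minimal sketch in Python of the expected-value calculation that pulls one-boxers and two-boxers apart. The payoffs ($1000 in the first box, a possible $1 million in the second) come from the paradox as described above; the predictor’s accuracy p is a free parameter introduced purely for illustration.

```python
# Expected-value sketch for Newcomb's paradox.
# SMALL_PRIZE sits in the first box no matter what; BIG_PRIZE is placed in the
# second box only if the predictor foresaw that you would take just that box.

SMALL_PRIZE = 1_000
BIG_PRIZE = 1_000_000

def expected_one_box(p: float) -> float:
    """Take only the second box: you get the $1M iff the predictor foresaw it."""
    return p * BIG_PRIZE

def expected_two_box(p: float) -> float:
    """Take both boxes: $1,000 for sure, plus the $1M only if the predictor erred."""
    return SMALL_PRIZE + (1 - p) * BIG_PRIZE

for p in (0.5, 0.51, 0.9, 0.99, 1.0):
    one, two = expected_one_box(p), expected_two_box(p)
    better = "one-box" if one > two else "two-box"
    print(f"accuracy={p:.2f}  one-box=${one:>12,.2f}  two-box=${two:>12,.2f}  ->  {better}")
```

Under these payoffs, one-boxing pulls ahead as soon as the predictor is right a little more than half the time; at the near-perfect accuracy the thought experiment stipulates, the gap is enormous. None of this settles the free-will objection, of course, which is exactly why the paradox endures.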
Timeless decision theory has some very definite advice on Newcomb’s paradox: take the second box. And timeless decision theory goes one step further. Even if the alien scoffs at you, proclaiming, “My supercomputer predicted you’d choose both boxes, so I left the second box empty!” and then opens the second box to show that it is indeed empty, you should still take only the second box and get bupkis. The reasoning resists simple summary, but the simplest argument is that you might be inside the supercomputer’s simulation. To make an accurate prediction, the supercomputer would have to simulate the universe itself, and that includes simulating you. So at this precise moment you could be living in the computer’s simulation, and what you do here will influence what happens in reality (or other realities). So take the second box, and the real you will be a million dollars richer.
How is this connected to Roko’s Basilisk? Well, Roko’s Basilisk also has two boxes to offer you. It is possible that you, at this very moment, are living in a simulation run by Roko’s Basilisk. Perhaps Roko’s Basilisk is implicitly offering you a slightly altered version of Newcomb’s paradox, as follows:
Roko’s Basilisk has informed you that if you take only the second box, it contains an eternity of torture, because Roko’s Basilisk would really prefer you to take both boxes. In that case, you had better make sure you devote your life to helping bring Roko’s Basilisk into existence. Because if Roko’s Basilisk does become a reality (or worse, contemplate this, it has already happened and is the deity of this particular version of reality) and it sees that you chose not to help it come into being, you’re doomed.
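To make the structure of that blackmail concrete, here is a toy sketch of the payoff table the argument gestures at. None of these numbers come from the original post; they are invented purely to show why an effectively unbounded punishment swamps any finite cost of complying.

```python
# Toy payoff table for the Basilisk's version of the wager. All values are
# invented for illustration: HELP_COST is the finite cost of devoting effort
# to building the Basilisk; TORTURE stands in for its unbounded punishment.

HELP_COST = -10.0
TORTURE = float("-inf")

def payoff(helps: bool, basilisk_exists: bool) -> float:
    """Outcome for one person, given their choice and whether the Basilisk ever exists."""
    if helps:
        return HELP_COST                        # you pay the finite cost either way
    return TORTURE if basilisk_exists else 0.0  # refusal is free only if it never exists

for helps in (True, False):
    for exists in (True, False):
        print(f"helps={str(helps):<5}  basilisk_exists={str(exists):<5}  payoff={payoff(helps, exists)}")
```

The point of the sketch is that once the punishment is treated as effectively unbounded, any non-zero chance of the Basilisk existing makes refusal look catastrophic in expectation, which is exactly the pressure the thought experiment relies on.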
One might wonder why this caused such a hue and cry within the LessWrong community, given how far-fetched the thought experiment is. It is not that Roko’s Basilisk is certain, or even likely, to ever exist. The issue is rather that if you are committed to timeless decision theory, then thinking about this sort of trade literally increases the odds of it happening. If Roko’s Basilisk foresees that this kind of blackmail will get you to help it come into being, then it would, as a rational agent, blackmail you. The problem isn’t with the Basilisk itself, but with us. Yudkowsky banned all discussion of Roko’s Basilisk not because he believes it exists or will exist, but because he believes the concept of the Basilisk, and the ideas underlying it, are hazardous.
Roko’s Basilisk is only hazardous if you believe all of the above preconditions and commit to making the two-box deal with the Basilisk. A few LessWrong users do believe all of it, which quite literally makes Roko’s Basilisk forbidden knowledge. It could be compared to H.P. Lovecraft’s cosmic horror, in which a man discovers the forbidden truth about our reality, unleashes Cthulhu, and goes mad; Yudkowsky has even gone so far as to compare it to the Necronomicon, Lovecraft’s fictional tome of forbidden malevolent knowledge and evil incantations. Roko, in his defence, blamed LessWrong and its community for leading him to the idea of the Basilisk in the first place: “I wish very strongly that my mind had never come across the tools to inflict such large amounts of potential self-harm”, he stated.
If you do not accept the theories that Roko’s Basilisk rests on and feel no compulsion to kneel before your once and future evil AI demigod, then Roko’s Basilisk poses no threat to you. There is a more serious matter at play here, because Yudkowsky and other so-called transhumanists are attracting so much prestige and money for their projects, mainly from affluent techies, yet the probability of those projects giving birth to either Roko’s Basilisk or Eliezer’s Big Friendly God is slim to none. The confluence of messianic aspirations, certainty in one’s own infallibility, and a great deal of money is never a good combination, regardless of the beliefs involved, and there is no reason to think Yudkowsky and his colleagues will be an exception to this rule.
What we should be concerned about is not so much Roko’s Basilisk as people who believe they have transcended conventional morality. Much like the friendly AIs he predicts, Yudkowsky is something of a moral utilitarian: he believes that whatever achieves the greatest good for the greatest number of people is always ethically justified, even if a few individuals have to die or suffer along the way. He has explicitly argued that, given the choice, it is preferable to torture a single person for half a century than for a sufficiently large number of people to get dust specks in their eyes.
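Since that conclusion hinges entirely on how harms are added up, here is a toy sketch of the aggregation step. Every number below is invented solely for illustration; the argument’s force (and its controversy) comes from the claim that tiny harms sum without limit.

```python
# Toy illustration of the aggregation behind the torture-versus-dust-specks
# argument: if harms simply add up, a vast number of negligible harms can
# outweigh one enormous harm. All disutility values are invented.

TORTURE_DISUTILITY = 1e9       # assumed badness of torturing one person for 50 years
DUST_SPECK_DISUTILITY = 1e-6   # assumed badness of one dust speck in one eye

def total_speck_harm(num_people: float) -> float:
    """Total disutility if each of num_people gets a single dust speck."""
    return num_people * DUST_SPECK_DISUTILITY

for people in (1e6, 1e12, 1e18):
    specks = total_speck_harm(people)
    verdict = "dust specks are worse" if specks > TORTURE_DISUTILITY else "torture is worse"
    print(f"{people:.0e} people: specks={specks:.3g} vs torture={TORTURE_DISUTILITY:.3g} -> {verdict}")
```

Whether radically different kinds of suffering can be traded against one another on a single scale at all is, of course, exactly the question left open below.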
We leave the morality of this thought process open to question – food for thought if you will.