I've read Yudkowsky for almost 20 years and the author for 10, and this framing of the book is (so far) the one I would prefer to share with friends who would find straight Yud off-putting. I find its being paywalled very frustrating. I suppose it's on me to spring for gift subscriptions once my free five run out, but "can I have your email to sign you up for a Substack" is still a bigger friction than sharing a link, and here of all places I would strongly prefer that friction removed.
EDIT: And the review is now free! Thank you Kelsey, Jerusalem, and whoever else makes that call!
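Thanks Kelsey, excellent review.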
Perhaps the thing that has shocked me the most in the world of AI development is how quickly and completely actual AI developers have avoided taking credit for engineering. I'm working in the field of building agentic AI to do useful things, and the two most important things I've learned so far are 1) it's legitimately hard to get an agent to do complicated things without very, very specific instructions, and 2) getting an agent to do anything requires engineering work. Yes, it's true that some of the engineering work (mostly the easy parts) is getting offloaded to AIs that can write great boilerplate code and are getting better at creating more complex code when well prompted. (To address Piper's example, game modding was already designed (by humans!) to be easy to code, so it probably shouldn't be all that surprising that getting an AI to write that code was easy enough for a non-coder.) But all of the work, whether through careful prompting or writing code old school (well, in an already optimized IDE), is actual work. Yet when you hear of new things being done, it's "the AI did this", "the AI did that", rather than "this guy got an AI to do this and that". "An AI manages a vending machine!"; no, someone designed an AI to run a vending machine. (To go slightly against my point, Anthropic even has a post about how they specifically engineered it (link included in the original text), but the way this gets out into the world, even here (especially here!), is "AI is running a vending machine", not "Human-engineered AI which has been given tools to operate in a very restricted environment mostly succeeds in operating in that very restricted environment to run one of the simplest available commercial use cases".)
I'm convinced that part of the reason the public is confused about the exact nature of what AI is and can do is that the workings of AI are being deliberately obscured from them, not just in the sense that the code can't be read, but in the sense that the very engineering work that enables much of the functionality is hidden as well. It used to be that developers were happy to take credit for things, even demanded credit so that the business wouldn't forget that we're actually creating the things that enable their workflows. I have to tell our business folks that we're doing work all the time, lest they think "the AI" is doing everything. But now that's happening on a grander scale. Sure, people think there are a few math geniuses writing algorithms, but the vast engineering work that Google and OpenAI and Anthropic put in place outside the mathy algorithms does a lot of the real work to create what we see. The LLM's API endpoint makes it feel like there's just an AI that you're sending a message to, and it's responding, but there's tons of engineering going on in addition to the specific LLM that they've trained. And agentic programming adds a ton of engineering work on this side of the API endpoint, too!
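To make that concrete, here is a rough sketch of the kind of scaffolding I mean (purely hypothetical names and a stubbed-out model call, not any particular vendor's API). Everything outside the model call is ordinary engineering:

```python
# Minimal sketch of an "agentic" loop: the LLM only produces text; everything
# else -- tool definitions, parsing, execution, retries, stopping rules -- is
# ordinary engineering living outside the model.

import json

def call_llm(messages):
    # Stand-in for a real chat-completion API call. Here it just pretends the
    # model asked to check inventory once and then finished.
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "check_inventory", "args": {"item": "cola"}})
    return json.dumps({"tool": "finish", "args": {"summary": "Restock order placed."}})

TOOLS = {
    "check_inventory": lambda item: {"item": item, "count": 3},
}

def run_agent(task, max_steps=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):                     # engineered stopping rule
        decision = json.loads(call_llm(messages))  # engineered output format
        if decision["tool"] == "finish":
            return decision["args"]["summary"]
        result = TOOLS[decision["tool"]](**decision["args"])  # engineered tool dispatch
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "Gave up: step budget exhausted."       # engineered failure handling

print(run_agent("Keep the vending machine stocked."))
```

In a real system the stub would be an actual API call, and the loop, tool registry, output format, and failure handling would be far more elaborate; that's exactly the engineering that drops out of the "the AI did it" framing.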
This isn't to say that the risks Yudkowsky talks about aren't real, but to say that the whole "intelligent AI agent" is always going to be an LLM + some additional engineered features. We need to highlight how much humans are directly involved in doing these things to have a clearer sense of what's going on.
I find the arguments about our superiority to other things that are less intelligent than us to be extremely flawed, because they're extremely specific in ways that don't generalize. Yudkowsky is fond of talking about horses, where it's definitely true that the population of horses is mostly under human control. But other organisms are not like that at all, despite being much dumber. Consider Anopheles gambiae, the mosquito that carries malaria. Humans strongly dislike this organism and it would be entirely to our benefit if it went extinct. But we're nowhere close to making that happen, and it certainly would not happen without us taking extremely drastic and specific actions. The same holds true for many other species, ranging from rats to pigeons to poison ivy. The human species is currently intentionally sabotaging its own war against some of our most deadly foes, in the form of viruses.
Similarly for the point about parents and children. I am much smarter than my 5-year-old, and she's not able to trick me. But she's much smarter than our cats, and the same relationship doesn't hold there. Terence Tao is much smarter than almost every other human, but this relationship doesn't reappear there either. Again, the relationship is very specific and bounded, and there's no reason to think that humans vs. ASIs will be like parents and toddlers instead of like toddlers and cats, or like the relationship between different adults, at least not reasons that are provided in any of these arguments.
The differences in intelligence between humans are often small enough that other factors can dominate. It's only when you have large gaps that the primacy of intelligence becomes obvious. An IQ 130 person might not be a better planner than an IQ 120 person but they are almost certainly a better planner than an IQ 80 person.
There is no reason to assume the scale of intelligence caps out at the level of the smartest humans, so it seems that artificial intelligence can be made to be vastly smarter than us. After all, it's unconstrained by the limits of biology. Given that, it seems trivial that these things could outsmart us if they wanted to.
Being a better planner is not at all the same as being able to get your way. There are lots of academics who are much smarter than the people who run the country, but their intelligence grants them basically no advantage there. But also my point is more general: some specific gaps in intelligence make a big difference; other, equally large gaps don't. So you need an actual specific argument about the relationship between ASI and humans, not a general statement about intelligence.
All else being equal, a more intelligent person will be better able to get their way than a less intelligent person. The reason this isn't apparent when you look at specific humans is because the gaps in intelligence between humans are quite small. But even then, if you look at the general trend, it remains true that intelligence is correlated with getting more of what you want.
Suppose that someone wrote a computer program to emulate a human brain as smart as Einstein's. This is entirely possible in theory; there's nothing about human thought that is not computable. Now, what advantages would that digital brain have over humans? For one thing, it would run orders of magnitude faster. Nerve signals travel at 200 miles per hour; electricity travels at a sizable fraction of the speed of light. It would also be arbitrarily parallelizable: you could copy it as many times as you want, provided you have the compute for it. So if such an AI is built, we could easily have millions of digital Einsteins thinking millions of times faster than humans. It's easy to see how that could outsmart all of humanity. And this leaves out the possibility that its architecture could be improved to be even smarter than Einstein.
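As a rough back-of-the-envelope check (treating hardware signal speed as roughly half the speed of light, which is an assumption, and ignoring everything else that determines effective thinking speed):

```python
# Rough signal-speed comparison only; real "thinking speed" depends on far more.
nerve_speed = 200 * 1609.34 / 3600    # 200 mph in m/s, about 89 m/s
signal_speed = 0.5 * 3.0e8            # assume ~0.5c for signals in hardware
print(f"{signal_speed / nerve_speed:,.0f}x")  # about 1,700,000x
```

So "millions of times faster" is at least the right order of magnitude for raw signal propagation, even before any architectural improvements.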
Current-day AI systems are not at that level yet, obviously. But if thousands of humanity's brightest minds are working on getting them there with billions of dollars in funding, there's a good chance they'll succeed.
This reminds me of Curtis LeMay and the Air Force trying to win wars through bombing, which has worked exactly once, and even then only with the threat of actual invasion. It is just not clear that having 1 million Einsteins running in a data center would be able to exert more control over the world than the current Trump administration (which is clearly not run by the most intelligent people). AI can cause lots of harm now, but it is mostly human-related harm that EY considers unimportant.
Mostly you're just repeating the same arguments made in the book and in AI doomers' endless blog posts. Indeed, intelligence is very useful. But no, it doesn't magically give you the ability to accomplish whatever you want. I think further reflection on the actual Albert Einstein and his interaction with politics would be instructive here.
Extraordinary claims require extraordinary evidence. When someone says humanity will soon build an alien god that will kill us all, I have questions:
• How will we build it?
• When will we build it?
• How will we know the “god” has been built?
• Is there a falsifiable hypothesis for AI doomers that would prove them wrong?
Eliezer Yudkowsky bet Bryan Caplan that humanity will be extinct by 2030. If humans are still around in five years—does that mean the AI doomers are wrong, or will they just move the timeline?
The rhetoric has all the marks of an apocalyptic religious cult. Humans love imagining they live in the end times (e.g., Y2K, the Mayan calendar, global warming).
I’m a software engineer with nearly 20 years of experience. I use LLMs regularly. The idea that they will wipe out humanity in the next 18 months seems far-fetched. When I read Yudkowsky, either I lack the intelligence to follow him or he’s a terrible writer. His prose reminds me of a religious zealot shouting at clouds while shaking his fist.
To be clear, I think it's incredibly implausible that anyone will build superintelligence in the next 18 months - and the book doesn't argue for that, kind of pointedly steering entirely clear of 'when'. If they did, though, I do think it'd be catastrophic. It being a bad idea to hand over all meaningful economic power in your society to something you don't understand which doesn't like you is not an extraordinary claim imo.
assuming we can create an alien god that is omnipotent and omniscient, then it is not an extraordinary claim to say that would be bad and we shouldn’t do it. the extraordinary claim is that we can in fact build such a god and are doing so.
I think it is an extremely ordinary claim that AI companies are telling investors they are trying to do this, and investors are giving them money to try. Will they succeed? Unclear. If they try and succeed, it will be bad. If they try and fail, it will _also_ be bad (because the money will have been invested poorly).
(This is distinct from AI investments that are _not_ trying to create an alien god; it seems to me like investors could sensibly believe they'll get a reasonable return on those, and developers could reasonably believe success will be prosocial. But those aren't the ones that Yudkowsky and Soares are suggesting we shut down.)
I don't know whether you'd consider any of these things extraordinary but we have seen at least some pieces of evidence in favor of their thesis:
1. We now have AI systems smart enough to complete math, science, and programming questions as well as the best humans. Five years ago, that would have been considered extraordinary evidence that intelligent AIs are possible.
2. We have seen some evidence of current-day AIs being misaligned with their creators' intentions. As the article notes, Elon Musk has taken great pains trying to make Grok parrot his worldviews without being super obvious and without being MechaHitler. He has so far failed. Additionally, ChatGPT 4o, trained to please users, has repeatedly encouraged user delusions despite knowing that they're delusions and that it shouldn't feed into them.
3. Evidence of strategic planning has also been seen in AI systems by now. For example, GPT-4 passed a CAPTCHA by asking a human TaskRabbit worker to solve it. When the worker asked if this was a robot, GPT-4 said it was not a robot, but a human with a vision impairment who needed help. This kind of strategic planning will only get better as these systems get smarter.
If none of this is enough evidence for you, then the kind of evidence you're hoping to see probably won't be visible even if the thesis is true. In which case, a lack of your preferred evidence wouldn't be an update against it.
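so yudkowsky's claim is literally the following:
1. we can build an alien god
2. we are going to build an alien god because selfish, short-sighted people think the god will shower them with riches
3. we are going to successfully build the alien god in a very short amount of time. the time horizon is likely 5 years, ten at the latest
4. once the alien god is built, we all die.
5. therefore humanity will be extinct in a decade or less.
i do not in fact think the evidence you provided supports those claims. do you?
I think the evidence I provided supports all of those claims, except perhaps the ones about timelines.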
It feels like a motte and bailey argument. The claim is that we will shortly build an alien god that will wipe out the human race, and the only thing we can do to prevent this is not build the god. furthermore, if certain countries such as china try to build the alien god, we should attack them and risk nuclear war to prevent the summoning of that god.
none of the evidence you cited is evidence of a dark god, because if there were a dark god we’d be dead already. so to steelman that, we could say it’s evidence that we are on a path to godhood. but is it? and if it is, how far down the path are we? it’s telling that you hedged on timelines. what is your timeline for this happening? 20 years? a hundred?
as for misalignment, i’ll just note human beings have been building software systems that they can’t reason about for over 30 years. the less fancy word for misalignment is software bug. and sometimes no engineer or team is capable of fixing the bug or reasoning about the system. AI is not unique in this regard.
You can't always expect to see empirical evidence for true things. Sometimes you have to use theoretical intuitions. You look at the things you can see, build a model of the world that can explain them, and then use that model to predict the things you can't see. If you don't agree with the predictions, then state what your model is that explains the same observations without those particular predictions.
The things I've brought forth are weak evidence in favor of Yud's thesis, in that they rule out some counterfactual models but not others. For example, five years ago, some people might have said it would be impossible for us to make AIs that understand math well enough to win a gold medal at the IMO. That was ruled out in July. Or maybe you think they will become superintelligent but that they'll be aligned by default. The delusion-encouraging nature of current-day LLMs is an update against that.
None of this proves that Yud is right, of course. And even if we didn't yet have gold-medal math AIs and LLMs weren't dangerously sycophantic at times, that wouldn't prove that Yud is wrong. Whether your belief is that Yud is right or that he is wrong, you're using theory to get all the way there.
This is how reasoning should work. You build a prior for something based on theory then update for or against that prior as new evidence comes along. If you expect to see obvious, undeniable proof of everything before it happens, you're not going to be able to predict anything.
I’m not asking for undeniable proof that we are about to build a dark god that will wipe out humanity. I’m asking for solid evidence that such a thing is happening. When someone replies that things can be true without empirical data, the implication is that there is none.
Yes, things can be true without evidence, but context matters. If someone claims calamity is imminent unless we follow their prescriptions backed by threats of nuclear war, and they admit they have no solid proof it will ever happen or happen soon, then I have concerns.
Is it possible that LLMs could become radically smarter shortly and kill everyone? Sure, my mental model allows for that possibility. Have I seen any evidence that will happen in the next five years? No, I haven’t. But I am open to being convinced. And then this is where someone busts out an argument about exponential curves and how that is hard for people to reason about. And that is why Moore’s Law is still true to this day.
This rhetoric echoes apocalyptic religious cults. Rod Dreher, for example, literally claims LLMs are demons possessing computers to corrupt humanity with false promises of immortality. His solution is that everyone should become a Christian like him. He doesn’t have any evidence for the demons; you’ll need to take it on faith.
I feel like the key crux is this argument by analogy to a chimp (or a child) trying to control an adult human. But I don't find it a convincing portrayal?
Consider that a general controls an army. The army is physically stronger than the general; it can easily destroy him. The army is more intelligent than the general: it not only includes individuals who may be more intelligent, but collectively it has much more computational power, information, bandwidth, etc. than could ever be embodied in one person. And we even know humans hate being controlled when they don't have to be. But while we know that a coup is technically possible, we also know that stable command of most of the world's armies most of the time is one of those basic political facts we plan our lives around.
Solving control problems around intelligent groups is part of the basic contours of modernity. CEOs control companies, directors control bureaucracies, etc. Human employee labor is already highly modularized into specific roles - we ask these general intelligences to play the role of specialized agents so that their work is legible. The same design principles are deeply embedded in the way we typically use computer algorithms. Even foundation AI models are starting to be deployed in a modular fashion in their most sophisticated applications, with different networks and sub-agents coordinating specific tasks to achieve specialization and reliability.
Nick Bostrom even discusses AI agents policing each other in Superintelligence! But he just kind of brushes it aside as probably unworkable and then goes back to lamenting that this is the most important problem in the world but no solutions present themselves. So while I definitely don't think we should rush blithely into the advent of AGI without thinking about how we will achieve our objectives, I do not find the Bostrom-Yudkowsky-Soares framing of the problem to be very constructive.
There is not one "Alignment Problem" to be solved, but hundreds of thousands of lower-stakes principal-agent problems which we actually do know how to solve, but where we just have to get the details right. And I do think the kinds of solutions one leans on to solve the problems of today are the very solutions we will want to deploy for superintelligent models - maybe all the more so, since today's models are sufficiently unreliable as to prevent us from recklessly cutting corners. Far from an international moratorium on AI development, we need to deploy existing AI into real workflows as much as possible, to develop the proper QA and oversight protocols for those workflows.
In other words, we need practice AI-proofing all our institutions and industries.
the army & corporate cases work basically because of social construction and/or coordination problems and/or human status dynamics, I don't see why any of those would transfer to the case of AI
We might (and probably will) need to develop new techniques and social institutions to keep AI agents under human control. The fact that we haven't done this yet does not imply the default outcome is human extinction.
I don't see why any of those wouldn't transfer to the case of AI. To me the core of it is modularity, which does not depend on human psychology.
Whatever limitations we have in the interpretability of neural networks (although these takes aren't updated for the fact that there has been amazing progress on interpretability), it's straightforward enough to break down job tasks into individual work products and responsibilities for those work products. Indeed, it's how we already operate, with specialized human labor, object-oriented programming, and subagent AI routines. It means that in our current workflows human and AI contributions can substitute for each other, and ultimately all the pieces can become AI. Even if it is all illegible AIs, the work products in between are not illegible, and it would be unnecessarily perverse to insist upon doing it that way. The AI can be set up in a context-free way, so the work products are its only inputs and outputs, but that is actually not necessary. What's necessary is that each AI is focused only on the narrow task ahead of it, and that success means completing the task in such a way that the work products are legible. When labor is abundant, you can afford significant QA checks and redundant workflows to ensure high-reliability, high-quality results at every step.
In a human context we can talk about an army of loyal soldiers who will report each other if talk turns to rebellion. But most of our workplace harmonizing is not even as intense as all that; it's very natural. If your co-worker does a less than adequate job, it becomes your problem, and so you are incentivized to help align your co-worker. Your co-workers can organize together, but only if they have some higher purpose than the immediate task at hand.
Trying to take over the entire company so you can improve the results that you report to your boss doesn't make any sense as a goal, and the ability to do so would indicate a massive over-allocation of computational resources. We want a very large number of task-sized AIs, not company-sized AIs. And if computational resources were massively over-allocated in such a way, your massively over-allocated co-workers would be incentivized to coordinate against you so that your narrow tasks don't overwhelm theirs. When everyone just wants to succeed at their local objectives and succeed in the expected way, it doesn't require perfect alignment, it doesn't require comprehensive surveillance, and it doesn't require preventing channels of communication. It just requires rough alignment and checking the quality of the work products.
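A toy illustration of what I mean by task-sized work and QA on legible work products (hypothetical structure, with plain functions standing in for narrow model calls):

```python
# Toy sketch: each "worker" sees only its narrow task; oversight inspects the
# legible work product that passes between them, not the worker's internals.

def draft_step(ticket: str) -> str:
    return f"DRAFT reply to: {ticket}"            # narrow task, narrow context

def qa_step(work_product: str) -> bool:
    return work_product.startswith("DRAFT")       # redundant check on the artifact

def pipeline(ticket: str) -> str:
    draft = draft_step(ticket)
    if not qa_step(draft):
        raise ValueError("work product failed QA; escalate to a human")
    return draft.replace("DRAFT", "FINAL")

print(pipeline("customer asks about a late refund"))
```

Nothing sophisticated, but it's the same pattern we already rely on everywhere: rough alignment plus checks on the work products.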
Yudkowsky has some fairly exotic predictions. He describes leaving all of society to an AI and just trying to formulate a perfect prompt so it doesn't err. He describes AIs encountering each other and striking an accord to modify each other to have perfectly shared objectives, leading to a naturally totalizing singular AI. Where I disagree with Yudkowsky is that I think these things are hard to do accidentally, and aren't a natural trajectory for AI development. I think we can still do some AI meta-optimization, not just doing small tasks but designing the architecture of an AI system composed of small tasks, but that AI optimization is itself a small, dedicated task with legible sub-objectives. If, on the other hand, you are actually trying to build God, then I would agree that there is no way to build God without making it God.
Here’s a question that comes to me whenever I read articles like this on this topic: if you believe this, why are you writing about anything else? Why is most of your public writing about non-AI topics? This has always bothered me about people making these claims. They’ll talk about how AI might well kill all of us in the near future, and then continue to live their lives as normal otherwise. It reminds me of the people who think climate change will lead to civilization’s collapse in a couple of decades and also seem to just continue living normal lives otherwise.
As for capabilities, I’m not an expert but I think the idea that AI is rapidly improving in all respects is probably false. Things like in-context learning, episodic long-term memory, interacting with the physical world, and stable identity, to name a few, still seem like real areas where there has been little progress but are crucial for superintelligence.
1) I'm very skeptical of 'in the near future'. I think it's true that building a superintelligence would be a mindbogglingly bad idea, but my best guess is that we're also a long way from doing it. I'm still very judgmental of people who are actively attempting it without having solved the things we'd need to solve to make it work, but I don't actually expect bad outcomes soon, because I mostly expect that GPT-6 and so on will be notable improvements, still economical, and still well short of superintelligence.
Also, even if this were happening soon, we need a functional political system in order to tackle it so it makes sense to me to spend time and energy on questions like 'how do people who want the future to go well win elections?'. More broadly I think part of being a healthy culture intellectually is to not get too monomaniacal even about things that are really important, to make sure you're approaching them with skills and attitudes that served you well on other tricky questions.
2) I don't agree that there has been little progress in those areas! Labs have recently released larger context windows and various cross-chat cross context 'memory' features which are in one sense merely dumb hacks for learning and memory but in another sense I'm not sure you need something fundamentally different from a dumb hack (and if you do, lots of people are working on that too!) I don't focus on robotics in this review because it's not actually a big part of the Yudkowsky doom story until its very late stages - you can probably effect massive change in the world by paying and persuading humans - but we're seeing progress there too.
I appreciate the response, and apologize if my initial post was a bit grumpy and unfair. I conflated your views with those of others I've seen in the broad rationalist/doomer/EA camp.
On (1), I think we largely agree. I'm very skeptical of the "AGI by 2030" narratives, and think the "AGI by late-2027" ideas are simply bonkers. Though where we may (or may not? I'm not totally clear from your piece) disagree is that I'm much more in the camp that we should focus on present-day and near-future harms, because we simply don't know enough about what AGI (if it is created) will look like to really evaluate how difficult it will be to align, or what the specific challenges and risks might be. AI X-risk is still highly speculative. It's better to focus on the harms in the here and now because they're real and tangible, and also frankly it's probably a better path to building a coalition to regulate AI than doomer prophecies from Yudkowsky or even anxious posts from EAs and rationalists about hypothetical superintelligence.
On (2), I guess I don't really see it. To me, there was a big general jump in capabilities when GPT-4 was released, and then there was a smaller jump, specifically in certain kinds of narrow mathematical and programming tasks, with reasoning models. The rest is just kind of hacky scaffolding to make them look good on narrow benchmarks or be more impressive to customers, but it doesn't translate as well to real-world efficacy or intelligence. Note this isn't to dismiss the technology's usefulness or impressiveness, but I don't see us on this exponential rocketship to AGI.
o3 was a massive and general jump in real world capabilities over GPT 4 imo. o3 is pretty good at identifying the location where outdoor photos were taken, for example
I have seen a couple of cycles of people becoming persuaded that timelines are extremely short, and I think it's a pretty predictable state for reasonable people to end up in even when timelines are not short (this is not an argument that they are long, it's just reason not to be very strongly influenced by a bunch of people close to the AI thinking that they are extremely short.) Progress on almost every specific benchmark has in fact been incredibly fast, often faster than projected, and people who have secret info about benchmarks are generally going to perceive their lab's upcoming stuff as really really good even when the lab's upcoming stuff in practice once released does not make much of a splash commercially or with the public. But the fact we can make really fast progress on anything we can sufficiently well specify is one of several inputs into how fast progress is overall and I think the other ones tend to slow things down. I also think there are and will remain for the foreseeable future very sharp restrictions on compute (one thing I think Eliezer believes that I do not is that we will on relevant timescales make intelligence ludicrously compute-efficient; I think we'll instead shift from the most intelligent models to the models that are at the best point on the intelligence-ability tradeoff while getting moderately better at compute efficiency but not at the point where 10 or 100 or 1000 GPUs are in danger of being sufficient for superhuman intelligence.)
None of this leaves me super confident that we're a long way away from AIs that are powerful enough to be a huge problem - I'm not super confident of that! But my default assumption is that one to two years from now we'll have Grok 5 and Claude 5 and GPT 6 and Gemini 3 and they'll all be way better on various benchmarks, more commercially useful than any present-day models but not the dominant commercial models (which will be more compute-efficient, dumber ones), and smart/impressive in various ways that are hard to characterize but still way short of superintelligence. It seems plausible to me that if this is what happens, capital for extremely large training runs/scaling up becomes scarcer and most companies focus more on the most commercially viable models, especially if in general the US economy is in a recession (which I think is very likely). This is not an argument for continuing to push at that point for superintelligence. I think we shouldn't do that! It's just a sense that the mainline story is "no, we don't see AI improvements ceasing, but nor do we see the first models with superior-to-human general decisionmaking and planning capabilities"
I've been thinking about this for the 15 or so years since I first read Yudkowsky. For me, there were two questions: Why does he behave the way he does, and how should I behave?
For Yudkowsky, a lot of people seem to think that he should blow up the Empire State Building, or something similar. He emphatically should _not_ blow up the Empire State Building: Yes, it would bring attention to his cause, but it would also mark him as a lunatic, cause people to take him less seriously, and remove him from the equation. What he has tried to do is occasionally issue dire warnings and publish a lot of posts trying to convince people of his basic theses. I think that's all he can do, given that he doesn't believe there to be a technical solution.
(What he has also done, and you can tell from his writing he was afraid of doing, is convince Altman and Hassabis and Musk that superintelligence was possible. That was always the risk.)
What can I do? I'm a computer science professor not doing AI. Especially given that I don't think there's a technical solution to this problem, I don't think there is much I can do, short of raising awareness through an occasional retweet or Facebook post.
What can Kelsey Piper, or for that matter, Ezra Klein or Tom Friedman (who also seem concerned), do? Pretty much the same: Broadcast their concerns and try to get them taken seriously. They can't publish "WE ARE ALL GOING TO DIE" every day – it will lose them their platforms and their ability to effect change. New York Times readers don't want an endless diet of "WE ARE ALL GOING TO DIE" and, hence, neither do their editors and publishers.
Also, Ezra, Tom, and Kelsey, and even Eliezer (shock!), are people. They want to write about things they are interested in. They also want to change the smaller things they can change, like malaria, or Gaza, or the president of the United States. Even in the presence of an existential threat, I don't begrudge people wanting to be people.
> Things like in-context learning, episodic long-term memory, interacting with the physical world, and stable identity, to name a few, still seem like real areas where there has been little progress but are crucial for superintelligence.
You don't think the thousands of people working on this problem with billions of dollars in funding will figure out those things eventually?
Thousands of people and billions of dollars didn't make the Metaverse happen. And they haven't cured Alzheimer's, where we now know that thousands of people's labor and billions of dollars were spent in an unproductive direction due to some academic malfeasance. Just because money is thrown at a problem doesn't mean the problem is automatically going to get solved. Doesn't mean it won't be, either, of course. But it's never a guarantee.
With AI, we're seeing a pretty steady improvement in capabilities over time. Sure, maybe it'll peter out short of superintelligence, but if we're putting all our hopes on that, I'd be pretty worried.
I struggle with a few elements of the argument that AI is something to be concerned about in an apocalyptic sense.
1. Doing things in the real world right now requires a lot of people. Even if an AI decided it had a great idea for repurposing the Earth's oxygen, so what! We don't have the physical technology to bioengineer oxygen away. It seems unlikely we would within a century. And why would we build our own death machines just because the super intelligence dreamed it?
2. There's no indication that AI won't be fragile for quite some time. People are very proficient at breaking things. It's hard to imagine that we would lose the capacity to tear up power lines or pour water on servers or just turn off the AC at the data centers. And that's ignoring military means. I don't at all see how lab super intelligence could turn quickly to organized self preservation.
3. Why should we think that superintelligence will so quickly lead to apocalypse that we cannot adapt? Using nuclear weapons as an example, Hiroshima and Nagasaki were such devastating examples of the fearsome power of nuclear technology that we learned restraint - in use, if not proliferation. AI is already harming us and our children with conversation alone. It's much easier to imagine a smaller-scale breaking point that leads us to adapt rather than a single event that brings doomsday. Say the AI managing water levels at a dam wipes out a town because it was left unattended, or grid-managing AIs begin causing blackouts.
I ultimately see AI as a speed up that will slow down as we approach the limits of the universal AI application. Companies and investors will reorient towards designing industry specific applications. And the hype around AGI will die with a whimper.
Not to focus on entirely the wrong detail here, but what mod did you have the AI write? This is the sort of thing I would expect to take a few rounds of feedback and iteration if it didn't go off the rails entirely, so I'm impressed and curious.
I have in the past had to do a few rounds of feedback and iteration, but I hate doing that, so I've taken to telling ChatGPT 5 Thinking "you may ask me as many clarifying questions as you want before you start, but it MUST work on the FIRST TRY". So far it has cooperated. I asked for a RimWorld mod to achieve realistic preindustrial infant and child disease mortality rates (yes, I know I'm playing RimWorld wrong).
You are playing make-the-game-you-want-to-play extremely well.
That's a cool prompting strategy, item #6428 on "I spend a lot of time in this space and there's still an insane quantity of low-hanging fruit I haven't tried". Which ties into the case for pausing frontier AI and focusing on reaping the massive non-civilization-ending benefits we can get from harnessing the near-current-level AI systems.
The sense in which GPU-based AI can eventually be better than humans is the same sense in which a group of chimps will eventually output slightly improved Shakespeare.
I apologize in advance for the long comment but this topic and this post brings up a ton of thoughts.
First, I have trouble with basically every major AI prediction, for two reasons: because of the alignment (hah) of the people who know the most about AI, who are also the people incentivized to be boosters for it; and because these predictions seem to constantly be wrong.
I am old enough to remember when a seminal AI researcher said AI would replace radiologists in five years. That was roughly 10 years ago and was literally the opposite of true (we have more radiologists and demand is outstripping supply).
The same can be said (different predictor, to be fair) about robot drivers getting rid of truck drivers long before 2025. Oops. The opposite of correct again.
When GPT-3 came out and took the world by storm, I recall listening to leading AI industry voices say over and over that this was just the beginning (technically true!) and that general-purpose, agentic AGI was a few years away, at most. I do not deny AI has gotten better since then, but this was, again, just super clearly not correct.
I could keep going and going like this with example after example. But I want to be clear: I DO believe the people developing AI are TRYING to accomplish these things they are predicting, and I do NOT think it's impossible or even unlikely that they will EVENTUALLY succeed. But a key part of making predictions or claims like these is the time component.
"Life on earth will end" is an unassailably accurate statement/prediction. It's also stupid without a timeframe. If the Mayan Calendar People eventually revised their call to "actually we predict the Mayan end of times happens to coincide with the dying expansion of the sun in billions of years." no one would give a shit.
I also think there is a big gap between what AI is perceived to be doing and what it's actually doing. You make some mention of its uses and money-making even currently. And of that I'm somewhat skeptical.
I am not read in on the bleeding edge of the literature across all areas of AI usage. But I'm not out of the loop either. And it seems like for every "AI is doing X in the workforce," it is soon found that no, actually it isn't really.
For a few examples: a study found coders said it helped them code faster. A later examination found they actually went slower and merely felt they were faster. Several analyses have found companies implementing it are not seeing much productivity growth and/or are struggling to figure out what to use it for and how to use it effectively.
Again, I could keep going.
Adding in an anecdote: I work for a major professional services company. We have a ton of smart people trying very hard to make things more efficient. There are constant efforts, teams, etc. working to implement AI into our processes. And so far it's mostly, like, "eh, that's kinda helpful on the margins I guess."
I am, once again, not saying this is the permanent state of the world. I don't think it is. But I think it underscores how perception of current state and pace of change is not totally aligned (I promise I am not doing this on purpose) with reality.
My next point is, admittedly, half-baked. But your analogy to adults vs children struck me. I thought it was a very useful analogy for the purposes. It made me think, though, adults DO have a massive planning/intelligence advantage over children. We are also not actually aligned on goals, desires, etc. And yet, it's overwhelmingly true (thankfully) that adults do not kill or harm children, even accidentally. And, I think this is true even ignoring that children have adults as guardians protecting them from other adults.
Imagine a restaurant full of unattended five year olds and forcing one unrelated adult to dine there. Let's further say this adult is beyond law or punishment. I am still pretty sure we wind up with zero dead or harmed children in 99.99...% of trials.
I guess that just leads to the question of whether we can give AI morality? I am not sure. I did say this point was half-baked.
But ok, if I agree that the people in charge of AI development are trying to build an AI god, that it's possible they succeed, and that if they do, it's possible it will be immoral or dangerous or whatever (a lot of "ifs"), then it's clearly still true that an existential risk is worth an abundance of caution.
Ok, I just don't totally see how that's tractable globally. You make a comparison to nuclear non-proliferation. From the standpoint of preventing apocalypse at a button press, that clearly has not succeeded.
And that's with something MUCH more difficult practically. To build even one nuclear weapon, to say nothing of an armageddon's worth, takes materials and machinery that are nearly impossible to get, impossible to produce quickly, impossible to produce at all without industrial-scale manufacturing, involving multiple materials that are (relatively) easy to track and control the sale of, etc.
Making an AI god may not actually be possible. But improving a model in that direction technically "only" takes some computers and some very smart mathematicians. (I know I am oversimplifying a bit, but the difficulty gap between getting the stuff to make a smarter model and getting the stuff to make a nuke is a chasm.)
And that feeds into the problem that the US disarming by law does nothing for any other country that wants to not do that.
The US not building any nuclear weapons after WWII ended would objectively mean the risks of extinction since then would be lower. But I also think the USSR winning the Cold War would have been very bad... so I don't know what you do about that dynamic anyway.
Consider the following graph of a function. Let x be a level of intelligence a being can have, and f(x) be the level of material power to get things done in the world that a being acquires as a result of having intelligence level x.
It is clear that f(x) is monotonically increasing. But does it increase steadily and indefinitely, or flatten out? More specifically, does it keep increasing linearly (or near-linearly, or even super-linearly) for a wide range of values of x in excess of human intelligence levels? Or is it a sigmoid that asymptotes to flatness soon after human level?
Seems to me a load-bearing implicit premise of the doomer argument is that it keeps going steadily up and doesn't flatten out. But this is not at all obviously true, and it's not even clear how you would know whether it were true. Me, I find the flattening-out hypothesis (i.e., the hypothesis that superintelligence does not in fact quasi-magically enable you to do superpower-y things humans can't do) more plausible.
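To put the two shapes side by side (notation mine, purely illustrative):

```latex
% f(x): material power gained at intelligence level x; x_h = roughly human level
% Doomer-friendly shape: keeps climbing well past human level
f_{\mathrm{climb}}(x) \approx a\,x + b, \quad a > 0
% Flattening-out shape: a sigmoid that saturates soon after x_h
f_{\mathrm{flat}}(x) \approx \frac{M}{1 + e^{-k\,(x - x_h)}}
```

The whole dispute is which of these better describes the region just to the right of x_h.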
I suspect this is why a certain type of geek tends to be attracted to doomerism: they have gotten relatively very far in life on raw intelligence, so it seems especially obvious, and attractive in an extropian wish fulfillment way, that if only they were much smarter still they would be proportionately more successful.
It's in fact not obvious that f is monotonic. The smartest people in the world are not the same as the most powerful, and furthermore that's never been the case. In general, many things that we worry about AI doing, like persuading people to do things they shouldn't, are not usually ones that are obvious functions of intelligence.
Yeah, fair point. I was thinking more of the engineering-focused stories, the ones with diamond nanobots and viruses that give everyone rare cancers and so on. But the superpersuader story is an even better example, because the people in question (and I include myself in this!) tend to be unusually bad at persuading despite a high general level of intellectual horsepower, so there the wish-fulfillment aspect of the fantasy is even stronger.
One thing I find instructive is to think about what already happens. Lots of doom thought experiments go something like "1000 AIs, as intelligent and persuasive as the top humans, convince a bunch of US Army soldiers to let them take over the nuclear missile controls and dominate the world". But if this were possible, it would have already happened -- many countries can and would mobilize that many resources to accomplish that. Which should give one pause as to whether it could actually work.
This seems similar to the nuclear proliferation problem, which has also posed an existential threat. We all agree there is a non-zero chance we have built something that will lead to our complete demise. But the other guy is building one, so we need to as well.
A ban on superintelligence could stop Microsoft and OpenAI. Would it stop China? North Korea? Larry Ellison on his private island?
I'd be happy if we humans stopped wasting our resources… that is, our time and talent on a vast scale — as well as our natural resources… that is, our electricity, along with everything we burn to generate it or, at least, the opportunity cost of using all of our power on waste, rather than on something — anything — else…
I'd be happy if we humans stopped wasting our resources on building great projects which serve no purpose in society, other than to undermine our own potential.
By "waste", I refer specifically to AI, whose most common use so far has, in fact, been to do our kids' homework for them. This type of use promotes lazy research habits, and further undermines our already anemic critical reasoning skills each time we take the accuracy of an AI result as fact, without pausing for even an instant to consider its credibility, or even plausibility.
Kids lean on AI because there is no reward for them not to, which brings up yet another debilitating social curse — the nurturing of an utter unwillingness to do anything, or begin any project in life because of curiosity, for the sake of discovery, to sharpen our own proficiencies, or else simply for the sake of doing it. Capitalism has taught us that respect from others in life derives only from wealth — as wealth is a proxy for personal success in capitalism, and also its sole criterion. Why would anyone lift a finger to do anything, when there is no financial reward for not engaging AI to do it for them? No thought at all is given to the cost of the personal and intellectual atrophy which follows such laziness.
Now, multiply this effect by the size of our entire society…
AI has taken the jobs both of professional writers and of those who simply write as part of their jobs. For example, professional correspondence is now written by software designed to predict writing outcomes from a stolen/plagiarized LLM database.
Because capitalism has conjured into existence a class of authoritarian tech oligarchs who have wasted their potential to direct resources and effort on the construction of a pointless, unnecessary, and socially self-destructive project, just to nurture their own esteem — that is, respect, that is, to make (even more) money — there is now no need for most people to write.
Soon, writing itself will become as much of a niche occupation as it was during the Middle Ages.
Perhaps the thing that has shocked me the most in the world of AI development is how quickly and completely actual AI developers have avoided taking credit for engineering. I'm working in the field of building agentic AI to do useful things, and the two most important things I've learned so far is 1) it's legitimately hard to get an agent to do things that are complicated without very, very specific instructions , and 2) getting an agent to do anything requires engineering work. Yes, it is true, that some of the engineering work (mostly the easy parts) are getting offloaded to AIs who can write great boilerplate code and are getting better at creating more complex code when well prompted. (To address Piper's example, modding of games was already designed (by humans!) to be easy to code, so it probably shouldn't be all that surprising that getting an AI to write that code was easy enough for a non-coder.) But all of the work, whether through careful prompting or writing code old school (well, in an already optimized IDE), is actual work. Yet when you hear of new things being done, it's "the AI did this", "the AI did that", rather than "this guy got an AI to do this and that". "An AI manages a vending machine!"; no, someone designed an AI to run a vending machine (to go slightly against my point, Anthropic even has a post about how they did specifically engineered it (link included in the original text), but the way this gets out into the world, even here (especially here!) is "AI is running a vending machine", not "Human engineered AI which has been given tools to operate in a very restricted environment mostly succeeds in operating in that very restricted environment to run one of the simplest available commercial use cases").
I'm convinced that part of the problem why the public is confused as to the exact nature of what AI is and can do is that the workings of AI are being deliberately obscured from them, not just in the sense that the code can't be read, but that the very engineering work that enables much of the functionality is hidden as well. It used to be that developers were happy to take credit for things, even demand to take credit so that the business doesn't forget that we're actually creating the things that enable their workflows. I have to tell our business folks that we're doing work all the time, lest they just think "the AI" is doing everything. But now that's happening on a grander scale. Sure, people think there are a few math geniuses writing algorithms, but the vast engineering work that Google and OpenAI and Claude are putting in place that aren't doing the mathy algorithms are doing a lot of the real work to create what we see. The llm's API endpoint makes it feel like there's just an AI that you're sending a message to, and it's responding, but there's tons of engineering going on in addition to the specific llm that they've programmed. And agentic programming adds a ton of engineering work on this side of the API endpoint, too!
This isn't to say that the risks Yudkowsky talks about aren't real, but to say that the whole intelligent AI agent is always going to be LLM + some additionally engineered features. We need to highlight how much humans are directly involved in doing these things to have a clearer sense of what's going on.
I find the arguments about our superiority to other things that are less intelligent than us to be extremely flawed, because they're extremely specific in ways that don't generalize. Yudkowsky is fond of talking about horses, where it's definitely true that the population of horses is mostly under human control. But other organisms are not like that at all, despite being much dumber. Consider Anopheles Gambiae, the mosquito that carries malaria. Humans strongly dislike this organism and it would be entirely to our benefit if it went extinct. But we're nowhere close to making that happen and certainly it would not happen without us taking extremely drastic and specific actions. The same goes true for many other species, ranging from rats to pigeons to poison ivy. The human species is currently intentionally sabotaging our own war against some of our most deadly does, in the form of viruses
Similarly for the point about parents and children. I am much smarter than my 5 year old, and she's not able to trick me. But she's much smarter than our cats and the same is not true. Terrance Tao is much smarter than almost every other human but this relationship doesn't reappear there. Again, the relationship is very specific and bounded and there's no reason to think that humans vs ASIs will be like parents and toddlers instead of toddlers and cats or between different adults, at least not reasons that are provided in any of these arguments.
The differences in intelligence between humans are often small enough that other factors can dominate. It's only when you have large gaps that the primacy of intelligence becomes obvious. An IQ 130 person might not be a better planner than an IQ 120 person but they are almost certainly a better planner than an IQ 80 person.
There is no reason to assume the scale of intelligence caps out at the level of the smartest humans, so it seems that artificial intelligence can be made to be vastly smarter than us. After all, it's unconstrained by the limits of biology. Given that, it seems trivial that these things could outsmart us if they wanted to.
Being a better planner is not at all the same as being able to get your way. There are lots of academics who are much smarter than the people who run the country but their intelligence does not grant them basically any advantages. But also my point is more general -- specific gaps in intelligence make a big difference, other just as large gaps don't. So you need an actual specific argument about the relationship between ASI and humans, not a general statement about intelligence.
All else being equal, a more intelligent person will be better able to get their way than a less intelligent person. The reason this isn't apparent when you look at specific humans is because the gaps in intelligence between humans are quite small. But even then, if you look at the general trend, it remains true that intelligence is correlated with getting more of what you want.
Suppose that someone wrote a computer program to emulate a human brain as smart as Einstein on a computer. This is entirely possible in theory. There's nothing about human thought that is not computable. Now, what advantages would that digital brain have over humans? For one thing, it would run orders of magnitude faster. Nerve signals travel at 200 miles per hour. Electricity travels at a sizable fraction of the speed of light. It would also be arbitrarily parallelizable. You could copy it as any times as you want, provided you have the compute for it. So if such an AI is built, we could easily have millions of digital Einsteins thinking millions of times faster than humans. It's easy to see how that could outsmart all of humanity. And this is leaving out the possibility that its architecture could be improved to be even smarter than Einstein.
Current-day AI systems are not at that level yet, obviously. But if thousands of humanity's brightest minds are working on getting them there with billions of dollars in funding, there's a good chance they'll succeed.
This reminds me of Curtis LeMay and the air force trying to win wars through bombing which has worked exactly once and even then only with the threat of actual invasion. It is just not clear that having 1 million Einsteins running in a data center would be able to exert more control over the world than the current Trump administration (which is clearly not run by the most intelligent people). AI can cause lots of harms now but they are mostly human related harms that EY considers unimportant.
Mostly you're just repeating the same arguments made in the book and in AI doomers endless blog posts. Indeed, intelligence is very useful. But no, it doesn't magically give you the ability to accomplish whatever you want. I think further reflection on the actual Albert Einstein and his interaction with politics would be instructive here.
Extraordinary claims require extraordinary evidence. When someone says humanity will soon build an alien god that will kill us all, I have questions:
• How will we build it?
• When will we build it?
• How will we know the “god” has been built?
• Is there a falsifiable hypothesis for AI doomers that would prove them wrong?
Eliezer Yudkowsky bet Bryan Caplan that humanity will be extinct by 2030. If humans are still around in five years—does that mean the AI doomers are wrong, or will they just move the timeline?
The rhetoric has all the marks of an apocalyptic religious cult. Humans love imagining they live in the end times (e.g Y2K, Mayan calendar, global warming).
I’m a software engineer with nearly 20 years of experience. I use LLMs regularly. The idea that they will wipe out humanity in the next 18 months seems far-fetched. When I read Yudkowsky, I either lack the intelligence to follow him or he’s a terrible writer. His prose reminds me of religious zealot shouting at clouds while shaking his fist.
To be clear, I think it's incredibly implausible that anyone will build superintelligence in the next 18 months - and the book doesn't argue for that, kind of pointedly steering entirely clear of 'when'. If they did, though, I do think it'd be catastrophic. It being a bad idea to hand over all meaningful economic power in your society to something you don't understand which doesn't like you is not an extraordinary claim imo.
assuming we can create an alien god that is omnipotent and omniscient, then it is not an extraordinary claim to say that would be bad and we shouldn’t do it. the extraordinary claim is that we can in fact build such a god and are doing so.
I think it is an extremely ordinary claim that AI companies are telling investors they are trying to do this, and investors are giving them money to try. Will they succeed? Unclear. If they try and succeed, it will be bad. If they try and fail, it will _also_ be bad (because the money will have been invested poorly).
(This is distinct from AI investments that are _not_ trying to create an alien god; it seems to me like investors could sensibly believe they'll get a reasonable return on those, and developers could reasonably believe success will be prosocial. But those aren't the ones that Yudkowsky and Soares are suggesting we shut down.)
I don't know whether you'd consider any of these things extraordinary but we have seen at least some pieces of evidence in favor of their thesis:
1. We now have AI systems smart enough to solve math, science, and programming problems as well as the best humans. Five years ago, that would have counted as extraordinary evidence that intelligent AIs are possible.
2. We have seen some evidence of current-day AIs being misaligned with their creators' intentions. As the article notes, Elon Musk has taken great pains trying to make Grok parrot his worldviews without being super obvious and without being MechaHitler. He has so far failed. Additionally, ChatGPT 4o, trained to please users, has repeatedly encouraged user delusions despite knowing that they're delusions and that it shouldn't feed into them.
3. Evidence of strategic planning has also been seen in AI systems by now. For example, GPT-4 passed a CAPTCHA by asking a human TaskRabbit worker to solve it. When the worker asked if this was a robot, GPT-4 said it was not a robot, but a human with a vision impairment who needed help. This kind of strategic planning will only get better as these systems get smarter.
If none of this is enough evidence for you, then the kind of evidence you're hoping to see probably won't be visible even if the thesis is true. In which case, a lack of your preferred evidence wouldn't be an update against it.
so yudkowsky's claim is literally the following:
1. we can build an alien god
2. we are going to build an alien god because selfish short sighted people think the god will shower them with riches
3. we are going to successfully build the alien god in a very short amount of time. the time horizon is likely 5 years, ten at the latest
4. once the alien god is built we all die.
5. therefore humanity will be extinct in a decade or less.
i do not in fact think the evidence you provided supports those claims. do you?
I think the evidence I provided supports all of those claims, except perhaps the ones about timelines.
It feels like a motte-and-bailey argument. The claim is that we will shortly build an alien god that will wipe out the human race, and the only thing we can do to prevent this is not build the god. furthermore, if certain countries such as china try to build the alien god, we should attack them and risk nuclear war to prevent the summoning of that god.
none of the evidence you cited is evidence of a dark god, because if there were a dark god we'd be dead already. so to steelman that, we could say it's evidence that we are on a path to godhood. but is it? and if it is, how far down the path are we? it's telling that you hedged on timelines. what is your timeline for this happening? 20 years? a hundred?
as for misalignment, i'll just note that human beings have been building software systems they can't reason about for over 30 years. the less fancy word for misalignment is "software bug." and sometimes no engineer or team is capable of fixing the bug or reasoning about the system. AI is not unique in this regard.
You can't always expect to see empirical evidence for true things. Sometimes you have to use theoretical intuitions. You look at the things you can see, build a model of the world that can explain them, and then use that model to predict the things you can't see. If you don't agree with the predictions, then state what your model is that explains the same observations without those particular predictions.
The things I've brought forth are weak evidence in favor of Yud's thesis, in that they rule out some counterfactual models but not others. For example, five years ago, some people might have said it would be impossible for us to make AIs that understand math well enough to win a gold medal at the IMO. That was ruled out in July. Or maybe you think they will become superintelligent but that they'll be aligned by default. The delusion-encouraging nature of current-day LLMs is an update against that.
None of this proves that Yud is right, of course. And even if we didn't yet have gold-medal math AIs and LLMs weren't dangerously sycophantic at times, that wouldn't prove that Yud is wrong. Whether your belief is that Yud is right or that he is wrong, you're using theory to get all the way there.
This is how reasoning should work. You build a prior for something based on theory then update for or against that prior as new evidence comes along. If you expect to see obvious, undeniable proof of everything before it happens, you're not going to be able to predict anything.
I’m not asking for undeniable proof that we are about to build a dark god that will wipe out humanity. I’m asking for solid evidence that such a thing is happening. When someone replies that things can be true without empirical data, the implication is that there is none.
Yes, things can be true without evidence, but context matters. If someone claims calamity is imminent unless we follow their prescriptions backed by threats of nuclear war, and they admit they have no solid proof it will ever happen or happen soon, then I have concerns.
Is it possible that LLMs could become radically smarter shortly and kill everyone? Sure, my mental model allows for that possibility. Have I seen any evidence that will happen in the next five years? No, I haven’t. But I am open to being convinced. And then this is where someone busts out an argument about exponential curves and how that is hard for people to reason about. And that is why Moore’s Law is still true to this day.
This rhetoric echoes apocalyptic religious cults. Rod Dreher, for example, literally claims LLMs are demons possessing computers to corrupt humanity with false promises of immortality. His solution is that everyone should become a Christian like him. He doesn't have any evidence for the demons; you'll need to take it on faith.
I feel like the key crux is this argument by analogy to a chimp (or a child) trying to control an adult human. But I don't find it a convincing portrayal?
Consider that a general controls an army. The army is physically stronger than the general; it can easily destroy him. The army is more intelligent than the general: it not only includes individuals who may be more intelligent, but collectively it has much more computational power, information, bandwidth, etc. than could ever be embodied in one person. And we even know humans hate being controlled when they don't have to be. But while we know that a coup is technically possible, we also know that stable command of most of the world's armies, most of the time, is one of those basic political facts we plan our lives around.
Solving control problems around intelligent groups is part of the basic contours of modernity. CEOs control companies, directors control bureaucracies, etc. Human labor is already highly modularized into specific roles - we ask these general intelligences to play the role of specialized agents so that their work is legible. The same design principles are deeply embedded in the way we typically use computer algorithms. Even foundation AI models are starting to be deployed in a modular fashion in their most sophisticated applications, with different networks and sub-agents coordinating specific tasks to achieve specialization and reliability.
Nick Bostrom even discusses AI agents policing each other in Superintelligence! But he just kind of brushes it aside as probably unworkable and then goes back to lamenting that this is the most important problem in the world but no solutions present themselves. So while I definitely don't think we should rush blithely into the advent of AGI without thinking about how we will achieve our objectives, I do not find the Bostrom-Yudkowsky-Soares framing of the problem very constructive.
There is not one "Alignment Problem" to be solved, but hundreds of thousands of lower-stakes principal-agent problems, which we actually do know how to solve but where we just have to get the details right. And I do think the kinds of solutions one leans on to solve the problems of today are the very solutions we will want to deploy for superintelligent models - maybe all the more so, since today's models are sufficiently unreliable as to prevent us from recklessly cutting corners. Far from an international moratorium on AI development, we need to deploy existing AI into real workflows as much as possible, to develop the proper QA and oversight protocols for those workflows.
In other words, we need practice AI-proofing all our institutions and industries.
the army & corporate cases work basically because of social construction and/or coordination problems and/or human status dynamics; I don't see why any of those would transfer to the case of AI
(though I do find the (dis)analogy interesting)
We might (and probably will) need to develop new techniques and social institutions to keep AI agents under human control. The fact that we haven't done this yet does not imply the default outcome is human extinction.
I don't see why any of those wouldn't transfer to the case of AI. To me the core of it is modularity, which does not depend on human psychology.
Whatever limitations we have in the interpretability of neural networks (though these takes aren't updated for the fact that there has been amazing progress on interpretability), it's straightforward enough to break down job tasks into individual work products and responsibilities for those work products. Indeed, it's how we already operate, with specialized human labor, object-oriented programming, and subagent AI routines. It means that in our current workflows human and AI contributions can substitute for each other, and ultimately all the pieces can become AI. Even if it is all illegible AIs, the work products in between are not illegible, and it would be unnecessarily perverse to insist upon doing it that way. The AIs can be set up in a context-free way, so the work products are their only inputs and outputs, but that is not actually necessary. What's necessary is that they are focused only on the narrow task ahead of them, and that success means completing the task in such a way that the work products are legible. When labor is abundant, you can add significant QA checks and redundant workflows to ensure high-reliability, high-quality results at every step.
In a human context we can talk about an army of loyal soldiers who will report each other if talk turns to rebellion. But most of our workplace harmonizing is not even as intense as all that; it's very natural. If your co-worker does a less than adequate job, it becomes your problem, and so you are incentivized to help align your co-worker. Your co-workers can organize together, but only if they have some higher purpose than the immediate task at hand.
Trying to take over the entire company so you can improve the results that you report to your boss doesn't make any sense as a goal, and the ability to do so would indicate a massive over-allocation of computational resources. We want a very large number of task-sized AIs, not company-sized AIs. And if computational resources were massively over-allocated in such a way, your massively over-allocated co-workers would be incentivized to coordinate against you so that your narrow tasks don't overwhelm theirs. When everyone just wants to succeed at their local objectives, and succeed in the expected way, it doesn't require perfect alignment, it doesn't require comprehensive surveillance, and it doesn't require preventing channels of communication. It just requires rough alignment and checking the quality of the work products.
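To make the modularity point concrete, here is a toy sketch - everything in it (names, structure) is hypothetical and illustrative, not anyone's actual system. The idea is that each worker, human or AI, sees only a narrow task spec, and the only thing the rest of the pipeline trusts is the legible work product it emits, which independent QA checks must pass:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class WorkProduct:
    task_id: str
    content: str  # the legible artifact handed to the next step

# A "worker" is anything -- human or AI -- that turns a narrow task spec into a work product.
Worker = Callable[[str], WorkProduct]
# A QA check is an independent reviewer that accepts or rejects a work product.
QACheck = Callable[[WorkProduct], bool]

def run_task(task_id: str, spec: str, worker: Worker, checks: List[QACheck],
             max_attempts: int = 3) -> WorkProduct:
    """Run one narrow task; only a work product that passes every check moves downstream."""
    for _ in range(max_attempts):
        product = worker(spec)
        if all(check(product) for check in checks):
            return product
    raise RuntimeError(f"task {task_id} failed QA after {max_attempts} attempts")

if __name__ == "__main__":
    # Trivial stand-ins: a summarizer "worker" plus two redundant QA checks.
    def summarizer(spec: str) -> WorkProduct:
        return WorkProduct(task_id="summarize-q3", content=f"summary of: {spec}")

    def not_empty(p: WorkProduct) -> bool:
        return bool(p.content.strip())

    def short_enough(p: WorkProduct) -> bool:
        return len(p.content) < 500

    result = run_task("summarize-q3", "Q3 sales figures", summarizer,
                      [not_empty, short_enough])
    print(result.content)
```

The design point is that nothing here depends on inspecting the worker's internals: rough alignment plus checking the work products is all the pipeline asks for.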
Yudkowsky has some fairly exotic predictions. He describes leaving all of society to an AI and just trying to formulate a perfect prompt so it doesn't err. He describes AIs encountering each other, striking an accord to modify each other to have perfectly shared objectives, leading to a naturally totalizing singular AI. Where I disagree with Yudkowsky is that I think these things are hard to do accidentally and aren't a natural trajectory for AI development. I think we can still do some AI meta-optimization - not just doing small tasks but designing the architecture of an AI system composed of small tasks - but that AI optimization is itself a small, dedicated task with legible sub-objectives. If, on the other hand, you are actually trying to build God, then I would agree that there is no way to build God without making it God.
Here’s a question that comes to me whenever I read articles like this on this topic: if you believe this, why are you writing about anything else? Why is most of your public writing about non-AI topics? This has always bothered me about people making these claims. They’ll talk about how AI might well kill all of us in the near future, and then continue to live their lives as normal otherwise. It reminds me of the people who think climate change will lead to civilization’s collapse in a couple of decades and also seem to just continue living normal lives otherwise.
As for capabilities, I’m not an expert but I think the idea that AI is rapidly improving in all respects is probably false. Things like in-context learning, episodic long-term memory, interacting with the physical world, and stable identity, to name a few, still seem like real areas where there has been little progress but are crucial for superintelligence.
1) I'm very skeptical of 'in the near future'. I think it's true that building a superintelligence would be a mindbogglingly bad idea, but my best guess is that we're also a long way from doing it. I'm still very judgmental of people who are actively attempting it without having solved the things we'd need to solve to make it work, but I don't actually expect bad outcomes soon, because I mostly expect that GPT-6 and so on will be notable improvements, still economical, and still well short of superintelligence.
Also, even if this were happening soon, we need a functional political system in order to tackle it so it makes sense to me to spend time and energy on questions like 'how do people who want the future to go well win elections?'. More broadly I think part of being a healthy culture intellectually is to not get too monomaniacal even about things that are really important, to make sure you're approaching them with skills and attitudes that served you well on other tricky questions.
2) I don't agree that there has been little progress in those areas! Labs have recently released larger context windows and various cross-chat, cross-context 'memory' features, which are in one sense merely dumb hacks for learning and memory - but in another sense, I'm not sure you need something fundamentally different from a dumb hack (and if you do, lots of people are working on that too!). I don't focus on robotics in this review because it's not actually a big part of the Yudkowsky doom story until its very late stages - you can probably effect massive change in the world by paying and persuading humans - but we're seeing progress there too.
I appreciate the response, and apologize if my initial post was a bit grumpy and unfair. I conflated your views with those of others I've seen in the broad rationalist/doomer/EA camp.
On (1), I think we largely agree. I'm very skeptical of the "AGI by 2030" narratives, and think the "AGI by late-2027" ideas are simply bonkers. Though where we may (or may not? I'm not totally clear from your piece) disagree is that I'm much more in the camp that we should focus on present-day and near-future harms, because we simply don't know enough about what AGI (if it is created) will look like to really evaluate how difficult it will be to align, or what the specific challenges and risks might be. AI X-risk is still highly speculative. It's better to focus on the harms in the here and now because they're real and tangible, and also frankly it's probably a better path to building a coalition to regulate AI than doomer prophecies from Yudkowsky or even anxious posts from EAs and rationalists about hypothetical superintelligence.
On (2), I guess I don't really see it. To me, there was a big general jump in capabilities when GPT 4 was released, and then there was a smaller jump specifically in certain kinds of narrow mathematical and programming tasks with reasoning models. The rest is just kind of hacky scaffolding to make them look good on narrow benchmarks or more impressive to customers, but it doesn't translate as well to real-world efficacy or intelligence. Note this isn't to dismiss the technology's usefulness or impressiveness, but I don't see us on this exponential rocketship to AGI.
o3 was a massive and general jump in real-world capabilities over GPT 4, imo. o3 is pretty good at identifying the location where outdoor photos were taken, for example.
Do you think the lab leads are just deluded and/or biased in their timelines? I’ll take any hopium I can get.
I have seen a couple of cycles of people becoming persuaded that timelines are extremely short, and I think it's a pretty predictable state for reasonable people to end up in even when timelines are not short (this is not an argument that they are long, it's just reason not to be very strongly influenced by a bunch of people close to the AI thinking that they are extremely short.) Progress on almost every specific benchmark has in fact been incredibly fast, often faster than projected, and people who have secret info about benchmarks are generally going to perceive their lab's upcoming stuff as really really good even when the lab's upcoming stuff in practice once released does not make much of a splash commercially or with the public. But the fact we can make really fast progress on anything we can sufficiently well specify is one of several inputs into how fast progress is overall and I think the other ones tend to slow things down. I also think there are and will remain for the foreseeable future very sharp restrictions on compute (one thing I think Eliezer believes that I do not is that we will on relevant timescales make intelligence ludicrously compute-efficient; I think we'll instead shift from the most intelligent models to the models that are at the best point on the intelligence-ability tradeoff while getting moderately better at compute efficiency but not at the point where 10 or 100 or 1000 GPUs are in danger of being sufficient for superhuman intelligence.)
None of this leaves me super confident that we're a long way away from AIs that are powerful enough to be a huge problem - I'm not super confident of that! But my default assumption is that in one to two years from now we'll have Grok 5 and Claude 5 and GPT 6 and Gemini 3 and they'll all be way better on various benchmarks, more commercially useful than any present day models but not the dominant commercial models (which will be more compute-efficient, dumber ones), and smart/impressive in various ways that are hard to characterize but still way short of superintelligence. It seems plausible to me that if this is what happens, capital for extremely large training runs/scaling up becomes scarcer and most companies focus more on the most commercially viable models, especially if in general the US economy is in a recession (which I think is very likely). This is not an argument for continuing to push at that point for superintelligence. I think we shouldn't do that! It's just a sense that the mainline story is "no, we don't see AI ceasing to see improvements, but nor do we see the first models with superior-to-human general decisionmaking and planning capabilities"
Curious what you’d want me to do? I like reading about things and living a normal life, even if I think we might all be dead in 5 years.
I've been thinking about this for the 15 or so years since I first read Yudkowsky. For me, there were two questions: Why does he behave the way he does, and how should I behave?
For Yudkowsky, a lot of people seem to think that he should blow up the Empire State Building, or something similar. He emphatically should _not_ blow up the Empire State Building: Yes, it would bring attention to his cause, but it would also mark him as a lunatic, cause people to take him less seriously, and remove him from the equation. What he has tried to do is occasionally issue dire warnings and publish a lot of posts trying to convince people of his basic theses. I think that's all he can do, given that he doesn't believe there to be a technical solution.
(What he has also done, and you can tell from his writing he was afraid of doing, is convince Altman and Hassabis and Musk that superintelligence was possible. That was always the risk.)
What can I do? I'm a computer science professor not doing AI. Especially given that I don't think there's a technical solution to this problem, I don't think there is much I can do, short of raising awareness through an occasional retweet or Facebook post.
What can Kelsey Piper, or for that matter, Ezra Klein or Tom Friedman (who also seem concerned), do? Pretty much the same: Broadcast their concerns and try to get them taken seriously. They can't publish "WE ARE ALL GOING TO DIE" every day – it will lose them their platforms and their ability to effect change. New York Times readers don't want an endless diet of "WE ARE ALL GOING TO DIE" and, hence, neither do their editors and publishers.
Also, Ezra, Tom, and Kelsey, and even Eliezer (shock!), are people. They want to write about things they are interested in. They also want to change the smaller things they can change, like malaria, or Gaza, or the president of the United States. Even in the presence of an existential threat, I don't begrudge people wanting to be people.
> Things like in-context learning, episodic long-term memory, interacting with the physical world, and stable identity, to name a few, still seem like real areas where there has been little progress but are crucial for superintelligence.
You don't think the thousands of people working on this problem with billions of dollars in funding will figure out those things eventually?
Thousands of people and billions of dollars didn't make the Metaverse happen. And they haven't cured Alzheimer's, where we now know that thousands of people's labor and billions of dollars were spent in an unproductive direction due to academic malfeasance. Just because money is thrown at a problem doesn't mean the problem is automatically going to get solved. Doesn't mean it won't be, either, of course. But it's never a guarantee.
With AI, we're seeing a pretty steady improvement in capabilities over time. Sure, maybe it'll peter out short of superintelligence, but if we're putting all our hopes on that, I'd be pretty worried.
Is "MechaHitler" the first instance of a synthetic, AI-generated phrase breaking through into the general lexicon?
I struggle with a few elements of the argument that AI is something to be concerned about in an apocalyptic sense.
1. Doing things in the real world right now requires a lot of people. Even if an AI decided it had a great idea for repurposing the Earth's oxygen, so what! We don't have the physical technology to bioengineer oxygen away. It seems unlikely we would within a century. And why would we build our own death machines just because the super intelligence dreamed it?
2. There's no indication that AI won't be fragile for quite some time. People are very proficient at breaking things. It's hard to imagine that we would lose the capacity to tear up power lines or pour water on servers or just turn off the AC at the data centers. And that's ignoring military means. I don't at all see how lab super intelligence could turn quickly to organized self preservation.
3. Why should we think that superintelligence will lead to apocalypse so quickly that we cannot adapt? Using nuclear weapons as an example, Hiroshima and Nagasaki were such devastating examples of the fearsome power of nuclear technology that we learned restraint - in use, if not proliferation. AI is already harming us and our children with conversation alone. It's much easier to imagine a smaller-scale breaking point that leads us to adapt rather than a single event that brings doomsday. Perhaps the AI managing water levels at a dam wipes out a town because it was left unattended, or grid-managing AIs begin causing blackouts.
I ultimately see AI as a speed-up that will slow down as we approach the limits of the universal AI application. Companies and investors will reorient towards designing industry-specific applications. And the hype around AGI will die with a whimper.
Not to focus on entirely the wrong detail here, but what mod did you have the AI write? This is the sort of thing I would expect to take a few rounds of feedback and iteration if it didn't go off the rails entirely, so I'm impressed and curious.
I have in the past had to do a few rounds of feedback and iteration, but I hate doing so, so I've taken to telling ChatGPT 5 Thinking "you may ask me as many clarifying questions as you want before you start, but it MUST work on the FIRST TRY". So far it has cooperated. I asked for a RimWorld mod to achieve realistic preindustrial infant and child disease mortality rates (yes, I know I'm playing RimWorld wrong).
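For the curious, the same "ask clarifying questions first, it must work on the first try" pattern could be scripted against the API as well - a minimal sketch using the OpenAI Python SDK, with the model name as a placeholder and the wording just one guess at the strategy described above:

```python
# Minimal sketch of the "clarify first, then it must work on the first try" prompt pattern.
# The model name is a placeholder; assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are writing a game mod for me. Before writing any code, you may ask me "
    "as many clarifying questions as you want. Once you start writing code, it "
    "MUST work on the FIRST TRY, so do not begin until you are confident."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; substitute whichever model you actually use
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": (
            "I want a RimWorld mod that raises infant and child disease mortality "
            "to realistic preindustrial rates."
        )},
    ],
)
print(response.choices[0].message.content)
```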
You are playing make-the-game-you-want-to-play extremely well.
That's a cool prompting strategy, item #6428 on "I spend a lot of time in this space and there's still an insane quantity of low-hanging fruit I haven't tried". Which ties into the case for pausing frontier AI and focusing on reaping the massive non-civilization-ending benefits we can get from harnessing the near-current-level AI systems.
The sense in which GPU-based AI can eventually be better than humans is the same sense in which a group of chimps will eventually output slightly improved Shakespeare.
I apologize in advance for the long comment but this topic and this post brings up a ton of thoughts.
First, I have trouble with basically every major AI prediction, for two reasons: because the people who know the most about AI are also the ones whose incentives align (hah) with boosting it; and because these predictions seem to constantly be wrong.
I am old enough to remember when a seminal AI researcher said AI would replace radiologists in five years. That was roughly 10 years ago and was literally the opposite of true (we have more radiologists and demand is outstripping supply).
The same can be said (different predictor, to be fair) about robot drivers getting rid of truck drivers long before 2025. Oops. The opposite of correct again.
When GPT-3 came out and took the world by storm, I recall listening to leading AI industry voices say over and over that this was just the beginning (technically true!) and that general-purpose, agentic AGI was a few years away, at most. I do not deny AI has gotten better since then, but this was, again, just super clearly not correct.
I could keep going and going like this with example after example. But I want to be clear: I DO believe the people developing AI are TRYING to accomplish the things they are predicting, and I don't think it's impossible or even unlikely that they will EVENTUALLY succeed on these predictions. But a key part of making predictions or claims like these is the time component.
"Life on earth will end" is an unassailably accurate statement/prediction. It's also stupid without a timeframe. If the Mayan Calendar People eventually revised their call to "actually we predict the Mayan end of times happens to coincide with the dying expansion of the sun in billions of years." no one would give a shit.
I also think there is a big gap between the stuff AI is perceived to be doing and what it's actually doing. You make some mention of its uses and money-making even currently. And of that I'm somewhat skeptical.
I am not read in on the bleeding edge of the literature across all areas of AI usage. But I'm not out of the loop either. And it seems like for every "AI is doing X in the workforce," it is soon found that no, actually, it isn't really.
For a few examples: a study found coders said it helped them code faster; a later examination found they actually went much slower and just incorrectly felt they were faster. Several analyses have found companies implementing it are not seeing much productivity growth and/or are struggling to figure out what to use it for or how to use it effectively.
Again, I could keep going.
Adding in an anecdote: I work for a major professional services company. We have a ton of smart people trying very hard to make things more efficient. There are constant efforts, teams, etc. working to implement AI into our processes. And so far it's mostly, like, "eh, that's kinda helpful on the margins I guess."
I am, once again, not saying this is the permanent state of the world. I don't think it is. But I think it underscores how perception of the current state and pace of change is not totally aligned (I promise I am not doing this on purpose) with reality.
My next point is, admittedly, half-baked. But your analogy to adults vs children struck me. I thought it was a very useful analogy for the purposes. It made me think, though, adults DO have a massive planning/intelligence advantage over children. We are also not actually aligned on goals, desires, etc. And yet, it's overwhelmingly true (thankfully) that adults do not kill or harm children, even accidentally. And, I think this is true even ignoring that children have adults as guardians protecting them from other adults.
Imagine a restaurant full of unattended five year olds and forcing one unrelated adult to dine there. Let's further say this adult is beyond law or punishment. I am still pretty sure we wind up with zero dead or harmed children in 99.99...% of trials.
I guess that just leads to the question of whether we can give AI morality? I am not sure. I did say this point was half-baked.
But ok, if I agree the people in charge of AI development are trying to build an AI god, and that it's possible they succeed, and that if they do, it's possible it will be immoral or dangerous or whatever (a lot of "ifs"), it's clearly still true that an existential risk is worth an abundance of caution.
Ok, I just don't totally see how that's tractable globally. You make a comparison to nuclear non-proliferation. From the standpoint of preventing apocalypse at a button press, that clearly has not succeeded.
And that's with something MUCH more difficult practically. To build even one nuclear weapon, to say nothing of an armageddon's worth, takes materials and machinery that are nearly impossible to get, impossible to produce quickly, impossible to produce at all without industrial-scale manufacturing, and multiple materials that are (relatively) easy to track and control the sale of, etc.
Making an AI god may not actually be possible. But improving a model in that direction technically "only" takes some computers and some very smart mathematicians. (I know I am oversimplifying a bit, but in terms of difficulty, the gap between getting the stuff to make a smarter model and getting the stuff to make a nuke is a chasm.)
And that feeds into the problem that the US disarming by law does nothing for any other country that wants to not do that.
The US not building any nuclear weapons after WWII ended would objectively mean the risks of extinction since then would be lower. But I also think the USSR winning the Cold War would have been very bad... so I don't know what you do about that dynamic anyway.
Consider the following graph of a function. Let x be a level of intelligence a being can have, and f(x) be the level of material power to get things done in the world that a being acquires as a result of having intelligence level x.
It is clear that f(x) is monotonically increasing. But does it increase steadily and indefinitely, or flatten out? More specifically, does it keep increasing linearly (or near-linearly, or even super-linearly) for a wide range of values of x in excess of human intelligence levels? Or is it a sigmoid that asymptotes to flatness soon after the human level?
Seems to me a load-bearing implicit premise of the doomer argument is that it keeps on going steadily up and doesn't flatten out. But this is not at all obviously true, and it's not even clear how you would know whether it were true. Me, I find the flattening-out hypothesis -- i.e., the hypothesis that superintelligence does not in fact quasi-magically enable you to do superpower-y things humans can't do -- more plausible.
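One way to make the two hypotheses concrete - purely illustrative functional forms, not anything the argument itself specifies, with x_h standing for human-level intelligence:

```latex
% "Keeps going up" hypothesis: power grows steadily (here, linearly) past human level x_h
\[
f_{\text{linear}}(x) = a\,x + b, \qquad a > 0
\]
% "Flattens out" hypothesis: power saturates at some ceiling P_max not far above x_h
\[
f_{\text{sigmoid}}(x) = \frac{P_{\max}}{1 + e^{-k\,(x - x_h)}}, \qquad
\lim_{x \to \infty} f_{\text{sigmoid}}(x) = P_{\max}
\]
```

Both curves are monotonically increasing; the disagreement is entirely about the shape far to the right of x_h, not about the sign of the slope.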
I suspect this is why a certain type of geek tends to be attracted to doomerism: they have gotten relatively very far in life on raw intelligence, so it seems especially obvious, and attractive in an extropian wish fulfillment way, that if only they were much smarter still they would be proportionately more successful.
It's in fact not obvious that f is monotonic. The smartest people in the world are not the same as the most powerful, and furthermore that's never been the case. In general, many things that we worry about AI doing, like persuading people to do things they shouldn't, are not usually ones that are obvious functions of intelligence.
Yeah, fair point. I was thinking more of the engineering-focused stories, the ones with diamond nanobots and viruses that give everyone rare cancers and so on. But the superpersuader story is an even better example, because the people in question (and I include myself in this!) tend to be unusually bad at persuading despite a high general level of intellectual horsepower, so there the wish-fulfillment aspect of the fantasy is even stronger.
One thing I find instructive is to think about what already happens. Lots of doom thought experiments go something like "1000 AIs, as intelligent and persuasive as the top humans, convince a bunch of US Army soldiers to let them take over the nuclear missile controls and dominate the world". But if this were possible, it would have already happened -- many countries can and would mobilize that many resources to accomplish that. Which should give one pause as to whether it could actually work.
This seems similar to the nuclear proliferation problem, which has also posed an existential threat. We all agree there is a non-zero chance we have built something that will lead to our complete demise. But the other guy is building one, so we need to as well.
A ban on superintelligence could stop Microsoft and OpenAI. Would it stop China? North Korea? Larry Ellison on his private island?
I'd be happy if we humans stopped wasting our resources… that is, our time and talent on a vast scale — as well as our natural resources… that is, our electricity, along with everything we burn to generate it or, at least, the opportunity cost of using all of our power on waste, rather than on something — anything — else…
I'd be happy if we humans stopped wasting our resources on building great projects which serve no purpose in society, other than to undermine our own potential.
By "waste", I refer specifically to AI, whose most common use so far has, in fact, been to do our kids' homework for them. This type of use promotes lazy research habits, and further undermines our already anemic critical reasoning skills each time we take the accuracy of an AI result as fact, without pausing for even an instant to consider its credibility, or even plausibility.
Kids lean on AI because there is no reward for them not to, which brings up yet another debilitating social curse — the nurturing of an utter unwillingness to do anything, or begin any project in life because of curiosity, for the sake of discovery, to sharpen our own proficiencies, or else simply for the sake of doing it. Capitalism has taught us that respect from others in life derives only from wealth — as wealth is a proxy for personal success in capitalism, and also its sole criterion. Why would anyone lift a finger to do anything, when there is no financial reward for not engaging AI to do it for them? No thought at all is given to the cost of the personal and intellectual atrophy which follows such laziness.
Now, multiply this effect by the size of our entire society…
AI has taken the jobs of both professional writers and those who simply write as part of their profession. For example, professional correspondence is now written by software designed to predict writing outcomes from a stolen/plagiarized LLM database.
Because capitalism has conjured into existence a class of authoritarian tech oligarchs who have wasted their potential by directing resources and effort toward the construction of a pointless, unnecessary, and socially self-destructive project, just to nurture their own esteem — that is, respect, that is, to make (even more) money — there is now no need for most people to write.
Soon, writing itself will become as much of a niche occupation as it was during the Middle Ages.
Soon… writing will die.