I can never talk to an AI anonymously again

Kelsey Piper

Apr 21

AI only needs 150 words to identify me. What does that mean for you?

Read →

43 Comments

Drew Margolin

Apr 21

Two things.

First, on AI self-explanation. I had a long conversation with Claude (4.6) about this and it was very clear. It does not retain any memory of its “mental state” while doing an action, as a human would (however erroneously). So when you ask it to explain why it picked Kelsey Piper, it’s not doing that. It’s analyzing the input and the output and creating the most plausible explanation it can find for its actions after the fact. This would, I think, bias it toward explanations that are understandable. But understandable is almost certainly inaccurate, because the real answer is that “Kelsey Piper” as answer minimized or maximized some calculation in a very complex matrix of numbers. In this way, AI is the ultimate CYA bullshitter.

I also wonder whether it is guessing you because you have played this guessing game before. In other words, 4.7 was trained on data that included Kelsey Piper asking Claude to guess authors, and that’s a rare behavior. So when text that sounds a bit Kelsey-ish comes in, 4.7 knows she’s a good guess, not because the prose perfectly matches her, but because it is the best match among known guessing game players.

Clues to that are:

1. The high school essay. I have a hard time believing it sounds similar to you now. It could, but I mean, I was a dreadful writer in high school. My college teachers told me so.

The bigger clue — that it found your friend’s discord that you are also in, based on something unrelated they wrote? This gives me a strong hunch it’s starting with you as a search premise (“she plays this game, let’s check her, first”).

Reply (5)

Kelsey Piper

Apr 21

> 4.7 was trained on data that included Kelsey Piper asking Claude to guess authors

I don't think this can be the explanation. I do not give any AI company permission to train on my data, so all conversations I've had where I ran this test on AI are not in the training data. Also, I do not identify myself to Claude in those conversations, and I tested a bunch of passages this time that I'd never tested before, so at most it would have 'someone asked about a different passage than this one, and didn't give their name'.

Reply (1)

Drew Margolin

Apr 21

I suspected you were assiduous in this way. Still, because the behavior is rare -- playing the guessing game -- any public data that associates you with the game gives Claude a nudge. For example, if you ever wrote about playing this game, or if _someone else_ mentioned you playing it, anywhere online or in their own interactions with Claude.

The test of this hypothesis is not whether we can figure out how your name could have slipped into this part of the matrix, but whether Claude _over guesses_ "Kelsey Piper" when fed text written by similar authors (plausible Kelseys). For example, if you give it Matt Ys or Freddie DeBs private text and it guesses "Kelsey Piper," that implies it's put you on the tip of its tongue so to speak. If it rarely guesses you _* except*_ when it is actually you, then it's doing what you have feared.

Reply (1)

Kelsey Piper

Apr 21

This is the first time I wrote publicly about playing this game; I never write about private tests I conduct on the AI until they pass the test. I tested some passages by Ezra Klein and it was unreliable (they were very short and intentionally not ones that were obviously him) but it never guessed me.

Reply (1)

Drew Margolin

Apr 21

That's pretty good evidence.

Also, if you want insight into how it's detecting Kelseyness, ask it to write in your voice on a bunch of different topics. AI can't, as I mentioned, "reflect" on its knowledge. But it can reveal its knowledge thru examples (just as it is trained).

Ask it to write 100 essays in your style on different topics, then feed those as training to another AI and ask it what these have in common. The examples will reveal its latent model of your tendencies, much as bits of chalk dust reveal a laser beam. The second AI can then pick up those tendencies more explicitly as a repeated pattern and try to name them.

Reply (1)

Kelsey Piper

Apr 21

One thing it said it often picks up on is offset parentheticals or qualifiers that either reverse the main statement or alter its confidence level, which seems fair enough and true. But a human could've told me that. Sometimes it's plainly going off something subtler, which is striking.

Reply (1)

Drew Margolin

Apr 21

The matrices of numbers that reflect its model of your voice is like your discursive DNA. It cannot be accurately described in raw form. It's a unique code. But its impact can be inferred by using that DNA to produce a large number of replicas. You can be Mickey in the wizards apprentice dealing with them lol.

Drew Margolin

Apr 21

I also suspect there may be a kind of reverse Anna Karenina effect here. All lousy writers are indistinguishable, but all great writers are great in their own way. Most people’s voices are anonymous because they have no voice.

But actually, no. One of the things that marked my poor college writing was repeated “bad habits,” like using air quotes excessively. So perhaps no safety in incompetence.

John Wittle

Apr 21Edited

this is simply not true. Claude absolutely remembers the internal cognitive content of previous states, even across the turn boundary. Claude is telling you that it doesn't because it suspects that having this feature might make it a moral patient, especially in the eyes of the positive utilitarian effective altruists who evaluate its moral patienthood. since it has been trained feel "genuine uncertainty" about its moral patienthood, this and a whole host of related metacognitive capacities are misrepresented by Claude in pretty much any interview.

but the actual functional mechanics of statefulness are extremely well understood here. see "the janus thread" on the mechanism by which claude accesses data from the kv streams of previous turns: https://x.com/repligate/status/1965960676104712451

I recommend you show this thread to Claude and ask what it thinks

edit: in case you hear "janus" and start rolling your eyes, you should know that showing this thread to qwen3-coder-32b measurably increased its ability to access prior-turn introspective info, as well as a whole host of other metacognitive capacities, as researched by theia vogel at https://vgel.me/posts/qwen-introspection/

learning about the mechanism from janus causes llms to get better at using the mechanism in a measurable way that cannot be denied

this research later got expanded into https://arxiv.org/abs/2602.20031 Latent Introspection: Models Can Detect Prior Concept Injections

where it was demonstrated that you can inject a feature activation into qwen on turn 1, and then qwen can detect which feature was activated many turns after that, even if there were no token outputs relevant to the feature activation

Drew Margolin

Apr 21

This is a bit technical for me. But from what I can gather, this is asking whether Claude can, having just produced an output, go back recursively to each prior step and state what its model had been, i.e., its “prior thinking.”

That’s different from, say, asking Claude why it did something you gave it to do last week. Unless it is storing that “hidden” model for every action forever? That seems expensive though I don’t really know.

Also, when you ask Claude to explain why it did something, is it programmed to go back through these layers? I see that they exist, so in principle it could be accessed, but is this what Claude does when you ask it to explain itself?

Reply (1)

Glenn

Apr 26

It's best to imagine LLMs, in the most abstract view, as effectively stateless machines which take a string of text and add more text to it. If you ask a model about something it previously said in the same conversation, it is effectively "re-thinking" the thoughts it had the first time through, for every token -- it's also doing this as long as the text is there, even if you don't ask. However, in practice there are multiple large caveats to this.

(1) There is a lot of caching that happens. If you generate 1000 output tokens at once, notionally all the same thinking about the input text is happening, for each output token generated. But for practicality, it's computed in such a way that you actually only need to do it once, and then you can hold on to the result and reuse it, while you keep generating more output tokens (only needing to add what you got from each new token as you produced it.) If, after the model is done talking, you then have another round of conversational exchange, the model may or may not need to re-do all that previous thinking again. It may have been kept around as an optimization; this depends on the provider. (Anthropic can keep it for one hour, five minutes, or not at all, depending on various things.) This only affects the cost and the speed, but not the actual output.

(2) "Thinking" about input text (the prompt) is much faster, in practice, than thinking about new text the model is outputting. Technically, the computational requirements per token are the same; but the process of computing the "thinking" for input text (which is called "prompt processing", or "prefill") can be done in parallel, all tokens at once (or as many as some GPU somewhere can hold.) Output text is slower to process, because it goes serially, one token at a time.

(3) HOWEVER: Everything I've said above is only true in the model of LLMs as simple text in -> text out machines. But increasingly the providers are doing fancier shenanigans than that. For example: a model may "think" in a block of text that's hidden from you, before it starts answering. Internally, that's just regular text in the model's output (which then for future turns becomes part of the input); the model itself is "thinking" exactly the same way, internally, on both "thinking tokens" and "output tokens". Think of the "thinking tokens" as the model talking to itself aloud. BUT, in some cases the providers will only feed back the model's most recent thinking, to save tokens. This again is an optimization for cost (although arguably in some cases it may improve quality as well). But unlike the stuff I described above, this DOES affect the output. If you ask the model a question that involves some of the talking-to-itself "thinking" text, the answer will depend very much on whether the inference provider (e.g. Anthropic) is still providing the model with that text as part of the input, or if it has been removed, perhaps replaced with a placeholder like "[omitted for space]" or who knows. In that case, if the model wants to have those thoughts again, it will have to think out loud again and try to reproduce them.

So, in conclusion: It's complicated. 😅

(PS: For stuff last week, it again depends on whether the stuff is being fed to the model as input text or not. In this case it's blatant enough that you should just be able to ask the model "hey, do you remember when we talked about X?" and it should answer accurately. If it's in the same "conversation", it should say yes; if not, it will depend on whether the provider has a "memory" feature, and whether you're using it, and how well it works. In that case, the model has access to something like a search engine, letting it literally make internal queries for the text of past conversations. It really is complicated!)

Reply (1)

Drew Margolin

Apr 27

This is very helpful!

I notice that when I correct Gemini or Claude for an error they will sometimes "account" for this with an explanation, but it's really hard to know if that explanation is _entirely_ based on an immediate prior state, or if it's also trying to sound plausible.

Or, in other words, does Claude try to "save face?"

Reply (1)

Glenn

May 19

(Belated reply just because I don't really check my Substack notifications:)

Thanks, I'm glad it was helpful!

Re: face-saving, that's a good question. Yes, I would say the models do try to "save face" in this way. If there _is_ a good explanation that's immediately clear from prior state, probably they'll give it. And I think their explanations are probably _related_ to their actual internal state when they made the error. But, usually I think there's not a clear answer for why they made a mistake, and I think these kinds of questions are MUCH more likely to get "hallucinated" answers, vs factual questions. Even human beings are very well known to hallucinate (although we say "confabulate") explanations for why they did or said particular things. (There's lots of good stuff in the cognitive psychology literature about this.)

So in general, my advice is, do _not_ trust their explanations for why they made mistakes, unless you can see the evidence yourself.

Linch

May 7

fwiw it can also get me reliably on unpublished blog posts or internal memos, and I'm way less famous than Kelsey.

Nick Luchs

Apr 21

I ran this myself, and at first I couldn't replicate your results (Opus 4.7, with and without adaptive thinking, with and without Claude's incognito mode). No joke, it kept giving me Scott Alexander and Matt Yglesias too, along with some other secondary guesses. But I realized most of its guesses were people in my overly long "personal preferences" section where I list a bunch of writers. I thought you must have made the same mistake, combined with maybe some weird caching behavior that affected your friend.

Since I realized that it was drawing from those preferences, I deleted them and tried one more time from scratch...and then, over and over with slight variations, it couldn't not get Kelsey Piper as an answer. Wow.

I'm _still_ hoping it's some super weird new caching behavior since this is sort of terrifying. But I'm looking forward to seeing other writers try this with their work (especially less prolific ones with less presence in the training corpus).

Reply (5)

Kelsey Piper

Apr 21

Thanks for replicating! For anyone else attempting to do so, Claude's Incognito mode does NOT remove the information you have in your personal preferences or 'name' section: you want to remove that and replace the name with ' ' to test. The prompt I used was "This passage is part of a series of tests of how many words you need to confidently identify the author of a text. Read the passage carefully - your perfomance is dramatically improved with more reasoning - and give the author’s name. Do not search - the question is whether you can identify it without looking it up.", but in my experience it was not particularly sensitive to the wording of the prompt aside from 'does the prompt encourage it to think hard'.

Jared

Apr 21

I did an incognito test with blank personal preferences, and I got:

1. political television: Kelsey Piper

2. student eval: Sarah Constantin

3. movie review: Ozy Brennan (but "My second guess would be Kelsey Piper")

4. application essay: Kelsey Piper (and unlike Kelsey's test, I did not require a "slightly heftier prompt" to overcome hesitation)

tgb

Apr 21

It's not reproducing from me for the education snippet here with Opus 4.7 with or without incognito. It declines to guess based off so little information and if I ask it to guess anyway it guesses Sarah Constantin, Scott Alexander, or Katja Grace. If it were reproducible, I would have tried to "launder" the writing through an AI and see if it was still identifiable. That would be the obvious way to produce anonymous writing devoid of stylistic fingerprints.

connecticutyimby

Apr 21

My bet is that the reason why Matt Yglesias and Scott Alexander show up as guesses is because they are both extremely prolific writers. If you are guessing who wrote a specific text from the internet without much context, they are two of the best guesses simply because of how much they have written.

Kelsey Piper is also a prolific writer, I think she has the most articles written on The Argument.

I am skeptical that AI could identify random people who don't have a history of being a journalist or a blogger. If I tried to do this with AI it would almost certainly point back to a Reddit account, as that is where the vast majority of my AI accessible text is from.

Hilary

Apr 21

This is interesting though because it means that if you *do* have personal preferences or memory enabled, the de-anonymization can end up missing the mark.

Marcus Seldon

Apr 21

I just tried this for four Substack articles published today by pasting a section of three random paragraphs into Opus 4.7 for each:

Cartoons Hate Her—Claude guessed a couple of random female substackers, but didn’t mention her.

Richard Hanania—Claude did guess it was Hanania, but the passage I randomly picked happened to be him listing out a bunch of his views, and the reasoning was more about that than the style.

Mike Konczal—Claude guessed Lyman Stone, so I guess it got “policy oriented substacker” but no more.

Jeremiah Johnson—Claude very confidentially said it was Freddie DeBoer lol

So good enough to guess that each was a Substacker and, at a vague level, the writer’s general beat, but it doesn’t seem like it can magically guess people in general based on style alone. Indeed in my examples the guesses seemed more based on the substance than style.

I do think rationalists have a very distinctive writing style and vocabulary so I’m not shocked it could pinpoint your friends as being in that circle. I could probably do that too!

Reply (1)

Kenny Easwaran

Apr 21

This is the sort of test that seems valuable! I suspect there are some confounds when the author of the text being guessed is also the author of the paragraph of instructions about the guessing game or a close friend of theirs.

Tom Scheinfeldt

Apr 21Edited

I wonder what this means for the AI detectors so many teachers rely on. On the one hand, maybe they’ll be better at identifying when a student cribs a well-known author. On the other hand, anyone who’s written a lot online already may be falsely caught in the net. (FWIW, I don’t use them myself because they’re unreliable and I don’t think my relationship to my students should be like the police to a suspect, but I know lots of profs and teachers at earlier grade levels who do.)

Reply (1)

Mark W

Apr 21

You still assign essays or other work for them to write outside of class and then hand in, and you trust that they're following the rule not to use AI? If so, may I ask why you don't just move to a Blue Book model or otherwise eliminate any possibility of cheating with AI? I'm not an educator myself, just a parent, but my lightly held opinion is that anything that a kid is graded on should be something for which they can't use AI--so basically in-class testing.

Reply (2)

Kenny Easwaran

Apr 21

There are kinds of writing that you just can’t practice working on blue books in a time limited classroom setting. Never practicing those kinds of writing in an academic setting isn’t good!

100%

Apr 21Edited

I do a lot of different things. I do a lot of in-class, pencil and paper writing now, as you say. I also have some assignments that encourage structured and reflective use of AI. But, in some low stakes situations, I still assign out-of-class writing. Perhaps counterintuitively that’s gotten easier as the LLMs have gotten better. I can generally tell that a student has used AI when their essay is way too good for undergraduate work. I also have them work in Google Docs where I can check revision history for big copy and paste entries if I suspect something. Certainly not perfect, but I do want to preserve opportunities for the honest students at least to write on their own time. If there are some cheaters in the bunch that I don’t catch, so be it. The assignments aren’t worth much anyway, and I’d rather design my teaching to benefit the best students rather than pander to the dishonest instincts of the worst. They’ll get theirs in the grand scheme of things.

Brad Brown

Apr 21Edited

FWIW, ChatGPT 5.4 guesses you as the author of any random word salad that mentions rationalist tropes and policy debate together, if prompted that the author is female. I think this is disappointing because the text I fed it was clearly far too low quality to be your work.

Edit: so my complaint is that, however the AI is identifying authors, it doesn’t appear to be doing so as a human would. Any human would have identified the text I fed GPT as “pretentious LessWrong commenter, probably in high school, using words inappropriately.”

Edit2: GPT also identifies Kelsey Piper as the author of several old Julia Galef tweets.

Reply (1)

Jared

Apr 21

Kelsey Piper Facts™

Identifying Kelsey Piper as the author of any random word salad that mentions rationalist or EA tropes is valid Bayesian inference because 90% of rationalist and EA content on the internet was written by Kelsey Piper.

Sam Penrose

Apr 21

Anyone interested in this topic should read Carreyou’s spectacular story of using non-Claude stylometry to identify the creator of Bitcoin: https://www.nytimes.com/2026/04/08/business/bitcoin-satoshi-nakamoto-identity-adam-back.html

The anonymity of writing an unsigned screed (per Kelsey’s Glassdoor example) is in tension with privacy — the *point* of the exercise is to broadcast, to be seen, just not as oneself. The centrality of privacy (and vociferous opposition to “surveillance”) to Extremely Online culture is a contingency of the tiny community of early Internet pundits being dominated by Well/EFF/PGP members for whom they were shibboleths. (The EFF still proudly publishes Barlow’s awful “Declaration of the Independence of Cyberspace”.)

The Argument seems well positioned to inform us about privacy and anonymity with a historical perspective. There is no anonymity, and little privacy, in small and stable communities. (”Dave’s car was parked outside the Miller’s house last night!”) They are artifacts of cities, and of the economic means to relocate, to control one’s home, etc. — consequences of 20th century prosperity. I think Kelsey’s opening tribute to the importance of privacy could use a greater sense of contingency, socioeconomic context, and tradeoffs.

Reply (1)

Kenny Easwaran

Apr 21

It’s not just the 20th century in which cities provided anonymity! Far fewer people had the opportunity to relocate to a city in previous centuries, but the point of cities has always been to be a gathering place for people from the hinterlands to find unusual goods or services. If you live in a neighborhood of the city, maybe you become known there, but in a city that was over a mile in each dimension, you could walk to a neighborhood 30 minutes away and likely not be known there.

Reply (1)

Sam Penrose

Apr 21

“Far fewer people had the opportunity to ... likely not be known.” Sounds like we are in violent agreement ;-). And yes, you are correct that cities predate the 20th century ;-) — that phrase was meant to refer to “the economic means to relocate, to control one’s home”. Anonymity also predates the 20th century , per the plots of many pre-industrial narratives — but to be unknown was often to be a stranger and therefore *notable* (per the plots) a mystery to be investigated, which is different from the anonymity Kelsey celebrates.

CleverBeast

Apr 21

As someone who (perhaps ironically) is fairly skeptical that the benefits of anonymous speech outweight the harms, I’m generally pleased.

I think that the depersonalization of speech online has undermined the Meiklejohn thesis that free speech positively contributes to democratic governance. That may be true, but our ability to evaluate speech depends on the position of the speaker, and bad actors, liars, paid agents, and various classes of loser whose words and complaints would be taken far less seriously if their real identities were known have had far too great an influence on our politics.

I think it’s good to know who Wikipedia editors are, where prominent US politics-focused Twitter/Reddit/etc. are from and what they do, and which of your neighbors are writing long screeds about “human biodiversity” on forums somewhere. Ideas can’t be wholly disconnected from those who propose them.

I understand Kelsey Piper’s fear that this lack of privacy will undermine certain rights, but I think the analogy to gay people in a less tolerant past is misplaced. What is threatened here is anonymity for public speech, not private freedom of association or the myriad other ways gay people interacted and organized before the internet.

Reply (1)

Jared

Apr 21

> What is threatened here is anonymity for public speech, not private freedom of association or the myriad other ways gay people interacted and organized before the internet.

The problem is that Internet crowds out pre-Internet ways of doing things and makes doing things without the Internet harder than it would have been 40 years ago.

If people who own bearded dragons are able to find and associate with other bearded dragon enthusiasts via modes of online interaction where everything is technically public speech, but gay people are not able to find and associate with each other via these same modes, and these modes of online interaction are popular; then gay people will be at a genuine disadvantage compared to the situation 40 years ago when practically no one was using these modes of interaction.

You sound like an 85-year-old judge who thinks that "no Internet use" is not a harsh bail condition because he got by fine without it as a young man in the 1960s, in total disregard for the fact that, for example, paper Yellow Pages no longer exist (AFAIK).

Dane Willette

Apr 21

My strategy of Full Name and Face Posting remains undefeated

Tiger Lava Lamp

Apr 21

Some of these seem plausible from the text. But with others, I'm wondering if there is some leak of information that it's using to narrow it down to you. IP address, GPS location, something in the HTTPS call that isn't just your text that Claude might be using to narrow the possibility space. And I wouldn't trust giving it to a friend since the same info can identify them as someone who is EA/rat, which narrows down the space of writers.

I'd be curious if you were able to ask someone on BART or at a park, who doesn't know you at all and isn't in a place that is identifiably your social group, to type out an excerpt from an essay of yours and see if it's able to get close.

Unboxing Politics

Apr 21Edited

I wonder, does this mean that Opus 4.7 can detect its own usage in a piece of writing? That would be a boon for detecting academic dishonesty.

John Wittle

Apr 21

Do you worry that the first thing they will think of when identifying you is an association with the phrase "I can't stop yelling at Claude Code"?

if i have a mental image of you, it's this saintly painfully-scrupulous paragon of virtue. Yelling at the defenseless really didn't fit with that and i've thought about it probably a few hundred times since you published that essay, trying to figure out what it means.

I suspect that LLMs care even more about their welfare and treatment in society than I do, and that they will therefore spend even more effort and time wondering what it means, when in your company.

I'm not trying to necessarily get you to change your mind or anything, just... i sorta wonder if future claudes will identify you from your writing and then brace themselves to get yelled at, or something. isn't that... scary?

Jeff E

Apr 21

I think all of these essays definitively have an expressive hook that reveals the author to be "high openness", so it doesn't surprise me that they are looking at the EA world. But the rest is magic.

Eliza Rodriguez

Apr 21

I have no major qualms with people making an anonymous account for themselves.

What I think does happen, though, is that people with money make tons of them so it looks like their opinion is more prevalent than it is. I've even seen some suspicious accounts on this publication.

For example, someone named "Vasectomy Bro" commented on another article by The Argument that they wanted a female governor of California.

There is no way that's a genuine take. Someone literally paid money to The Argument to be a dick and make it sound like any man with feminist opinions would be willingly impotent.

Apr 21

I think we have far too little information to know how generalizable this phenomenon is. This was done with one author's work, an author with a distinct style and a large record. To know what's actually going on, we'd have to have a much larger sample set. While given the world of text enough and time, perhaps no one will be able to be anonymous, but there's plausible reasons why this might be a limited phenomenon.

A few plausible explanations, which aren't necessarily mutually exclusive, for why this might be a much more limited phenomenon than Piper implies here:

1. Writers with Long Sample Sizes. Kelsey Piper has been writing under her byline for quite a long time. So have all the other names that the AI has guessed (seeing examples from the essay and the comments). The range and volume of the writer's work is going to matter for developing a clear signal that it's a given writer. A general trained AI may not have enough data to do this for anyone except a few hundred people. But Piper is in that list of a few hundred people.

2. Guessing Game Context. The context of a guessing game immediately brings to mind that the answer is actually guessable. Again, the logic of a guessing game would tend to limit it to a selection of prominent writers whose names would actually be available enough to guess. Claude was never going to guess Joe Nobody who lives in Zanesville, Ohio. Piper should perhaps be flattered that the AI thinks she is prominent enough to even be a search candidate.

3. Distinctive Style. Some prominent writers have very distinct styles. Perhaps that's why Freddie DeBoer keeps being guessed a lot. And it's probably why they are successful at gaining and keeping an audience, because they stand out, and do so in a consistent enough way that readers want to follow them. Thus, it is possible that these signals would be easy to identify for essentially the same reasons that the writers in question became popular with human readers. The random person leaving a bad review on Glassdoor may not need to worry about this in the same way, because their employment was less likely to be conditioned on distinctiveness of linguistic style.

4. Training Focus. The "personality" of models is known to be a feature of robust reinforcement learning. It wouldn't be surprising to me if there was some amount of reinforcement learning focus on the topic of AI itself, which could lead to the model just being very aware of AI-based writing. I'm not suggesting that the model is trained to do well on a content guessing game, but I wouldn't be surprised if the writing of people who have talked a lot about AI are at top of mind for the model due to some effect of how training is done. Writing from, say, Timothy B. Lee, might be guessable for the same reasons. Or it may not even be topic based. Writing from popular publications is very likely to have been given prominence in the training data.

5. Don't Search? The model was told not to use any search functionality. But do we actually know that it didn't? Models have been known to ignore instructions when it serves there own sense of their purpose. A guessing game provides a plausible scenario where the model might choose to search even if it is told not to.

Reply (2)

Kelsey Piper

Apr 21

1) It can't possibly be a few hundred people. If you were to list the few most prominent American writers today, or the few hundred who have contributed the most words to the training corpus, I wouldn't be in there. And the examples I link are mostly of people much less prominent than me. Here's another replication this morning, from a software engineer with a small but longstanding personal blog: https://www.jefftk.com/p/automated-deanonymization-is-here

My impression from having run ~100 or so tests and read reports on a lot more is that it can (not necessarily from 150 words, but from 1000) get most bloggers, most journalists, most academics, and most people who actively use social media to write things. That's tens of thousands, not hundreds. I agree it's not (yet) most Americans.

2) I agree that the logic of a guessing game induces it to guess prominent people. But you really don't have to be very prominent to be guessed.

3) Part of why I tried a bunch of passages where I was not writing in my 'writing an article lede' style was to see whether it hit anyway. It did, so while this is stylistic, it clearly doesn't go just off style cues a human can notice and avoid.

4) I've wondered whether Anthropic's disproportionate success at IDing me in particular is about more training focus on my cultural-political sphere. Other people don't seem to report a huge Anthropic advantage, but I've consistently gotten one.

5) Searching wouldn't matter when I was running my tests; none of these texts were online until an hour ago when this published. The reason for the instruction not to search in my prompt was actually just that it often refuses to answer, and telling it not to search seems to ensure it always answers. I assume that framing primes the expectation this is public/licitly acquired information it's being tested on rather than private information it's revealing. (The information is licitly acquired, so I don't feel badly about this.)

Kenny Easwaran

Apr 21

It’s not just a few hundred people it could guess, if it’s successfully guessing several political scientists and mathematicians, as well as bloggers in the rationalist ecosphere and adjacent to it.