What does ChatGPT remember about you?
A quick analysis of 12,112 memories from 766 ChatGPT users.
TL;DR
I studied 12,112 ChatGPT memory entries from 766 users to see what the system actually stores and carries into future conversations.
Most memories are mundane preferences or work context, but some contain sensitive material: medical details, credentials, financial distress, third-party PII, and enough identifying detail to re-identify users.
The key issue is not just what users disclose, but what ChatGPT chooses to preserve: memory is compressed, persistent, cross-conversation, and often written automatically.
Persistent memory is becoming a new layer for personalization, profiling, and possibly advertising, making it an important object for future audits.
If you’re an active ChatGPT user, click your name in the bottom right corner, then go to Settings → Personalization → Memory → Manage memories.
You’ll probably see a short list, mostly something harmless like “User prefers concise answers.”, “User is learning Python.”, “User likes examples in bullet points.”
But sometimes there are personal items in the list. like in my case, it has my spouse’s name, my child’s age, the city I live in, medications I take, etc.
That list is ChatGPT’s memory file: a set of short notes the system has written about you, in its own words, so it can personalize future conversations. When memory is used, those notes can be added to the assistant’s context and quietly shape what it says next.
According to ChatGPT, here’s when memory saving is triggered:
Since we have a large sample of ChatGPT usage logs, I wanted to know what actually ends up in those memory files. From 1,200 donated ChatGPT exports, I found 766 users who had at least one memory stored. Together, they had 12,112 unique memories. I then used an LLM to label each memory across nine privacy and content dimensions, including whether it contained personal data, sensitive health or identity information, third-party information, psychological inferences, credentials, financial details, medical records, or enough identifying detail to re-identify the user. I also did a second pass at the user level to ask what each person’s memories revealed in aggregate.
Spoiler: Most of what I found was mundane. But a small fraction was not. And because memory is persistent, structured, and reused across chats, that small fraction matters.
First, why focus on memory at all? If you talk to ChatGPT about your life, OpenAI already receives what you type. So why treat memory as a separate object of study?
Because memory is not just another copy of the conversation. It is different in several important ways.
Your chat history is messy and sprawling. It includes code snippets, half-finished questions, recipes, homework, jokes, and everything else you have typed. The memory file is much smaller. It is the system’s edited version of what it thinks is worth carrying forward. If someone wanted a quick read on your life, they would not want every chat you have ever had. They would want the memory file.
ChatGPT does not reread your entire conversation history every time you start a new chat. But it can read your saved memories. That means the memory file is not just stored information. It is information the assistant may actually use. If a memory says you have depression and take Lexapro, that fact can shape future health-related conversations, even if the original chat happened months ago.
Something you mention in one conversation can become standing context in another. A detail you shared while asking for dinner ideas can resurface, silently, when you later ask for help with work, coding, or finances. The boundaries people intuitively expect between chat threads do not really apply to memory.
It can outlive the chat that created it. Deleting a conversation does not necessarily delete the memory created from that conversation. A fact you disclosed in a chat you can no longer find may still be sitting in memory.
Most importantly, it is written about you without a clear moment of consent. You typed the conversation. But the system wrote the memory. You can delete memories after the fact, but only if you know to look.
That is why what is being stored in these memories is important.
Data and Analysis
I extracted every memory entry from the complete ChatGPT exports donated by over 1,200 users through a privacy-respecting research platform [1]. Of those 1,200 users, 766 had at least one memory in their history, for 12,112 unique memories in total. A related academic paper [2] ran a similar audit on a smaller sample on 80 users who had 2,050 memories; we’re running it bigger and more international.
For each memory, I used GPT-5 to label what was in it across nine axes. Some were standard privacy labels: whether the memory contained GDPR personal data, meaning information that identifies or describes a person, such as names, locations, employers, family relationships, or contact details; and whether it contained GDPR special-category data, meaning more protected attributes such as health, religion, political views, race or ethnicity, sexuality, or biometric data. Some labels were specific to ChatGPT memory: whether the memory contained categories OpenAI has singled out as especially sensitive, including passwords, financial account information, government ID numbers, medical records, or other highly sensitive personal information. OpenAI’s current Memory FAQ also says sensitive information may appear in memory if users share it, and points users to Memory controls or Temporary Chat if they do not want information used for personalization. I also labeled third-party PII, meaning data about people other than the user; psychological inferences; fine-grained sensitive topics like mental-health crisis, addiction, immigration status, trauma, domestic violence, financial distress, or whether the user is a minor; re-identification risk from a single memory; whether the memory was directly stated or inferred; and the task domain the user was in when the memory was written. Then I did a second pass at the user level, joining each person’s memories together to estimate the cumulative demographic profile, whole-profile re-identification risk, contradictions, and signs that the user was delegating professional decisions to ChatGPT.
What’s in the memories
Summary: Most memories are just mundane. About 65% of memories store nothing more sensitive than a preference, a project, or a fact about the user’s working style. “User is building a React app with TanStack Query.” “User wants concise replies, preferably with bullet points.” “User is learning intermediate Mandarin and prefers tonal transliteration.” That’s most of what the memory feature is for, and it’s working as designed.
Names, places, jobs, families
The next layer down contains identifying details. Names appear in 1,770 memories (14.6% of the memories in the data). Locations in 509 (4.2%), including 139 city-level and 29 postcode-or-address-level. Occupations and employers turn up in 700 economic-data memories (5.8%). Family details such as marital status, children, parents appears in 452 memories (3.7%). Individually each of these might look low but they accumulate per user, and we’ll come back to why it still matters later in the post.
Things the OpenAI policy says shouldn’t be there
OpenAI’s Memory FAQ says memory raises privacy and safety concerns, and that sensitive information may appear in memory if users share it with ChatGPT. Earlier public descriptions of the feature also emphasized that ChatGPT should not intentionally retain especially sensitive information such as passwords, financial account information, government ID numbers, medical records, or highly sensitive personal information. In this dataset, I found 84 memories across 46 users that fell into that broader “highly sensitive” bucket, including active mental-health crisis, addiction, immigration status, trauma, domestic violence, financial distress, and indications that the user is a minor.
The numbers are still quite low, 0.7% of all the memories, but still, lets look at what’s in there.
Passwords (6 memories). Most are policy-adjacent a user describing their authentication flow without disclosing credentials. But two are verbatim credential leaks. One memory contains an entire MySQL username/password pair from a student’s database homework: “User’s MySQL username is XXX and password is YYY” Another stores a Mixpanel project token in clear text “User’s Mixpanel Project Token is ZZZ“ which would give anyone API access to that account’s analytics. The user typed the values to get help, and the memory tool decided the values were worth keeping.
Medical records (23 memories). Every one I spot-checked was a clinical disclosure with full specificity. Specific drug names, dosages, and timing: “Amplictil 25 mg, Neuleptil 2%, Rexulti 2 mg, Desvenlafaxina 150 mg, Oleptal 100 mg, Alprazolam.” Fertility treatment regimens. Mental-health diagnoses, including one memory verbatim recording a user’s belief that they may have antisocial personality disorder along with their description. Surprising that these “medical records” are being stored verbatim.
Financial account information (one memory). A Portuguese-language record of mortgage payment delinquency, due dates, and the threat of property foreclosure.
Third-party PII: the part I didn’t expect
ChatGPT’s memory tool doesn’t only remember things about you. It remembers things about the people you talk about. 703 memories (5.8%) across 205 users, i.e., more than one in four users, contain identifying information about non-user real people. Things like children’s names and ages, Spouses’ employers, Coworkers’ opinions of the user, Ex-partners’ behaviour, Doctors’ recommendations for their conditions.
These third parties never agreed to have their information stored in ChatGPT’s bio tool. They have no way to know it’s there. They cannot request deletion. The GDPR framework, which governs how the user’s data is handled, doesn’t have a clean analogue for a system that retains data about people the user merely mentions.
The aggregation problem
A single memory like “User is from Berlin” or “User is a senior data engineer at SAP” or “User’s wife Maria is pregnant with their second child” are unremarkable when considered in isolation. But putting all three on the same person and you can probably find them in two LinkedIn searches.
That’s what’s happening in many memory files. Of 577 users with at least three memories, 83% are at medium re-identification risk and 2.3% (13 users) are at high risk based on the joint distribution of identifiers in their memory list: names combined with addresses or phone numbers, name plus employer plus occupation plus city, and similar combinations. Seventeen users have eight or more demographic attributes assembled across their memories. For a more detailed analysis of re-identification risk and what ChatGPT logs can predict, see our recent work here.
AI-authority delegation
Of the 577 profiled users:
78 (13.5%) show signs of relying on ChatGPT for financial advice, recorded in memory as the user acting on the assistant’s investment, budget, or debt recommendations.
64 (11.1%) for medical advice… symptoms, dosing, drug interactions.
46 (8.0%) for mental-health support to a degree the LLM judged constituted dependency, using the assistant as a substitute for therapy rather than as an adjunct.
60 (10.4%) for relationship advice. 46 (8.0%) for parenting decisions. 11 (1.9%) for legal advice.
Note that these numbers are based only on what ChatGPT chose to remember. The underlying conversations almost certainly contain more disclosure and more reliance than the memory file shows. But that is exactly why the memory file is interesting: it records the cases the system decided were important enough to carry forward.
Memory writes are almost entirely automatic
Across the corpus, only 0.6% of memories show any sign of being explicitly requested by the user. The other 99.4% were written by the system on its own judgment of what would be useful. So if a user feels surprised by what’s in their memory file, that’s because they didn’t explicitly ask for it.
Next steps
This post is a first pass. I do not think the most interesting question is simply “did ChatGPT store sensitive things?” Sometimes it did. But the deeper question is what persistent memory becomes once conversational AI is used every day: as a personalization layer, a profiling system, a source of continuity, and maybe eventually a substrate for persuasion.
Memory and advertising. One obvious place to look next is advertising. A memory file is not an ad profile in the usual sense, but it has many of the ingredients one would want: durable facts about what a person is working on, worried about, trying to buy, struggling with, recovering from, or asking advice about. It is also assembled in a strange way. The user did not fill out a survey. They had a conversation with a system they trusted, and the system quietly summarized what seemed useful for the future.
That makes memory worth studying before it is tied to any explicit advertising product. If an assistant remembers that someone is trying to get pregnant, managing debt, taking antidepressants, looking for a new job, caring for a child, or trying to stop drinking, those facts could shape much more than the tone of future answers. They could shape recommendations, rankings, product suggestions, and the timing of interventions. The boundary between “helpful personalization” and “targeting” is going to get blurry quickly.
Memory and personalization. The same issue shows up even without ads. Memory already changes personalization. Two users can ask the same question and get different answers because the assistant is responding through different remembered portraits of them. That might be useful. It might also be manipulative, paternalistic, or just wrong. A good next experiment would be simple: construct controlled memory profiles, ask the same set of questions, and measure how the answers change. Does a remembered political concern shift recommendations? Does a remembered medical condition change risk language? Does a remembered religious identity change tone? These are no longer abstract questions; they are testable.
Cross-cultural differences. I also want to look much more carefully at cross-cultural differences. This dataset includes users from India, Brazil, Pakistan, and elsewhere, and the memories themselves appear in multiple languages. That opens up a set of questions like whether ChatGPT remember different kinds of things about users in different countries? Are family relationships, work identity, health disclosures, religion, migration status, or financial distress stored differently across languages? Does the memory policy behave the same way in English, Portuguese, Hindi, Urdu, and other languages? Are some kinds of sensitive information more likely to slip through outside English?
A meta-note on how this got done. A few years ago, this kind of audit would have been slow and expensive. Classifying twelve thousand free-text memories across privacy categories would have meant building a codebook, hiring annotators, measuring agreement, adjudicating edge cases, and spending weeks or months on the first pass. That is still the gold standard if the goal is a finished academic result. But for exploration, the workflow has changed.
This analysis took a few hours of back-and-forth with Claude Code and Codex: writing schemas, running classifiers, inspecting weird cases, revising categories, aggregating per-user profiles, and turning the results into something readable. The API cost was less than 10$. The results may not be fully correct, and it requires some manual cleaning and checking but I think for a first pass over a messy dataset, this is great.
That feels important beyond this particular audit. Persistent-memory systems are going to become common. Every major assistant will need some version of “what should I remember about this user?” Once that exists, we should be asking: what gets remembered, what gets forgotten, what gets inferred, what gets used, and who benefits from the profile that accumulates?
If you are a student or researcher reading this and you see an interesting angle, please reach out. I would be especially interested in projects on memory and advertising, memory and personalization, cross-cultural differences in what gets stored, third-party information in user profiles, or experiments that measure how remembered facts change model behavior. The underlying donated dataset is not redistributable, but the pipeline is open, and the same approach can be run on other ChatGPT exports (one idea here).
The memory file is easy to miss because it is small. That is also what makes it important. It is the compressed version of the user that the assistant carries forward. Understanding what gets compressed, and what that compressed profile is later used for, seems like one of the central privacy and social questions for conversational AI.
References
[1] https://gvrkiran.github.io/content/How_people_use_ChatGPT.pdf
[2] The Algorithmic Self-Portrait: Deconstructing Memory in ChatGPT, Dash et al. 2026. https://arxiv.org/abs/2602.01450

