23andMe and the Future of Human Genomic Data Privacy and Security
Over 15 million genomes with profiles and metadata: collected, hacked, dragged through bankruptcy, then sold for $256 million. What lessons did we learn?
Before I start this long read, full disclosure: we do not use personal genomic data at our company – it is policy. We consider it a very toxic data type, since it is very easy to be accused of misusing it. Even experts do not fully understand the risks and benefits of this data. I also think it is not as valuable for target discovery or other drug discovery tasks as many people believe, especially if it is not connected to other data types. But when it is connected to many other data types, including phenotypes and surveys, it may be very valuable. On May 19th, Regeneron announced the acquisition of 23andMe together with its data. It came as a surprise to me, a former 23andMe customer. But unlike other customers, I actually celebrated – maybe Regeneron will find a drug using models trained on data that includes my genome. That would be great for everyone, including me.
But to the industry it came as a shock. I remember sitting in a conference room with my fellow pharma executives when the news hit: Regeneron was acquiring 23andMe for $256 million. That number was significantly higher than the company’s value at the time of the bankruptcy, but much lower than what I expected the totality of its genomic and phenotypic data to be worth. Fifteen million genomes, each coupled with rich personal and health data, changing hands for what amounted to pocket change per person. As an industry, we had spent years preaching that genomic data was priceless, sensitive beyond measure, practically radioactive in its potential for misuse. And here it was being sold at roughly $17 a head – less than the cost of a couple of lattes. My first reaction was disbelief, quickly followed by an uncomfortable question: have we been overestimating the risk and underestimating the value of genomic data all along?
In that moment, I felt a jolt of perspective. For years, our conversations around genetic data had been dominated by fear. We tiptoed around it, treating every genome like a live grenade that might go off if handled improperly. Yet the 23andMe saga – a massive trove of DNA data auctioned off in bankruptcy – tells a more nuanced story. It forces us to reckon with the true value of genomic information and the real (as opposed to imagined) risks of letting it loose. As someone who’s spent decades in pharma, I want to challenge some of the sacred cows we’ve built around data privacy in genomics. I write this in a first-person, reflective tone because this is personal: it’s about how we, as pharma leaders, choose to see the DNA data of millions of people – as a toxic liability, or as a powerful asset for human health.
Let’s dive into what Regeneron really bought, why the feared genomic privacy apocalypse hasn’t materialized, and how we might forge a new social contract that balances innovation with individual rights. Strap in, because this isn’t the conventional “data privacy” sermon you’ve heard before. It’s an argument for rethinking everything.
The 23andMe Acquisition in Context
When the deal closed, the final price tag was $256 million for substantially all of 23andMe’s assets. For context, 23andMe isn’t just any biotech startup – it’s the company that persuaded over 15 million people to spit in a tube and share their DNA in exchange for ancestry tidbits and health trait reports. Regeneron’s purchase includes that entire database of genotypes and the associated phenotypic and survey data those customers provided. Do the math and it comes out to around $17 per customer profile, or about $21 per genome for those who consented to research use. In the world of biotech, that’s an astonishingly low cost per genome. It’s as if a luxury sports car were suddenly selling for the price of a used bicycle.
To grasp how extraordinary this is, compare it to a few well-known genomic data collections:
deCODE Genetics (Iceland) – In 2012, Amgen acquired deCODE for $415 million. deCODE had genotyped and sequenced around 160,000 Icelanders (with deep medical records to boot). Roughly speaking, that was about $2,500 per genome – and industry folks thought it was a steal at the time, given the quality of data and the unique resource it represented.
UK Biobank – The UK government and research charities have invested well over $100 million to genotype (and now whole-genome-sequence) a cohort of over 500,000 volunteers in the UK Biobank. That comes out to hundreds of dollars per person just in sequencing costs, not counting the extensive phenotypic data collection. And UK Biobank data isn’t “for sale” per se – it’s an open resource for approved researchers, arguably priceless in terms of scientific value.
GSK’s partnership with 23andMe – A few years back, pharma giant GSK paid $300 million for a stake in 23andMe and a research collaboration, hoping to tap into the database for drug target discovery. Even that partial, temporary access cost GSK about $20 per 23andMe customer at the time.
Regeneron paid less for the entirety of 23andMe than GSK paid merely to collaborate and gain insights from the 23andMe data!
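For readers who want to check the arithmetic, here is a quick back-of-the-envelope comparison in Python. It is only a sketch: the deal values and cohort sizes are the round numbers quoted above, not audited figures.

```python
# Back-of-the-envelope per-participant arithmetic for the deals above.
# All deal values and cohort sizes are the round numbers quoted in this
# article, not audited figures.
deals = {
    "Regeneron buys 23andMe (2025)": (256e6, 15e6),   # (price USD, people)
    "Amgen buys deCODE (2012)":      (415e6, 160e3),
    "GSK stake in 23andMe (2018)":   (300e6, 15e6),   # partial access only
}

for name, (price, people) in deals.items():
    print(f"{name}: ${price / people:,.0f} per participant")

# Output:
#   Regeneron buys 23andMe (2025): $17 per participant
#   Amgen buys deCODE (2012): $2,594 per participant
#   GSK stake in 23andMe (2018): $20 per participant
# The ~$21-per-research-consented-genome figure quoted earlier implies
# roughly 256e6 / 21 ≈ 12 million customers consented to research use.
```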
Seen against these benchmarks, Regeneron’s $17-a-head deal looks almost absurd. Why so low? The answer lies in the circumstances. 23andMe’s direct-to-consumer business had been faltering, with demand for ancestry tests cooling. Then came a seismic event: a massive data breach in 2023 that compromised millions of customer records. Public trust plummeted. Legal bills mounted. By early 2025, 23andMe was on the ropes, filing for Chapter 11 bankruptcy. In the ensuing auction, there weren’t many takers for a tarnished company sitting atop a mountain of sensitive DNA data. Regeneron’s bid won, and suddenly a pharma company best known for its drug development pipeline became the custodian of 15 million genomes.
From a purely data standpoint, Regeneron scored the bargain of the century. Not just because of the quantity of data, but its unique nature: 23andMe’s database isn’t a random biobank. It’s enriched for people who were curious and engaged enough to seek out genetic testing, many of whom also answered hundreds of survey questions about their health habits, medical histories, personalities, life outcomes – you name it. It’s a quirky, high-dimensional dataset spanning everything from ancestry and raw genetic code to whether you hate cilantro or can curl your tongue. In research value, it’s unprecedented. If UK Biobank is a gold mine for science, 23andMe’s trove is a diamond mine with gold, silver, and some weird semi-precious gems mixed in. The catch, of course, is that this mine comes with a big warning sign: “Privacy Hazard: Handle with Care.”
The low sale price is a signal. It signals how wary the market has become about genomic data liabilities. We in pharma have to decode that signal. Is it saying “this data isn’t actually worth much” or is it saying “the data is invaluable, but the perceived privacy risks have discounted it heavily”? I’d bet on the latter. Regeneron certainly isn’t buying those 15 million genomes for fun – they believe there’s enormous drug discovery value hidden in there. The price was low because many others were scared off. They were scared of regulators, of lawsuits, of public backlash, of the proverbial “nuclear waste” that a genomic database can become if it leaks. Regeneron was willing to stomach that fear, likely because they have confidence in both their data security and the payoff of integrating genetics into drug R&D.
This brings us to a critical piece of the story: that big bad data breach, and whether the nightmare scenarios we’ve all imagined actually came true.
The Breach and the Myth of Genomic Harm
Let’s talk about the elephant in the room: the 23andMe data breach of 2023. When news broke that hackers had accessed data from about 7 million accounts, it was as if a collective chill went down the spine of every 23andMe customer and every privacy officer in our industry. Sensitive personal details, ancestry information, some health reports, even raw genetic data – all allegedly stolen. To make matters worse, rumors soon emerged that the stolen records were appearing for sale on the dark web.
The public reaction was swift and fierce. Media headlines screamed about DNA data on the loose. Customers panicked; many felt violated. Lawsuits were filed almost immediately, and regulators announced investigations. It was everything a company like 23andMe fears. Overnight, the trust they’d built with users evaporated. People wondered if their genetic secrets were out there for any criminal or nosy neighbor to find. The narrative of genomic data being uniquely dangerous seemed to be coming true.
But then, a funny thing happened – or rather, didn’t happen. Specifically, nothing visibly terrible happened to those 7 million people. In the months following the breach, there were zero confirmed cases of someone being harmed because their 23andMe data leaked. No reports of blackmail (“Pay up or we’ll tell the world about your BRCA mutation”). No insurance company was caught surreptitiously trawling the dark web to raise people’s premiums because of their genetic risk scores. No stalkers using DNA data to track down distant relatives of their targets. In short, the concrete impact on individuals was near zero, aside from understandable anxiety and annoyance.
This bears repeating: a trove of genetic data was exposed to all the “unscrupulous buyers” of the internet, and the sky did not fall. The worst outcome was for the company (23andMe’s reputation and finances were wrecked). The customers themselves, as far as we know, did not experience direct harm. This isn’t to trivialize the breach – it was a serious security failure and a violation of privacy. But it serves as a real-world test of all the horror stories that have been speculated about genomic data. And the reality is, those stories largely did not materialize.
Think about breaches of other sensitive data: If 7 million credit card numbers get stolen, you see fraudulent charges within days. If 7 million social security numbers get leaked, you brace for identity theft, credit ruin, IRS tax fraud – a whole cascade of nightmares that often do come true for the victims. But 7 million DNA profiles? It appears many ended up being traded around among data brokers and probably fed into hacker collections, yet no one’s life was ruined because of it. One reason might be that bad actors simply don’t have an immediate use for your raw genome the way they do for your bank info. Another reason: perhaps the truly sensitive parts of the data (like names and addresses) were already available elsewhere, making the genetic component moot from a misuse standpoint.
Interestingly, the very cheap price of the stolen 23andMe records on illicit markets tells a story by itself. It suggests that even in the bowels of the dark web, buyers weren’t lining up to get their hands on people’s SNP profiles and ancestry results. This data was not “hot” property; it was more like a curiosity sold in bulk. Contrast that with, say, stolen electronic health records or full identity packets – those fetch much higher prices because criminals can do something with them (file false insurance claims, open credit lines, etc.). The market spoke: genomic data without obvious financial or operational utility just isn’t that valuable to cybercriminals.
This reality check should prompt us, especially those of us in pharma and biotech, to ask: Have we been overhyping the dangers of genomic data exposure? We’ve been treating DNA data like a radioactive isotope – something that, if leaked, could lead to catastrophic outcomes. But the 23andMe breach indicates that maybe genomic data isn’t as immediately hazardous as we thought. In fact, one could argue the biggest “harm” was the harm we cause ourselves by panicking and pulling back from data sharing that could benefit science.
Politicians playing on this sentiment usually invoke the specter of bioweapons targeting an individual’s DNA, but at this stage of technological development that is complete nonsense. If scientists had that kind of technology, they would have cured many diseases by now. It is also worth noting that the hacked 23andMe data included a large number of Ashkenazi Jews, whose records were leaked online – one of the population groups most commonly targeted by terrorists – and yet no such attack followed.
Don’t get me wrong – privacy matters, and we absolutely must protect personal data. The point is not that breaches are okay; it’s that not all breaches are equal. A leak of genetic data is not the same as a leak of your credit card, your passwords, or your medical treatment history. The nature of potential misuse is fundamentally different. To understand how, let’s reframe the privacy conversation altogether.
Reframing the Privacy Debate: Genomes Are Not Credit Cards
Here’s a mental exercise: imagine you have to choose between two bad scenarios. In one, a hacker steals your credit card number and security code. In the other, a hacker steals your raw 23andMe genetic data. Which keeps you up at night more? If you’re like most people, the credit card theft is the immediate nightmare – your money could be drained, your accounts frozen, your credit wrecked. The genetic data theft feels more abstract, almost puzzling: what could they even do with it? Create an evil clone? It’s not obvious.
And that’s precisely the point. We need to stop equating genomic data with financial data or login credentials. Your genome is not a password. It’s not a PIN or a social security number. It cannot be used to directly steal your car or empty your bank account or impersonate you in a transaction. Yet for years, the narrative has been “genetic data is the most personal identifier, guard it with your life.” Yes, it is intensely personal – it’s literally the code of your biological self. But personal doesn’t automatically mean dangerous in someone else’s hands. We have to differentiate between emotional reactions and practical risk.
Consider this: a genome sequence is a long string of A’s, T’s, C’s, and G’s. By itself, it tells you nothing obvious, unless you do some serious interpretation. It’s not like reading someone’s diary; it’s more like reading a deeply encrypted diary that even experts struggle to fully decode. Without context or correlation to other information, a raw genome is not very useful to a malicious actor. It’s not trivial to suddenly know “aha, this person is at high risk for Alzheimer’s” without additional data and analysis. And even if they did know that – how exactly do they weaponize it against you? By telling your employer? (That would be illegal discrimination and also requires proving it’s actually you, etc.) By telling your health insurer? (In many jurisdictions, including the U.S., health insurers are barred from using genetic information to set coverage or rates – more on that in a moment.)
Compare this with stolen financial data: utterly straightforward to monetize for criminals. Or stolen health records: chock full of details that can be used for fraud or extortion (imagine a hacker threatening to expose a celebrity’s mental health treatment – that’s actionable info). Genomic data, on the other hand, mostly yields probabilistic information about health or ancestry that even the owner often doesn’t fully understand. The worst you might say is “Hey, I see you have a BRCA1 mutation, you should worry about breast cancer” – but if someone tried to blackmail me with that, I’d probably respond, “Thanks for the tip, I’ll discuss it with my doctor.” It just doesn’t have the same punch as “I have your bank logins” or “I know your private medical history.”
Now, some people raise the concern of genetic discrimination – that an insurer or employer could use your DNA to deny you opportunities. That is a legitimate concern in theory, which is why laws like the Genetic Information Nondiscrimination Act (GINA) exist in the U.S., prohibiting employers and health insurers from using genetic info against you. Yes, GINA isn’t perfect (life insurance and long-term care insurance aren’t fully covered, for example), but to date there’s scant evidence of rogue insurers sneaking around trying to get hold of leaked DNA data to tweak policies. If anything, insurers rely on known medical diagnoses and family history (which are much easier to obtain) rather than raw genomic data. It’s easier for an underwriter to note “father died of colon cancer at 50” than to interpret a genome for colon cancer risk variants.
What about more sci-fi scenarios? People worry about someone creating a bioweapon targeted at their DNA or planting fake DNA at a crime scene. These scenarios occupy the extreme fringe. Designing a bioweapon that only hurts someone with a specific genetic variant is far beyond current capabilities (and if someone wanted to harm you that badly, there are simpler ways). As for planting DNA evidence: a criminal would need access to a sample of your DNA (not just the data file) to do that. If they were that determined, they could get your DNA from a used coffee cup or hairbrush far more easily than hacking a database and synthesizing a DNA sample from the digital code. The threat, while not zero, is more Hollywood than real-world at this point.
Perhaps the only concrete “misuse” of genetic data we’ve seen is in the realm of law enforcement. The Golden State Killer case famously used an open genealogy database to find a suspect via his relatives’ DNA. This raises valid debates: if your third cousin’s DNA is in a database, could that implicate you in something by familial association? It could, and that blurs lines of consent and privacy. However, note that in the 23andMe breach, law enforcement wasn’t the one hacking data – police have legal processes (warrants, subpoenas) to request data, and companies have policies on whether and how they comply. 23andMe actually boasted that it resists law enforcement requests without a proper legal order, and it wasn’t a big player in familial forensic searches. In any case, gaining access to data via a breach is not typically how police operate. So again, the marginal additional risk from a breach is quite low.
The bottom line is this: we’ve been treating genomes like credit cards, when in fact they’re fundamentally different. A stolen credit card is an urgent crisis; a stolen genome is an embarrassment and a potential long-term worry, but not an immediate life-altering event for the victim. We should calibrate our responses and policies to reflect that. That doesn’t mean we ignore privacy – it means we manage it in a way that also considers the opportunity cost of clamping down too hard. And speaking of opportunity, let’s talk about the flip side of the coin: the immense scientific and medical value locked up in these genomic datasets, which we risk foregoing if we let fear dominate the conversation.
The Scientific Goldmine in Genomic Data
If genomic data is not a radioactive threat, then what is it? From where I’m standing – which is in the R&D halls of pharma – it’s a goldmine. Better yet, it’s like a vast reservoir of potential energy (to foreshadow my conclusion). Each genome in a database like 23andMe’s is a datapoint that, when combined with millions of others, can illuminate human biology in ways we never thought possible. We’re talking about discovering the biological mechanisms behind diseases, identifying new drug targets, figuring out why certain medications work great for some people but not at all for others, and finding humans with rare superpowers (genetic ones, anyway) who hold secrets to health that we can all benefit from.
History already provides some spectacular examples of single genetic insights leading to medical breakthroughs:
PCSK9 and Cholesterol: A couple of decades ago, researchers discovered that some people had mutations in a gene called PCSK9 that resulted in ultra-low LDL cholesterol levels – and, importantly, these people were surprisingly heart-healthy. That discovery directly led to the development of PCSK9 inhibitor drugs, a new class of potent cholesterol-lowering therapies. Those drugs are now saving lives by preventing heart attacks. The key was studying both a rare harmful mutation (in families with very high cholesterol) and rare beneficial mutations (in individuals with very low cholesterol). The human genome served up a clue, and pharma ran with it.
GPR75 and Obesity: More recently, a study led by Regeneron and academic partners analyzed half a million people’s DNA and health data (much of it from the UK Biobank) and found that individuals with certain rare mutations in the GPR75 gene were highly protected against obesity – they weighed much less on average than those without the mutation. Bingo: a new obesity drug target was born. Now there’s intense interest in developing drugs that mimic the effect of knocking out GPR75, which could become a novel therapy for obesity or metabolic disease. That kind of discovery only emerges when you have massive datasets to sift through, because those protective mutants were literally one-in-many-thousands.
ApoC3, ANGPTL3, and Triglycerides: Another example – people with rare loss-of-function mutations in certain genes have been found to have dramatically low triglyceride levels or other favorable metabolic profiles. These findings (like those for the genes ApoC3 and ANGPTL3) have spawned new therapies for hypertriglyceridemia and even new approaches to cardiovascular disease. Each time, the pattern is similar: find a gene variant that naturally does something beneficial, then design a drug to copy that effect.
Proteogenomics and New Biology: Beyond finding drug targets, having genomic data linked to other biological data opens new windows. For instance, large projects now measure thousands of proteins in people’s blood (the “proteome”) and connect those to genetic variations. This field of proteogenomics has yielded incredible insights – like identifying proteins whose levels are controlled by certain genes, which in turn affect disease risk. A concrete example: genetic variants that change the level of a protein called IL-6 receptor led researchers to develop IL-6 blocking antibodies for autoimmune diseases. If you have a million genomes with linked protein data or health records, you can do this kind of analysis at scale and perhaps find dozens of new angles for therapy.
Identifying “Super-Carriers”: In a huge database, you can find what I call genetic “superheroes” or at least unusual individuals. Maybe there’s someone with a gene that normally causes a deadly disease, but they are completely healthy – suggesting they have protective factors. Maybe you find someone who’s 80 years old, has smoked all their life and never got lung cancer, carrying clues in their genome as to why. These anecdotes become discoveries when you have data at scale. The famous CCR5 delta-32 mutation that makes people immune to HIV was discovered by studying individuals who were exposed to HIV but never contracted it. How many more protective variants are sitting in the 23andMe database, unlooked-for? With millions of people, likely quite a few. And within those lie blueprints for tomorrow’s medicines.
The point is, genomic data saves lives – but only if we use it. If we treat it like a deadly toxin that must be locked away, we miss out on these breakthroughs. Our industry spends billions on R&D, and yet something as simple as a genetic association study can point the way to a winning project or spare us from pursuing a dead end. (Evidence: drugs whose targets are backed by human genetic evidence are roughly twice as likely to succeed in clinical trials. Genetics helps de-risk big decisions.)
Now, the 23andMe dataset is particularly interesting for a few reasons that are worth highlighting to my fellow pharma colleagues:
It’s diverse (at least more so than many studies). 23andMe’s customers come from a variety of backgrounds and ethnicities, meaning discoveries that might benefit populations beyond the typical European-ancestry cohorts used in a lot of research. For example, genetic factors more common in, say, East Asian or African populations might be lurking there, waiting to be found, for traits like diabetes or blood pressure. This is not just good ethics – it’s good science and business, because drugs need to work for everyone.
It’s linked to self-reported phenotypes. Yes, self-reported data can be messy or noisy – but with millions of data points, patterns emerge. 23andMe users have reported everything from whether they get migraines from exercise to how much they weighed at various ages, whether they have certain diseases, etc. That’s a trove of phenotype information you can’t get easily elsewhere at scale. When cleaned and validated, it can reveal correlations that spark hypotheses (e.g., a genetic variant that correlates with people reporting unusually low pain sensitivity could point to a new pain drug target).
It has longitudinal potential. So far, 23andMe’s data is largely baseline (one spit kit, one survey snapshot). But some users have been engaged for years, answering new surveys as they launched and adding new data. If integrated with outside data (like electronic health records or wearable data, with consent), this could become a longitudinal study of millions. Imagine following 10 million people over 10 years genetically – the power to see which genetic profiles get which diseases or respond to which meds is unprecedented.
And specifically for metabolic diseases – GLP-1 analogs, muscle “incretins”, rare phenotypes – this is an area where genetics can shine. The craze over GLP-1 analogs (drugs like Ozempic for weight loss) shows we have potent tools, but not everyone responds to or tolerates them equally. Genetics might predict responders or suggest combination therapies. Perhaps there are people in the database who lost weight easily and kept it off – do they carry rare variants we can mimic pharmacologically? “Muscle incretins” might refer to signals from muscle that affect metabolism or appetite (there’s speculation that exercising muscle releases factors that suppress hunger or improve insulin sensitivity). If some individuals have naturally high levels of such factors (due to genetics), we might find those signals and turn them into therapies. A large dataset helps find these needles in the haystack.
Every genome in that 15 million is a potential lesson. But only if we’re allowed to learn from it. That brings us to the interplay with regulation and public sentiment – how do we use this goldmine without triggering a privacy backlash or running afoul of laws? The good news is, I sense a shift in how regulators are thinking about this.
Regulation: From Punitive to Pragmatic
Not long ago, any company that lost control of customer data could expect metaphorical heads to roll. Massive fines, public shaming, executives grilled in hearings – the regulatory playbook was mainly about punishment and setting examples to deter others. With genomic data, regulators have been understandably cautious. The EU’s GDPR treats genetic information as sensitive personal data, deserving extra protection. In the US, we’ve seen the FTC and state authorities come down on companies for privacy missteps. One might expect that 23andMe’s breach and subsequent bankruptcy would result in a regulatory crackdown of epic proportions. But what actually happened was more measured, even pragmatic.
Yes, there have been consequences: Investigations by the UK Information Commissioner’s Office and other data protection agencies were launched. The UK ICO even signaled an intention to levy a fine on 23andMe (a few million pounds – meaningful, but nowhere near the maximum they could have gone for). Class-action lawsuits in the US prompted a settlement (reportedly around $30 million) to compensate affected users with some credit monitoring and restitution. These are not trivial, but they’re also not crippling in the context of big tech fines. Importantly, regulators did not move to block the transfer of data to Regeneron, nor did they demand that 23andMe’s database be purged or sequestered due to the breach. Instead, the approach has been: make sure the buyer (Regeneron) understands their obligations, put safeguards in place, and keep an eye on things.
In fact, as part of the bankruptcy proceedings, a court-appointed independent privacy overseer was assigned to review the deal’s implications. Regeneron had to publicly commit to upholding 23andMe’s privacy policies and complying with all data protection laws. Lawmakers made noise about “unscrupulous buyers”, but ultimately allowed the sale to proceed, implicitly acknowledging that having a responsible pharma company pick up the pieces might be better than leaving the data in limbo or letting it go to a less accountable entity. This is the pragmatism I’m talking about: rather than scorning any use of the data post-breach, the system sought to ensure it would be used ethically. Regulators appear to recognize that if there’s no actual harm to individuals manifesting, there’s no sense in burning the village to save it.
It’s a subtle but important shift: focus on outcomes, not hypotheticals. Regulators are increasingly asking, “Is there evidence of misuse? Is the company taking reasonable steps to prevent misuse? How can we enable beneficial research while mitigating risks?” This is a far cry from a purely punitive stance of “one strike and you’re out (of business)”. I see data regulators evolving much as the FDA might – from blocking anything risky toward managing risk while allowing innovation.
For example, a few years ago, if you mentioned a DNA database might get sold to a pharma company, privacy advocates would gasp and perhaps regulators might intervene. Today, with 23andMe, the conversation is: how do we make sure Regeneron doesn’t abuse it, rather than “no, you can’t have it at all”. There’s an implicit acknowledgement that using genetic data for research and drug development is a legitimate, even valuable, activity – one that needs oversight but not obstruction.
Even the tone of penalties is shifting. A multi-million dollar fine for a breach affecting millions might seem low (compared to say, GDPR’s theoretical fines of 4% of global revenue), but it’s proportional to the actual damage observed. We haven’t seen regulators equate the leakage of genomic data with, say, a leak of health treatment records (which typically draw huge fines because the sensitivity and misuse potential is known to be high). And that’s appropriate. It sets a precedent that while genetic data security is important, an incident will be evaluated on real-world impact, not just fear.
From a pharma executive perspective, this means the regulatory environment might be less hostile to large-scale genetics initiatives than we once assumed. If we engage proactively – e.g., by involving independent ethics boards, being transparent with participants, and rapidly addressing any issues – regulators seem willing to work with us to enable research rather than just punishing us for any imperfection. This is not to say we get a free pass (nor should we), but the mood music is changing. The conversation with regulators can include, “How can we unlock this data safely for the public good?” and not only, “Don’t you dare do anything without 10 layers of legal checkboxes.”
However, one area where there remains a thorny challenge is with laws like GDPR that give individuals strong rights – such as the right to erasure. This brings us to the practical dilemma of deleting genomic data on request.
The GDPR Dilemma: The Illusion of Deletion
Europe’s GDPR enshrines a powerful concept: if I gave you my data, I can later ask you to delete it and you must comply (with some exceptions). On paper, this sounds just and reasonable, especially for something as sensitive as DNA. If I decide I no longer trust 23andMe or I regret ever sending in my spit, I should be able to pull the plug, right? The reality, however, is not so simple. In practice, deletion is often partial, delayed, or somewhat illusory – not out of malice by companies, but because of the way data works in the modern world.
I spoke with a European friend, a long-time 23andMe customer who, after the breach, invoked his GDPR rights to have his data deleted. He got a polite confirmation email: his account was closed, his raw data file was deleted from the customer interface, and destruction of his sample was initiated. Sounds good… until you scratch the surface. What about the copy of his genotype data sitting in the research database that 23andMe’s scientists use for internal studies? What about any aggregate statistics his data contributed to, like “frequency of gene X in population Y” in a research paper? What about the lab testing records? Under CLIA (the lab regulations in the US), labs are required to keep test records for a number of years. A DNA genotype might be considered a test result that can’t just vanish because a customer wants it to – the lab might need to retain it for regulatory audits and quality control for, say, 10 years. So perhaps his raw data still lives in a CLIA-compliance binder or server backup somewhere, inaccessible to normal use but not truly gone.
Furthermore, 23andMe’s own research consent documentation states something to the effect: if you withdraw consent, they will stop using your data for new research, but data already used in past research cannot be clawed back. This makes sense – if your data was part of a calculation of, say, “20% of people with gene variant Z have diabetes,” you can’t undo that calculation retroactively. The company can’t somehow retract a published result or magically remove the fragment of your data that’s been mixed into a larger analysis. They can stop referencing your individual profile moving forward, but history can’t be rewritten.
So the GDPR ideal meets the scientific reality and we end up with a bit of a conflict. Companies do their best to honor deletion requests – the low-hanging fruit is deleting the user’s account, any identifiers, and isolating or destroying the physical DNA sample. But remnants of that data inevitably persist, be it in backups (which might be kept for disaster recovery) or derived data that isn’t easily attributed to one person. Most privacy laws acknowledge that truly wiping all traces might be impossible; they often allow retention of data for legitimate purposes like regulatory compliance or research that’s already in progress.
From a user standpoint, this is confusing at best and deceptive at worst. Users think “delete” means gone forever. In truth, it often means “mostly gone, as far as you’ll ever see, but we can’t honestly say 100% gone.” Companies could probably be more transparent about this nuance. But how do you explain that without scaring people? “We’ll delete your data, except not really, but don’t worry…” – it’s a tough message.
The GDPR dilemma for genomic data is this: How do we give individuals genuine control and peace of mind, without undermining the integrity of long-term research? If every time someone withdrew, a research dataset had to be purged and statistics recalculated, it would be a nightmare for science. Imagine if a thousand people in the Framingham Heart Study (a famous longitudinal study) suddenly said “delete me” – decades of research would be thrown into question. Yet, individuals do deserve the right to not have new analyses done on their data if that’s their wish.
One practical solution is what 23andMe did: honor the request going forward, but not retrospectively. This is essentially an implicit social contract: “Your data might contribute to research, and if you later withdraw, we won’t use it further, but we can’t undo what’s done.” Is that satisfying GDPR’s intent? Debatable. Regulators haven’t fully clarified how to handle derived data in cases like these. We’re in somewhat uncharted territory.
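For those who think in code, here is a minimal, purely hypothetical sketch of such a “forward-only” erasure handler – the function name and fields are mine, not any company’s actual pipeline. The point is the asymmetry: raw data and identifiers go, derived aggregates stay.

```python
from datetime import datetime, timezone

# Hypothetical "forward-only" erasure handler. Raw data and identifiers are
# purged, a tombstone excludes the participant from all future analyses, and
# aggregates already derived from the data are explicitly left intact. This
# sketches the compromise described above, not any company's real pipeline.

def handle_erasure_request(store: dict, participant_id: str) -> dict:
    record = store.pop(participant_id, None)  # purge raw genotype + identifiers
    return {
        "participant_id": participant_id,
        "erased_at": datetime.now(timezone.utc).isoformat(),
        "future_research_use": False,          # blocked from all new analyses
        "derived_aggregates_retained": True,   # past results cannot be clawed back
        "raw_data_deleted": record is not None,
    }

store = {"p-001": {"genotype": "AGCT...", "surveys": ["baseline"]}}
print(handle_erasure_request(store, "p-001"))
```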
For pharma companies like Regeneron now inheriting these obligations, it means we’ll need to be very clear and careful with our European customers. EU users likely have the ability to request deletion via GDPR or similar laws. We must comply as best we can and document why some data can’t be entirely scrubbed (if that’s the case). And maybe this is where having an independent data steward or trustee can help (more on that in a moment) – an external party that can attest to what was deleted and what remnants remain, giving users more confidence that it’s not just the proverbial fox guarding the henhouse.
What I take from the GDPR dilemma is a lesson: transparency is key. People will understand that a completely sanitized deletion might be impossible, if you explain it. “We will delete all your personal identifiers and raw data. However, any research results already derived from your data will continue to exist, though they contain no information that could identify you.” That kind of message, while not perfect, at least sets realistic expectations.
But beyond managing deletions, what if we flip the script entirely? Instead of just reacting to privacy rules and breaches, can we proactively create a better model that satisfies people’s privacy concerns and unleashes the data’s value? I believe we can – and must. Let’s explore what that new social contract could look like.
Toward a New Social Contract for Genomic Data
It’s time for a reset in how we think about and manage genomic data. The old model was transactional and static: a customer gives their DNA sample, gets a report, and maybe signs a one-time consent allowing their data to be used in research (often buried in terms and conditions). The company then sits on that data like Smaug guarding gold, maybe collaborating with a few partners to mine it quietly. The customer is largely out of the loop thereafter. Trust is expected to be implicit (“we’ll keep your data safe and use it properly, just trust us”). That model has shown its cracks – people feel uneasy, breaches shatter trust, and as a result we risk the whole enterprise of genomic research by losing public support.
We need a new social contract for genomic data, one that is dynamic, participatory, and built on transparency and mutual benefit. In my view, key pillars of this new model should include:
1. Dynamic Consent: Instead of a one-and-done consent, let participants engage in an ongoing dialogue about how their data is used. This could mean giving users a say through an app or web portal where they can adjust settings: “Yes, you can use my data for research on condition X, but not for Y” or “I’m okay with my de-identified data being shared with academic researchers, but not with for-profit companies without additional permission” – or the reverse, if they’re feeling broadly altruistic. Dynamic consent recognizes that people’s comfort levels can change over time or differ by context. Importantly, it treats participants as partners rather than passive data sources. It could even include notifying them of new studies or findings that result from their data – closing the feedback loop so they see the impact of their contribution. Imagine getting a message: “Your data (along with 5 million others) just helped discover a new gene linked to heart disease. We’re working on a drug for it now.” That creates a sense of shared mission. When people feel involved and respected, they are more likely to continue sharing and less likely to feel betrayed. (A minimal sketch of what such a consent record could look like follows this list.)
2. Federated Analytics and Privacy-Preserving Tech: In the wake of breaches, one might think the safest course is to never centralize data. But of course, analysis traditionally requires data to be pooled. Federated analytics offers a clever compromise: keep the data in silos (or on individuals’ own devices/cloud accounts) and send algorithms to the data rather than data to the algorithms. The idea is that you can compute insights without aggregating all the raw data in one vulnerable place. For example, instead of a researcher pulling millions of records to their computer to run a query, they submit the query to a secure platform where the computation happens behind the curtain and only aggregated results (say, allele frequencies, or regression coefficients) come out. Google and others do this for things like smartphone data collection – it’s proven feasible. In genomics, it’s nascent but being explored. We should invest in this. It means even if a hacker breaches one system, they don’t get everything, and each piece of data remains under tighter control. Coupled with techniques like differential privacy (adding a bit of statistical “noise” to results to make it hard to re-identify individuals), we can greatly reduce the risk of any malicious use. (A toy federated-query sketch also follows this list.) In a sense, it’s using technology to solve the very privacy problems technology created. Pharma companies and big research consortia should lead here, building federated data networks where multiple parties (hospitals, biobanks, companies) query each other’s data without ever fully exchanging it. This not only protects privacy, it can also get around cross-border issues (data can stay in its country of origin to comply with laws, but insights still flow globally).
3. Independent Data Stewardship: Trust is fragile. Why should the public trust a pharma company with their DNA, given companies have profit motives? One way is to introduce independent stewardship – essentially creating a buffer entity or governance council that oversees access to the data. This could be a nonprofit foundation, a consortium with academic ethicists, maybe even participant representatives sitting on the board. They would approve or deny research proposals for using the data, audit usage, and ensure compliance with privacy standards. Think of it like a library – the data is held not by the drug company directly but by a neutral trust, and researchers (including those from the pharma company) come to the library to “read” (analyze) the data under supervision. The pharma funder still benefits from discoveries, but they don’t have unchecked power to exploit data in ways participants didn’t agree to. In the case of 23andMe’s bankruptcy, amusingly, the second-highest bid was by a nonprofit led by the company’s own founder, which suggests there was an idea to keep the data in a more benevolent home. They lost the auction, but perhaps Regeneron could still implement a form of independent oversight to assuage fears. If I could wave a wand, I’d establish a Genomic Data Trust where companies deposit the data and an independent board (with legal teeth) ensures it’s only used for agreed purposes.
4. Data Altruism and Participant Benefit: We should encourage a culture where donating data for the greater good is normalized and celebrated – akin to blood donation. People should have the option to say, “I want my data to help as many research projects as possible; please share it widely (in a privacy-safe way).” The EU has even floated the term “data altruism” in policy discussions. If participants opt in to broad data sharing, that data could be made available to academic researchers worldwide, not just kept for the original company’s use. On the flip side, there’s the idea of ensuring participants also benefit from the fruits of research. While we can’t promise everyone gets a check when a drug is discovered (that’s not practical, and altruism is often given freely), we can at least promise return of knowledge. For instance, if a new health risk is found and you carry that risky variant, you should have the right to know (if you want to). Or when a drug comes out that was developed using the data, maybe participants get early access or a thank-you in some meaningful form. The social contract should be: your data helped create public (and commercial) good, so you are acknowledged and not forgotten in the process. At minimum, regular newsletters to participants about what’s been learned, or having an annual “participant appreciation day” where companies openly share all the cool discoveries enabled by the community’s data, can reinforce that trust and goodwill.
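To make the dynamic-consent idea (point 1) concrete, here is a minimal sketch of the kind of per-participant settings record such a portal could maintain. All field names and the access check are hypothetical, offered only to show how granular, revisable permissions might be encoded:

```python
from dataclasses import dataclass, field

# Hypothetical dynamic-consent record: per-participant, per-purpose settings
# that a portal could let users revise over time. Field names and the check
# below are illustrative, not any company's actual schema.
@dataclass
class ConsentSettings:
    participant_id: str
    allow_academic_research: bool = True
    allow_commercial_research: bool = False
    excluded_topics: set = field(default_factory=set)  # e.g. {"behavioral traits"}
    notify_on_new_studies: bool = True

def may_use(c: ConsentSettings, topic: str, commercial: bool) -> bool:
    """Check a proposed study against the participant's current settings."""
    if topic in c.excluded_topics:
        return False
    return c.allow_commercial_research if commercial else c.allow_academic_research

# A participant fine with academic heart-disease research but not commercial use:
c = ConsentSettings("p-001", excluded_topics={"behavioral traits"})
assert may_use(c, "cardiovascular disease", commercial=False)
assert not may_use(c, "cardiovascular disease", commercial=True)
```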
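And for the federated analytics with differential privacy described in point 2, here is a toy sketch under simplifying assumptions: simulated genotype silos, a single count query with sensitivity 1, and Laplace noise calibrated to an arbitrary privacy budget ε. A real deployment would add secure aggregation, access controls, and auditing on top.

```python
import random

# Toy federated count query with differential privacy. Each silo keeps its
# raw genotypes local, counts variant carriers, adds Laplace noise calibrated
# to the query's sensitivity (1 for a count), and releases only the noisy
# number. The genotype data below is simulated; epsilon is arbitrary.

def noisy_local_count(genotypes, epsilon):
    """Return the silo's carrier count plus Laplace(1/epsilon) noise."""
    true_count = sum(genotypes)
    # Difference of two exponentials ~ Laplace with scale 1/epsilon.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Three data holders, e.g. a biobank, a hospital, and a company database.
silos = [[random.random() < 0.02 for _ in range(100_000)] for _ in range(3)]
epsilon = 0.5  # per-query privacy budget: smaller = noisier = more private

estimate = sum(noisy_local_count(silo, epsilon) for silo in silos)
print(f"Estimated carriers across all silos: {estimate:,.0f}")
```

The essential property is visible in the data flow: raw genotypes never leave their silo; only a noise-perturbed aggregate does, so even a compromised coordinator learns little about any individual.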
In short, the new model is about partnership. It’s not “we take your data and run,” but “we invite you into the journey of discovery, and together we make sure the data is used responsibly.” This might sound idealistic to some hardened industry veterans, but I truly believe it’s the way forward. Otherwise, we will see more public backlash, more users opting out, and more regulators slamming doors – which ultimately helps no one, not even the privacy advocates if it means slower medical progress.
Pharma executives reading this might wonder about the competitive aspect – isn’t data our secret sauce? Why share it or involve others? My answer is: some precompetitive collaboration on data infrastructure and ethics can actually enlarge the pie for everyone. If people trust the system more, more people will share data. If federated approaches let us collaborate without giving away IP, we can collectively amass larger sample sizes to detect the meaningful signals. And at the end of the day, developing the drug from those insights will still require all our execution capabilities – having a few more academic eyes on the data doesn’t diminish the advantage of being the one with the capacity to act on findings. In fact, it might accelerate finding those key insights in the first place.
The Genome Is Not Radioactive—It’s Potential Energy
After all this, I circle back to the mental image that’s been at the heart of our privacy debates. We’ve been treating the human genome like it’s some sort of radioactive material – handle it in lead-lined gloves, lock it deep underground if you’re done with it, and pray you never have a leak. But a genome isn’t plutonium. It’s more like a battery or a fuel source. It contains energy – information – that, if harnessed, can propel us to new frontiers of medicine. Like any fuel, it needs to be handled with respect (you don’t pour gasoline around willy-nilly either), but the answer isn’t to bury it and never use it. The answer is to build better engines and safer pipelines.
Yes, if you misuse it or if you’re careless, someone could get burned. But thus far we’ve seen that genomic data’s “radiation” is mostly imagined; the real mishaps have been minor. Meanwhile, the potential energy sitting in those 15 million 23andMe genomes (and in all the other databases worldwide) is staggering. It’s the potential to understand diseases at a molecular level, to tailor treatments, to find cures where none exist, to predict and prevent illness rather than react to it. Every genome is a story, and collectively they form the greatest library humanity has ever assembled about ourselves. Are we really going to let that library sit locked “for safety” when it could be changing the world for the better?
As a pharma executive, I also recognize the responsibility that comes with this data. Potential energy can explode if not managed correctly – public trust can blow up in our faces if we cut corners or appear tone-deaf to privacy. So I’m not advocating reckless abandon. I’m advocating smart, ethical, innovative use of genomic data. That means going above and beyond on security (we must strive to prevent breaches, absolutely), being transparent with participants, and inviting regulators to the table early to shape guidelines that make sense. It also means educating the public: we have to demystify what can really happen with their data and what safeguards exist. We must show through actions that we deserve the trust people place in us when they share something as personal as their DNA.
In closing, the 23andMe-Regeneron episode should be a wake-up call and an inspiration. A wake-up call that our traditional approach to genomic privacy (freak-out-and-lock-down) might be doing more harm than good. An inspiration that there is a middle path where data can flow to those who can use it to help humanity, without causing the harm so many fear. It’s a call for a new mindset: one that views genomic data not as a ticking time bomb, but as a powerful engine that we need to ignite – carefully, thoughtfully, but confidently.
I, for one, am optimistic. I believe we can create a future where donating your genome to science is seen as a noble act, not a foolish risk; where pharma companies are seen as trusted stewards of this information, not greedy data miners; and where regulators and researchers work hand in hand, rather than at odds, to ensure both privacy and progress. To get there, we have to challenge conventional thinking and be willing to try new approaches to data governance.
The human genome isn’t radioactive. It’s potential energy. And it’s high time we started treating it that way – with respect for the power it holds, yes, but also with an eagerness to release that power for the benefit of all humankind. Let’s lift the lid off the vault (safely) and see how far this energy can take us. The journey promises to be remarkable, and it’s one we shouldn’t delay any longer out of fear.
And One More Thing… Aging Research
We already know that genomic data is not the best source of new targets. Try to name ten targets discovered using genomic data alone, beyond PCSK9 and a handful of cancer mutations, and you will see what I mean. In my opinion, the most important and abundant source of high-impact targets is aging research using Life Models trained on multiple data types from multiple animal species and from humans, cradle to grave. This is a new concept, but it will be one of the topics of discussion at the upcoming 12th Aging Research and Drug Discovery conference in Copenhagen, August 25-29, 2025. If you are in pharma or biotech, or if you work in academia but want to develop real products that help many people live longer – like the next GLP-1 – this is the conference for you. Register and tell your friends. Let’s meet in Copenhagen at the best time of year – it is beautiful at the end of August.