The Data Broker Industry and Mental Health: Why HIPAA Isn't Enough

Data brokers are buying and selling mental health diagnoses, trauma histories, and therapy records. HIPAA doesn't stop them. Here's what does.

June 2026 · 9 min read

When a client sits down in your office and starts talking, there is an implicit promise in the room: what they say stays between the two of you. That promise is the foundation of every therapeutic relationship. It is also, increasingly, a promise that the software industry has figured out how to honor on paper while violating in practice.

The mechanism is not a hack or a breach. It is a legal industry generating over $200 billion in annual revenue in the United States alone. It is operating in full view of regulators. And the federal privacy law most therapists rely on to protect their clients does almost nothing to stop it.

What data brokers actually do

Data brokers are companies whose entire business model is collecting personal information and selling it. There are thousands of them operating in the United States. They acquire data from retailers, loyalty programs, mobile apps, social networks, public records, credit agencies, and healthcare companies. They aggregate it into profiles, package it, and sell it to insurers, employers, advertisers, landlords, law enforcement agencies, and governments.

Most people have a vague awareness that this exists. Most people assume it does not apply to their health information.

That assumption is wrong, and it is wrong in a very specific way that matters for therapists.

What HIPAA actually covers and what it doesn't

HIPAA is a real protection, but it is narrower than most people in healthcare assume. The law applies to what it calls “covered entities” (health care providers such as hospitals and clinics, health plans such as insurers, and health care clearinghouses) and their “business associates” (vendors who handle protected health information on their behalf). It governs how those parties handle, store, and share patient data.

It does not govern what happens to that data once it has been de-identified and passed to someone outside that chain. Specifically, HIPAA allows health data to be “de-identified” under a method called Safe Harbor (a formal expert determination is the alternative), and once de-identified, it is no longer Protected Health Information under the law. It can be sold freely to anyone, for any purpose, with no further HIPAA restrictions.

De-identification under Safe Harbor means removing 18 categories of identifiers: names, street addresses, phone numbers, Social Security numbers, and similar obvious markers. Once those are stripped, the data is legally clean. The downstream buyers, including data brokers and the companies they sell to, are not covered by HIPAA at all.

This is not a loophole. It is the design of the law. HIPAA was built in 1996 for a world where the primary concern was paper records being faxed between medical offices. The data broker industry as it exists today was not in the picture.
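To make the Safe Harbor idea concrete, here is a minimal Python sketch of identifier stripping. The field names and the sample record are invented for illustration, and the set below is a stand-in, not the actual list of 18 categories. What it is meant to show is that the clinical content, and coarse attributes such as birth year and state, survive the process.

```python
# Hypothetical field names for illustration only; the actual Safe Harbor
# rule lists 18 identifier categories and is not reproduced here.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "email", "ssn", "date_of_birth",
}


def strip_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record without the listed identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}


# An invented record; no real person or system is represented.
record = {
    "name": "Alex Example",
    "street_address": "12 Sample St",
    "phone": "555-0100",
    "ssn": "000-00-0000",
    "date_of_birth": "1984-03-09",
    "birth_year": 1984,
    "state": "OH",
    "sex": "F",
    "diagnosis": "PTSD",
}

deidentified = strip_direct_identifiers(record)
print(deidentified)
```

In this toy example the name, address, phone number, and Social Security number are gone, but the diagnosis, birth year, state, and sex all remain in the output.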

De-identified is not the same as anonymous

The de-identification standard in HIPAA was a reasonable idea at the time. It has not kept pace with what data aggregation can actually do.

In a landmark 2000 study, Latanya Sweeney found that 87% of Americans can be uniquely identified by just three data points: five-digit ZIP code, date of birth, and sex. More recent research puts the re-identification rate for “anonymized” health records even higher once cross-referenced with the commercial profiles that data brokers already maintain on most adults.
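The uniqueness finding is, at bottom, a counting exercise: how many people share a given combination of attributes? The Python sketch below shows that count on four invented records. The field names and values are assumptions made for illustration, and the toy result says nothing about real population rates.

```python
from collections import Counter


def unique_fraction(records, keys=("zip", "dob", "sex")):
    """Fraction of records whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Invented records with a placeholder ZIP code.
people = [
    {"zip": "00001", "dob": "1984-03-09", "sex": "F"},
    {"zip": "00001", "dob": "1984-03-09", "sex": "M"},
    {"zip": "00001", "dob": "1990-11-21", "sex": "F"},
    {"zip": "00001", "dob": "1990-11-21", "sex": "F"},
]

print(unique_fraction(people))  # 0.5: two of the four records are one of a kind
```

Anyone whose combination appears exactly once can be singled out by those three fields alone, with no name attached.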

This is the practical problem. Data brokers do not receive de-identified health records in a vacuum. They are adding them to profiles that already contain purchase history, location data, social media activity, and financial records. When you drop a de-identified therapy record into a rich commercial profile, the 18 removed identifiers are often redundant. The profile already knows who the person is. Now it knows they were treated for PTSD.

Re-identification is not a theoretical risk requiring sophisticated attackers. It is achievable with ordinary commercial data, and the data broker industry has better data than most researchers.
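The linkage step described in this section can be sketched in a few lines of Python. Everything here is hypothetical: the field names, the choice of birth year, partial ZIP, and sex as the shared attributes, and the invented names. Real broker profiles carry far more attributes than this, which is what makes unique matches more likely in practice.

```python
def link_records(deidentified, profiles, keys=("zip3", "birth_year", "sex")):
    """Attach a name to each de-identified record that matches exactly one profile."""
    index = {}
    for profile in profiles:
        index.setdefault(tuple(profile[k] for k in keys), []).append(profile)

    matches = []
    for rec in deidentified:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # an unambiguous match re-identifies the record
            matches.append({"name": candidates[0]["name"], "diagnosis": rec["diagnosis"]})
    return matches


# Invented commercial profiles (names are placeholders).
profiles = [
    {"name": "Alex Example", "zip3": "000", "birth_year": 1984, "sex": "F"},
    {"name": "Sam Sample", "zip3": "000", "birth_year": 1984, "sex": "M"},
    {"name": "Pat Placeholder", "zip3": "000", "birth_year": 1990, "sex": "F"},
    {"name": "Lee Placeholder", "zip3": "000", "birth_year": 1990, "sex": "F"},
]

# Invented de-identified therapy records: no names, but the diagnosis is intact.
therapy_records = [
    {"zip3": "000", "birth_year": 1984, "sex": "F", "diagnosis": "PTSD"},
    {"zip3": "000", "birth_year": 1990, "sex": "F", "diagnosis": "depression"},
]

print(link_records(therapy_records, profiles))
```

The first record matches a single profile and gets a name back. The second matches two profiles and stays ambiguous, until the profile gains one more distinguishing attribute.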

Why mental health data is different from other health data

Most health data (a prescription for a cholesterol medication, say, or a visit to an orthopedic surgeon) is sensitive in a general sense but relatively contained in what it reveals. Mental health data is categorically different.

Therapy records contain diagnoses, trauma histories, relationship disclosures, accounts of past behaviors, information about family members who were never patients, and the specific texture of a person's private experience. Clients share this information because they believe it will never leave the room. When it does, the consequences are not abstract.

Insurance companies use mental health history in underwriting and claims decisions. In custody disputes, therapy records have been subpoenaed and used against the very clients who sought treatment. In some jurisdictions, mental health records can be accessed by law enforcement without a court order under public safety provisions. Employers and landlords make decisions based on information purchased from data brokers, without any obligation to disclose that they did so.

In 2023, a report from Duke University's Sanford School of Public Policy found that 11 of 37 data brokers contacted were willing and able to sell mental health data outright, with no meaningful buyer verification required. The data on offer linked diagnoses like depression, bipolar disorder, and anxiety to names, home addresses, credit scores, and net worth. One broker told the researcher that buyers could “use the data freely.” Prices started at $275 for 5,000 records, or about 5.5 cents per record.

This is not a future risk. It is the current state of the industry.

How companies sidestep this with terms of service

The mechanism that connects cloud software to the data broker industry runs through Terms of Service language that most therapists never read carefully.

When you use a cloud-based therapy notes platform, your session data leaves your device and lives on their servers. The platform signs a Business Associate Agreement with you, which many therapists treat as a meaningful privacy guarantee. A BAA establishes that the vendor is permitted to handle PHI and commits them to HIPAA compliance. It does not prevent them from using that data for their own purposes within what HIPAA allows, and what HIPAA allows is broad.

Most platforms include Terms of Service language permitting them to use “aggregated or de-identified data” for “product improvement,” “research,” or “analytics.” That language is written by lawyers who know exactly what they can do under HIPAA without crossing a legal line. In practice, it often means using session transcripts to train or fine-tune the AI models the platform then sells back to you as a feature.

The cases where this went further than vague data use practices are documented. In 2023, the FTC brought action against BetterHelp for sharing users' mental health data, including the simple fact of therapy enrollment, with Facebook and Snapchat for advertising targeting. BetterHelp paid a $7.8 million settlement. That same year, Cerebral disclosed that it had been sharing sensitive mental health information with Meta, Google, and TikTok via tracking pixels embedded in its platform. These were not small startups acting carelessly. They were among the largest mental health platforms in the country, with legal teams and compliance officers.

The incentive structure is the issue. When session data is a company's most valuable asset, the pressure to monetize it is constant, even if that means doing so carefully, legally, and in ways that do not trigger immediate outrage. Terms of service reflect that pressure.

The other thing worth understanding is that terms of service change. What a platform says it does today is not binding on what it does after the next funding round, the next acquisition, or the next policy update. Reading a privacy policy once does not tell you what a company will do with your clients' data in three years. The data, once on their servers, stays subject to whatever their future decisions turn out to be.

A specific question to ask any AI notes vendor: “Does your Terms of Service or Data Processing Agreement permit using session data, in any form, to train, fine-tune, or improve AI models?” If the answer is yes or unclear, that is what you have agreed to.
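If you want a first pass before asking that question, a short script can pull out the sentences in a Terms of Service worth reading closely. The Python sketch below is only a rough aid: the phrase list is an assumption you should tune to the vendor's wording, the sample text is invented, and a clean result does not mean the document is safe.

```python
import re

# Hypothetical watch list; adjust it to the wording the vendor actually uses.
WATCH_PATTERNS = [
    r"de-?identified",
    r"aggregated?",
    r"product improvement",
    r"analytics",
    r"\btrain",
    r"fine-?tun",
    r"machine learning",
]


def flag_sentences(text):
    """Return the sentences that contain at least one watch-list phrase."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        s for s in sentences
        if any(re.search(p, s, re.IGNORECASE) for p in WATCH_PATTERNS)
    ]


# Invented sample text, not quoted from any real vendor.
sample_terms = (
    "We store your notes securely. "
    "We may use aggregated or de-identified data for product improvement and analytics. "
    "You may cancel at any time."
)

for sentence in flag_sentences(sample_terms):
    print(sentence)
```

Treat the flagged sentences as a reading list, then put the question to the vendor in writing.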

The subpoena problem no one talks about

There is a legal dimension to cloud storage that is separate from the data broker question but equally important.

Any data stored on a server can be subpoenaed. A valid court order compels a company to produce whatever records they hold. When your clients' session data lives on a vendor's servers, that subpoena goes to them. You may not be notified in time to respond, assert privilege, or consult an attorney on your client's behalf. The decision about what gets handed over happens without you.

When data lives only on your own device, a subpoena has to come to you directly. You remain in the chain. You have the opportunity to respond, consult an attorney, and advocate for your client, the same as you would with paper records.

The Change Healthcare cyberattack in February 2024 showed what a breach of centralized data looks like at scale. The largest healthcare data breach in US history exposed the records of over 100 million Americans: medical histories, mental health diagnoses, prescriptions, and insurance information. The company had signed BAAs with thousands of healthcare providers. None of those agreements prevented the breach. A BAA is a contract, not a security control, and all of the data was stored in one place.

Why local AI is the only real solution

The problems above have a common root: when client data exists on a server, you have lost control of it. Everything else follows from that.

The only privacy guarantee that is durable across Terms of Service changes, acquisitions, regulatory gaps, and data broker ecosystems is data that never leaves your device. Not data protected by a strong privacy policy. Not data covered by a BAA. Data that physically does not exist on anyone else's infrastructure.

This used to require choosing between AI assistance and genuine privacy. It no longer does. Modern Apple Silicon chips have enough processing power to run capable AI models entirely on-device. Transcription, note drafting, and everything in between can happen on your computer with no server involved.

The practical implications are significant:

There is no vendor server for a subpoena to compel. The subpoena comes to you.
There is no server to breach. The Change Healthcare attack could not happen to data that was never centralized.
There is no Terms of Service update that affects your clients' data. The vendor's policies govern what happens on their servers. Nothing is on their servers.
There is no de-identified data pipeline. Nothing to strip identifiers from and sell downstream.
A BAA is not required because there is no Business Associate handling PHI.

You can verify this yourself. Turn off your wifi. A genuinely local AI tool works exactly the same. That is not a marketing claim. It is a testable fact about where the processing happens.

What this means for therapists right now

If you are using a cloud-based platform for notes or session recordings, you are not necessarily doing something wrong. HIPAA compliance is achievable with cloud tools given the right BAA and vendor vetting. But “HIPAA compliant” does not mean what most therapists assume it means. It does not protect your clients from the data broker industry. It does not protect them from re-identification. It does not protect them from what happens to de-identified session content after the next Terms of Service update.

The question to sit with is whether “legally compliant” is the same as honoring the implicit promise in the room. For many therapists, once they understand what cloud platforms actually do with session data, the answer is no.

Confidant was built from the premise that the promise in the room is the one that matters. The AI runs entirely on your device. Session recordings, transcripts, and notes never leave your computer, not to our servers, not to any third party. We have no backend that stores client data, which means there is nothing to sell, nothing to breach, and nothing to subpoena. For more on how local AI works and why it became practical, this post covers the technical picture. For the HIPAA compliance question specifically, this post goes into what the law actually requires.

About Confidant

Confidant is the only AI-assisted therapy notes app that runs entirely on your Mac. No cloud, no servers, no subscription required.
