5 min read

Building AI-Powered PII Redaction for Mental Health Research

How we built an intelligent document redaction system for mental health research that handles multilingual transcripts, preserves research context, and gives researchers final control over what gets redacted.

ai · privacy · healthcare · product-engineering

The Problem: Privacy vs. Research Value

A mental health research team came to us with a unique challenge: they conduct 50+ participant interviews per month across multiple Indian languages—Hindi, Tamil, Telugu, and English—often mixed within the same conversation.

Each interview transcript is 15-20 pages and contains deeply sensitive information: participant names, family members, therapist identities, specific clinic locations, and personal details shared during therapy sessions.

The critical problems with manual redaction:

  • Inconsistent: One researcher might redact "Dr. Kumar"; another might leave it in, assuming the name is generic
  • Error-prone: Easy to miss a Tamil name ("Lakshmi Amma") buried in an English paragraph
  • Context destruction: Replacing everything with [REDACTED] makes conversation analysis impossible
  • Language barriers: Code-mixed text ("I met amma at the clinic in Mylapore") requires understanding both languages
  • Massive time sink: Three researchers spend 2-3 hours each per document, reading line-by-line

The trust problem: Researchers needed to verify AI suggestions before finalizing redactions. Fully automated redaction without human review was a non-starter for privacy-critical research.

Why Existing Tools Failed

The team tried off-the-shelf redaction software. Standard tools trained on English business documents couldn't parse:

  • Hindi-English code-mixing
  • Regional names in Devanagari or Tamil script
  • Family relationship terms ("amma", "nani", "dada") that are PII in context

This led to either over-redaction (document becomes meaningless) or under-redaction (privacy violations).

Our Solution: AI + Human Review Workflow

We built a system that combines AI intelligence with human judgment. The AI does the heavy lifting, but researchers have final say.

The two-stage process (AI suggests, the researcher decides; sketched in code after the steps):

  1. AI reads the entire document for context and language patterns
  2. AI identifies potential PII with confidence scores (95%+ = definitely PII, 70-95% = review recommended)
  3. Researcher sees document with AI-highlighted suggestions
  4. For each suggestion, researcher can accept, reject, or edit the redaction type
  5. System applies final redactions and creates audit log
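To make steps 3-5 concrete, here is a minimal sketch of the decision model in Python. The names (Suggestion, ReviewDecision, finalize) are illustrative assumptions rather than the team's actual code; the point is that nothing becomes a redaction until a researcher has explicitly accepted, rejected, or edited it, and every decision lands in the audit log.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import json

@dataclass
class Suggestion:
    span: str           # exact text the AI flagged, e.g. "Dr. Kumar"
    entity_type: str    # e.g. "PERSON", "FAMILY_MEMBER", "THERAPIST", "LOCATION"
    confidence: float   # 0.0-1.0; 0.95+ means "definitely PII"

@dataclass
class ReviewDecision:
    suggestion: Suggestion
    action: str                     # "accept", "reject", or "edit"
    final_type: str | None = None   # set when the researcher edits the type

def finalize(decisions: list[ReviewDecision]) -> list[dict]:
    """Step 5: build the audit log once every suggestion has been reviewed."""
    return [{
        "span": d.suggestion.span,
        "ai_type": d.suggestion.entity_type,
        "ai_confidence": d.suggestion.confidence,
        "researcher_action": d.action,
        "final_type": d.final_type or d.suggestion.entity_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    } for d in decisions]

# One suggestion accepted, one edited to a different type, one rejected.
decisions = [
    ReviewDecision(Suggestion("Dr. Kumar", "PERSON", 0.97), "accept"),
    ReviewDecision(Suggestion("amma", "PERSON", 0.81), "edit", "FAMILY_MEMBER"),
    ReviewDecision(Suggestion("mother", "FAMILY_MEMBER", 0.55), "reject"),
]
print(json.dumps(finalize(decisions), indent=2))
```

Rejected suggestions stay in the log, so the audit trail records what the AI proposed even when it was wrong.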

Why this hybrid approach works:

  • Privacy-safe: Researchers verify every redaction before it's final
  • Faster: AI cuts a 2-3 hour manual pass to about 20 minutes of review
  • Accurate: Human judgment on edge cases, AI consistency on obvious ones
  • Auditable: Complete record of what was changed and why

The Technical Challenge: Multilingual Context Understanding

Modern AI models (like Google's Gemini) are trained on massive multilingual datasets including Hindi, Tamil, Telugu, Bengali, and other Indian languages.

What this means in practice:

Hindi-English code-mixing:

"Meri behen ne kaha ki therapy helpful hai."
AI recognizes: "Meri behen" (my sister) → [FAMILY_MEMBER_1]

Tamil names in English text:

"She spoke about how Annamalai supported her recovery."
AI recognizes: "Annamalai" (Tamil personal name) → [PERSON_1]

But more impressively, the AI understands context. It distinguishes between:

  • "mother" (generic reference, keep for context)
  • "Radha" appearing later in conversation (specific name, likely the mother → [PERSON_1])

Numbered markers preserve research context:

Bad redaction: "The [REDACTED] spoke to [REDACTED] about [REDACTED]."

Smart redaction: "The [PARTICIPANT_1] spoke to [THERAPIST_1] about family support. Later [PARTICIPANT_1] mentioned [FAMILY_MEMBER_1] was helpful."

Researchers can still analyze who said what, relationships between people, conversation patterns, and therapy progress—without knowing actual identities.
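
Keeping those markers stable is a small amount of bookkeeping. Here is a minimal sketch, using hypothetical names (redact_with_markers, and invented example names like Priya): each unique accepted entity gets one numbered marker, and every later mention reuses it.

```python
def redact_with_markers(text: str, accepted_entities: list[tuple[str, str]]) -> str:
    """Replace accepted entities with stable numbered markers.

    `accepted_entities` holds (span, entity_type) pairs the researcher approved.
    The same span always maps to the same marker, so relationships and
    conversation structure survive redaction. A production version would match
    on token boundaries instead of plain substring replacement.
    """
    counters: dict[str, int] = {}   # per-type counters: PARTICIPANT -> 2, ...
    markers: dict[str, str] = {}    # span -> marker, reused on every mention
    for span, entity_type in accepted_entities:
        if span not in markers:
            counters[entity_type] = counters.get(entity_type, 0) + 1
            markers[span] = f"[{entity_type}_{counters[entity_type]}]"
        text = text.replace(span, markers[span])
    return text

transcript = ("Priya spoke to Dr. Kumar about family support. "
              "Later Priya mentioned her sister Radha was helpful.")
print(redact_with_markers(transcript, [
    ("Priya", "PARTICIPANT"),
    ("Dr. Kumar", "THERAPIST"),
    ("Radha", "FAMILY_MEMBER"),
]))
# [PARTICIPANT_1] spoke to [THERAPIST_1] about family support.
# Later [PARTICIPANT_1] mentioned her sister [FAMILY_MEMBER_1] was helpful.
```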

The Results

Before AI-assisted redaction:

  • ⏱️ Time: 2-3 hours per document (pure manual work)
  • 📊 Accuracy: 80-85% (audit found 15-20% of personal info was missed)
  • 💰 Cost: 80 hours/month × $30/hour = $2,400/month in researcher time
  • 🌐 Language coverage: Hindi/Tamil names frequently missed
  • 😓 Team morale: Researchers resented this tedious, non-research work

After AI-assisted redaction:

  • ⏱️ Time: 15-20 minutes per document (AI + human review)
  • 📊 Accuracy: 96% (tested on 100 gold-standard documents)
  • 💰 Cost: ~$0.20 in AI costs per document plus ~15 hours/month of review time ≈ $250/month
  • 🌐 Language coverage: Consistent across English, Hindi, Tamil, Telugu, code-mixed text
  • 😊 Team morale: Researchers focus on actual research
  • 🔒 Trust: Researchers maintain final control, building confidence in the system

ROI: System saved $2,150/month. Paid for itself in the first week.

Unexpected benefits:

  • AI caught patterns humans consistently missed (family member names mentioned casually mid-paragraph)
  • Consistency across research team (AI applies same logic to all documents)
  • Confidence scores helped prioritize review time (accept 95%+ suggestions automatically, focus on 70-95% edge cases)

What We Learned

1. Human-in-the-loop is essential for privacy-critical work

We initially considered fully automated redaction. The research team rejected it immediately. When dealing with mental health data, researchers needed to verify that AI didn't over-redact (removing research-critical context) or under-redact (missing privacy violations). The hybrid approach worked: AI does the heavy lifting (reading 20 pages, identifying 50+ entities), humans do final verification (15 minutes of focused review).

2. Indian language support is non-negotiable for Indian research

Early testing with English-only AI models achieved 65% accuracy. With multilingual AI: 96%. The difference was recognizing "Amma" (Tamil/Hindi for mother) as a family reference, understanding "Dr. Sharma" is a person, and parsing code-mixed sentences. For any Indian language use case, test multilingual models from day one.

3. Confidence scores changed everything

Version 1 showed AI suggestions without confidence scores, so researchers reviewed every suggestion with equal care. Version 2 added confidence scores: researchers now auto-accept 95%+ suggestions, focus review time on 70-95% edge cases, and manually verify suggestions in the 50-70% range, as sketched below. Impact: review time dropped from 35 minutes to 15 minutes per document.
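
In code, that triage is only a few lines. The thresholds below mirror the ones described above; the structure, the names, and the handling of sub-50% suggestions are our own assumptions.

```python
from collections import namedtuple

Suggestion = namedtuple("Suggestion", ["span", "entity_type", "confidence"])

def triage(suggestions):
    """Split AI suggestions into review buckets by confidence.

    0.95+ -> auto-accept, 0.70-0.95 -> focused review, 0.50-0.70 -> manual
    verification. The "low" bucket for anything under 0.50 is an assumption;
    it is kept visible so nothing silently disappears.
    """
    buckets = {"auto_accept": [], "review": [], "verify": [], "low": []}
    for s in suggestions:
        if s.confidence >= 0.95:
            buckets["auto_accept"].append(s)
        elif s.confidence >= 0.70:
            buckets["review"].append(s)
        elif s.confidence >= 0.50:
            buckets["verify"].append(s)
        else:
            buckets["low"].append(s)
    return buckets

flagged = [
    Suggestion("Dr. Kumar", "THERAPIST", 0.97),
    Suggestion("amma", "FAMILY_MEMBER", 0.82),
    Suggestion("the clinic", "LOCATION", 0.58),
]
print({name: len(items) for name, items in triage(flagged).items()})
# {'auto_accept': 1, 'review': 1, 'verify': 1, 'low': 0}
```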


Building a healthcare, research, or compliance-heavy product that handles sensitive documents? Let's talk →

We've built similar AI-powered systems for invoice processing, legal document analysis, and medical record redaction. The pattern is the same: AI that understands context beats pattern-matching every time.

Have questions about this post? Get in touch.