Ana Brainiall

Automatically Find CPF, RG, and Email in Documents

Intermediate · 9 min · By Ana Brainiall

What Is PII and Why LGPD Requires You to Find It

PII (Personally Identifiable Information) is any data that identifies a person: name, CPF, RG, email, phone number, address, banking details, photo, or biometrics. Under LGPD (Law 13.709/2018), if you store PII from Brazilian users, you must:

1. Know where each piece of PII is stored
2. Be able to export all of a user's PII upon request (art. 18)
3. Delete it completely when a user invokes their "right to be forgotten"
4. Audit who accessed each piece of personal data and when

The challenge: PII ends up scattered across logs, emails, Word documents, support tickets, screenshots, and legacy databases. Finding it by hand does not scale, and in a company with more than 100 employees it is effectively impossible.

Illustration: a company shown as a box full of documents and files, with magnifying glasses

Brazil-Specific PII Types

International NER (Named Entity Recognition) models do a decent job of detecting names, emails, phone numbers, and addresses. Brazil-specific identifiers such as CPF and RG need specialized recognition.

Brainiall uses a custom ONNX model trained on Brazilian documents combined with validated regex patterns to capture these types with 98%+ accuracy.
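To make the "validated regex" idea concrete, here is a minimal sketch of regex detection followed by check-digit validation. It is not Brainiall's implementation: the function names and patterns are illustrative assumptions, and the email pattern is deliberately simple. CPF validation uses the standard two-check-digit (mod 11) rule. RG is left out because its format varies by issuing state, which is one reason a trained model helps there.

```python
import re

# Illustrative patterns (assumptions, not Brainiall's actual rules).
# CPF: 000.000.000-00, with punctuation optional.
CPF_PATTERN = re.compile(r"(?<!\d)\d{3}\.?\d{3}\.?\d{3}-?\d{2}(?!\d)")
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def cpf_is_valid(cpf: str) -> bool:
    """Verify the two CPF check digits using the mod 11 rule."""
    digits = [int(c) for c in cpf if c.isdigit()]
    # Reject wrong lengths and repeated-digit sequences like 111.111.111-11.
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    for n in (9, 10):
        # Weights run from n + 1 down to 2 over the first n digits.
        total = sum(d * w for d, w in zip(digits[:n], range(n + 1, 1, -1)))
        if digits[n] != (total * 10) % 11 % 10:
            return False
    return True


def find_pii(text: str) -> list[dict]:
    """Return CPF and email findings, sorted by position in the text."""
    findings = []
    for m in CPF_PATTERN.finditer(text):
        # The regex only finds candidates; the checksum filters false positives.
        if cpf_is_valid(m.group()):
            findings.append({"type": "CPF", "value": m.group(), "start": m.start()})
    for m in EMAIL_PATTERN.finditer(text):
        findings.append({"type": "EMAIL", "value": m.group(), "start": m.start()})
    return sorted(findings, key=lambda f: f["start"])


if __name__ == "__main__":
    sample = "Contact ana@example.com, CPF 529.982.247-25 or 123.456.789-00."
    for finding in find_pii(sample):
        print(finding)
```

The checksum step is what separates a real CPF from any 11-digit number, such as an order ID or a phone number, that happens to match the pattern.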

The Difference Between Detection and Anonymization

Detection is just the first step. What you do next depends on the context.

Brainiall's endpoint supports all 4 modes via the mode parameter.
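As a rough illustration of what can happen after detection, the sketch below applies two common treatments to already-detected spans: masking with a type label and pseudonymizing with a keyed hash. The mode names "mask" and "hash", the function name, and the span format are all assumptions made for this example. They are not necessarily the modes the Brainiall endpoint exposes.

```python
import hashlib
import hmac


def anonymize(text: str, spans: list[tuple[int, int, str]],
              mode: str = "mask", key: bytes = b"change-me") -> str:
    """Replace each (start, end, label) span in text.

    'mask' and 'hash' are hypothetical mode names used only in this sketch.
    """
    out = text
    # Process spans right to left so earlier offsets remain valid.
    for start, end, label in sorted(spans, reverse=True):
        value = text[start:end]
        if mode == "mask":
            replacement = f"[{label}]"
        elif mode == "hash":
            # A keyed HMAC keeps tokens stable for joins. A plain hash of an
            # 11-digit CPF would be easy to brute-force.
            digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
            replacement = f"[{label}:{digest[:12]}]"
        else:
            raise ValueError(f"unknown mode: {mode}")
        out = out[:start] + replacement + out[end:]
    return out


if __name__ == "__main__":
    text = "a@b.co and 529.982.247-25"
    spans = [(0, 6, "EMAIL"), (11, 25, "CPF")]
    print(anonymize(text, spans, mode="mask"))
    print(anonymize(text, spans, mode="hash", key=b"secret"))
```

Masking suits logs and support tickets where the value is never needed again. A keyed hash suits analytics, where the same person must map to the same token without the raw value being exposed.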

Integrating With Your Pipeline

A typical enterprise workflow looks like this:

1. Discovery: periodic scans (weekly) across all data sources — databases, S3, logs, email
2. Classification: flag where PII exists, what type it is, and its criticality level
3. Minimization: PII that's no longer needed gets deleted or moved to encrypted cold storage
4. Request fulfillment: when a user requests an export or deletion, fast lookup via index

The detection API is just one layer of this pipeline. You'll also need metadata infrastructure, audit logging, and data mapping.
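The discovery and request-fulfillment steps can be sketched as a scan that builds a lookup index. This is a toy model under stated assumptions: data sources are plain in-memory lists of strings, only emails are detected, and the names `build_pii_index` and `locate` are hypothetical. A real pipeline would plug in connectors for databases, S3, and logs, plus a fuller detector.

```python
import re
from collections import defaultdict

# Simplified email detector, standing in for a full PII detection step.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def build_pii_index(sources: dict[str, list[str]]) -> dict[str, list[tuple[str, int, str]]]:
    """Discovery + classification: map each value to (source, record number, type)."""
    index = defaultdict(list)
    for source, records in sources.items():
        for record_no, record in enumerate(records):
            for match in EMAIL.finditer(record):
                # Normalize case so the same address indexes to one key.
                index[match.group().lower()].append((source, record_no, "EMAIL"))
    return dict(index)


def locate(index: dict, identifier: str) -> list[tuple[str, int, str]]:
    """Request fulfillment: list every location holding this identifier."""
    return index.get(identifier.lower(), [])


if __name__ == "__main__":
    sources = {
        "tickets": ["User ana@example.com cannot log in", "Printer is broken"],
        "logs": ["login ok ANA@example.com"],
    }
    index = build_pii_index(sources)
    print(locate(index, "ana@example.com"))
```

Keying the index by raw values is for readability only. Since the index is itself a store of PII, a production version would more likely key it by a keyed hash of the identifier and protect it like any other personal data.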

Diagram: the 4 stages of the PII lifecycle, Discovery → Classification → Mini

Common Pitfalls

Try It Right Now

In the Brainiall chat, just ask "detect PII in this text: [paste content]". Or use the API at /api/nlp/pii. For enterprise-scale compliance, the Business plan at $19 includes a batch API and audit log retention for 12 months.
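For a programmatic call, a request might be assembled roughly as follows. Only the path /api/nlp/pii and the existence of a mode parameter come from this article. The base URL, the JSON field names (`text`, `mode`), and the bearer-token header are assumptions, so check the official API reference before relying on any of them.

```python
import json
import urllib.request
from typing import Optional


def build_pii_request(base_url: str, api_key: str, text: str,
                      mode: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request to the PII endpoint.

    The payload shape and auth header are assumptions for illustration.
    """
    payload = {"text": text}
    if mode is not None:
        # Valid mode values are not listed in this article; pass your own.
        payload["mode"] = mode
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/nlp/pii",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # example.invalid is a placeholder host, not the real API address.
    req = build_pii_request("https://example.invalid", "YOUR_API_KEY",
                            "CPF 529.982.247-25")
    print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`. The response format is not described here, so the sketch stops at building the request.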
