Automatically extract names, companies, and dates from text
What NER solves that regex can't
Regex is great for rigid patterns: a ZIP code always has a fixed format, and an email address always contains an @. But the names of people and companies, and the ways dates are written, have no fixed pattern:
- "Pedro Silva", "Maria da Conceição dos Santos", "Dr. Fernando" — all names
- "Petrobras", "Banco do Brasil", "Itaú Unibanco SA", "Loja do Seu Zé" — all companies
- "January 5th", "01/05/2026", "last Friday", "next month" — all dates
NER uses a language model that learns to read context: "the company Itaú" vs "Itaú Street". Regex can't make that distinction; a well-trained NER model can get it right 95%+ of the time on text similar to what it was trained on.
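A small sketch of that gap, using only Python's standard `re` module. The three patterns are illustrative assumptions (a Brazilian ZIP code in `NNNNN-NNN` form, a deliberately loose email pattern, and a naive "two capitalized words" rule for names), not production validators.

```python
import re

# Illustrative patterns (assumptions for this sketch, not production validators).
CEP = re.compile(r"\b\d{5}-\d{3}\b")                      # Brazilian ZIP code
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")   # loose email pattern
NAIVE_NAME = re.compile(r"\b[A-Z]\w+(?: [A-Z]\w+)+")      # 2+ capitalized words

text = (
    "Maria da Conceição dos Santos wrote to pedro@example.com "
    "from Rua Itaú, 01310-100."
)

ceps = CEP.findall(text)           # rigid format: regex works
emails = EMAIL.findall(text)       # rigid format: regex works
names = NAIVE_NAME.findall(text)   # no fixed format: regex guesses wrong

print(ceps)    # ['01310-100']
print(emails)  # ['pedro@example.com']
print(names)   # ['Rua Itaú']
```

The naive name rule misses the actual person (lowercase particles like "da" and "dos" break the pattern) and captures a street instead. That is the kind of mistake a context-aware model is meant to avoid.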

Standard and custom entities
Public NER models (spaCy, Hugging Face) typically detect some or all of the following, depending on the label set they were trained on:
- PER (Person): Pedro Silva, Dr. João
- ORG (Organization): Petrobras, Google
- LOC (Location): São Paulo, Brazil
- DATE: January 5th, 2026
- MONEY: R$ 1,500, USD 200
- TIME: 3:30 PM, 9 in the morning
- PERCENT: 20%, 0.5%
For specific domains, you can train a custom model. Examples:
- Legal: laws (Lei 13.709), case numbers (N° 1234567-89.2024), courts
- Medical: medications, diseases (ICD-10), procedures
- Financial: stock tickers, bank branches, account numbers
Brainiall offers custom models on demand on the Business plan.
How it works under the hood (in 30 seconds)
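1. Tokenization: text is broken into words and punctuation (transformer models split these further into subword pieces)
2. POS tagging: each word receives a grammatical class (noun, verb...); this is a step in classic pipelines that transformer-based models skip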
3. Contextualization: each word is converted into a vector of 768+ dimensions considering its neighbors
4. BIO classification: each token is tagged as Begin-entity, Inside-entity, or Outside. E.g.: "Pedro" (B-PER) "Silva" (I-PER) "works" (O) "at" (O) "Petrobras" (B-ORG)
5. Aggregation: consecutive B+I tokens become a single entity
Modern models (mBERT, XLM-R, multilingual DeBERTa) run this pipeline in roughly 10–50 ms for a paragraph, depending on hardware.
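Steps 4 and 5 can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library implements it: the function name `aggregate_bio` is hypothetical, tokens are assumed to be whole words, and entities are rebuilt by joining tokens with spaces (real implementations usually work from character offsets).

```python
def aggregate_bio(tokens, tags):
    """Merge B-/I- tagged tokens into entities (hypothetical helper)."""
    entities = []
    current = None  # entity being built: {"text": ..., "type": ...}
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")  # "B-PER" -> ("B", "-", "PER")
        if prefix == "B":
            # A B- tag always starts a new entity; close the open one first.
            if current:
                entities.append(current)
            current = {"text": token, "type": label}
        elif prefix == "I" and current and current["type"] == label:
            # An I- tag of the same type extends the open entity.
            current["text"] += " " + token
        else:
            # "O", or a stray I- tag that continues nothing: close any open entity.
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities


tokens = ["Pedro", "Silva", "works", "at", "Petrobras"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG"]
entities = aggregate_bio(tokens, tags)
print(entities)
# [{'text': 'Pedro Silva', 'type': 'PER'}, {'text': 'Petrobras', 'type': 'ORG'}]
```

Dropping a stray `I-` tag is one possible design choice; some implementations treat it as the start of a new entity instead.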
Practical use cases
- CRM enrichment: extract companies and contacts from emails to update your database
- News analysis: monitor mentions of your brand, competitors, and executives in the media
- Compliance: find personal names in documents for data privacy audits
- Research: extract authors, citations, and dates from academic papers at scale
- Legal analysis: identify parties in a case, cited laws, and judgment dates
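As an illustration of the compliance case, here is a minimal sketch that masks personal names once entities have been extracted. It assumes each entity is a dict with `type`, `start`, and `end` keys, where `end` is exclusive; the `redact` helper and the sample data are hypothetical.

```python
def redact(text, entities, types=("PER",), mask="[REDACTED]"):
    """Replace entities of the given types with a mask, using character offsets."""
    selected = [e for e in entities if e["type"] in types]
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(selected, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + mask + text[ent["end"]:]
    return text


sample = "Contract signed by Ana Souza on behalf of Petrobras."
sample_entities = [
    {"text": "Ana Souza", "type": "PER", "start": 19, "end": 28},
    {"text": "Petrobras", "type": "ORG", "start": 42, "end": 51},
]
redacted = redact(sample, sample_entities)
print(redacted)
# Contract signed by [REDACTED] on behalf of Petrobras.
```

Passing `types=("PER", "ORG")` would mask the company as well.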
Specific limitations for Portuguese
- Compound names with prepositions: "Maria dos Santos" — some models split it into "Maria" + "Santos" as two separate entities
- Family businesses without a legal suffix: "Padaria do Zé" may be treated as a description rather than an entity
- Nicknames: "Lula" as a person vs "lula" as the word for squid — case sensitivity varies
- Brazilian addresses: Street + name + number + ZIP code — segmentation can go wrong
- Acronyms: Is "USP" an entity or just a word?
Tip: for borderline cases, always manually review 100 examples before going to production.
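For the compound-name problem, one possible workaround is a post-processing pass that rejoins two PER entities separated only by a name particle. The sketch below is an assumption about how you might do it, not a feature of any specific model or API; `merge_split_names` and the particle list are hypothetical, and entities are assumed to carry `start`/`end` character offsets with an exclusive `end`.

```python
# Particles that commonly link parts of Portuguese names (illustrative list).
PARTICLES = {"da", "de", "do", "das", "dos"}


def merge_split_names(text, entities):
    """Merge adjacent PER entities separated only by a name particle."""
    merged = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        prev = merged[-1] if merged else None
        if prev and prev["type"] == ent["type"] == "PER":
            gap = text[prev["end"]:ent["start"]].strip().lower()
            if gap in PARTICLES:
                # Extend the previous entity to cover the particle and this part.
                prev["end"] = ent["end"]
                prev["text"] = text[prev["start"]:prev["end"]]
                continue
        merged.append(dict(ent))  # copy so the input list is not mutated
    return merged


text = "Maria dos Santos works at Petrobras."
raw = [
    {"text": "Maria", "type": "PER", "start": 0, "end": 5},
    {"text": "Santos", "type": "PER", "start": 10, "end": 16},
    {"text": "Petrobras", "type": "ORG", "start": 26, "end": 35},
]
fixed = merge_split_names(text, raw)
print([e["text"] for e in fixed])
# ['Maria dos Santos', 'Petrobras']
```

A rule like this can also merge two different people in rare phrasings, so it belongs in the same manual review as the model output.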
Integrating via API
A single endpoint returns an array of entities:
```python
import httpx

r = httpx.post(
    "https://api.brainiall.com/api/nlp/ner",
    json={"text": "Pedro Silva, from Petrobras, announced on January 5th."},
    headers={"Authorization": "Bearer brnl-xxx"},  # replace with your API key
)
r.raise_for_status()  # fail loudly on HTTP errors
entities = r.json()
# Response body (start/end are character offsets, end exclusive):
# [{"text": "Pedro Silva", "type": "PER", "start": 0, "end": 11},
#  {"text": "Petrobras", "type": "ORG", "start": 18, "end": 27},
#  {"text": "January 5th", "type": "DATE", "start": 42, "end": 53}]
```
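Once the array is back, a common next step is to group entities by type and sanity-check the offsets. The sketch below runs offline on a hand-written sample; `group_by_type` is a hypothetical helper, and it assumes each entity has `text`, `type`, `start`, and `end` keys, with `end` exclusive.

```python
from collections import defaultdict


def group_by_type(text, entities):
    """Group entity strings by type, checking offsets against the source text."""
    grouped = defaultdict(list)
    for ent in entities:
        # The offsets should slice back to exactly the reported entity text.
        if text[ent["start"]:ent["end"]] != ent["text"]:
            raise ValueError(f"offset mismatch for {ent['text']!r}")
        grouped[ent["type"]].append(ent["text"])
    return dict(grouped)


source = "Maria works at Itaú in São Paulo."
response = [
    {"text": "Maria", "type": "PER", "start": 0, "end": 5},
    {"text": "Itaú", "type": "ORG", "start": 15, "end": 19},
    {"text": "São Paulo", "type": "LOC", "start": 23, "end": 32},
]
by_type = group_by_type(source, response)
print(by_type)
# {'PER': ['Maria'], 'ORG': ['Itaú'], 'LOC': ['São Paulo']}
```

The offset check is cheap and catches mismatches early, for example when the text was altered between the request and the point where the offsets are used.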
Try it right now
Ask "extract people, companies, and dates from this text: [paste]" in the Brainiall chat. Or use the API at /api/nlp/ner. The Pro plan at $29 includes 10k requests/month; Business adds batch processing and custom models.