Options
| Option | Default | Description |
|---|---|---|
presidio_url | http://localhost:5002 | Presidio analyzer URL |
languages | (per image) | Languages to detect. Auto-configured in Docker images |
fallback_language | en | Fallback if detected language not in list |
score_threshold | 0.7 | Minimum confidence (0.0-1.0) |
entities | See below | Entity types to detect |
Languages
Languages are auto-configured per Docker image::enimage → English only:euimage → English, German, Spanish, French, Italian, Dutch, Polish, Portuguese, Romanian
ca, zh, hr, da, nl, en, fi, fr, de, el, it, ja, ko, lt, mk, nb, pl, pt, ro, ru, sl, es, sv, uk
Override Languages
For local development or custom setups, override via config:Fallback Language
If the detected language isn’t in your configured list, the fallback is used:Performance
If only one language is configured, language detection is skipped for better performance.Entities
| Entity | Examples |
|---|---|
PERSON | Dr. Sarah Chen, John Smith |
EMAIL_ADDRESS | sarah.chen@hospital.org |
PHONE_NUMBER | +1-555-123-4567 |
CREDIT_CARD | 4111-1111-1111-1111 |
IBAN_CODE | DE89 3704 0044 0532 0130 00 |
IP_ADDRESS | 192.168.1.1 |
LOCATION | New York, 123 Main St |
US_SSN | 123-45-6789 |
US_PASSPORT | 123456789 |
CRYPTO | Bitcoin addresses |
URL | https://example.com |
Score Threshold
Higher = fewer false positives, might miss some PII. Lower = catches more PII, more false positives.Whitelist
Exclude specific text patterns from PII masking. Useful for preventing false positives on company names or product identifiers.Scan Roles
By default, all message roles are scanned. To scan only user-controlled content:| Role | Description |
|---|---|
user | User messages (primary source of PII) |
assistant | Assistant responses |
system | System prompts |
tool | Tool/function call results |
function | Legacy function results (OpenAI) |