How Priivacy Works

Built for the IT team that has to actually maintain it.

Priivacy installs in your environment, scans the systems you point it at, and gives your team the controls to act on what it finds. Read-only by default. No phone-home. No data leaves your network without your sign-off. Fully air-gap capable.

Talk to our solutions engineer

Three deployment options. You choose.

On-premises

Priivacy runs as a set of Docker containers on a Linux host inside your network. Your data is scanned, classified, and reported on entirely within your environment. Nothing leaves unless you export it. Fully air-gap capable.

Your cloud tenant

Install Priivacy in your existing Azure, AWS, or Google Cloud environment using your IaaS. Same isolation as on-prem. No third-party cloud touches your data.

USC Data dedicated cloud

If your team would rather not run the infrastructure, we host a single-tenant Priivacy instance for you. We manage the underlying host; we cannot see your data. Authentication, encryption keys, and administrative controls remain yours.

All three options give you the same product. The only difference is who manages the underlying host. You can move between options later if your needs change.

Eleven connectors. Everywhere your data actually lives.

Priivacy reads from the systems where sensitive data ends up — not just the systems IT formally manages. The connector library covers Microsoft 365, file systems via lightweight remote agents, mounted server folders, and five SQL database engines.

Microsoft 365 (via Microsoft Graph)

SharePoint Online

Every site, every library

OneDrive

Every user, with optional multi-user concurrent scanning (up to 4 at a time)

Exchange Online

Message bodies and attachments

Privacy Posture

Read-only assessment of 24 tenant configuration tests (sharing, mailbox rules, dormant guests, public sites, and more)

Required Graph permissions: Sites.Read.All, Files.ReadWrite.All, User.Read.All, Mail.ReadWrite. Plus AuditLog.Read.All for the dormant-guests posture test only.

File systems (via Remote Agent)

Windows — Single binary (~10 MB), no installer
macOS — Apple Silicon and Intel builds
Linux — Single binary, runs as systemd service

One-time pairing code links the agent to your Priivacy account. WebSocket connection, heartbeats every 30 seconds. Scans local file systems on demand; files are uploaded one at a time for server-side analysis. The agent runs as a background service so scans can be triggered remotely.

Mounted server folders

Direct scan of folders mounted on the Priivacy host. Ideal for bulk imports via SFTP/FTP where clients drop files into a known location.

SQL databases (read-only)

SQL Server (on-prem and Managed Instance)
Azure SQL
PostgreSQL (including Aurora, Cloud SQL, Azure Database)
MySQL
MariaDB

Read-only — Priivacy only issues SELECT statements via a dedicated DB user. SSL/TLS required by default. Every row of every matched table is scanned (no sampling). A column denylist (case-insensitive glob patterns) is pre-seeded to skip secrets and tokens — password*, *_hash, api_key, secret*, token*, *_salt.
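As an illustration of how a case-insensitive glob denylist behaves, here is a minimal sketch. The patterns are the pre-seeded ones listed above; the function names (is_denied, scannable_columns) are hypothetical and are not Priivacy's actual API.

```python
from fnmatch import fnmatchcase

# Patterns copied from the pre-seeded denylist described above.
DENYLIST = ["password*", "*_hash", "api_key", "secret*", "token*", "*_salt"]


def is_denied(column, patterns=DENYLIST):
    """Return True if the column name matches any glob pattern, ignoring case."""
    name = column.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in patterns)


def scannable_columns(columns):
    """Keep only the columns that the denylist does not exclude (hypothetical helper)."""
    return [c for c in columns if not is_denied(c)]


if __name__ == "__main__":
    print(scannable_columns(["id", "email", "API_KEY", "token_expiry", "pw_salt"]))
```

Lower-casing both sides before matching is one simple way to get case-insensitive behaviour that is the same on every operating system.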

Distributed. Scheduled. Read-only by default.

Priivacy uses a Celery task queue with Redis for distributed scanning. Files are dispatched in batches of 50 and processed by independent workers that auto-scale based on queue depth. Each worker downloads the file, runs OCR and PII detection, and persists results. Scans survive container restarts — tasks in Redis are picked up by new workers automatically.
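The batching step can be pictured with a short sketch. Only the batch size of 50 comes from the description above; the batched and dispatch helpers and the enqueue callback are assumptions made for illustration. In a Celery deployment, enqueue would typically hand each batch to a task, but no Celery call is shown here.

```python
from itertools import islice

BATCH_SIZE = 50  # batch size stated above


def batched(items, size=BATCH_SIZE):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def dispatch(file_ids, enqueue):
    """Send each batch to a caller-supplied enqueue function (hypothetical hook)."""
    count = 0
    for batch in batched(file_ids):
        enqueue(batch)
        count += 1
    return count


if __name__ == "__main__":
    queue = []
    dispatch([f"file-{i}" for i in range(120)], queue.append)
    print([len(b) for b in queue])
```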

Schedule any cadence

Daily, weekly, monthly, or fixed interval (every N hours, up to 30 days). Re-scans only what's changed since the last run. Saved schedules dispatch fresh scans automatically.

Throttle and resume

Worker concurrency auto-scales based on queue depth. Long scans can be paused mid-flight with Stop and resumed later with Continue Scan — the scanner re-walks the source, fetches the list of files already uploaded, and skips them.
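The resume behaviour described above amounts to a set difference between the files found on the re-walk and the files already uploaded. A minimal sketch, with hypothetical names and no claim about how Priivacy identifies a file internally (paths are used here as an assumption):

```python
def files_to_scan(source_paths, already_uploaded):
    """Return the paths still to process, preserving the order of the re-walk."""
    done = set(already_uploaded)  # set lookup keeps the skip check fast
    return [path for path in source_paths if path not in done]


if __name__ == "__main__":
    walked = ["/data/a.pdf", "/data/b.docx", "/data/c.xlsx"]
    uploaded = ["/data/a.pdf"]
    print(files_to_scan(walked, uploaded))
```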

Read-only by default

Scanning does not modify your data. Every remediation action (redact, quarantine, permission change) requires explicit approval from an authorized user before it runs. Each action is logged with user, timestamp, and before/after state.

Three detection engines. 51 PII types across 7 categories.

Priivacy uses a layered detection stack — pattern matching with mathematical checksum validation, named entity recognition, and document-aware classification — to find sensitive data across structured and unstructured sources.

Pattern Matching Engine

Built-in detectors for 51 PII types across 7 categories, with checksum validation for identifiers such as AU TFN (weighted), Medicare, ABN/ACN, NZ IRD (modulus 11), UK NHS (modulus 11), credit card, and IBAN. Multi-jurisdiction coverage: Australia, New Zealand, United Kingdom, European Union, United States, Singapore, France, Germany, Netherlands, Ireland.
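To show why checksum validation cuts false positives, here is a sketch of two publicly documented checks: the Luhn algorithm used for payment card numbers and the modulus 11 check used for UK NHS numbers. These are standard algorithms written out for illustration, not Priivacy's own detector code, and the function names are hypothetical.

```python
def luhn_valid(number):
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def nhs_valid(number):
    """NHS number check: weights 10..2 on the first nine digits, modulus 11."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) != 10:
        return False
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check value of 10 means the number can never be valid.
    return check != 10 and check == digits[9]


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"), nhs_valid("943 476 5919"))
```

A random 16-digit string passes the Luhn check only about one time in ten, which is why pairing a pattern with its checksum makes a match far more trustworthy than the pattern alone.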

Named Entity Recognition

Shared spaCy NER server for person names, locations, organisations, and nationalities. CPU-optimized. Microsoft Presidio provides the framework; custom recognisers extend the library with country-specific patterns and confidence thresholds.

Document-Aware Detection

Documents are classified by keyword match (e.g., "birth certificate", "tax return") and then document-specific regex patterns activate with confidence boosts on relevant PII types. Patterns that would be too noisy on general documents become precise when the document context is known.
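The two-step flow (classify by keyword, then activate document-specific patterns with a confidence boost) can be sketched as follows. The keywords come from the examples above; the regex patterns, PII type names, base confidence, and boost values are all invented for illustration.

```python
import re

# Keywords from the examples above; everything below them is hypothetical.
DOC_KEYWORDS = {
    "birth_certificate": ["birth certificate"],
    "tax_return": ["tax return"],
}
DOC_PATTERNS = {
    "tax_return": [("TAX_FILE_NUMBER", re.compile(r"\b\d{3} ?\d{3} ?\d{3}\b"), 0.4)],
    "birth_certificate": [("REGISTRATION_NUMBER", re.compile(r"\b\d{6,8}\b"), 0.3)],
}
BASE_CONFIDENCE = 0.5  # assumed starting confidence for a bare pattern match


def classify(text):
    """Return the first document type whose keyword appears in the text."""
    lowered = text.lower()
    for doc_type, keywords in DOC_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return doc_type
    return None


def detect(text):
    """Run only the patterns tied to the detected document type."""
    findings = []
    for pii_type, pattern, boost in DOC_PATTERNS.get(classify(text), []):
        for match in pattern.finditer(text):
            confidence = round(min(1.0, BASE_CONFIDENCE + boost), 2)
            findings.append((pii_type, match.group(), confidence))
    return findings


if __name__ == "__main__":
    print(detect("Tax Return 2023. TFN: 123 456 782"))
    print(detect("Meeting notes, ref 123 456 782"))
```

In this sketch the nine-digit pattern never fires on an unclassified document, which mirrors the point that a noisy pattern becomes usable once the document context is known.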

A Detector Builder lets your admin synthesize new regex detectors from a handful of example values — paste 3–5 examples, upload sample documents, and Priivacy generates the regex deterministically and validates it against your content. A Tabular Classifier recognises ~300 column-header patterns in CSV, Excel, and database scans, emitting authoritative findings on every cell in known columns. Custom Header Mappings extend the dictionary with your org-specific column names.
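A header dictionary with custom extensions can be pictured as a normalised lookup. The sketch below uses a handful of made-up header entries and PII type names; the real dictionary of roughly 300 patterns and its matching rules are not described in this page, so treat every name here as an assumption.

```python
import re

# A few illustrative built-in headers (hypothetical, not the real dictionary).
BUILTIN_HEADERS = {
    "email": "EMAIL",
    "date of birth": "DATE_OF_BIRTH",
    "dob": "DATE_OF_BIRTH",
    "tfn": "AU_TFN",
}


def normalise(header):
    """Lower-case and collapse spaces, underscores, and hyphens to one space."""
    return re.sub(r"[\s_\-]+", " ", header).strip().lower()


def classify_columns(headers, custom=None):
    """Map each recognised header to a PII type; custom mappings take priority."""
    mapping = dict(BUILTIN_HEADERS)
    for name, pii_type in (custom or {}).items():
        mapping[normalise(name)] = pii_type
    return {h: mapping[normalise(h)] for h in headers if normalise(h) in mapping}


if __name__ == "__main__":
    print(classify_columns(["Date_of_Birth", "Notes", "staff-no"],
                           custom={"Staff No": "EMPLOYEE_ID"}))
```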

From findings to fixes — without a separate tool.

Discovery is the easy part. Most platforms stop there. Priivacy gives your team the controls to act on findings inside the same platform.

Redact in place

For PDFs, DOCX, XLSX, text, images, and emails. Three modes: Replace (placeholder text), Mask (partial obscure), or Blackout (solid rectangle, PDFs and images only). Word docs get a deep XML cleanup pass that strips revision history, comments, and tracked changes so PII doesn't leak through Office internals.

Quarantine

Move files to a controlled holding location with restricted access. M365 files are downloaded via Graph and held centrally; local and agent files are moved on the source machine. Each quarantined file gets a manifest sidecar for audit trail.

Remediate access (M365)

Identify and revoke "anyone with the link" sharing, orphaned permissions from off-boarded users, and stale guest accounts. Approve in bulk or one at a time.

Workflow Rules (automation)

Composable IF/THEN rules. Define conditions (PII Type, Severity, File Age, Confidence, Document Type, File Owner, and more), pick actions (flag, forget, quarantine, redact, secure delete, notify), and run server-side in the background. Build once, run against any scan.
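One way to think about a composable IF/THEN rule is as a list of condition functions that must all hold, plus a named action. The sketch below is a hypothetical model of that idea; the field names, condition helpers, and Rule class are assumptions, and only the condition and action vocabulary (PII Type, Confidence, File Age, quarantine, flag) comes from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Finding = Dict[str, object]


@dataclass
class Rule:
    name: str
    conditions: List[Callable[[Finding], bool]]
    action: str

    def matches(self, finding):
        """A rule fires only when every condition holds (logical AND)."""
        return all(condition(finding) for condition in self.conditions)


def pii_type_is(pii_type):
    return lambda f: f["pii_type"] == pii_type


def confidence_at_least(threshold):
    return lambda f: f["confidence"] >= threshold


def file_age_over(days):
    return lambda f: f["file_age_days"] > days


def plan_actions(rules, findings):
    """Return (path, action) pairs; nothing is executed in this sketch."""
    return [(f["path"], r.action) for f in findings for r in rules if r.matches(f)]


if __name__ == "__main__":
    rules = [Rule("old card data", [pii_type_is("CREDIT_CARD"),
                                    confidence_at_least(0.8),
                                    file_age_over(365)], "quarantine")]
    findings = [{"path": "/a.xlsx", "pii_type": "CREDIT_CARD",
                 "confidence": 0.95, "file_age_days": 900}]
    print(plan_actions(rules, findings))
```

Keeping conditions as small functions is what makes the rules composable: the same condition can be reused across rules, and a rule defined once can be evaluated against the findings of any scan.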

Defensible audit trail

Every action logged with user, timestamp, reason, and before/after state. Batch logs, per-file logs, and remediation history all live in the same audit ledger.

Built for the requests you legally have to answer.

Priivacy includes a full 10-stage Data Subject Access Request workflow aligned to GDPR Article 15, the UK Data Protection Act, and the Australian Privacy Act APP 12. Identity verification, jurisdiction-aware deadlines, Microsoft Graph search across mailboxes and SharePoint, AI-assisted triage, per-finding DPO decisions, and a sealed disclosure PDF — all auditable end-to-end.

The AI triage step uses a local Qwen2.5-3B language model that runs inside the Priivacy environment. No data leaves the appliance during DSAR processing. The DPO confirms or overrides every suggestion; nothing is auto-applied.

A simpler SAR Export option produces a quick GDPR Article 15 PDF dossier from existing estate scans — useful when you have a curated set of scans for a subject and just need a clean "all data on this person" handoff.

Plays well with what you already run.

Microsoft 365 (Graph API)

Native connectors for SharePoint, OneDrive, Exchange, and Entra ID. Reads Microsoft Purview sensitivity labels and flags compliance gaps (e.g., critical PII with a Public sensitivity label).

Notification channels

Microsoft Teams (Workflows webhooks with Adaptive Cards), Slack (webhook URL), email (SMTP or Microsoft Graph). Notifications cover scan completions, workflow rule executions, and remediation job summaries.

Export formats

Standalone HTML reports (offline-readable), PDF (browser print), CSV, JSON, JSONL. The Compliance Report ledger format is separate from the Full Scan JSONL, which contains raw PII values and should be treated with care.

System requirements.

Priivacy runs as Docker containers on a Linux host. A typical mid-sized environment runs on a single VM. Worker concurrency auto-detects from available RAM and CPU at container start.

Tier	RAM	Workers	Notes
Minimum (not recommended)	16 GB	4	May OOM under heavy scans
Recommended minimum	24 GB	4	Comfortable for mid-sized scans
Standard	32 GB	6	Good for most client workloads
Large	40 GB+	8+ (up to CPU count)	High-throughput, large tenants

The application container needs 8 GB. The shared NER server uses ~2 GB (loaded once). Each Celery worker uses ~200 MB steady state. PostgreSQL is the application database; Redis is the task queue; Apache Tika provides fallback text extraction for 1000+ file formats; Tesseract handles OCR.
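The memory figures above can be combined into a rough baseline. The sketch below adds the three stated components only (8 GB application container, about 2 GB NER server, about 200 MB per worker); the function name is hypothetical, and PostgreSQL, Redis, Tika, OCR peaks, and the operating system are left out because their footprints are not stated here.

```python
APP_GB = 8.0      # application container, as stated above
NER_GB = 2.0      # shared NER server, loaded once
WORKER_GB = 0.2   # steady-state memory per Celery worker


def baseline_gb(workers):
    """Sum of the three stated components, rounded to one decimal place."""
    return round(APP_GB + NER_GB + workers * WORKER_GB, 1)


if __name__ == "__main__":
    for workers in (4, 6, 8):
        print(workers, "workers:", baseline_gb(workers), "GB")
```

For 4 workers this comes to about 10.8 GB, well under the 16 GB minimum tier. The tier table should be treated as the sizing guide; this arithmetic only shows how the stated components add up.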

If you'd rather not host any of this, the USC Data dedicated cloud option provisions and manages the host on your behalf.

Want a deeper walk-through?

Our solutions engineer will run a 45-minute working session with your IT lead — architecture, deployment options, permission model, and a live demo against a sample data set. Bring your hardest questions.

Book a technical demo

Email connect@uscdata.com