Data Warehouse

TruePrivacy + Databricks

Scan your Databricks lakehouse for personal data and manage compliance.

Auth: Service Account
Setup time: 20 minutes

Overview

Databricks is a unified data analytics platform built on Apache Spark, used for large-scale data engineering, ML, and analytics. TruePrivacy connects to Databricks to scan Delta Lake tables and Unity Catalog for personal data, classify the fields it finds, and build a data inventory across your lakehouse.

For organizations using Databricks for data engineering and ML pipelines, TruePrivacy ensures that personal data flowing through Databricks jobs and stored in Delta tables is governed and included in the overall compliance programme.

What TruePrivacy can do

Data Discovery
Data Classification

Data types accessed

  • Delta Lake table records
  • Streaming data tables
  • ML feature store data
  • Unity Catalog managed tables
  • External tables

Data subject request (DSR) capabilities

  • Discover personal data across Delta Lake tables
  • Classify personal data fields by category
  • Export data subject records for access requests

How it works

  1. Create a Databricks service principal and personal access token for TruePrivacy.

  2. TruePrivacy scans Unity Catalog or Hive Metastore tables for personal data using SQL queries.

  3. Discovered personal data columns are classified and added to your data inventory.

  4. For deletion, TruePrivacy generates Delta Lake DELETE statements for both standard and partitioned Delta tables.
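
The classify-then-delete flow in the steps above can be pictured with a small sketch. This is not TruePrivacy's implementation: the category patterns, the function names `classify_columns`, `quote_identifier`, and `build_delete_statement`, and the example table are all hypothetical. The sketch matches on column names only, and it assumes the column list has already been fetched from the metastore and that the SQL client supports named parameter markers such as `:subject_id`.

```python
import re

# Hypothetical name-based patterns; a real classifier would be far richer
# and would likely inspect sampled values as well as column names.
CATEGORY_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "name": re.compile(r"(first|last|full)_?name", re.IGNORECASE),
}


def classify_columns(columns):
    """Map each column name to the first matching category, skipping non-matches."""
    inventory = {}
    for column in columns:
        for category, pattern in CATEGORY_PATTERNS.items():
            if pattern.search(column):
                inventory[column] = category
                break
    return inventory


def quote_identifier(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_delete_statement(catalog, schema, table, key_column):
    """Build a parameterized DELETE statement for one data subject.

    The subject's key is left as a named parameter marker so that the SQL
    client binds the value instead of it being interpolated into the string.
    """
    target = ".".join(quote_identifier(p) for p in (catalog, schema, table))
    return f"DELETE FROM {target} WHERE {quote_identifier(key_column)} = :subject_id"


if __name__ == "__main__":
    cols = ["customer_id", "email_address", "first_name", "order_total"]
    print(classify_columns(cols))
    print(build_delete_statement("main", "sales", "customers", "customer_id"))
```

Keeping the statement parameterized means the same generated text can be reviewed once and reused for every request against that table.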

Frequently asked questions

TruePrivacy supports both Unity Catalog and the legacy Hive Metastore for table discovery, including Unity Catalog's three-level namespace (catalog.schema.table).
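
As an illustration of the three-level namespace, the sketch below normalizes a dotted table reference into its catalog, schema, and table parts. The helper name `qualify_table` is hypothetical and not part of any TruePrivacy or Databricks API. Assigning two-part names to `hive_metastore` (the catalog under which Unity Catalog workspaces expose the legacy metastore) is an assumption of this sketch; in practice a two-part name resolves against whatever catalog is current for the session.

```python
def qualify_table(reference, default_catalog="hive_metastore"):
    """Return (catalog, schema, table) for a dotted table reference.

    Two-part names (schema.table) are assigned to default_catalog, which is
    an assumption of this sketch. Backtick-quoted identifiers that contain
    dots are not handled by this simple split.
    """
    parts = reference.split(".")
    if not all(parts) or len(parts) not in (2, 3):
        raise ValueError(
            f"expected schema.table or catalog.schema.table, got {reference!r}"
        )
    if len(parts) == 2:
        parts = [default_catalog] + parts
    catalog, schema, table = parts
    return catalog, schema, table


if __name__ == "__main__":
    print(qualify_table("main.sales.customers"))
    print(qualify_table("sales.customers"))
```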

TruePrivacy can execute DELETE statements on Delta Lake tables. Delta's ACID transactions make each deletion atomic, so a failed delete does not leave the table in a partial or corrupted state. Earlier table versions kept for time travel are retained or purged according to your Delta table retention configuration.
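
To make the retention point concrete: a Delta `DELETE` writes new data files rather than erasing rows in place, so the old files stay on storage until `VACUUM` removes them. The sketch below estimates the earliest time at which that can happen and builds a matching `VACUUM` statement. It assumes the Delta default of 7 days for `delta.deletedFileRetentionDuration`; the function names are hypothetical, and your tables may be configured with a different window.

```python
from datetime import datetime, timedelta, timezone

# Delta Lake's default for delta.deletedFileRetentionDuration is 7 days;
# check the table properties before relying on this value.
DEFAULT_RETENTION = timedelta(days=7)


def earliest_purge_time(deleted_at, retention=DEFAULT_RETENTION):
    """Earliest time at which VACUUM may remove files made obsolete at deleted_at."""
    return deleted_at + retention


def build_vacuum_statement(table, retention=DEFAULT_RETENTION):
    """Build a VACUUM statement with an explicit retention window in whole hours."""
    hours = int(retention.total_seconds() // 3600)
    return f"VACUUM {table} RETAIN {hours} HOURS"


if __name__ == "__main__":
    deleted_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(earliest_purge_time(deleted_at))
    print(build_vacuum_statement("main.sales.customers"))
```

For an erasure request, the gap between the `DELETE` and the purge time is the period during which the data can still be recovered from table history.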

Connect TruePrivacy to Databricks today

Start your free trial and connect Databricks in 20 minutes.