AI Privacy Impact Assessment: A Step-by-Step Template for EU AI Act Compliance

EU AI Act Risk Tiers and When PIAs Are Required

The EU AI Act classifies AI systems into four risk tiers: unacceptable risk, high risk, limited risk, and minimal risk. Your obligations under the AI Act depend on where your AI system falls in this classification. Getting the classification wrong is costly in either direction: underclassifying creates compliance risk, and overclassifying creates unnecessary cost.

Unacceptable-risk AI systems are banned outright. These include social scoring systems, real-time remote biometric identification in publicly accessible spaces for law enforcement purposes (with narrow exceptions), and AI that exploits the vulnerabilities of specific groups, such as those arising from age or disability. If your system falls into this category, no PIA will save it: you need to redesign or discontinue it.

High-risk AI systems face the most demanding requirements. This tier covers AI used in employment decisions, credit scoring, education assessment, law enforcement, migration management, and critical infrastructure. Certain deployers of high-risk systems must carry out a fundamental rights impact assessment under Article 27, which encompasses privacy impacts. High-risk systems must also undergo conformity assessment before being placed on the market or put into service, and ongoing compliance documentation must be maintained.

Limited-risk systems, such as chatbots and systems that generate synthetic content, face transparency obligations but not mandatory PIAs. Emotion recognition systems also carry transparency obligations, but they are generally treated as high risk and are prohibited in workplace and education settings. Conducting a voluntary PIA for limited-risk systems is strongly recommended because they often process personal data in ways that trigger GDPR's DPIA requirement under Article 35. In other words, a privacy impact assessment may be legally required under GDPR even when the AI Act does not demand one.

Minimal-risk systems have no specific obligations under the AI Act. Most AI applications fall into this category, including spam filters, AI-powered search, and recommendation systems. However, if these systems process personal data, GDPR's DPIA requirements still apply based on the nature and scope of processing, not the AI Act's risk classification.
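The interaction described above can be sketched as a small lookup. This is an illustrative simplification, not legal logic: the function name, the two boolean inputs, and the wording of each step are assumptions, and the DPIA test under GDPR Article 35 is reduced to a single flag that your own legal analysis would have to set.

```python
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


def assessment_plan(tier, processes_personal_data, likely_high_risk_to_individuals):
    """Return the assessment steps suggested by the tier and the GDPR flags."""
    if tier is RiskTier.UNACCEPTABLE:
        return ["redesign or discontinue"]
    steps = []
    if tier is RiskTier.HIGH:
        steps.append("conformity assessment")
        steps.append("fundamental rights impact assessment (where applicable)")
    if tier is RiskTier.LIMITED:
        steps.append("transparency obligations")
    # The GDPR check is independent of the AI Act tier.
    if processes_personal_data and likely_high_risk_to_individuals:
        steps.append("GDPR Article 35 DPIA")
    elif processes_personal_data:
        steps.append("voluntary PIA recommended")
    return steps


print(assessment_plan(RiskTier.MINIMAL, True, True))
```

The point of the sketch is the last branch: a minimal-risk system can still end up with a mandatory DPIA because the GDPR check does not look at the tier.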

Template Section 1: Purpose Definition and Scope

Every AI privacy impact assessment begins with a clear articulation of what the AI system does, why it exists, and what personal data it touches. This section establishes the boundaries of the assessment and provides context for the risk analysis that follows.

Start by documenting the AI system's purpose in specific, measurable terms. Avoid vague descriptions like 'improving customer experience.' Instead, state the concrete objective: 'Predicting customer churn likelihood based on usage patterns to trigger proactive retention outreach.' The purpose definition directly determines the legal basis for processing and the scope of data collection that can be justified under data minimisation principles.

Define the scope of personal data processing. List every category of personal data the system ingests, generates, or infers. Distinguish between data used for training, data used for inference, and data generated as output. Many PIAs fail because they only consider input data and overlook inferred data — if your AI system infers a user's income bracket from their behaviour, that inference is personal data subject to the same protections.

Identify all data subjects affected by the system. This includes direct users, individuals whose data appears in training sets, and individuals affected by the system's outputs. A fraud detection system affects not only the customers it evaluates but also the employees whose transaction handling patterns it analyses. Each category of data subject may have different rights and protections depending on their jurisdiction.

Document the decision-making impact. Is the AI system making or contributing to decisions that significantly affect individuals? Under GDPR Article 22, solely automated decisions with legal or similarly significant effects trigger additional safeguards, including the right to obtain human intervention, to express a point of view, and to contest the decision. The AI Act adds human oversight and transparency requirements for high-risk systems, so record which of these obligations apply and how the system meets them.
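One way to keep this section consistent across assessments is to capture it as a structured record. The sketch below is a minimal example, assuming a hypothetical `PiaScope` class whose field names are invented for illustration; it separates training, inference, and output data as described above and lists the items still to be documented.

```python
from dataclasses import dataclass, field


@dataclass
class PiaScope:
    purpose: str
    training_data: list = field(default_factory=list)
    inference_data: list = field(default_factory=list)
    output_data: list = field(default_factory=list)  # generated and inferred data
    data_subjects: list = field(default_factory=list)
    significant_automated_decision: bool = False

    def open_items(self):
        """List scope items that still need to be documented."""
        items = []
        if not self.output_data:
            items.append("list generated and inferred data")
        if not self.data_subjects:
            items.append("list affected data subjects")
        if self.significant_automated_decision:
            items.append("document human review and contestation process")
        return items


churn_scope = PiaScope(
    purpose="Predict customer churn likelihood from usage patterns "
            "to trigger proactive retention outreach",
    training_data=["usage logs", "subscription history"],
    inference_data=["recent usage logs"],
    data_subjects=["customers"],
)
print(churn_scope.open_items())
```

Here the churn example is flagged because no output data is listed, which is the common omission of inferred data noted earlier in this section.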

Template Section 2: Data Flow Mapping for AI Systems

AI data flows are more complex than traditional application data flows because they involve multiple stages — collection, preprocessing, training, validation, deployment, inference, and feedback — each with different privacy implications. Your PIA must map data flows at each stage.

Start with the training data pipeline. Document where training data originates, how it is collected, where it is stored during preprocessing, which transformations are applied (anonymisation, pseudonymisation, aggregation), and where the training environment is located. If you use transfer learning with pre-trained models, document the provenance of the pre-trained model and any personal data used in its original training.

Map the inference pipeline separately. When the deployed model processes a new input, trace the data from the point of collection through preprocessing, model inference, output generation, and storage of results. Include any feedback loops where inference outputs are fed back into the training pipeline. These feedback loops are often overlooked but create ongoing data processing that must be covered by the original legal basis.

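Document data sharing with third parties at each stage. This includes cloud providers hosting the training infrastructure, annotation services labelling training data, MLOps platforms managing model deployment, and monitoring tools tracking model performance. Any third party that processes personal data on your behalf is a data processor under GDPR and must be covered by a data processing agreement. A third party that determines its own purposes for the data is a controller in its own right, and sharing data with it needs its own legal basis.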

Pay special attention to data residency at each stage. Training might occur on GPU instances in the US, while inference runs on edge devices in multiple countries. Each cross-border transfer must be documented and assessed against applicable transfer restrictions. The training phase is particularly sensitive because it often involves large datasets that may include data from multiple jurisdictions.
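The stage-by-stage map can be kept as data and queried for the two issues raised above: region changes between stages and third parties without an agreement. This is a sketch under stated assumptions: `FlowStage`, the region labels, and the vendor names are hypothetical, and a region change only flags a hop for legal review. It does not decide whether the hop is a restricted transfer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowStage:
    name: str
    region: str
    third_parties: tuple = ()


def cross_border_hops(stages):
    """Return (from_stage, to_stage) pairs where the region changes."""
    return [
        (a.name, b.name)
        for a, b in zip(stages, stages[1:])
        if a.region != b.region
    ]


def parties_without_dpa(stages, signed_dpas):
    """Return third parties that appear in the flow but have no signed DPA."""
    seen = {party for stage in stages for party in stage.third_parties}
    return sorted(seen - set(signed_dpas))


pipeline = [
    FlowStage("collection", "EU"),
    FlowStage("preprocessing", "EU", ("annotation-vendor",)),
    FlowStage("training", "US", ("gpu-cloud",)),
    FlowStage("inference", "EU", ("monitoring-tool",)),
]
print(cross_border_hops(pipeline))
print(parties_without_dpa(pipeline, {"gpu-cloud"}))
```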

Template Section 3: Risk Identification and Scoring

The risk assessment section is the core of the PIA. It requires identifying specific privacy risks created by the AI system, assessing their likelihood and severity, and determining whether existing controls adequately mitigate them.

Begin with a structured risk identification exercise. Common AI-specific privacy risks include: model memorisation (the model retains and can reproduce specific training examples), attribute inference (the model can infer sensitive attributes not explicitly provided), re-identification risk (the model's outputs enable identification of individuals in supposedly anonymised datasets), bias amplification (the model systematically treats certain groups differently, constituting discriminatory processing), and function creep (the model is repurposed for processing beyond the original stated purpose).

For each identified risk, assess likelihood on a scale from rare to almost certain, considering your specific context. A large language model fine-tuned on customer support transcripts has a high likelihood of memorisation. A simple classification model trained on aggregated statistics has a low likelihood. Base your assessment on published research, your model architecture, and the nature of your training data.

Assess severity by considering the potential impact on data subjects. Would the risk, if realised, result in financial loss, reputational damage, discrimination, loss of autonomy, or physical harm? Severity should reflect the worst plausible outcome, not the average outcome. A medical AI system that misclassifies a patient has higher severity than a recommendation system that suggests an irrelevant product.

Combine likelihood and severity into a risk score using a standard risk matrix. Risks scoring high on both dimensions require immediate mitigation before the system can be deployed. Medium risks should be mitigated where feasible and monitored where not. Low risks should be documented and reviewed periodically. The risk matrix should be calibrated to your organisation's risk appetite and regulatory environment.
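A risk matrix of this kind can be sketched in a few lines. The five-point scales, the intermediate labels, the multiplication rule, and the band thresholds below are all assumptions chosen for illustration; calibrate them to your own risk appetite before relying on the output.

```python
# Assumed five-point scales; only "rare" and "almost certain" come from the text.
LIKELIHOOD = {"rare": 1, "unlikely": 2, "possible": 3, "likely": 4, "almost certain": 5}
SEVERITY = {"negligible": 1, "minor": 2, "moderate": 3, "major": 4, "severe": 5}


def risk_rating(likelihood, severity):
    """Return (score, band) for one risk using a likelihood x severity matrix."""
    score = LIKELIHOOD[likelihood] * SEVERITY[severity]
    if score >= 15:  # assumed threshold: mitigate before deployment
        return score, "high"
    if score >= 6:   # assumed threshold: mitigate where feasible, otherwise monitor
        return score, "medium"
    return score, "low"


print(risk_rating("likely", "major"))
print(risk_rating("possible", "moderate"))
print(risk_rating("rare", "severe"))
```

Note that a simple product rates a rare but severe outcome as low. If that does not match your risk appetite, add a rule that any risk with the top severity level is at least medium.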

Template Section 4: Mitigation Measures and Controls

For each risk identified in the previous section, document the specific mitigation measures that reduce either the likelihood or the severity to an acceptable level. Mitigations should be concrete and verifiable — not aspirational statements but implementable controls.

For model memorisation risk, consider differential privacy during training. Differentially private training typically clips each example's gradient contribution and adds calibrated noise, which bounds how much any single training example can influence the model. Document the privacy budget (epsilon value) used and justify its selection. Complement this with membership inference testing: regularly test whether an attacker could distinguish records that were in the training set from records that were not.
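A simple form of membership inference testing compares the model's per-example loss on training records with its loss on held-out records. The sketch below assumes you have already computed those losses with your own model; the function name and the sample numbers are hypothetical. It returns the AUC of a loss-threshold attack, where 0.5 means the attacker does no better than chance and values approaching 1.0 suggest memorisation.

```python
def membership_auc(member_losses, nonmember_losses):
    """AUC of an attack that guesses 'member' when the loss is lower.

    Computed as the fraction of (member, non-member) pairs in which the
    member has the lower loss, counting ties as half.
    """
    wins = 0.0
    for m in member_losses:
        for n in nonmember_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_losses) * len(nonmember_losses))


# Hypothetical per-example losses from a trained model.
train_losses = [0.1, 0.2, 0.3]
holdout_losses = [0.25, 0.5, 0.6]
print(round(membership_auc(train_losses, holdout_losses), 3))
```

This pairwise version is quadratic in the number of records, which is fine for a periodic audit on a sample. Stronger attacks exist, so treat a low AUC here as a necessary check and not as proof that the model is safe.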

For attribute inference risk, implement output filtering that strips or masks inferred sensitive attributes before they reach downstream systems. If the model infers health status from behavioural data, ensure that inference is not stored, logged, or transmitted unless there is an explicit legal basis for processing health data. Document the filtering logic and test it against adversarial inputs.
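The filtering step might look like the sketch below. The set of sensitive keys, the function name, and the flat dictionary shape of the model output are assumptions; a real system would also need to handle nested structures and free-text fields.

```python
# Hypothetical names of inferred attributes that must not leave the model service.
SENSITIVE_KEYS = frozenset({"health_status", "ethnicity", "religion", "sexual_orientation"})


def filter_output(prediction, allowed_sensitive=frozenset()):
    """Drop sensitive inferred attributes unless explicitly allowed.

    Keys are normalised before matching so that variants such as
    'Health_Status ' do not slip through.
    """
    cleaned = {}
    for key, value in prediction.items():
        normalised = key.strip().lower()
        if normalised in SENSITIVE_KEYS and normalised not in allowed_sensitive:
            continue
        cleaned[key] = value
    return cleaned


raw = {"churn_score": 0.8, "Health_Status ": "chronic condition"}
print(filter_output(raw))
```

Keeping the allow-list as an explicit argument means that any caller who needs a sensitive attribute has to name it, which gives you a place to check for a documented legal basis.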

For bias and discrimination risk, implement fairness testing across protected characteristics before deployment. Document which fairness metrics you use (demographic parity, equalised odds, predictive parity), why those metrics are appropriate for your use case, and the threshold values you consider acceptable. Establish ongoing monitoring that triggers alerts when fairness metrics drift beyond acceptable bounds.
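Two of these checks can be sketched with plain Python. The example computes a demographic parity gap (difference in positive prediction rates between groups) and a true positive rate gap, which is one component of equalised odds. Function names and the sample data are hypothetical, predictions and labels are assumed to be 0 or 1, and every group is assumed to contain at least one positive label.

```python
def _rate_gap(values, groups):
    """Largest difference in the mean of `values` between any two groups."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    rates = [sum(v) / len(v) for v in by_group.values()]
    return max(rates) - min(rates)


def demographic_parity_gap(preds, groups):
    """Gap in positive prediction rate across groups."""
    return _rate_gap(preds, groups)


def true_positive_rate_gap(labels, preds, groups):
    """Gap in true positive rate across groups (part of equalised odds)."""
    positives = [(p, g) for y, p, g in zip(labels, preds, groups) if y == 1]
    return _rate_gap([p for p, _ in positives], [g for _, g in positives])


labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_gap(preds, groups))
print(true_positive_rate_gap(labels, preds, groups))
```

Whatever threshold you choose for these gaps, record it in the PIA alongside the reasoning, so the same value can be reused as the production alerting bound.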

For function creep risk, implement technical controls that prevent the model from being used for purposes beyond the documented scope. This might include API-level access controls that restrict which applications can call the model, input validation that rejects query types outside the intended use case, and audit logging that tracks how the model is being used. Contractual controls with internal teams and external partners should complement technical measures.
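As a minimal sketch of the technical side, the check below binds each calling application to the purposes documented in the PIA and writes every decision to an audit log. The client identifiers, purpose strings, and logger name are hypothetical, and in practice the client identity would come from your authentication layer.

```python
import logging

audit_log = logging.getLogger("model_audit")

# Hypothetical mapping of calling applications to their documented purposes.
ALLOWED_PURPOSES = {
    "retention-service": {"churn_prediction"},
}


def authorise_call(client_id, declared_purpose):
    """Allow a model call only if the purpose is documented for this client."""
    allowed = declared_purpose in ALLOWED_PURPOSES.get(client_id, set())
    # Log both allowed and denied calls so that usage can be reviewed later.
    audit_log.info(
        "client=%s purpose=%s allowed=%s", client_id, declared_purpose, allowed
    )
    return allowed


print(authorise_call("retention-service", "churn_prediction"))
print(authorise_call("marketing-service", "churn_prediction"))
```

Denied calls are worth reviewing as closely as allowed ones, since a pattern of denials from one team is often the first sign that someone wants to reuse the model for a new purpose.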

Document the residual risk after mitigations are applied. No mitigation eliminates risk entirely. The PIA should clearly state the residual risk level and confirm that it falls within the organisation's risk appetite. If residual risk remains high, the system should not be deployed without additional mitigation or executive risk acceptance.

Documenting Training Data Provenance

The EU AI Act places particular emphasis on training data quality and documentation. Article 10 requires that training datasets be subject to appropriate data governance and management practices, including examination of possible biases, identification of data gaps, and assessment of data relevance. Your PIA must document training data provenance comprehensively.

For each training dataset, document its origin, collection method, time period, and the population it represents. If you purchased data from a third party, document the vendor, the contractual terms governing use, and any representations the vendor made about the data's collection practices and consent status. If you scraped data from public sources, document the sources, the scraping methodology, and your legal analysis of why the data can be lawfully used for training.

Document the legal basis for using each dataset in training. Under GDPR, legitimate interests (Article 6(1)(f)) is the most commonly invoked basis for training data, but it requires a balancing test that weighs the organisation's interest against the data subjects' rights. Document this balancing test explicitly: what is the legitimate interest, what is the impact on data subjects, what safeguards are in place, and why the interest outweighs the impact.
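A provenance record per dataset makes gaps easy to spot. The sketch below assumes a hypothetical `DatasetProvenance` class with invented field names; it reports which required fields are empty and insists on a documented balancing test whenever the legal basis is legitimate interests.

```python
from dataclasses import dataclass

REQUIRED = ("origin", "collection_method", "period", "population", "legal_basis")


@dataclass
class DatasetProvenance:
    name: str
    origin: str = ""
    collection_method: str = ""
    period: str = ""
    population: str = ""
    legal_basis: str = ""
    balancing_test: str = ""

    def missing_fields(self):
        """Return the names of fields that still need to be documented."""
        missing = [f for f in REQUIRED if not getattr(self, f).strip()]
        needs_test = self.legal_basis.strip().lower() == "legitimate interests"
        if needs_test and not self.balancing_test.strip():
            missing.append("balancing_test")
        return missing


transcripts = DatasetProvenance(
    name="support-transcripts",
    origin="internal helpdesk system",
    collection_method="exported from ticketing tool",
    period="2022-2023",
    population="customers who contacted support",
    legal_basis="legitimate interests",
)
print(transcripts.missing_fields())
```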

Record any data quality issues identified during preprocessing. Did you find and remove duplicate records, correct labelling errors, or exclude outliers? Document these decisions and their rationale, as they affect both the model's performance and the privacy impact on individuals whose data was modified or excluded.

If you use synthetic data to supplement or replace personal data in training, document the generation methodology and validate that the synthetic data does not enable re-identification of individuals from the original dataset. Synthetic data is not automatically privacy-safe — poorly generated synthetic data can leak information about the source data. Document the privacy guarantees provided by your synthetic data generation approach and any testing you performed to validate those guarantees.
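One basic validation is to check whether any synthetic record is an exact copy of a source record. The sketch below does only that, with a hypothetical function name and made-up sample rows. Passing this check is necessary but far from sufficient, since near-copies and rare attribute combinations can still leak information.

```python
def copied_records(source, synthetic):
    """Return synthetic records that exactly match a record in the source data."""
    source_set = {tuple(record) for record in source}
    return [record for record in synthetic if tuple(record) in source_set]


# Hypothetical rows: (birth year, postcode district, occupation)
source_rows = [("1985", "NW1", "nurse"), ("1990", "SE5", "teacher")]
synthetic_rows = [("1985", "NW1", "nurse"), ("1987", "E2", "engineer")]

leaks = copied_records(source_rows, synthetic_rows)
print(leaks)
```

Any record returned here should block release of the synthetic dataset until the generation method has been reviewed.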

Ongoing Monitoring and PIA Updates

A PIA is not a one-time document. AI systems evolve through retraining, fine-tuning, and adaptation, and each change can alter the privacy risk profile. Your PIA must include a monitoring plan that specifies when and how the assessment will be updated.

Define triggers for PIA review. These should include: model retraining on new data, changes to the model architecture, expansion to new use cases or user populations, changes in the regulatory environment, privacy incidents or near-misses involving the system, and significant changes in model performance metrics that might indicate distributional shift. Any of these triggers should initiate a review of the relevant PIA sections.
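The triggers above can be encoded so that each event points at the PIA sections to revisit. The trigger names follow the list in this section, but the mapping from trigger to section is an assumption made for illustration and should be adjusted to how your own PIA is organised.

```python
from enum import Enum


class ReviewTrigger(Enum):
    RETRAINING = "model retraining on new data"
    ARCHITECTURE_CHANGE = "change to model architecture"
    NEW_USE_CASE = "new use case or user population"
    REGULATORY_CHANGE = "change in regulatory environment"
    INCIDENT = "privacy incident or near-miss"
    PERFORMANCE_SHIFT = "significant change in performance metrics"


SCOPE = "Purpose Definition and Scope"
FLOWS = "Data Flow Mapping for AI Systems"
RISKS = "Risk Identification and Scoring"
CONTROLS = "Mitigation Measures and Controls"
PROVENANCE = "Documenting Training Data Provenance"

# Assumed mapping from trigger to the sections that should be reviewed.
TRIGGER_SECTIONS = {
    ReviewTrigger.RETRAINING: {FLOWS, RISKS, PROVENANCE},
    ReviewTrigger.ARCHITECTURE_CHANGE: {RISKS, CONTROLS},
    ReviewTrigger.NEW_USE_CASE: {SCOPE, FLOWS, RISKS},
    ReviewTrigger.REGULATORY_CHANGE: {SCOPE, RISKS, CONTROLS},
    ReviewTrigger.INCIDENT: {RISKS, CONTROLS},
    ReviewTrigger.PERFORMANCE_SHIFT: {RISKS},
}


def sections_to_review(triggers):
    """Return the sorted set of PIA sections affected by the given triggers."""
    sections = set()
    for trigger in triggers:
        sections |= TRIGGER_SECTIONS[trigger]
    return sorted(sections)


print(sections_to_review([ReviewTrigger.INCIDENT, ReviewTrigger.PERFORMANCE_SHIFT]))
```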

Implement continuous monitoring for the risks identified in the PIA. Fairness metrics should be tracked in production and compared against the baseline established during the PIA. Memorisation testing should be repeated periodically, especially after retraining. Access logs should be reviewed to detect potential function creep. Establish alerting thresholds that trigger escalation when monitored metrics deviate from expected ranges.
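An alerting check against the PIA baseline can be as simple as the sketch below. The metric names, baseline values, and tolerances are hypothetical; a metric that is missing from the current readings is treated as a breach so that a broken monitor does not pass silently.

```python
def metrics_out_of_range(baseline, current, tolerance):
    """Return names of metrics that are missing or deviate beyond tolerance."""
    breaches = []
    for name, expected in baseline.items():
        observed = current.get(name)
        if observed is None or abs(observed - expected) > tolerance.get(name, 0.0):
            breaches.append(name)
    return sorted(breaches)


# Hypothetical baseline recorded in the PIA and latest production readings.
baseline = {"demographic_parity_gap": 0.05, "membership_auc": 0.52}
current = {"demographic_parity_gap": 0.12, "membership_auc": 0.53}
tolerance = {"demographic_parity_gap": 0.05, "membership_auc": 0.03}

print(metrics_out_of_range(baseline, current, tolerance))
```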

Maintain a version history of the PIA that corresponds to the model version it describes. When the model is retrained, create a new PIA version that documents the changes: new training data sources, updated risk assessments, revised mitigation measures, and current monitoring results. This version history demonstrates to regulators that you maintain ongoing awareness and control of your AI system's privacy impacts.
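The link between PIA versions and model versions can be checked automatically before a release. The sketch below assumes a hypothetical `PiaVersion` record and made-up version strings; it looks up the latest approved PIA for a model version and returns `None` when a deployed model has no matching assessment.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PiaVersion:
    pia_version: str
    model_version: str
    approved_on: date
    changes: tuple = ()


def pia_for_model(history, model_version):
    """Return the most recently approved PIA for a model version, or None."""
    matches = [v for v in history if v.model_version == model_version]
    if not matches:
        return None
    return max(matches, key=lambda v: v.approved_on)


history = [
    PiaVersion("1.0", "model-1", date(2024, 1, 15), ("initial assessment",)),
    PiaVersion("1.1", "model-1", date(2024, 6, 1), ("updated monitoring results",)),
]

print(pia_for_model(history, "model-1").pia_version)
print(pia_for_model(history, "model-2"))
```

A `None` result for a model version that is about to ship is a useful release gate: it means the retrained model has not yet been assessed.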

Schedule periodic comprehensive reviews even in the absence of specific triggers. Annually at minimum, review the entire PIA with fresh eyes. Regulatory interpretations evolve, new attack vectors are discovered, and organisational context changes. A PIA written twelve months ago may be technically accurate but fail to address risks that have emerged since then.

Finally, ensure that PIA findings feed back into the product development process. If monitoring reveals a privacy risk that was not anticipated in the original assessment, that learning should inform not only the PIA update but also the design of future AI systems. The PIA process should be a learning mechanism that continuously improves your organisation's ability to build privacy-respecting AI.
