MC1310681 - Microsoft Purview Data Loss Prevention: Optical character recognition for images in Office and PDFs on Windows

Message Center

Summary

Microsoft Purview Data Loss Prevention (DLP) on Windows endpoints will support optical character recognition (OCR) scanning of images embedded in Office documents and PDFs, with rollout starting in mid-May 2026. Scanning these images closes a detection gap by identifying sensitive data they contain. An admin must explicitly enable OCR, which incurs Azure AI Services costs, and should verify device prerequisites and review DLP policies.

Published

May 14, 2026

Service

Microsoft Purview

Tag

New feature
User impact
Admin impact

Platforms

Web

More information

Introduction

We are introducing optical character recognition (OCR) scanning support for Microsoft Purview Data Loss Prevention (DLP) on Windows endpoint devices. This enhancement enables DLP policies to detect sensitive information within images embedded inside Office documents and PDF files.

Previously, embedded images were skipped during endpoint DLP scanning, creating a detection blind spot. With this update, embedded images are OCR-processed, helping improve data protection coverage and reduce the risk of accidental data exposure.
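To make "embedded images" concrete: Word, Excel, and PowerPoint files in Office Open XML format (.docx, .xlsx, .pptx) are ZIP packages, and inserted pictures are typically stored as parts under word/media/, xl/media/, or ppt/media/. The minimal Python sketch below lists those image parts. It is an illustration only, not how Purview itself scans files; the helper name `list_embedded_images` and the extension filter are hypothetical, and PDFs use a different container format that this sketch does not cover.

```python
import io
import zipfile

# Media folders used by Word, Excel, and PowerPoint inside OOXML packages.
MEDIA_PREFIXES = ("word/media/", "xl/media/", "ppt/media/")
# Hypothetical filter: common raster and vector image extensions.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff", ".emf", ".wmf")


def list_embedded_images(data: bytes) -> list[str]:
    """Return the names of image parts stored in an OOXML package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return sorted(
            name
            for name in package.namelist()
            if name.startswith(MEDIA_PREFIXES) and name.lower().endswith(IMAGE_EXTENSIONS)
        )


if __name__ == "__main__":
    # Build a tiny in-memory package that mimics a .docx layout.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as fake_docx:
        fake_docx.writestr("word/document.xml", "<w:document/>")
        fake_docx.writestr("word/media/image1.png", b"\x89PNG")
        fake_docx.writestr("word/media/image2.jpeg", b"\xff\xd8")
    print(list_embedded_images(buffer.getvalue()))
```

Each name returned corresponds to an image that endpoint DLP previously skipped and that, with OCR enabled, is converted to text and evaluated against DLP policies.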

This message is associated with Microsoft 365 Roadmap ID 381750.

When this will happen:

  • General Availability (Worldwide): Rollout will begin in mid-May 2026 and is expected to complete by late May 2026.

How this affects your organization:

Who is affected:

  • Admins managing Microsoft Purview Data Loss Prevention
  • Organizations using Endpoint DLP on Windows devices
  • Users working with Word, Excel, PowerPoint, and PDF files on Windows endpoints

What will happen:

Once OCR is enabled for your organization, DLP policies applied to endpoint devices will be able to scan images embedded inside Office documents (Word, PowerPoint, Excel) and PDF files on Windows devices. This means:

  • Sensitive data detection in embedded images: If a document contains an embedded image with sensitive information (such as credit card numbers, Social Security numbers, or financial data), DLP policies can now detect that data and enforce configured actions (audit, block, or block with override).
  • Closes a detection blind spot: Previously, embedded images were entirely skipped during endpoint DLP scanning. This update ensures those images are processed using OCR and evaluated against your DLP policies.
  • Cost implications: Enabling OCR incurs additional costs based on the volume of images scanned. Costs are charged per image scanned via Azure AI Services. Organizations can monitor OCR costs through:
    • The OCR cost dashboard in the Microsoft Purview compliance portal under Data loss prevention → Overview
    • Cost Management + Billing in the Azure portal, filtered by your Azure AI Services resource 
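Because charges are per image scanned, a rough budget is simply the expected image volume multiplied by the per-image rate. The Python sketch below shows that arithmetic under stated assumptions: the rate is a hypothetical placeholder (this message does not publish a price), the workload figures are made up, and the OCR cost dashboard and Azure Cost Management remain the authoritative sources for actual spend.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimate_ocr_cost(image_count: int, price_per_image: Decimal) -> Decimal:
    """Rough OCR spend estimate: images scanned x per-image rate, rounded to cents.

    price_per_image is a placeholder; use the actual rate for your
    Azure AI Services resource, which this message does not state.
    """
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    total = Decimal(image_count) * price_per_image
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Hypothetical workload: 2,000 devices x 15 embedded images per day x 30 days.
    monthly_images = 2_000 * 15 * 30
    hypothetical_rate = Decimal("0.001")  # NOT a published Azure price
    print(monthly_images, estimate_ocr_cost(monthly_images, hypothetical_rate))
```

Decimal is used instead of float so that currency rounding is exact; the main lever on cost is the number of embedded images that devices actually scan.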

What you can do to prepare:

To take advantage of this feature, complete the following steps: 

  1. Verify prerequisites:
    • Confirm that the Microsoft Sense client is version 10.8820.27904.1000 or later.
    • To check the version, run MsSense.exe --version from C:\Program Files\Windows Defender Advanced Threat Protection.
    • Ensure Windows OS update KB5079473 is installed on endpoint devices.
  2. Enable OCR in Microsoft Purview:
    1. Navigate to the Microsoft Purview compliance portal.
    2. Go to Data loss prevention → Overview → Data loss prevention settings.
    3. Under Optical Character Recognition (OCR), select Turn on OCR scanning.
    4. Select the appropriate Azure pay-as-you-go subscription and Azure AI Services resource.
    5. Select Save.
  3. Review and update DLP policies:
    • Ensure policies include Devices as a location.
    • Consider testing with Test mode and policy tips before enforcing actions.
    • Verify sensitive information types are configured for detection in embedded images.
  4. Set up billing and monitor costs:
    • Configure OCR billing (the Azure pay-as-you-go subscription and Azure AI Services resource) before turning on OCR scanning in step 2.
    • Review the OCR cost dashboard in Purview after enabling.
  5. Assign appropriate roles:
    • Compliance Data Administrator
    • Compliance Administrator
    • Information Protection
    • Information Protection Admin
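When checking the prerequisite in step 1 across many devices, compare the Sense client version numerically rather than as text: a plain string comparison would wrongly rank "10.10000.1.1" below "10.8820.27904.1000". The Python sketch below assumes you have already extracted the dotted version string from the MsSense.exe --version output (the exact output format is not described here); the function names are hypothetical.

```python
# Minimum Microsoft Sense client version stated in this message.
MIN_SENSE_VERSION = "10.8820.27904.1000"


def parse_version(text: str) -> tuple[int, ...]:
    """Convert a dotted version such as '10.8820.27904.1000' into integers."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str, minimum: str = MIN_SENSE_VERSION) -> bool:
    """Return True if `installed` is numerically >= `minimum`."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '10.8820' compares like '10.8820.0.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    for candidate in ("10.8820.27904.1000", "10.8813.27000.1000", "10.10000.1.1"):
        print(candidate, meets_minimum(candidate))
```

Devices that fail this check, or that lack Windows update KB5079473, will not meet the stated prerequisites for OCR scanning.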


Compliance considerations:

| Area | Explanation |
|---|---|
| Data processing changes | Embedded images within Office documents and PDFs are now processed using OCR, expanding how DLP evaluates file content. |
| AI/ML capabilities | OCR uses Azure AI Services to analyze images for sensitive information detection. |
| DLP policy enforcement | DLP policies are enhanced to detect sensitive information contained within embedded images. |
| Admin monitoring and reporting | Admins can monitor OCR usage and associated costs through the Purview dashboard and Azure Cost Management. |
| Admin control | The feature is disabled by default and must be explicitly enabled by an administrator. |
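As an illustration of the DLP policy enforcement row: once OCR turns an embedded image into text, that text is evaluated against sensitive information types. The Python sketch below shows a simplified, checksum-validated credit card match on OCR output. It is not Purview's built-in credit card definition, which applies its own pattern and validation rules; the regex, the sample text, and the function names are hypothetical.

```python
import re

# Simplified candidate pattern: 13 to 16 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(ocr_text: str) -> list[str]:
    """Return digit strings in OCR output that look like card numbers and pass Luhn."""
    found = []
    for match in CANDIDATE.finditer(ocr_text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


if __name__ == "__main__":
    # Text as OCR might return it from a scanned receipt image (made-up sample).
    sample = "Card: 4111 1111 1111 1111 exp 12/29, ref 4111-1111-1111-1112"
    print(find_card_numbers(sample))
```

The checksum step is what separates a plausible card number from an arbitrary run of digits, which is why the second sequence in the sample is not reported.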