New: Turn locked documents into grounded data for Agentforce.

Transform scanned documents
into AI-ready data

Agentforce is only as good as the data it's grounded on. We turn PDFs, scans, and attachments into governed records in Data Cloud.

MuleSoft Anypoint IDP or Data Cloud Document AI: choosing the right path based on volume, cost, governance, and where you already sit on the Salesforce platform.

A real architecture review, not a default to whatever was demoed last.

Unstructured

Medical record bundles

Loss runs & submissions

Certificates of insurance

Broker correspondence

Extraction
engine

Structured in Salesforce

Data Lake Object · Submission record

Policy number: WC-2026-04417

Named insured: Northwind Logistics

Coverage: Workers' Comp

Limit: $1,000,000

Effective date: 2026-01-01

Built on the Salesforce platform you already run

Salesforce · Data Cloud / Data 360 · MuleSoft Anypoint · Agentforce · Einstein Trust Layer


The opportunity hiding in your documents

Most enterprise data that would make an agent genuinely useful is still locked away. The pattern is the same across insurance and beyond.

The agents are ready, the data isn't

The platform has the agents, the LLMs, and the Trust Layer. What's missing is the data, still locked in PDFs, scans, and email attachments. Grounding agents means freeing that data first.

The data exists, it just isn't structured

Loss runs, claim attachments, submissions, and policy documents already hold the answers. They simply aren't in Salesforce as queryable, governed records your agents and reps can use.

The extraction engine is swappable

The hard part is the pipeline around it, not the engine inside it. Choose IDP or Document AI, and change your mind later. The front door and landing zone stay constant.

The pattern that stays the same

One pipeline. A swappable engine.

Whichever extraction engine you pick, the surrounding architecture looks the same, so the choice stays contained and low-risk.

Document source

Upload, email, fax intake, or integration.

MuleSoft front door

Ingestion, normalization, and splitting files to stay under per-file limits.

Swappable

Extraction engine

IDP, Document AI, or both. This is the swappable piece.

Data Cloud Data Lake Object

Structured output, governed by the Einstein Trust Layer.

Salesforce + Agentforce

Records, summaries, and grounded agent context.
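A minimal sketch of the "swappable engine" idea, assuming Python 3.8+. `Document`, `ExtractionResult`, `ExtractionEngine`, `StubEngine`, and `run_pipeline` are hypothetical names for illustration, not MuleSoft or Data Cloud APIs, and the in-memory list stands in for a Data Lake Object. The point is structural: intake and the landing zone stay fixed, and only the object passed as `engine` changes.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Document:
    """Hypothetical normalized document handed over by the MuleSoft front door."""
    doc_id: str
    doc_type: str
    content: bytes


@dataclass
class ExtractionResult:
    """Hypothetical structured output destined for a Data Lake Object row."""
    doc_id: str
    fields: dict = field(default_factory=dict)


class ExtractionEngine(Protocol):
    """The swappable piece: anything that turns a document into fields."""

    def extract(self, doc: Document) -> ExtractionResult: ...


class StubEngine:
    """Illustrative stand-in; a real adapter would call IDP or Document AI."""

    def __init__(self, name: str) -> None:
        self.name = name

    def extract(self, doc: Document) -> ExtractionResult:
        return ExtractionResult(doc.doc_id, {"engine": self.name, "doc_type": doc.doc_type})


def run_pipeline(docs: List[Document], engine: ExtractionEngine,
                 landing_zone: List[ExtractionResult]) -> None:
    """Same intake and landing zone regardless of which engine is passed in."""
    for doc in docs:
        landing_zone.append(engine.extract(doc))


# Usage: swap engines without touching the surrounding pipeline.
dlo_rows: List[ExtractionResult] = []
batch = [Document("d1", "loss_run", b"%PDF-...")]
run_pipeline(batch, StubEngine("idp"), dlo_rows)
run_pipeline(batch, StubEngine("document_ai"), dlo_rows)
```

Because the engine is only reached through `extract`, changing engines later touches one adapter, not the front door or the schema of the landing zone.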

The architectures available today

Three credible answers

The right one depends on volume, cost structure, governance, and where the customer already sits on the platform.

MuleSoft Anypoint IDP

The mature, production-proven path

Best fit when

Already a MuleSoft customer with vCore capacity
High, predictable volume where batch is acceptable
Classification across many document types
Pipeline feeds multiple systems beyond Salesforce

Cost model

Consumption via Automation Credits at roughly 30 credits per page (Automation Credits 3.0).

Heavy workloads can saturate vCores and force incremental purchases. Output to Data Cloud needs explicit integration work.

Data Cloud Document AI

Salesforce's native, Agentforce-ready path

Best fit when

Committed to Data Cloud as the Agentforce foundation
Moderate volume or relatively small files
Native Agentforce grounding without extra pipeline work
Unified governance under the Einstein Trust Layer

Cost model

Consumption via Data Cloud credits at roughly 750 credits per MB under Intelligent Processing. No separate SKU.

20 MB per-file limit; scanned PDFs need rasterizing to JPEG. Per-MB metering means DPI choices flow straight to the bill.
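The front door's splitting step can be sketched as greedy grouping of consecutive pages, assuming you already know each rasterized page's size in bytes. `chunk_pages` is a hypothetical helper, and treating 1 MB as 1,000,000 bytes is an assumption; confirm how the limit is defined for your org. A page that alone exceeds the limit cannot be fixed by splitting and has to be re-rasterized at a lower DPI.

```python
from typing import List

MB = 1_000_000  # assumption: decimal megabytes; confirm against the documented limit
MAX_FILE_BYTES = 20 * MB


def chunk_pages(page_sizes: List[int], limit: int = MAX_FILE_BYTES) -> List[List[int]]:
    """Group consecutive page indexes into chunks whose total size stays within `limit`."""
    chunks: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(page_sizes):
        if size > limit:
            # Splitting cannot help a single oversized page.
            raise ValueError(f"page {index} is {size} bytes; re-rasterize at a lower DPI")
        if current and current_size + size > limit:
            # Adding this page would break the limit, so close the current chunk.
            chunks.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


print(chunk_pages([8 * MB, 8 * MB, 8 * MB, 3 * MB]))  # [[0, 1], [2, 3]]
```

Assuming metering is on total megabytes processed, splitting changes how many files you send but not how many megabytes you are billed for; only DPI and compression move that total.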

Hybrid / Phased

For mixed workloads and transitions

Best fit when

Mixed workload profiles across document types
IDP for high-volume, complex-classification work
Document AI for newer Agentforce-grounded use cases
Customers mid-transition to Data Cloud

Cost model

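Both consumption models apply: Automation Credits for documents routed to IDP and Data Cloud credits for documents routed to Document AI. MuleSoft is the front door and routes each document to one engine or the other by document type and target system.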

Two engines to operate and govern. Plan the path to selectively retire IDP routes as Agentforce grows.
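The hybrid routing decision can be sketched as a lookup keyed on document type and target system. The `Engine` enum, the `ROUTES` entries, and the `data_warehouse` target are hypothetical examples, not recommendations; in practice this logic would live in the MuleSoft flow.

```python
from enum import Enum
from typing import Dict, Tuple


class Engine(Enum):
    IDP = "mulesoft_anypoint_idp"
    DOCUMENT_AI = "data_cloud_document_ai"


# Hypothetical routing table: (document type, target system) -> extraction engine.
ROUTES: Dict[Tuple[str, str], Engine] = {
    ("loss_run", "data_warehouse"): Engine.IDP,              # high-volume batch to a non-Salesforce system
    ("submission", "salesforce"): Engine.DOCUMENT_AI,        # Agentforce-grounded use case
    ("certificate_of_insurance", "salesforce"): Engine.DOCUMENT_AI,
}


def route(doc_type: str, target_system: str, default: Engine = Engine.IDP) -> Engine:
    """Return the engine for this document, falling back to `default` for unmapped pairs."""
    return ROUTES.get((doc_type, target_system), default)


print(route("submission", "salesforce"))  # Engine.DOCUMENT_AI
```

Keeping routes in a table means that retiring an IDP route as Agentforce grows is a one-entry change from `Engine.IDP` to `Engine.DOCUMENT_AI`, not a pipeline rebuild.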

Cost modeling done right

Predictable economics you can defend

Each engine meters on a different unit. Once you know which lever drives your cost, modeling is straightforward: you get a defensible estimate and predictable budgeting from day one.

MuleSoft IDP

Metered by page count

Pages × 30 credits × your rate

Straightforward to model from your contracted credit rate, so it's easy to forecast and approve.
Plan vCore capacity up front, and production volume scales predictably alongside the rest of your roadmap.
No separate IDP SKU to manage, since consumption lives entirely within your existing Automation Credits.
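The IDP formula translates directly into a forecasting helper. A minimal sketch: `IDP_CREDITS_PER_PAGE` uses the approximate 30-credits-per-page figure quoted earlier, `rate_per_credit` is your contracted rate, and the example volume and rate are placeholders, not real prices. vCore capacity is budgeted separately and is not modeled here.

```python
IDP_CREDITS_PER_PAGE = 30  # approximate Automation Credits 3.0 figure quoted above


def idp_credits(pages: int) -> int:
    """Credits consumed: pages × 30."""
    return pages * IDP_CREDITS_PER_PAGE


def idp_cost(pages: int, rate_per_credit: float) -> float:
    """Cost: pages × 30 credits × your contracted rate per credit."""
    return idp_credits(pages) * rate_per_credit


# Placeholder inputs: 50,000 pages a month at a hypothetical rate of 0.002 per credit.
monthly_pages = 50_000
print(idp_credits(monthly_pages))  # 1500000
print(idp_cost(monthly_pages, 0.002))
```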

Data Cloud Document AI

Metered by file size, not pages

Megabytes × 750 credits × your rate

Because cost tracks file size, scan-quality tuning becomes a direct lever you control to optimize spend.
Measuring your real files produces a precise, defensible estimate instead of guesswork.
Right-sizing DPI keeps spend lean; we measure first so you commit with confidence.
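The Document AI formula, and the DPI lever behind it, can be sketched the same way. `DOC_AI_CREDITS_PER_MB` uses the approximate 750-credits-per-MB figure quoted earlier. `estimate_mb_at_dpi` is a hypothetical what-if helper built on a stated simplification: file size proportional to pixel count, and therefore to DPI squared. Real JPEGs only roughly follow that, which is why measuring real files comes first.

```python
DOC_AI_CREDITS_PER_MB = 750  # approximate Intelligent Processing figure quoted above


def document_ai_credits(total_mb: float) -> float:
    """Credits consumed: megabytes × 750."""
    return total_mb * DOC_AI_CREDITS_PER_MB


def document_ai_cost(total_mb: float, rate_per_credit: float) -> float:
    """Cost: megabytes × 750 credits × your contracted rate per credit."""
    return document_ai_credits(total_mb) * rate_per_credit


def estimate_mb_at_dpi(measured_mb: float, measured_dpi: float, target_dpi: float) -> float:
    """Rough what-if only: assumes size scales with pixel count, i.e. with DPI squared.
    Actual JPEG sizes depend on content and compression, so re-measure before committing."""
    return measured_mb * (target_dpi / measured_dpi) ** 2


# Placeholder inputs: a 12 MB sample measured at 300 DPI, re-estimated at 200 DPI.
estimated_mb = estimate_mb_at_dpi(12.0, 300, 200)
print(round(estimated_mb, 2))
```

Under that simplification, dropping from 300 to 200 DPI cuts estimated megabytes, and therefore credits, to about 4/9 (roughly 44%) of the original, provided extraction accuracy holds at the lower resolution.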

Plan ahead and there are no surprises. Most contracts are sized for baseline platform usage, so we fold new extraction workloads into your credit planning up front, giving you predictable budgeting and full cost visibility from day one.
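Because the two engines meter on different units, comparing them per page needs an assumption about average file size. A rough break-even sketch using the approximate figures above, where the symbols are introduced here for illustration: $s$ is the average megabytes per page after rasterization, $r_A$ is your contracted rate per Automation Credit, and $r_D$ is your contracted rate per Data Cloud credit.

$$
\begin{aligned}
\text{IDP cost per page} &= 30\,r_A, \qquad \text{Document AI cost per page} = 750\,s\,r_D \\
30\,r_A = 750\,s^{*}\,r_D \quad &\Longrightarrow \quad s^{*} = \frac{30\,r_A}{750\,r_D} = 0.04\,\frac{r_A}{r_D}\ \text{MB per page}
\end{aligned}
$$

When measured pages average less than $s^{*}$, Document AI's per-page metering is lower; above it, IDP's is lower. Neither side counts vCore capacity, integration work, or governance effort.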

How to choose

Four questions decide the architecture

The customer's answers determine which path fits. Most decisions are clear once the document profile is actually measured.

Where's the platform center of gravity?

If MuleSoft is the orchestration backbone and Data Cloud is only provisioned, not yet central, lean IDP. If Data Cloud is the platform of record and Agentforce is near-term, lean Document AI.

What's the document volume and profile?

High-volume batch of complex multi-form documents favors IDP's classification and predictable per-page cost. Moderate volume, simpler types, or real-time agentic use favors Document AI.

What's the governance posture?

For PHI, PII, or regulated data where audit, BAA coverage, and unified governance matter, Document AI's path through the Einstein Trust Layer simplifies the compliance story.

What's the AI roadmap?

Building toward Agentforce as the core agentic platform aligns with Document AI. Using Salesforce as one of several downstream systems favors IDP's broader ecosystem integration.
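The four questions can be read as a simple tally. This is a toy sketch, assuming equal weight per question and treating a split result as a signal for the hybrid path; `Answers` and `lean` are hypothetical names, and the real decision still rests on measuring the document profile.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    mulesoft_is_backbone: bool       # Q1: True if MuleSoft, False if Data Cloud, is the center of gravity
    high_volume_complex_batch: bool  # Q2: True for high-volume batch of complex multi-form documents
    regulated_data: bool             # Q3: True for PHI, PII, or other regulated content
    agentforce_is_core: bool         # Q4: True if Agentforce is the core agentic platform


def lean(a: Answers) -> str:
    """Toy equal-weight tally of the four questions."""
    idp = doc_ai = 0
    if a.mulesoft_is_backbone:
        idp += 1
    else:
        doc_ai += 1
    if a.high_volume_complex_batch:
        idp += 1
    else:
        doc_ai += 1
    if a.regulated_data:
        doc_ai += 1  # a non-regulated answer does not tilt either way
    if a.agentforce_is_core:
        doc_ai += 1
    else:
        idp += 1  # Salesforce as one of several downstream systems
    if idp > doc_ai:
        return "MuleSoft Anypoint IDP"
    if doc_ai > idp:
        return "Data Cloud Document AI"
    return "Hybrid / Phased"


print(lean(Answers(True, True, False, False)))  # MuleSoft Anypoint IDP
```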

WHERE THE INDUSTRY IS GOING

Data Cloud is the foundation.
Document AI is the on-platform path.

Salesforce's investment direction is unambiguous. MuleSoft IDP remains a strong production product and stays supported, but for new builds on a Data Cloud + Agentforce foundation, Document AI is increasingly the path of least resistance.

That doesn't mean every customer should rush to it. The right architecture is the one that matches volume, document profile, governance posture, and platform direction.

What we help clients do

The full unstructured-to-structured lifecycle

PS Advisory works with insurance carriers, MGAs, and reinsurers on the whole journey, from the question to a production pipeline.

Discovery & architecture

Profile your documents across volume, size, type, and scan quality, map them to the right engine, and produce a cost model grounded in your contracted rates, not list pricing.

Pilot & validation

Stand up a contained 60–90 day parallel run that benchmarks accuracy on real documents, validates operational behavior at volume, and produces AE-confirmed cost figures.

Production implementation

Build the MuleSoft pipeline, the Data Cloud DLO schema, the Salesforce records, the Agentforce grounding, and the runbook, all with insurance-specific document patterns.

Optimization & roadmap

As Document AI matures and volume grows, the Year 1 choice may not fit Year 3. We help you measure, adjust, and migrate without redoing the pipeline.

Governed by design

Regulated data, handled correctly

For the carriers, MGAs, and reinsurers handling PHI, PII, and regulated content where audit and BAA coverage are non-negotiable.

Einstein Trust Layer

Document AI processes documents through the Trust Layer, with zero data retention, prompt and response masking, and toxicity filtering. Your data never trains external models.

Unified governance

Extracted data lands in Data Cloud DLOs under one governance model, instead of adding a second AI processing surface to audit and secure.

Grounded for Agentforce

Structured output is available to agents as soon as it lands, on the same data, governance, and model surface they already use.

Measured, not assumed

We benchmark accuracy and cost on your real documents before you commit, so the production system behaves the way the pilot promised.

From the question to the answer,
in weeks, not months.

Whether you're an AE choosing what to recommend, a partner scoping an implementation, or an architecture team deciding between IDP and Document AI for a real workload, we can help. We bring deep experience with both MuleSoft and Data Cloud, and deeper experience with Salesforce for insurance.
