A simulated enterprise environment, a benchmark, and an account data unification tool are intended to help customers transform into agentic AI enterprises.

Salesforce AI Research on Wednesday announced three advancements geared toward helping customers transform into agentic AI enterprises.
They include a simulated enterprise environment framework for testing and training agents, a new benchmarking tool to measure the effectiveness of agents in enterprise use cases, and a new Data Cloud capability for autonomously consolidating and unifying duplicated account data.
Simulation testing and training for AI agents
Salesforce AI Research has built on the previously released CRMArena, an environment for testing single-turn B2C service tasks, with the new CRMArena-Pro, a simulated enterprise environment framework that enables testing of AI agent performance in complex, multi-turn, multi-agent scenarios, including sales forecasting, service case triage, and Configure, Price, Quote (CPQ) processes.
CRMArena-Pro also relies on synthetic data, which Salesforce said allows agents to make safe API calls while strict safeguards protect personally identifiable information (PII).
Salesforce added that the environment acts much like a digital twin of a business, capturing the full complexity of enterprise operations and helping customers test agents in situations like customer service escalations or supply chain disruptions before they go live.
Benchmark evaluates AI agents in context
The new Agentic Benchmark for CRM is designed to evaluate the effectiveness of AI agents in specific business contexts: customer service, field service, marketing, and sales. It measures five enterprise metrics — accuracy, cost, speed, trust and safety, and sustainability — to determine agents’ readiness for real-world deployments.
Salesforce noted that sustainability is a new metric highlighting the relative environmental impact of AI systems. It is intended to help businesses match model size to the level of intelligence an agent actually needs for an enterprise-specific task, minimizing the computational resources the agent consumes.
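The idea behind the sustainability metric can be illustrated as a routing decision: pick the smallest model whose capability meets the task's requirements. The sketch below is hypothetical; the model names, capability tiers, and energy figures are illustrative assumptions, not Salesforce's actual scoring.

```python
# Hypothetical sketch: route a task to the lowest-energy model that still
# meets its required capability level. All values below are illustrative.
MODELS = [
    {"name": "slm-small", "capability": 1, "relative_energy": 1.0},
    {"name": "llm-medium", "capability": 2, "relative_energy": 4.0},
    {"name": "llm-large", "capability": 3, "relative_energy": 10.0},
]

def pick_model(required_capability: int) -> dict:
    """Return the lowest-energy model whose capability meets the task's needs."""
    eligible = [m for m in MODELS if m["capability"] >= required_capability]
    return min(eligible, key=lambda m: m["relative_energy"])

print(pick_model(2)["name"])  # → llm-medium: a mid-size model suffices
```

Routing by minimum sufficient capability rather than defaulting to the largest model is what keeps the computational footprint down.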
In addition, Salesforce AI Research has published two complementary benchmarks, MCP-Eval and MCP-Universe, which are designed to measure agents and track LLMs as they interact with Model Context Protocol (MCP) servers in real-world use cases.
MCP-Eval uses synthetic tasks for scalable, automatic evaluation across a wide range of MCP servers, whereas MCP-Universe leverages challenging, real-world tasks with execution-based evaluators to stress-test agents in complex scenarios.
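The distinguishing feature of an execution-based evaluator, in the spirit of MCP-Universe, is that pass/fail is decided by checking the resulting state rather than string-matching the agent's answer. The following is a minimal, hypothetical sketch; the function names and state shape are assumptions for illustration only.

```python
# Hypothetical sketch of execution-based evaluation: a checker function
# runs against the final state an agent produced, rather than comparing
# the agent's textual answer to a reference string.
from typing import Callable

def execution_based_eval(final_state: dict, checker: Callable[[dict], bool]) -> bool:
    """Pass/fail is determined by executing a checker on the final state."""
    return checker(final_state)

# Example task: the agent was asked to create a record with a given ID.
state = {"records": {"acct-42": {"status": "created"}}}
print(execution_based_eval(state, lambda s: "acct-42" in s["records"]))  # → True
```

This style of evaluator tolerates many valid phrasings of an answer, because only the observable outcome is checked.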
Autonomously consolidating duplicated account data
Finally, Account Matching is a new Data Cloud capability intended to help ensure AI agents have access to high-quality, unified data. Salesforce AI Research partnered with Salesforce product teams to fine-tune large and small language models (LLMs and SLMs) to power Account Matching, which can identify duplicated records, incomplete fields, and inconsistent naming conventions, and reconcile them across data systems into a single, authoritative record.
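The core of that reconciliation task can be sketched in a few lines: detect that two differently formatted names refer to the same account, then merge the duplicates into one record that fills in each other's missing fields. This is a minimal illustration using only the standard library; the normalization rules and similarity threshold are assumptions, not Salesforce's fine-tuned models.

```python
# Hypothetical sketch of account matching: flag records with similar
# normalized names as duplicates, then merge them into a single record,
# preferring non-empty field values. Threshold and suffix list are illustrative.
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    # Strip case and common corporate suffixes so "Acme Inc." matches
    # "ACME, Incorporated" despite inconsistent naming conventions.
    name = name.lower()
    for suffix in (", incorporated", " incorporated", " inc.", " inc"):
        name = name.replace(suffix, "")
    return " ".join(name.split())

def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def merge(records: list[dict]) -> dict:
    # Reconcile duplicates into one authoritative record, filling
    # incomplete fields from whichever duplicate has a value.
    merged: dict = {}
    for rec in records:
        for key, value in rec.items():
            if value and not merged.get(key):
                merged[key] = value
    return merged

dupes = [
    {"name": "Acme Inc.", "phone": "", "city": "Austin"},
    {"name": "ACME, Incorporated", "phone": "555-0100", "city": ""},
]
if is_match(dupes[0]["name"], dupes[1]["name"]):
    print(merge(dupes))  # one record with name, phone, and city all filled
```

A production system would add more signals than the name (addresses, domains, tax IDs) and resolve conflicting non-empty values, which is where the fine-tuned models described above come in.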
Salesforce pointed to one customer whose proprietary tool, built on Account Matching, unified more than a million accounts with a 95% match success rate in its first month and reduced average handling time by 30 minutes.