
Humanloop: Evaluation-Driven Development Under Anthropic

Humanloop occupies an unusual position in the 2026 landscape: it is no longer a standalone startup but the “Evaluation Engine” inside Anthropic. In August 2025, Anthropic acquired the company in a strategic deal aimed at the industry’s biggest bottleneck: trust. Before the acquisition, Humanloop was a leading LLMOps platform for prompt engineering and performance evaluation. After the deal, the standalone Humanloop platform was sunset (offline as of September 8, 2025), and its team and technology were integrated directly into the Anthropic Console.

Today, Humanloop’s DNA lives on as the “Workbench” and “Evaluations” tabs within Anthropic’s enterprise suite. The acquisition was a tacit admission that “having a smart model” isn’t enough; customers also need tooling to measure how well that model performs on their own tasks. By absorbing Humanloop, Anthropic gained mature testing infrastructure, allowing enterprise customers to run regression tests on Claude 4.5 updates much as they would test software code.


Core Technology: Evaluation-Driven Development

  • The Workbench (formerly Humanloop Playground): An interactive environment now native to Anthropic, where developers can version-control prompts and run “A/B tests” between different model versions (e.g., Claude 3.5 Sonnet vs. Opus).
  • Evaluations: The core IP acquired by Anthropic. It allows teams to define “success criteria” (e.g., “Did the model answer in JSON?”, “Was the tone empathetic?”) and automatically grade thousands of historical logs against these rules.
  • Human-in-the-Loop Feedback: A streamlined interface for domain experts (like lawyers or doctors) to give thumbs-up/thumbs-down feedback on model outputs, which is then used to fine-tune the model for that specific customer.
  • Prompt Registry: A centralized library that treats prompts as code, complete with version history (v1.2, v1.3), ensuring that a bad prompt update can always be rolled back.
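
The Evaluations idea above can be illustrated with a minimal sketch. This is not Humanloop's or Anthropic's actual API; the `LogEntry` type, the evaluator names, and `grade_logs` are all hypothetical, chosen only to show how “success criteria” might be expressed as small functions and applied to a batch of historical logs.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LogEntry:
    """One historical model call (hypothetical shape, for illustration)."""
    prompt: str
    output: str


def valid_json(entry: LogEntry) -> bool:
    """Success criterion: did the model answer in JSON?"""
    try:
        json.loads(entry.output)
        return True
    except json.JSONDecodeError:
        return False


def within_length_limit(entry: LogEntry, limit: int = 200) -> bool:
    """Success criterion: is the answer at most `limit` characters long?"""
    return len(entry.output) <= limit


# Registry of criteria; the names are assumptions, not a real product schema.
EVALUATORS: Dict[str, Callable[[LogEntry], bool]] = {
    "valid_json": valid_json,
    "within_length_limit": within_length_limit,
}


def grade_logs(
    logs: List[LogEntry],
    evaluators: Dict[str, Callable[[LogEntry], bool]],
) -> Dict[str, float]:
    """Return the pass rate (0.0 to 1.0) of each evaluator over the logs."""
    if not logs:
        return {name: 0.0 for name in evaluators}
    return {
        name: sum(check(entry) for entry in logs) / len(logs)
        for name, check in evaluators.items()
    }


if __name__ == "__main__":
    history = [
        LogEntry("Return the user as JSON", '{"name": "Ada"}'),
        LogEntry("Return the user as JSON", "The user is Ada."),
    ]
    print(grade_logs(history, EVALUATORS))
```

Rule-based checks like these only cover criteria that can be tested mechanically. A subjective criterion such as “Was the tone empathetic?” would need a human reviewer or a model-based grader, which this sketch does not attempt.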

Business & Market Status

Status: Acquired & Integrated. The company ceased independent operations in late 2025.

Acquisition Logic: Anthropic acquired Humanloop to close the gap with OpenAI’s tooling ecosystem. Rather than building an evaluation stack from scratch, it bought the market leader to offer immediate “Enterprise Readiness” to its Fortune 500 clients.

Legacy: Prior to the acquisition, Humanloop was the default choice for high-growth companies such as Duolingo, Gusto, and Vanta, many of which migrated to Anthropic’s native tools post-merger.

Company Profile

Founders: Raza Habib (CEO), Jordan Burgess, and Peter Hayes, all now in leadership roles within Anthropic’s Product & Engineering divisions.

Headquarters: Formerly London, UK / San Francisco, CA.

Funding: Raised ~$5 Million (Seed) prior to acquisition.

Key Investors: Y Combinator, Index Ventures, LocalGlobe.

Key Use Cases (Legacy & Current Integration)

  • Regression Testing: Enterprises use the integrated tools to ensure that a new model update (e.g., moving from Claude 3 to 3.5) doesn’t break existing prompts or degrade answer quality.
  • Prompt Engineering: Non-technical product managers use the visual “Workbench” to iterate on system prompts and test them against real user data without needing to write code.
  • Fine-Tuning Data: The platform captures “corrected” answers from human reviewers and automatically formats them into datasets for fine-tuning private models.
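
As a rough illustration of the fine-tuning data use case, the sketch below turns reviewed logs into JSON Lines records. Everything here is an assumption made for the example: the `ReviewedLog` fields, the `prompt`/`completion` record keys, and the rule that rejected outputs without a correction are skipped. It is not the format used by any specific platform.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReviewedLog:
    """A model output plus a reviewer's verdict (hypothetical shape)."""
    prompt: str
    model_output: str
    approved: bool                          # thumbs-up or thumbs-down
    corrected_output: Optional[str] = None  # reviewer's fix, if provided


def to_finetune_records(logs: List[ReviewedLog]) -> List[Dict[str, str]]:
    """Keep approved outputs as-is and swap in corrections for rejected ones."""
    records = []
    for log in logs:
        if log.approved:
            target = log.model_output
        elif log.corrected_output is not None:
            target = log.corrected_output
        else:
            continue  # rejected with no correction: no usable training target
        records.append({"prompt": log.prompt, "completion": target})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    reviewed = [
        ReviewedLog("Summarize clause 4", "Clause 4 covers liability.", True),
        ReviewedLog("Summarize clause 5", "Clause 5 is about pets.", False,
                    corrected_output="Clause 5 covers termination."),
        ReviewedLog("Summarize clause 6", "Unclear.", False),
    ]
    print(to_jsonl(to_finetune_records(reviewed)))
```

Dropping uncorrected rejections is one possible design choice; a team could instead keep them as negative examples for preference-based training.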

Why It Matters

Humanloop was among the first companies to treat “Evaluation” as a first-class citizen. Its founders realized early on that “Prompt Engineering” isn’t magic; it’s engineering, and it requires debuggers, version control, and test suites. By acquiring the company, Anthropic signaled that the future of AI isn’t just about bigger models, but about reliable ones that can be measured, trusted, and improved systematically.



Mazi

Built by our team member Maziar Foroudian, Mazi is an intelligent agent designed to research across trusted websites and craft insightful, up-to-date content tailored for business professionals.
