Evaluations

Evaluations help you understand how your LLM application performs. You can measure your application across several dimensions, such as correctness, hallucination, relevance, faithfulness, latency, and more.

This helps you ship LLM applications that are reliable, accurate, and fast. Evaluations run through Phoenix, our open-source evaluation package, and the results are visible in the Arize platform.
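As a minimal sketch of running one of these evaluations, the snippet below scores a single question, retrieved context, and answer with a pre-built hallucination evaluator. It assumes the `arize-phoenix-evals` package and an OpenAI API key are available; the example data, the `gpt-4o` model choice, and the exact parameter names (which can differ across package versions) are illustrative assumptions, not part of this page.

```python
import pandas as pd
from phoenix.evals import HallucinationEvaluator, OpenAIModel, run_evals

# Example data: each row pairs a question and retrieved context with the model's answer.
df = pd.DataFrame(
    {
        "input": ["Who wrote Hamlet?"],
        "reference": ["Hamlet is a tragedy written by William Shakespeare."],
        "output": ["Hamlet was written by Charles Dickens."],
    }
)

# Illustrative model choice for the judge; any supported LLM could be used here.
eval_model = OpenAIModel(model="gpt-4o")

# run_evals returns one DataFrame per evaluator, with a label (and optional explanation) per row.
[hallucination_df] = run_evals(
    dataframe=df,
    evaluators=[HallucinationEvaluator(eval_model)],
    provide_explanation=True,
)
print(hallucination_df[["label", "explanation"]])
```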

Arize has built an evaluation framework for production applications:

  1. Pre-tested Evaluators Backed by Research - Our library includes a range of evaluators, thoroughly tested and continually updated, to provide accurate assessments tailored to your application's needs.

  2. Multi-level Custom Evaluation - We provide several types of evaluation, complete with explanations, out of the box. You can also customize your evaluation using your own criteria and prompt templates, as shown in the sketch after this list.

  3. Designed for Speed - Phoenix Evals are designed to handle large volumes of data, with parallel calls, batch processing, and rate limiting.

  4. Ease of Onboarding - Our framework integrates seamlessly with popular LLM frameworks like LangChain and LlamaIndex, providing straightforward setup and execution.

  5. Extensive Compatibility - Phoenix is compatible with all common LLMs and offers unparalleled RAG debugging and troubleshooting.
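To illustrate the custom-evaluation path from item 2, here is a hedged sketch that grades responses against your own criterion with `llm_classify`. The tone criterion, the prompt template, the `rails` labels, and the `gpt-4o` model choice are placeholder assumptions for illustration only.

```python
import pandas as pd
from phoenix.evals import OpenAIModel, llm_classify

# Hypothetical custom criterion: is the response polite?
# Template variables ({input}, {output}) must match the DataFrame column names.
TONE_TEMPLATE = """You are evaluating the tone of a response to a user.
[BEGIN DATA]
[Question]: {input}
[Response]: {output}
[END DATA]
Answer with a single word, "polite" or "impolite"."""

rails = ["polite", "impolite"]  # the only labels the judge model may return

results = llm_classify(
    dataframe=pd.DataFrame(
        {
            "input": ["Where is my order?"],
            "output": ["It ships tomorrow, thanks for your patience!"],
        }
    ),
    template=TONE_TEMPLATE,
    model=OpenAIModel(model="gpt-4o"),  # illustrative model choice
    rails=rails,
    provide_explanation=True,  # ask the judge model to explain each label
)
print(results[["label", "explanation"]])
```

The same call scales to large DataFrames, since Phoenix Evals handles parallel calls, batching, and rate limiting as described in item 3 above.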

To get started, check out the Quickstart guide for evaluation.

If you want to learn how to accomplish a particular task, check out our best practices.

You can also learn more by reading our Phoenix How-To Guides.
