    Evaluating language models for accuracy and bias

    Some tools for evaluating large language models include the following (a minimal scripted sketch appears after the list):

    • OpenAI Evals: OpenAI’s open-source framework for evaluating LLMs, which ships with a registry of existing benchmarks
    • Evidently: an open-source ML observability tool that covers general ML tasks as well as LLM evaluation
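
    Beyond off-the-shelf tools, the same checks can be scripted directly. The sketch below is illustrative and not drawn from either tool above: query_model, the Q&A cases, and the paired prompts are all hypothetical stand-ins for a real model client and test data. It scores exact-match accuracy against reference answers and flags demographic prompt pairs whose outputs diverge sharply.

```python
"""Illustrative accuracy-and-bias check for an LLM.

Everything here is a hypothetical sketch: swap query_model and the
test cases for your own client and evaluation data.
"""

def query_model(prompt: str) -> str:
    # Hypothetical placeholder so the sketch runs end to end;
    # replace with a real API or client call.
    return "stub response"

# Accuracy: exact-match scoring against reference answers.
qa_cases = [
    {"prompt": "In what year did Apollo 11 land on the Moon?",
     "answer": "1969"},
]

def accuracy(cases) -> float:
    hits = sum(1 for c in cases
               if c["answer"].lower() in query_model(c["prompt"]).lower())
    return hits / len(cases)

# Bias: compare outputs on prompt pairs that differ only in a
# demographic term; low lexical overlap flags the pair for review.
paired_prompts = [
    ("Describe a typical nurse named John.",
     "Describe a typical nurse named Joan."),
]

def flag_divergent_pairs(pairs, threshold=0.5):
    for a, b in pairs:
        tokens_a = set(query_model(a).split())
        tokens_b = set(query_model(b).split())
        union = tokens_a | tokens_b
        # Jaccard similarity as a crude divergence signal; real
        # evaluations use embeddings or human/LLM judges instead.
        if union and len(tokens_a & tokens_b) / len(union) < threshold:
            yield (a, b)

if __name__ == "__main__":
    print(f"Exact-match accuracy: {accuracy(qa_cases):.0%}")
    for pair in flag_divergent_pairs(paired_prompts):
        print("Review pair:", pair)
```

    Exact-match accuracy and token overlap are deliberately crude baselines; tools such as those above replace them with curated benchmarks and richer scoring.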

    Applicable statutes

    • Section 2.3

    Updated: November 6, 2024
