Project

evalir

0.01
Evalir is used to measure search relevance at Companybook, and offers a number of standard measurements, from the basic precision and recall to single value summaries such as NDCG and MAP.

What is Evalir?

Evalir is a library for evaluation of IR systems. It incorporates a number of standard measurements, from the basic precision and recall, to single value summaries such as NDCG and MAP.

For a good reference on the theory behind this, see chapter 8 of Manning, Raghavan & Schütze's excellent Introduction to Information Retrieval.

What can Evalir do?

How does Evalir work?

The goal of an Information Retrieval system is to provide the user with relevant information -- relevant w.r.t. the user's information need. For example, an information need might be:

Information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.

However, this is not the query. A user will try to encode her need as a query, for instance:

red white wine reducing "heart attack"

To evaluate an IR system with Evalir, we will need human-annotated test data, each data point consisting of the following:

  • An explicit information need
  • A query
  • A list of documents that are relevant w.r.t. the information need (not the query)
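
As a minimal sketch, one such data point could be held in a plain Ruby Struct. The name JudgedQuery and its fields are hypothetical and are not part of Evalir; the values are the wine example used in this README.

```ruby
# Hypothetical container for one annotated data point (not an Evalir class).
JudgedQuery = Struct.new(:information_need, :query, :relevant_ids, keyword_init: true)

WINE_CASE = JudgedQuery.new(
  information_need: "Information on whether drinking red wine is more effective " \
                    "at reducing your risk of heart attacks than white wine.",
  query: 'red white wine reducing "heart attack"',
  # Document ids judged relevant to the information need, not to the query.
  relevant_ids: [123, 654, 29, 1029]
)

puts WINE_CASE.query
puts WINE_CASE.relevant_ids.size
```

Keeping the information need next to the query makes it explicit that relevance was judged against the need.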

For example, take the information need and query above, together with a list of documents that have been judged relevant: { 123, 654, 29, 1029 }. If the actual query results are in an array named results, we can use an Evalirator like this:

require 'rubygems'
require 'evalir'

# results is assumed to hold the document ids returned for the query, in rank order
relevant = [123, 654, 29, 1029]
e = Evalir::Evalirator.new(relevant, results)
puts "Precision: #{e.precision}"
puts "Recall: #{e.recall}"
puts "F-1: #{e.f1}"
puts "F-3: #{e.f_measure(3)}"
puts "Precision at rank 10: #{e.precision_at_rank(10)}"
puts "Average Precision: #{e.average_precision}"
puts "NDCG @ 5: #{e.ndcg_at(5)}"
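
To make the set-based measures concrete, here is a rough sketch of the textbook definitions of precision, recall and the F-measure in plain Ruby. It is not Evalir's implementation; the method names and the results array are made up for illustration, and the argument to f_measure is assumed to be the beta weight.

```ruby
# Textbook definitions, written independently of Evalir.
# Assumes results contains no duplicate document ids.

# Fraction of retrieved documents that are relevant.
def sketch_precision(relevant, results)
  return 0.0 if results.empty?
  (results & relevant).size.to_f / results.size
end

# Fraction of relevant documents that were retrieved.
def sketch_recall(relevant, results)
  return 0.0 if relevant.empty?
  (results & relevant).size.to_f / relevant.size
end

# Weighted harmonic mean of precision and recall; beta > 1 favours recall.
def sketch_f_measure(relevant, results, beta = 1.0)
  prec = sketch_precision(relevant, results)
  rec = sketch_recall(relevant, results)
  denominator = beta**2 * prec + rec
  return 0.0 if denominator.zero?
  (1 + beta**2) * prec * rec / denominator
end

# Precision computed over the top k results only.
def sketch_precision_at_rank(relevant, results, k)
  sketch_precision(relevant, results.first(k))
end

relevant = [123, 654, 29, 1029]
results = [123, 7, 29, 42, 654] # hypothetical ranked result list

puts "Precision: #{sketch_precision(relevant, results)}"
puts "Recall: #{sketch_recall(relevant, results)}"
puts "F-1: #{sketch_f_measure(relevant, results).round(3)}"
puts "Precision at rank 2: #{sketch_precision_at_rank(relevant, results, 2)}"
```

With these inputs three of the five results are relevant, and three of the four relevant documents are found.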

When you have several information needs and want to compute aggregate statistics, use an EvaliratorCollection like this:

e = Evalir::EvaliratorCollection.new
queries.each do |query|
  relevant = get_relevant_docids(query)
  results = get_results(query)
  e.add(relevant, results)
end

puts "MAP: #{e.mean_average_precision}"
puts "Precision-Recall Curve: #{e.precision_recall_curve}"
puts "Avg. NDCG @ 3: #{e.average_ndcg_at(3)}"
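
For intuition about the aggregate numbers, the following sketch computes average precision, MAP and a binary-relevance NDCG by hand. It uses common textbook formulations (NDCG with a log2(rank + 1) discount) and may differ in detail from what Evalir computes; all method names and the sample data are hypothetical.

```ruby
# Hand-rolled rank-based measures, independent of Evalir.
# Assumes results is ordered by rank and contains no duplicate ids.

# Mean of the precision values at each rank where a relevant document appears,
# divided by the total number of relevant documents.
def sketch_average_precision(relevant, results)
  return 0.0 if relevant.empty?
  hits = 0
  total = 0.0
  results.each_with_index do |doc, index|
    next unless relevant.include?(doc)
    hits += 1
    total += hits.to_f / (index + 1)
  end
  total / relevant.size
end

# Mean of the average precision over several [relevant, results] pairs.
def sketch_mean_average_precision(pairs)
  return 0.0 if pairs.empty?
  pairs.map { |(relevant, results)| sketch_average_precision(relevant, results) }.sum / pairs.size
end

# NDCG at rank k with binary gains and a log2(rank + 1) discount.
def sketch_ndcg_at(relevant, results, k)
  dcg = results.first(k).each_with_index.sum(0.0) { |(doc, index)|
    relevant.include?(doc) ? 1.0 / Math.log2(index + 2) : 0.0
  }
  ideal_hits = [k, relevant.size].min
  idcg = (0...ideal_hits).sum(0.0) { |index| 1.0 / Math.log2(index + 2) }
  idcg.zero? ? 0.0 : dcg / idcg
end

pairs = [
  [[123, 654, 29, 1029], [123, 7, 29, 42, 654]], # hypothetical results
  [[1, 2], [2, 1, 3]]                            # hypothetical second query
]

puts "MAP: #{sketch_mean_average_precision(pairs).round(4)}"
puts "NDCG @ 3 (first query): #{sketch_ndcg_at(*pairs.first, 3).round(4)}"
```

Unlike precision and recall, these measures reward placing relevant documents near the top of the ranking.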