Corpora: IR language models

From: Djoerd Hiemstra (hiemstra@cs.utwente.nl)
Date: Wed Jun 21 2000 - 15:52:48 MET DST


A new technical report on the use of language models for information
retrieval can now be downloaded from:
http://www.ctit.utwente.nl/publications/Tr2000/

Comments are welcome.

D. Hiemstra and A.P. de Vries
"Relating the new language models of information retrieval to
the traditional retrieval models", CTIT Technical Report
TR-CTIT-00-09, May 2000

ABSTRACT
During the last two years, exciting new approaches to information
retrieval were introduced by a number of different research groups that
use statistical language models for retrieval. This paper relates the
retrieval algorithms suggested by these approaches to widely accepted
retrieval algorithms developed within three traditional models of
information retrieval: the Boolean model, the vector space model and
the probabilistic model. The paper shows the existence of efficient
retrieval algorithms that only use the matching terms in their
computation. Under these conditions, the language models of information
retrieval are surprisingly similar to both tf.idf term weighting as
developed for the vector space model and relevance weighting as
developed in the traditional probabilistic model. The paper suggests a
new method for relevance weighting and a new method to rank documents
given Boolean queries. Experimental results on the TREC collection
indicate that the language modelling approach outperforms the three
traditional approaches.
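
As a rough illustration of why a language-model score can be computed
from the matching terms alone, here is a minimal Python sketch. It
assumes a unigram query-likelihood model smoothed by linear
interpolation with the collection model, which may differ from the
exact formulation in the report. The function name rank, the parameter
lam and its default value are illustrative choices only. Dividing the
smoothed probability of each query term by its document-independent
collection part leaves a factor of 1 for every term that does not occur
in the document, so those terms add log(1) = 0 to the score.

```python
import math
from collections import Counter


def rank(query, docs, lam=0.15):
    """Rank tokenised docs for a tokenised query.

    Assumed model (not taken from the report): unigram query likelihood
    with linear interpolation smoothing, rewritten so that only terms
    occurring in both query and document contribute to the score.
    Returns (score, doc_index) pairs, best first.
    """
    doc_tfs = [Counter(doc) for doc in docs]

    # Collection term frequencies, used as the background model.
    coll_tf = Counter()
    for tf in doc_tfs:
        coll_tf.update(tf)
    coll_len = sum(coll_tf.values())

    scores = []
    for index, tf in enumerate(doc_tfs):
        doc_len = sum(tf.values())
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # non-matching term: contributes log(1) = 0
            p_doc = tf[term] / doc_len
            p_coll = coll_tf[term] / coll_len
            score += math.log(1 + (lam * p_doc) / ((1 - lam) * p_coll))
        scores.append((score, index))

    # Highest score first; ties broken by document order.
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))
```

Calling rank(["language", "retrieval"], docs) on a list of tokenised
documents returns the documents ordered by score. The ratio inside the
logarithm grows with the term's frequency in the document and shrinks
with its frequency in the collection, which is the sense in which such
a score resembles tf.idf weighting.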

