Corpora: IR language models

From: Djoerd Hiemstra (hiemstra@cs.utwente.nl)
Date: Wed Jun 21 2000 - 15:52:48 MET DST

    A new technical report on the use of language models for information
    retrieval can now be downloaded from:
      http://www.ctit.utwente.nl/publications/Tr2000/

    Comments are welcome.

    D. Hiemstra and A.P. de Vries
    "Relating the new language models of information retrieval to
     the traditional retrieval models", CTIT Technical Report
     TR-CTIT-00-09, May 2000
     
    ABSTRACT
    During the last two years, exciting new approaches to information
    retrieval were introduced by a number of different research groups that
    use statistical language models for retrieval. This paper relates the
    retrieval algorithms suggested by these approaches to widely accepted
    retrieval algorithms developed within three traditional models of
    information retrieval: the Boolean model, the vector space model and
    the probabilistic model. The paper shows the existence of efficient
    retrieval algorithms that only use the matching terms in their
    computation. Under these conditions, the language models of information
    retrieval are surprisingly similar to both tf.idf term weighting as
    developed for the vector space model and relevance weighting as
    developed in the traditional probabilistic model. The paper suggests a
    new method for relevance weighting and a new method to rank documents
    given Boolean queries. Experimental results on the TREC collection
    indicate that the language modelling approach outperforms the three
    traditional approaches.
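
    The abstract does not spell out the scoring formula, so the following is
    only a minimal sketch of the "matching terms only" idea. It assumes a
    unigram query-likelihood model with linear interpolation smoothing. The
    function names, the toy documents and the weight lam=0.15 are
    illustrative assumptions, not taken from the report. Dividing the
    smoothed likelihood by a factor that is constant for a given query
    leaves a sum over only the query terms that occur in the document, and
    each term's weight grows with its in-document frequency and shrinks with
    its collection frequency, much like tf.idf.

```python
import math
from collections import Counter


def full_log_likelihood(query, doc, collection_tf, collection_len, lam=0.15):
    """log P(query | doc) with linear interpolation smoothing.

    Touches every query term, whether or not it occurs in the document.
    Assumes every query term occurs at least once in the collection.
    """
    tf = Counter(doc)
    total = 0.0
    for term in query:
        p_doc = tf[term] / len(doc)
        p_col = collection_tf[term] / collection_len
        total += math.log((1.0 - lam) * p_col + lam * p_doc)
    return total


def matching_terms_score(query, doc, collection_tf, collection_len, lam=0.15):
    """Rank-equivalent score that sums over matching terms only.

    Equals full_log_likelihood minus sum(log((1 - lam) * p_col)) over the
    query terms, a quantity that does not depend on the document.
    """
    tf = Counter(doc)
    total = 0.0
    for term in query:
        if tf[term] == 0:
            continue  # a non-matching term contributes log(1) = 0
        p_doc = tf[term] / len(doc)
        p_col = collection_tf[term] / collection_len
        total += math.log(1.0 + (lam * p_doc) / ((1.0 - lam) * p_col))
    return total


# Toy collection (hypothetical data, for illustration only).
docs = [["a", "b", "a"], ["b", "c"], ["c", "c", "d"]]
collection_tf = Counter(t for d in docs for t in d)
collection_len = sum(len(d) for d in docs)
query = ["a", "c"]

# The two scores differ by the same constant for every document,
# so they produce the same ranking.
offsets = [
    full_log_likelihood(query, d, collection_tf, collection_len)
    - matching_terms_score(query, d, collection_tf, collection_len)
    for d in docs
]
```

    Because the offset is identical for every document, an inverted index
    that only visits postings of the query terms is enough to compute the
    ranking under these assumptions.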


