I doubt anyone knows the ``historiography'' for certain, but I
suspect that HMM taggers go back to at least the early 1980s. Here is
a passage from my 1988 paper:

Statistical ngram models were quite popular in the 1950s, and have been
regaining popularity over the past few years. The IBM speech group is
perhaps the strongest advocate of ngram methods, especially in other
applications such as speech recognition. Robert Mercer (private
communication, 1982) has experimented with the tagging application,
using a restricted corpus (laser patents) and small vocabulary (1000
words). Another group of researchers working in Lancaster around the
same time, Leech, Garside and Atwell, also found ngram models highly
effective; they report 96.7% success in automatically tagging the LOB
Corpus, using a bigram model modified with heuristics to cope with more
important trigrams. The present work developed independently from the
LOB project.