As far as I recall from my Speech Recognition classes, the difference
between a `plain' Markov model and a hidden Markov model is as
follows:
Both consist of nodes and edges, and while traversing the graph, a
symbol is emitted at each change of state. Thus, if you know the
structure of a *plain* Markov model, you can follow the path that led
to the sequence of observed symbols.
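To make that concrete, here is a minimal sketch (state names, symbols and probabilities are all made up for illustration) of a plain Markov model where each state emits exactly one fixed symbol, so the observation sequence determines the state path:

```python
import random

# Toy *plain* Markov model: transition probabilities between states,
# and exactly one symbol emitted per state.
transitions = {
    "Rain": {"Rain": 0.7, "Sun": 0.3},
    "Sun":  {"Rain": 0.4, "Sun": 0.6},
}
emission = {"Rain": "R", "Sun": "S"}  # one fixed symbol per state

def walk(start, steps):
    """Random walk through the model, emitting one symbol per state visited."""
    state, symbols = start, [emission[start]]
    for _ in range(steps):
        states, probs = zip(*transitions[state].items())
        state = random.choices(states, weights=probs)[0]
        symbols.append(emission[state])
    return symbols

# Because the state -> symbol map is one-to-one, the observed symbols
# reveal the state path exactly:
inverse = {sym: st for st, sym in emission.items()}
observed = walk("Sun", 5)
path = [inverse[s] for s in observed]
```

The point is the last two lines: inverting the emission map recovers the full state path from the observations alone.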
A hidden Markov model is basically the same, but instead of always
emitting the same symbol at a given state, there is now a choice of
symbols, each with a certain probability of being selected. This
additional dimension of complexity may make it impossible to trace the
path, since the clear connection between a state and an observed symbol
no longer holds (that's why it's called `hidden').
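The same toy setup, sketched as an HMM (again with made-up states, symbols and numbers): every state can emit every symbol, so one observation sequence is compatible with many hidden paths, each with a different probability:

```python
from itertools import product

# Toy HMM: each state can emit either symbol, so an observation
# no longer pins down the state that produced it.
trans = {"Rain": {"Rain": 0.7, "Sun": 0.3},
         "Sun":  {"Rain": 0.4, "Sun": 0.6}}
emit  = {"Rain": {"walk": 0.1, "umbrella": 0.9},
         "Sun":  {"walk": 0.8, "umbrella": 0.2}}
start = {"Rain": 0.5, "Sun": 0.5}

def path_prob(states, observations):
    """Joint probability of one hidden state path and the observed symbols."""
    p = start[states[0]] * emit[states[0]][observations[0]]
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        p *= trans[prev][cur] * emit[cur][obs]
    return p

obs = ["umbrella", "walk", "umbrella"]
# Every 3-state path has nonzero probability of producing these
# observations -- the true path is "hidden"; we can only rank them:
scores = {path: path_prob(path, obs)
          for path in product(trans, repeat=len(obs))}
best = max(scores, key=scores.get)
```

Brute-forcing all paths like this is only feasible for toy sizes; in practice the most likely path is found with the Viterbi algorithm, which does the same maximisation efficiently.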
More complexity (usually) implies more power, and hidden Markov models
have become quite popular in Speech Recognition over the past decade.
If this short explanation is too confusing, there is a very good text
on the matter: L. Rabiner, "A Tutorial on Hidden Markov Models and
Selected Applications in Speech Recognition", Proceedings of the IEEE,
vol. 77, no. 2, February 1989.
> If so, what makes it more *hidden* than Church's and DeRose's taggers?
> Is it the fact that it is trained on un-tagged data?
I think the kind of data used for estimating the probabilities of state
transitions and symbol emissions has nothing to do with the structure
of the model itself.
> I have tried to trace this questions down in the literature but
> couldn't find a definite answer.
There is a lot of literature in Speech Recognition / Signal Processing,
especially in the IEEE journals. Unfortunately I don't have any
references within reach.
Hope this helps,
Oliver Jakobs
--
Oliver Jakobs       -----+--------------------------+ Wisdom is one of the few
Research Associate       \ email O.Jakobs@bham.ac.uk \ things that looks bigger
School of English        \ phone +44-(0)121-414-6206 \ the further away it is.
Birmingham University    \ fax   +44-(0)121-414-3288 +------- Terry Pratchett