23 Sep

Markov Chains

So I have been doing a fair amount of work on Markov chains. They are really cool for modelling random processes. Earlier this year I wrote a program that would take a book (it was “Starship Titanic” by Douglas Adams) and convert it into a Markov chain stored as JSON. All it did was say: if the current word is this, the next word will probably be that. The really interesting part is that if you fed it enough text and then ignored/hashed the words, you would be left with a data structure that is a fingerprint of the author/language/time period, which could then be matched against other bodies of work/languages/time periods. Think about it . . . you would be able to identify a text’s author, the date it was written, or its language purely from the structure and not necessarily the words.
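A minimal sketch of that kind of program might look like the following (the filename is a stand-in for whatever text you feed it, and I am storing raw counts rather than normalised probabilities, which is enough for comparing structures):

```python
import json
import re
from collections import defaultdict

def build_chain(text):
    """First-order chain: current word -> {next word: count}."""
    words = re.findall(r"[\w']+", text.lower())
    chain = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(words, words[1:]):
        chain[current][nxt] += 1
    return chain

# "starship_titanic.txt" is a placeholder for any body of text.
with open("starship_titanic.txt") as f:
    chain = build_chain(f.read())

# JSON suits this structure because it is sparse: only word pairs
# that actually occur in the text take up any space.
with open("chain.json", "w") as f:
    json.dump(chain, f)
```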

I quickly realized that there are caveats and limitations to standard Markov chains.

Firstly, as languages go, the next word that appears may be very different based on context. I managed to get readable text between 5 and 10% of the time with what I will now call a first-order Markov chain. What happens if you increase the order? Readability goes up to 30 to 50%. The trick is that the map becomes an order of magnitude more complicated, because now your chain requires the current word AND the previous word to know the probability of the next word. This extends to 3rd, 4th and higher orders as well. With the addition of the next part, this could turn your Markov chain into something like a neural net.
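To make the second-order idea concrete, here is a rough sketch (the names are mine, and keeping duplicate entries in a list is just a lazy way of sampling by observed frequency):

```python
import random
from collections import defaultdict

def build_second_order(words):
    """Second-order chain: (previous word, current word) -> list of next words."""
    chain = defaultdict(list)
    for prev, cur, nxt in zip(words, words[1:], words[2:]):
        chain[(prev, cur)].append(nxt)  # duplicates act as frequency weights
    return chain

def generate(chain, seed, length=50):
    """Walk the chain from a two-word seed, sampling by observed frequency."""
    prev, cur = seed
    out = [prev, cur]
    for _ in range(length):
        nexts = chain.get((prev, cur))
        if not nexts:
            break  # dead end: this word pair never continued in the source text
        nxt = random.choice(nexts)
        out.append(nxt)
        prev, cur = cur, nxt
    return " ".join(out)
```

One wrinkle: tuple keys don’t serialise to JSON directly, so to keep the JSON storage you would flatten them to something like "prev cur".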

Secondly, if you are using Markov chains as neural nets, you need to vary your probabilities based on either time decay or some other feedback function. This allows the chains to learn positive-feedback behaviours and start to ignore negative feedback. There is a catch to this as well: your network has to treat 0-probability links as still being linked, whereas standard Markov chains allow 0-probability links to be ignored (effectively not there). This increases the storage space required, as it becomes an n × n problem: n nodes have n × n connections (remember that a Markov chain has direction, and a node can loop to itself). This scenario does not lend itself to JSON storage, which is essentially an efficient sparse matrix, but rather to a dense matrix storage method. A second-order version would require (n × n) × n connections, so this does not lend itself to higher-order chains; each order adds a dimension. 1st order would be 2 dimensions, 2nd order 3, 3rd order 4, etc.
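I haven’t built this yet, but the shape of it would be something like the sketch below: a dense matrix standing in for the fully connected first-order case, with a reward-plus-decay update as the feedback function. The decay and reward numbers are arbitrary, and the small floor value is what keeps 0-probability links “still linked”:

```python
import numpy as np

n = 1000  # vocabulary size: every node is (weakly) connected to every other node

# Dense n x n weight matrix: row = current state, column = next state.
# Unlike the sparse JSON chain, zero-probability links still occupy storage.
weights = np.full((n, n), 0.01)

rng = np.random.default_rng()

def step(state):
    """Pick the next state with probability proportional to outgoing weights."""
    p = weights[state] / weights[state].sum()
    return rng.choice(n, p=p)

def feedback(state, nxt, reward, decay=0.999):
    """Reinforce (or punish) one transition and let everything else fade."""
    weights[state] *= decay           # time decay: unused links slowly weaken
    weights[state, nxt] += reward     # positive reward strengthens the link
    np.clip(weights[state], 1e-6, None, out=weights[state])  # never fully zero
```

The storage problem shows up immediately: at 8 bytes per weight, n = 1000 is 8 MB for first order but 8 GB for second order, which is “each order adds a dimension” in practice.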

I have not yet done any work on the second case as I have no use-case at this time, but I’m sure I’ll get around to running a few tests in the next few months.