Intelligently generate text with Markov Text Generator

Give this small program some text, and it will analyze which letters or words tend to appear together, then generate new text that’s statistically similar. You can configure it to look at pieces as small as individual letters, or at pieces as large as whole groups of words. Under the hood, it counts how often particular letters or words appear after each unique piece, then uses those probabilities to build a new chain of pieces. I was informed that these are called Markov chains; thus, Markov Text Generator.
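To make the counting step concrete, here is a rough Python sketch. It is not MTG's actual source; the function names `count_followers` and `probabilities` are made up for illustration, and it only shows the general idea of tallying what follows each piece.

```python
from collections import Counter, defaultdict

def count_followers(pieces):
    """Map each piece to a Counter of the pieces that directly follow it."""
    followers = defaultdict(Counter)
    for current, following in zip(pieces, pieces[1:]):
        followers[current][following] += 1
    return followers

def probabilities(counter):
    """Turn raw follower counts into probabilities that sum to 1."""
    total = sum(counter.values())
    return {piece: n / total for piece, n in counter.items()}

# Treat each letter of a word as one piece (hypothetical sample input).
table = count_followers(list("abracadabra"))

# 'a' is followed by 'b' twice, and by 'c' and 'd' once each.
print(probabilities(table["a"]))
```

Generating new text then amounts to repeatedly picking a next piece at random, weighted by these probabilities.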

I originally wrote it because I wanted a program that could look at a few words in a language I was starting to make up and then create some more words that were similar, helping me get a better sense of the shape and sound of the language. But I soon discovered that I could use the same algorithm to generate all kinds of interesting text.

Unfortunately, this software only runs on Windows and has no nice user interface – it’s just a command line tool.

If you’d like to try it out, though, here’s how. First off, download MarkovTextGenerator-1.2.zip from GitHub. (You can also look at the source code there.) Save the zip file somewhere, unzip it, and then open a command prompt and navigate to that folder. If you can figure out how to do that, you should be able to use MTG without much trouble.

Okay, so here we are in the folder where MTG is. Now you need to create some input text for MTG to analyze. How about first we try giving it a bunch of Alethi names from Brandon Sanderson’s Stormlight Archive, and see if it can generate some similarly Alethi-sounding names? I’m going to open up Notepad and put this text in a new file:

Kaladin Shallan Adolin Dalinar Kholin Renarin Gavilar Torol Sadeas Meridas Amaram Lirin Hesina Navani Jasnah Elhokar Laral Ialai Lin Balat Wikim Helaran Merin Aesudan Aladar Roion Coreb Avarak Matal Hashal Bashin Bethab Hatham Havar Jakamav Teleb Shulin Wistiow Tien Lamaril Natam Rillir Roshone Sebarial Ruthar Salinor Teshav Thanadal Vamah Yenev

I’ll save that file as “Alethi.txt” in the same folder that MTG is in, and then head back to my command prompt.

Now let’s try running MTG on this file with some basic configuration. In the command prompt, I’ll run MarkovTextGenerator.exe -i Alethi.txt -o 15. This means its input text is in “Alethi.txt”, and it should output 15 new words. Let’s see what we get…

Hmm, we’ve got some pretty odd words there. They look vaguely Alethi, but some of them are way off. How can we fix that? Well, right now by default, MTG is looking just at what comes after individual letters. It doesn’t see each letter in context. But if we told it to look at groups of letters instead, it could be smarter about which letters appear in which contexts…
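Here is a rough Python sketch of what looking at groups of letters could mean. This is an assumption about the general technique, not MTG's real code: the `^` and `$` boundary markers, the lowercasing, and the names `build_model` and `generate_word` are all invented for this example. The `group_size` parameter is how many preceding letters the model looks at when choosing the next one.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical word-boundary markers, not MTG's

def build_model(words, group_size):
    """Count which letter follows each group of `group_size` letters."""
    model = defaultdict(Counter)
    for word in words:
        padded = START * group_size + word.lower() + END
        for i in range(len(padded) - group_size):
            context = padded[i:i + group_size]
            model[context][padded[i + group_size]] += 1
    return model

def generate_word(model, group_size, rng):
    """Walk the chain from the start marker until the end marker is drawn."""
    context = START * group_size
    letters = []
    while True:
        counts = model[context]
        nxt = rng.choices(list(counts), weights=list(counts.values()))[0]
        if nxt == END:
            return "".join(letters).capitalize()
        letters.append(nxt)
        # Slide the window: drop the oldest letter, keep the newest.
        context = (context + nxt)[-group_size:]

names = ["Kaladin", "Shallan", "Adolin"]
model = build_model(names, 2)
print(generate_word(model, 2, random.Random()))
```

With a group size of 1 the context is a single letter, which matches the default behavior described above; larger values give each choice more context.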

So now let’s try running MarkovTextGenerator.exe -i Alethi.txt -o 15 -g 2. This tells MTG to use a “group size” of 2. Instead of seeing what tends to come after each unique letter, it will see what tends to come after each unique group of 2 letters.

We’ve still got some weird names in there, but we’re closer to actually sounding Alethi. Maybe try a group size of 3 with MarkovTextGenerator.exe -i Alethi.txt -o 15 -g 3?

Now we’re getting names so similar to the original ones that it’s not quite as useful anymore. See, group size is a sliding scale. At one end, with small group sizes, you get output that’s very different from the original text. As group size increases, the output becomes more and more similar to the original text.
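One way to see why this happens: as the group size grows, more and more groups have only one thing that ever follows them, so the generator has no real choice and ends up copying the input. The sketch below measures that share for a few names from the list above. The function `forced_fraction` and the `^`/`$` boundary markers are made up for illustration and are not part of MTG.

```python
from collections import defaultdict

def forced_fraction(words, group_size):
    """Share of letter groups that are always followed by the same thing."""
    followers = defaultdict(set)
    for word in words:
        # '^' and '$' are hypothetical start/end markers for each word.
        padded = "^" * group_size + word.lower() + "$"
        for i in range(len(padded) - group_size):
            context = padded[i:i + group_size]
            followers[context].add(padded[i + group_size])
    forced = sum(1 for nxt in followers.values() if len(nxt) == 1)
    return forced / len(followers)

names = "Kaladin Shallan Adolin Dalinar Kholin Renarin Gavilar".split()
for group_size in (1, 2, 3):
    print(group_size, round(forced_fraction(names, group_size), 2))
```

On a small input like this you would typically expect the printed share to climb toward 1 as the group size increases, which is the "too similar to the original" end of the scale.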

So far we’ve just been using MTG to generate words. But how well can it do generating whole phrases and sentences? This time, I’m going to try giving it the entire text of Charles Dickens’ Great Expectations, snagged from Project Gutenberg. I took out the Project Gutenberg intro and licensing stuff at the beginning and end, then saved the text of the book as “Great_Expectations.txt”. So now let’s try running it through MTG, starting out with a large group size so that it will consider whole words and their usage instead of just letters. Something like MarkovTextGenerator.exe -i Great_Expectations.txt -o 30 -g 5.

Your result might be something like this. Pretty nonsensical, though it does mostly produce real English words. With larger group sizes, you might improve the output a bit…but there’s also another option. So far, MTG has been splitting up the text for analysis based on letters. But we can also tell it to split up the text by words instead using the -w flag, like by running MarkovTextGenerator.exe -i Great_Expectations.txt -o 30 -w.

This will probably take a while – it’s a pretty long book, after all – but once it’s finished you should get something along these lines. Looking a bit better! But to improve it even more, we can set group size here, too. Instead of grouping letters like it did before, this will group words, so MTG will see what words tend to appear after each unique group of, say, 2 words. So let’s try running MarkovTextGenerator.exe -i Great_Expectations.txt -o 30 -w -g 2.

Nice! MTG is never going to generate text perfectly, since it’s totally unaware of rules of grammar and such. But the more input you give it to learn from, and the more you tweak the group size, the better output you should be able to produce. You can at least get some pretty entertaining nonsense.
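For the curious, grouping by words can be sketched in a few lines of Python. Again, this is only an illustration of the technique, not how MTG is actually written: the names `build_word_model` and `generate_words`, splitting on whitespace, and the restart-on-dead-end behavior are all assumptions of this example.

```python
import random
from collections import defaultdict

def build_word_model(text, group_size):
    """Map each group of `group_size` words to the words seen right after it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - group_size):
        group = tuple(words[i:i + group_size])
        # Repeats are kept, so common followers get picked more often.
        model[group].append(words[i + group_size])
    return model

def generate_words(model, count, rng):
    """Produce `count` words by following the chain from a random group."""
    context = rng.choice(list(model))
    output = list(context)
    while len(output) < count:
        choices = model.get(context)
        if not choices:
            # Dead end (group only seen at the very end): start over elsewhere.
            context = rng.choice(list(model))
            output.extend(context)
            continue
        nxt = rng.choice(choices)
        output.append(nxt)
        context = context[1:] + (nxt,)
    return " ".join(output[:count])

sample = "the cat sat on the mat and the cat ran"
model = build_word_model(sample, 2)
print(generate_words(model, 8, random.Random()))
```

Feeding it a whole book instead of the toy `sample` string gives the chain far more groups to choose from, which is why more input tends to produce better nonsense.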

If you enjoyed MTG, comment here or contact me and let me know!
