The Markov chain is an extremely useful tool in machine learning. While I was writing the final paper for my linguistics class, a friend mentioned generating a paper with a Markov model trained on my own writing. The generated paper would resemble the author's style, but it would almost certainly be gibberish and need heavy revision (you obviously cannot send THAT to your professor)! I had heard of this idea long ago, and this time I decided to make it work.
The basic idea of a Markov chain is really simple: remember a sequence of states and use it to predict the next one. Say a sunny day follows a rainy day 80% of the time, and a rainy day follows a rainy day 20% of the time. This is the simplest, first-order Markov chain, since it remembers only one previous state. So if it rains today, this chain predicts an 80% chance of sun tomorrow and a 20% chance of rain. If that kind of prediction sounds sketchy, how about this: the chance of rain after sunny, rainy, rainy, sunny, rainy, sunny, sunny is 19%. That is a 7th-order Markov chain, which keeps a whole week's weather on record to predict the next day. It does better.
If we apply this to text, the Markov chain remembers words. At the beginning the program randomly chooses a key chain (the sequence of states used for prediction); then it starts a random walk between states. It chooses the next element according to the probabilities recorded in the data, updates the state, and takes the next step. If the updated state does not exist among the key chains, it randomly picks a key chain again. For example, suppose (A B) : C with p = 0.25, (A B) : D with p = 0.75, (B C) : D, and (A C) : B. The program first starts with (A B) with probability 1/3. It chooses element D with p = 0.75, and the current state becomes (B D). Since (B D) does not exist among the four keys above, it randomly restarts, say at (A C) (again with probability 1/3), and starts walking again… Now let's apply this idea to text generation.
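To make the toy example concrete, here is a minimal Java sketch of that random walk. `ToyWalk` is my own illustrative class, not part of the program later in this post; the counts (C once, D three times from state (A B)) encode the p = 0.25 / p = 0.75 split from the example.

```java
import java.util.*;

public class ToyWalk {
    // Transition counts for the toy example: from state (A B), C was seen once
    // and D three times, so P(C | A B) = 0.25 and P(D | A B) = 0.75.
    static final Map<String, Map<String, Integer>> TABLE = Map.of(
            "A B", Map.of("C", 1, "D", 3),
            "B C", Map.of("D", 1),
            "A C", Map.of("B", 1));

    static String walk(long seed, int steps) {
        Random rng = new Random(seed);
        List<String> keys = new ArrayList<>(TABLE.keySet());
        String state = keys.get(rng.nextInt(keys.size())); // random key chain, p = 1/3 each
        StringBuilder out = new StringBuilder(state);
        for (int i = 0; i < steps; i++) {
            Map<String, Integer> choices = TABLE.get(state);
            if (choices == null) { // state is not a key chain: restart at random
                state = keys.get(rng.nextInt(keys.size()));
                continue;
            }
            // weighted pick proportional to the recorded counts
            int r = rng.nextInt(choices.values().stream().mapToInt(Integer::intValue).sum());
            String next = null;
            for (Map.Entry<String, Integer> e : choices.entrySet()) {
                r -= e.getValue();
                if (r < 0) { next = e.getKey(); break; }
            }
            // shift the state: drop the oldest element, append the chosen one,
            // e.g. (A B) plus D becomes (B D)
            state = state.split(" ")[1] + " " + next;
            out.append(' ').append(next);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(walk(419, 8));
    }
}
```

Every emitted symbol is one of A, B, C, D, and the walk restarts whenever it wanders off the three recorded keys.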
First I tried building the chains character by character. That is, each single character is a state, including spaces, punctuation, and other non-word characters. As you would expect, this does not lead to good, reasonable results; weird characters jump into the middle of words and disrupt everything. The results improve once the order of the chain exceeds 4, as the key chains approach the average length of English words. It is still quite unstable, though, and misspelled words appear everywhere. I can't use too high an order, either: a high-order Markov chain over text reproduces near-verbatim excerpts of the original, which defeats the purpose of "text generation".
Then I parsed the input text and applied the same mechanism to independent words. Each word is now an element, and the program takes a random walk between words. After a few tries, I found the best order to be 2: order 1 produces complete nonsense, while order 3 contains too many excerpts lifted straight from the input. I used my college application essays as input, and hopefully you can recognize the following text as my writing…
as your upbringing, community, and/or activities) impacted who you are, your future goals, and your place within it.
It’s a warm autumn afternoon. Sit next to them, I stood in the air. The still life in the field and achieve my pursuit, contribute my tiny part to this crispy water melody, trying to use statistics to the finish line.
11th place, in a vat?
In approximately one paragraph, please address the following questions in a wide scope, attracts me to doubt the world through small holes. Perhaps agnostic, I think this is what I saw: my body again! The shop assistant was angry, and I hope to generate a solution to solve puzzles in my city. By mapping the collected data into MATLAB and Excel sheets, I found an apparent correlation between science and philosophy enlightens me, attracts me to stop peddling in the sky made me develop a special disposition: my interests as well.
I live in the soothing aroma. Blanc. Nothing worried me. Smiling, I could still feel myself, but also honed my patience to keep good-shaped, easy and steadfast. I do things with the registration forms, working book and a card of instruction next to me in an industrial environment, she has an overlap with statistics. I even tried to relieve recent tension thoroughly. When I can’t wait for high school, my schools underscore “full-size” development, and shape me in this application, some interest or experience of yours that you will contribute to Rice community in a variety of places.
I’ve been living in Ningbo for 18 years since I couldn’t change my situation, I change myself and fight for a cup of coffee at our university coffee house, Campus Grounds. Whom would you choose to face my life. While steering through a hazy night, this epiphany evaded as the one in the eastern seaside part, exists this kind of learning, community, and contributing with all I have.
CMU is a magic, where professor may dress up as Darth Vader on final exam, where homework sometimes appears in both English and Klingon, where Quidditch club and homework parties exist. But most importantly, knowledge and jobs. If they reach certain living standard, then they can live on themselves, and our goal is achieved. Donation will be active in fundraising and project promotion, and will make no harm.
Now my biggest pursuit in next sentence! Statistics is really a fascinating thing.
After these efforts, the day I returned home, I decided to donate some money of my high school felt like, say, I can never rediscover the feelings of adolescence: the previous circle, only to realize that I was a note of Yahoo! Daily Pictures. As a habit, I click the link button everyday to view the pictures and information. But this epiphany hit me. Calligraphy was my compass to navigate through the shaking floor, while chairs and desks were deliberately blocking his way, violently rolling in the field and achieve my pursuit, contribute my tiny part to this prevalent value; I do the same time, I have never dabbled in music except singing Karaoke; but as I grew up, the world you come from — for example, your family, community or school — and tell us how your past circumstances and experiences (such as your upbringing, community, and/or activities) impacted who you are, your future goals, and your place within it.
It’s a warm autumn afternoon. Sit next to me on this aspect; the GATEWAY program also enables me to discover myself, but my hands just went through him. At that moment I opened my eyes, and a card of instruction next to them, I stood in the room became their targets. I turned aside, muffled myself within the quilt, shirking away from my toes when someone on the ground and a card of instruction next to them, I stood in the Icelandic springs, basking in hometown sunshine. Impurity, wickedness, and tension oozed out from my old life and those of others. A common life style, ordinary, I thought.
It was probably a fortuitous encounter, or maybe was fated. This circular mode ended when I need them.
7. Read novels
A way to relax at any second. But I have to try to compete. There was no way to relax at any second, but I still hold that dream in the boundary of absolute rightness. This is USC, the pith of my friends, or, myself? Maybe it was quite different from what I felt—I couldn’t tell which one was real me. I leaned on father’s shoulders. “Read all these ads and slogans,” he whispered to my observation: his facial muscle didn’t twitched even when that man over a hundred percent certainty: yes, these were what I was shocked by the overwhelming beauty of greenbelt. But that didn’t shake my decision. I drew closer to the buzz of electricity, defying the Law of Universal Gravitation itself, a formula that realizes the dream of being an engineer: relive my childhood hopes. To anyone, childhood memories are eternal treasures in life.
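A note before the code: the program below builds its table over fixed-length character chunks. The word-level variant described above only needs a tokenization step first; a minimal sketch of that step (my own hypothetical helper, not part of the original program) could look like this:

```java
import java.util.*;

public class WordTokenizer {
    // Split raw text into word tokens, dropping punctuation and empty strings.
    public static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String tok : text.split("\\W+")) {
            if (!tok.isEmpty()) words.add(tok);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("It's a warm autumn afternoon."));
    }
}
```

With the text tokenized, the same table-building and random-walk machinery applies, with a key chain being a tuple of words instead of a substring of characters.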
And here’s the Java code to do this:
Class MarkovGenerator:
import java.util.*;

public class MarkovGenerator {
    private int _outputLen;
    private int _order;
    private Random chooser = new Random(419);
    private HashMap<String, HashMap<String, Integer>> _table;

    public MarkovGenerator(int len, int order,
            HashMap<String, HashMap<String, Integer>> table) {
        _outputLen = len;
        _order = order;
        _table = table;
    }

    public String generateDoc() {
        String start = randomPick(_table.keySet());
        StringBuilder constructed = new StringBuilder(start);
        int k = 0;
        // each step appends a chunk of _order characters, so the output
        // reaches roughly _outputLen characters in total
        while (k < _outputLen / _order) {
            String next = weightedPick(_table.get(start));
            if (next != null) {
                start = next;
                constructed.append(next);
                k++;
            } else {
                // dead end: restart the walk from a random key chain
                start = randomPick(_table.keySet());
            }
        }
        return constructed.toString();
    }

    private String randomPick(Set<String> T) {
        int item = chooser.nextInt(T.size());
        int i = 0;
        for (String k : T) {
            if (i == item)
                return k;
            i++;
        }
        return null;
    }

    private String weightedPick(HashMap<String, Integer> map) {
        // key chains near the end of the corpus may have no recorded suffix
        if (map == null || map.isEmpty())
            return null;
        int totalCount = 0;
        for (String c : map.keySet()) {
            totalCount += map.get(c);
        }
        // draw a number in [0, totalCount) and walk down the counts
        int random = chooser.nextInt(totalCount);
        for (String c : map.keySet()) {
            random -= map.get(c);
            if (random < 0) {
                return c;
            }
        }
        return null;
    }
}
Class MarkovTable:
public class MarkovTable {
    private String _learningText = "";
    private int _order = 0;
    private HashMap<String, HashMap<String, Integer>> table =
            new HashMap<String, HashMap<String, Integer>>();

    public MarkovTable(int or, String feed) {
        _order = or;
        _learningText = feed;
    }

    public HashMap<String, HashMap<String, Integer>> scan() {
        // first pass: create the symbol table of all order-length chunks
        for (int i = 0; i < _learningText.length() - _order; i++) {
            String s = _learningText.substring(i, i + _order);
            if (!table.containsKey(s)) {
                table.put(s, new HashMap<String, Integer>());
            }
        }
        // second pass: count how often each chunk is followed by each suffix chunk
        for (int i = 0; i < _learningText.length() - 2 * _order; i++) {
            String chain = _learningText.substring(i, i + _order);
            String suffix = _learningText.substring(i + _order, i + 2 * _order);
            if (!table.get(chain).containsKey(suffix)) {
                table.get(chain).put(suffix, 1);
            } else {
                table.get(chain).put(suffix, table.get(chain).get(suffix) + 1);
            }
        }
        return table;
    }
}
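To see what the two-pass scan computes, here is a self-contained restatement of the same counting on a toy corpus. `TableDemo` is my own illustrative class; it mirrors the logic of scan() rather than calling it.

```java
import java.util.HashMap;

public class TableDemo {
    // the same two-pass counting that MarkovTable.scan() performs
    static HashMap<String, HashMap<String, Integer>> scan(String text, int order) {
        HashMap<String, HashMap<String, Integer>> table = new HashMap<>();
        // first pass: register every order-length chunk as a key
        for (int i = 0; i < text.length() - order; i++) {
            table.putIfAbsent(text.substring(i, i + order), new HashMap<>());
        }
        // second pass: count chunk -> following-chunk occurrences
        for (int i = 0; i < text.length() - 2 * order; i++) {
            String chain = text.substring(i, i + order);
            String suffix = text.substring(i + order, i + 2 * order);
            table.get(chain).merge(suffix, 1, Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        // order 1 over "aabab": counts "aa" once, "ab" once, "ba" once;
        // note the final "ab" pair falls outside the second loop's bound
        System.out.println(scan("aabab", 1));
    }
}
```

On "aabab" with order 1, the table records that "a" is followed by "a" once and "b" once, and "b" by "a" once; the walk in generateDoc() then samples suffixes in proportion to these counts.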
Class DocumentReader:
import java.io.*;

public class DocumentReader {
    private BufferedReader fileReader;
    private StringBuilder buildCorpusString = new StringBuilder();

    public DocumentReader(String fname) {
        try {
            fileReader = new BufferedReader(new FileReader(fname));
        } catch (FileNotFoundException fn) {
            System.out.println("Error: File not found!");
            fn.printStackTrace();
        }
        String currentLine;
        int count = 0;
        try {
            // read the corpus line by line; append a space so words on
            // adjacent lines don't fuse together
            while ((currentLine = fileReader.readLine()) != null) {
                buildCorpusString.append(currentLine).append(' ');
                if (count % 2000 == 0) System.out.format("."); // a simple progress bar
                count++;
            }
            fileReader.close();
        } catch (IOException e) {
            System.out.println("Problem parsing corpus.");
            e.printStackTrace();
        }
    }

    public String getString() {
        return buildCorpusString.toString();
    }
}
Class DocGenerator:
import java.io.*;

public class DocGenerator {
    private String genText;

    public DocGenerator(String filename) {
        DocumentReader doc = new DocumentReader(filename);
        String txt = doc.getString();
        BufferedReader myConsole = new BufferedReader(new InputStreamReader(System.in));

        Integer order = null;
        System.out.format("What order of Markov Chain do you want to use? ");
        // order 0 would divide by zero in generateDoc(), so require at least 1
        while (order == null || order <= 0 || order >= txt.length()) {
            try {
                order = Integer.parseInt(myConsole.readLine());
                if (order <= 0 || order >= txt.length())
                    System.out.println("Illegal order!");
            } catch (Exception e) {
                System.out.println("\nBad input!");
            }
        }

        Integer len = null;
        System.out.format("What's your expected output length? ");
        while (len == null || len < 0) {
            try {
                len = Integer.parseInt(myConsole.readLine());
                if (len < 0)
                    System.out.println("Illegal length!");
            } catch (Exception e) {
                System.out.println("\nBad input!");
            }
        }

        MarkovTable myTable = new MarkovTable(order, txt);
        MarkovGenerator myMak = new MarkovGenerator(len, order, myTable.scan());
        genText = myMak.generateDoc();
    }

    public void writeTxt(String fn) {
        try {
            PrintWriter stream = new PrintWriter(new FileWriter(fn));
            stream.write(genText);
            stream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Function call:
DocGenerator myDoc = new DocGenerator("/Library/college.txt");
myDoc.writeTxt("try.xml");