Markov chain not working


#1

I have successfully scraped data from a wine review website, and I want to feed it into a Markov chain to generate new reviews. Unfortunately, running run.py produces no generated text. I have no idea what I am doing wrong, since I don't get any errors. Fetching the data works, so I think I am missing something in run.py with the Markov chain.


The following is my fetch_data.py:

import re

import requests
from bs4 import BeautifulSoup

# Fetch the listing page once and parse it.
html = requests.get("http://www.winespectator.com/dailypicks/category/catid/1/page/").content
wineReviews = BeautifulSoup(html, "html.parser")
lines = []
# Note: `page` is never used, so the same parsed page is scanned on every pass.
for page in xrange(1, 10):
    for headLine in wineReviews.find_all("div", {"class": "paragraph"}):
        txt1 = headLine.get_text()
        # Collapse runs of spaces and tabs into a single space.
        txt1 = re.sub('[ \t]+', ' ', txt1).strip()
        lines.append(txt1)
# Write the reviews as UTF-8, separated by blank lines.
with open("/Users/mobpro/desktop/markov_chain/winereviews.txt", "w") as f:
    f.write(u'\n\n'.join(lines).encode('utf-8'))

The following is my run.py:

from markov_python.cc_markov import MarkovChain

wr = open("winereviews.txt", "r")

mc = MarkovChain(wr)
print mc
print mc.generate_text(15)

Running run.py gives the following:

[]

What should I change here to reach my goal?


#2

You'll need to take a closer look at how to use the MarkovChain class. You can read the source code, or just have Python give you a quick summary of what is in that file:

$ pydoc2 markov_python/cc_markov.py

Outputs:

Help on module cc_markov:

NAME
    cc_markov

FILE
    /tmp/markov/markov_python/cc_markov.py

CLASSES
    MarkovChain
    
    class MarkovChain
     |  Methods defined here:
     |  
     |  __init__(self, num_key_words=2)
     |  
     |  add_file(self, file_path)
     |  
     |  add_string(self, str)
     |  
     |  generate_text(self, max_length=20)

The method and argument names say quite a lot about how the class can be used!
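
To make that concrete: the constructor's only parameter is num_key_words, so a file object passed to MarkovChain(...) is not read as text, and a chain that has been given no text has nothing to generate. The sketch below is a hypothetical stand-in (TinyMarkovChain is an invented name, and this is not the cc_markov implementation, whose internals are not shown in this thread). It only illustrates why an untrained chain can plausibly return an empty list, while one fed through add_string returns words.

```python
import random
from collections import defaultdict


class TinyMarkovChain(object):
    """Hypothetical stand-in, NOT the cc_markov implementation."""

    def __init__(self, num_key_words=2):
        self.num_key_words = num_key_words
        # Maps a tuple of key words to the words seen right after it.
        self.transitions = defaultdict(list)

    def add_string(self, text):
        words = text.lower().split()
        n = self.num_key_words
        for i in range(len(words) - n):
            key = tuple(words[i:i + n])
            self.transitions[key].append(words[i + n])

    def generate_text(self, max_length=20):
        if not self.transitions:
            return []  # no text was ever added, so nothing can be generated
        key = random.choice(list(self.transitions))
        output = list(key)
        # Keep extending while the last key words have a known successor.
        while len(output) < max_length and key in self.transitions:
            output.append(random.choice(self.transitions[key]))
            key = tuple(output[-self.num_key_words:])
        return output[:max_length]


empty_chain = TinyMarkovChain()
empty_result = empty_chain.generate_text(15)

trained_chain = TinyMarkovChain()
trained_chain.add_string("spicy with candied ginger and pear flavors softly juicy")
trained_result = trained_chain.generate_text(5)

print(empty_result)
print(trained_result)
```

With the real class, the equivalent step is to call add_file or add_string before generate_text.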

The pydoc summary above can also be produced from within Python by calling the built-in help() function with the module, or just the class, as the argument.
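
A minimal sketch of that, assuming you want to try it without the markov_python package installed: the class below is a hypothetical stand-in that copies only the signatures listed in the pydoc output and does nothing else. With the real package you would import markov_python.cc_markov and pass the module or MarkovChain to help() instead.

```python
import pydoc


class MarkovChain(object):
    """Hypothetical stand-in that copies only the signatures shown above."""

    def __init__(self, num_key_words=2):
        pass

    def add_file(self, file_path):
        pass

    def add_string(self, str):
        pass

    def generate_text(self, max_length=20):
        pass


# help(MarkovChain) shows this summary interactively. render_doc() returns
# the same text as a string, and plain() strips the terminal bold markup.
summary = pydoc.plain(pydoc.render_doc(MarkovChain))
print(summary)
```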


#3

Thank you. I followed your advice, and now I at least get some comprehensible output and I use the methods of the given class.
I changed run.py to the following:

from markov_python.cc_markov import MarkovChain

mc = MarkovChain()
mc.add_file('/Users/mobpro/desktop/winereviews.txt')

mc.add_string("red")

print mc.generate_text(10)

which gives me gems like:
['spicy', 'with', 'candied', 'ginger', 'and', 'pear', 'flavors', 'softly', 'juicy', 'this']

Now I'm trying to find a short way of getting rid of the brackets and quotes.


#4

Hi @szarrinkelkgmail.com,

You might be interested in the documentation for Python's str.join(iterable) method.
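
As a small sketch, using the word list from the post above (the variable names words and sentence are just for illustration):

```python
# The list that generate_text(10) returned in the earlier post.
words = ['spicy', 'with', 'candied', 'ginger', 'and', 'pear',
         'flavors', 'softly', 'juicy', 'this']

# join() is called on the separator string and takes the list as its argument.
sentence = " ".join(words)
print(sentence)  # spicy with candied ginger and pear flavors softly juicy this
```

In run.py this would amount to passing the result of mc.generate_text(10) to " ".join(...) before printing it.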


#5

Can you help me with this?
Problem: this is supposed to print 10 words, but the terminal only prints [].

fetch.py

from urllib2 import urlopen
from bs4 import BeautifulSoup
import unicodedata
import ast
import requests
import re

html = requests.get("http://www.winespectator.com/dailypicks/category/catid/1/page/").content
wineReviews = BeautifulSoup(html)
print wineReviews
lines = []
for page in xrange(1, 10):
    for headLine in wineReviews.find_all("div", { "class" : "paragraph" }):
        txt1 = headLine.get_text()
        txt1 = re.sub('[ \t]+', ' ', txt1).strip()
        lines.append(txt1)
with open("/Users/liamchae/documents/markov_chain/wineReviews.txt", "w") as f:
    f.write(u'\n\n'.join(lines).encode('utf-8'))

run.py
from markov_python.cc_markov import MarkovChain

mc = MarkovChain()
mc.add_file('/Users/liamchae/Documents/markov_chain/wineReviews.txt')

mc.add_string("red")

print mc.generate_text(10)
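
One way to narrow this down, assuming the empty list means the chain received little or no text: check whether the scraped file actually contains words before handing it to add_file. The helper name describe_corpus is hypothetical, and the demonstration uses a throwaway temporary file rather than the real wineReviews.txt.

```python
import os
import tempfile


def describe_corpus(path):
    """Return (size_in_bytes, word_count) for a text file."""
    size = os.path.getsize(path)
    with open(path) as f:
        word_count = len(f.read().split())
    return size, word_count


# Demonstration on a throwaway file; point the function at your own
# wineReviews.txt to inspect the real scrape output.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("juicy and spicy with pear flavors")
    demo_path = tmp.name

size, word_count = describe_corpus(demo_path)
os.remove(demo_path)
print(size)
print(word_count)
```

If the word count for the real file is 0, the problem is in the scraping step (for example, no matching div elements were found) rather than in run.py.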

