Python Final Project: Einstein Tweet Bot



Hey guys. After a long time banging my head against the wall trying to figure out simple things I just didn't know how to do yet, I finally broke through to the other side and have some semblance of a completed final project for the Python course.


I have written a simple tweet bot that scrapes 25 Einstein quotes from a Goodreads page, cleans the scraped HTML into plain text that the Markov chain generator can read, saves it to a text file, and spits out a brand-new quote from Einstein. I call it Einstein Remastered.


It definitely does not work perfectly. It sometimes lifts large chunks of the original quotes verbatim (I think the sample of 25 quotes does not provide enough variety). The output is also still all lowercase and has no punctuation.
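To make the small-sample hunch concrete: a word-level Markov chain maps each state (here, the previous two words) to the words that followed it in the source. When nearly every state has only one follower, the generator has no real choice and just replays the original. The following is a minimal standard-library sketch of that idea, not the markov_python implementation. `build_chain`, `single_choice_ratio`, and the sample text are hypothetical names and data made up for illustration.

```python
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the list of words that followed it."""
    words = text.lower().split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def single_choice_ratio(chain):
    """Fraction of states that have exactly one distinct follower."""
    if not chain:
        return 0.0
    forced = sum(1 for followers in chain.values() if len(set(followers)) == 1)
    return forced / float(len(chain))


# Illustrative stand-in corpus, not the scraped Goodreads data.
sample = (
    "imagination is more important than knowledge "
    "logic will get you from a to b "
    "imagination will take you everywhere"
)
chain = build_chain(sample)
ratio = single_choice_ratio(chain)
```

On this tiny sample every two-word state has a single follower, so any generated text would be a straight copy. Calling `build_chain(sample, order=1)` gives a few branching states (for example, "imagination" is followed by both "is" and "will"), which suggests the two levers are a larger corpus or a shorter state.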

Please let me know what you think of my code, or if you have any improvements you think I could make!


Here is my fetch_data file:

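# -*- coding: utf-8 -*-
from re import sub
from urllib2 import urlopen
from bs4 import BeautifulSoup

# Download the Goodreads quotes page and parse the HTML.
urlpage = urlopen("https://www.goodreads.com/author/quotes/9810.Albert_Einstein").read()
bswebpage = BeautifulSoup(urlpage, 'lxml')
# Each quote lives in a <div class="quoteText">.
results = bswebpage.findAll("div", {'class': "quoteText"})
equotes = u""

for result in results:
    # Keep only the first child (the quote text), strip the curly quotes and the
    # final period, and end each quote with a newline so words do not run together.
    equotes += sub(u"“|[.]?”", u"", u"".join(result.contents[0:1]).strip()) + u"\n"

with open("EinsteinQuotes.txt", "w") as f:
    f.write(equotes.encode("utf-8"))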
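The cleanup step can be tried on its own without hitting the network. This is a rough sketch that assumes each scraped quote arrives as Unicode text wrapped in curly quotes with stray whitespace around it. `clean_quote`, `QUOTE_MARKS`, and the sample strings are hypothetical stand-ins, not the real page contents.

```python
import re

# Opening curly quote, or a closing curly quote with an optional period before it.
QUOTE_MARKS = re.compile(u"\u201c|[.]?\u201d")


def clean_quote(raw):
    """Strip surrounding whitespace and curly quotation marks from one quote."""
    return QUOTE_MARKS.sub(u"", raw.strip())


# Illustrative stand-ins for the text inside each quoteText div.
sample = [
    u"\n      \u201cImagination is more important than knowledge.\u201d\n  ",
    u"\n      \u201cLogic will get you from A to B.\u201d\n  ",
]

# One quote per line keeps the end of one quote from fusing with the start of the next.
cleaned = u"\n".join(clean_quote(q) for q in sample)
```

Writing the period as `[.]` matters: a bare `.` in a regex matches any character, so it would eat the last letter of a quote that ends without a period.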

And here is my run.py file:

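# -*- coding: utf-8 -*-
from markov_python.cc_markov import MarkovChain
from re import sub

mc = MarkovChain()

# Train the chain on the quotes saved by the fetch_data file.
mc.add_file("/Users/Alex/documents/alex/python/polonibot/EinsteinQuotes.txt")

# generate_text returns the generated words as a list; 18 is the length passed to it.
EinsteinRemastered = mc.generate_text(18)

output = " ".join(EinsteinRemastered)
# Strip any curly quotes (as UTF-8 bytes) that survived the scrape.
print sub('\xe2\x80\x9d|\xe2\x80\x9c|“|”', '', output)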
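
For the lowercase and no-punctuation problem, a small post-processing step on the generated word list might be enough. This is only a sketch: `tidy_sentence` is a hypothetical helper, and it assumes the generator hands back a list of lowercase words, as in run.py. It will not restore commas or proper nouns.

```python
def tidy_sentence(words):
    """Join generated words, capitalize the first letter, and end with a period."""
    text = " ".join(words).strip()
    if not text:
        return ""
    text = text[0].upper() + text[1:]
    # Capitalize the standalone pronoun "i".
    text = " ".join("I" if w == "i" else w for w in text.split(" "))
    if text[-1] not in ".!?":
        text += "."
    return text


example = tidy_sentence(["imagination", "is", "everything", "i", "know"])
```

In run.py this would replace the plain `" ".join(...)` step, for example `output = tidy_sentence(EinsteinRemastered)`.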