Markov Chain not randomizing output


#1

Hello. After much tinkering, I’ve finally been able to get the MarkovChain module to produce output from the HTML that I parsed. I am collecting lyrics from an album to use as my chain. When I run MarkovChain on the string returned by my fetch_words function in fetch_data.py, it returns 20 consecutive words from the lyrics instead of a randomized sequence of 20 words.

My run.py file:

from markov_python.cc_markov import MarkovChain
import fetch_data


url = 'http://www.darklyrics.com/lyrics/metallica/andjusticeforall.html#2'
source = fetch_data.fetch_words(url)


markov = MarkovChain(3)
markov.add_string(source)

mc = markov.generate_text()

print(mc)


My fetch_data.py:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

def fetch_words(url):

    page = urlopen(url)
    soup = BeautifulSoup(page, 'html.parser')
    text = soup.find("div", {"class": "lyrics"})
    data = text.get_text(" ", strip=True)
    lyrics = str(data)
    return lyrics


This outputs:

['the', 'blatant', 'disarray', 'disfigure', 'the', 'public', "eye's", 'disgrace', 'defying', 'common', 'place', 'unending', 'paper', 'chase', 'unending', 'deafening', 'painstaking', 'reckoning', 'this', 'vertigo']

These words are all in the same order as they appear in the song. I don’t know how to modify cc_markov.py to make it randomize the output. Does anyone know what to do? Thank you!

The cc_markov.py code: (obviously not mine)

import re
import random
from collections import defaultdict, deque

"""
Codecademy Pro Final Project supplementary code

Markov Chain generator
  This is a text generator that uses Markov Chains to generate text
  using a uniform distribution.

  num_key_words is the number of words that compose a key (suggested: 2 or 3)
"""

class MarkovChain:

  def __init__(self, num_key_words=2):
    self.num_key_words = num_key_words
    self.lookup_dict = defaultdict(list)
    self._punctuation_regex = re.compile(r'[,.!;\?\:\-\[\]\n]+')
    self._seeded = False
    self.__seed_me()

  def __seed_me(self, rand_seed=None):
    if self._seeded is not True:
      try:
        if rand_seed is not None:
          random.seed(rand_seed)
        else:
          random.seed()
        self._seeded = True
      except NotImplementedError:
        self._seeded = False

  """
  " Build Markov Chain from data source.
  " Use add_file() or add_string() to add the appropriate format source
  """
  def add_file(self, file_path):
    content = ''
    with open(file_path, 'r') as fh:
      self.__add_source_data(fh.read())

  def add_string(self, str):
    self.__add_source_data(str)

  def __add_source_data(self, str):
    clean_str = self._punctuation_regex.sub(' ', str).lower()
    tuples = self.__generate_tuple_keys(clean_str.split())
    for t in tuples:
      self.lookup_dict[t[0]].append(t[1])

  def __generate_tuple_keys(self, data):
    if len(data) < self.num_key_words:
      return

    for i in range(len(data) - self.num_key_words):
      yield [ tuple(data[i:i+self.num_key_words]), data[i+self.num_key_words] ]

  """
  " Generates text based on the data the Markov Chain contains
  " max_length is the maximum number of words to generate
  """
  def generate_text(self, max_length=20):
    context = deque()
    output = []
    if len(self.lookup_dict) > 0:
      self.__seed_me(rand_seed=len(self.lookup_dict))

      idx = random.randint(0, len(self.lookup_dict)-1)
      chain_head = list(self.lookup_dict.keys())[idx]
      context.extend(chain_head)

      while len(output) < (max_length - self.num_key_words):
        next_choices = self.lookup_dict[tuple(context)]
        if len(next_choices) > 0:
          next_word = random.choice(next_choices)
          context.append(next_word)
          output.append(context.popleft())
        else:
          break
      output.extend(list(context))
    return output
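
For anyone following along, here is a minimal sketch of the table that __add_source_data appears to build, rewritten as a standalone function. The name build_lookup and the sample sentence are made up for illustration; they are not part of cc_markov.py.

```python
from collections import defaultdict

def build_lookup(words, num_key_words=2):
    """Map each run of num_key_words consecutive words to the words that follow it.

    Hypothetical stand-in for the module's private helpers, for illustration only.
    """
    lookup = defaultdict(list)
    for i in range(len(words) - num_key_words):
        key = tuple(words[i:i + num_key_words])
        lookup[key].append(words[i + num_key_words])
    return lookup

# Made-up sample text, not the real lyrics.
words = "the cat sat on the mat the cat ran".split()
table = build_lookup(words, 2)

for key, followers in table.items():
    print(key, "->", followers)
```

A key with a single follower leaves random.choice only one option, so the walk from that key is forced. Only a key like ('the', 'cat') in this sample, which is followed by two different words, gives it a real choice.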

#2

Hey, did you figure this out? Just wondering. I can see myself running into this issue as well. Once I get my MC to generate SOMETHING…


#3

If I run the program over and over again, I will occasionally get something randomized. However, more often than not, it will spit out words in the correct order as they appear in the song. I’m still stumped.
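
One likely explanation for what #1 and #3 describe: the starting key is picked at random, but with 3-word keys and a single album of lyrics, most keys probably have only one follower, so the chain simply walks the song from wherever it starts. A rough way to check this, as a sketch rather than a fix, is to count how many keys have more than one distinct follower at each key size. The function names and the sample text here are hypothetical stand-ins; swap in the word list produced from fetch_words.

```python
from collections import defaultdict

def follower_sets(words, num_key_words):
    """Collect the distinct words that follow each key of num_key_words words."""
    followers = defaultdict(set)
    for i in range(len(words) - num_key_words):
        key = tuple(words[i:i + num_key_words])
        followers[key].add(words[i + num_key_words])
    return followers

def branching_fraction(words, num_key_words):
    """Fraction of keys where the chain has a real choice (2+ distinct followers)."""
    followers = follower_sets(words, num_key_words)
    if not followers:
        return 0.0
    branching = sum(1 for choices in followers.values() if len(choices) > 1)
    return branching / len(followers)

# Made-up sample text; replace with the cleaned, lowercased, split lyrics.
sample = ("the cat sat on the mat the dog sat on the log "
          "the cat ran to the dog").split()

for n in (1, 2, 3):
    print(n, branching_fraction(sample, n))
```

If the fraction for 3 comes out close to zero on the real lyrics, trying MarkovChain(2) or MarkovChain(1), or feeding in more source text, should give random.choice more to work with.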