Markov Chain not randomizing output


Hello. After much tinkering, I’ve finally gotten the MarkovChain module to produce output from the HTML that I parsed. I am collecting lyrics from an album to use as the source text for my chain. When I run the MarkovChain module on the text my function returns, it grabs 20 words in their original order and prints them, instead of producing a random sequence of 20 words.

My file:

from markov_python.cc_markov import MarkovChain
import fetch_data

url = ''
source = fetch_data.fetch_words(url)

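markov = MarkovChain(3)
markov.add_string(source)  # without this call the chain has no data and generate_text() returns []

mc = markov.generate_text()
print(mc)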



fetch_data.py:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

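def fetch_words(url):
    # Download the page and parse the HTML
    page = urlopen(url)
    soup = BeautifulSoup(page, 'html.parser')
    # The lyrics are expected inside <div class="lyrics">; find() returns None if it is missing
    text = soup.find("div", {"class": "lyrics"})
    # Join the text nodes with single spaces, trimming surrounding whitespace
    data = text.get_text(" ", strip=True)
    return data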

This outputs:

['the', 'blatant', 'disarray', 'disfigure', 'the', 'public', "eye's", 'disgrace', 'defying', 'common', 'place', 'unending', 'paper', 'chase', 'unending', 'deafening', 'painstaking', 'reckoning', 'this', 'vertigo']

These words are all in the same order as they appear in the song. I don’t know how to modify my code to make it randomize the output. Does anyone know what to do? Thank you!

The code: (obviously not mine)

import re
import random
from collections import defaultdict, deque

"""
Codecademy Pro Final Project supplementary code

Markov Chain generator
  This is a text generator that uses Markov Chains to generate text
  using a uniform distribution.

  num_key_words is the number of words that compose a key (suggested: 2 or 3)
"""

class MarkovChain:

  def __init__(self, num_key_words=2):
    self.num_key_words = num_key_words
    self.lookup_dict = defaultdict(list)
    self._punctuation_regex = re.compile(r'[,.!;\?\:\-\[\]\n]+')
    self._seeded = False

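  def __seed_me(self, rand_seed=None):
    # NOTE: the try block and the random.seed() calls were cut off in the paste;
    # this is the likely shape, reconstructed from the surviving lines
    if self._seeded is not True:
      try:
        if rand_seed is not None:
          random.seed(rand_seed)
        else:
          random.seed()
        self._seeded = True
      except NotImplementedError:
        self._seeded = False

  # Build Markov Chain from data source.
  # Use add_file() or add_string() to add the appropriate format source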
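  def add_file(self, file_path):
    content = ''
    with open(file_path, 'r') as fh:
      content = fh.read()  # body missing in the paste; presumably reads the whole file
    self.__add_source_data(content)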

  def add_string(self, str):
    self.__add_source_data(str)  # body missing in the paste; presumably forwards the text

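  def __add_source_data(self, str):
    # Replace punctuation with spaces, lowercase, then split into words
    clean_str = self._punctuation_regex.sub(' ', str).lower()
    tuples = self.__generate_tuple_keys(clean_str.split())
    for t in tuples:
      # t is [key_tuple, next_word]; loop body missing in the paste, reconstructed
      self.lookup_dict[t[0]].append(t[1])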

  def __generate_tuple_keys(self, data):
    if len(data) < self.num_key_words:
      return  # not enough words to form a single key

    for i in range(len(data) - self.num_key_words):
      yield [ tuple(data[i:i+self.num_key_words]), data[i+self.num_key_words] ]

  # Generates text based on the data the Markov Chain contains
  # max_length is the maximum number of words to generate
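  def generate_text(self, max_length=20):
    context = deque()
    output = []
    if len(self.lookup_dict) > 0:

      # Pick a random key as the start of the chain
      idx = random.randint(0, len(self.lookup_dict)-1)
      chain_head = list(self.lookup_dict.keys())[idx]
      context.extend(chain_head)  # missing in the paste, reconstructed

      while len(output) < (max_length - self.num_key_words):
        next_choices = self.lookup_dict[tuple(context)]
        if len(next_choices) > 0:
          next_word = random.choice(next_choices)
          # The rest of the loop was cut off in the paste; likely shape:
          context.append(next_word)
          output.append(context.popleft())
        else:
          break
      output.extend(list(context))
    return output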
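
For what it's worth, the lookup table that this code builds is easy to inspect by hand. Here is a minimal standalone sketch of that step. It is not the module itself: `build_lookup` and the sample sentence are made up for illustration. It shows which words can follow each key:

```python
from collections import defaultdict

def build_lookup(text, num_key_words=2):
    """Map each run of num_key_words words to the list of words seen right after it."""
    words = text.lower().split()
    lookup = defaultdict(list)
    for i in range(len(words) - num_key_words):
        key = tuple(words[i:i + num_key_words])
        lookup[key].append(words[i + num_key_words])
    return dict(lookup)

# Made-up sample text, not the actual lyrics
sample = "the cat sat on the mat and the cat ran off"
table = build_lookup(sample, num_key_words=2)
for key, followers in table.items():
    print(key, "->", followers)
```

In this sample only ('the', 'cat') has two followers. Every other key has exactly one, so once the chain lands on such a key the next word is forced.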


Hey, did you figure this out? Just wondering. I can see myself running into this issue as well. Once I get my MC to generate SOMETHING…


If I run the program over and over again, I will occasionally get something randomized. However, more often than not, it will spit out the words in the same order as they appear in the song. I’m still stumped.
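
One possible explanation, assuming the album is a fairly small amount of text: with MarkovChain(3), nearly every three-word key shows up only once in the lyrics, so it has a single follower and random.choice has only one option. The starting key is random, but from there the chain just replays the song. A rough way to check this is to count how many keys actually offer a choice. The sketch below is standalone. It uses a hypothetical helper, `share_of_keys_with_choice`, and made-up sample text rather than the real module or lyrics:

```python
from collections import defaultdict

def share_of_keys_with_choice(text, num_key_words):
    """Return the fraction of keys that have more than one distinct follower."""
    words = text.lower().split()
    followers = defaultdict(set)
    for i in range(len(words) - num_key_words):
        key = tuple(words[i:i + num_key_words])
        followers[key].add(words[i + num_key_words])
    if not followers:
        return 0.0
    with_choice = sum(1 for f in followers.values() if len(f) > 1)
    return with_choice / len(followers)

# Made-up sample text standing in for the parsed lyrics
sample = "the cat sat on the mat and the cat ran off the mat"
for n in (1, 2, 3):
    print(n, round(share_of_keys_with_choice(sample, n), 2))
```

On this sample the share drops from 0.25 with one-word keys to 0.0 with three-word keys. If the real lyrics behave the same way, trying MarkovChain(1) or MarkovChain(2), or feeding in more text, should give more varied output.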