Markov Chain not randomizing output


Hello. After much tinkering, I’ve finally been able to get the MarkovChain module to produce output from the HTML that I parsed. I am collecting lyrics from an album to use as the source for my chain. When I run MarkovChain on the text my function returns, it outputs 20 words in their original order instead of a random sequence of 20 words.

My file:

from markov_python.cc_markov import MarkovChain
import fetch_data

url = ''
source = fetch_data.fetch_words(url)

markov = MarkovChain(3)
markov.add_string(source)

mc = markov.generate_text()
print(mc)



fetch_data.py:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

def fetch_words(url):

    page = urlopen(url)
    soup = BeautifulSoup(page, 'html.parser')
    text = soup.find("div", {"class": "lyrics"})
    data = text.get_text(" ", strip=True)
    lyrics = str(data)
    return lyrics

This outputs:

['the', 'blatant', 'disarray', 'disfigure', 'the', 'public', "eye's", 'disgrace', 'defying', 'common', 'place', 'unending', 'paper', 'chase', 'unending', 'deafening', 'painstaking', 'reckoning', 'this', 'vertigo']

These words are all in the same order as they appear in the song. I don’t know how to modify the code to make it randomize the output. Does anyone know what to do? Thank you!

The code: (obviously not mine)

import re
import random
from collections import defaultdict, deque

"""
Codecademy Pro Final Project supplementary code

Markov Chain generator
  This is a text generator that uses Markov Chains to generate text
  using a uniform distribution.

  num_key_words is the number of words that compose a key (suggested: 2 or 3)
"""

class MarkovChain:

  def __init__(self, num_key_words=2):
    self.num_key_words = num_key_words
    self.lookup_dict = defaultdict(list)
    self._punctuation_regex = re.compile(r'[,.!;\?\:\-\[\]\n]+')
    self._seeded = False

  def __seed_me(self, rand_seed=None):
    # Seed the random number generator once per instance
    if self._seeded is not True:
      try:
        if rand_seed is not None:
          random.seed(rand_seed)
        else:
          random.seed()
        self._seeded = True
      except NotImplementedError:
        self._seeded = False

  # Build Markov Chain from data source.
  # Use add_file() or add_string() to add the appropriate format source
  def add_file(self, file_path):
    content = ''
    with open(file_path, 'r') as fh:
      self.__add_source_data(fh.read())

  def add_string(self, str):
    self.__add_source_data(str)

  def __add_source_data(self, str):
    clean_str = self._punctuation_regex.sub(' ', str).lower()
    tuples = self.__generate_tuple_keys(clean_str.split())
    for t in tuples:
      # t[0] is the key tuple, t[1] is the word that follows it
      self.lookup_dict[t[0]].append(t[1])

  def __generate_tuple_keys(self, data):
    # Not enough words to form even one key
    if len(data) < self.num_key_words:
      return

    for i in range(len(data) - self.num_key_words):
      yield [ tuple(data[i:i+self.num_key_words]), data[i+self.num_key_words] ]

  # Generates text based on the data the Markov Chain contains
  # max_length is the maximum number of words to generate
  def generate_text(self, max_length=20):
    context = deque()
    output = []
    if len(self.lookup_dict) > 0:
      # Pick a random key as the starting context
      idx = random.randint(0, len(self.lookup_dict)-1)
      chain_head = list(self.lookup_dict.keys())[idx]
      context.extend(chain_head)

      while len(output) < (max_length - self.num_key_words):
        next_choices = self.lookup_dict[tuple(context)]
        if len(next_choices) > 0:
          # Choose uniformly among the words seen after this context
          next_word = random.choice(next_choices)
          context.append(next_word)
          output.append(context.popleft())
        else:
          break
      output.extend(list(context))
    return output


Hey, did you figure this out? Just wondering. I can see myself running into this issue as well. Once I get my MC to generate SOMETHING…


If I run the program over and over again, I will occasionally get something randomized. However, more often than not, it will spit out words in the correct order as they appear in the song. I’m still stumped.


OP has probably moved on by now but I also had this problem and my answer might be useful for others.

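Basically you need to set num_key_words to 1 instead of 2 or 3, for example by creating the chain with MarkovChain(1). With two-word keys, the program looks for words that follow ‘the blatant’, which can only be ‘disarray’, and then for words that follow ‘blatant disarray’, which can only be ‘disfigure’. Three-word keys, as in your MarkovChain(3), are at least as restrictive. The starting key is still picked at random, but from there the chain can only replay the song in order.
If you change num_key_words to 1, it will look for words that follow ‘the’, which could be ‘blatant’ or ‘public’, and words that follow ‘unending’, which could be ‘paper’ or ‘deafening’.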
Hope this helps
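
If you want to check this on your own lyrics, here is a rough sketch that counts how many keys actually offer a choice for each key size. It only uses the 20 words from the output above, and build_lookup and branching_keys are helper names I made up for this post (they are not part of cc_markov), so treat it as an illustration rather than the module's own code.

```python
from collections import defaultdict

# The 20 words from the output posted above, in song order.
words = ['the', 'blatant', 'disarray', 'disfigure', 'the', 'public', "eye's",
         'disgrace', 'defying', 'common', 'place', 'unending', 'paper', 'chase',
         'unending', 'deafening', 'painstaking', 'reckoning', 'this', 'vertigo']

def build_lookup(data, num_key_words):
    """Map each tuple of num_key_words words to the words seen right after it."""
    lookup = defaultdict(list)
    for i in range(len(data) - num_key_words):
        key = tuple(data[i:i + num_key_words])
        lookup[key].append(data[i + num_key_words])
    return lookup

def branching_keys(data, num_key_words):
    """Keep only the keys that have more than one distinct possible next word."""
    lookup = build_lookup(data, num_key_words)
    return {k: v for k, v in lookup.items() if len(set(v)) > 1}

for n in (1, 2, 3):
    print(n, branching_keys(words, n))
# 1 {('the',): ['blatant', 'public'], ('unending',): ['paper', 'deafening']}
# 2 {}
# 3 {}
```

With key sizes 2 and 3 no key in this sample has more than one follower, so random.choice always has exactly one option. A bigger source text (more songs) should give longer keys more branches too.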