Mini Linter step 4


#1

I’m just totally confused. My code doesn’t throw an error but neither does it console.log() anything.

Task:
There is an array of words called overusedWords. These are words overused in this story. You want to let the user of your program know how many times they have used these overused words. There are two ways to achieve this. Try it on your own first. If you need help, consult the hint.

Hint:

  1. You can iterate over the betterWords array three separate times (once for each of the words in the overusedWords array). Create a variable that represents the total times that word appears. Add 1 to the variable every time the current word is the same as that word.

  2. You can make this simpler by using one if, and two else if statements in the function code block of your iterator. That way, you can gather the counts of all three overused words at one time.

let overusedWords = ['really', 'very', 'basically'];

let unnecessaryWords = ['extremely', 'literally', 'actually'];

let storyWords = story.split(' ');

let veryCount = 0;

let reallyCount = 0;

let basicallyCount = 0;

let betterWords = storyWords.filter(storyWord => !unnecessaryWords.includes(storyWord))

for (let betterWordsIndex = 0; betterWordsIndex < betterWords.length; betterWordsIndex++) {
  for (let overusedWordsIndex = 0; overusedWordsIndex < overusedWords.length; overusedWordsIndex++) {
    if (betterWords[betterWordsIndex] === 'very') {
      return veryCount++;
    } else if (betterWords[betterWordsIndex] === 'really') {
      return reallyCount++;
    } else if (betterWords[betterWordsIndex] === 'basically') {
      return basicallyCount++;
    }
  }
}
console.log(veryCount)
console.log(reallyCount)
console.log(basicallyCount)

Why doesn’t my code console.log() anything? How do I count each of the words in overusedWords and console.log() them?


#2

So I just changed the three count variables to be arrays:

let veryCount = [];

let reallyCount = [];

let basicallyCount = [];

and instead of incrementing a counter like before, I add matching words to the respective array like this:

if (betterWords[betterWordsIndex] === 'very') {
  veryCount.push(betterWords[betterWordsIndex]);
}

and just count the length of each Array:

console.log(veryCount.length) //returns 15 incorrectly
console.log(reallyCount.length) //returns 6 incorrectly
console.log(basicallyCount.length) // returns 3 incorrectly

This counts the words correctly, but having to separate them before counting them seems inefficient and somehow incorrect. It works, but still. How would you do it differently?


#3


At this early stage, efficiency is somewhere down the list of concerns. Writing code that produces the desired results and gives us the outcome we expect, however it is done, is of greater importance. Your approach of trying different things is one of the best ways to learn.

Since this is a module on iterators, the plain for loop takes a back seat now that we have forEach to work with. We can iterate over the overusedWords array and search for matches in the text. That will make the code a little simpler. What we do inside the loop is up to us, whether we accumulate a variable, create a frequency table (histogram), or whatever else comes to mind.
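
As a rough sketch of that idea, the snippet below walks the overused words with forEach and counts the matches for each one with filter. The sample sentence and the counts object are stand-ins of my own, not part of the exercise.

```javascript
// Sample text standing in for the story (an assumption for this sketch).
const sampleStory = 'It was really very long and I was very tired so I basically stopped';
const overusedWords = ['really', 'very', 'basically'];

// Split on spaces, as the exercise does.
const storyWords = sampleStory.split(' ');

// One counter per overused word, keyed by the word itself (hypothetical name).
const counts = {};
overusedWords.forEach(overused => {
  counts[overused] = storyWords.filter(word => word === overused).length;
});

console.log(counts); // { really: 1, very: 2, basically: 1 }
```

Nothing is returned from inside the loop. A return statement in a loop like the one in the first post ends the run at the first match (Node permits return at the top level of a CommonJS script), so the console.log calls after the loop are never reached. There is also only one pass over the text per overused word; putting the if chain inside two nested loops would count every match three times.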

Say we go through the text and build a histogram of all the words. Then we simply look up the words from the overusedWords list and retrieve each count. We could strip all the punctuation and switch everything to lowercase to make the list shorter and ensure it contains only words. It would even let us find which word(s) appear the most (the mode).


#4

I’d love for you to take a look at it. Maybe even point a finger at what I am not seeing or am not understanding.


#5

Let’s start by stripping all the punctuation, meaning quotes, commas and full stops. Typically the geek types would reach for a regular expression to do this, and it would be justified on a massive text like the Constitution or Mark Twain’s ‘Roughing It’ where size is a concern. For smaller stuff we don’t need the regular expression engine (which is a real resource hog on one-off, small or smallish bodies of text), just a decent algo to find and remove unwanted characters.

The easiest way to do this is on the raw text file. I’ve inserted this line just after we’ve filtered a few words and created betterWords.

let repro = betterWords.join(' ').split('');

Essentially, the array is joined into a string and then split on characters, not spaces. This creates an array of single characters.

const non_word_characters = [',', ';', ':', '.', '?', '!', '"'];

This is an array of characters that we can use forEach on.

non_word_characters.forEach(x => repro = repro.filter(y => y != x));

console.log(repro.join(''))

All that’s left now is to split it into words again, this time with no punctuation.

Last weekend I took the most beautiful bike ride of my life The route is called The 9W to Nyack and it stretches all the way from Riverside Park in Manhattan to South Nyack New Jersey It's really an adventure from beginning to end It is a 48 mile loop and it basically took me an entire day I stopped at Riverbank State Park to take some artsy photos It was a short stop though because I had a really long way left to go After a quick photo op at the very popular Little Red Lighthouse I began my trek across the George Washington Bridge into New Jersey The GW is very long - 4760 feet I was already very tired by the time I got to the other side An hour later I reached Greenbrook Nature Sanctuary an beautiful park along the coast of the Hudson Something that was very surprising to me was that near the end of the route you cross back into New York At this point you are very close to the end
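
For comparison, the regular expression route mentioned earlier might look roughly like this. It is only a sketch: stripPunctuation is a made-up name, the sample sentence is my own, and the character class assumes the same set of marks as the non_word_characters array.

```javascript
// Hypothetical helper: remove the same punctuation marks with one regular expression.
function stripPunctuation(text) {
  // The character class lists the marks to delete; the g flag replaces every occurrence.
  // The single quote is left out on purpose so contractions survive.
  return text.replace(/[,;:.?!"]/g, '');
}

const sample = 'It was a short stop, though, because "time" was short.';
console.log(stripPunctuation(sample));
// It was a short stop though because time was short
```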

Last on the list is case, which may or may not be a concern, but for this demo it will be treated as such. We want only the words, not their significance or context. Just the words. The "New" in "New York" counts as "new" in this version of affairs.

Let’s just splash those characters with some lowercase goodness!

repro = repro.join('').toLowerCase()
console.log(repro)

and we get,

last weekend i took the most beautiful bike ride of my life the route is called the 9w to nyack and it stretches all the way from riverside park in manhattan to south nyack new jersey it's really an adventure from beginning to end it is a 48 mile loop and it basically took me an entire day i stopped at riverbank state park to take some artsy photos it was a short stop though because i had a really long way left to go after a quick photo op at the very popular little red lighthouse i began my trek across the george washington bridge into new jersey the gw is very long - 4760 feet i was already very tired by the time i got to the other side an hour later i reached greenbrook nature sanctuary an beautiful park along the coast of the hudson something that was very surprising to me was that near the end of the route you cross back into new york at this point you are very close to the end

Now we’re cooking with gas because this is going to give us a beautiful frequency table.

let freq = {}; 

for (let word of repro.split(' ')) {
  freq[word] = (freq[word] || 0) + 1;
}
console.log(freq['was'])    // <- 4
console.log(freq['the'])    // <- 14

#6

Be sure to suss out all the docs on any and every keyword that is new and foreign to you. MDN is my staple since they always link to W3C and ECMA from their pages.


#7

Summary

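// Characters to strip; the single quote is kept for contractions.
const nwc = [',', ';', ':', '.', '?', '!', '"'];
// repro starts out as an array of single characters.
nwc.forEach(x => repro = repro.filter(y => y != x));
repro = repro.join('').toLowerCase();
// Build the frequency table.
let freq = {};
for (let word of repro.split(' ')) {
  freq[word] = (freq[word] || 0) + 1;
}
// Convert to [word, count] pairs and sort by descending count.
let freqs = [];
for (let el in freq) {
  freqs.push([el, freq[el]]);
}
freqs.sort((a, b) => b[1] - a[1]);
freqs.forEach(x => console.log(x));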

#8
freqs.sort((a, b) => b[1] - a[1]);
console.log(`Mode: ${freqs[0][0]}, ${freqs[0][1]}`);

const overusedWords = [`really`, `very`, `basically`];
overusedWords.forEach(x => console.log(`${x}: ${freq[x]}`))
const unnecessaryWords = [`extremely`, `literally`, `actually` ];
unnecessaryWords.forEach(x => console.log(`${x}: ${freq[x]}`))

Mode: the, 14
really: 2
very: 5
basically: 1
extremely: 2
literally: 1
actually: 3

Had any of those words not been in the frequency table it would have given undefined. We can actually plan for that by seeding the frequency table with those keys at the start. If they turn up with values of 0, then we know they are not in the text. At least then there are no key errors.
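
Seeding might look something like the sketch below. The helper name buildFrequencyTable, its seedKeys parameter, and the sample sentence are all assumptions made for illustration.

```javascript
// Hypothetical helper: build a frequency table with some keys guaranteed present.
function buildFrequencyTable(words, seedKeys = []) {
  const freq = {};
  // Seed the watched words at 0 so a later lookup never yields undefined.
  seedKeys.forEach(key => { freq[key] = 0; });
  for (const word of words) {
    freq[word] = (freq[word] || 0) + 1;
  }
  return freq;
}

const words = 'it was a very long and very tiring day'.split(' ');
const table = buildFrequencyTable(words, ['really', 'very', 'basically']);
console.log(table.very);      // 2
console.log(table.really);    // 0
console.log(table.basically); // 0
```

A seeded word that never appears reports 0, while any word that was neither seeded nor seen still comes back as undefined.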

Note that at this point, repro is still a string and can be useful for other operations that might come up. If memory were a concern, then one could consider letting it go to free some up, but only in dire circumstances would I let it go, given that it may have some future use.

Well, that is an understatement if I ever did proclaim one. Of course it has some use. We can do all our replacements on that string. Leading from this, restoring letter case and punctuation may be on the bill. We will need the original text for comparison.

In truth, a linter is no small matter to create. The basic principles emerge here in the lesson, but that is only the broader view. A linter in true form is granular inspection and picks out the specks. There are programming language linters and there are grammar linters (MS Word for example).

From this lesson, one I like to review and play with from time to time, we can take a great many leaps into the unknown just parsing and analyzing the text. I hope you will continue to explore the real essence of what it presents. The author may have left a few things out, but that doesn’t mean you have to. Have fun making an awesome linter as you move forward!


#9

Thank you. I appreciate the time you put into the reply you gave to my question. I actually tried to create a frequency table with your instructions. I was fascinated that it worked, and by how effortlessly it could be put to use.

So many possibilities for writing code. It is truly interesting. I’ll definitely continue learning JavaScript and other languages as time progresses and I get a better grasp of the skill of coding. Again thank you for widening my horizon so early in my learning experience.

Best regards, Jo


#10

You’re welcome. As mentioned earlier, I like to come back to this module and play around from time to time, though it only happens when somebody brings it up. So thank you for bringing it up.

At some point in the exercise we may be asked, "How many sentences are there in the story?"

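function splitOnFullStop (text) {
  // Treat these characters as sentence terminators.
  const fullStops = ":.?!".split("");
  fullStops.forEach(s => {
    // Swap each terminator for a newline.
    text = text.split(s).join('\n');
  });
  // Split on the newline plus the space that followed each terminator.
  return text.split('\n ');
}
const sentences = splitOnFullStop(story);
console.log(sentences.length);
sentences.forEach(x => console.log(x));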

The return line splits the string into an array, but we could send back the string and split it later. The newlines are embedded and there is no more full stop punctuation. The string will print out the same as the array returned above, though it would be harder to get the count. Trade-offs all around.

This process could be front-of-line since it operates on the story text immediately, not something intermediate. As a string it would be simple to implement our earlier process of stripping non-word characters, and we now have less to strip. The only ones left are the semicolon, the comma, and the double quote.

In this demonstration (meant to tell you earlier) we left the single quote to allow for contractions. it's and its are two different things entirely, for example. Don’t want to mess things up too much.

function embedNewlines (text) {
  var fullStops = ":.?!".split("");
  fullStops.forEach(s => {
    text = text.split(s).join('\n');
  });
  return text;
}
embedNewlines(story)
=> 'Last weekend, I took literally the most beautiful bike ride of my life\n The route is called "The 9W to Nyack" and it actually stretches all the way from Riverside Park in Manhattan to South Nyack, New Jersey\n It\'s really an adventure from beginning to end\n It is a 48 mile loop and it basically took me an entire day\n I stopped at Riverbank State Park to take some extremely artsy photos\n It was a short stop, though, because I had a really long way left to go\n After a quick photo op at the very popular Little Red Lighthouse, I began my trek across the George Washington Bridge into New Jersey\n The GW is actually very long - 4,760 feet\n I was already very tired by the time I got to the other side\n An hour later, I reached Greenbrook Nature Sanctuary, an extremely beautiful park along the coast of the Hudson\n Something that was very surprising to me was that near the end of the route you actually cross back into New York\n At this point, you are very close to the end / finish\n'   
console.log(embedNewlines(story));
Last weekend, I took literally the most beautiful bike ride of my life
 The route is called "The 9W to Nyack" and it actually stretches all the way from Riverside Park in Manhattan to South Nyack, New Jersey
 It's really an adventure from beginning to end
 It is a 48 mile loop and it basically took me an entire day
 I stopped at Riverbank State Park to take some extremely artsy photos
 It was a short stop, though, because I had a really long way left to go
 After a quick photo op at the very popular Little Red Lighthouse, I began my trek across the George Washington Bridge into New Jersey
 The GW is actually very long - 4,760 feet
 I was already very tired by the time I got to the other side
 An hour later, I reached Greenbrook Nature Sanctuary, an extremely beautiful park along the coast of the Hudson
 Something that was very surprising to me was that near the end of the route you actually cross back into New York
 At this point, you are very close to the end / finish

When we inserted the newline substitute, the space after the full stop still remained. We dealt with that in the array example. That space may still need to be dealt with when splitting this string, but I would ignore it since all that gets produced is an empty string in one element of the resultant array.

Seems our recent frequency table turned up three empty strings, as I recall.
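
If the stray space and the empty strings do need handling, one option is to trim each piece and drop the empty ones after splitting. A sketch only, with splitSentences as a made-up name and a short sample in place of the story:

```javascript
// Hypothetical helper: split text into sentences, discarding blanks.
function splitSentences(text) {
  let marked = text;
  // Replace each full-stop character with a newline, as embedNewlines does.
  ':.?!'.split('').forEach(s => {
    marked = marked.split(s).join('\n');
  });
  return marked
    .split('\n')
    .map(sentence => sentence.trim())          // remove the leftover leading space
    .filter(sentence => sentence.length > 0);  // drop empty strings
}

const sample = 'I took a bike ride. It was long! Was it worth it? Yes.';
const sentences = splitSentences(sample);
console.log(sentences.length); // 4
sentences.forEach(x => console.log(x));
```

The trade-off is an extra pass over the array, which hardly matters on a text this size.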


Couldn’t let it go…

    text = text.split(`${s} `).join('\n');
Last weekend, I took literally the most beautiful bike ride of my life
The route is called "The 9W to Nyack" and it actually stretches all the way from Riverside Park in Manhattan to South Nyack, New Jersey
It's really an adventure from beginning to end
It is a 48 mile loop and it basically took me an entire day
I stopped at Riverbank State Park to take some extremely artsy photos
It was a short stop, though, because I had a really long way left to go
After a quick photo op at the very popular Little Red Lighthouse, I began my trek across the George Washington Bridge into New Jersey
The GW is actually very long - 4,760 feet
I was already very tired by the time I got to the other side
An hour later, I reached Greenbrook Nature Sanctuary, an extremely beautiful park along the coast of the Hudson
Something that was very surprising to me was that near the end of the route you actually cross back into New York
At this point, you are very close to the end / finish.

All this is so we can restore the case on words retained in the linting process. With these original sentences in tow we should be able to accomplish it. The actual full stop is not preserved so a period will have to suffice when we resolve this. However, we do have the original story. It could be useful to restore all the full stops.