Regular Expression in Jammming Project

Hey all - going through the Jammming project I’ve reached step 79 where we need to use a regular expression in order to match access tokens to hook into Spotify’s API.

My question is, can anyone break down what this particular regular expression is looking for? I’ve done some research on RegExp formatting and have a general understanding, but some of these characters are confusing given the context. Here’s some code:

Example URL to parse:
https://example.com/callback#access_token=NwAExz...BV3O2Tk&token_type=Bearer&expires_in=3600&state=123

Matching functions:

const accessToken = window.location.href.match(/access_token=([^&]*)/);
const expiresInMatch = window.location.href.match(/expires_in=([^&]*)/);

Based on that url - I can’t fathom how someone would even know what RegExp to use given that most of the token is hidden, and I’m not seeing characters in the Regular Expression that match in the url to use in the .match() function.

I watched the help video for this section, and even the narrator glosses over this particular task without much explanation of how and why we’re using this particular Regular Expression. Can anyone enlighten me?

Thanks!


So I’m starting to get it a little bit. Since each key value is separated by ‘&’, what [^&] is saying is to match everything BUT ‘&’, including any characters that have been hidden for security purposes.

It also looks like the final / could be used as an escaping mechanism, but I’m not really sure.

The *) is still confusing me though.

In JavaScript, a regular expression literal is enclosed between two forward slashes, whether you pass it to match() or use it anywhere else. The final / isn’t an escape character; it just marks the end of the pattern.

So, the complete expression may be /access_token=([^&]*)/, but what you’re actually matching here is access_token=([^&]*).
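If it helps to see that the slashes are only delimiters, here’s a small sketch you can paste into a console. The variable names are just mine for illustration; the pattern is the one from the project.

```javascript
// Two ways to build the same pattern: a regex literal and the RegExp constructor.
const literal = /access_token=([^&]*)/;
const constructed = new RegExp('access_token=([^&]*)');

// .source holds the pattern text without the enclosing slashes.
console.log(literal.source);
console.log(literal.source === constructed.source);
```

Both versions hold the same pattern text; the slashes never become part of what is matched.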

Let’s break that regex down, shall we.

The first part is simple. access_token= looks specifically for that string, so we’re looking for that parameter name in the URL. (In the example above it sits after the #, but it is formatted the same way as a query string.)

([^&]*) is what’s called a capturing group. The parentheses let you group multiple regex tokens together, and whatever the group matches is saved separately in the result that match() returns.
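To make the capturing part concrete, here’s a rough sketch using the example URL from the question as a plain string (in the project it would come from window.location.href). The variable names are my own, and the token is left elided exactly as it was posted.

```javascript
// Example URL from the question, used as a plain string so this runs anywhere.
const url = 'https://example.com/callback#access_token=NwAExz...BV3O2Tk&token_type=Bearer&expires_in=3600&state=123';

const tokenMatch = url.match(/access_token=([^&]*)/);
const expiresMatch = url.match(/expires_in=([^&]*)/);

// Index 0 is the whole match (key and value); index 1 is only what the group captured.
const wholeMatch = tokenMatch[0];
const accessToken = tokenMatch[1];

// Captured values are always strings, so convert when a number is needed.
const expiresIn = Number(expiresMatch[1]);

console.log(wholeMatch, accessToken, expiresIn);
```

Notice the regex never needs to know what the token looks like. It only needs to know where the value starts (right after access_token=) and where it stops (at the next &).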

The first part, [^&], means “match any single character that is not &”. This is because, in the query string, each of your key-value pairs is separated by an ampersand. Since that character is not part of the key-value pair, you don’t want to capture it.
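One way to see why the “not &” part matters is to compare it with a looser pattern. This is only a sketch for comparison: the (.*) version is something I’ve added for contrast, not something the project uses, and the URL is a made-up sample.

```javascript
// Made-up sample URL with three key-value pairs.
const sampleUrl = 'https://example.com/callback#access_token=abc123&token_type=Bearer&state=123';

// [^&]* stops as soon as it reaches an ampersand.
const withNegatedSet = sampleUrl.match(/access_token=([^&]*)/)[1];

// .* matches any character, so it runs through every remaining pair.
const withDotStar = sampleUrl.match(/access_token=(.*)/)[1];

// For the last pair there is no trailing &, so [^&]* simply runs to the end of the string.
const lastValue = sampleUrl.match(/state=([^&]*)/)[1];

console.log(withNegatedSet); // abc123
console.log(withDotStar);    // abc123&token_type=Bearer&state=123
console.log(lastValue);      // 123
```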

The second part, *, means “match 0 or more of the preceding token”.

Taken together, the regex matches the entire key-value pair in the URL and captures its value, regardless of whether there is a value.

The expression still works even if there’s no value for access_token: the match succeeds and the captured value is simply an empty string.
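Here’s a quick sketch of the empty-value case. The URL is a made-up one where access_token has nothing after the equals sign, and the names are just for illustration.

```javascript
// Hypothetical URL where access_token is present but empty.
const emptyTokenUrl = 'https://example.com/callback#access_token=&token_type=Bearer';

const starMatch = emptyTokenUrl.match(/access_token=([^&]*)/);

// The match still succeeds: the group matched zero characters.
console.log(starMatch !== null); // true
console.log(starMatch[0]);       // access_token=
console.log(starMatch[1] === ''); // true
```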

The reason we are using the “match 0 or more” quantifier, rather than the “match 1 or more” quantifier (which is the +), is that + would prevent us from matching an empty value. With +, an empty access_token produces no match at all.
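And the same idea with + swapped in, as a rough sketch. Both strings are made-up samples; the only difference from the project’s regex is the quantifier.

```javascript
// Made-up samples: one with an empty token, one with a value.
const emptyUrl = 'https://example.com/callback#access_token=&token_type=Bearer';
const filledUrl = 'https://example.com/callback#access_token=abc123&token_type=Bearer';

// + needs at least one "not &" character, so the empty value gives no match.
const plusOnEmpty = emptyUrl.match(/access_token=([^&]+)/);

// With a non-empty value, + behaves the same as *.
const plusOnValue = filledUrl.match(/access_token=([^&]+)/);

console.log(plusOnEmpty);    // null
console.log(plusOnValue[1]); // abc123
```

Since match() returns null when nothing matches, it’s worth checking the result before reading index 1 from it.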


If you’re struggling with regex, I wouldn’t feel too bad about it. They are pretty confusing.

There’s some information on them on the MDN JavaScript docs. :slight_smile:


Man, this is a great explanation. Thank you!

Just to clarify on the * vs the +

By using *, it’s allowing us to capture an empty value because an empty value is represented as the “0” in “0 or more”, whereas + would not because it only includes 1 or more of the preceding token. I’m assuming that “0” and “1” are referring specifically to the number of characters in the token value. So whether the token is 1 character or 40,000, + allows us to capture it all - but wouldn’t allow us to capture if the value was empty. Is this correct?

One last question - why do we need a * or a + at all? Does a bracketed regex always require a follow up expression like this to specify “0 or more” vs “1 or more”?


“0” and “1” refer to the number of instances of the token to be matched.

In the case of the capturing group ([^&]), what we are seeking to match is either “0 or more” (with *) or “1 or more” (with +) instances of “not &”.

We don’t capture the empty value when using + because we would need at least one instance of “not &” to trigger the match.

Yes, following on from what I’ve said above - using + would allow you to capture any value for the token, provided it wasn’t empty.

The [^&] part of our regex is what’s called a “negated set”. Without a quantifier, we’re telling regex to match any single literal character which is not an ampersand.

As a result, if we omit the quantifier from our regex, we will only get a match if the access_token key exists in the query string and it has a value. However, we will only capture the first character of that value because we’ve only asked regex to match one character.
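A small sketch of the no-quantifier case, again with made-up sample strings and names of my own choosing:

```javascript
// Made-up samples: one with a value, one with an empty token.
const withValue = 'https://example.com/callback#access_token=abc123&token_type=Bearer';
const withoutValue = 'https://example.com/callback#access_token=&token_type=Bearer';

// No quantifier: the group matches exactly one "not &" character.
const single = withValue.match(/access_token=([^&])/);
const noneFound = withoutValue.match(/access_token=([^&])/);

console.log(single[0]); // access_token=a
console.log(single[1]); // a
console.log(noneFound); // null
```

So a bracketed set doesn’t require a quantifier; leaving it off just means “exactly one of these characters”.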

Does that help?


Yup, really appreciate you clarifying things. Thanks again!


No problem. :slight_smile:

Regex can be tricky, especially at the beginning as there’s a lot to get your head around. (I still get caught out by them myself.)

If you get stuck with any more regex, pop back and we’ll do what we can to help out again. :slight_smile:
