How to use special characters or accented letters in regex?

Can you tell me the pattern for both “escape” and “@,!”
Thanks

In a regex, ‘special characters’ come in two kinds. Some are shorthand for a character class, such as \s for whitespace characters or \w for word characters. Others are metacharacters such as ? and * (among others), which have a meaning of their own in a pattern, so they must be escaped with a backslash to match literally: \? for an actual question mark, \* for an actual asterisk.

The @ character can be used directly in the pattern without escaping. It is not special at all since it has no meaning apart from being a printable character. The same is true of ! and the comma when they stand on their own.
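To make the difference concrete, here is a minimal Python sketch (Python is an assumption; the thread does not say which language the exercise uses). The sample strings are made up for illustration. It shows that ? and * need a backslash to match literally, while @ and ! do not.

```python
import re

# "?" and "*" are metacharacters, so a literal match needs a backslash.
assert re.search(r"\?", "really?") is not None
assert re.search(r"\*", "2 * 3") is not None

# "@" and "!" have no special meaning here and can be written as-is.
handle = re.search(r"@\w+!", "hello @user!")
print(handle.group())  # @user!

# re.escape() adds the backslashes wherever they are needed.
literal = re.escape("what?*")
print(literal)  # what\?\*
print(re.fullmatch(literal, "what?*") is not None)  # True
```

re.escape() is handy when the text to match comes from user input and may contain metacharacters you did not anticipate.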

The lesson should have a link in the narrative to a table of regex special characters and their meanings.

In this exercise, why do we have to add the “+” sign to “[a-zA-Z0-9]”?

I want to know the answer too!

The + special character is known as a quantifier.

Quantifiers

| Quantifier | Legend | Example | Sample Match |
|---|---|---|---|
| `+` | One or more | `Version \w-\w+` | Version A-b1_1 |
| `{3}` | Exactly three times | `\D{3}` | ABC |
| `{2,4}` | Two to four times | `\d{2,4}` | 156 |
| `{3,}` | Three or more times | `\w{3,}` | regex_tutorial |
| `*` | Zero or more times | `A*B*C*` | AAACC |
| `?` | Once or none | `plurals?` | plural |

https://www.rexegg.com/regex-quickstart.html#quantifiers
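As a rough illustration of why the exercise needs the + after [a-zA-Z0-9], the Python sketch below compares the class with and without the quantifier. The sample string "user42" is made up; the exercise's real input is not shown in this thread.

```python
import re

text = "user42"

# Without a quantifier the class matches exactly one character.
one = re.match(r"[a-zA-Z0-9]", text).group()

# With "+" it keeps matching as long as characters fit the class.
many = re.match(r"[a-zA-Z0-9]+", text).group()

print(one)   # u
print(many)  # user42

# fullmatch() only succeeds when the whole string fits the pattern.
print(re.fullmatch(r"[a-zA-Z0-9]", text))               # None
print(re.fullmatch(r"[a-zA-Z0-9]+", text) is not None)  # True
```

So without the +, a pattern built on that class can only account for a single letter or digit at that position.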

Hi there,

In Sweden we have 3 more letters after Z: Å, Ä + Ö

How do I manage that writing code? :slight_smile:

I was wondering the other day about str.isalpha(), which, if memory serves, came up in a question about Chinese characters being recognized as letters. Perhaps the same applies to Swedish?

@ionatan will be better informed than me, as I believe he is from Sweden as well. In Python 3 the default string object is Unicode, so one can well imagine the string methods working on every character that is alphabetic (a letter) in nature. Something to look into.

Unicode HOWTO
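A quick way to look into it is the Python 3 sketch below. The word "smörgåsbord" is just a sample, and the explicit character class is one possible approach rather than the only one. It assumes the letters are stored as single precomposed characters, which is how they are normally typed.

```python
import re

word = "smörgåsbord"

# str.isalpha() is Unicode-aware in Python 3.
print(word.isalpha())  # True

# For str patterns, \w also covers non-ASCII letters by default.
print(re.fullmatch(r"\w+", word) is not None)  # True

# An ASCII-only range stops at the first Swedish letter.
print(re.match(r"[a-zA-Z]+", word).group())  # sm

# Listing the extra letters explicitly also works.
swedish = re.compile(r"[a-zA-ZåäöÅÄÖ]+")
print(swedish.fullmatch(word) is not None)  # True
```

Note that \w also accepts digits and the underscore, so the explicit class is the stricter choice when only letters should pass.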

Thanks for providing this information, but I couldn’t associate the examples with their sample matches. If you could explain this to me, it would be very helpful.

  1. Version A-b1_1
    The pattern expects the literal text ‘Version ’ (with its trailing space), then one word character, a dash (-), and one or more word characters. The + applies only to the final \w; were we to use the * quantifier there instead, the characters after the dash would become optional, though the dash itself and ‘Version’ would still be required. A word character is any one of [A-Za-z0-9_], if I’m not mistaken. The dash had to be written into the pattern by hand, since \w does not match it.
  2. ABC
    The pattern expects exactly three non-digit characters.
  3. 156
    The pattern expects 2 to 4 digit characters in the sample.
  4. regex_tutorial
    The pattern expects at least three characters, all of which must be word characters.
  5. AAACC
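    The sample only matches in full if each letter carries its own quantifier, as in A*B*C*: zero or more of A, then zero or more of B, then zero or more of C, in that order. Here there are three A’s, no B, and two C’s. Read literally as ABC*, the pattern would need exactly one A followed by one B, which AAACC does not have.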
  6. plural
    The pattern expects the word ‘plural’, optionally followed by a single ‘s’; the ‘s’ may be absent but cannot repeat.
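The rundown above can be checked with a short Python sketch. It assumes the fifth pattern is A*B*C* (each letter with its own quantifier), since a literal ABC* cannot match AAACC; the variable names are only for illustration.

```python
import re

# (pattern, sample) pairs from the quantifier table; the fifth pattern is
# written with a quantifier on each letter, which is an assumption.
cases = [
    (r"Version \w-\w+", "Version A-b1_1"),
    (r"\D{3}", "ABC"),
    (r"\d{2,4}", "156"),
    (r"\w{3,}", "regex_tutorial"),
    (r"A*B*C*", "AAACC"),
    (r"plurals?", "plural"),
]

# fullmatch() requires the whole sample to fit the pattern.
results = [re.fullmatch(p, s) is not None for p, s in cases]
for (p, s), ok in zip(cases, results):
    print(f"{p!r:20} {s!r:18} {ok}")

# The literal three-letter reading does not match the sample.
print(re.fullmatch(r"ABC*", "AAACC"))  # None
```

Every pair reports True, which is a useful sanity check when a table's example and sample match seem not to line up.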

That’s a kind of off-the-cuff rundown. Be sure to read up on regex patterns if this is something that interests you. It can take years to master the engine, so the more exposure you give yourself, the more it becomes second nature. I’m not there, to be sure. One still uses basic patterns in .replace(), .test(), .match() when there is a measure of confidence.

Be sure to include JavaScript’s built-in RegExp object in your reading, study and practice.