Hi All. I’m hoping someone can help with my RegEx understanding…
I have files that contain lines (amongst others) of the format:
10 digits
4 spaces
2 digits
6 digits
200 characters/digits
6 digits
4 digits
? character to end of line.
I need to change the “4 digits” to “XXXX” where the first “6 digits” match the second “6 digits” (in the same line). Obviously if a line doesn’t match it should be left alone.
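For reference, the layout above maps almost one-to-one onto a verbose regular expression, one line per field. This is only a sketch, assuming that "4 spaces" means literal spaces, that each line starts with the 10-digit field, and that the 200-character block may hold any character except a newline; the group names `first`, `second` and `code` are made up for illustration.

```python
import re

# One pattern line per field from the question, written with re.VERBOSE.
# Assumption: the 200-character block may hold any character except a newline.
LINE_RE = re.compile(r"""
    \d{10}              # 10 digits
    [ ]{4}              # 4 literal spaces (a bare space is ignored in VERBOSE mode)
    \d{2}               # 2 digits
    (?P<first>\d{6})    # 6 digits (first key)
    .{200}              # 200 characters/digits
    (?P<second>\d{6})   # 6 digits (second key)
    (?P<code>\d{4})     # 4 digits to be masked
    .*                  # anything up to the end of the line
    """, re.VERBOSE)

sample = "1" * 10 + " " * 4 + "22" + "123456" + "a1" * 100 + "123456" + "9876" + "rest"
m = LINE_RE.match(sample)
print(m.group("first"), m.group("second"), m.group("code"))  # → 123456 123456 9876
```

Spaces inside a character class are kept in VERBOSE mode, which is why the four spaces are written as `[ ]{4}` rather than four bare spaces.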
Well, this really depends on what tools you have at your disposal, but you could do something like this in Python:
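import re
# Generate a test line: 10 digits, 4 spaces, 2 digits, 6 digits, 200 characters, 6 digits, 4 digits, trailing text
line = '1'*10 + ' '*4 + '2'*2 + '6'*6 + 'a1'*100 + '6'*6 + '4'*4 + '0'
# Match the required fields from the start of the line. The raw string keeps \d from being
# read as a string escape, ' {4}' means four literal spaces, and .{200} accepts any 200 characters
matches = re.match(r'\d{10} {4}\d{2}(\d{6}).{200}(\d{6})(\d{4})', line)
# Only replace if the line matched at all and the two 6-digit fields are equal;
# slicing at the group's own position works however much text follows the 4 digits
if matches and matches.group(1) == matches.group(2):
    line = line[:matches.start(3)] + 'XXXX' + line[matches.end(3):]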
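The snippet above works on a single test string, while the question is about files in which only some lines have this layout. A minimal sketch of applying the same check line by line, assuming UTF-8 text files; `mask_line` and `mask_file` are hypothetical helper names, not part of any library:

```python
import re

# Same layout as the question; group 3 is the 4-digit field to mask.
# Assumption: the 200-character block may contain any character except a newline.
PATTERN = re.compile(r"\d{10} {4}\d{2}(\d{6}).{200}(\d{6})(\d{4})")


def mask_line(line: str) -> str:
    """Return line with the 4-digit field replaced by XXXX when both 6-digit keys match."""
    m = PATTERN.match(line)
    if m is None or m.group(1) != m.group(2):
        return line  # other record types and lines with differing keys are left alone
    return line[:m.start(3)] + "XXXX" + line[m.end(3):]


def mask_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, masking qualifying lines (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(mask_line(line))  # the trailing newline is preserved by the slicing

# Example call with hypothetical file names:
# mask_file("input.txt", "output.txt")
```

Writing to a separate output file keeps the original intact until you have checked the result.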
Backreferences let you build the replacement string from parts captured by the pattern:
>>> s = 'I am visiting the zoo!'
>>> re.sub(r'visiting the (\w+)', r'leaving the \g<1>', s)
'I am leaving the zoo!'
>>> s2 = 'Bob is visiting the cinema.'
>>> re.sub(r'visiting the (\w+)', r'leaving the \g<1>', s2)
'Bob is leaving the cinema.'
…Not an amazing example, I suppose, since I could have just replaced ‘visiting’ with ‘leaving’; the point is that ‘zoo’ and ‘cinema’ are captured by the group and reused in the replacement.
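The same idea also works inside the pattern itself, which lets a single `re.sub` do the whole job for the original question: `\2` only matches if the first 6-digit field appears again after the 200-character block, so lines whose fields differ are simply not matched and stay as they are. A sketch, assuming the whole file has been read into one string and that the 200-character block never contains a newline; `MASK_RE` and `mask_text` are illustrative names:

```python
import re

# Group 2 captures the first 6-digit key; the in-pattern backreference \2 only
# matches if the same six digits appear again after the 200-character block.
# Group 1 holds everything before the 4-digit field, so \g<1> puts it back unchanged.
MASK_RE = re.compile(r"^(\d{10} {4}\d{2}(\d{6}).{200}\2)\d{4}", re.MULTILINE)


def mask_text(text: str) -> str:
    """Mask the 4-digit field on every qualifying line of text."""
    return MASK_RE.sub(r"\g<1>XXXX", text)


same = "1" * 10 + " " * 4 + "22" + "123456" + "a" * 200 + "123456" + "9876" + "end"
diff = "1" * 10 + " " * 4 + "22" + "123456" + "a" * 200 + "654321" + "9876" + "end"
print(mask_text(same + "\n" + diff + "\nsome other line\n"))
```

Because `.` does not match a newline and `re.MULTILINE` makes `^` match at the start of each line, a match cannot run across two lines.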