Separating text inside " " using a comma


#1

Good morning. I am trying to separate the following text using the code below, but it's not working. Kindly advise me.

import re
with open('sample.txt') as f:
    for line in f:
        print(re.split(r'(?<=")\s(?=")', line))

My sample.txt file contains the following:

"Damka Properties, Nakuru+254722577965info@damkaproperties.co.ke"
"Rich Mash Egg Sellers, Kitengela0702819365richmashm@gmail.com"
"Four Paws Veterinary Clinic, Nairobi0740130438fourpawsvetclinics@gmail.com"
"Copiac Digital Systems, Nairobi0714298241sales@copiac.co.ke"
"Safbook, Nairobi+254720306424support@safbook.com"


#2

I would like to get the following output:

"Damka Properties, Nakuru+254722577965info@damkaproperties.co.ke", "Rich Mash Egg Sellers, Kitengela0702819365richmashm@gmail.com", "Four Paws Veterinary Clinic, Nairobi0740130438fourpawsvetclinics@gmail.com", "Copiac Digital Systems, Nairobi0714298241sales@copiac.co.ke", "Safbook, Nairobi+254720306424support@safbook.com"

I get the following using my code:

['"BizTech Limited, Nairobi0792009522info@kenyaplex.com"\n']
['"Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"\n']
['"Calla Marketing Kenya, Nairobi0720023593hello@callamarketing.co.ke"\n']
['"Damka Properties, Nakuru+254722577965info@damkaproperties.co.ke"\n']
['"Rich Mash Egg Sellers, Kitengela0702819365richmashm@gmail.com"\n']
['"Four Paws Veterinary Clinic, Nairobi0740130438fourpawsvetclinics@gmail.com"\n']
['"Copiac Digital Systems, Nairobi0714298241sales@copiac.co.ke"\n']
['"Safbook, Nairobi+254720306424support@safbook.com"\n']
['\n']
['\n']

which is not what I need


#3

But if you want to split by comma, you can simply use the built-in split() function with the right argument. Why would you want to use re.split() for this?


#4

I am getting an error when I try to use split:

text =["BizTech Limited, Nairobi0792009522info@kenyaplex.com"
"Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"
"Calla Marketing Kenya, Nairobi0720023593hello@callamarketing.co.ke"
"Damka Properties, Nakuru+254722577965info@damkaproperties.co.ke"
"Rich Mash Egg Sellers, Kitengela0702819365richmashm@gmail.com"
"Four Paws Veterinary Clinic, Nairobi0740130438fourpawsvetclinics@gmail.com"
"Copiac Digital Systems, Nairobi0714298241sales@copiac.co.ke"
"Safbook, Nairobi+254720306424support@safbook.com"]
entries = split("," , text)
entries


NameError Traceback (most recent call last)
in ()
8 "Copiac Digital Systems, Nairobi0714298241sales@copiac.co.ke"
9 "Safbook, Nairobi+254720306424support@safbook.com"]
---> 10 entries = split("," , text)
11 entries

NameError: name 'split' is not defined


#5

I should have been paying a bit more attention: .split() is a method that belongs to strings, so you will need to call it on the individual string elements of the list.
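
A minimal sketch of what that could look like, assuming the entries are already separate strings in a list (the names `entries` and `parts` are only for illustration and are not from the code above):

```python
# Assumed sample data: two of the entries from the thread, as separate strings.
entries = [
    "BizTech Limited, Nairobi0792009522info@kenyaplex.com",
    "Damka Properties, Nakuru+254722577965info@damkaproperties.co.ke",
]

# str.split is called on each string, not on the list itself.
# The second argument (maxsplit=1) splits at the first comma only.
parts = [entry.split(",", 1) for entry in entries]

for name, rest in parts:
    print(name.strip(), "|", rest.strip())
```

Calling `split(",", text)` as a bare function fails because no such built-in exists; the method form `some_string.split(",")` is the one available on every string.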


#6

The code below works fine, but mine does not. I don't understand why:

input=r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
re.findall('".+?"', input)  # works fine


input=r'"BizTech Limited, Nairobi0792009522info@kenyaplex.com"
"Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"
"Calla Marketing Kenya, Nairobi0720023593hello@callamarketing.co.ke"
"Damka Properties, Nakuru+254722577965info@damkaproperties.co.ke"
"Rich Mash Egg Sellers, Kitengela0702819365richmashm@gmail.com"
"Four Paws Veterinary Clinic, Nairobi0740130438fourpawsvetclinics@gmail.com"
"Copiac Digital Systems, Nairobi0714298241sales@copiac.co.ke"
"Safbook, Nairobi+254720306424support@safbook.com"'

re.findall('".+?"', input)  # does not work

#7

Look:

input=r'"BizTech Limited, Nairobi0792009522info@kenyaplex.com"
"Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"
"Calla Marketing Kenya, Nairobi0720023593hello@callamarketing.co.ke"
"Damka Properties, Nakuru+254722577965info@damkaproperties.co.ke"
"Rich Mash Egg Sellers, Kitengela0702819365richmashm@gmail.com"
"Four Paws Veterinary Clinic, Nairobi0740130438fourpawsvetclinics@gmail.com"
"Copiac Digital Systems, Nairobi0714298241sales@copiac.co.ke"
"Safbook, Nairobi+254720306424support@safbook.com"'

print input # or print(input) if you use python3

Your input variable (a terrible name, since input() is already a built-in function) only contains one line of the data set.

Maybe you should learn some Python first?


#8

I found the code above here:

but it is still not working for me.


#9

Sure, but running your code raises a syntax error in your definition of input, so it’s not findall that isn’t working; you haven’t defined the data:

  File "main.py", line 1
    input=r'"BizTech Limited, Nairobi0792009522info@kenyaplex.com"
                                                                 ^
SyntaxError: EOL while scanning string literal

Also, don’t describe a problem as “not working”. Explain instead what happens and how that differs from what you expected (nobody around here reads minds, sadly). That is also how you start reasoning about the problem yourself. Explaining what information you’re missing to fix it yourself is important too, probably more so than the problem itself.
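
For reference, a string literal written with a single pair of quotes has to end on the line where it starts, which is what the SyntaxError points at. Here is a sketch of one way to define multi-line data, assuming a triple-quoted string is acceptable (the names `data` and `entries` are illustrative, chosen to avoid shadowing the built-in input()):

```python
import re

# Triple quotes allow a string literal to span several lines.
data = '''"BizTech Limited, Nairobi0792009522info@kenyaplex.com"
"Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"
"Safbook, Nairobi+254720306424support@safbook.com"'''

# ".+?" matches lazily from one double quote to the next. By default "."
# does not match a newline, so each match stays within a single line.
entries = re.findall(r'".+?"', data)

print(len(entries))
print(", ".join(entries))
```

Joining the matches with ", " gives the quoted entries on one line, separated by commas, which seems to be the output asked for at the start of the thread.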


#10

This works, but not in the best way. I need the output to have only the double quotes, but it has both single and double quotes:

input='"BizTech Limited, Nairobi0792009522info@kenyaplex.com" "Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"'

re.findall('".+?"', input)

['"BizTech Limited, Nairobi0792009522info@kenyaplex.com"',
'"Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"']

I would want the results to appear like this:

["BizTech Limited, Nairobi0792009522info@kenyaplex.com",
"Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"]

Any suggestions?


#11

There aren’t any single quotes in your string. That’s like saying you want to remove square brackets from a list (doesn’t make sense, there are no square brackets in a list, they’re not elements in the list)

Maybe you’re really asking how to print a string without including the quotes around it, which is done like so:

print('hello') # note that the quotes are not printed

Do note that there are double quotes in your string though. Maybe those are what you want to remove.
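
A small sketch of the difference, assuming Python 3: the single quotes appear only when the list (or a string's repr) is displayed, not when the string itself is printed. The name `bare` is just for illustration.

```python
entries = ['"BizTech Limited, Nairobi0792009522info@kenyaplex.com"']

# Displaying the list shows each string's repr, which adds quotes around it.
print(entries)

# Printing the string itself shows only its characters, double quotes included.
print(entries[0])

# strip('"') removes the double quotes at both ends, if that is the goal.
bare = entries[0].strip('"')
print(bare)
```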


#12

My ultimate aim is to apply a regex to the separated lines. You see, this code:

items = [
    "BizTech Limited, Nairobi0792009522info@kenyaplex.com",
    "Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"
]

print(len(items))

import re
businesses = []  # ?

for item in items:
    input = item
    # x = re.match('(^[A-Za-z ,]+)'+'([+ 0-9]+)'+'([a-z@.]+)', a)
    regExpStr = re.match('(^[A-Za-z ,]+)'+'([+ 0-9]+)'+'([a-z@.]+)', input)

    # print(regExpStr)
    # separate return value
    businessNameLocation = regExpStr.group(1)
    businessName = businessNameLocation.split(",")[0]
    location = businessNameLocation.split(",")[1]
    # print(businessName)
    # returns businessname plus location
    mobile = regExpStr.group(2)
    # return mobile number
    email = regExpStr.group(3)
    # return email
    # print(businessName, location, mobile, email)
    businessDataList = {
        'name': businessName,
        'location': location,
        'mobile': mobile,
        'email': email
    }
    # businessesJsonData = json.dumps(businessDataList)
    businesses.append(businessDataList)

# allbusinesess = json.dumps(businesses)
# print(allbusinesess)
# for index in businesses:
    # print(index["mobile"])
    # print(index["location"])

print(businesses)

Works perfectly, but:

items = [
    ' "BizTech Limited, Nairobi0792009522info@kenyaplex.com"',
    ' "Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"'
]

print(len(items))

import re
businesses = []  # ?

for item in items:
    input = item
    # x = re.match('(^[A-Za-z ,]+)'+'([+ 0-9]+)'+'([a-z@.]+)', a)
    regExpStr = re.match('(^[A-Za-z ,]+)'+'([+ 0-9]+)'+'([a-z@.]+)', input)

    # print(regExpStr)
    # separate return value
    businessNameLocation = regExpStr.group(1)
    businessName = businessNameLocation.split(",")[0]
    location = businessNameLocation.split(",")[1]
    # print(businessName)
    # returns businessname plus location
    mobile = regExpStr.group(2)
    # return mobile number
    email = regExpStr.group(3)
    # return email
    # print(businessName, location, mobile, email)
    businessDataList = {
        'name': businessName,
        'location': location,
        'mobile': mobile,
        'email': email
    }
    # businessesJsonData = json.dumps(businessDataList)
    businesses.append(businessDataList)

# allbusinesess = json.dumps(businesses)
# print(allbusinesess)
# for index in businesses:
    # print(index["mobile"])
    # print(index["location"])

print(businesses)

throws an error, because of this:

' "BizTech Limited, Nairobi0792009522info@kenyaplex.com"',
' "Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"'

So my interest was just to have the items in double quotes only, unless there is another way to get similar results.


#13

They are only in double quotes. Look:

'"'

This string has one character, it is a double quote. This string does not contain any single quotes.

Your strings look like this:

'"stuff"'

There are only double quotes in the string. The single quotes are not part of the string, so they cannot be removed: there aren’t any. What you can do, however, is remove the double quotes, because those are in the string.
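
A minimal sketch of that idea, assuming the same pattern as in the earlier post and that each item may carry a leading space and surrounding double quotes (the `parse_entry` helper and the `PATTERN` name are hypothetical, not from the thread):

```python
import re

# Same pattern as in the post above: name and location, then phone, then email.
PATTERN = re.compile(r'(^[A-Za-z ,]+)([+ 0-9]+)([a-z@.]+)')

def parse_entry(item):
    """Strip surrounding whitespace and double quotes, then apply the pattern."""
    cleaned = item.strip().strip('"')
    match = PATTERN.match(cleaned)
    if match is None:
        return None  # the item does not fit the expected layout
    # partition splits at the first comma and never raises if it is missing
    name, _, location = match.group(1).partition(",")
    return {
        "name": name.strip(),
        "location": location.strip(),
        "mobile": match.group(2).strip(),
        "email": match.group(3),
    }

items = [
    ' "BizTech Limited, Nairobi0792009522info@kenyaplex.com"',
    ' "Cable Options Limited, Nairobi0722 308 793info@cableoptions.co.ke"',
]
businesses = [parse_entry(item) for item in items]
print(businesses)
```

Checking for `None` before calling `.group()` matters here: when re.match finds nothing it returns `None`, and calling `.group(1)` on that raises an AttributeError.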


#14

Sorry, ignore the print(len(items)); it should have a hash (#) before it. It’s not part of the code.


#15

Thank you for your assistance. Let me try it out.