Importing a csv file in Python with numpy - HELP!


#1

Hey fellows,

I want to import a customer_data CSV file into a Python array with NumPy.

Here is the poll data:

I tried importing the data with the following code:

import numpy as np

# Load the data from the CSV file (exported from Excel) into a NumPy array
total_customer_data = np.genfromtxt('ML_Project_1_Linear_Regression_CostumerData.csv', delimiter=';', dtype=None)

print(total_customer_data)

However, the output is then:
[[ 1. nan nan]
[ 2. 5. 4.]
[ 3. nan nan]
[ 4. 5. nan]
[ 5. nan nan]
[ 6. 5. nan]
[ 7. nan nan]
[ 8. 5. nan]
[ 9. nan nan]
[10. nan 3.]
[11. nan nan]
[12. nan nan]
[13. 4. 4.]
[14. nan nan]
[15. 4. nan]
[16. 5. nan]
[17. 4. nan]
[18. nan nan]
[19. nan 2.]
[20. 5. nan]]

When I use “dtype=None” in genfromtxt, I get the following output:
/Users/anwender/Desktop/Machine Learning Projects/Project_1_CustomerLoyaltyPrediction/Project_1_ML_Code.py:6: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
total_customer_data = np.genfromtxt('ML_Project_1_Linear_Regression_CostumerData.csv', delimiter = ";", dtype=None)
[( 1, b'2,8', b'3,75') ( 2, b'5', b'4') ( 3, b'3,6', b'2,5')
( 4, b'5', b'4,25') ( 5, b'3,8', b'3,75') ( 6, b'5', b'4,63')
( 7, b'4,6', b'3,75') ( 8, b'5', b'4,88') ( 9, b'2,2', b'1,75')
(10, b'2,8', b'3') (11, b'4,4', b'2,5') (12, b'3,6', b'3,88')
(13, b'4', b'4') (14, b'4,4', b'3,88') (15, b'4', b'3,88')
(16, b'5', b'4,5') (17, b'4', b'3,5') (18, b'2,8', b'3,13')
(19, b'2,6', b'2') (20, b'5', b'2,75')]

I also tried converting every value in Excel into a number (float type) before importing it into Python. However, my output is then:
/Users/anwender/Desktop/Machine Learning Projects/Project_1_CustomerLoyaltyPrediction/Project_1_ML_Code.py:6: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
total_customer_data = np.genfromtxt('ML_Project_1_Linear_Regression_CostumerData.csv', delimiter = ";", dtype=None)
[[b'1,00' b'2,80' b'3,75']
[b'2,00' b'5,00' b'4,00']
[b'3,00' b'3,60' b'2,50']
[b'4,00' b'5,00' b'4,25']
[b'5,00' b'3,80' b'3,75']
[b'6,00' b'5,00' b'4,63']
[b'7,00' b'4,60' b'3,75']
[b'8,00' b'5,00' b'4,88']
[b'9,00' b'2,20' b'1,75']
[b'10,00' b'2,80' b'3,00']
[b'11,00' b'4,40' b'2,50']
[b'12,00' b'3,60' b'3,88']
[b'13,00' b'4,00' b'4,00']
[b'14,00' b'4,40' b'3,88']
[b'15,00' b'4,00' b'3,88']
[b'16,00' b'5,00' b'4,50']
[b'17,00' b'4,00' b'3,50']
[b'18,00' b'2,80' b'3,13']
[b'19,00' b'2,60' b'2,00']
[b'20,00' b'5,00' b'2,75']]

Does anyone have an idea how to create an array that looks like my first output (the one with the “nan”s) or like the last output, but where every value is a float, without the “VisibleDeprecationWarning” and without the “b'” before the values?

Thank you so much for your help!

Regards!


#2

Did you try changing the decimal separator from a comma (“,”) to a point (“.”)?
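A minimal sketch of that idea, assuming the file looks like the output in #1 (“;” between fields, “,” as decimal separator). The three sample rows are copied from that output and inlined as a string so the snippet runs on its own; the variable names are only for illustration:

```python
import io

import numpy as np

# Sample rows in the same layout as the file from #1:
# ";" separates the fields, "," is the decimal separator.
raw = "1;2,8;3,75\n2;5;4\n3;3,6;2,5\n"

# Swap the decimal comma for a point on every line before NumPy parses it.
# This is safe here only because "," never appears as a field delimiter.
lines = [line.replace(",", ".") for line in io.StringIO(raw)]

# genfromtxt accepts a list of strings and treats each one as a line.
data = np.genfromtxt(lines, delimiter=";")

print(data)
```

For the real file, the same replace could be applied to the lines of an opened file, for example `open(path, encoding="utf-8")`, instead of `io.StringIO(raw)`. Because the default float dtype is used, no byte strings are read in.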


#3

Actually, the delimiter is “;” so changing it to “,” or “.” doesn’t work either :confused:


#4

Have you tried using the read_csv function from the pandas package?

Something like this:

import pandas as pd
df = pd.read_csv('myfile.csv', sep=';', decimal=',', header=None)

This gives you a DataFrame, which is easier to manipulate than a plain NumPy array.
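For a file like the one in #1, a runnable sketch could look as follows. It assumes “;” as the field separator and “,” as the decimal separator, which is what the output in #1 suggests. The sample rows are inlined instead of read from disk, and the final conversion to a NumPy array is optional:

```python
import io

import pandas as pd

# Sample rows in the same layout as the file from #1.
raw = "1;2,8;3,75\n2;5;4\n3;3,6;2,5\n"

# sep=";" splits the fields, decimal="," parses "2,8" as the float 2.8,
# and header=None tells pandas that the first row is data, not column names.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", header=None)

# If a plain NumPy float array is needed afterwards:
data = df.to_numpy(dtype=float)

print(df)
print(data)
```

With the real file, the path would simply replace `io.StringIO(raw)`.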


#5

Thank you very much - it works perfectly!