IndexError: list index out of range - while reading a column


#1



I am trying to import a CSV file. In the code below (part of my code), when I write

for i in credit_training1:
    print(i[2])

it works fine and gives me the age column, but the code below does not work:

for i in credit_training1:
    y = i[2]
    age = int(y)


#2

@harneet411,
could you give us a part of the CSV file?


#3

Hi Leonhard,

Thanks for the reply.

I tried uploading the file, but the website did not allow me to do so.

It would be helpful if you could download the data from the link below:

https://www.kaggle.com/c/GiveMeSomeCredit/data
File Name - cs-training

Regards,
Harneet.


#4

@harneet411
Just give me 2 or 3 lines from the file.


#5

Hi Leonhard,

Please find a few lines of data below:

SeriousDlqin2yrs  RevolvingUtilizationOfUnsecuredLines  age  NumberOfTime30-59DaysPastDueNotWorse  DebtRatio  MonthlyIncome  NumberOfOpenCreditLinesAndLoans  NumberOfTimes90DaysLate  NumberRealEstateLoansOrLines  NumberOfTime60-89DaysPastDueNotWorse  NumberOfDependents
1  0.766126609  45  2  0.802982129  9120  13  0  6  0  2
0  0.957151019  40  0  0.121876201  2600  4  0  0  0  1
0  0.65818014  38  1  0.085113375  3042  2  1  0  0  0
0  0.233809776  30  0  0.036049682  3300  5  0  0  0  0
0  0.9072394  49  1  0.024925695  63588  7  0  1  0  0
0  0.213178682  74  0  0.375606969  3500  3  0  1  0  1
0  0.305682465  57  0  5710  NA  8  0  3  0  0
0  0.754463648  39  0  0.209940017  3500  8  0  0  0  0
0  0.116950644  27  0  46  NA  2  0  0  0  NA
0  0.189169052  57  0  0.606290901  23684  9  0  4  0  2


#6

@harneet411
Normally when we talk about CSV (Comma-Separated Values)
https://en.wikipedia.org/wiki/Comma-separated_values
we expect a comma as the separator.
In your sample:

SeriousDlqin2yrs  RevolvingUtilizationOfUnsecuredLines  age  NumberOfTime30-59DaysPastDueNotWorse  DebtRatio  MonthlyIncome  NumberOfOpenCreditLinesAndLoans  NumberOfTimes90DaysLate  NumberRealEstateLoansOrLines  NumberOfTime60-89DaysPastDueNotWorse  NumberOfDependents
  1  0.766126609  45  2  0.802982129  9120  13  0  6  0  2
  0  0.957151019  40  0  0.121876201  2600  4  0  0  0  1
  0  0.65818014  38  1  0.085113375  3042  2  1  0  0  0
  0  0.233809776  30  0  0.036049682  3300  5  0  0  0  0
  0  0.9072394  49  1  0.024925695  63588  7  0  1  0  0
  0  0.213178682  74  0  0.375606969  3500  3  0  1  0  1
  0  0.305682465  57  0  5710  NA  8  0  3  0  0
  0  0.754463648  39  0  0.209940017  3500  8  0  0  0  0
  0  0.116950644  27  0  46  NA  2  0  0  0  NA
  0  0.189169052  57  0  0.606290901  23684  9  0  4  0  2

As you can see, the values are separated by space characters.
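
To see why the separator matters, here is a small sketch. The two strings are only illustrations built from the first data row, once with spaces and once with commas; I am assuming the real file uses one of the two. Splitting with the wrong separator gives a list with a single element, and asking that list for index 2 raises exactly the IndexError from the title.

```python
# One data row, written once with spaces and once with commas (illustrative strings).
space_line = "  1  0.766126609  45  2  0.802982129  9120  13  0  6  0  2"
comma_line = "1,0.766126609,45,2,0.802982129,9120,13,0,6,0,2"

# split() without arguments splits on any run of whitespace.
space_fields = space_line.split()
print("{} fields, age = {}".format(len(space_fields), space_fields[2]))

# The comma-separated row contains no whitespace, so split() returns one element.
wrong_fields = comma_line.split()
print("{} field".format(len(wrong_fields)))

try:
    wrong_fields[2]
except IndexError as error:
    print("IndexError: {}".format(error))

# With the matching separator the age is at index 2 again.
comma_fields = comma_line.split(",")
print("age = {}".format(comma_fields[2]))
```

So it is worth printing one complete line (or len() of the split list) before indexing into it.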

Now if you receive this file,
you will read it line by line.

The 1st line is meant as the =header=, naming each =element=.

Now suppose you have read the 2nd line and assigned the line string to the variable credit_training1,
thus

credit_training1 = "  1  0.766126609  45  2  0.802982129  9120  13  0  6  0  2"

By using the split() method,
str.split([sep[, maxsplit]]) in https://docs.python.org/2/library/stdtypes.html#string-methods
like

credit_training1 = credit_training1.split()

you get a variable whose value is a list:

credit_training1 = ['1','0.766126609','45','2','0.802982129','9120','13','0','6','0','2']

You can now access the age element directly by using credit_training1[2],
and as the accessed value is of the type =string=,
you convert it from a =string= into an =integer= using int(credit_training1[2]),
thus

age = int(credit_training1[2])

In compact form

# the 2nd line has been read into the variable credit_training1
#
age = int(credit_training1.split()[2])
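
Put together for a whole file, the idea could look like the sketch below. The function name read_ages and the shortened three-column sample lines are made up for illustration, and I am assuming whitespace-separated values as in the sample. It skips the header, skips lines that are too short (those are the ones where [2] would raise IndexError), and skips values such as NA that int() cannot convert.

```python
def read_ages(lines):
    """Return the age column as integers, skipping lines that cannot be parsed."""
    ages = []
    for line_number, line in enumerate(lines, start=1):
        if line_number == 1:
            continue  # the 1st line is the header, not data
        fields = line.split()
        if len(fields) < 3:
            continue  # empty or short line: fields[2] would raise IndexError
        try:
            ages.append(int(fields[2]))
        except ValueError:
            continue  # not a whole number, for example "NA"
    return ages


# Shortened sample (only the first three columns) with an empty line in between.
sample_lines = [
    "SeriousDlqin2yrs  RevolvingUtilizationOfUnsecuredLines  age",
    "  1  0.766126609  45",
    "",
    "  0  0.957151019  40",
]
print(read_ages(sample_lines))
```

An open file object can be passed instead of the list, because iterating over a file also yields one line at a time.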

#7

@harneet411,
Here is a way to make all values of the second line (and of each following line)
accessible by a variable name

##==== first line =======
first_line = "SeriousDlqin2yrs  RevolvingUtilizationOfUnsecuredLines  age  NumberOfTime30_59DaysPastDueNotWorse  DebtRatio  MonthlyIncome  NumberOfOpenCreditLinesAndLoans  NumberOfTimes90DaysLate  NumberRealEstateLoansOrLines  NumberOfTime60_89DaysPastDueNotWorse  NumberOfDependents"
first_line_list = first_line.split()
for i in range(len(first_line_list)):
    prep_var_string = "elem{}".format(i)
    # a "-" sign is not allowed in a variable name,
    # so every "-" is replaced with an underscore "_"
    # (maybe ask the CSV provider to use underscores instead of "-")
    prep_value = first_line_list[i].replace("-", "_")
    prep_def_line = "{}='{}'".format(prep_var_string, prep_value)
    # define the elemX variable, e.g. elem2='age'
    print(prep_def_line)
    exec(prep_def_line)
print(elem2)

##==== second & following lines =======
other_line = "  1  0.766126609  45  2  0.802982129  9120  13  0  6  0  2"
other_line_list = other_line.split()
for i in range(len(other_line_list)):
    prep_var_string = "elem{}".format(i)
    # eval() looks up the column name stored in elemX, giving e.g. "age=45"
    # (a non-numeric value such as NA would raise a NameError in exec)
    prep_var_def = "{}={}".format(eval(prep_var_string), other_line_list[i])
    exec(prep_var_def)

# You now have every word of the 1st line as a variable name with a value
print(age)
print(RevolvingUtilizationOfUnsecuredLines)
print(NumberOfDependents)
print(NumberOfTime30_59DaysPastDueNotWorse)

http://stackoverflow.com/questions/9383740/what-does-pythons-eval-do
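
If you would rather not create variables with exec/eval, a dictionary gives the same access by column name. This is only a sketch of an alternative, not what the code above does; the names header_line, data_line and row are made up for the example. The keys are plain strings, so the "-" in the column names needs no replacing, and a value like NA simply stays a string until you decide how to convert it.

```python
header_line = "SeriousDlqin2yrs  RevolvingUtilizationOfUnsecuredLines  age  NumberOfTime30-59DaysPastDueNotWorse  DebtRatio  MonthlyIncome  NumberOfOpenCreditLinesAndLoans  NumberOfTimes90DaysLate  NumberRealEstateLoansOrLines  NumberOfTime60-89DaysPastDueNotWorse  NumberOfDependents"
data_line = "  0  0.116950644  27  0  46  NA  2  0  0  0  NA"

names = header_line.split()
values = data_line.split()

# Pair every column name with the value at the same position.
row = dict(zip(names, values))

print(row["age"])
print(row["MonthlyIncome"])
print(row["NumberOfTime30-59DaysPastDueNotWorse"])

# Values are still strings; convert only the ones you need.
age = int(row["age"])
```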