What does the `newline=''` argument do?

Could somebody please explain to me, in layman’s terms, what the newline='' argument does? The documentation is written in language that only takes me down a rabbit hole of words I don’t understand yet.

14 Likes

Footnotes

[1] If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.

2 Likes

That’s the bit which I do not understand. What is meant by “on platforms that use \r\n line endings on write an extra \r will be added”? Does it mean that the code will accidentally start writing at the beginning of the same line, because of incorrect interpretation? And why do we set “newline” to an empty string as an argument? Am I simply missing some logical connection, or is the answer to that very technical, and I shouldn’t worry about it for now?

7 Likes

On some platforms (Windows in particular), Python’s text mode converts each \n it writes into \r\n, and that conflicts with the line endings the csv module writes by itself. It’s sufficient for now to just accept that there is good reason for the recommended implementation, and in due course you will get more into the technical side of things. I would just pass on this question for now.

7 Likes

It would appear to switch off the automatic newline translation so there are not two line endings in a row when the module inserts its own. An empty string is the value to use (a space is not an accepted value), and it leaves the data unaltered.

2 Likes

I’ve found an interesting explanation in the documentation.

LINK

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

  • When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
  • When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
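
The two rules quoted above can be checked with a small experiment. This is only a sketch: the helper names written_bytes and read_lines are made up for illustration, and the temporary files exist just so the raw bytes can be inspected.

```python
import os
import tempfile


def written_bytes(newline):
    """Write 'a\\nb\\n' in text mode and return the raw bytes that reach the disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            f.write("a\nb\n")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def read_lines(raw, newline):
    """Store raw bytes, then return the lines that text mode hands back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(raw)
        with open(path, "r", newline=newline) as f:
            return f.readlines()
    finally:
        os.remove(path)


# Writing: '' and '\n' leave '\n' alone, other values replace it.
print(written_bytes(""))      # b'a\nb\n'
print(written_bytes("\r\n"))  # b'a\r\nb\r\n'
print(written_bytes(None))    # uses os.linesep, so it depends on the platform

# Reading a file that mixes all three line endings.
mixed = b"one\r\ntwo\rthree\n"
print(read_lines(mixed, None))  # ['one\n', 'two\n', 'three\n']
print(read_lines(mixed, ""))    # ['one\r\n', 'two\r', 'three\n']
```

With newline='' nothing is rewritten in either direction, which is why the csv module can safely do its own line-ending handling on top of it.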
19 Likes

Could you give an example of an instance where not using newline='' would cause an issue?

Not without repeating the same information given above, and in other Stack Overflow type questions.

i am still confused. any help?

4 Likes

I agree with what @mtf said above: let us just “mug it up” for now as standard practice when parsing a CSV in this use case. It will become clearer in the future when we start working with Python irl.

1 Like

Our data might itself contain a line break (a \n character inside a field), and that possibility is why we pass the newline='' keyword argument to the open() function.
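
Here is a minimal sketch of that case. The field content and the temporary file are invented for illustration; the point is only to compare reading with and without newline=''.

```python
import csv
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

note = "line one\r\nline two"  # a field that itself contains a line break

# Write the file the recommended way.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "note"])
    writer.writerow(["1", note])

# Read it back with newline='': the field comes back unchanged.
with open(path, newline="") as f:
    rows_with = list(csv.reader(f))

# Read it back without newline='': the \r\n inside the field is rewritten.
with open(path) as f:
    rows_without = list(csv.reader(f))

os.remove(path)

print(rows_with[1][1] == note)   # True
print(repr(rows_without[1][1]))  # 'line one\nline two'
```

Both reads return two rows, but only the first one preserves the field exactly as it was written.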

1 Like

Hi, I was stuck at understanding the logic behind it too, and by accident I stumbled upon an example which clearly illustrates the use of newline=''. Before proceeding any further, please note that this was all done on a Windows PC with Jupyter Lab, and the result might differ if you are doing your work on Linux or macOS. I am not sure how to reproduce this peculiarity on other operating systems.

Without further ado, let’s create a csv using python:

[input]

import csv  # the ./data folder must already exist

with open('./data/competitions-from-python.csv', 'w', newline='') as f:
    data_writer = csv.writer(f, delimiter=',')
    data_writer.writerow(['Year', 'Event', 'Winner'])  # First line acts as header
    data_writer.writerow(['1995', 'Best Kept Lawn', 'None'])
    data_writer.writerow(['1999', 'Gobstones', 'Welch National'])
    data_writer.writerow(['2006', 'World Cup', 'Burkina Faso'])

Now let’s view it:

[input]

with open("./data/competitions-from-python.csv") as f:
    csv_file = csv.reader(f) 
    comp = []
    for line in csv_file: 
        print(line)
        comp = comp + line 

[output]

['Year', 'Event', 'Winner']
['1995', 'Best Kept Lawn', 'None']
['1999', 'Gobstones', 'Welch National']
['2006', 'World Cup', 'Burkina Faso']

Everything seems as it should. Now, let’s append to this file, but this time we will not use newline=''.

[input]

with open('./data/competitions-from-python.csv', 'a') as f:
    data_writer = csv.writer(f, delimiter=',')
    data_writer.writerow(['2011', 'Butter Cup', 'France'])
    data_writer.writerow(['2013', 'Coffee Cup', 'Brazil'])
    data_writer.writerow(['2006', 'Food Cup', 'Italy'])

Now, let’s read the file again:

[input]

with open("./data/competitions-from-python.csv") as f:
    csv_file = csv.reader(f) 
    comp = []
    for line in csv_file: 
        print(line) 
        comp = comp + line

[output]

['Year', 'Event', 'Winner']
['1995', 'Best Kept Lawn', 'None']
['1999', 'Gobstones', 'Welch National']
['2006', 'World Cup', 'Burkina Faso']
['2011', 'Butter Cup', 'France']
[]
['2013', 'Coffee Cup', 'Brazil']
[]
['2006', 'Food Cup', 'Italy']
[]

See the difference?

Let’s review the short explanation that @mtf gave and see if it makes any sense:

If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings, an extra \r will be added to each line. It should always be safe to specify newline='', since the csv module does its own newline handling.

Windows marks the end of a line with \r\n. The csv writer already ends every row with \r\n on its own, and because the file was opened without newline='', Python translates the \n in that pair into \r\n once more, so each appended row ends up on disk as \r\r\n. When the file is read back, the first \r is treated as the end of the row, so Python creates a list from whatever data came before it. The remaining \r\n is then treated as the end of a second line, but since no data was read on that line, the list is empty. Finally Python goes to the next line, encounters data and creates a list, and the process goes on…
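
For anyone not on Windows, the same effect can probably be reproduced with the sketch below. It rests on one assumption: passing newline='\r\n' when writing stands in for what the default text mode does on Windows, where every '\n' written becomes '\r\n'. The temporary file is just for illustration.

```python
import csv
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

# Assumption: newline='\r\n' imitates the Windows default, where every
# '\n' written in text mode is turned into '\r\n'.
with open(path, "w", newline="\r\n") as f:
    writer = csv.writer(f)
    writer.writerow(["2011", "Butter Cup", "France"])
    writer.writerow(["2013", "Coffee Cup", "Brazil"])

# Look at the raw bytes: each row ends in \r\r\n instead of \r\n.
with open(path, "rb") as f:
    raw = f.read()

# Read the file back the same way as in the example above.
with open(path) as f:
    rows = list(csv.reader(f))

os.remove(path)

print(raw)
print(rows)
```

The raw bytes show \r\r\n after every row, and the list of rows has an empty list after each real row, just like the output above.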

If you take a look at the CSV file in a CSV viewer such as Excel, you won’t find anything wrong with it. Similarly, if you view the CSV file in a text viewer, you may spot the difference if you are using an OS other than Windows, but on Windows the text viewer also does not show any irregularity when reading the .csv file.

8 Likes

this still doesn’t make sense

1 Like

When we were reviewing strings, remember we learned that to escape some characters we would use \ so the interpreter would read them properly? E.g. writing 'Luke, I\'m your father.' We also learned that with Python, in order to break and start a new line, we would use \n, right? Well, turns out, depending on your OS the symbols (also called control characters) used to start a new line can be different:

DOS vs. Unix Line Endings
Text files created on DOS/Windows machines have different line endings than files created on Unix/Linux. DOS uses carriage return and line feed ("\r\n") as a line ending, while Unix uses just line feed ("\n"). You need to be careful about transferring files between Windows machines and Unix machines to make sure the line endings are translated properly.

So if you’re reading or writing a file on Windows with Python, that can cause problems, since Python and the csv module may each try to handle the control characters that signal a new line in the file. @uzairsuria8529348861’s example is great in showing that:

['2011', 'Butter Cup', 'France']
[]
['2013', 'Coffee Cup', 'Brazil']
[]
['2006', 'Food Cup', 'Italy']

(…),
Windows marks the end of a line with \r\n. The csv writer already ends every row with \r\n, and without newline='' Python translates the \n into \r\n again, so each row ends in \r\r\n. On reading, the first \r ends the row and a list is created from the data before it. The remaining \r\n ends a second line, but since no data was read on that line, the list is empty. Finally Python goes to the next line, encounters data and creates a list, and the process goes on…

The way around this issue is passing an argument that switches off each system’s default newline translation. In this case newline='' (an empty string, with no space between the quotes) tells open() to leave line endings exactly as they are, so the csv module alone decides where a row ends.
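
A quick way to see that the value has to be an empty string and not a space is the sketch below. The helper name accepts is made up for illustration; it only reports whether open() takes the given newline value.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)


def accepts(newline):
    """Return True if open() accepts this newline value, False if it raises ValueError."""
    try:
        with open(path, "w", newline=newline):
            return True
    except ValueError:
        return False


space_ok = accepts(" ")  # a single space is not a legal value
empty_ok = accepts("")   # the empty string is

os.remove(path)

print(space_ok)  # False
print(empty_ok)  # True
```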

More about this here and here.

Also, if I could make a suggestion: before starting the Files module on the Python 3 path, taking a sidestep and learning the basics of Command Line and Git (if you’re not familiar with those) has been really helpful for me as a total beginner.

6 Likes

Thanks a lot! Now I finally know what I put in my code.

1 Like

this was super helpful, in combination with all the other input here this concept is very clear now, but this was the paragraph that really clicked. how fortunate that it comes straight from the documentation.

The use of dir() and help() is not encouraged enough in the IDE Codecademy provides on the site.