Why can't I read a file twice?

I’ve noticed that when I call “file_object.read()” and then call any other read operation on the same file object a few lines later, the second call produces no output. I first noticed this while writing myself a file generator (since I do my lessons in PyCharm, then copy them over to check them) as a way to practice what I’d learned in the course. The loop “for line in exercise_file.readlines():” behaved as expected at first, but when I added a check for unwanted characters that uses “exercise_file.read()” further up in the file, I stopped getting output from my for loop.

import re
# Please excuse the variable names, I know they're terrible
u = '10'  # Unit number
l = '01'  # Lesson number
e = 1     # Exercise number - will be incremented automatically
exercises = 'exercises.txt'  # File containing names of exercises, one per line
init_string = '"""\n\n"""\n\n\n'

with open(exercises) as exercise_file:
    if re.search(r'[^a-zA-Z0-9-_.\n]', exercise_file.read()):  # Exit if exercise names have unwanted chars
        print('Error: Invalid filenames found.')
        exit(1)
    for line in exercise_file.readlines():  # This worked until I added the character check above
        line = line.strip('\n')
        filename = '{u}_{l}_{c}-{line}.py'.format(u=u, l=l, c=str(e).zfill(2), line=line)
        with open(filename, 'w') as lesson_file:
            lesson_file.write(init_string)
        e += 1

I did some troubleshooting and experimenting and narrowed it down to the file reads with some test code and data. What’s more, I discovered that if I read single lines, then subsequent reads to the file object will pick up where the last read left off. For instance:

with open('testing.txt') as test_file:
    print(test_file.readline())
    print(test_file.readline())
    print('------------')
    for line in test_file.readlines():
        print(line)
    print('============')

"""
Output:

Foo
Bar          <Two .readline() calls
------------
Foobar
Baz          < for loop with .readlines()
============
"""

What’s going on under the hood here? Python seems to be “remembering” where I left off reading a file, and gives me no output if I try to read after it reaches the end. Can I make it start reading from the beginning again without closing and reopening the file? If I can, could I make it read from some other position as well? Any help with understanding this would be greatly appreciated!

Hey, @catacat

This is some very cool code that you’re working with here. Amazing!

I/O in Python is handled by the io module. You can read more about it here: link

But your question basically boils down to understanding how I/O, such as reading from a file or writing to a file, works. Python handles this kind of I/O in the form of “streams” of data.

Think of the file as a source of data (a water tank) and the stream (a pipe) as the tool through which you access the data (water). The file_object that you’ve mentioned above is the pipe through which you can access all the data in the file (water tank). So, in order to access the data, you need to open the tap on the file_object (pipe), right? That tap is the readline() method. It keeps the data flowing until it finds the first \n in the stream and then closes. When you call .readline() again, it gives you the next piece of data, up to the next \n, and so on. One place where the analogy breaks down: reading does not remove anything from the file on disk. What gets used up is the stream’s position in the file. Once the stream has reached the end, it has nothing left to hand you, so if you need to read the data again, you need to start the stream over, for example by opening the file again.

I hope I made some sense with this explanation and analogy. LoL.

To summarize, the file is the source of data, file_object is the stream of data, and you consume the data using methods like readline(), which gives you one line, and readlines(), which gives you every remaining line until EOF is found. Once the stream has handed you the data, asking it again gives you nothing; you need to start over from the source before you can work with the same data again.
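A minimal sketch of what this looks like in practice, assuming a small sample file (the name testing.txt and its contents are made up, and it is created in a temporary directory purely for illustration): a second read() on the same file object comes back empty, while opening the file again yields the same contents.

```python
import os
import tempfile


def read_twice(path):
    """Read the whole file twice through one file object."""
    with open(path) as f:
        first = f.read()   # reads everything up to the end of the file
        second = f.read()  # nothing is left after the current position
    return first, second


# Hypothetical sample file, created only for this demonstration
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'testing.txt')
    with open(path, 'w') as f:
        f.write('Foo\nBar\n')

    first, second = read_twice(path)

    # Opening the file again starts from the beginning
    with open(path) as f:
        reopened = f.read()

print(repr(first))     # 'Foo\nBar\n'
print(repr(second))    # ''
print(repr(reopened))  # 'Foo\nBar\n'
```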

The next part I want to address is how you would access the same data again if you wanted to.

To process this data, you can read the file once and store it as a string in your program, then work with that string: process it, make the changes you want to see in the file, and once you’re done, write the string out to another file (so that you don’t overwrite the original file with mistakes).
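Here is a rough sketch of that approach applied to the exercise-name check from the question. The function name load_exercise_names and the sample text are made up for illustration, and the character class is the question's pattern with the hyphen moved to the end so it cannot be mistaken for a range.

```python
import re

# Assumption: same allowed characters as the question's pattern
INVALID_CHARS = re.compile(r'[^a-zA-Z0-9_.\n-]')


def load_exercise_names(text):
    """Validate the text once, then return the names, one per line."""
    if INVALID_CHARS.search(text):
        raise ValueError('Invalid filenames found.')
    return text.splitlines()


# In the real script the text would come from a single read, e.g.:
#     with open('exercises.txt') as exercise_file:
#         text = exercise_file.read()
text = 'hello_world\nfor-loops\n'

names = load_exercise_names(text)
print(names)  # ['hello_world', 'for-loops']
```

Because both the check and the loop work on the same string, the file only has to be read once and its position never matters.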

This is a LOT to take in. Take your time, read the Python documentation and try to understand what “streams” really are.
:smiley:

I believe with this explanation you can work through the rest! Happy coding.

Assuming your file is “seekable” and not sequential-access only (e.g. something piped in), you can move backwards and forwards through its bytes. A typical way to view this is that there’s a cursor that moves as you read through the file, so if, for example, you read a line, the cursor stops at the beginning of the next line, and the next line you read is that one. The alternative would perhaps be even more odd, especially if you consider reading byte by byte (character by character if it’s just standard text).
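To make the cursor idea concrete, here is a small sketch that records the position after each line. The helper name positions_after_each_line and the sample file are made up for illustration; the file is opened in binary mode so that tell() reports plain byte offsets.

```python
import os
import tempfile


def positions_after_each_line(path):
    """Return (seekable, offsets); offsets[i] is the cursor position after line i."""
    offsets = []
    with open(path, 'rb') as f:  # binary mode: tell() is a plain byte offset
        seekable = f.seekable()
        while f.readline():      # readline() returns b'' at the end of the file
            offsets.append(f.tell())
    return seekable, offsets


# Hypothetical sample file, created only for this demonstration
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'testing.txt')
    with open(path, 'wb') as f:
        f.write(b'Foo\nBar\nFoobar\nBaz\n')

    seekable, offsets = positions_after_each_line(path)

print(seekable)  # True
print(offsets)   # [4, 8, 15, 19]
```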

However, you can of course move this current location wherever you like (for something seekable like your text file). The link @goku-kun provided above should cover the details, so read into it and have a look at the options, especially .seek to move the cursor and .tell to find out where it is. To locate specific bytes/characters, read the contents and use .find on the resulting string or bytes, since plain file objects have no .find of their own.
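As a quick sketch of both questions from the original post (rewinding to the start, and jumping to some other position): the function name reread_demo and the sample file are made up for illustration. In text mode, the safe values to pass to seek() are 0 and numbers previously returned by tell(), which is all this example uses.

```python
import os
import tempfile


def reread_demo(path):
    """Read a file, rewind it, then jump back to a remembered position."""
    with open(path) as f:
        f.readline()             # consume the first line
        after_first = f.tell()   # remember where the second line starts
        rest = f.read()          # read from there to the end

        f.seek(0)                # move the cursor back to the beginning
        everything = f.read()

        f.seek(after_first)      # move the cursor to the remembered position
        from_second = f.read()
    return rest, everything, from_second


# Hypothetical sample file, created only for this demonstration
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'testing.txt')
    with open(path, 'w') as f:
        f.write('Foo\nBar\nBaz\n')

    rest, everything, from_second = reread_demo(path)

print(repr(rest))         # 'Bar\nBaz\n'
print(repr(everything))   # 'Foo\nBar\nBaz\n'
print(repr(from_second))  # 'Bar\nBaz\n'
```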

Thank you both :blue_heart:! Between the link @goku-kun provided, the explanation and tip to look into .seek() from @tgrtim, and a bit of my own googling, I’ve learned a lot about file objects, how they work, and new ways to interact with them. I fixed my code by adding ‘exercise_file.seek(0)’ before any point where I need to reread the file and now everything’s working as it should. I even added some new functionality! I fixed those terrible variable names while I was at it, too. :sweat_smile:

"""
To do:
    - Convert unit, lesson, and exercise variables to input()
    - Learn how to pass variables from command line
"""

import re

unit = ''      # Unit number
lesson = ''    # Lesson number
exercise = 1   # Exercise number - will be incremented automatically
exercise_list = 'exercises.txt'  # Filename where exercise names can be found, one per line
file_template = '"""\n\n"""\n\n\n'

if not unit.isnumeric() or not lesson.isnumeric():  # Make sure user didn't forget to set variables
    print("Error: Did you properly specify your unit and lesson numbers?")
    exit(1)

with open(exercise_list) as exercise_file:
    if re.search(r'[^a-zA-Z0-9-_.\n]', exercise_file.read()):  # Check for unwanted characters
        print('Error: The following filenames are invalid:')
        exercise_file.seek(0)
        line_number = 1
        for line in exercise_file.readlines():  # Give user invalid names with line numbers
            if re.search(r'[^a-zA-Z0-9-_.\n]', line):
                print('Line', str(line_number).zfill(2) + ':', line.strip('\n'))
            line_number += 1
        print('Please correct any errors in {} and run the script again.'.format(exercise_list))
        exit(1)
    exercise_file.seek(0)
    for line in exercise_file.readlines():  # Generate a file for each lesson from template
        line = line.strip('\n')
        filename = '{u}_{l}_{e}-{ln}.py'.format(u=unit.zfill(2), l=lesson.zfill(2), e=str(exercise).zfill(2), ln=line)
        with open(filename, 'w') as lesson_file:
            lesson_file.write(file_template)
        exercise += 1

I had left out the seeking part because I was hoping you would read the documentation to better understand everything, but it worked out just fine in the end. Haha. Glad you were able to solve your problem. The variable names in the fixed code are semantic and very cool! I like them. :laughing:
