bpo-30228: TextIOWrapper uses abs_pos, not tell() by vstinner · Pull Request #1385 · python/cpython

@vstinner

The TextIOWrapper constructor now reads the abs_pos attribute of
BufferedWriter and BufferedRandom directly instead of calling the
tell() method, which avoids one lseek() syscall on open(fname, "w")
and open(fname, "w+").

Move the buffered structure to _iomodule.h and rename it to
_PyIO_buffered. Also add "pythread.h" to _iomodule.h, which is needed
by the _PyIO_buffered lock.
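
For context, here is a minimal sketch of the value the constructor is after: the position of the underlying buffered object right after open(). It assumes (based on the pure-Python _pyio implementation, not on this PR's diff) that the position is used to tell whether a writable stream starts at offset 0, for example so that appending with a BOM-writing codec does not emit a second BOM. The helper name demo_start_position and the temporary file are hypothetical.

```python
import codecs
import os
import tempfile


def demo_start_position():
    """Return (position after "w" open, position after "a" open, file bytes)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # "w" truncates the file, so the buffered position starts at 0.
        with open(path, "w", encoding="utf-16") as f:
            pos_w = f.buffer.tell()
            f.write("a")
        # "a" starts at the end of the existing data (BOM + one character).
        with open(path, "a", encoding="utf-16") as f:
            pos_a = f.buffer.tell()
            f.write("b")
        with open(path, "rb") as f:
            data = f.read()
    finally:
        os.unlink(path)
    return pos_w, pos_a, data


pos_w, pos_a, data = demo_start_position()
print(pos_w, pos_a, data.count(codecs.BOM_UTF16))
```

The point of the PR is that the same position is already cached by the buffered object when it is created, so asking for it again through tell() costs an extra lseek().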

https://bugs.python.org/issue30228

@vstinner

The TextIOWrapper constructor now gets directly the abs_pos attribute of BufferedWriter and BufferedRandom instead of calling the tell()

This change means that TextIOWrapper becomes inconsistent if the file descriptor is moved directly using os.lseek()... but BufferedReader/BufferedWriter don't detect it either when the file descriptor is moved directly, do they? I mean, the cached abs_pos attribute already has this bug, doesn't it?

@vstinner

@pitrou: Would you mind reviewing this one? Does it look like an acceptable optimization?

@vstinner

@serhiy-storchaka: Same questions. Would you mind reviewing this one? Does it look like an acceptable optimization?

@pitrou

@Haypo: this looks like an acceptable optimization, but the question is whether it brings any significant speedup.

pitrou

What if it's not seekable?

We only go into this path if self->seekable is set.

pitrou

What about BufferedReader?

We only go into this path if self->encoder is set, and self->encoder is only set if the buffer is writable.
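
The two conditions from the replies above (the stream is seekable, and an encoder exists because the buffer is writable) can be sketched at the Python level. This is only an approximation of the C-level checks on self->seekable and self->encoder; the helper name constructor_would_query_position is hypothetical.

```python
import io
import os


def constructor_would_query_position(buffer):
    """Approximate the guard: seekable and writable (so an encoder is set)."""
    return buffer.seekable() and buffer.writable()


# BufferedWriter over a seekable raw stream: both conditions hold.
writer = io.BufferedWriter(io.BytesIO())

# BufferedReader is not writable, so no encoder would be created.
reader = io.BufferedReader(io.BytesIO(b"data"))

# The write end of a pipe is writable but not seekable.
r_fd, w_fd = os.pipe()
pipe_writer = open(w_fd, "wb")

results = (
    constructor_would_query_position(writer),
    constructor_would_query_position(reader),
    constructor_would_query_position(pipe_writer),
)
print(results)

pipe_writer.close()
os.close(r_fd)
```

Under these assumptions, only the first case reaches the code path changed by this PR, which matches the answers given for the non-seekable and BufferedReader questions.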

@vstinner

@pitrou: I replied to your comments. So, what do you think? Is my PR safe?

@pitrou

As I said above: this looks like an acceptable optimization, but the question is whether it brings any significant speedup :-)

Of course I'm not against making optimizations in buffered I/O. I also know that buffering can be tricky (being responsible for the data corruption issue in Python 3.2 made me cautious about this!). But if this can increase performance significantly then ok (and you'll bear the responsibility of any regression ;-)).
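
One rough way to check whether the saved lseek() matters is to time open(fname, "w") before and after the change. The sketch below is a hypothetical micro-benchmark, not the measurement used in this PR; the function name time_open_for_write and its parameters are illustrative, and the results depend heavily on the filesystem and OS.

```python
import os
import tempfile
import timeit


def time_open_for_write(repeat=5, number=200):
    """Return the best observed time, in seconds, of one open(path, "w") + close."""
    fd, path = tempfile.mkstemp()
    os.close(fd)

    def open_close():
        with open(path, "w"):
            pass

    try:
        # Take the minimum over several runs to reduce scheduling noise.
        best = min(timeit.repeat(open_close, repeat=repeat, number=number))
    finally:
        os.unlink(path)
    return best / number


per_call = time_open_for_write()
print(f"{per_call * 1e6:.2f} us per open/close")
```

Running this with an interpreter built from each side of the change gives a first impression of the difference; a syscall tracer such as strace would confirm whether the lseek() call itself is gone.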

The TextIOWrapper constructor now reads the private abs_pos attribute
of BufferedWriter and BufferedRandom directly instead of calling the
tell() method, which avoids one lseek() syscall on open(fname, "w")
and open(fname, "w+").

Move the buffered structure to _iomodule.h and rename it to
_PyIO_buffered. Also add "pythread.h" to _iomodule.h, which is needed
by the _PyIO_buffered lock.
