[Python-Dev] Mysterious Python pyc file corruption problems
Barry Warsaw
barry at python.org
Fri May 17 18:17:14 CEST 2013
On May 16, 2013, at 02:19 PM, Guido van Rossum wrote:

>Now consider the following scenario. It involves *three* processes.
>
>- Two unrelated processes both start and want to import the same module.
>- They both see the .pyc file is missing/corrupt and decide to write it.
>- The first process finishes writing the file, writing the correct header.
>- Now a third process wants to import the module, sees the valid
>header, and starts reading the file.
>- However, while this is going on, the second process gets ready to
>write the file.
>- The second process truncates the file, writes the dummy header, and
>then stalls.
>- At this point the third process (which thought it was reading a
>valid file) sees an unexpected EOF because the file has been
>truncated.
>
>Now, this would explain the EOFError, but not necessarily the
>ValueError with "unknown type code". However, it looks like marshal
>doesn't always check for EOF immediately (sometimes it calls getc()
>without checking the result, and sometimes it doesn't check the error
>state after calling r_string()), so I think all the errors are
>actually explainable from this scenario.

Thanks for this, it's a very interesting scenario. I don't think it's a
complete explanation of what's going on, though. I've spoken with our
defect analyst and looked at a bunch of the bug reports, and as far as we
can tell, the corruptions are permanent. Users generally have to take
manual action to delete the .pyc files and re-create them.

One thing I hadn't realized until now is that until Python 3.4,
py_compile.py doesn't write the pyc files atomically, and in fact this is
the mechanism we're using to create the pyc files at package installation
time. That could explain why we're still seeing these issues even in
Python 3.3.

I've also uncovered a bug from 2010 reported in Debian[1] about pyc file
corruptions that happened when the byte-compilation driver program exited
before its workers[2] could complete.
We're definitely seeing issues post-landing of this fix, so I need to do
some more analysis to see if that fix was enough. If it wasn't, and we're
not doing atomic renames, then that could explain the permanent
corruptions.

Cheers,
-Barry

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=590224
[2] the workers each call `$PYTHON -m py_compile - < py-filenames`
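
For illustration only, here is a minimal sketch of the "atomic rename" idea mentioned above: compile into a temporary file in the destination directory, then rename it over the final path, so a reader sees either the old file or the complete new one. The helper name compile_atomically is hypothetical; this is not the code used by py_compile or by the Debian packaging tools, and it assumes Python 3.3 or later for os.replace().

```python
import os
import py_compile
import tempfile


def compile_atomically(source_path, pyc_path):
    """Byte-compile source_path so pyc_path is never visible half-written.

    Hypothetical helper for illustration; not part of py_compile.
    """
    target_dir = os.path.dirname(os.path.abspath(pyc_path))
    # The temporary file lives in the destination directory so that the
    # final rename stays on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=target_dir)
    os.close(fd)
    try:
        # Write the complete pyc to the temporary name first.
        py_compile.compile(source_path, cfile=tmp_path, doraise=True)
        # os.replace() swaps the finished file into place in one step.
        os.replace(tmp_path, pyc_path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return pyc_path
```

With this shape, a worker that is killed partway through leaves at most a stray temporary file, not a truncated .pyc under the real name. Note that mkstemp() creates the file with restrictive permissions, so an installer using this approach would likely need to adjust the mode afterwards.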