subprocess.Popen() with universal_newlines=True does not convert line breaks correctly when the preferred encoding is UTF-16. For example, the following code--
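import sys
from subprocess import Popen, PIPE

# The child process writes UTF-16 encoded bytes containing a CRLF line break.
code = r"import sys; sys.stdout.buffer.write('a\r\nb'.encode('utf-16'))"
args = [sys.executable, '-c', code]
popen = Popen(args, universal_newlines=True, stdin=PIPE, stdout=PIPE)
print(popen.communicate(input=''))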
yields--
('a\n\nb', None)
instead of--
('a\nb', None)
The reason is that the code attempts to convert newlines on the raw bytes before decoding them to Unicode, instead of after decoding:
http://hg.python.org/cpython/file/85266c6f9ae4/Lib/subprocess.py#l830
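The effect of the ordering can be reproduced without spawning a process. The following is only a minimal sketch: the two helper functions are hypothetical names written for illustration, not the actual subprocess code, and 'utf-16-le' is used (an assumption) so that the result does not depend on the platform byte order or on a BOM. In UTF-16, '\r' and '\n' are each followed by a zero byte, so the byte sequence b'\r\n' never appears and each b'\r' is replaced on its own:

```python
def translate_then_decode(data, encoding):
    # Hypothetical helper: newline translation on the raw bytes first,
    # then decoding (the order described in this report).
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return data.decode(encoding)


def decode_then_translate(data, encoding):
    # Hypothetical helper: decode first, then translate newlines on the text.
    text = data.decode(encoding)
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Explicit little-endian UTF-16 without a BOM (assumption, for determinism).
raw = "a\r\nb".encode("utf-16-le")

buggy = translate_then_decode(raw, "utf-16-le")
fixed = decode_then_translate(raw, "utf-16-le")

print(repr(buggy))  # 'a\n\nb'
print(repr(fixed))  # 'a\nb'
```

Translating after decoding works for any encoding because the replacement then operates on characters rather than on encoded bytes.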
I am attaching a failing test case. I will upload a patch shortly.
Also see the related documentation issue 15561.