gh-91378: Allow subprocess pass-thru with stdout/stderr capture by pprindeville · Pull Request #32344

gh-91378: Allow subprocess pass-thru with stdout/stderr capture by pprindeville · Pull Request #32344 · python/cpython

Allow pass-thru of subprocess output even when capturing to buffers

I maintain some build wrappers that both pass the output (both stdout and stderr) through when being run interactively, but also capture logs into artifacts when being run through a CI/CD pipeline.

Being able to call proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, tee=True) would allow harvesting results as output, errors = subprocess.communicate() as well as permitting the results of the running process to be seen in real-time (some make recipes invoke docker and can take a while to complete... buffering until completion would confuse users as it might appear that the recipe has hung).

#91378, formerly:

https://bugs.python.org/issue47222

Issue: subprocess.Popen() should allow capturing output and sending it to stdout and stderr #91378

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@pprindeville

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

You're calling data.decode() on the read data, but no codec is specified. This is going to cause problems as there is never a guarantee about what an arbitrary process will actually output. A true "tee" should take the binary data it read and write the same binary data out to the destination. Which means you should use sys.stdout.buffer.write(data) rather than sys.stdout.write(data.decode()) (same for stderr).
This needs a unittest. Look around test_subprocess.py for ideas. Executing a child process that executes a child process to see that the expected output winds up in the right place is entirely within reason.
What happens when tee is True but the stdout and/or stderr were not PIPE? If this would be useless, we should make it an error.
Documentation and docstrings need updating (I usually leave this to last), with a practical example.

I'm not decided on if I like this feature as designed or not yet, but I've seen enough code that could use it to know that the need is real in some form or another (you gave a good practical example). So lets get this into shape and see how it looks before making a final decision if this is the desired API or should be tweaked into a different shape.

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

A few things:

@gpshead Thanks for the comments.

You're calling data.decode() on the read data, but no codec is specified. This is going to cause problems as there is never a guarantee about what an arbitrary process will actually output. A true "tee" should take the binary data it read and write the same binary data out to the destination. Which means you should use sys.stdout.buffer.write(data) rather than sys.stdout.write(data.decode()) (same for stderr).

Done

This needs a unittest. Look around test_subprocess.py for ideas. Executing a child process that executes a child process to see that the expected output winds up in the right place is entirely within reason.

Took a stab at this, but I'm unfamiliar with the test framework or how to debug the failure... any suggestions are appreciated.

What happens when tee is True but the stdout and/or stderr were not PIPE? If this would be useless, we should make it an error.

Added a test. Please let me know if you feel it's adequate or not.

Documentation and docstrings need updating (I usually leave this to last), with a practical example.

I'm not decided on if I like this feature as designed or not yet, but I've seen enough code that could use it to know that the need is real in some form or another (you gave a good practical example). So lets get this into shape and see how it looks before making a final decision if this is the desired API or should be tweaked into a different shape.

Agreed. I'll do this last.

Update:

This needs a unittest. Look around test_subprocess.py for ideas. Executing a child process that executes a child process to see that the expected output winds up in the right place is entirely within reason.

@gpshead Added 2 unit tests.

I have made the requested changes; please review again

Thanks for making the requested changes!

@gpshead: please review the changes made to this pull request.

@gpshead Trying your change with .readline(), the test test_check_output_input_none_text seems to loop forever. Not sure why.

It's nearly identical to the preceding test test_check_output_input_none, but differs only in adding text=True.

For simplicity and consistency, why not just always loop reading 2000 lines at a time?

.readline(2000) bounds the read at 2000 bytes, not 2000 lines. Always doing that would be a performance regression for the most common existing users case: Where no checking for EOLs is done and the unbounded read() work is all in C with a more optimal read buffer size and result storage reallocation code.

@gpshead I could not get this to work. x86 would run out of memory, and amd64 would never exit (but get killed by a timeout). Can you please refine the suggestion?

I'll try and take a look this week.

I'll try and take a look this week.

Thanks. I appreciate all of your efforts.

I'll try and take a look this week.

Any luck?

@gpshead Not sure what to do at this point.

Thanks for the ping. Looking into this is still on my TODO list.

no, leave it open. i haven't had time.

Is it an option to proceed with what we have, and circle back to optimize it for WinNT?

Let's not go forward with this PR, see my overall comment on the issue. Thanks for your work on it! The particular implementation in progress being done here isn't the reason why, though it is an example of why this complexity is not great to maintain within the stdlib.

Let's not go forward with this PR, see my overall comment on the issue. Thanks for your work on it! The particular implementation in progress being done here isn't the reason why, though it is an example of why this complexity is not great to maintain within the stdlib.

I won't be contributing again.

gst commented
Nov 9, 2024
•

Loading

Hi @pprindeville , this idea would be to me very welcome as an independent normal library package/module.

I personally remember having needed and implemented such kind of feature, but was using thread(s) to consume and share to other thread(s) (and be able to retain the entire output as well) the subprocess output as it arrives basically. but I wished there would have been such package available, would have me saved of that extra work.

I understand your frustration but the invoked reason is really valid actually.

Anyway, have a good weekend.

Allow pass-thru of subprocess output even when capturing to buffers

CLA Missing

Choose a reason for hiding this comment

gst commented Nov 9, 2024 • Loading

gst commented
Nov 9, 2024
•

Loading