bpo-30119: Fix ftplib to handle the ftp user's commands. by corona10 · Pull Request #1214 · python/cpython
Hi Everyone,
I'm the guy that worked out how to exploit these bugs with firewalls. I just figured I would chime in and provide some opinions.
For one, any URL parsing code that is not FTP-specific should not have input validation checks added. After all, other protocols might be able to handle CR/LF/NUL just fine through an appropriate encoding mechanism. I don't think you guys were pushing for URL library management of this problem, but I just want to emphasize that.
I think any FTP command should be checked for both CR and LF. If either exist, an exception should be thrown, as you appear to be doing. This check should be performed just before sending it on the TCP connection. It is much safer to do this rather than trying to do something hacky like replace them with a space character. (Cleaning invalid input and continuing to use it is often a recipe for disaster in security.) I don't see a good reason to allow CR/LF at the end of the command... I mean, why? Just make the caller strip any string. If FTP happened to have some way to properly encode delimiter characters like this, then that would be the right way to fix it, but there isn't a standard way, so throwing exceptions is your best bet.
NUL (\0) bytes should also be viewed with skepticism. For instance, if the FTP command is a CWD, then a NUL should be rejected since those bytes aren't allowed in any filesystem and they have implications for string manipulation in C. (As an example, it used to be very easy to trick PHP and ASP applications into truncating filenames by inserting a '\0', which had significant security impacts.) If the character has no legitimate use, then reject it.
If the FTP implementation is designed as a low-level library for use by python programmers without support for URL fetching, should you bother with this stuff? Yes, I think so. The risk might currently be very low, but later someone could wrap up your library with something that accepts URLs and runs some commands. Their library may not be able to deal with these special characters, since it may be too abstract at that layer. So if a command isn't syntactically legal, then blow up at the low level.
HTH,
tim