Memory Leak Detected

Overview Description

There appears to be a memory leak caused by repeated calls to dates.format_datetime(dt, format=..., locale=...), which calls dates.parse_pattern(format) internally, when it is called with a wide variety of different formats and locales.

This is because each time parse_pattern() receives a pattern string it has not seen before, it creates a new DateTimePattern and stores it in the module-level _pattern_cache dict. Entries are never evicted, so the dict grows without bound.

babel/dates.py:1598 (Babel 2.9.1)

def parse_pattern(pattern):
    """Parse date, time, and datetime format patterns."""
    ...

    # here is the problem: every new pattern is added to a module-level
    # dict that is never pruned
    _pattern_cache[pattern] = pat = DateTimePattern(pattern, u''.join(result))

Perhaps a better design would be to simply wrap dates.parse_pattern() with functools.lru_cache, so that the cache has a fixed maximum size:

from functools import lru_cache

@lru_cache(maxsize=1000)
def parse_pattern(pattern):
    """Parse date, time, and datetime format patterns."""
    ...
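As a rough check of the proposal, the sketch below applies functools.lru_cache(maxsize=1000) to a hypothetical bounded_parse_pattern() whose body is a placeholder, not Babel's parser. It feeds in far more distinct patterns than the cache can hold and uses cache_info() to confirm that the cache stops growing at maxsize while recently used patterns still hit.

```python
from functools import lru_cache


# Hypothetical toy parser decorated the way the proposal suggests; the body
# is a placeholder, not Babel's real parsing code.
@lru_cache(maxsize=1000)
def bounded_parse_pattern(pattern):
    """Return a 'parsed' pattern; results are kept in a bounded LRU cache."""
    return tuple(pattern.split())


# Feed far more distinct patterns than the cache can hold.
for i in range(5000):
    bounded_parse_pattern(f"d MMM y 'run {i}'")

info = bounded_parse_pattern.cache_info()
print(info.currsize, info.maxsize)  # 1000 1000

# The most recently used pattern is still cached, so this call is a hit.
bounded_parse_pattern("d MMM y 'run 4999'")
print(bounded_parse_pattern.cache_info().hits)  # 1
```

The trade-off is that a workload cycling through more than maxsize patterns will re-parse some of them, but memory held by the cache can no longer grow past maxsize entries. The cache can also be emptied explicitly with bounded_parse_pattern.cache_clear().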

Steps to Reproduce

from datetime import datetime

from babel.localedata import locale_identifiers
from babel.dates import format_datetime

from pympler import tracker  # track memory leaks (=> https://github.com/pympler/pympler)

# show initial memory usage
tr = tracker.SummaryTracker()
tr.print_diff()

# create some random datetime
d = datetime(2007, 4, 1, 13, 27, 53)

# create some datetime formats
custom_formats = [
    r"M/d/yy, h:mm a",  # short
    r"MMM d, y, h:mm:ss a",  # medium
    r"MMMM d, y 'at' h:mm:ss a z",  # long
    r"EEEE, MMMM d, y 'at' h:mm:ss a zzzz",  # full

    r"EEEE, MMMM d, y 'at' hh:mm:ss zzz",  # short timezone name
    r"EEEE, MMMM d, y 'at' hh:mm:ss zzzz",  # long timezone name; 'hh' is 12-hour ('HH' would be 24-hour)

    r"EEEE, MMMM d, y 'at' hh:mm:ss",
    r"EEEE, MMMM d, y 'at' h:mm:ss a",

    r"EEEE, d MMM y hh:mm:ss",
    r"EEEE, d MMM y h:mm:ss a",

    r"d MMM y hh:mm:ss",
    r"d MMM y h:mm:ss a",
]

# call format_datetime for all locale/format combinations, about 9.4k combinations
for locale_name in locale_identifiers():
    for custom_format in custom_formats:
        s = format_datetime(d, locale=locale_name, format=custom_format)

# show difference in memory usage since start
tr.print_diff()
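If pympler is not installed, the standard library's tracemalloc module gives a rough equivalent. The sketch below does not call Babel at all: it fills a hypothetical unbounded dict (standing in for a cache like _pattern_cache) with 10,000 distinct pattern keys and reports how much traced memory was added between two measurements.

```python
import tracemalloc

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()  # (current, peak) in bytes

# Hypothetical unbounded cache; keys are distinct pattern strings.
cache = {}
for i in range(10000):
    key = f"d MMM y 'run {i}'"
    cache[key] = tuple(key.split())

after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

growth = after - before
print(f"entries: {len(cache)}, traced growth: {growth / 1024:.1f} KiB")
```

To apply the same measurement to the reproduction script, the start/measure calls would wrap the format_datetime loop instead of the toy dict.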




Actual Results

Initial Memory Snapshot

types |   # objects |   total size
===== | =========== | ============
 list |        3750 |    318.95 KB
  str |        3747 |    260.45 KB
  int |         817 |     22.34 KB

Final Memory Snapshot

                      types |   # objects |   total size
=========================== | =========== | ============
                       dict |      272282 |    113.17 MB
                        str |       21809 |      1.51 MB
                       list |       12416 |      1.12 MB
babel.dates.DateTimePattern |        9668 |    453.19 KB
                      tuple |        6829 |    385.02 KB
babel.numbers.NumberPattern |        7550 |    353.91 KB

Expected Results

Memory usage should stay bounded instead of growing with every new format/locale combination.
