Let mypyc optimise os.path.join by hauntsaninja · Pull Request #17949

Let mypyc optimise os.path.join by hauntsaninja · Pull Request #17949 · python/mypy

and others added 2 commits

October 14, 2024 23:07

See python#17948

There's one call site which has varargs that I leave as os.path.join, it
doesn't show up on my profile. I do see the `endswith` on the profile,
we could try `path[-1] == '/'` instead

In my work environment, this is about a 10% speedup:
```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python
  Time (mean ± σ):     30.842 s ±  0.119 s    [User: 26.383 s, System: 4.396 s]
  Range (min … max):   30.706 s … 30.927 s    3 runs
```
Compared to:
```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python
  Time (mean ± σ):     34.161 s ±  0.163 s    [User: 29.818 s, System: 4.289 s]
  Range (min … max):   34.013 s … 34.336 s    3 runs
```

In the toy "long" environment mentioned in the issue, this is about a 7%
speedup:
```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable long/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable long/bin/python
  Time (mean ± σ):     23.177 s ±  0.317 s    [User: 20.265 s, System: 2.873 s]
  Range (min … max):   22.815 s … 23.407 s    3 runs
```
Compared to:
```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy -c "import torch" --python-executable=long/bin/python --no-incremental'
Benchmark 1: /tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy -c "import torch" --python-executable=long/bin/python --no-incremental
  Time (mean ± σ):     24.838 s ±  0.237 s    [User: 22.038 s, System: 2.750 s]
  Range (min … max):   24.598 s … 25.073 s    3 runs
```

hauntsaninja added a commit that referenced this pull request

Oct 20, 2024

See #17948

There's one call site which has varargs that I leave as os.path.join, it
doesn't show up on my profile. I do see the `endswith` on the profile,
we could try `path[-1] == '/'` instead (could save a few dozen
milliseconds)

In my work environment, this is about a 10% speedup:
```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python
  Time (mean ± σ):     30.842 s ±  0.119 s    [User: 26.383 s, System: 4.396 s]
  Range (min … max):   30.706 s … 30.927 s    3 runs
```
Compared to:
```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python
  Time (mean ± σ):     34.161 s ±  0.163 s    [User: 29.818 s, System: 4.289 s]
  Range (min … max):   34.013 s … 34.336 s    3 runs
```

In the toy "long" environment mentioned in the issue, this is about a 7%
speedup:
```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable long/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable long/bin/python
  Time (mean ± σ):     23.177 s ±  0.317 s    [User: 20.265 s, System: 2.873 s]
  Range (min … max):   22.815 s … 23.407 s    3 runs
```
Compared to:
```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy -c "import torch" --python-executable=long/bin/python --no-incremental'
Benchmark 1: /tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy -c "import torch" --python-executable=long/bin/python --no-incremental
  Time (mean ± σ):     24.838 s ±  0.237 s    [User: 22.038 s, System: 2.750 s]
  Range (min … max):   24.598 s … 25.073 s    3 runs
```

In the "clean" environment, this is a 1% speedup, but below the noise
floor.

hauntsaninja added a commit to hauntsaninja/mypy that referenced this pull request

Feb 14, 2026

Removing buffering on file reads can make those reads like 20% faster.
This is probably worth a few percent on a warm cache profile I was
looking at.

Removing abspath in metastore also looks like a percent or two on a warm
cache profile.

I don't think using the faster path join from python#17949 will really help
the profiles I was looking at, but if we're microoptimising some of
these code paths we might as well. (On a cold profile I do think some
of the variadic os.path.join now add up enough that it could be worth
making a function for them, but whatever)

hauntsaninja added a commit to hauntsaninja/mypy that referenced this pull request

Feb 14, 2026

Removing buffering on file reads can make those reads like 20% faster.
This is probably worth a few percent on a warm cache profile I was
looking at.

Removing abspath in metastore also looks like a percent or two on a warm
cache profile.

I don't think using the faster path join from python#17949 will really help
the profiles I was looking at, but if we're microoptimising some of
these code paths we might as well. (On a cold profile I do think some
of the variadic os.path.join now add up enough that it could be worth
making a function for them, but whatever)

hauntsaninja added a commit that referenced this pull request

Feb 15, 2026

Removing buffering on file reads can make those reads like 20% faster.
This is probably worth a few percent on a warm cache profile I was
looking at.

Removing abspath in metastore also looks like a percent or two on a warm
cache profile.

I don't think using the faster path join from #17949 will really help
the profiles I was looking at, but if we're microoptimising some of
these code paths we might as well. (On a cold profile I do think some of
the variadic os.path.join now add up enough that it could be worth
making a function for them, but whatever)