perf: lazily load gazelle manifest files by mattem · Pull Request #2746 · bazel-contrib/rules_python
I'd like to test it on our monorepo before approving, but I predict it'll be just fine.
Sure, this has been running internally for us for ~2 months now, but we are still on the (now very diverged) original copy of the extension. Let me know how it goes.
For your case, what sort of speedups are you seeing? And how did you profile it (so that I can try to reproduce it)?
It's slightly tricky to say, as it depends on the directory that Gazelle is running over, but we saw a reduction of about 30 seconds when running over only web targets, where no Python manifests are required. If you always run Gazelle over Python directories, then this isn't going to change anything.
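To illustrate why lazy loading only helps runs that never touch Python packages, here is a minimal sketch of deferring a manifest read until first use. It is not the code from this PR: `Manifest`, `LazyManifest`, `NewLazyManifest`, and the injected `load` function are hypothetical names chosen for the example, and the real extension may structure this differently.

```go
package main

import (
	"fmt"
	"sync"
)

// Manifest is a hypothetical stand-in for the parsed contents of a
// gazelle_python.yaml file.
type Manifest struct {
	ModulesMapping map[string]string
}

// LazyManifest remembers where a manifest lives but does not read it
// until Get is called for the first time.
type LazyManifest struct {
	path string
	load func(path string) (*Manifest, error) // hypothetical loader, injected by the caller

	once     sync.Once
	manifest *Manifest
	err      error
}

// NewLazyManifest records the path and loader without doing any I/O.
func NewLazyManifest(path string, load func(string) (*Manifest, error)) *LazyManifest {
	return &LazyManifest{path: path, load: load}
}

// Get loads the manifest on the first call and returns the cached
// result (or the cached error) on every later call.
func (l *LazyManifest) Get() (*Manifest, error) {
	l.once.Do(func() {
		l.manifest, l.err = l.load(l.path)
		if l.err != nil {
			l.err = fmt.Errorf("loading manifest %q: %w", l.path, l.err)
		}
	})
	return l.manifest, l.err
}
```

With this shape, a run that only visits web targets never calls `Get`, so the parsing cost is never paid. A run over Python directories pays it once per manifest, which matches the observation that such runs see no change.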
Gazelle can be profiled via the -cpuprofile and -memprofile flags. The CPU profile is written to the given path, and can be viewed via pprof (go tool pprof ...).
The other part I can upstream is the removal of filepath.WalkDir here, https://github.com/bazel-contrib/rules_python/blob/main/gazelle/python/generate.go#L154. There's no need to re-walk the file system, since Gazelle has already collected the files on the configuration pass. This is very noticeable in the profiles, given that the walk both traverses the file system and stats files. Removing this took about 70 seconds off running Gazelle over our whole repo.
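As a rough sketch of the idea, assuming the list of file names for a directory is already in hand (Gazelle passes one to language extensions as `RegularFiles` on `language.GenerateArgs`), selecting the Python sources is a pure in-memory filter with no further file system calls. The helper name `pythonFiles` is hypothetical, and this is not the actual change to generate.go, which also has to deal with subdirectories.

```go
package main

import "path/filepath"

// pythonFiles returns the entries of regularFiles that have a .py
// extension, preserving their order. It only inspects the names it is
// given, so it performs no directory walk and no stat calls.
func pythonFiles(regularFiles []string) []string {
	var out []string
	for _, name := range regularFiles {
		if filepath.Ext(name) == ".py" {
			out = append(out, name)
		}
	}
	return out
}
```

For example, `pythonFiles([]string{"a.py", "BUILD.bazel", "c.py"})` yields `a.py` and `c.py` without touching the disk.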
Do you mean ~250 gazelle_python.yaml files? Or 250 packages listed in a single gazelle_python.yaml?
250 individual gazelle_python.yaml files. 250 packages in a single file would be on the smaller side of the majority of manifest files.