git.vger.kernel.org archive mirror
* git grep performance regression on macOS
@ 2023-09-29 23:56 Benjamin Hiller
  2023-09-30  5:45 ` Junio C Hamano
  2023-10-02  3:05 ` Carlo Marcelo Arenas Belón
  0 siblings, 2 replies; 3+ messages in thread
From: Benjamin Hiller @ 2023-09-29 23:56 UTC
  To: git

What did you do before the bug happened? (Steps to reproduce your issue)

git grep seems to have gotten much slower as of git 2.39 on macOS for
complex extended regexes.
We noticed this because git secrets --scan was running much more
slowly for some people on our team, and eventually realized that it
was due to them using a newer version of git. git secrets runs a git
grep command with an extended regex (this is a somewhat simplified
version of the command, but still shows the performance issue):

git grep -E "(A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16}|(\"|')?(AWS|aws|Aws)?_?(SECRET|secret|Secret)?_?(ACCESS|access|Access)?_?(KEY|key|Key)(\"|')?\s*(:|=>|=)\s*(\"|')?[A-Za-z0-9/\+=]{40}(\"|')?|(\"|')?(AWS|aws|Aws)?_?(ACCOUNT|account|Account)_?(ID|id|Id)?(\"|')?\s*(:|=>|=)\s*(\"|')?[0-9]{4}\-?[0-9]{4}\-?[0-9]{4}(\"|')?"

What did you expect to happen? (Expected behavior)
With git 2.38, that command took under half a second to run on a large repo.
Using the git (https://github.com/git/git) repo as an example, it took
0.2s on my laptop.

What happened instead? (Actual behavior)
With 2.39 and later, the same command takes over 40 seconds on my laptop
in the git repo!

What's different between what you expected and what actually happened?
The command runs much more slowly, though it still does return the
correct result.

Anything else you want to add:
I confirmed that the performance regression was first introduced in
2.39. Additionally, I saw that reverting the change to Makefile from
https://github.com/git/git/commit/1819ad327b7a1f19540a819813b70a0e8a7f798f
fixed the performance regression and the git grep command went back to
taking <1 second. That seems to indicate that switching from Git's
regex library to the native macOS regex library caused this
performance regression, but I haven't investigated beyond that to see
why the native macOS regex library is so much slower.
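
For anyone who wants to reproduce the comparison between two builds, the
following is a minimal sketch of a timing helper. The function name
bench_grep is hypothetical and not part of git or git-secrets. It assumes
it is run from inside a checkout of the repository being searched, and it
assumes `date +%s` is available (it is on macOS and Linux, but it only has
whole-second resolution, which is enough to tell 0.2s from 40s).

```shell
#!/bin/sh
# bench_grep GIT_BIN PATTERN
# Hypothetical helper: run `GIT_BIN grep -E PATTERN` once in the current
# repository and print the elapsed wall-clock time in whole seconds.
bench_grep() {
    git_bin=$1
    pattern=$2

    start=$(date +%s)
    # Only the timing matters here, so matches are discarded. git grep
    # exits non-zero when nothing matches, so the exit status is ignored.
    "$git_bin" grep -E "$pattern" >/dev/null 2>&1 || true
    end=$(date +%s)

    echo "$git_bin: $((end - start))s"
}
```

It would be called once per build, for example
`bench_grep /path/to/git-2.38/git "$PATTERN"` and then
`bench_grep /path/to/git-2.42/git "$PATTERN"`, where both paths are
placeholders and PATTERN holds the extended regex from the command above.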

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.42.0
cpu: arm64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Darwin 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar  6 21:00:41 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T8103 arm64
compiler info: clang: 14.0.3 (clang-1403.0.22.14.1)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/zsh


[Enabled Hooks]
post-checkout
post-merge
pre-commit
pre-push
