From: Benjamin Hiller <benhiller@gmail.com>
To: git@vger.kernel.org
Subject: git grep performance regression on macOS
Date: Fri, 29 Sep 2023 16:56:19 -0700
Message-ID: <CAPWWTaDE5559vA1qa0zhBid_ep9ht+PxPSDS5YC7Dk0NN8sp9A@mail.gmail.com>

What did you do before the bug happened? (Steps to reproduce your issue)

As of git 2.39, git grep has become much slower on macOS for complex
extended regexes.
We noticed this because git secrets --scan was running much more
slowly for some people on our team, and we eventually realized that
this was because they were using a newer version of git. git secrets
runs a git grep command with an extended regex (the command below is
somewhat simplified, but still shows the performance issue):

git grep -E "(A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16}|(\"|')?(AWS|aws|Aws)?_?(SECRET|secret|Secret)?_?(ACCESS|access|Access)?_?(KEY|key|Key)(\"|')?\s*(:|=>|=)\s*(\"|')?[A-Za-z0-9/\+=]{40}(\"|')?|(\"|')?(AWS|aws|Aws)?_?(ACCOUNT|account|Account)_?(ID|id|Id)?(\"|')?\s*(:|=>|=)\s*(\"|')?[0-9]{4}\-?[0-9]{4}\-?[0-9]{4}(\"|')?"

What did you expect to happen? (Expected behavior)
With git 2.38, that command took under half a second to run on a large repo.
Using the git (https://github.com/git/git) repo as an example, it took
0.2s on my laptop.

What happened instead? (Actual behavior)
With 2.39 and later, it takes over 40 seconds on my laptop with the git repo!
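
One way to reproduce the comparison is to time the same command under
each git version. The following is only a minimal sketch, not the method
used for the numbers above: elapsed_seconds is a hypothetical helper, and
pattern.txt is an assumed file holding the extended regex from the
command above on a single line. Resolution is whole seconds, which is
coarse but enough to tell 0.2s apart from 40s.

```shell
#!/bin/sh
# Sketch: print the wall-clock seconds one command takes (whole seconds).
# elapsed_seconds is a hypothetical helper, not part of git.
elapsed_seconds() {
    start=$(date +%s)
    # Discard output and ignore the exit status (git grep exits 1 when
    # nothing matches); only the duration matters here.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# Example, run from inside a clone of https://github.com/git/git,
# assuming pattern.txt (hypothetical name) contains the regex:
#   elapsed_seconds git grep -E -f pattern.txt
```

Running the commented example once with a 2.38 build and once with a
2.39 or later build on PATH should show the gap.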

What's different between what you expected and what actually happened?
The command runs much more slowly, though it still returns the
correct result.

Anything else you want to add:
I confirmed that the performance regression was first introduced in
2.39. Additionally, I saw that reverting the change to Makefile from
https://github.com/git/git/commit/1819ad327b7a1f19540a819813b70a0e8a7f798f
fixed the performance regression, and the git grep command went back to
taking under 1 second. That seems to indicate that switching from Git's
regex library to the native macOS regex library caused this
performance regression, but I haven't investigated beyond that to see
why the native macOS regex library is so much slower.


[System Info]
git version:
git version 2.42.0
cpu: arm64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Darwin 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar  6 21:00:41
PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T8103 arm64
compiler info: clang: 14.0.3 (clang-1403.0.22.14.1)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/zsh


[Enabled Hooks]
post-checkout
post-merge
pre-commit
pre-push
