From: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>, Victoria Dye <vdye@github.com>
Subject: [PATCH v2 0/4] Performance improvement & cleanup in loose ref iteration
Date: Mon, 09 Oct 2023 21:58:52 +0000
Message-ID: <pull.1594.v2.git.1696888736.gitgitgadget@gmail.com>
In-Reply-To: <pull.1594.git.1696615769.gitgitgadget@gmail.com>
While investigating ref iteration performance in builtins like
'for-each-ref' and 'show-ref', I found two small improvement opportunities.
The first patch tweaks the logic around prefix matching in
'cache_ref_iterator_advance' so that we correctly skip refs that do not
actually match a given prefix. The unnecessary iteration doesn't seem to be
causing any bugs in the ref iteration commands that I've tested, but it
doesn't hurt to be more precise (and it helps with some other patches I'm
working on ;) ).
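To make the over-matching concrete, here is a minimal standalone sketch. It is not the code in 'cache_ref_iterator_advance'; the helper names 'loose_prefix_match' and 'strict_prefix_match' are hypothetical and exist only for this illustration. The loose check accepts any name that agrees with the prefix over the shorter of the two lengths. That is reasonable when deciding whether to descend into a directory, but it also lets through a leaf ref such as 'refs/bisect/b' for the prefix 'refs/bisect/bad'. The strict check requires the whole prefix to be present.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true when `name` and `prefix` agree over the
 * shorter of their two lengths. Suitable for "could this directory
 * contain a match?", too permissive for a leaf ref.
 */
static bool loose_prefix_match(const char *name, const char *prefix)
{
	size_t n = strlen(name), p = strlen(prefix);

	return strncmp(name, prefix, n < p ? n : p) == 0;
}

/*
 * Hypothetical helper: true only when `name` begins with all of
 * `prefix`. A name shorter than the prefix never matches, because
 * strncmp() compares the name's terminating NUL against the prefix.
 */
static bool strict_prefix_match(const char *name, const char *prefix)
{
	return strncmp(name, prefix, strlen(prefix)) == 0;
}
```

With the prefix 'refs/bisect/bad', the loose check accepts both 'refs/bisect/' (a directory worth descending into) and 'refs/bisect/b' (a ref that should be skipped), while the strict check accepts only names like 'refs/bisect/bad-1'.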
The next three patches update how 'loose_fill_ref_dir' determines the type
of ref cache entry to create (directory or regular). On platforms that
include d_type information in 'struct dirent' (as far as I can tell, all
except NonStop & certain versions of Cygwin), this allows us to skip calling
'stat'. Benchmarking against repos with various quantities of loose refs
indicates a 5-8% speedup from these changes [1].
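As a rough illustration of why d_type helps, the sketch below classifies directory entries using the type reported by readdir() and only falls back to stat() when that type is unknown or a symlink. It is not git's 'get_dtype' or 'loose_fill_ref_dir'; the names 'entry_is_dir' and 'count_subdirs' are hypothetical, and checking for the 'DT_UNKNOWN' macro is an assumption that stands in for git's own build-time configuration.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not git's 'get_dtype'): report whether entry
 * `de`, found while reading directory `dir`, is itself a directory.
 */
static bool entry_is_dir(const char *dir, const struct dirent *de)
{
	char path[4096];
	struct stat st;
	int n;

#ifdef DT_UNKNOWN
	/* Fast path: the directory listing already carries the type. */
	if (de->d_type == DT_DIR)
		return true;
	if (de->d_type != DT_UNKNOWN && de->d_type != DT_LNK)
		return false;
#endif
	/* Slow path: type unknown or a symlink, so ask the filesystem. */
	n = snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false; /* path too long for this sketch */
	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Count subdirectories of `dir`, skipping "." and ".."; -1 on error. */
static int count_subdirs(const char *dir)
{
	DIR *d = opendir(dir);
	struct dirent *de;
	int count = 0;

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		if (entry_is_dir(dir, de))
			count++;
	}
	closedir(d);
	return count;
}
```

On a filesystem that fills in d_type, most entries take the fast path and never reach stat(), which is the kind of saving the last three patches aim for.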
Changes since V1
================
* Added tests in patch 1 to demonstrate the bugfix
Thanks!
- Victoria
[1] https://lore.kernel.org/git/28ae03f5-7091-d3f3-8a70-56aba6639640@github.com/
Victoria Dye (4):
  ref-cache.c: fix prefix matching in ref iteration
  dir.[ch]: expose 'get_dtype'
  dir.[ch]: add 'follow_symlink' arg to 'get_dtype'
  files-backend.c: avoid stat in 'loose_fill_ref_dir'

 diagnose.c                    | 42 +++--------------------------------
 dir.c                         | 33 +++++++++++++++++++++++++++
 dir.h                         | 16 +++++++++++++
 refs/files-backend.c          | 14 +++++-------
 refs/ref-cache.c              |  3 ++-
 t/t1500-rev-parse.sh          | 23 +++++++++++++++++++
 t/t4205-log-pretty-formats.sh | 30 +++++++++++++++++++++++++
 7 files changed, 112 insertions(+), 49 deletions(-)
base-commit: 3a06386e314565108ad56a9bdb8f7b80ac52fb69
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1594%2Fvdye%2Fvdye%2Fref-iteration-cleanup-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1594/vdye/vdye/ref-iteration-cleanup-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1594
Range-diff vs v1:
1: 59276a5b3fd ! 1: 402176246ea ref-cache.c: fix prefix matching in ref iteration
@@ Commit message
'create_dir_entry' explicitly calls out the trailing slash requirement, so
this is a safe assumption to make.
+ This bug generally doesn't have any user-facing impact, since it requires:
+
+ 1. using a non-empty prefix without a trailing slash in an iteration like
+ 'for_each_fullref_in',
+ 2. the callback to said iteration not reapplying the original filter (as
+ for-each-ref does) to ensure unmatched refs are skipped, and
+ 3. the repository having one or more refs that match part of, but not all
+ of, the prefix.
+
+ However, there are some niche scenarios that meet those criteria
+ (specifically, 'rev-parse --bisect' and '(log|show|shortlog) --bisect'). Add
+ tests covering those cases to demonstrate the fix in this patch.
+
Signed-off-by: Victoria Dye <vdye@github.com>
## refs/ref-cache.c ##
@@ refs/ref-cache.c: static int cache_ref_iterator_advance(struct ref_iterator *ref
continue;
} else {
entry_prefix_state = level->prefix_state;
+
+ ## t/t1500-rev-parse.sh ##
+@@ t/t1500-rev-parse.sh: test_expect_success 'rev-parse --since= unsqueezed ordering' '
+ test_cmp expect actual
+ '
+
++test_expect_success 'rev-parse --bisect includes bad, excludes good' '
++ test_commit_bulk 6 &&
++
++ git update-ref refs/bisect/bad-1 HEAD~1 &&
++ git update-ref refs/bisect/b HEAD~2 &&
++ git update-ref refs/bisect/bad-3 HEAD~3 &&
++ git update-ref refs/bisect/good-3 HEAD~3 &&
++ git update-ref refs/bisect/bad-4 HEAD~4 &&
++ git update-ref refs/bisect/go HEAD~4 &&
++
++ # Note: refs/bisect/b and refs/bisect/go should be ignored because they
++ # do not match the refs/bisect/bad or refs/bisect/good prefixes.
++ cat >expect <<-EOF &&
++ refs/bisect/bad-1
++ refs/bisect/bad-3
++ refs/bisect/bad-4
++ ^refs/bisect/good-3
++ EOF
++
++ git rev-parse --symbolic-full-name --bisect >actual &&
++ test_cmp expect actual
++'
++
+ test_done
+
+ ## t/t4205-log-pretty-formats.sh ##
+@@ t/t4205-log-pretty-formats.sh: test_expect_success '%S in git log --format works with other placeholders (part
+ test_cmp expect actual
+ '
+
++test_expect_success 'setup more commits for %S with --bisect' '
++ test_commit four &&
++ test_commit five &&
++
++ head1=$(git rev-parse --verify HEAD~0) &&
++ head2=$(git rev-parse --verify HEAD~1) &&
++ head3=$(git rev-parse --verify HEAD~2) &&
++ head4=$(git rev-parse --verify HEAD~3)
++'
++
++test_expect_success '%S with --bisect labels commits with refs/bisect/bad ref' '
++ git update-ref refs/bisect/bad-$head1 $head1 &&
++ git update-ref refs/bisect/go $head1 &&
++ git update-ref refs/bisect/bad-$head2 $head2 &&
++ git update-ref refs/bisect/b $head3 &&
++ git update-ref refs/bisect/bad-$head4 $head4 &&
++ git update-ref refs/bisect/good-$head4 $head4 &&
++
++ # We expect to see the range of commits between refs/bisect/good-$head4
++ # and refs/bisect/bad-$head1. The "source" ref is the nearest bisect ref
++ # from which the commit is reachable.
++ cat >expect <<-EOF &&
++ $head1 refs/bisect/bad-$head1
++ $head2 refs/bisect/bad-$head2
++ $head3 refs/bisect/bad-$head2
++ EOF
++ git log --bisect --format="%H %S" >actual &&
++ test_cmp expect actual
++'
++
+ test_expect_success 'log --pretty=reference' '
+ git log --pretty="tformat:%h (%s, %as)" >expect &&
+ git log --pretty=reference >actual &&
2: 24014010ea3 = 2: 172538b5e30 dir.[ch]: expose 'get_dtype'
3: a382d2ba652 = 3: 295ca94003b dir.[ch]: add 'follow_symlink' arg to 'get_dtype'
4: e193a453182 = 4: e89501cb51f files-backend.c: avoid stat in 'loose_fill_ref_dir'
--
gitgitgadget
Thread overview: 21+ messages
2023-10-06 18:09 [PATCH 0/4] Performance improvement & cleanup in loose ref iteration Victoria Dye via GitGitGadget
2023-10-06 18:09 ` [PATCH 1/4] ref-cache.c: fix prefix matching in " Victoria Dye via GitGitGadget
2023-10-06 21:51 ` Junio C Hamano
2023-10-09 10:04 ` Patrick Steinhardt
2023-10-09 16:21 ` Victoria Dye
2023-10-09 18:15 ` Junio C Hamano
2023-10-06 18:09 ` [PATCH 2/4] dir.[ch]: expose 'get_dtype' Victoria Dye via GitGitGadget
2023-10-06 22:00 ` Junio C Hamano
2023-10-06 18:09 ` [PATCH 3/4] dir.[ch]: add 'follow_symlink' arg to 'get_dtype' Victoria Dye via GitGitGadget
2023-10-06 18:09 ` [PATCH 4/4] files-backend.c: avoid stat in 'loose_fill_ref_dir' Victoria Dye via GitGitGadget
2023-10-06 22:12 ` Junio C Hamano
2023-10-06 19:09 ` [PATCH 0/4] Performance improvement & cleanup in loose ref iteration Junio C Hamano
2023-10-09 10:04 ` Patrick Steinhardt
2023-10-09 21:49 ` Victoria Dye
2023-10-10 7:21 ` Patrick Steinhardt
2023-10-09 21:58 ` Victoria Dye via GitGitGadget [this message]
2023-10-09 21:58 ` [PATCH v2 1/4] ref-cache.c: fix prefix matching in " Victoria Dye via GitGitGadget
2023-10-10 7:21 ` Patrick Steinhardt
2023-10-09 21:58 ` [PATCH v2 2/4] dir.[ch]: expose 'get_dtype' Victoria Dye via GitGitGadget
2023-10-09 21:58 ` [PATCH v2 3/4] dir.[ch]: add 'follow_symlink' arg to 'get_dtype' Victoria Dye via GitGitGadget
2023-10-09 21:58 ` [PATCH v2 4/4] files-backend.c: avoid stat in 'loose_fill_ref_dir' Victoria Dye via GitGitGadget