From: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Victoria Dye <vdye@github.com>
Subject: [PATCH 0/9] for-each-ref optimizations & usability improvements
Date: Tue, 07 Nov 2023 01:25:52 +0000
Message-ID: <pull.1609.git.1699320361.gitgitgadget@gmail.com>
This series is a bit of an informal follow-up to [1], adding some more
substantial optimizations and usability fixes around ref
filtering/formatting. Some of the changes here affect user-facing behavior,
some are internal-only, but they're all interdependent enough to warrant
putting them together in one series.
[1] https://lore.kernel.org/git/pull.1594.v2.git.1696888736.gitgitgadget@gmail.com/
Patch 1 changes the behavior of the '--no-sort' option in 'for-each-ref',
'tag', and 'branch'. Currently, it just removes previous sort keys and, if
no further keys are specified, falls back on ascending refname sort (which,
IMO, makes the name '--no-sort' somewhat misleading). With this patch,
'--no-sort' really does disable sorting.
Patch 2 updates the 'for-each-ref' docs to clearly state what happens if you
use '--omit-empty' and '--count' together. I based the explanation on what
the current behavior is (i.e., refs omitted with '--omit-empty' do count
towards the total limited by '--count').
Patches 3-7 incrementally refactor various parts of the ref
filtering/formatting workflows in order to create a
'filter_and_format_refs()' function. If certain conditions are met (sorting
disabled, no reachability filtering or ahead-behind formatting), ref
filtering & formatting is done within a single 'for_each_fullref_in'
callback. Especially in large repositories, this makes a huge difference in
memory usage & runtime for certain usages of 'for-each-ref', since it's no
longer writing everything to a 'struct ref_array' then repeatedly whittling
down/updating its contents.
Patch 8 introduces a new option to 'for-each-ref' called '--full-deref'.
When provided, any format fields for the dereferenced value of a tag (e.g.
"%(*objectname)") will be populated with the fully peeled target of the tag;
right now, those fields are populated with the immediate target of a tag
(which can be another tag). This avoids the need to pipe 'for-each-ref'
results to 'cat-file --batch-check' to get fully-peeled tag information. It
also benefits from the 'filter_and_format_refs()' single-iteration
optimization, since 'peel_iterated_oid()' may be able to read the
pre-computed peeled OID from a packed ref. A couple notes on this one:
 * I went with a command line option for '--full-deref' rather than another
   format specifier (like ** instead of *) because it seems unlikely that a
   user is going to want to perform a shallow dereference and a full
   dereference in the same 'for-each-ref'. There's also a NEEDSWORK going
   all the way back to the introduction of 'for-each-ref' in 9f613ddd21c
   (Add git-for-each-ref: helper for language bindings, 2006-09-15) that (to
   me) implies different dereferencing behavior corresponds to different use
   cases/user needs.
 * I'm not attached to '--full-deref' as a name - if someone has an idea for
   a more descriptive name, please suggest it!
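
For reference, the existing workaround mentioned above (piping 'for-each-ref'
into 'cat-file --batch-check') can be sketched as follows. This is an
illustration, not necessarily the exact command used in the perf tests; the
scratch repository, the 'g' helper, and the tag names 'inner' and 'outer' are
invented for the example.

```shell
# Scratch repository with a nested annotated tag: outer -> inner -> commit.
repo=$(mktemp -d)
git -C "$repo" init -q
g() { git -C "$repo" -c user.name=Example -c user.email=example@example.com "$@"; }
g commit -q --allow-empty -m base
g tag -a -m inner inner HEAD
g tag -a -m outer outer inner 2>/dev/null   # Git prints a hint about nested tags
commit=$(g rev-parse HEAD)

# Emit "<tag-oid>^{} <refname>" per ref and let cat-file peel the object
# recursively; %(rest) carries the refname through to the output.
peeled=$(g for-each-ref --format='%(objectname)^{} %(refname)' refs/tags/outer |
	g cat-file --batch-check='%(objectname) %(rest)')
printf '%s\n' "$peeled"
```

The printed line pairs the commit at the end of the tag chain with
'refs/tags/outer', which is the information '--full-deref' is meant to
provide without the second process.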
Finally, patch 9 adds performance tests for 'for-each-ref', showing the
effects of optimizations made throughout the series. Here are some sample
results from my Ubuntu VM (test names shortened for space):
Test                                              this branch
----------------------------------------------------------------------------
6300.2: (loose)                                   4.78(0.89+3.82)
6300.3: (loose, no sort)                          4.51(0.86+3.58)
6300.4: (loose, --count=1)                        4.70(0.90+3.73)
6300.5: (loose, --count=1, no sort)               4.35(0.58+3.73)
6300.6: (loose, tags)                             2.45(0.44+1.95)
6300.7: (loose, tags, no sort)                    2.38(0.44+1.90)
6300.8: (loose, tags, shallow deref)              3.33(1.27+1.99)
6300.9: (loose, tags, shallow deref, no sort)     3.29(1.29+1.93)
6300.10: (loose, tags, full deref)                3.76(1.69+1.99)
6300.11: (loose, tags, full deref, no sort)       3.73(1.71+1.94)
6300.12: for-each-ref + cat-file (loose, tags)    4.25(2.16+2.17)
6300.14: (packed)                                 0.61(0.50+0.09)
6300.15: (packed, no sort)                        0.46(0.40+0.04)
6300.16: (packed, --count=1)                      0.59(0.44+0.13)
6300.17: (packed, --count=1, no sort)             0.02(0.01+0.01)
6300.18: (packed, tags)                           0.28(0.18+0.09)
6300.19: (packed, tags, no sort)                  0.29(0.24+0.03)
6300.20: (packed, tags, shallow deref)            1.20(1.03+0.13)
6300.21: (packed, tags, shallow deref, no sort)   1.13(0.99+0.08)
6300.22: (packed, tags, full deref)               1.57(1.45+0.11)
6300.23: (packed, tags, full deref, no sort)      1.07(1.01+0.05)
6300.24: for-each-ref + cat-file (packed, tags)   2.01(1.81+0.33)
- Victoria
Victoria Dye (9):
  ref-filter.c: really don't sort when using --no-sort
  for-each-ref: clarify interaction of --omit-empty & --count
  ref-filter.h: add max_count and omit_empty to ref_format
  ref-filter.h: move contains caches into filter
  ref-filter.h: add functions for filter/format & format-only
  ref-filter.c: refactor to create common helper functions
  ref-filter.c: filter & format refs in the same callback
  for-each-ref: add option to fully dereference tags
  t/perf: add perf tests for for-each-ref

 Documentation/git-for-each-ref.txt |  12 +-
 builtin/branch.c                   |  42 +++--
 builtin/for-each-ref.c             |  41 ++---
 builtin/ls-remote.c                |  10 +-
 builtin/tag.c                      |  32 +---
 ref-filter.c                       | 277 ++++++++++++++++++++---------
 ref-filter.h                       |  26 +++
 t/perf/p6300-for-each-ref.sh       |  87 +++++++++
 t/t3200-branch.sh                  |  68 ++++++-
 t/t6300-for-each-ref.sh            |  55 ++++++
 t/t7004-tag.sh                     |  45 +++++
 11 files changed, 532 insertions(+), 163 deletions(-)
 create mode 100755 t/perf/p6300-for-each-ref.sh
base-commit: bc5204569f7db44d22477485afd52ea410d83743
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1609%2Fvdye%2Fvdye%2Ffor-each-ref-optimizations-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1609/vdye/vdye/for-each-ref-optimizations-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1609
--
gitgitgadget