From: Derrick Stolee <stolee@gmail.com>
To: Taylor Blau <me@ttaylorr.com>, git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
Date: Fri, 12 Jun 2026 09:24:32 -0400 [thread overview]
Message-ID: <849c659f-efa8-430a-bfac-0c26a3ed1aaa@gmail.com> (raw)
In-Reply-To: <ffad584a43ebf3cb2138e8dce7daef84ab72712f.1780438896.git.me@ttaylorr.com>
On 6/2/2026 6:21 PM, Taylor Blau wrote:
> When 'pack-objects' is invoked with '--path-walk', it prevents us from
> using reachability bitmaps.
My earlier response focused on the _use_ of bitmaps when creating a
packfile, but your patch also enables _writing_ bitmaps with the
--path-walk option, which is significant and potentially more
interesting from my perspective: we have evidence that --path-walk
can produce significantly smaller packfiles than the standard
algorithm, and once those packfiles are created we can benefit from
that size in later packfile creation steps by reusing those deltas.
In this sense, I think the _writing_ is more important during a
repack scenario. The fetch/clone scenarios can benefit directly even
without integrating --path-walk with --use-bitmap-indexes.
> * A path-walk repack that writes bitmaps does not give the bitmap
> selector any commits, because path-walk reveals commits through
> `add_objects_by_path()` rather than through `show_commit()`, where
> `index_commit_for_bitmap()` is normally called.
...
> * On the writing side: teach the path-walk object callback to call
> `index_commit_for_bitmap()` for commits that it adds to the pack.
> That gives the bitmap selector the commit candidates it would have
> seen from the regular traversal.
My earlier reply to this patch was focused on the performance results
when using the "reading bitmaps" case, and I expressed suspicion about
the "exact" sizes of the packfiles.
Even more important here is that we have demonstrated examples of repos
that change their packfile size when using the --path-walk method. We
should demonstrate that the size continues to shrink with --path-walk
even when producing a matching .bitmap file with --write-bitmap-index.
The other thing that I notice here is that the bitmaps will need to
compute their reachable object set independently from the path-walk
algorithm. But I suppose that already happens separately from the
revision-walk approach that normally produces the packfile contents.
Note: A lot of my thoughts around asking for more evidence here is
that this patch seems suspiciously simple for integrating two
complicated features. The test suite (especially with
GIT_TEST_PACK_PATH_WALK=1) helps to guarantee that the result is
_correct_, but with performance features like this it's not enough to
"just" be correct. I want to see that we're having the intended
results.
From my perspective, the point of integrating these two things are:
1. Reachability bitmaps make it much faster to discover the reachable
set and reuse bits of existing packfiles. (Your performance table
demonstrates this is true.)
2. The --path-walk option can shrink packfile sizes by grouping
trees and blobs by path before those paths collide in the name-hash
sort. (I haven't seen evidence that this is happening.)
With evidence of (1) and not (2), it's not clear from the data that
these features are integrating completely. Without looking at the
code, those numbers would be the same if we had instead swapped the
preference of "the --path-walk option disables bitmaps" to "bitmaps
disable --path-walk".
Finally, I'll just note that I don't expect the _bitmaps_ to change
size dramatically. The --path-walk option does change the order of
the objects for its first pass of delta compression, but then uses
the (name-hash, size) sort to finalize the object ordering, so the
final object ordering _should_ be the same (unless I'm mistaken, in
which case the bitmaps could change size due to bitmap compression
concerns).
Thanks,
-Stolee
next prev parent reply other threads:[~2026-06-12 13:24 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-27 23:18 [PATCH 0/3] pack-objects: support bitmaps and delta-islands with `--path-walk` Taylor Blau
2026-05-27 23:18 ` [PATCH 1/3] pack-objects: support reachability bitmaps " Taylor Blau
2026-05-27 23:18 ` [PATCH 2/3] pack-objects: extract `record_tree_depth()` helper Taylor Blau
2026-05-27 23:18 ` [PATCH 3/3] pack-objects: support `--delta-islands` with `--path-walk` Taylor Blau
2026-05-28 15:28 ` [PATCH 0/3] pack-objects: support bitmaps and delta-islands " Derrick Stolee
2026-05-29 17:26 ` Derrick Stolee
2026-05-29 20:07 ` Taylor Blau
2026-05-29 21:28 ` Derrick Stolee
2026-05-29 22:20 ` Taylor Blau
2026-06-02 22:21 ` [PATCH v2 0/4] " Taylor Blau
2026-06-02 22:21 ` [PATCH v2 1/4] t/perf: drop p5311's lookup-table permutation Taylor Blau
2026-06-02 22:21 ` [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk` Taylor Blau
2026-06-12 13:03 ` Derrick Stolee
2026-06-12 13:24 ` Derrick Stolee [this message]
2026-06-02 22:21 ` [PATCH v2 3/4] pack-objects: extract `record_tree_depth()` helper Taylor Blau
2026-06-02 22:21 ` [PATCH v2 4/4] pack-objects: support `--delta-islands` with `--path-walk` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=849c659f-efa8-430a-bfac-0c26a3ed1aaa@gmail.com \
--to=stolee@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox