From: Derrick Stolee <stolee@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Patrick Steinhardt <ps@pks.im>,
Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, gitster@pobox.com,
johannes.schindelin@gmx.de, peff@peff.net, johncai86@gmail.com,
newren@gmail.com, christian.couder@gmail.com,
kristofferhaugsbakk@fastmail.com
Subject: Re: [PATCH v2 00/17] pack-objects: add --path-walk option for better deltas
Date: Wed, 30 Oct 2024 22:28:22 -0400
Message-ID: <9aa2471b-0850-4707-9733-d3b33609f5f2@gmail.com>
In-Reply-To: <ZyEjHOcf9A4eMSFG@nand.local>
On 10/29/24 2:02 PM, Taylor Blau wrote:
> On Mon, Oct 28, 2024 at 03:46:11PM -0400, Derrick Stolee wrote:
>> On 10/28/24 1:25 PM, Taylor Blau wrote:
>>> Unfortunately, there is no easy way to reuse the format of the existing
>>> hashcache extension as-is to indicate to the reader whether they are
>>> recording traditional name-hash values, or the new --path-walk hash
>>> values.
>>
>> The --path-walk option does not mess with the name-hash. You're thinking
>> of the --full-name-hash feature [1] that was pulled out due to a lack of
>> interest (and better results with --path-walk).
>>
>> [1] https://lore.kernel.org/git/pull.1785.git.1725890210.gitgitgadget@gmail.com/
>
> Ah, gotcha. Thanks for clarifying.
>
> What is the incompatibility between the two, then? Is it just that
> bitmaps give us the objects in pack- or pseudo-pack order, and we don't
> have a way to permute that back into the order that --path-walk would
> give us?
The incompatibility of reading bitmaps and using the path-walk API is
that the path-walk API does not check a bitmap to see if an object is
already discovered. Thus, it does not use the reachability information
from the bitmap at all and would parse commits and trees to find the
objects that should be in the pack-file.
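
As a rough illustration of that difference, here is a toy model in Python.
It is not Git's implementation: the object store, the per-commit "bitmaps",
and the function names bitmap_reachable() and path_walk() are all
hypothetical. It only shows that a bitmap lookup answers "which objects are
reachable?" without opening anything, while a path-grouped walk has to parse
every commit and tree and never consults the bitmap.

```python
from collections import defaultdict

# Toy object store (hypothetical data): commits point at a root tree,
# trees map entry names to child object ids, blobs are leaves.
COMMITS = {"c1": "t1", "c2": "t2"}
TREES = {
    "t1": {"README": "b1", "src": "t3"},
    "t2": {"README": "b2", "src": "t3"},
    "t3": {"main.c": "b3"},
}

# Toy "bitmap": a precomputed set of reachable objects per commit.
BITMAPS = {
    "c1": {"c1", "t1", "t3", "b1", "b3"},
    "c2": {"c2", "t2", "t3", "b2", "b3"},
}


def bitmap_reachable(tips):
    """Union the stored bitmaps; no commit or tree is parsed."""
    found = set()
    for tip in tips:
        found |= BITMAPS[tip]
    return found


def path_walk(tips):
    """Parse commits and trees, batching object ids by path.

    Returns (batches, parsed), where parsed counts the commits and
    trees that had to be opened. BITMAPS is never consulted.
    """
    batches = defaultdict(list)
    seen = set()
    parsed = 0
    stack = []
    for tip in tips:
        parsed += 1  # open the commit to learn its root tree
        batches["<commits>"].append(tip)
        seen.add(tip)
        stack.append(("", COMMITS[tip]))
    while stack:
        path, oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid in TREES:
            parsed += 1  # open the tree to list its entries
            batches[path + "/"].append(oid)
            for name, child in TREES[oid].items():
                stack.append((path + "/" + name, child))
        else:
            batches[path].append(oid)  # blob: a leaf, nothing to parse
    return dict(batches), parsed
```

In this sketch both functions arrive at the same set of objects for the same
tips. Only path_walk() knows which path each object was found at, and only
bitmap_reachable() avoids parsing. The toy also ignores the case of one
object appearing at several paths.
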
It is also worth noting that using something like 'git repack
--path-walk' does not mean that future 'git pack-objects' executions
from that packfile data need to use the --path-walk option. I expect
that it should be painless to write bitmaps on top of a packfile created
with 'git repack -adf --path-walk', but since most places doing so also
likely want delta islands, I have not explored this option thoroughly.
(Delta islands are their own challenge, since the path-walk API is not
spreading the reachability information across the objects it walks.
However, this could be remedied by doing a separate walk to identify
islands using the normal method. I believe Peff had an idea in that
direction in another thread. This requires some integration and testing
that I don't have the expertise to provide.)
> If so, a couple of thoughts:
> ...
Since the incompatibility is in a different direction, I don't think
these thoughts were relevant to the problem.
> OTOH, the order in which we pack objects is extremely important to
> performance as you no doubt are aware of. So changing that order to more
> closely match the --path-walk option should be done with great care.
This is a place where I'm unsure about how the --path-walk option adjusts
the object order within the pack. The packing list gets re-sorted to match
the typical method, at least for how the delta compression window works.
This would be another good reason to consider the --path-walk option in
server environments very carefully. My patch series puts up guard rails
specifically because it makes no claim to be effective in all of the
dimensions that matter for those scenarios. Hopefully, others will be
motivated enough to determine if the compression that's possible with
this algorithm could be achieved in a way that is compatible with server
needs.
> Anyway. All of that is to say that I want to better understand what does
> and doesn't work together between bitmaps and path-walk. Given my
> current understanding, it seems there are a couple of approaches to
> unifying these two things together, so it would be nice to be able to
> do so if possible.
I think this is an excellent opportunity for testing and debugging to
build up more intuition for how the path-walk API works. When I submit
the next version later tonight, the path-walk algorithm will be better
documented.
That said, I don't have any personal motivation to integrate the two
together, so I don't expect to be contributing that integration point
myself. I think that the results speak for themselves in the very
common environment of a Git client without bitmaps.
Thanks,
-Stolee