From: Phillip Wood <phillip.wood123@gmail.com>
To: Elijah Newren <newren@gmail.com>, phillip.wood@dunelm.org.uk
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Subject: Re: [PATCH v2 2/3] builtin/log: prefetch necessary blobs for `git cherry`
Date: Thu, 23 Apr 2026 16:15:51 +0100 [thread overview]
Message-ID: <2abdc8ba-e361-492c-88b7-0c807ee9fb4d@gmail.com> (raw)
In-Reply-To: <CABPp-BF4woakYQ5RZ32J8SzDs_VpvT2Wv+Y2WaHTnFnM=96Kzg@mail.gmail.com>
Hi Elijah
On 21/04/2026 22:28, Elijah Newren wrote:
> On Sun, Apr 19, 2026 at 7:04 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>> On 18/04/2026 01:32, Elijah Newren via GitGitGadget wrote:
>>> From: Elijah Newren <newren@gmail.com>
>>>
>>> In partial clones, `git cherry` fetches necessary blobs on-demand one
>>> at a time, which can be very slow. We would like to prefetch all
>>> necessary blobs upfront. To do so, we need to be able to first figure
>>> out which blobs are needed.
>>
>> "git rebase" without "--reapply-cherry-picks" suffers from this problem
>> as well as it does the equivalent of "git log --cherry-pick". Is there
>> any way to share prefetch_cherry_blobs() with the cherry-pick detection
>> in revision.c?
>
> Yes, you're right; git rebase without --reapply-cherry-picks and git
> log --cherry-pick both go through cherry_pick_list() in revision.c,
> which has exactly the same shape as the patch-ids loop in
> cmd_cherry(): build a hashmap of one side via add_commit_patch_id(),
> then look up the other side via patch_id_iter_first(). The on-demand
> blob fetches come from the same patch_id_neq() callback.
>
> After poking around, I think the approximate scope of the fix would
> be: Move collect_diff_blob_oids(), always_match(), and
> prefetch_cherry_blobs() from builtin/log.c to patch-ids.c and expose
> the last one in patch-ids.h. In cherry_pick_list(), between the
> add_commit_patch_id loop and the comparison loop, build a temporary
> list of just the lookup-side commits (filtering by
> SYMMETRIC_LEFT/BOUNDARY as the existing loop already does) and call
> prefetch_cherry_blobs() on it.
Thanks for taking a look
> That said, I'd rather leave this out of the current series. The bigger
> picture is that I have reservations about expanding partial-clone
> support further into this area. git cherry, git log --cherry-pick, and
> the default cherry-pick detection in git rebase all exist to answer
> "has this patch already landed upstream?" -- a question that, in
> repositories large enough to need partial clones, I feel is rarely
> worth the cost of computing patch-ids across arbitrary amounts of
> history. The honest guidance I would probably give for users on a
> large repo is "pass --reapply-cherry-picks (with rebase) and skip this
> entirely" or to narrow the range under consideration.
"--reapply-cherry-picks --empty=drop" is certainly more efficient. When
we're computing patch ids do we do it for every upstream commit or just
the ones that modify the set of paths that are modified in the branch
we're rebasing?
It is a shame that we don't have a config setting for
"-reapply-cherry-picks" as it is easy to forget to pass that option.
Unfortunately it is not supported by the apply backend which makes such
a setting potentially confusing.
> The omission of
> a --no-reapply-cherry-picks option in git-replay wasn't a lack of
> effort or oversight, but a deliberate choice where I'd rather hold off
> (possibly indefinitely) on implementing it. So I'm a bit reluctant to
> make the performance hazard less visible without also asking whether
> we should even be doing that piece of the operation.
>
> I only implemented the git cherry fix because of a specific customer
> situation where the operation was already baked into tooling, and
> prefetching at least makes the worst case tolerable.
I'm a bit surprised customers aren't complaining about tools that use
"git rebase" being slow.
> I don't want to
> hold myself to doing the same for the cherry_pick_list() path, but I'm
> fairly confident the code here can be re-used for those other cases
> and I'd help review a patch from anyone who wants to carry it forward.
>
> Anyway, you are making the right connection, it's just that my
> personal answer is to let some other interested individual do it.
Fair enough
Thanks
Phillip
next prev parent reply other threads:[~2026-04-23 15:15 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-16 22:48 [PATCH 0/3] Batch prefetching Elijah Newren via GitGitGadget
2026-04-16 22:48 ` [PATCH 1/3] patch-ids.h: add missing trailing parenthesis in documentation comment Elijah Newren via GitGitGadget
2026-04-16 22:48 ` [PATCH 2/3] builtin/log: prefetch necessary blobs for `git cherry` Elijah Newren via GitGitGadget
2026-04-17 21:42 ` Junio C Hamano
2026-04-17 22:02 ` Elijah Newren
2026-04-16 22:48 ` [PATCH 3/3] grep: prefetch necessary blobs Elijah Newren via GitGitGadget
2026-04-18 0:32 ` [PATCH v2 0/3] Batch prefetching Elijah Newren via GitGitGadget
2026-04-18 0:32 ` [PATCH v2 1/3] patch-ids.h: add missing trailing parenthesis in documentation comment Elijah Newren via GitGitGadget
2026-04-18 0:32 ` [PATCH v2 2/3] builtin/log: prefetch necessary blobs for `git cherry` Elijah Newren via GitGitGadget
2026-04-19 14:04 ` Phillip Wood
2026-04-21 21:28 ` Elijah Newren
2026-04-23 15:15 ` Phillip Wood [this message]
2026-04-23 17:38 ` Elijah Newren
2026-04-27 13:16 ` Derrick Stolee
2026-05-11 2:51 ` Junio C Hamano
2026-05-11 17:45 ` Elijah Newren
2026-05-13 23:17 ` Elijah Newren
2026-04-18 0:32 ` [PATCH v2 3/3] grep: prefetch necessary blobs Elijah Newren via GitGitGadget
2026-04-27 12:59 ` Derrick Stolee
2026-05-13 19:21 ` Elijah Newren
2026-05-14 16:25 ` [PATCH v3 0/4] Batch prefetching Elijah Newren via GitGitGadget
2026-05-14 16:25 ` [PATCH v3 1/4] promisor-remote: document caller filtering contract Elijah Newren via GitGitGadget
2026-05-14 16:25 ` [PATCH v3 2/4] patch-ids.h: add missing trailing parenthesis in documentation comment Elijah Newren via GitGitGadget
2026-05-14 16:25 ` [PATCH v3 3/4] builtin/log: prefetch necessary blobs for `git cherry` Elijah Newren via GitGitGadget
2026-05-14 16:25 ` [PATCH v3 4/4] grep: prefetch necessary blobs Elijah Newren via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2abdc8ba-e361-492c-88b7-0c807ee9fb4d@gmail.com \
--to=phillip.wood123@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=newren@gmail.com \
--cc=phillip.wood@dunelm.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox