From: Derrick Stolee <stolee@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v2 2/3] builtin/log: prefetch necessary blobs for `git cherry`
Date: Mon, 27 Apr 2026 09:16:59 -0400
Message-ID: <a2fbb23d-0809-4a9d-8bf9-8ac0dc8ee054@gmail.com>
In-Reply-To: <a705852723fbe88e94ad3de1daba548dbce32211.1776472347.git.gitgitgadget@gmail.com>

On 4/17/2026 8:32 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>

(I'm sorry that I'm reviewing out of order. This reply includes my
feelings about patch 3 after reading both.)

> +/*
> + * Enumerate blob OIDs from a single commit's diff, inserting them into blobs.
> + * Skips files whose userdiff driver explicitly declares binary status
> + * (drv->binary > 0), since patch-ID uses oid_to_hex() for those and
> + * never reads blob content.  Use userdiff_find_by_path() since
> + * diff_filespec_load_driver() is static in diff.c.
> + *
> + * Clean up with diff_queue_clear() (from diffcore.h).
> + */
> +static void collect_diff_blob_oids(struct commit *commit,
> +				   struct diff_options *opts,
> +				   struct oidset *blobs)

I think that this is generally a good idea, though I worry that
builtin/log.c may not be the right long-term home for this
helper.

I expect that we'll find more and more operations where we want
to prefetch blobs, both ones that exist now and ones that may be
created in the future. It would be preferable if they could
automatically take advantage of the logic already in
diff_queued_diff_prefetch() within diffcore_std() in diff.c.

Ultimately, _this_ patch cares about a diff. Could we perform a
"diff prep" step using the core diff library instead of
inventing a second queue of results for diffing?
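
To make the "diff prep" idea concrete, here is a rough,
self-contained sketch of the shape such a step could take. It is
a toy model, not git's API: every toy_* name and field below is
hypothetical, and the "is_local" flag stands in for whatever
check decides that a blob is missing from a partial clone. It
walks a queue of file pairs, skips paths explicitly marked
binary, and collects each missing blob once so that a single
batched request could follow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT git's real structures. */
struct toy_filespec {
	const char *oid;	/* hex object id, NULL if this side is absent */
	int is_local;		/* nonzero if the blob is already available */
};

struct toy_filepair {
	struct toy_filespec one, two;
	int binary;		/* > 0: a driver explicitly marks the path binary */
};

#define TOY_MAX_WANTED 64

struct toy_prefetch {
	const char *oids[TOY_MAX_WANTED];
	size_t nr;
};

/* Queue one side of a pair unless it is absent, local, or a duplicate. */
static void toy_want(struct toy_prefetch *p, const struct toy_filespec *spec)
{
	size_t i;

	if (!spec->oid || spec->is_local)
		return;
	for (i = 0; i < p->nr; i++)
		if (!strcmp(p->oids[i], spec->oid))
			return;	/* already queued */
	assert(p->nr < TOY_MAX_WANTED);
	p->oids[p->nr++] = spec->oid;
}

/*
 * Collect the blobs a content diff of "queue" would need to read.
 * Returns how many distinct object ids were queued in "out".
 */
static size_t toy_diff_prep(const struct toy_filepair *queue, size_t nr,
			    struct toy_prefetch *out)
{
	size_t i;

	out->nr = 0;
	for (i = 0; i < nr; i++) {
		if (queue[i].binary > 0)
			continue;	/* content is never read for these */
		toy_want(out, &queue[i].one);
		toy_want(out, &queue[i].two);
	}
	return out->nr;
}
```

The linear duplicate check keeps the sketch short; a real
implementation would presumably use a set keyed on the object id.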

Patch 3 cares about a "scan prep": loading all blobs for a given
tree that match a pathspec. This is very similar to what a
checkout would do, though checkout ultimately uses a form of diff
to find out what changes should be applied to the working
directory. Perhaps 'git archive' is a better matching example.
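
For contrast, a "scan prep" needs no diff at all. The sketch
below is again a toy model with hypothetical toy_* names: the
tree is flattened into a list of entries, and the pathspec is
reduced to a plain path prefix, which is much simpler than real
pathspec matching. It gathers every blob under the prefix that is
not already available locally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened tree entry; NOT git's real structure. */
struct toy_tree_entry {
	const char *path;
	const char *oid;
	int is_local;	/* nonzero if the blob is already available */
};

/* Simplified pathspec: an empty prefix matches every path. */
static int toy_path_matches(const char *path, const char *prefix)
{
	return !strncmp(path, prefix, strlen(prefix));
}

/*
 * Collect object ids of missing blobs whose path matches "prefix".
 * Returns the number of entries written to "wanted".
 */
static size_t toy_scan_prep(const struct toy_tree_entry *entries, size_t nr,
			    const char *prefix,
			    const char **wanted, size_t wanted_alloc)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++) {
		if (entries[i].is_local)
			continue;
		if (!toy_path_matches(entries[i].path, prefix))
			continue;
		assert(n < wanted_alloc);
		wanted[n++] = entries[i].oid;
	}
	return n;
}
```

Unlike the diff case, the only inputs are a tree and a pathspec,
which is why an archive-style walk looks like the closer analogy.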

I don't mean to make your series more complicated. I value what
you're doing and can see how your current attention can be used
to make further improvements later. If the logic lives in a
common location, later integrations can add to our confidence in
the feature through tests covering each user-facing use.

I'm not sure if it makes sense to attempt to create a universal
library method that would be used by builtin/log.c _and_ diff.c,
at least not right now. I'm most interested in having this logic
be more reusable in the future without needing to move code
across files.

Thanks,
-Stolee

