From: Derrick Stolee <stolee@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v2 3/3] grep: prefetch necessary blobs
Date: Mon, 27 Apr 2026 08:59:48 -0400
Message-ID: <31763514-2602-4d8e-ac25-70590f090947@gmail.com>
In-Reply-To: <8fbfe69bc4d0c6166967986f24861ffa393ed7cf.1776472347.git.gitgitgadget@gmail.com>
On 4/17/2026 8:32 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> In partial clones, `git grep` fetches necessary blobs on-demand one
> at a time, which can be very slow. In partial clones, add an extra
> preliminary walk over the tree similar to grep_tree() which collects
> the blobs of interest, and then prefetches them.
A lot of the code is about walking trees to find blobs matching
the input pathspec, with this being the core method:
> +static void collect_blob_oids_for_tree(struct repository *repo,
> + const struct pathspec *pathspec,
> + struct tree_desc *tree,
> + struct strbuf *base,
> + int tn_len,
> + struct oidset *blob_oids)
And in your test, you set up a repo to have three blobs with
matches in two of the files:
> +test_expect_success 'grep of revision in partial clone does bulk prefetch' '
> + test_when_finished "rm -rf grep-partial-src grep-partial" &&
> +
> + git init grep-partial-src &&
> + (
> + cd grep-partial-src &&
> + git config uploadpack.allowfilter 1 &&
> + git config uploadpack.allowanysha1inwant 1 &&
> + echo "needle in haystack" >searchme &&
> + echo "no match here" >other &&
> + mkdir subdir &&
> + echo "needle again" >subdir/deep &&
> + git add . &&
> + git commit -m "initial"
> + ) &&
But then the command downloads all of the blobs, not using a
pathspec:
> + # grep HEAD should batch-prefetch all blobs in one request.
> + GIT_TRACE2_EVENT="$(pwd)/grep-trace" \
> + git -C grep-partial grep -c "needle" HEAD >result &&
> +
> + # Should find matches in two files.
> + test_line_count = 2 result &&
> +
> + # Should have prefetched all 3 objects at once
> + test_trace2_data promisor fetch_count 3 <grep-trace
> +'
I think your code is correct, but I'd like to see a test
here that demonstrates a pathspec filter on the 'grep'
command to help filter out a blob that has a matching string.
Perhaps something like:
* matches.txt (has needle)
* nomatch.txt (does not have needle)
* matches.md (has needle)
and then 'git grep -c "needle" HEAD -- *.txt' would
download two blobs and find one match. A second run without
the pathspec would download one blob and find two matches.
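For illustration, here is a rough sketch of that scenario as a standalone
shell script. It is not taken from the patch and does not use the test
harness or trace2, so it says nothing about whether the fetch is batched;
it only checks which blobs end up present after each grep. The repository
names (src, partial) and the count_missing helper are hypothetical, and it
assumes a git version with partial clone support and file:// transport
allowed.

```shell
#!/bin/sh
# Sketch only: plain shell, not the t/ test harness and not code from the patch.
# Repo names (src, partial) and count_missing are hypothetical.
set -e

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q src
git -C src config uploadpack.allowfilter 1
git -C src config uploadpack.allowanysha1inwant 1
echo "needle in haystack" >src/matches.txt
echo "no match here" >src/nomatch.txt
echo "needle again" >src/matches.md
git -C src add .
git -C src -c user.name=tester -c user.email=tester@example.com \
	commit -q -m "initial"

# Blobless partial clone; --no-checkout keeps all three blobs missing.
git clone -q --no-checkout --filter=blob:none "file://$tmp/src" partial

# Count objects reachable from HEAD that are absent locally.
# --missing=print reports them as "?<oid>" without fetching them.
count_missing () {
	git -C partial rev-list --objects --missing=print HEAD |
	grep '^?' | wc -l | tr -d ' '
}

missing_before=$(count_missing)

# Quote the glob so git, not the shell, interprets it as a pathspec.
txt_lines=$(git -C partial grep -c "needle" HEAD -- '*.txt' | wc -l | tr -d ' ')
missing_after_txt=$(count_missing)

all_lines=$(git -C partial grep -c "needle" HEAD | wc -l | tr -d ' ')
missing_after_all=$(count_missing)

echo "missing blobs: $missing_before -> $missing_after_txt -> $missing_after_all"
echo "files with matches: $txt_lines (pathspec), $all_lines (no pathspec)"
```

If the pathspec is respected, the first grep should leave matches.md
missing, and the second should fetch only that one remaining blob. A real
test would check fetch_count via test_trace2_data instead of counting
missing objects.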
Does that make sense as a test?
Thanks,
-Stolee
Thread overview: 25+ messages
2026-04-16 22:48 [PATCH 0/3] Batch prefetching Elijah Newren via GitGitGadget
2026-04-16 22:48 ` [PATCH 1/3] patch-ids.h: add missing trailing parenthesis in documentation comment Elijah Newren via GitGitGadget
2026-04-16 22:48 ` [PATCH 2/3] builtin/log: prefetch necessary blobs for `git cherry` Elijah Newren via GitGitGadget
2026-04-17 21:42 ` Junio C Hamano
2026-04-17 22:02 ` Elijah Newren
2026-04-16 22:48 ` [PATCH 3/3] grep: prefetch necessary blobs Elijah Newren via GitGitGadget
2026-04-18 0:32 ` [PATCH v2 0/3] Batch prefetching Elijah Newren via GitGitGadget
2026-04-18 0:32 ` [PATCH v2 1/3] patch-ids.h: add missing trailing parenthesis in documentation comment Elijah Newren via GitGitGadget
2026-04-18 0:32 ` [PATCH v2 2/3] builtin/log: prefetch necessary blobs for `git cherry` Elijah Newren via GitGitGadget
2026-04-19 14:04 ` Phillip Wood
2026-04-21 21:28 ` Elijah Newren
2026-04-23 15:15 ` Phillip Wood
2026-04-23 17:38 ` Elijah Newren
2026-04-27 13:16 ` Derrick Stolee
2026-05-11 2:51 ` Junio C Hamano
2026-05-11 17:45 ` Elijah Newren
2026-05-13 23:17 ` Elijah Newren
2026-04-18 0:32 ` [PATCH v2 3/3] grep: prefetch necessary blobs Elijah Newren via GitGitGadget
2026-04-27 12:59 ` Derrick Stolee [this message]
2026-05-13 19:21 ` Elijah Newren
2026-05-14 16:25 ` [PATCH v3 0/4] Batch prefetching Elijah Newren via GitGitGadget
2026-05-14 16:25 ` [PATCH v3 1/4] promisor-remote: document caller filtering contract Elijah Newren via GitGitGadget
2026-05-14 16:25 ` [PATCH v3 2/4] patch-ids.h: add missing trailing parenthesis in documentation comment Elijah Newren via GitGitGadget
2026-05-14 16:25 ` [PATCH v3 3/4] builtin/log: prefetch necessary blobs for `git cherry` Elijah Newren via GitGitGadget
2026-05-14 16:25 ` [PATCH v3 4/4] grep: prefetch necessary blobs Elijah Newren via GitGitGadget