From: Junio C Hamano <gitster@pobox.com>
To: "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Kristofer Karlsson <krka@spotify.com>
Subject: Re: [PATCH] fetch: pass transport to post-fetch connectivity check
Date: Sun, 24 May 2026 21:53:54 +0900
Message-ID: <xmqq4ijxhst9.fsf@gitster.g>
In-Reply-To: <pull.2123.git.1779625693328.gitgitgadget@gmail.com> (Kristofer Karlsson via GitGitGadget's message of "Sun, 24 May 2026 12:28:12 +0000")
"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Kristofer Karlsson <krka@spotify.com>
>
> When fetching with a transport that sets `self_contained_and_connected`
> (as index-pack does for self-contained packs), check_connected() can
> use find_pack_entry_one() to skip connectivity verification for refs
> whose objects exist in the new pack. This avoids sending those OIDs to
> the rev-list child process.
>
> However, store_updated_refs() never passed the transport to
> check_connected(), so opt.transport was always NULL and this
> optimization was dead code for post-fetch connectivity checks.
>
> Thread the transport parameter through store_updated_refs() and set
> opt.transport so that check_connected() can take advantage of
> self-contained packs.
>
> On a large repository (2.4M commits, 374K files, 10.9K local refs),
> fetching 200 new commits:
>
> Before: rev-list connectivity check 22s, total fetch 36s
> After: rev-list connectivity check 5s, total fetch 14s
>
> The remaining 5s is spent verifying refs not contained in the new pack.
Impressive.

The check_connected() function itself is a battle-tested helper,
with an optimization that originated in c6807a40 (clone: open a
shortcut for connectivity check, 2013-05-26) and was then polished
in 26b974b3 (check_connected(): delay opening new_pack, 2026-03-05).
Allowing the available "transport" to be taken into account does
make very good sense.
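
To make the shortcut concrete, here is a minimal toy model of the idea
described in the commit message: tips whose objects are found in the
new self-contained pack are skipped, and only the rest are handed to
the expensive walk. The names toy_pack, toy_pack_has() and
tips_needing_walk() are hypothetical and exist only for this sketch;
they are not Git's types or functions, and the linear lookup merely
stands in for a real pack-index lookup such as find_pack_entry_one().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "the new pack"; not a Git type. */
struct toy_pack {
	const char **oids;	/* object names stored in the new pack */
	size_t nr;
};

/* Toy lookup: returns 1 if oid is stored in the pack, 0 otherwise. */
static int toy_pack_has(const struct toy_pack *pack, const char *oid)
{
	for (size_t i = 0; i < pack->nr; i++)
		if (!strcmp(pack->oids[i], oid))
			return 1;
	return 0;
}

/*
 * Copy into `out` the tips that still need the expensive walk and
 * return how many there are.  With pack == NULL (the case where no
 * transport was passed) nothing can be skipped.
 */
static size_t tips_needing_walk(const struct toy_pack *pack,
				const char **tips, size_t nr,
				const char **out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		if (pack && toy_pack_has(pack, tips[i]))
			continue;	/* covered by the self-contained pack */
		out[n++] = tips[i];
	}
	return n;
}
```

In this model, passing NULL for the pack corresponds to the situation
before the patch, where every tip is sent to the walk.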

The other call to check_connected() that appears in builtin/fetch.c
does not pass opt.transport either, but that one runs before we
even fetch any packs over any transport, so a tweak similar to this
patch would not help that code path, I guess.  In fact, many calls
to check_connected() elsewhere use an opt that is local to the
scope and has no transport at all.  I wonder if some of them would
benefit from a similar tweak?

Thanks.
>
> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
> fetch: pass transport to post-fetch connectivity check
>
> We're working on reducing git fetch times on a large monorepo (2.4M
> commits, 374K files, 10.9K local refs). Profiling showed the post-fetch
> connectivity check (rev-list --objects --stdin --not --all) dominating
> wall time when there are new objects.
>
> While investigating, I noticed that check_connected() already has a fast
> path for self-contained packs — it uses find_pack_entry_one() to skip
> refs whose objects are in the new pack. builtin/clone.c passes the
> transport to enable this, but store_updated_refs() in builtin/fetch.c
> does not, making the optimization dead code for fetches.
>
> The fix is a three-line change to thread the transport through.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2123%2Fspkrka%2Ffetch-transport-fix-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2123/spkrka/fetch-transport-fix-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/2123
>
> builtin/fetch.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index a22c319467..647fd1c30c 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1213,6 +1213,7 @@ N_("it took %.2f seconds to check forced updates; you can use\n"
>     "to avoid this check\n");
>
>  static int store_updated_refs(struct display_state *display_state,
> +			      struct transport *transport,
>  			      int connectivity_checked,
>  			      struct ref_transaction *transaction, struct ref *ref_map,
>  			      struct fetch_head *fetch_head,
> @@ -1228,6 +1229,7 @@ static int store_updated_refs(struct display_state *display_state,
>  	if (!connectivity_checked) {
>  		struct check_connected_options opt = CHECK_CONNECTED_INIT;
>
> +		opt.transport = transport;
>  		opt.exclude_hidden_refs_section = "fetch";
>  		rm = ref_map;
>  		if (check_connected(iterate_ref_map, &rm, &opt)) {
> @@ -1432,7 +1434,7 @@ static int fetch_and_consume_refs(struct display_state *display_state,
>  	}
>
>  	trace2_region_enter("fetch", "consume_refs", the_repository);
> -	ret = store_updated_refs(display_state, connectivity_checked,
> +	ret = store_updated_refs(display_state, transport, connectivity_checked,
>  				 transaction, ref_map, fetch_head, config,
>  				 display_array);
>  	trace2_region_leave("fetch", "consume_refs", the_repository);
>
> base-commit: 6a4418c36d6bad69a599044b3cf49dcbd049cb45