From: Junio C Hamano <gitster@pobox.com>
To: Han Young <hanyang.tony@bytedance.com>
Cc: git@vger.kernel.org, chriscool@tuxfamily.org, jonathantanmy@google.com
Subject: Re: [PATCH] promisor-remote: remove the promisor object check for failed fetch
Date: Thu, 29 May 2025 08:23:17 -0700 [thread overview]
Message-ID: <xmqqplfrmoey.fsf@gitster.g> (raw)
In-Reply-To: <20250528095830.30306-1-hanyang.tony@bytedance.com> (Han Young's message of "Wed, 28 May 2025 17:58:30 +0800")
Han Young <hanyang.tony@bytedance.com> writes:
> If the promisor objects fail to fetch, we check the remaining objects
> to see if they are indeed promisor objects. Then, we die on the first
> remaining promisor object. However, this promisor object check is
> unnecessary because callers of promisor_remote_get_direct already filter
> out local objects. All objects passed to promisor_remote_get_direct are
> promisor objects.
>
> The is_promisor_object check essentially iterates through every object
> in the local packfiles and adds them to an oid set. This process is
> agonizingly slow for large repositories.
> Remove the check so that we fail immediately.
Thanks for CC'ing Christian and Jonathan Tan, who may indeed be good
reviewers for this change.
let me think aloud a bit, but because I haven't looked at this part
of the system for quite some time, what I mumble may not make much
sense. Please bear with me.
So the incoming list remaining_oids[] is what was already filtered
by the caller and everything in it should be promisor? If so ...
> - for (i = 0; i < remaining_nr; i++) {
> - if (is_promisor_object(repo, &remaining_oids[i]))
> - die(_("could not fetch %s from promisor remote"),
> - oid_to_hex(&remaining_oids[i]));
... the first iteration of this loop, after is_promisor_object()
says the object is a promisor object (which is guaranteed if
everybody in that remaining_oids[] array is promisor object), will
die immediately. So your "agonizingly slow" comes from just one
single call to is_promisor_object() function is ultra slow?
If that is the case, the patch, including the above explanation,
makes perfect sense (note that I didn't verify the "everybody in the
remaining_oids[] array is guaranteed to be a promisor object" claim
myself, though).
But at the same time, it sounds like is_promisor_object() seriously
is wrong. Perhaps we need to tell pack-objects to pre-compute the
packfile.c:add_promisor_object() stuff and cache the result in an
on-disk file, just like reverse index is stored in an auxiliary
file?
What do the callers of the function use to "filter out local
objects" to ensure that "all objects passed ... are promisor
objects" do? Have they already spent agonizingly large amount of
time to do so? I sampled a few existing callers but they do not
explicitly restrict what they throw at the function to promisor
objects---they prepare an oid array, throw anything they do not see
locally in the array and call this function. So objects in the
array are either promisor objects or missing due to repository
corruption---we simply cannot tell. I suspect that the claim
"everything the caller calls the function with is a promisor object"
is not exactly correct.
The end-result when remaining_nr is not zero (i.e., some objects
that the caller wanted us to be fetched) would not exactly be the
same. The function used to die only when the object we failed to
obtain was what a promisor remote promised to give us. With this
change, we also die when an object the caller asked us to fetch is
not promised by any promisor. I do not know what the implication
of this behaviour change would be. Reviewers, comments?
> + if (remaining_nr) {
> + die(_("could not fetch %s from promisor remote"),
> + oid_to_hex(&remaining_oids[0]));
> }
next prev parent reply other threads:[~2025-05-29 15:23 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-28 9:58 [PATCH] promisor-remote: remove the promisor object check for failed fetch Han Young
2025-05-29 15:23 ` Junio C Hamano [this message]
2025-05-30 13:26 ` [External] " Han Young
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqplfrmoey.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=chriscool@tuxfamily.org \
--cc=git@vger.kernel.org \
--cc=hanyang.tony@bytedance.com \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).