* [PATCH] promisor-remote: remove the promisor object check for failed fetch
@ 2025-05-28 9:58 Han Young
2025-05-29 15:23 ` Junio C Hamano
0 siblings, 1 reply; 3+ messages in thread
From: Han Young @ 2025-05-28 9:58 UTC (permalink / raw)
To: git; +Cc: chriscool, jonathantanmy, Han Young
If the promisor objects fail to fetch, we check the remaining objects
to see if they are indeed promisor objects. Then, we die on the first
remaining promisor object. However, this promisor object check is
unnecessary because callers of promisor_remote_get_direct already filter
out local objects. All objects passed to promisor_remote_get_direct are
promisor objects.
The is_promisor_object check essentially iterates through every object
in the local packfiles and adds them to an oid set. This process is
agonizingly slow for large repositories.
Remove the check so that we fail immediately.
Signed-off-by: Han Young <hanyang.tony@bytedance.com>
---
promisor-remote.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/promisor-remote.c b/promisor-remote.c
index 9d058586df..f42ea4ce78 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -275,7 +275,6 @@ void promisor_remote_get_direct(struct repository *repo,
struct object_id *remaining_oids = (struct object_id *)oids;
int remaining_nr = oid_nr;
int to_free = 0;
- int i;
if (oid_nr == 0)
return;
@@ -296,10 +295,9 @@ void promisor_remote_get_direct(struct repository *repo,
goto all_fetched;
}
- for (i = 0; i < remaining_nr; i++) {
- if (is_promisor_object(repo, &remaining_oids[i]))
- die(_("could not fetch %s from promisor remote"),
- oid_to_hex(&remaining_oids[i]));
+ if (remaining_nr) {
+ die(_("could not fetch %s from promisor remote"),
+ oid_to_hex(&remaining_oids[0]));
}
all_fetched:
--
2.48.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] promisor-remote: remove the promisor object check for failed fetch
2025-05-28 9:58 [PATCH] promisor-remote: remove the promisor object check for failed fetch Han Young
@ 2025-05-29 15:23 ` Junio C Hamano
2025-05-30 13:26 ` [External] " Han Young
0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2025-05-29 15:23 UTC (permalink / raw)
To: Han Young; +Cc: git, chriscool, jonathantanmy
Han Young <hanyang.tony@bytedance.com> writes:
> If the promisor objects fail to fetch, we check the remaining objects
> to see if they are indeed promisor objects. Then, we die on the first
> remaining promisor object. However, this promisor object check is
> unnecessary because callers of promisor_remote_get_direct already filter
> out local objects. All objects passed to promisor_remote_get_direct are
> promisor objects.
>
> The is_promisor_object check essentially iterates through every object
> in the local packfiles and adds them to an oid set. This process is
> agonizingly slow for large repositories.
> Remove the check so that we fail immediately.
Thanks for CC'ing Christian and Jonathan Tan, who may indeed be good
reviewers for this change.
let me think aloud a bit, but because I haven't looked at this part
of the system for quite some time, what I mumble may not make much
sense. Please bear with me.
So the incoming list remaining_oids[] is what was already filtered
by the caller and everything in it should be promisor? If so ...
> - for (i = 0; i < remaining_nr; i++) {
> - if (is_promisor_object(repo, &remaining_oids[i]))
> - die(_("could not fetch %s from promisor remote"),
> - oid_to_hex(&remaining_oids[i]));
... the first iteration of this loop, after is_promisor_object()
says the object is a promisor object (which is guaranteed if
everybody in that remaining_oids[] array is promisor object), will
die immediately. So your "agonizingly slow" comes from just one
single call to is_promisor_object() function is ultra slow?
If that is the case, the patch, including the above explanation,
makes perfect sense (note that I didn't verify the "everybody in the
remaining_oids[] array is guaranteed to be a promisor object" claim
myself, though).
But at the same time, it sounds like is_promisor_object() seriously
is wrong. Perhaps we need to tell pack-objects to pre-compute the
packfile.c:add_promisor_object() stuff and cache the result in an
on-disk file, just like reverse index is stored in an auxiliary
file?
What do the callers of the function use to "filter out local
objects" to ensure that "all objects passed ... are promisor
objects" do? Have they already spent agonizingly large amount of
time to do so? I sampled a few existing callers but they do not
explicitly restrict what they throw at the function to promisor
objects---they prepare an oid array, throw anything they do not see
locally in the array and call this function. So objects in the
array are either promisor objects or missing due to repository
corruption---we simply cannot tell. I suspect that the claim
"everything the caller calls the function with is a promisor object"
is not exactly correct.
The end-result when remaining_nr is not zero (i.e., some objects
that the caller wanted us to be fetched) would not exactly be the
same. The function used to die only when the object we failed to
obtain was what a promisor remote promised to give us. With this
change, we also die when an object the caller asked us to fetch is
not promised by any promisor. I do not know what the implication
of this behaviour change would be. Reviewers, comments?
> + if (remaining_nr) {
> + die(_("could not fetch %s from promisor remote"),
> + oid_to_hex(&remaining_oids[0]));
> }
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [External] Re: [PATCH] promisor-remote: remove the promisor object check for failed fetch
2025-05-29 15:23 ` Junio C Hamano
@ 2025-05-30 13:26 ` Han Young
0 siblings, 0 replies; 3+ messages in thread
From: Han Young @ 2025-05-30 13:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, chriscool, jonathantanmy
On Thu, May 29, 2025 at 11:23 PM Junio C Hamano <gitster@pobox.com> wrote:
> So your "agonizingly slow" comes from just one
> single call to is_promisor_object() function is ultra slow?
That's correct, is_promisor_object() function constructs the promisor
objects oid set by iterating through every object in the promisor
packfiles and calling add_promisor_object() function for each object.
add_promisor_object() function parses the object, adds itself and the
objects it refers to to the oid set. For a very large partial clone
repository, the promisor packfiles will contain millions of objects.
Parsing each one takes a considerable amount of time.
> But at the same time, it sounds like is_promisor_object() seriously
> is wrong. Perhaps we need to tell pack-objects to pre-compute the
> packfile.c:add_promisor_object() stuff and cache the result in an
> on-disk file, just like reverse index is stored in an auxiliary
> file?
As Calvin summarized in this thread [1], creating a promisor object set
is very expensive. The fact that a local object can become a
"promisor object" makes disk caching infeasible.
is_promisor_object() function is not inherently wrong, it's the definition
of "promisor object" that makes implementation difficult.
> What do the callers of the function use to "filter out local
> objects" to ensure that "all objects passed ... are promisor
> objects" do? Have they already spent agonizingly large amount of
> time to do so?
They call oid_object_info_extended() function to check whether the object
exists in the local storage. Only objects that do not exist locally are
sent to promisor_remote_get_direct().
> So objects in the
> array are either promisor objects or missing due to repository
> corruption---we simply cannot tell. I suspect that the claim
> "everything the caller calls the function with is a promisor object"
> is not exactly correct.
You are right that objects that are not present in the local repository
do not equal promisor objects. The array could contain missing objects
that are not promisor objects.
> The end-result when remaining_nr is not zero (i.e., some objects
> that the caller wanted us to be fetched) would not exactly be the
> same. The function used to die only when the object we failed to
> obtain was what a promisor remote promised to give us. With this
> change, we also die when an object the caller asked us to fetch is
> not promised by any promisor. I do not know what the implication
> of this behaviour change would be.
I tested this patch with git-cat-file and git-diff on a repository
missing a normal object. The commands fail regardless of the patch,
but the error message is different. The message changed from
fatal: Not a valid object name
to
fatal: could not fetch ... from promisor remote
[1] https://lore.kernel.org/git/CAFySSZCyoaKCGycYgJjCJGJ2mV1yfg+gVFb7RytGKmkjupkNkQ@mail.gmail.com/
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-05-30 13:26 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-28 9:58 [PATCH] promisor-remote: remove the promisor object check for failed fetch Han Young
2025-05-29 15:23 ` Junio C Hamano
2025-05-30 13:26 ` [External] " Han Young
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).