From: Junio C Hamano <gitster@pobox.com>
To: Emily Shaffer <emilyshaffer@google.com>
Cc: git@vger.kernel.org, Calvin Wan <calvinwan@google.com>,
Han Young <hanyang.tony@bytedance.com>,
Jonathan Tan <jonathantanmy@google.com>,
sokcevic@google.com
Subject: Re: [RFC PATCH] promisor-remote: always JIT fetch with --refetch
Date: Sun, 06 Oct 2024 15:43:36 -0700 [thread overview]
Message-ID: <xmqqset8c0o7.fsf@gitster.g> (raw)
In-Reply-To: <20241003223546.1935471-1-emilyshaffer@google.com> (Emily Shaffer's message of "Thu, 3 Oct 2024 15:35:46 -0700")
Emily Shaffer <emilyshaffer@google.com> writes:
> By the time we decide we need to do a partial clone fetch, we already
> know the object is missing, even if the_repository->parsed_objects
> thinks it exists. But --refetch bypasses the local object check, so we
> can guarantee that a JIT fetch will fix incorrect local caching.
> ...
> The culprit is that we're assuming all local refs already must have
> objects in place. Using --refetch means we ignore that assumption during
> JIT fetch.
Hmph. The whole lazy fetch business looks more and more broken X-<.
There is a comment in the refetch code path that tells us to "perform
a full refetch ignoring existing objects", but if an object truly
exists, there should be no need to refetch, and it starts to smell
more like "ignoring somebody who gives us an incorrect information
that these objects exist".
But a ref that points at a missing commit is "somebody giving a
false information" and an option to ignore such misinformation would
be a perfect tool fit to sweep such a breakage under the rug.
But is this sufficient? Looking at how check_exist_and_connected()
does its work, I am not sure how it would cope with a case where an
object that is pointed by a ref does happen to exist, but the commit
that is referred to by the commit is missing, as it only checks the
existence of the tips.
> diff --git a/promisor-remote.c b/promisor-remote.c
> index 9345ae3db2..cf00e31d3b 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -43,7 +43,7 @@ static int fetch_objects(struct repository *repo,
> strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
> "fetch", remote_name, "--no-tags",
> "--no-write-fetch-head", "--recurse-submodules=no",
> - "--filter=blob:none", "--stdin", NULL);
> + "--filter=blob:none", "--refetch", "--stdin", NULL);
> if (!git_config_get_bool("promisor.quiet", &quiet) && quiet)
> strvec_push(&child.args, "--quiet");
> if (start_command(&child))
The documentation for "git fetch --refetch" says that this grabs
everything as if we are making a fresh clone, ignoring everything we
already have. Which makes the change in this patch prohibitively
expensive for asking each single object lazily from the promisor
remote, but is that really the case? If there is a reasonable
safety that prevents us from doing something silly like transferring
one clone worth of data for every single object we lazily fetch,
perhaps this would be a workable solution (but if that is the case,
perhaps "git fetch --refetch" documentation needs to be rephrased,
to avoid such an impression).
Thanks.
next prev parent reply other threads:[~2024-10-06 22:43 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-03 22:35 [RFC PATCH] promisor-remote: always JIT fetch with --refetch Emily Shaffer
2024-10-06 22:43 ` Junio C Hamano [this message]
2024-10-07 0:21 ` Robert Coup
2024-10-07 0:37 ` Junio C Hamano
2024-10-11 16:40 ` Emily Shaffer
2024-10-11 17:54 ` Junio C Hamano
2024-10-23 0:28 ` [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object Emily Shaffer
2024-10-23 18:53 ` Emily Shaffer
2024-10-23 20:11 ` Taylor Blau
2024-10-28 22:55 ` Jonathan Tan
2024-10-29 21:11 ` [PATCH 0/2] When fetching, warn if in commit graph but not obj db Jonathan Tan
2024-10-29 21:11 ` [PATCH 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-10-30 21:22 ` Josh Steadmon
2024-10-29 21:11 ` [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
2024-10-30 21:22 ` Josh Steadmon
2024-10-31 21:23 ` Jonathan Tan
2024-10-31 20:59 ` Taylor Blau
2024-10-31 21:43 ` Jonathan Tan
2024-11-01 14:33 ` Taylor Blau
2024-11-01 17:33 ` Jonathan Tan
2024-10-30 21:22 ` [PATCH 0/2] When fetching, " Josh Steadmon
2024-10-31 21:18 ` [PATCH v2 0/2] When fetching, die " Jonathan Tan
2024-10-31 21:19 ` [PATCH v2 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-10-31 21:19 ` [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
2024-11-01 2:22 ` Junio C Hamano
2024-11-01 4:25 ` Junio C Hamano
2024-11-01 8:59 ` [External] " Han Xin
2024-11-01 17:46 ` Jonathan Tan
2024-11-01 17:40 ` Jonathan Tan
2024-11-02 2:08 ` Junio C Hamano
2024-11-01 17:36 ` Jonathan Tan
2024-11-01 15:18 ` Taylor Blau
2024-11-01 17:49 ` Jonathan Tan
2024-10-31 22:33 ` [PATCH v2 0/2] When fetching, die " Josh Steadmon
2024-11-05 19:24 ` [PATCH v3 " Jonathan Tan
2024-11-05 19:24 ` [PATCH v3 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-11-05 19:24 ` [PATCH v3 2/2] fetch-pack: die if in commit graph but not obj db Jonathan Tan
2024-11-06 3:12 ` [PATCH v3 0/2] When fetching, " Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqset8c0o7.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=calvinwan@google.com \
--cc=emilyshaffer@google.com \
--cc=git@vger.kernel.org \
--cc=hanyang.tony@bytedance.com \
--cc=jonathantanmy@google.com \
--cc=sokcevic@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).