git.vger.kernel.org archive mirror
* [RFC PATCH] promisor-remote: always JIT fetch with --refetch
@ 2024-10-03 22:35 Emily Shaffer
  2024-10-06 22:43 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Emily Shaffer @ 2024-10-03 22:35 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, Calvin Wan, Han Young, Jonathan Tan, sokcevic

By the time we decide we need to do a partial clone fetch, we already
know the object is missing, even if the_repository->parsed_objects
thinks it exists. But --refetch bypasses the local object check, so we
can guarantee that a JIT fetch will fix incorrect local caching.

This manifested at $DAYJOB in a repo with the following features:
 * blob-filtered partial clone enabled
 * commit graph enabled
 * ref Foo pointing to commit object 6aaaca
 * object 6aaaca missing[a]

With these prerequisites, we noticed that `git fetch` in the repo
produced an infinite loop:
1. `git fetch` tries to fetch, but thinks it has all objects, so it
   noops.
2. At the end of cmd_fetch(), we try to write_commit_graph_reachable().
3. write_commit_graph_reachable() does a reachability walk, including
   starting from Foo
4. The reachability walk tries to peel Foo, and notices it's missing
   6aaaca.
5. The partial clone machinery asks for a per-object JIT fetch of
   6aaaca.
6. `git fetch` (child process) is asked to fetch 6aaaca.
7. We put together the_repository->parsed_objects, adding all commit IDs
   reachable from local refs to it
   (fetch-pack.c:mark_complete_and_common_refs(), trace region
   mark_complete_local_refs). We see Foo, so we add 6aaaca to
   the_repository->parsed_objects and mark it as COMPLETE.
8. cmd_fetch notices that the object ID it was asked for is already
   known, so does not fetch anything new.
9. GOTO 2.

The culprit is that we're assuming all local refs already must have
objects in place. Using --refetch means we ignore that assumption during
JIT fetch.
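
A minimal sketch of the loop described above, written as a toy C model
rather than Git's actual code. The struct, the function names, and the
max_rounds cap are all hypothetical; the real logic lives in cmd_fetch(),
fetch-pack.c, and the commit-graph writer. It only illustrates why
trusting the ref turns the child fetch into a no-op, and why ignoring
that trust (as --refetch does) lets the fetch go through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy repository state (hypothetical): one ref and its target commit. */
struct toy_repo {
	bool ref_points_at_commit;   /* ref Foo names commit 6aaaca */
	bool commit_in_object_store; /* is 6aaaca really on disk? */
};

/*
 * Steps 7-8: the child fetch marks whatever local refs point at as
 * COMPLETE and skips any wanted object that is COMPLETE, unless the
 * caller asked it to ignore local state (modelled by `refetch`).
 * Returns true if the object was actually fetched.
 */
static bool toy_jit_fetch(struct toy_repo *repo, bool refetch)
{
	bool complete = repo->ref_points_at_commit; /* trusts the ref blindly */

	if (complete && !refetch)
		return false; /* "already known", nothing fetched */
	repo->commit_in_object_store = true;
	return true;
}

/*
 * Steps 2-9: the reachability walk keeps asking for the missing commit.
 * Returns the number of JIT fetch attempts needed before the walk can
 * finish, or -1 if max_rounds attempts did not help (the real loop has
 * no such cap).
 */
int toy_fetch_rounds(struct toy_repo *repo, bool refetch, int max_rounds)
{
	int rounds = 0;

	assert(repo != NULL);
	while (!repo->commit_in_object_store) {
		if (rounds == max_rounds)
			return -1;
		toy_jit_fetch(repo, refetch);
		rounds++;
	}
	return rounds;
}
```

For a repo whose ref target is missing, toy_fetch_rounds(&repo, false, 5)
gives up with -1, while passing true for refetch finishes after a single
fetch.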

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>

---

There are a few alternative approaches for this issue that I talked
about with some folks at $DAYJOB:

i. Just disabling the commit graph rewrite allows this to fall
eventually into a path where the fetch actually succeeds. I didn't like
this solution - it's just whack-a-mole - so I didn't look too hard into
why it succeeds that way. It *could* make sense to disable commit graph
rewrite when we do a JIT fetch with blob or tree filter provided - but
if later we want to implement commit filter (something we've talked
about at Google) then I'd worry about this situation coming up again.

ii. We could decide not to mark local refs (and commits reachable from
them) as COMPLETE in the_repository->parsed_objects. I didn't try this
solution out, and I'm not sure what the performance implications are,
but Jonathan Tan likes this solution, so I may try it out and see what
breaks shortly.

iii. We could do all the JIT fetches with --refetch. In my opinion, this
is the safest/most self-healing solution; the JIT fetch only happens
when we really know we're missing the object, so it doesn't make sense
for that fetch to be canceled by any cache. It doesn't have performance
implications as far as I can guess (except that I think we still build
the parsed_objects hash even though we are going to ignore it, but we
already were doing that anyway). Of course, that's what this patch does.

iv. We could do nothing; when cmd_fetch gets a fetch-by-object-id but
decides there is nothing more to do, it could terminate with an error.
That should stop the infinite recursion, and the error could suggest that
the user run `git fsck` to discover what the problem is. Depending on
the remediation we suggest, though, I think a direct fetch to fix this
particular loop would not work.

I'm curious to hear thoughts from people who are more expert than me on
partial clone and fetching in general, though.

This change is also still in RFC, for two reasons:

First, it's intermittently failing tests for me locally, in weirdly
flaky ways:

- t0410-partial-clone.sh fails when I run it from prove, but passes when
  I run it manually, every time.
- t5601-clone.sh and t5505-remote.sh fail nonsensically on `rm -rf` that
  should succeed (and does succeed if I stop the test with test_pause),
  which makes me think there's something else borked in my setup, but
  I'm not sure what.
- t5616-partial-clone.sh actually does fail in a way that I could see
  having to do with this change (since I guess we might download more
  packs than usual), but I was so confused by the other two errors I
  haven't looked closely yet.

And secondly, I didn't write tests verifying the breakage and that this
change fixes it yet, either.

I'm going to work on both those things in the background, but I wanted
to get the description and RFC out early so that folks could take a look
and we could decide which approach is best.

Thanks,
 - Emily

a: That commit object went missing as a byproduct of this partial clone
   gc issue that Calvin, Jonathan, Han Young, and others have been
   investigating:
   https://lore.kernel.org/git/20241001191811.1934900-1-calvinwan@google.com/
---
 promisor-remote.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index 9345ae3db2..cf00e31d3b 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -43,7 +43,7 @@ static int fetch_objects(struct repository *repo,
 	strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
 		     "fetch", remote_name, "--no-tags",
 		     "--no-write-fetch-head", "--recurse-submodules=no",
-		     "--filter=blob:none", "--stdin", NULL);
+		     "--filter=blob:none", "--refetch", "--stdin", NULL);
 	if (!git_config_get_bool("promisor.quiet", &quiet) && quiet)
 		strvec_push(&child.args, "--quiet");
 	if (start_command(&child))
-- 
2.47.0.rc0.187.ge670bccf7e-goog



* Re: [RFC PATCH] promisor-remote: always JIT fetch with --refetch
  2024-10-03 22:35 [RFC PATCH] promisor-remote: always JIT fetch with --refetch Emily Shaffer
@ 2024-10-06 22:43 ` Junio C Hamano
  2024-10-07  0:21   ` Robert Coup
  2024-10-11 16:40   ` Emily Shaffer
  2024-10-23  0:28 ` [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object Emily Shaffer
  2024-10-29 21:11 ` [PATCH 0/2] When fetching, warn if in commit graph but not obj db Jonathan Tan
  2 siblings, 2 replies; 38+ messages in thread
From: Junio C Hamano @ 2024-10-06 22:43 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git, Calvin Wan, Han Young, Jonathan Tan, sokcevic

Emily Shaffer <emilyshaffer@google.com> writes:

> By the time we decide we need to do a partial clone fetch, we already
> know the object is missing, even if the_repository->parsed_objects
> thinks it exists. But --refetch bypasses the local object check, so we
> can guarantee that a JIT fetch will fix incorrect local caching.
> ...
> The culprit is that we're assuming all local refs already must have
> objects in place. Using --refetch means we ignore that assumption during
> JIT fetch.

Hmph.  The whole lazy fetch business looks more and more broken X-<.
There is a comment in the refetch code path that tells us to "perform
a full refetch ignoring existing objects", but if an object truly
exists, there should be no need to refetch, and it starts to smell
more like "ignoring somebody who gives us an incorrect information
that these objects exist".

But a ref that points at a missing commit is "somebody giving a
false information" and an option to ignore such misinformation would
be a perfect tool fit to sweep such a breakage under the rug.

But is this sufficient?  Looking at how check_exist_and_connected()
does its work, I am not sure how it would cope with a case where an
object that is pointed by a ref does happen to exist, but the commit
that is referred to by the commit is missing, as it only checks the
existence of the tips.

> diff --git a/promisor-remote.c b/promisor-remote.c
> index 9345ae3db2..cf00e31d3b 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -43,7 +43,7 @@ static int fetch_objects(struct repository *repo,
>  	strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
>  		     "fetch", remote_name, "--no-tags",
>  		     "--no-write-fetch-head", "--recurse-submodules=no",
> -		     "--filter=blob:none", "--stdin", NULL);
> +		     "--filter=blob:none", "--refetch", "--stdin", NULL);
>  	if (!git_config_get_bool("promisor.quiet", &quiet) && quiet)
>  		strvec_push(&child.args, "--quiet");
>  	if (start_command(&child))

The documentation for "git fetch --refetch" says that this grabs
everything as if we are making a fresh clone, ignoring everything we
already have.  Which makes the change in this patch prohibitively
expensive for asking each single object lazily from the promisor
remote, but is that really the case?  If there is a reasonable
safety that prevents us from doing something silly like transferring
one clone worth of data for every single object we lazily fetch,
perhaps this would be a workable solution (but if that is the case,
perhaps "git fetch --refetch" documentation needs to be rephrased,
to avoid such an impression).

Thanks.


* Re: [RFC PATCH] promisor-remote: always JIT fetch with --refetch
  2024-10-06 22:43 ` Junio C Hamano
@ 2024-10-07  0:21   ` Robert Coup
  2024-10-07  0:37     ` Junio C Hamano
  2024-10-11 16:40   ` Emily Shaffer
  1 sibling, 1 reply; 38+ messages in thread
From: Robert Coup @ 2024-10-07  0:21 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Calvin Wan, Han Young, Jonathan Tan, sokcevic,
	Junio C Hamano

Hi Emily,

I was the one who originally implemented --refetch in [1][2]

[1] https://lore.kernel.org/git/pull.1138.v4.git.1648476131.gitgitgadget@gmail.com/
[2] https://github.com/gitgitgadget/git/pull/1138

On Sun, 6 Oct 2024 at 23:43, Junio C Hamano <gitster@pobox.com> wrote:
>
> Hmph.  The whole lazy fetch business looks more and more broken X-<.
> There is a comment in the refetch code path that tells us to "perform
> a full refetch ignoring existing objects", but if an object truly
> exists, there should be no need to refetch, and it starts to smell
> more like "ignoring somebody who gives us an incorrect information
> that these objects exist".

Basically --refetch was originally designed to send no 'have's during a fetch,
the original motivation being changing a partial clone filter and fetching
all the newly-applicable trees & blobs in a single transfer.

> The documentation for "git fetch --refetch" says that this grabs
> everything as if we are making a fresh clone, ignoring everything we
> already have.  Which makes the change in this patch prohibitively
> expensive for asking each single object lazily from the promisor
> remote, but is that really the case?

From a very quick re-review this is correct that it's expensive: refetch sends
no 'have's, so if you pass a single commit oid then it'll fetch all ancestors
and all the dependent trees & blobs, duplicating what's in the object store and
relying on a repack to clean up. If a commit is missing that's one way to fix
it, but it's a pretty nuclear option: feels like your option iv (terminate with
an error) leading to fsck invoking/suggesting --refetch might avoid
unintentionally recloning the entire repo.
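
To make the cost concrete, here is a rough toy model in C of the general
have/no-have effect described above. It is not Git's negotiation code:
the single-parent history, the struct, and toy_commits_sent() are
hypothetical, and it ignores trees, blobs, filters, and whatever
negotiation settings the JIT fetch itself uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_COMMITS 64

/* parent[i] is the index of commit i's single parent, or -1 for a root. */
struct toy_history {
	int parent[TOY_MAX_COMMITS];
	size_t count;
};

/*
 * Count the commits a server would send for `want`: walk towards the
 * root and stop at the first commit the client advertised as a 'have'.
 * Passing have == NULL models a fetch that sends no 'have's at all, so
 * nothing stops the walk before the root.
 */
size_t toy_commits_sent(const struct toy_history *h, int want,
			const bool *have)
{
	size_t sent = 0;

	assert(h != NULL && want >= 0 && (size_t)want < h->count);
	for (int c = want; c >= 0; c = h->parent[c]) {
		if (have && have[c])
			break;
		sent++;
	}
	return sent;
}
```

On a five-commit chain where the client already has the middle commit,
asking for the tip sends 2 commits with 'have's and all 5 without them.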

In my original RFC [3], Jonathan Tan suggested that --refetch could be useful
to repair missing objects like this, but it was out of scope for me at the time.
But maybe there's a way to improve it for this sort of case?

[3] https://lore.kernel.org/git/20220202185957.1928631-1-jonathantanmy@google.com/

> Emily Shaffer <emilyshaffer@google.com> writes:
>
> This manifested at $DAYJOB in a repo with the following features:
> * blob-filtered partial clone enabled
> * commit graph enabled
> * ref Foo pointing to commit object 6aaaca
> * object 6aaaca missing[a]

I presume there wasn't an obvious/related cause for commit 6aaaca to go missing
in the first place?

Thanks,

Rob :)


* Re: [RFC PATCH] promisor-remote: always JIT fetch with --refetch
  2024-10-07  0:21   ` Robert Coup
@ 2024-10-07  0:37     ` Junio C Hamano
  0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2024-10-07  0:37 UTC (permalink / raw)
  To: Robert Coup
  Cc: Emily Shaffer, git, Calvin Wan, Han Young, Jonathan Tan, sokcevic

Robert Coup <robert.coup@koordinates.com> writes:

> Basically --refetch was originally designed to send no 'have's during a fetch,
> the original motivation being changing a partial clone filter and fetching
> all the newly-applicable trees & blobs in a single transfer.
> ...
> If a commit is missing that's one way to fix
> it, but it's a pretty nuclear option: feels like your option iv (terminate with
> an error) leading to fsck invoking/suggesting --refetch might avoid
> unintentionally recloning the entire repo.
> ...
> In my original RFC [3], Jonathan Tan suggested that --refetch could be useful
> to repair missing objects like this, but it was out of scope for me at the time.
> But maybe there's a way to improve it for this sort of case?
>
> [3] https://lore.kernel.org/git/20220202185957.1928631-1-jonathantanmy@google.com/

Thanks for your comments on the original story behind that option.

> I presume there wasn't an obvious/related cause for commit 6aaaca to go missing
> in the first place?

Emily had this after the three-dash line


a: That commit object went missing as a byproduct of this partial clone
   gc issue that Calvin, Jonathan, Han Young, and others have been
   investigating:
   https://lore.kernel.org/git/20241001191811.1934900-1-calvinwan@google.com/

IOW, I think how the lossage was caused is well understood by now.

Thanks.


* Re: [RFC PATCH] promisor-remote: always JIT fetch with --refetch
  2024-10-06 22:43 ` Junio C Hamano
  2024-10-07  0:21   ` Robert Coup
@ 2024-10-11 16:40   ` Emily Shaffer
  2024-10-11 17:54     ` Junio C Hamano
  1 sibling, 1 reply; 38+ messages in thread
From: Emily Shaffer @ 2024-10-11 16:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Calvin Wan, Han Young, Jonathan Tan, sokcevic

Sorry for the slow response/page-context-back-in. I'll be working on
this today and will try to send a different approach, but after that,
I'm not sure when I'll next get a chance to work on it. If I don't
come up with something suitable today, it's likely that Jonathan Tan
will take over the effort from me, but I'm not sure when he'll be
able to prioritize it.

On Sun, Oct 6, 2024 at 3:43 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Emily Shaffer <emilyshaffer@google.com> writes:
>
> > By the time we decide we need to do a partial clone fetch, we already
> > know the object is missing, even if the_repository->parsed_objects
> > thinks it exists. But --refetch bypasses the local object check, so we
> > can guarantee that a JIT fetch will fix incorrect local caching.
> > ...
> > The culprit is that we're assuming all local refs already must have
> > objects in place. Using --refetch means we ignore that assumption during
> > JIT fetch.
>
> Hmph.  The whole lazy fetch business looks more and more broken X-<.

By "lazy fetch", are you referring to the partial clone fetch, or are
you referring to the mark_complete stuff? (I know we have been having
lots of issues with the partial clone fetch at Google in the last
month or so, so excuse me disambiguating :) )

> There is a comment in the refetch code path that tells us to "perform
> a full refetch ignoring existing objects", but if an object truly
> exists, there should be no need to refetch, and it starts to smell
> more like "ignoring somebody who gives us an incorrect information
> that these objects exist".
>
> But a ref that points at a missing commit is "somebody giving a
> false information" and an option to ignore such misinformation would
> be a perfect tool fit to sweep such a breakage under the rug.
>
> But is this sufficient?  Looking at how check_exist_and_connected()
> does its work, I am not sure how it would cope with a case where an
> object that is pointed by a ref does happen to exist, but the commit
> that is referred to by the commit is missing, as it only checks the
> existence of the tips.

Is that so? mark_complete_and_common claims that it recurses through
all parents of all local refs and marks them existing, too. Looks like
it does that in fetch-pack.c:mark_recent_complete_commits(), only up
to a certain date cutoff, and doesn't do that at all if there's no
cutoff provided. I don't think I see anywhere else that it's recursing
over parents, so I'm not sure why the comment says that. In fact, I
sort of wonder if the comment is wrong; it was introduced in this[1]
series much later than this code block has existed. But then, nobody
questioned it during the series, so I can also be misreading the code
:)

>
> > diff --git a/promisor-remote.c b/promisor-remote.c
> > index 9345ae3db2..cf00e31d3b 100644
> > --- a/promisor-remote.c
> > +++ b/promisor-remote.c
> > @@ -43,7 +43,7 @@ static int fetch_objects(struct repository *repo,
> >       strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
> >                    "fetch", remote_name, "--no-tags",
> >                    "--no-write-fetch-head", "--recurse-submodules=no",
> > -                  "--filter=blob:none", "--stdin", NULL);
> > +                  "--filter=blob:none", "--refetch", "--stdin", NULL);
> >       if (!git_config_get_bool("promisor.quiet", &quiet) && quiet)
> >               strvec_push(&child.args, "--quiet");
> >       if (start_command(&child))
>
> The documentation for "git fetch --refetch" says that this grabs
> everything as if we are making a fresh clone, ignoring everything we
> already have.  Which makes the change in this patch prohibitively
> expensive for asking each single object lazily from the promisor
> remote, but is that really the case?  If there is a reasonable
> safety that prevents us from doing something silly like transferring
> one clone worth of data for every single object we lazily fetch,
> perhaps this would be a workable solution (but if that is the case,
> perhaps "git fetch --refetch" documentation needs to be rephrased,
> to avoid such an impression).

Yeah, this is on me for not reading the entire documentation, just
noticing in code that it disabled this COMPLETE cache thingie. You're
right that it would be too expensive to use this way. As I said at the
top, I'll try to send one of the other alternative approaches today.

 - Emily

1: https://lore.kernel.org/git/pull.451.git.1572981981.gitgitgadget@gmail.com/


* Re: [RFC PATCH] promisor-remote: always JIT fetch with --refetch
  2024-10-11 16:40   ` Emily Shaffer
@ 2024-10-11 17:54     ` Junio C Hamano
  0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2024-10-11 17:54 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git, Calvin Wan, Han Young, Jonathan Tan, sokcevic

Emily Shaffer <nasamuffin@google.com> writes:

> By "lazy fetch", are you referring to the partial clone fetch, or are

Anything we do to compensate for the fact that that initial clone or
fetch can be told to deliberately omit objects, in the hope that we
can grab missing objects on demand.

> Yeah, this is on me for not reading the entire documentation, just
> noticing in code that it disabled this COMPLETE cache thingie. You're
> right that it would be too expensive to use this way. As I said at the
> top, I'll try to send one of the other alternative approaches today.

I thought you were already on your vacation ;-).  I'll go offline at
the end of this week and won't be back until the end of the month.



* [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object
  2024-10-03 22:35 [RFC PATCH] promisor-remote: always JIT fetch with --refetch Emily Shaffer
  2024-10-06 22:43 ` Junio C Hamano
@ 2024-10-23  0:28 ` Emily Shaffer
  2024-10-23 18:53   ` Emily Shaffer
  2024-10-23 20:11   ` Taylor Blau
  2024-10-29 21:11 ` [PATCH 0/2] When fetching, warn if in commit graph but not obj db Jonathan Tan
  2 siblings, 2 replies; 38+ messages in thread
From: Emily Shaffer @ 2024-10-23  0:28 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, Calvin Wan, Han Young, Jonathan Tan, sokcevic

When fetching, we decide which objects to skip asking for by marking
certain commit IDs with the COMPLETE flag. This flag is set in
fetch-pack.c:mark_complete(), which is called from a few different
functions that decide what to mark or not mark. mark_complete() is
insulated against null pointer deref and repeatedly writing the same
commit to the list of objects to skip; because it's the central function
which decides that an object is COMPLETE and doesn't need to be fetched,
let's also insulate it against corruption where the object is not
present (even though we think it is).

Without this check, it's possible to reach a corrupted state where
fetches can infinitely loop because we decide to skip fetching, but we
don't actually have the skipped commit in our object store.

This manifested at $DAYJOB in a repo with the following features:
 * blob-filtered partial clone enabled
 * commit graph enabled
 * ref Foo pointing to commit object 6aaaca
 * object 6aaaca missing[1]

With these prerequisites, we noticed that `git fetch` in the repo
produced an infinite loop:
1. `git fetch` tries to fetch, but thinks it has all objects, so it
   noops.
2. At the end of cmd_fetch(), we try to write_commit_graph_reachable().
3. write_commit_graph_reachable() does a reachability walk, including
   starting from Foo
4. The reachability walk tries to peel Foo, and notices it's missing
   6aaaca.
5. The partial clone machinery asks for a per-object JIT fetch of
   6aaaca.
6. `git fetch` (child process) is asked to fetch 6aaaca.
7. We put together the_repository->parsed_objects, adding all commit IDs
   reachable from local refs to it
   (fetch-pack.c:mark_complete_and_common_refs(), trace region
   mark_complete_local_refs). We see Foo, so we add 6aaaca to
   the_repository->parsed_objects and mark it as COMPLETE.
8. cmd_fetch notices that the object ID it was asked for is already
   known, so does not fetch anything new.
9. GOTO 2.

The culprit is that we're assuming all local refs already must have
objects in place. Let's not assume that, and explicitly check
has_object() before marking objects as COMPLETE.
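
As a rough illustration of the three guards, here is a standalone toy
version in C. The struct, the flag value, and toy_mark_complete() are
hypothetical stand-ins; the in_object_store field plays the role of the
has_object() lookup, and the real function also appends the commit to a
list, which is omitted here.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_COMPLETE 1u

/* Hypothetical stand-in for a parsed commit object. */
struct toy_commit {
	unsigned flags;
	bool in_object_store; /* models the result of a has_object() check */
};

/* Returns true only when the commit was newly marked COMPLETE. */
bool toy_mark_complete(struct toy_commit *commit)
{
	if (!commit)
		return false; /* lookup failed: nothing to mark */
	if (commit->flags & TOY_COMPLETE)
		return false; /* already marked: do not record it twice */
	if (!commit->in_object_store)
		return false; /* ref is stale: leave the commit fetchable */
	commit->flags |= TOY_COMPLETE;
	return true;
}
```

A commit that a ref names but that is absent from the object store is
left unmarked, so a later fetch of that object ID is no longer skipped.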

NEEDSWORK: It could be valuable to emit a trace event when we try to
mark_complete() an object where we don't actually has_object() (to
understand how often we're having to heal from corruption).

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>

1: That commit object went missing as a byproduct of this partial clone
   gc issue:
   https://lore.kernel.org/git/20241001191811.1934900-1-calvinwan@google.com/

---


On the list, Junio and Robert Coup suggested that we should notice cases
when the list of object IDs we are trying to fetch gets filtered to
nothing during a lazy fetch. However, this turned out to be quite tricky
to implement - fetch-by-OID stores the OID in a ref object, so it's
challenging to guess that the ref we care about is an OID instead of a
normal ref. It also required some heuristic approach to catch the exact
moment after we removed COMPLETE objects from the list we wanted to
fetch, and to notice that the wrong objects were missing from that list.

(From the list of alternatives I included with v1, v1 was approach iii;
Junio and Robert suggested iv; this patch holds something close to ii.)

Jonathan Tan suggested that instead, we could be more careful about what
we mark COMPLETE or not; it seems like this is pretty straightforward to
do.

I do wonder if it's a layering violation to do the has_object() check in
mark_complete(), which is otherwise a pretty stupid function; I included
it there because all of the other flavors of mark_complete_* eventually
boil down to mark_complete(), so we wouldn't need to remember to check
for object existence any other time we're trying to do this COMPLETE
marking thing.

Note that I added a test to guarantee this behavior works, but without
commit graph enabled, the test passes both before and after; I guess
maybe it's better to add an explicit regression test for the
combination? But, this test fails with reftable - because we don't have
a way (that I know of) to force-create a ref with a bad ID, to force
this error condition. In the test as written I'm writing to
.git/refs/heads/bar directly; that doesn't work for reftable. But `git
update-ref` is too smart to let me set a ref to garbage. Any tips there
are welcome.

The CI run is at
https://github.com/nasamuffin/git/actions/runs/11470039702 - it seems
the reftable tests are the only things failing.

Also, I am sending this version, but if there are any additional
comments or it requires more changes, please expect Jonathan Tan to take
over driving this patch the rest of the way for me. As previously
stated[2], I'll be OOO after this Friday for most of the rest of the
year; the rest of this week I'm trying to get the rest of my
non-upstream loose ends tied up, so I won't have time to do another
iteration. See folks around Christmastime :)

 - Emily

2: https://lore.kernel.org/git/CAJoAoZnovapqMcu72DGR40jRRqRn57uJVTJg82kZ_rohtGDSfQ@mail.gmail.com/

Cover letter from v1 follows:

There are a few alternative approaches for this issue that I talked
about with some folks at $DAYJOB:

i. Just disabling the commit graph rewrite allows this to fall
eventually into a path where the fetch actually succeeds. I didn't like
this solution - it's just whack-a-mole - so I didn't look too hard into
why it succeeds that way. It *could* make sense to disable commit graph
rewrite when we do a JIT fetch with blob or tree filter provided - but
if later we want to implement commit filter (something we've talked
about at Google) then I'd worry about this situation coming up again.

ii. We could decide not to mark local refs (and commits reachable from
them) as COMPLETE in the_repository->parsed_objects. I didn't try this
solution out, and I'm not sure what the performance implications are,
but Jonathan Tan likes this solution, so I may try it out and see what
breaks shortly.

iii. We could do all the JIT fetches with --refetch. In my opinion, this
is the safest/most self-healing solution; the JIT fetch only happens
when we really know we're missing the object, so it doesn't make sense
for that fetch to be canceled by any cache. It doesn't have performance
implications as far as I can guess (except that I think we still build
the parsed_objects hash even though we are going to ignore it, but we
already were doing that anyway). Of course, that's what this patch does.

iv. We could do nothing; when cmd_fetch gets a fetch-by-object-id but
decides there is nothing more to do, it could terminate with an error.
That should stop the infinite recursion, and the error could suggest that
the user run `git fsck` to discover what the problem is. Depending on
the remediation we suggest, though, I think a direct fetch to fix this
particular loop would not work.

I'm curious to hear thoughts from people who are more expert than me on
partial clone and fetching in general, though.

This change is also still in RFC, for two reasons:

First, it's intermittently failing tests for me locally, in weirdly
flaky ways:

- t0410-partial-clone.sh fails when I run it from prove, but passes when
  I run it manually, every time.
- t5601-clone.sh and t5505-remote.sh fail nonsensically on `rm -rf` that
  should succeed (and does succeed if I stop the test with test_pause),
  which makes me think there's something else borked in my setup, but
  I'm not sure what.
- t5616-partial-clone.sh actually does fail in a way that I could see
  having to do with this change (since I guess we might download more
  packs than usual), but I was so confused by the other two errors I
  haven't looked closely yet.

And secondly, I didn't write tests verifying the breakage and that this
change fixes it yet, either.

I'm going to work on both those things in the background, but I wanted
to get the description and RFC out early so that folks could take a look
and we could decide which approach is best.

Thanks,
 - Emily
---
 fetch-pack.c             |  4 +++-
 t/t0410-partial-clone.sh | 30 ++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index f752da93a8..8cb2ce4c54 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -603,7 +603,9 @@ static int mark_complete(const struct object_id *oid)
 {
 	struct commit *commit = deref_without_lazy_fetch(oid, 1);
 
-	if (commit && !(commit->object.flags & COMPLETE)) {
+	if (commit &&
+	    !(commit->object.flags & COMPLETE) &&
+	    has_object(the_repository, oid, 0)) {
 		commit->object.flags |= COMPLETE;
 		commit_list_insert(commit, &complete);
 	}
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 818700fbec..95de18ec40 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -241,6 +241,36 @@ test_expect_success 'fetching of missing objects works with ref-in-want enabled'
 	grep "fetch< fetch=.*ref-in-want" trace
 '
 
+test_expect_success 'fetching missing objects pointed to by a local ref' '
+	rm -rf reliable-server unreliable-client &&
+	test_when_finished rm -rf reliable-server unreliable-client &&
+	test_create_repo reliable-server &&
+	git -C reliable-server config uploadpack.allowanysha1inwant 1 &&
+	git -C reliable-server config uploadpack.allowfilter 1 &&
+	test_commit -C reliable-server foo &&
+
+	git clone --filter=blob:none "file://$(pwd)/reliable-server" unreliable-client &&
+
+	# to simulate the unreliable client losing a referenced object by
+	# corruption, create the object on the server side, then create only a
+	# reference to that object on the client side (without providing the
+	# object itself).
+	test_commit -C reliable-server bar &&
+	HASH=$(git -C reliable-server rev-parse HEAD) &&
+	echo "$HASH" >unreliable-client/.git/refs/heads/bar &&
+
+	# the object is really missing
+	# check if we can rev-parse a partial SHA. partial so we do not fetch it,
+	# but barely partial (trim only the last char) so that we do not collide
+	test_must_fail git -C unreliable-client rev-parse ${HASH%%?} &&
+
+	# trigger a remote fetch by checking out `bar`
+	git -C unreliable-client switch bar &&
+
+	# and now we have the missing object
+	git -C unreliable-client rev-parse ${HASH%%?}
+'
+
 test_expect_success 'fetching of missing objects from another promisor remote' '
 	git clone "file://$(pwd)/server" server2 &&
 	test_commit -C server2 bar &&

Range-diff against v1:
1:  092be0a655 ! 1:  4db6bbb4cd promisor-remote: always JIT fetch with --refetch
    - ## promisor-remote.c ##
    -@@ promisor-remote.c: static int fetch_objects(struct repository *repo,
    - 	strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
    - 		     "fetch", remote_name, "--no-tags",
    - 		     "--no-write-fetch-head", "--recurse-submodules=no",
    --		     "--filter=blob:none", "--stdin", NULL);
    -+		     "--filter=blob:none", "--refetch", "--stdin", NULL);
    - 	if (!git_config_get_bool("promisor.quiet", &quiet) && quiet)
    - 		strvec_push(&child.args, "--quiet");
    - 	if (start_command(&child))
    + ## fetch-pack.c ##
    +@@ fetch-pack.c: static int mark_complete(const struct object_id *oid)
    + {
    + 	struct commit *commit = deref_without_lazy_fetch(oid, 1);
    + 
    +-	if (commit && !(commit->object.flags & COMPLETE)) {
    ++	if (commit &&
    ++	    !(commit->object.flags & COMPLETE) &&
    ++	    has_object(the_repository, oid, 0)) {
    + 		commit->object.flags |= COMPLETE;
    + 		commit_list_insert(commit, &complete);
    + 	}
    +
    + ## t/t0410-partial-clone.sh ##
    +@@ t/t0410-partial-clone.sh: test_expect_success 'fetching of missing objects works with ref-in-want enabled'
    + 	grep "fetch< fetch=.*ref-in-want" trace
    + '
    + 
    ++test_expect_success 'fetching missing objects pointed to by a local ref' '
    ++	rm -rf reliable-server unreliable-client &&
    ++	test_when_finished rm -rf reliable-server unreliable-client &&
    ++	test_create_repo reliable-server &&
    ++	git -C reliable-server config uploadpack.allowanysha1inwant 1 &&
    ++	git -C reliable-server config uploadpack.allowfilter 1 &&
    ++	test_commit -C reliable-server foo &&
    ++
    ++	git clone --filter=blob:none "file://$(pwd)/reliable-server" unreliable-client &&
    ++
    ++	# to simulate the unreliable client losing a referenced object by
    ++	# corruption, create the object on the server side, then create only a
    ++	# reference to that object on the client side (without providing the
    ++	# object itself).
    ++	test_commit -C reliable-server bar &&
    ++	HASH=$(git -C reliable-server rev-parse HEAD) &&
    ++	echo "$HASH" >unreliable-client/.git/refs/heads/bar &&
    ++
    ++	# the object is really missing
    ++	# check if we can rev-parse a partial SHA. partial so we do not fetch it,
    ++	# but barely partial (trim only the last char) so that we do not collide
    ++	test_must_fail git -C unreliable-client rev-parse ${HASH%%?} &&
    ++
    ++	# trigger a remote fetch by checking out `bar`
    ++	git -C unreliable-client switch bar &&
    ++
    ++	# and now we have the missing object
    ++	git -C unreliable-client rev-parse ${HASH%%?}
    ++'
    ++
    + test_expect_success 'fetching of missing objects from another promisor remote' '
    + 	git clone "file://$(pwd)/server" server2 &&
    + 	test_commit -C server2 bar &&
-- 
2.47.0.105.g07ac214952-goog



* Re: [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object
  2024-10-23  0:28 ` [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object Emily Shaffer
@ 2024-10-23 18:53   ` Emily Shaffer
  2024-10-23 20:11   ` Taylor Blau
  1 sibling, 0 replies; 38+ messages in thread
From: Emily Shaffer @ 2024-10-23 18:53 UTC (permalink / raw)
  To: git
  Cc: Calvin Wan, Han Young, Jonathan Tan, sokcevic, Junio C Hamano,
	robert.coup

(I missed ccing reviewers from last round; adding now, although I know
Junio is vacationing. Sorry about that.)

On Tue, Oct 22, 2024 at 5:28 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> When fetching, we decide which objects to skip asking for marking
> certain commit IDs with the COMPLETE flag. This flag is set in
> fetch-pack.c:mark_complete(), which is called from a few different
> functions who decide what to mark or not mark. mark_complete() is
> insulated against null pointer deref and repeatedly writing the same
> commit to the list of objects to skip; because it's the central function
> which decides that an object is COMPLETE and doesn't need to be fetched,
> let's also insulate it against corruption where the object is not
> present (even though we think it is).
>
> Without this check, it's possible to reach a corrupted state where
> fetches can infinitely loop because we decide to skip fetching, but we
> don't actually have the skipped commit in our object store.
>
> This manifested at $DAYJOB in a repo with the following features:
>  * blob-filtered partial clone enabled
>  * commit graph enabled
>  * ref Foo pointing to commit object 6aaaca
>  * object 6aaaca missing[1]
>
> With these prerequisites, we noticed that `git fetch` in the repo
> produced an infinite loop:
> 1. `git fetch` tries to fetch, but thinks it has all objects, so it
>    noops.
> 2. At the end of cmd_fetch(), we try to write_commit_graph_reachable().
> 3. write_commit_graph_reachable() does a reachability walk, including
>    starting from Foo
> 4. The reachability walk tries to peel Foo, and notices it's missing
>    6aaaca.
> 5. The partial clone machinery asks for a per-object JIT fetch of
>    6aaaca.
> 6. `git fetch` (child process) is asked to fetch 6aaaca.
> 7. We put together the_repository->parsed_objects, adding all commit IDs
>    reachable from local refs to it
>    (fetch-pack.c:mark_complete_and_common_refs(), trace region
>    mark_complete_local_refs). We see Foo, so we add 6aaaca to
>    the_repository->parsed_objects and mark it as COMPLETE.
> 8. cmd_fetch notices that the object ID it was asked for is already
>    known, so does not fetch anything new.
> 9. GOTO 2.
>
> The culprit is that we're assuming all local refs already must have
> objects in place. Let's not assume that, and explicitly check
> has_object() before marking objects as COMPLETE.
>
> NEEDSWORK: It could be valuable to emit a trace event when we try to
> mark_complete() an object where we don't actually has_object() (to
> understand how often we're having to heal from corruption).
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> Helped-by: Jonathan Tan <jonathantanmy@google.com>
>
> 1: That commit object went missing as a byproduct of this partial clone
>    gc issue:
>    https://lore.kernel.org/git/20241001191811.1934900-1-calvinwan@google.com/
>
> ---
>
>
> On the list, Junio and Robert Coup suggested that we should notice cases
> when the list of object IDs we are trying to fetch gets filtered to
> nothing during a lazy fetch. However, this turned out to be quite tricky
> to implement - fetch-by-OID stores the OID in a ref object, so it's
> challenging to guess that the ref we care about is an OID instead of a
> normal ref. It also required some heuristic approach to catch the exact
> moment after we removed COMPLETE objects from the list we wanted to
> fetch, and to notice that the wrong objects were missing from that list.
>
> (From the list of alternatives I included with v1, v1 was approach iii;
> Junio and Robert suggested iv; this patch holds something close to ii.)
>
> Jonathan Tan suggested that instead, we could be more careful about what
> we mark COMPLETE or not; it seems like this is pretty straightforward to
> do.
>
> I do wonder if it's a layering violation to do the has_object() check in
> mark_complete(), which is otherwise a pretty stupid function; I included
> it there because all of the other flavors of mark_complete_* eventually
> boil down to mark_complete(), so we wouldn't need to remember to check
> for object existence any other time we're trying to do this COMPLETE
> marking thing.
>
> Note that I added a test to guarantee this behavior works, but without
> commit graph enabled, the test passes both before and after; I guess
> maybe it's better to add an explicit regression test for the
> combination? But, this test fails with reftable - because we don't have
> a way (that I know of) to force-create a ref with a bad ID, to force
> this error condition. In the test as written I'm writing to
> .git/refs/heads/bar directly; that doesn't work for reftable. But `git
> update-ref` is too smart to let me set a ref to garbage. Any tips there
> are welcome.

I keep thinking about this test, and the more I think, the less
valuable I believe it is.

I don't think we're much in the habit of writing regression tests, but
would it be more valuable to write a regression test in this case
instead? On the other hand, I think the code diff is quite obviously a
good idea, and we can see from the test suite that there isn't really
a performance hit from it. Is it necessary to add a test at all?

Or, I guess we could inspect how many fetch attempts the JIT fetch
needed in this test. I suspect the number will be too high without this
patch - I know it recurses more than once. I dunno. Anybody have
stronger opinions than me?
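
One way such a count could be taken, as a minimal sketch: point
GIT_TRACE2_EVENT at a file while running the command, then count the
child_start events whose argv invokes fetch. The helper name
count_fetch_children and the grep pattern are assumptions made for
illustration, not part of this patch or of Git's test library; the
pattern relies on trace2 writing one compact JSON event per line.

```sh
# Hypothetical helper (not part of Git's test library): count the
# `git fetch` child processes recorded in a GIT_TRACE2_EVENT file.
count_fetch_children () {
	# Each child process start is logged as a one-line JSON event
	# carrying "event":"child_start" and the child's argv.
	# grep -c prints the number of matching lines (and exits
	# non-zero when that number is 0).
	grep -c '"event":"child_start".*"argv":\["git".*"fetch"' "$1"
}

# Possible use inside the test body (assumed, not verified against
# the looping case):
#   GIT_TRACE2_EVENT="$(pwd)/trace" git -C unreliable-client switch bar &&
#   echo "fetch children: $(count_fetch_children trace)"
```

What count should be considered "too high" is left open here; that
would have to come from running the test with and without the fix.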

>
> The CI run is at
> https://github.com/nasamuffin/git/actions/runs/11470039702 - it seems
> the reftable tests are the only things failing.
>
> Also, I am sending this version, but if there are any additional
> comments or it requires more changes, please expect Jonathan Tan to take
> over driving this patch the rest of the way for me. As previously
> stated[2], I'll be OOO after this Friday for most of the rest of the
> year; the rest of this week I'm trying to get the rest of my
> non-upstream loose ends tied up, so I won't have time to do another
> iteration. See folks around Christmastime :)
>
>  - Emily
>
> 2: https://lore.kernel.org/git/CAJoAoZnovapqMcu72DGR40jRRqRn57uJVTJg82kZ_rohtGDSfQ@mail.gmail.com/
>
> Cover letter from v1 follows:
>
> There are a few alternative approaches for this issue that I talked
> about with some folks at $DAYJOB:
>
> i. Just disabling the commit graph rewrite allows this to fall
> eventually into a path where the fetch actually succeeds. I didn't like
> this solution - it's just whack-a-mole - so I didn't look too hard into
> why it succeeds that way. It *could* make sense to disable commit graph
> rewrite when we do a JIT fetch with blob or tree filter provided - but
> if later we want to implement commit filter (something we've talked
> about at Google) then I'd worry about this situation coming up again.
>
> ii. We could decide not to mark local refs (and commits reachable from
> them) as COMPLETE in the_repository->parsed_objects. I didn't try this
> solution out, and I'm not sure what the performance implications are,
> but Jonathan Tan likes this solution, so I may try it out and see what
> breaks shortly.
>
> iii. We could do all the JIT fetches with --refetch. In my opinion, this
> is the safest/most self-healing solution; the JIT fetch only happens
> when we really know we're missing the object, so it doesn't make sense
> for that fetch to be canceled by any cache. It doesn't have performance
> implications as far as I can guess (except that I think we still build
> the parsed_objects hash even though we are going to ignore it, but we
> already were doing that anyway). Of course, that's what this patch does.
>
> iv. We could do nothing; when cmd_fetch gets a fetch-by-object-id but
> decides there is nothing more to do, it could terminate with an error.
> That should stop the infinite recursion, and the error could suggest the
> user to run `git fsck` and discover what the problem is. Depending on
> the remediation we suggest, though, I think a direct fetch to fix this
> particular loop would not work.
>
> I'm curious to hear thoughts from people who are more expert than me on
> partial clone and fetching in general, though.
>
> This change is also still in RFC, for two reasons:
>
> First, it's intermittently failing tests for me locally, in weirdly
> flaky ways:
>
> - t0410-partial-clone.sh fails when I run it from prove, but passes when
>   I run it manually, every time.
> - t5601-clone.sh and t5505-remote.sh fail nonsensically on `rm -rf` that
>   should succeed (and does succeed if I stop the test with test_pause),
>   which makes me think there's something else borked in my setup, but
>   I'm not sure what.
> - t5616-partial-clone.sh actually does fail in a way that I could see
>   having to do with this change (since I guess we might download more
>   packs than usual), but I was so confused by the other two errors I
>   haven't looked closely yet.
>
> And secondly, I didn't write tests verifying the breakage and that this
> change fixes it yet, either.
>
> I'm going to work on both those things in the background, but I wanted
> to get the description and RFC out early so that folks could take a look
> and we could decide which approach is best.
>
> Thanks,
>  - Emily
> ---
>  fetch-pack.c             |  4 +++-
>  t/t0410-partial-clone.sh | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index f752da93a8..8cb2ce4c54 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -603,7 +603,9 @@ static int mark_complete(const struct object_id *oid)
>  {
>         struct commit *commit = deref_without_lazy_fetch(oid, 1);
>
> -       if (commit && !(commit->object.flags & COMPLETE)) {
> +       if (commit &&
> +           !(commit->object.flags & COMPLETE) &&
> +           has_object(the_repository, oid, 0)) {
>                 commit->object.flags |= COMPLETE;
>                 commit_list_insert(commit, &complete);
>         }
> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 818700fbec..95de18ec40 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -241,6 +241,36 @@ test_expect_success 'fetching of missing objects works with ref-in-want enabled'
>         grep "fetch< fetch=.*ref-in-want" trace
>  '
>
> +test_expect_success 'fetching missing objects pointed to by a local ref' '
> +       rm -rf reliable-server unreliable-client &&
> +       test_when_finished rm -rf reliable-server unreliable-client &&
> +       test_create_repo reliable-server &&
> +       git -C reliable-server config uploadpack.allowanysha1inwant 1 &&
> +       git -C reliable-server config uploadpack.allowfilter 1 &&
> +       test_commit -C reliable-server foo &&
> +
> +       git clone --filter=blob:none "file://$(pwd)/reliable-server" unreliable-client &&
> +
> +       # to simulate the unreliable client losing a referenced object by
> +       # corruption, create the object on the server side, then create only a
> +       # reference to that object on the client side (without providing the
> +       # object itself).
> +       test_commit -C reliable-server bar &&
> +       HASH=$(git -C reliable-server rev-parse HEAD) &&
> +       echo "$HASH" >unreliable-client/.git/refs/heads/bar &&
> +
> +       # the object is really missing
> +       # check if we can rev-parse a partial SHA. partial so we do not fetch it,
> +       # but barely partial (trim only the last char) so that we do not collide
> +       test_must_fail git -C unreliable-client rev-parse ${HASH%%?} &&
> +
> +       # trigger a remote fetch by checking out `bar`
> +       git -C unreliable-client switch bar &&
> +
> +       # and now we have the missing object
> +       git -C unreliable-client rev-parse ${HASH%%?}
> +'
> +
>  test_expect_success 'fetching of missing objects from another promisor remote' '
>         git clone "file://$(pwd)/server" server2 &&
>         test_commit -C server2 bar &&
>
> Range-diff against v1:
> 1:  092be0a655 ! 1:  4db6bbb4cd promisor-remote: always JIT fetch with --refetch
>     - ## promisor-remote.c ##
>     -@@ promisor-remote.c: static int fetch_objects(struct repository *repo,
>     -   strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
>     -                "fetch", remote_name, "--no-tags",
>     -                "--no-write-fetch-head", "--recurse-submodules=no",
>     --               "--filter=blob:none", "--stdin", NULL);
>     -+               "--filter=blob:none", "--refetch", "--stdin", NULL);
>     -   if (!git_config_get_bool("promisor.quiet", &quiet) && quiet)
>     -           strvec_push(&child.args, "--quiet");
>     -   if (start_command(&child))
>     + ## fetch-pack.c ##
>     +@@ fetch-pack.c: static int mark_complete(const struct object_id *oid)
>     + {
>     +   struct commit *commit = deref_without_lazy_fetch(oid, 1);
>     +
>     +-  if (commit && !(commit->object.flags & COMPLETE)) {
>     ++  if (commit &&
>     ++      !(commit->object.flags & COMPLETE) &&
>     ++      has_object(the_repository, oid, 0)) {
>     +           commit->object.flags |= COMPLETE;
>     +           commit_list_insert(commit, &complete);
>     +   }
>     +
>     + ## t/t0410-partial-clone.sh ##
>     +@@ t/t0410-partial-clone.sh: test_expect_success 'fetching of missing objects works with ref-in-want enabled'
>     +   grep "fetch< fetch=.*ref-in-want" trace
>     + '
>     +
>     ++test_expect_success 'fetching missing objects pointed to by a local ref' '
>     ++  rm -rf reliable-server unreliable-client &&
>     ++  test_when_finished rm -rf reliable-server unreliable-client &&
>     ++  test_create_repo reliable-server &&
>     ++  git -C reliable-server config uploadpack.allowanysha1inwant 1 &&
>     ++  git -C reliable-server config uploadpack.allowfilter 1 &&
>     ++  test_commit -C reliable-server foo &&
>     ++
>     ++  git clone --filter=blob:none "file://$(pwd)/reliable-server" unreliable-client &&
>     ++
>     ++  # to simulate the unreliable client losing a referenced object by
>     ++  # corruption, create the object on the server side, then create only a
>     ++  # reference to that object on the client side (without providing the
>     ++  # object itself).
>     ++  test_commit -C reliable-server bar &&
>     ++  HASH=$(git -C reliable-server rev-parse HEAD) &&
>     ++  echo "$HASH" >unreliable-client/.git/refs/heads/bar &&
>     ++
>     ++  # the object is really missing
>     ++  # check if we can rev-parse a partial SHA. partial so we do not fetch it,
>     ++  # but barely partial (trim only the last char) so that we do not collide
>     ++  test_must_fail git -C unreliable-client rev-parse ${HASH%%?} &&
>     ++
>     ++  # trigger a remote fetch by checking out `bar`
>     ++  git -C unreliable-client switch bar &&
>     ++
>     ++  # and now we have the missing object
>     ++  git -C unreliable-client rev-parse ${HASH%%?}
>     ++'
>     ++
>     + test_expect_success 'fetching of missing objects from another promisor remote' '
>     +   git clone "file://$(pwd)/server" server2 &&
>     +   test_commit -C server2 bar &&
> --
> 2.47.0.105.g07ac214952-goog
>


* Re: [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object
  2024-10-23  0:28 ` [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object Emily Shaffer
  2024-10-23 18:53   ` Emily Shaffer
@ 2024-10-23 20:11   ` Taylor Blau
  2024-10-28 22:55     ` Jonathan Tan
  1 sibling, 1 reply; 38+ messages in thread
From: Taylor Blau @ 2024-10-23 20:11 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git, Calvin Wan, Han Young, Jonathan Tan, sokcevic

On Tue, Oct 22, 2024 at 05:28:05PM -0700, Emily Shaffer wrote:
> This change is also still in RFC, for two reasons:
>
> First, it's intermittently failing tests for me locally, in weirdly
> flaky ways:
>
> - t0410-partial-clone.sh fails when I run it from prove, but passes when
>   I run it manually, every time.
> - t5601-clone.sh and t5505-remote.sh fail nonsensically on `rm -rf` that
>   should succeed (and does succeed if I stop the test with test_pause),
>   which makes me think there's something else borked in my setup, but
>   I'm not sure what.
> - t5616-partial-clone.sh actually does fail in a way that I could see
>   having to do with this change (since I guess we might download more
>   packs than usual), but I was so confused by the other two errors I
>   haven't looked closely yet.
>
> And secondly, I didn't write tests verifying the breakage and that this
> change fixes it yet, either.
>
> I'm going to work on both those things in the background, but I wanted
> to get the description and RFC out early so that folks could take a look
> and we could decide which approach is best.

I am a little confused. Here you say that this patch is still in RFC,
but the subject line dropped the RFC present in the first round. What is
the state of this patch's readiness?

Thanks,
Taylor


* Re: [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object
  2024-10-23 20:11   ` Taylor Blau
@ 2024-10-28 22:55     ` Jonathan Tan
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-10-28 22:55 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Jonathan Tan, Emily Shaffer, git, Calvin Wan, Han Young, sokcevic

Taylor Blau <me@ttaylorr.com> writes:
> > I'm going to work on both those things in the background, but I wanted
> > to get the description and RFC out early so that folks could take a look
> > and we could decide which approach is best.
> 
> I am a little confused. Here you say that this patch is still in RFC,
> but the subject line dropped the RFC present in the first round. What is
> the state of this patch's readiness?
> 
> Thanks,
> Taylor

As Emily said [1], I'll be taking over driving this patch.

The tl;dr is this patch is not ready, so I think you (the interim
maintainer) can drop it.

This patch strives to avoid marking missing objects as COMPLETE by doing
a check in mark_complete(), but deref_without_lazy_fetch() already makes
such a check: it can return NULL, and its name implies that it knows
about missing objects (namely, that it won't lazy-fetch them). So the
question should be: why doesn't it return NULL when an object is
missing? It turns out that it first checks the commit graph file, and if
the commit is there, it is considered present, so nothing is fetched.
However, there are code paths during commit graph writing (executed
after every fetch, in builtin/fetch.c) that access the object store
directly without going through the commit graph file (search for
"object_info" in commit-graph.c), and those functions perform lazy
fetches when the object is missing. So there's an infinite loop of
"commit graph writer reads X" -> "fetch X" (nothing gets fetched) ->
"write new commit graph, as we always do after a fetch" -> "commit graph
writer reads X" -> ...

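The cycle described above can be modeled with a toy script. This is a
sketch only: none of these functions or variables exist in Git, and the
model reduces each side to a single yes/no lookup. It shows why the
fetch side (which trusts the commit graph) and the writer side (which
reads the object store) never converge when the two disagree.

```sh
# Toy model; every name below is hypothetical.
in_commit_graph=yes	# the commit graph file lists commit X
in_object_db=no		# ...but the object store does not contain X

# Fetch side: consults the commit graph first, so X looks present.
fetch_thinks_present () {
	test "$in_commit_graph" = yes || test "$in_object_db" = yes
}

# Commit graph writer: reads the object store directly.
writer_can_read () {
	test "$in_object_db" = yes
}

# Run the write/fetch cycle at most $1 times and print how many lazy
# fetches were attempted before the writer could read X.
simulate () {
	max=$1 attempts=0
	while test "$attempts" -lt "$max"
	do
		if writer_can_read
		then
			break	# the writer succeeds and the cycle ends
		fi
		attempts=$((attempts + 1))	# writer asks for a lazy fetch
		if ! fetch_thinks_present
		then
			in_object_db=yes	# a real fetch would download X
		fi
	done
	echo "$attempts"
}
```

With the corrupt state above, `simulate 5` uses up all five attempts
without ever obtaining X; with `in_commit_graph=no` a single attempt is
enough, because the fetch side no longer skips the object.
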
So my initial proposal to not mark objects as COMPLETE if they are
missing does not work, because they are already not marked as COMPLETE
if they are missing. One could say that we should check both the commit
graph file and the object store (or, perhaps even better, only the
object store) before stating that an object is present, but I think
that Git already assumes in some places that a commit is present merely
by its presence in the commit graph file, and it's not worth changing
this design.

One solution to fix this is to make the commit graph writer never
lazy-fetch. This closes us off to being able to have missing commits
in a partial clone (at least, if we also want to use the commit graph
file). This might be a reasonable thing to do - at least, partial clone
has been around for a few years and we've not made many concrete steps
towards that - but I feel that we shouldn't close ourselves off if
there's an alternative.

The alternative I'm currently thinking of is to detect if we didn't
fetch any packfiles, and if we didn't, don't write a commit graph
(and don't GC), much like we do when we have --negotiate-only. (A
packfile-less fetch still can cause refs to be rewritten and thus reduce
the number of reachable objects, thus enabling a smaller commit graph to
be written and some objects to be GC-ed, but I think that this situation
still doesn't warrant commit graph writing and/or GC - we can just do
those next time.) The main issue is that we don't always know whether
a pack was written - in particular, if we use something other than
"connect" or "stateless-connect" on a remote helper, we won't know
if a packfile was sent. We can solve this by (1) only writing the commit
graph and running GC if we know for sure that a packfile was sent, or
(2) writing the commit graph and running GC unless we know for sure that
a packfile was not sent. I'm leaning towards (1): it seems more
conceptually coherent, even though it is a change of behavior (from
automatic commit graph writing and GC to none), and I think that the
repos that need scalability the most already use protocol v2 during
fetching (which does require "connect" or "stateless-connect" from
remote helpers, so we're covered here). I am OK with (2) as well.
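
To make the difference between (1) and (2) concrete, here is a small
sketch of the decision as a three-state lookup. The function name and
the status values "sent", "not-sent" and "unknown" are invented for
illustration; nothing like this exists in Git today.

```sh
# Hypothetical: decide whether to write the commit graph and run GC.
#   $1: policy, 1 or 2 (the two options discussed above)
#   $2: pack status: sent | not-sent | unknown
# Returns 0 to run maintenance, 1 to skip it, 2 on an unknown policy.
should_run_maintenance () {
	policy=$1 pack_status=$2
	case "$policy,$pack_status" in
	1,sent)
		return 0 ;;	# (1): only when a packfile was surely sent
	1,*)
		return 1 ;;
	2,not-sent)
		return 1 ;;	# (2): skip only when surely nothing was sent
	2,*)
		return 0 ;;
	*)
		return 2 ;;
	esac
}
```

The two policies agree on "sent" and "not-sent" and differ only on
"unknown", which is the case a remote helper without "connect" or
"stateless-connect" would produce.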

Feel free to let me know if you have any ideas. In the meantime I'll
look at (1).

[1] https://lore.kernel.org/git/20241023002806.367082-1-emilyshaffer@google.com/


* [PATCH 0/2] When fetching, warn if in commit graph but not obj db
  2024-10-03 22:35 [RFC PATCH] promisor-remote: always JIT fetch with --refetch Emily Shaffer
  2024-10-06 22:43 ` Junio C Hamano
  2024-10-23  0:28 ` [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object Emily Shaffer
@ 2024-10-29 21:11 ` Jonathan Tan
  2024-10-29 21:11   ` [PATCH 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
                     ` (4 more replies)
  2 siblings, 5 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-10-29 21:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

I mentioned previously [1] the possibility of not running maintenance
steps (commit graph writing and "git maintenance") if no packs were
fetched, but looking at things again, I think that we shouldn't do
that - in particular, if I ran "git fetch --refetch", I would fully
expect the objects to be repacked, even if Git wasn't able to detect
conclusively whether a pack was transmitted.

So I went back to my original idea of detecting when an object is
missing. In trying to balance the concerns of both doing something as
reasonable as possible in such a repo corruption case, and not slowing
down and/or unnecessarily complicating the main code flow, I decided
to detect when an object is present in the commit graph but not in the
object DB, and to limit this detection to objects specified in the
fetch refspec.

Upon detection, we can't fix it due to reasons mentioned in the commit
message, so I decided to print a warning. An alternate option is to make
it a fatal error (instead of a warning) if an object is detected to be
in the commit graph but not the object DB. I haven't thought through the
ramifications of that, though.

[1] https://lore.kernel.org/git/20241028225504.4151804-1-jonathantanmy@google.com/

Jonathan Tan (2):
  Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
  fetch-pack: warn if in commit graph but not obj db

 fetch-pack.c | 45 +++++++++++++++++++++++++--------------------
 object.h     |  2 +-
 2 files changed, 26 insertions(+), 21 deletions(-)

-- 
2.47.0.163.g1226f6d8fa-goog



* [PATCH 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
  2024-10-29 21:11 ` [PATCH 0/2] When fetching, warn if in commit graph but not obj db Jonathan Tan
@ 2024-10-29 21:11   ` Jonathan Tan
  2024-10-30 21:22     ` Josh Steadmon
  2024-10-29 21:11   ` [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 38+ messages in thread
From: Jonathan Tan @ 2024-10-29 21:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This reverts commit a6e65fb39caf18259c660c1c7910d5bf80bc15cb.

The commit message of that commit mentions that the new function "will
be used for the bundle-uri client in a subsequent commit", but it seems
that eventually it wasn't used.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c | 25 +++++++------------------
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index f752da93a8..6728a0d2f5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -122,12 +122,11 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
 		cb(negotiator, cache.items[i]);
 }
 
-static struct commit *deref_without_lazy_fetch_extended(const struct object_id *oid,
-							int mark_tags_complete,
-							enum object_type *type,
-							unsigned int oi_flags)
+static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
+					       int mark_tags_complete)
 {
-	struct object_info info = { .typep = type };
+	enum object_type type;
+	struct object_info info = { .typep = &type };
 	struct commit *commit;
 
 	commit = lookup_commit_in_graph(the_repository, oid);
@@ -136,9 +135,9 @@ static struct commit *deref_without_lazy_fetch_extended(const struct object_id *
 
 	while (1) {
 		if (oid_object_info_extended(the_repository, oid, &info,
-					     oi_flags))
+					     OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK))
 			return NULL;
-		if (*type == OBJ_TAG) {
+		if (type == OBJ_TAG) {
 			struct tag *tag = (struct tag *)
 				parse_object(the_repository, oid);
 
@@ -152,7 +151,7 @@ static struct commit *deref_without_lazy_fetch_extended(const struct object_id *
 		}
 	}
 
-	if (*type == OBJ_COMMIT) {
+	if (type == OBJ_COMMIT) {
 		struct commit *commit = lookup_commit(the_repository, oid);
 		if (!commit || repo_parse_commit(the_repository, commit))
 			return NULL;
@@ -162,16 +161,6 @@ static struct commit *deref_without_lazy_fetch_extended(const struct object_id *
 	return NULL;
 }
 
-
-static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
-					       int mark_tags_complete)
-{
-	enum object_type type;
-	unsigned flags = OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK;
-	return deref_without_lazy_fetch_extended(oid, mark_tags_complete,
-						 &type, flags);
-}
-
 static int rev_list_insert_ref(struct fetch_negotiator *negotiator,
 			       const struct object_id *oid)
 {
-- 
2.47.0.163.g1226f6d8fa-goog



* [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-10-29 21:11 ` [PATCH 0/2] When fetching, warn if in commit graph but not obj db Jonathan Tan
  2024-10-29 21:11   ` [PATCH 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
@ 2024-10-29 21:11   ` Jonathan Tan
  2024-10-30 21:22     ` Josh Steadmon
  2024-10-31 20:59     ` Taylor Blau
  2024-10-30 21:22   ` [PATCH 0/2] When fetching, " Josh Steadmon
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-10-29 21:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When fetching, there is a step in which sought objects are first checked
against the local repository; only objects that are not in the local
repository are then fetched. This check first looks up the commit graph
file, and returns "present" if the object is in there.

However, the action of first looking up the commit graph file is not
done everywhere in Git, especially if the type of the object at the time
of lookup is not known. This means that in a repo corruption situation,
a user may encounter an "object missing" error, attempt to fetch it, and
still encounter the same error later when they reattempt their original
action, because the object is present in the commit graph file but not in
the object DB.

Therefore, detect when this occurs and print a warning. (Note that
we cannot proceed to include this object in the list of objects to
be fetched without changing at least the fetch negotiation code:
what would happen is that the client will send "want X" and "have X"
and when I tested at $DAYJOB with a work server that uses JGit, the
server reasonably returned an empty packfile. And changing the fetch
negotiation code to only use the object DB when deciding what to report
as "have" would be an unnecessary slowdown, I think.)

This was discovered when a lazy fetch of a missing commit completed with
nothing actually fetched, and the writing of the commit graph file after
every fetch then attempted to read said missing commit, triggering a
lazy fetch of said missing commit, resulting in an infinite loop with no
user-visible indication (until they check the list of processes running
on their computer). With this fix, at least a warning message will be
printed. Note that although the repo corruption we discovered was caused
by a bug in GC in a partial clone, the behavior that this patch teaches
Git to warn about applies to any repo with commit graph enabled and with
a missing commit, whether it is a partial clone or not.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c | 22 +++++++++++++++++++---
 object.h     |  2 +-
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 6728a0d2f5..5a0020366b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -57,6 +57,7 @@ static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 #define ALTERNATE	(1U << 1)
 #define COMMON		(1U << 6)
 #define REACH_SCRATCH	(1U << 7)
+#define COMPLETE_FROM_COMMIT_GRAPH	(1U << 8)
 
 /*
  * After sending this many "have"s if we do not get any new ACK , we
@@ -123,15 +124,18 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
 }
 
 static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
-					       int mark_tags_complete)
+					       int mark_additional_complete_information)
 {
 	enum object_type type;
 	struct object_info info = { .typep = &type };
 	struct commit *commit;
 
 	commit = lookup_commit_in_graph(the_repository, oid);
-	if (commit)
+	if (commit) {
+		if (mark_additional_complete_information)
+			commit->object.flags |= COMPLETE_FROM_COMMIT_GRAPH;
 		return commit;
+	}
 
 	while (1) {
 		if (oid_object_info_extended(the_repository, oid, &info,
@@ -143,7 +147,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 
 			if (!tag->tagged)
 				return NULL;
-			if (mark_tags_complete)
+			if (mark_additional_complete_information)
 				tag->object.flags |= COMPLETE;
 			oid = &tag->tagged->oid;
 		} else {
@@ -809,6 +813,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 	save_commit_buffer = old_save_commit_buffer;
 }
 
+static void warn_in_commit_graph_only(const struct object_id *oid)
+{
+	warning(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database."),
+		oid_to_hex(oid));
+	warning(_("This is probably due to repo corruption."));
+	warning(_("If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."));
+}
+
 /*
  * Returns 1 if every object pointed to by the given remote refs is available
  * locally and reachable from a local ref, and 0 otherwise.
@@ -830,6 +842,10 @@ static int everything_local(struct fetch_pack_args *args,
 				      ref->name);
 			continue;
 		}
+		if (o->flags & COMPLETE_FROM_COMMIT_GRAPH) {
+			if (!has_object(the_repository, remote, 0))
+				warn_in_commit_graph_only(remote);
+		}
 		print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
 			      ref->name);
 	}
diff --git a/object.h b/object.h
index 17f32f1103..196e489253 100644
--- a/object.h
+++ b/object.h
@@ -65,7 +65,7 @@ void object_array_init(struct object_array *array);
 /*
  * object flag allocation:
  * revision.h:               0---------10         15               23------27
- * fetch-pack.c:             01    67
+ * fetch-pack.c:             01    6-8
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
  * upload-pack.c:                4       11-----14  16-----19
-- 
2.47.0.163.g1226f6d8fa-goog



* Re: [PATCH 0/2] When fetching, warn if in commit graph but not obj db
  2024-10-29 21:11 ` [PATCH 0/2] When fetching, warn if in commit graph but not obj db Jonathan Tan
  2024-10-29 21:11   ` [PATCH 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
  2024-10-29 21:11   ` [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
@ 2024-10-30 21:22   ` Josh Steadmon
  2024-10-31 21:18   ` [PATCH v2 0/2] When fetching, die " Jonathan Tan
  2024-11-05 19:24   ` [PATCH v3 " Jonathan Tan
  4 siblings, 0 replies; 38+ messages in thread
From: Josh Steadmon @ 2024-10-30 21:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On 2024.10.29 14:11, Jonathan Tan wrote:
> I mentioned previously [1] the possibility of not running maintenance
> steps (commit graph writing and "git maintenance") if no packs were
> fetched, but looking at things again, I think that we shouldn't do
> that - in particular, if I ran "git fetch --refetch", I would fully
> expect the objects to be repacked, even if Git wasn't able to detect
> conclusively whether a pack was transmitted.
> 
> [1] https://lore.kernel.org/git/20241028225504.4151804-1-jonathantanmy@google.com/

A note for upstream, because I'm not sure it was ever explicitly
mentioned: at $DAYJOB, we saw this fetch recursion error as a
side-effect of the erroneous GC of local commits discussed at [2].

[2] https://lore.kernel.org/git/cover.1729792911.git.jonathantanmy@google.com/

> So I went back to my original idea of detecting when an object is
> missing. In trying to balance the concerns of both doing something as
> reasonable as possible in such a repo corruption case, and not slowing
> down and/or unnecessarily complicating the main code flow, I decided
> to detect when an object is present in the commit graph but not in the
> object DB, and to limit this detection to objects specified in the
> fetch refspec.
> 
> Upon detection, we can't fix it due to reasons mentioned in the commit
> message, so I decided to print a warning. An alternate option is to make
> it a fatal error (instead of a warning) if an object is detected to be
> in the commit graph but not the object DB. I haven't thought through the
> ramifications of that, though.

At first glance, I lean towards making this a fatal error, but I'll try
thinking out loud a bit:

First, we believe that [2] above should fix the root cause of the
particular case we saw at $DAYJOB (hopefully this type of error doesn't
have multiple root causes). So we expect to basically never encounter
this error again after [2] is merged and rolled out, and all existing
cases of repo corruption have been repaired. However, interacting with a
broken repo even with a client that includes [2] would still hit this
condition and issue a warning.

With the current implementation, fetching in a corrupt repo would still
cause git-fetch to infinitely recurse, and therefore would repeatedly
print the same error message to the console, until either the user
noticed, or we fail to launch a new git-fetch process due to resource
exhaustion.

I don't see any reason why the above situation is more friendly or
desirable than exiting (with the same error message) as soon as we
detect this type of corruption. However, I don't feel super strongly
about it. If the rest of the list is OK with repeated error messages,
then I can live with it.
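
To make the comparison concrete, here is a toy model of the two
policies. It is only a sketch, not Git code: toy_fetch(), the
toy_policy enum and the max_procs cap (standing in for resource
exhaustion) are all hypothetical names, and the real recursion spawns
nested child processes rather than running a loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policies: print a warning and go on, or exit at once. */
enum toy_policy { TOY_WARN, TOY_DIE };

struct toy_result {
	int fetch_attempts;   /* how many fetches were started */
	int messages;         /* how many times the message was printed */
	bool hit_limit;       /* true if only the process cap stopped us */
};

/*
 * Model a fetch in a repo where the wanted commit is in the commit
 * graph but not in the object DB. max_procs is an assumed cap that
 * stands in for "we can no longer launch a new git-fetch".
 */
static struct toy_result toy_fetch(enum toy_policy policy, int max_procs)
{
	struct toy_result r = { 0, 0, false };

	assert(max_procs > 0);
	while (r.fetch_attempts < max_procs) {
		r.fetch_attempts++;
		r.messages++;	/* corruption detected, message printed */
		if (policy == TOY_DIE)
			return r;	/* exit before the commit graph is written */
		/*
		 * TOY_WARN: the fetch completes with nothing fetched, the
		 * commit graph write reads the missing commit, and that
		 * triggers another lazy fetch of the same commit.
		 */
	}
	r.hit_limit = true;
	return r;
}
```

In this model, toy_fetch(TOY_DIE, n) prints the message once and
returns, while toy_fetch(TOY_WARN, n) prints it n times and stops only
because it ran into the cap.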

> Jonathan Tan (2):
>   Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
>   fetch-pack: warn if in commit graph but not obj db
> 
>  fetch-pack.c | 45 +++++++++++++++++++++++++--------------------
>  object.h     |  2 +-
>  2 files changed, 26 insertions(+), 21 deletions(-)
> 
> -- 
> 2.47.0.163.g1226f6d8fa-goog
> 
> 


* Re: [PATCH 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
  2024-10-29 21:11   ` [PATCH 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
@ 2024-10-30 21:22     ` Josh Steadmon
  0 siblings, 0 replies; 38+ messages in thread
From: Josh Steadmon @ 2024-10-30 21:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On 2024.10.29 14:11, Jonathan Tan wrote:
> This reverts commit a6e65fb39caf18259c660c1c7910d5bf80bc15cb.
> 
> The commit message of that commit mentions that the new function "will
> be used for the bundle-uri client in a subsequent commit", but it seems
> that eventually it wasn't used.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  fetch-pack.c | 25 +++++++------------------
>  1 file changed, 7 insertions(+), 18 deletions(-)

Nit: can you mention in the commit description that this cleanup
simplifies a later patch to detect a case of repo corruption?


* Re: [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-10-29 21:11   ` [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
@ 2024-10-30 21:22     ` Josh Steadmon
  2024-10-31 21:23       ` Jonathan Tan
  2024-10-31 20:59     ` Taylor Blau
  1 sibling, 1 reply; 38+ messages in thread
From: Josh Steadmon @ 2024-10-30 21:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On 2024.10.29 14:11, Jonathan Tan wrote:
> When fetching, there is a step in which sought objects are first checked
> against the local repository; only objects that are not in the local
> repository are then fetched. This check first looks up the commit graph
> file, and returns "present" if the object is in there.
> 
> However, the action of first looking up the commit graph file is not
> done everywhere in Git, especially if the type of the object at the time
> of lookup is not known. This means that in a repo corruption situation,
> a user may encounter an "object missing" error, attempt to fetch it, and
> still encounter the same error later when they reattempt their original
> action, because the object is present in the commit graph file but not in
> the object DB.
> 
> Therefore, detect when this occurs and print a warning. (Note that
> we cannot proceed to include this object in the list of objects to
> be fetched without changing at least the fetch negotiation code:
> what would happen is that the client will send "want X" and "have X"
> and when I tested at $DAYJOB with a work server that uses JGit, the
> server reasonably returned an empty packfile. And changing the fetch
> negotiation code to only use the object DB when deciding what to report
> as "have" would be an unnecessary slowdown, I think.)
> 
> This was discovered when a lazy fetch of a missing commit completed with
> nothing actually fetched, and the writing of the commit graph file after
> every fetch then attempted to read said missing commit, triggering a
> lazy fetch of said missing commit, resulting in an infinite loop with no
> user-visible indication (until they check the list of processes running
> on their computer). With this fix, at least a warning message will be
> printed. Note that although the repo corruption we discovered was caused
> by a bug in GC in a partial clone, the behavior that this patch teaches
> Git to warn about applies to any repo with commit graph enabled and with
> a missing commit, whether it is a partial clone or not.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  fetch-pack.c | 22 +++++++++++++++++++---
>  object.h     |  2 +-
>  2 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 6728a0d2f5..5a0020366b 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -57,6 +57,7 @@ static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
>  #define ALTERNATE	(1U << 1)
>  #define COMMON		(1U << 6)
>  #define REACH_SCRATCH	(1U << 7)
> +#define COMPLETE_FROM_COMMIT_GRAPH	(1U << 8)

We're defining a new flag, and we note it in object.h as well below, so
looks good so far.


>  /*
>   * After sending this many "have"s if we do not get any new ACK , we
> @@ -123,15 +124,18 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
>  }
>  
>  static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
> -					       int mark_tags_complete)
> +					       int mark_additional_complete_information)

We're already marking some completion flags here, so we're just making
the parameter name more descriptive, OK.


>  {
>  	enum object_type type;
>  	struct object_info info = { .typep = &type };
>  	struct commit *commit;
>  
>  	commit = lookup_commit_in_graph(the_repository, oid);
> -	if (commit)
> +	if (commit) {
> +		if (mark_additional_complete_information)
> +			commit->object.flags |= COMPLETE_FROM_COMMIT_GRAPH;
>  		return commit;
> +	}

We already have a case where we're checking the commit graph, so we can
also mark the commit complete here... well, not the original "COMPLETE"
flag since we don't want to change behavior, but our new
COMPLETE_FROM_COMMIT_GRAPH flag. Sounds good.


>  
>  	while (1) {
>  		if (oid_object_info_extended(the_repository, oid, &info,
> @@ -143,7 +147,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
>  
>  			if (!tag->tagged)
>  				return NULL;
> -			if (mark_tags_complete)
> +			if (mark_additional_complete_information)
>  				tag->object.flags |= COMPLETE;
>  			oid = &tag->tagged->oid;
>  		} else {
> @@ -809,6 +813,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>  	save_commit_buffer = old_save_commit_buffer;
>  }
>  
> +static void warn_in_commit_graph_only(const struct object_id *oid)
> +{
> +	warning(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database."),
> +		oid_to_hex(oid));
> +	warning(_("This is probably due to repo corruption."));
> +	warning(_("If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."));
> +}
> +

Here's the new warning. As mentioned in my reply to the cover letter, I
feel like it makes more sense to die(), but I don't feel too strongly
about it.


>  /*
>   * Returns 1 if every object pointed to by the given remote refs is available
>   * locally and reachable from a local ref, and 0 otherwise.
> @@ -830,6 +842,10 @@ static int everything_local(struct fetch_pack_args *args,
>  				      ref->name);
>  			continue;
>  		}
> +		if (o->flags & COMPLETE_FROM_COMMIT_GRAPH) {
> +			if (!has_object(the_repository, remote, 0))
> +				warn_in_commit_graph_only(remote);
> +		}

And now that we're checking what's local, we issue our warning if we
have an object missing from the DB but mentioned in the commit graph.
Seems fine, although I wonder if it makes more sense to fail earlier. It
looks like the only place we do the
`mark_additional_complete_information` checks is in `mark_complete()`,
so should we just check this condition there? No strong feelings either
way, just curious.


>  		print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
>  			      ref->name);
>  	}
> diff --git a/object.h b/object.h
> index 17f32f1103..196e489253 100644
> --- a/object.h
> +++ b/object.h
> @@ -65,7 +65,7 @@ void object_array_init(struct object_array *array);
>  /*
>   * object flag allocation:
>   * revision.h:               0---------10         15               23------27
> - * fetch-pack.c:             01    67
> + * fetch-pack.c:             01    6-8
>   * negotiator/default.c:       2--5
>   * walker.c:                 0-2
>   * upload-pack.c:                4       11-----14  16-----19
> -- 
> 2.47.0.163.g1226f6d8fa-goog
> 
> 


* Re: [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-10-29 21:11   ` [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
  2024-10-30 21:22     ` Josh Steadmon
@ 2024-10-31 20:59     ` Taylor Blau
  2024-10-31 21:43       ` Jonathan Tan
  1 sibling, 1 reply; 38+ messages in thread
From: Taylor Blau @ 2024-10-31 20:59 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Oct 29, 2024 at 02:11:05PM -0700, Jonathan Tan wrote:
> When fetching, there is a step in which sought objects are first checked
> against the local repository; only objects that are not in the local
> repository are then fetched. This check first looks up the commit graph
> file, and returns "present" if the object is in there.

OK.

> However, the action of first looking up the commit graph file is not
> done everywhere in Git, especially if the type of the object at the time
> of lookup is not known. This means that in a repo corruption situation,
> a user may encounter an "object missing" error, attempt to fetch it, and
> still encounter the same error later when they reattempt their original
> action, because the object is present in the commit graph file but not in
> the object DB.

I think the type of repository corruption here may be underspecified.

You say that we have some object, say X, whose type is not known. So we
don't load the commit-graph, realize that X is missing, and then try and
fetch it. In this scenario, is X actually in the commit-graph, but not
in the object database? Further, if X is in the commit-graph, I assume
we do not look it up there because we first try and find its type, which
fails, so we assume we don't have it (despite it appearing corruptly in
the commit-graph)?

I think that matches the behavior you're describing, but I want to make
sure that I'm not thinking of something else.

> This was discovered when a lazy fetch of a missing commit completed with
> nothing actually fetched, and the writing of the commit graph file after
> every fetch then attempted to read said missing commit, triggering a
> lazy fetch of said missing commit, resulting in an infinite loop with no
> user-visible indication (until they check the list of processes running
> on their computer).

Yuck :-).

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  fetch-pack.c | 22 +++++++++++++++++++---
>  object.h     |  2 +-
>  2 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 6728a0d2f5..5a0020366b 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -57,6 +57,7 @@ static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
>  #define ALTERNATE	(1U << 1)
>  #define COMMON		(1U << 6)
>  #define REACH_SCRATCH	(1U << 7)
> +#define COMPLETE_FROM_COMMIT_GRAPH	(1U << 8)
>
>  /*
>   * After sending this many "have"s if we do not get any new ACK , we
> @@ -123,15 +124,18 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
>  }
>
>  static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
> -					       int mark_tags_complete)
> +					       int mark_additional_complete_information)
>  {
>  	enum object_type type;
>  	struct object_info info = { .typep = &type };
>  	struct commit *commit;
>
>  	commit = lookup_commit_in_graph(the_repository, oid);
> -	if (commit)
> +	if (commit) {
> +		if (mark_additional_complete_information)
> +			commit->object.flags |= COMPLETE_FROM_COMMIT_GRAPH;
>  		return commit;
> +	}
>
>  	while (1) {
>  		if (oid_object_info_extended(the_repository, oid, &info,
> @@ -143,7 +147,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
>
>  			if (!tag->tagged)
>  				return NULL;
> -			if (mark_tags_complete)
> +			if (mark_additional_complete_information)
>  				tag->object.flags |= COMPLETE;
>  			oid = &tag->tagged->oid;
>  		} else {
> @@ -809,6 +813,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>  	save_commit_buffer = old_save_commit_buffer;
>  }
>
> +static void warn_in_commit_graph_only(const struct object_id *oid)
> +{
> +	warning(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database."),
> +		oid_to_hex(oid));
> +	warning(_("This is probably due to repo corruption."));
> +	warning(_("If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."));
> +}
> +
>  /*
>   * Returns 1 if every object pointed to by the given remote refs is available
>   * locally and reachable from a local ref, and 0 otherwise.
> @@ -830,6 +842,10 @@ static int everything_local(struct fetch_pack_args *args,
>  				      ref->name);
>  			continue;
>  		}
> +		if (o->flags & COMPLETE_FROM_COMMIT_GRAPH) {
> +			if (!has_object(the_repository, remote, 0))
> +				warn_in_commit_graph_only(remote);

You discuss this a little bit in your commit message, but I wonder if we
should just die() here. I feel like we're trying to work around a
situation where the commit-graph is obviously broken because it refers
to commit objects that don't actually exist in the object store.

A few thoughts in this area:

  - What situation provokes this to be true? I could imagine there is
    some bug that we don't fully have a grasp of. But I wonder if it is
    even easier to provoke than that, say by pruning some objects out of
    the object store, then not rewriting the commit-graph, leaving some
    of the references dangling.

  - Does 'git fsck' catch this case within the commit-graph?

  - Are there other areas of the code that rely on the assumption that all
    entries in the commit-graph actually exist on disk? If so, are they
    similarly broken?

Another thought about this whole thing is that we essentially have a
code path that says: "I found this object from the commit-graph, but
don't know if I actually have it on disk, so mark it to be checked later
via has_object()".

I wonder if it would be more straightforward to replace the call to
lookup_commit_in_graph() with a direct call to has_object() in the
deref_without_lazy_fetch() function, which I think would both (a)
eliminate the need for a new flag bit to be allocated, and (b) prevent
looking up the object twice.
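
As a rough illustration of settling the question at lookup time
instead of flagging the commit for a later check, here is a toy model.
It is a sketch under assumptions, not a proposal for the actual code:
struct toy_store, toy_contains() and toy_lookup_commit() are
hypothetical names, and each "store" is just a list of object names
rather than the real commit-graph or object database.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: a store is just a list of object names. */
struct toy_store {
	const char **names;
	size_t nr;
};

enum toy_lookup {
	TOY_MISSING,	/* in neither store: a fetch is needed */
	TOY_PRESENT,	/* in the object DB: nothing to fetch */
	TOY_GRAPH_ONLY	/* in the commit graph only: corruption */
};

static bool toy_contains(const struct toy_store *s, const char *name)
{
	assert(s && name);
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->names[i], name))
			return true;
	return false;
}

/*
 * Consult the commit graph first, but confirm a hit against the
 * object DB right away, so the caller needs neither an extra flag bit
 * nor a second pass over the refs.
 */
static enum toy_lookup toy_lookup_commit(const struct toy_store *graph,
					 const struct toy_store *odb,
					 const char *name)
{
	if (toy_contains(graph, name))
		return toy_contains(odb, name) ? TOY_PRESENT : TOY_GRAPH_ONLY;
	return toy_contains(odb, name) ? TOY_PRESENT : TOY_MISSING;
}
```

A caller would treat TOY_GRAPH_ONLY as the corruption case discussed
above and decide there whether to warn or to die.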

Thoughts?

Thanks,
Taylor


* [PATCH v2 0/2] When fetching, die if in commit graph but not obj db
  2024-10-29 21:11 ` [PATCH 0/2] When fetching, warn if in commit graph but not obj db Jonathan Tan
                     ` (2 preceding siblings ...)
  2024-10-30 21:22   ` [PATCH 0/2] When fetching, " Josh Steadmon
@ 2024-10-31 21:18   ` Jonathan Tan
  2024-10-31 21:19     ` [PATCH v2 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
                       ` (2 more replies)
  2024-11-05 19:24   ` [PATCH v3 " Jonathan Tan
  4 siblings, 3 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-10-31 21:18 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, steadmon, me, hanxin.hx

Thanks everyone for your review comments. I've updated the patch 1
commit message as Josh requested. I'll reply individually to comments on
patch 2.

Jonathan Tan (2):
  Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
  fetch-pack: warn if in commit graph but not obj db

 fetch-pack.c                               | 42 +++++++++++-----------
 t/t5330-no-lazy-fetch-with-commit-graph.sh |  2 +-
 2 files changed, 23 insertions(+), 21 deletions(-)

Range-diff against v1:
1:  4dea8933cf ! 1:  34e87b8388 Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
    @@ Commit message
     
         This reverts commit a6e65fb39caf18259c660c1c7910d5bf80bc15cb.
     
    +    This revert simplifies the next patch in this patch set.
    +
         The commit message of that commit mentions that the new function "will
         be used for the bundle-uri client in a subsequent commit", but it seems
         that eventually it wasn't used.
2:  1027ff2cb7 ! 2:  631b9a8677 fetch-pack: warn if in commit graph but not obj db
    @@ Commit message
         action, because the object is present in the commit graph file but not in
         the object DB.
     
    -    Therefore, detect when this occurs and print a warning. (Note that
    -    we cannot proceed to include this object in the list of objects to
    -    be fetched without changing at least the fetch negotiation code:
    -    what would happen is that the client will send "want X" and "have X"
    -    and when I tested at $DAYJOB with a work server that uses JGit, the
    -    server reasonably returned an empty packfile. And changing the fetch
    -    negotiation code to only use the object DB when deciding what to report
    -    as "have" would be an unnecessary slowdown, I think.)
    +    Therefore, make it a fatal error when this occurs. (Note that we cannot
    +    proceed to include this object in the list of objects to be fetched
    +    without changing at least the fetch negotiation code: what would happen
    +    is that the client will send "want X" and "have X" and when I tested
    +    at $DAYJOB with a work server that uses JGit, the server reasonably
    +    returned an empty packfile. And changing the fetch negotiation code to
    +    only use the object DB when deciding what to report as "have" would be
    +    an unnecessary slowdown, I think.)
     
         This was discovered when a lazy fetch of a missing commit completed with
         nothing actually fetched, and the writing of the commit graph file after
         every fetch then attempted to read said missing commit, triggering a
         lazy fetch of said missing commit, resulting in an infinite loop with no
         user-visible indication (until they check the list of processes running
    -    on their computer). With this fix, at least a warning message will be
    -    printed. Note that although the repo corruption we discovered was caused
    -    by a bug in GC in a partial clone, the behavior that this patch teaches
    -    Git to warn about applies to any repo with commit graph enabled and with
    -    a missing commit, whether it is a partial clone or not.
    +    on their computer). With this fix, there is no infinite loop. Note that
    +    although the repo corruption we discovered was caused by a bug in GC in
    +    a partial clone, the behavior that this patch teaches Git to warn about
    +    applies to any repo with commit graph enabled and with a missing commit,
    +    whether it is a partial clone or not.
    +
    +    t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in
    +    lookup_commit_in_graph(), 2022-07-01), tests that an interaction between
    +    fetch and the commit graph does not cause an infinite loop. This patch
    +    changes the exit code in that situation, so that test had to be changed.
     
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## fetch-pack.c ##
    -@@ fetch-pack.c: static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
    - #define ALTERNATE	(1U << 1)
    - #define COMMON		(1U << 6)
    - #define REACH_SCRATCH	(1U << 7)
    -+#define COMPLETE_FROM_COMMIT_GRAPH	(1U << 8)
    - 
    - /*
    -  * After sending this many "have"s if we do not get any new ACK , we
     @@ fetch-pack.c: static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
    + 		cb(negotiator, cache.items[i]);
      }
      
    ++static void die_in_commit_graph_only(const struct object_id *oid)
    ++{
    ++	die(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database.\n"
    ++	      "This is probably due to repo corruption.\n"
    ++	      "If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."),
    ++	      oid_to_hex(oid));
    ++}
    ++
      static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
     -					       int mark_tags_complete)
    -+					       int mark_additional_complete_information)
    ++					       int mark_tags_complete_and_check_obj_db)
      {
      	enum object_type type;
      	struct object_info info = { .typep = &type };
    @@ fetch-pack.c: static void for_each_cached_alternate(struct fetch_negotiator *neg
      	commit = lookup_commit_in_graph(the_repository, oid);
     -	if (commit)
     +	if (commit) {
    -+		if (mark_additional_complete_information)
    -+			commit->object.flags |= COMPLETE_FROM_COMMIT_GRAPH;
    ++		if (mark_tags_complete_and_check_obj_db) {
    ++			if (!has_object(the_repository, oid, 0))
    ++				die_in_commit_graph_only(oid);
    ++		}
      		return commit;
     +	}
      
    @@ fetch-pack.c: static struct commit *deref_without_lazy_fetch(const struct object
      			if (!tag->tagged)
      				return NULL;
     -			if (mark_tags_complete)
    -+			if (mark_additional_complete_information)
    ++			if (mark_tags_complete_and_check_obj_db)
      				tag->object.flags |= COMPLETE;
      			oid = &tag->tagged->oid;
      		} else {
    -@@ fetch-pack.c: static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
    - 	save_commit_buffer = old_save_commit_buffer;
    - }
    - 
    -+static void warn_in_commit_graph_only(const struct object_id *oid)
    -+{
    -+	warning(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database."),
    -+		oid_to_hex(oid));
    -+	warning(_("This is probably due to repo corruption."));
    -+	warning(_("If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."));
    -+}
    -+
    - /*
    -  * Returns 1 if every object pointed to by the given remote refs is available
    -  * locally and reachable from a local ref, and 0 otherwise.
    -@@ fetch-pack.c: static int everything_local(struct fetch_pack_args *args,
    - 				      ref->name);
    - 			continue;
    - 		}
    -+		if (o->flags & COMPLETE_FROM_COMMIT_GRAPH) {
    -+			if (!has_object(the_repository, remote, 0))
    -+				warn_in_commit_graph_only(remote);
    -+		}
    - 		print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
    - 			      ref->name);
    - 	}
     
    - ## object.h ##
    -@@ object.h: void object_array_init(struct object_array *array);
    - /*
    -  * object flag allocation:
    -  * revision.h:               0---------10         15               23------27
    -- * fetch-pack.c:             01    67
    -+ * fetch-pack.c:             01    6-8
    -  * negotiator/default.c:       2--5
    -  * walker.c:                 0-2
    -  * upload-pack.c:                4       11-----14  16-----19
    + ## t/t5330-no-lazy-fetch-with-commit-graph.sh ##
    +@@ t/t5330-no-lazy-fetch-with-commit-graph.sh: test_expect_success 'fetch any commit from promisor with the usage of the commit
    + 	test_commit -C with-commit any-commit &&
    + 	anycommit=$(git -C with-commit rev-parse HEAD) &&
    + 	GIT_TRACE="$(pwd)/trace.txt" \
    +-		git -C with-commit-graph fetch origin $anycommit 2>err &&
    ++		test_must_fail git -C with-commit-graph fetch origin $anycommit 2>err &&
    + 	! grep "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
    + 	grep "git fetch origin" trace.txt >actual &&
    + 	test_line_count = 1 actual
-- 
2.47.0.163.g1226f6d8fa-goog



* [PATCH v2 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
  2024-10-31 21:18   ` [PATCH v2 0/2] When fetching, die " Jonathan Tan
@ 2024-10-31 21:19     ` Jonathan Tan
  2024-10-31 21:19     ` [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
  2024-10-31 22:33     ` [PATCH v2 0/2] When fetching, die " Josh Steadmon
  2 siblings, 0 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-10-31 21:19 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, steadmon, me, hanxin.hx

This reverts commit a6e65fb39caf18259c660c1c7910d5bf80bc15cb.

This revert simplifies the next patch in this patch set.

The commit message of that commit mentions that the new function "will
be used for the bundle-uri client in a subsequent commit", but it seems
that eventually it wasn't used.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c | 25 +++++++------------------
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index f752da93a8..6728a0d2f5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -122,12 +122,11 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
 		cb(negotiator, cache.items[i]);
 }
 
-static struct commit *deref_without_lazy_fetch_extended(const struct object_id *oid,
-							int mark_tags_complete,
-							enum object_type *type,
-							unsigned int oi_flags)
+static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
+					       int mark_tags_complete)
 {
-	struct object_info info = { .typep = type };
+	enum object_type type;
+	struct object_info info = { .typep = &type };
 	struct commit *commit;
 
 	commit = lookup_commit_in_graph(the_repository, oid);
@@ -136,9 +135,9 @@ static struct commit *deref_without_lazy_fetch_extended(const struct object_id *
 
 	while (1) {
 		if (oid_object_info_extended(the_repository, oid, &info,
-					     oi_flags))
+					     OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK))
 			return NULL;
-		if (*type == OBJ_TAG) {
+		if (type == OBJ_TAG) {
 			struct tag *tag = (struct tag *)
 				parse_object(the_repository, oid);
 
@@ -152,7 +151,7 @@ static struct commit *deref_without_lazy_fetch_extended(const struct object_id *
 		}
 	}
 
-	if (*type == OBJ_COMMIT) {
+	if (type == OBJ_COMMIT) {
 		struct commit *commit = lookup_commit(the_repository, oid);
 		if (!commit || repo_parse_commit(the_repository, commit))
 			return NULL;
@@ -162,16 +161,6 @@ static struct commit *deref_without_lazy_fetch_extended(const struct object_id *
 	return NULL;
 }
 
-
-static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
-					       int mark_tags_complete)
-{
-	enum object_type type;
-	unsigned flags = OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK;
-	return deref_without_lazy_fetch_extended(oid, mark_tags_complete,
-						 &type, flags);
-}
-
 static int rev_list_insert_ref(struct fetch_negotiator *negotiator,
 			       const struct object_id *oid)
 {
-- 
2.47.0.163.g1226f6d8fa-goog



* [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-10-31 21:18   ` [PATCH v2 0/2] When fetching, die " Jonathan Tan
  2024-10-31 21:19     ` [PATCH v2 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
@ 2024-10-31 21:19     ` Jonathan Tan
  2024-11-01  2:22       ` Junio C Hamano
  2024-11-01 15:18       ` Taylor Blau
  2024-10-31 22:33     ` [PATCH v2 0/2] When fetching, die " Josh Steadmon
  2 siblings, 2 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-10-31 21:19 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, steadmon, me, hanxin.hx

When fetching, there is a step in which sought objects are first checked
against the local repository; only objects that are not in the local
repository are then fetched. This check first looks up the commit graph
file, and returns "present" if the object is in there.

However, the action of first looking up the commit graph file is not
done everywhere in Git, especially if the type of the object at the time
of lookup is not known. This means that in a repo corruption situation,
a user may encounter an "object missing" error, attempt to fetch it, and
still encounter the same error later when they reattempt their original
action, because the object is present in the commit graph file but not in
the object DB.

Therefore, make it a fatal error when this occurs. (Note that we cannot
proceed to include this object in the list of objects to be fetched
without changing at least the fetch negotiation code: what would happen
is that the client will send "want X" and "have X" and when I tested
at $DAYJOB with a work server that uses JGit, the server reasonably
returned an empty packfile. And changing the fetch negotiation code to
only use the object DB when deciding what to report as "have" would be
an unnecessary slowdown, I think.)

This was discovered when a lazy fetch of a missing commit completed with
nothing actually fetched, and the writing of the commit graph file after
every fetch then attempted to read said missing commit, triggering a
lazy fetch of said missing commit, resulting in an infinite loop with no
user-visible indication (until they check the list of processes running
on their computer). With this fix, there is no infinite loop. Note that
although the repo corruption we discovered was caused by a bug in GC in
a partial clone, the behavior that this patch teaches Git to detect
applies to any repo with commit graph enabled and with a missing commit,
whether it is a partial clone or not.

t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in
lookup_commit_in_graph(), 2022-07-01), tests that an interaction between
fetch and the commit graph does not cause an infinite loop. This patch
changes the exit code in that situation, so that test had to be changed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c                               | 19 ++++++++++++++++---
 t/t5330-no-lazy-fetch-with-commit-graph.sh |  2 +-
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 6728a0d2f5..fe1fb3c1b7 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -122,16 +122,29 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
 		cb(negotiator, cache.items[i]);
 }
 
+static void die_in_commit_graph_only(const struct object_id *oid)
+{
+	die(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database.\n"
+	      "This is probably due to repo corruption.\n"
+	      "If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."),
+	      oid_to_hex(oid));
+}
+
 static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
-					       int mark_tags_complete)
+					       int mark_tags_complete_and_check_obj_db)
 {
 	enum object_type type;
 	struct object_info info = { .typep = &type };
 	struct commit *commit;
 
 	commit = lookup_commit_in_graph(the_repository, oid);
-	if (commit)
+	if (commit) {
+		if (mark_tags_complete_and_check_obj_db) {
+			if (!has_object(the_repository, oid, 0))
+				die_in_commit_graph_only(oid);
+		}
 		return commit;
+	}
 
 	while (1) {
 		if (oid_object_info_extended(the_repository, oid, &info,
@@ -143,7 +156,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 
 			if (!tag->tagged)
 				return NULL;
-			if (mark_tags_complete)
+			if (mark_tags_complete_and_check_obj_db)
 				tag->object.flags |= COMPLETE;
 			oid = &tag->tagged->oid;
 		} else {
diff --git a/t/t5330-no-lazy-fetch-with-commit-graph.sh b/t/t5330-no-lazy-fetch-with-commit-graph.sh
index 5eb28f0512..feccd58324 100755
--- a/t/t5330-no-lazy-fetch-with-commit-graph.sh
+++ b/t/t5330-no-lazy-fetch-with-commit-graph.sh
@@ -39,7 +39,7 @@ test_expect_success 'fetch any commit from promisor with the usage of the commit
 	test_commit -C with-commit any-commit &&
 	anycommit=$(git -C with-commit rev-parse HEAD) &&
 	GIT_TRACE="$(pwd)/trace.txt" \
-		git -C with-commit-graph fetch origin $anycommit 2>err &&
+		test_must_fail git -C with-commit-graph fetch origin $anycommit 2>err &&
 	! grep "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
 	grep "git fetch origin" trace.txt >actual &&
 	test_line_count = 1 actual
-- 
2.47.0.163.g1226f6d8fa-goog



* Re: [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-10-30 21:22     ` Josh Steadmon
@ 2024-10-31 21:23       ` Jonathan Tan
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-10-31 21:23 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Jonathan Tan, git

Josh Steadmon <steadmon@google.com> writes:
> >  /*
> >   * Returns 1 if every object pointed to by the given remote refs is available
> >   * locally and reachable from a local ref, and 0 otherwise.
> > @@ -830,6 +842,10 @@ static int everything_local(struct fetch_pack_args *args,
> >  				      ref->name);
> >  			continue;
> >  		}
> > +		if (o->flags & COMPLETE_FROM_COMMIT_GRAPH) {
> > +			if (!has_object(the_repository, remote, 0))
> > +				warn_in_commit_graph_only(remote);
> > +		}
> 
> And now that we're checking what's local, we issue our warning if we
> have an object missing from the DB but mentioned in the commit graph.
> Seems fine, although I wonder if it makes more sense to fail earlier. It
> looks like the only place we do the
> `mark_additional_complete_information` checks is in `mark_complete()`,
> so should we just check this condition there? No strong feelings either
> way, just curious.

When we were merely warning, it was useful to mark everything then
check later, so that a warning message would be printed once per
object, instead of potentially multiple times. (In the infinite case
that we discovered at $DAYJOB, it doesn't really matter since the
message is going to be printed an infinite number of times anyway,
but in the "plain" case in which the user is missing a commit and does
not have automatic commit graph writing enabled, the fetch will indeed
terminate.)

But since we're making this a fatal error, yes, it makes sense to fail
earlier. I've made the change.


* Re: [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-10-31 20:59     ` Taylor Blau
@ 2024-10-31 21:43       ` Jonathan Tan
  2024-11-01 14:33         ` Taylor Blau
  0 siblings, 1 reply; 38+ messages in thread
From: Jonathan Tan @ 2024-10-31 21:43 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jonathan Tan, git

Taylor Blau <me@ttaylorr.com> writes:
> > However, the action of first looking up the commit graph file is not
> > done everywhere in Git, especially if the type of the object at the time
> > of lookup is not known. This means that in a repo corruption situation,
> > a user may encounter an "object missing" error, attempt to fetch it, and
> > still encounter the same error later when they reattempt their original
> > action, because the object is present in the commit graph file but not in
> > the object DB.
> 
> I think the type of repository corruption here may be underspecified.

Hmm...if you have any specific points you'd like me to elaborate on (or
better yet, wording suggestions), please let me know.

> You say that we have some object, say X, whose type is not known. So we
> don't load the commit-graph, realize that X is missing, and then try and
> fetch it.

Yes.

> In this scenario, is X actually in the commit-graph, but not
> in the object database?

Yes.

> Further, if X is in the commit-graph, I assume
> we do not look it up there because we first try and find its type, which
> fails, so we assume we don't have it (despite it appearing corruptly in
> the commit-graph)?
> 
> I think that matches the behavior you're describing, but I want to make
> sure that I'm not thinking of something else.

Strictly speaking, we are not trying to find its type. We are trying
to find the object itself. (One could argue that if we find out that
an object is a commit, we can then ignore the packfile and go look up
the commit graph file. I'm not so sure this is a good idea, but this is
moot, I think - as far as I know, we currently don't do this.)

But yes, if the object is not in the object DB, we assume we don't have
it.

> You discuss this a little bit in your commit message, but I wonder if we
> should just die() here. I feel like we're trying to work around a
> situation where the commit-graph is obviously broken because it refers
> to commit objects that don't actually exist in the object store.

Yeah, that seems to be the consensus. I've switched it to a fatal error.

> A few thoughts in this area:
> 
>   - What situation provokes this to be true? I could imagine there is
>     some bug that we don't fully have a grasp of. But I wonder if it is
>     even easier to provoke than that, say by pruning some objects out of
>     the object store, then not rewriting the commit-graph, leaving some
>     of the references dangling.

The fetching of promisor objects that are descendants of non-promisor
objects. [1]

I think that the rewriting of the commit graph happens on every repack,
thus avoiding the situation you describe (unless there is a bug there).

[1] https://lore.kernel.org/git/20241001191811.1934900-1-calvinwan@google.com/

>   - Does 'git fsck' catch this case within the commit-graph?

Honestly, I haven't checked - I've been concentrating on fixing the
fetch part for now (and also the bug that caused the missing commits
[2]).

[2] https://lore.kernel.org/git/cover.1729792911.git.jonathantanmy@google.com/

>   - Are there other areas of the code that rely on the assumption that all
>     entries in the commit-graph actually exist on disk? If so, are they
>     similarly broken?

Yes, the fetch negotiation code. It is not "broken" in that it solely
uses repo_parse_commit() which always checks the commit graph, so as
long as the commit graph has everything we need, there will be no error.

There might be other systems that rely both on the commit graph and the
object DB, and thus have an inconsistent view (so, "similarly broken" as
you describe it) but at least in the partial clone case, the severity of
the issue is not as high as in "fetch", because these other systems can
lazily fetch the missing commit and then proceed.

> Another thought about this whole thing is that we essentially have a
> code path that says: "I found this object from the commit-graph, but
> don't know if I actually have it on disk, so mark it to be checked later
> via has_object()".
> 
> I wonder if it would be more straightforward to replace the call to
> lookup_commit_in_graph() with a direct call to has_object() in the
> deref_without_lazy_fetch() function, which I think would both (a)
> eliminate the need for a new flag bit to be allocated, and (b) prevent
> looking up the object twice.
> 
> Thoughts?
> 
> Thanks,
> Taylor

This would undo the optimization in 62b5a35a33 (fetch-pack: optimize
loading of refs via commit graph, 2021-09-01), and also would not work
without changes to the fetch negotiation code - I tried to describe it
in the commit message, perhaps not very clearly, but the issue is that
even if we emit "want X", the fetch negotiation code would emit "have
X" (the X is the same in both), and at least for our JGit server at
$DAYJOB, the combination of "want X" and "have X" results in the server
sending an empty packfile (reasonable behavior, I think). (And I don't
think the changes to the fetch negotiation code are worth it.)
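
To make the "want X" / "have X" interaction concrete, here is a minimal
sketch in C. It is only a toy model of the server behavior described
above, not JGit's or Git's actual negotiation code; the function name
toy_server_objects_to_send() and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy model (hypothetical, for illustration only): decide how many
 * objects a server would send for a single wanted object ID, given the
 * list of object IDs the client claims to have.
 *
 * If the client says "have X" for the same X it wants, the server
 * concludes there is nothing to send, which corresponds to the empty
 * packfile mentioned above.
 */
static int toy_server_objects_to_send(const char *want,
				      const char *const *haves,
				      size_t nr_haves)
{
	size_t i;

	for (i = 0; i < nr_haves; i++)
		if (!strcmp(want, haves[i]))
			return 0; /* client already claims to have it */
	return 1; /* client does not claim it, so send it */
}
```

In this model, the only way to get the object sent is for the client
not to advertise it as a "have", which is why a fix on the "want" side
alone would not be enough.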


* Re: [PATCH v2 0/2] When fetching, die if in commit graph but not obj db
  2024-10-31 21:18   ` [PATCH v2 0/2] When fetching, die " Jonathan Tan
  2024-10-31 21:19     ` [PATCH v2 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
  2024-10-31 21:19     ` [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
@ 2024-10-31 22:33     ` Josh Steadmon
  2 siblings, 0 replies; 38+ messages in thread
From: Josh Steadmon @ 2024-10-31 22:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, me, hanxin.hx

On 2024.10.31 14:18, Jonathan Tan wrote:
> Thanks everyone for your review comments. I've updated the patch 1
> commit message as Josh requested. I'll reply individually to comments on
> patch 2.
> 
> Jonathan Tan (2):
>   Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
>   fetch-pack: warn if in commit graph but not obj db
> 
>  fetch-pack.c                               | 42 +++++++++++-----------
>  t/t5330-no-lazy-fetch-with-commit-graph.sh |  2 +-
>  2 files changed, 23 insertions(+), 21 deletions(-)

This version looks good to me, thanks!

Reviewed-by: Josh Steadmon <steadmon@google.com>


* Re: [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-10-31 21:19     ` [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
@ 2024-11-01  2:22       ` Junio C Hamano
  2024-11-01  4:25         ` Junio C Hamano
  2024-11-01 17:36         ` Jonathan Tan
  2024-11-01 15:18       ` Taylor Blau
  1 sibling, 2 replies; 38+ messages in thread
From: Junio C Hamano @ 2024-11-01  2:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, steadmon, me, hanxin.hx

Jonathan Tan <jonathantanmy@google.com> writes:

>  static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
> -					       int mark_tags_complete)
> +					       int mark_tags_complete_and_check_obj_db)
>  {
>  	enum object_type type;
>  	struct object_info info = { .typep = &type };
>  	struct commit *commit;
>  
>  	commit = lookup_commit_in_graph(the_repository, oid);
> -	if (commit)
> +	if (commit) {
> +		if (mark_tags_complete_and_check_obj_db) {
> +			if (!has_object(the_repository, oid, 0))
> +				die_in_commit_graph_only(oid);
> +		}
>  		return commit;
> +	}

Hmph, even when we are not doing the mark-tags-complete thing,
wouldn't it be a fatal error if the commit graph claims a commit
exists but we are missing it?

It also makes me wonder if it would be sufficient to prevent us from
saying "have X" if we just pretend as if lookup_commit_in_graph()
returned NULL in this case.

In any case, infinitely recursing to lazily fetch a single commit is
definitely worth fixing.  Thanks for digging to the bottom of the
problem and fixing it.



* Re: [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-11-01  2:22       ` Junio C Hamano
@ 2024-11-01  4:25         ` Junio C Hamano
  2024-11-01  8:59           ` [External] " Han Xin
  2024-11-01 17:40           ` Jonathan Tan
  2024-11-01 17:36         ` Jonathan Tan
  1 sibling, 2 replies; 38+ messages in thread
From: Junio C Hamano @ 2024-11-01  4:25 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, steadmon, me, hanxin.hx

Junio C Hamano <gitster@pobox.com> writes:

>>  	commit = lookup_commit_in_graph(the_repository, oid);
>> -	if (commit)
>> +	if (commit) {
>> +		if (mark_tags_complete_and_check_obj_db) {
>> +			if (!has_object(the_repository, oid, 0))
>> +				die_in_commit_graph_only(oid);
>> +		}
>>  		return commit;
>> +	}
>
> Hmph, even when we are not doing the mark-tags-complete thing,
> wouldn't it be a fatal error if the commit graph claims a commit
> exists but we are missing it?
>
> It also makes me wonder if it would be sufficient to prevent us from
> saying "have X" if we just pretend as if lookup_commit_in_graph()
> returned NULL in this case.

Again, sorry for the noise.

I think the posted patch is better without either of these two,
simply because the "commit graph lies" case is a repository
corruption, and "git fsck" should catch such a corruption (and if
not, we should make sure it does).

The normal codepaths should assume a healthy working repository.

As has_object() is not without cost, an extra check is warranted
only because not checking will go into infinite recursion.  If it
does not make us fail in such an unpleasant way if we return such a
commit when we are not doing the mark-tags-complete thing (but makes
us fail in some other controlled way), not paying cost for an extra
check is the right thing.

Thanks.


* Re: [External] Re: [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-11-01  4:25         ` Junio C Hamano
@ 2024-11-01  8:59           ` Han Xin
  2024-11-01 17:46             ` Jonathan Tan
  2024-11-01 17:40           ` Jonathan Tan
  1 sibling, 1 reply; 38+ messages in thread
From: Han Xin @ 2024-11-01  8:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git, steadmon, me

On Fri, Nov 1, 2024 at 12:25 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> >>      commit = lookup_commit_in_graph(the_repository, oid);
> >> -    if (commit)
> >> +    if (commit) {
> >> +            if (mark_tags_complete_and_check_obj_db) {
> >> +                    if (!has_object(the_repository, oid, 0))
> >> +                            die_in_commit_graph_only(oid);
> >> +            }
> >>              return commit;
> >> +    }
> >
> > Hmph, even when we are not doing the mark-tags-complete thing,
> > wouldn't it be a fatal error if the commit graph claims a commit
> > exists but we are missing it?
> >
> > It also makes me wonder if it would be sufficient to prevent us from
> > saying "have X" if we just pretend as if lookup_commit_in_graph()
> > returned NULL in this case.
>
> Again, sorry for the noise.
>
> I think the posted patch is better without either of these two,
> simply because the "commit graph lies" case is a repository
> corruption, and "git fsck" should catch such a corruption (and if
> not, we should make sure it does).
>
> The normal codepaths should assume a healthy working repository.
>
> As has_object() is not without cost, an extra check is warranted
> only because not checking will go into infinite recursion.  If it
> does not make us fail in such an unpleasant way if we return such a
> commit when we are not doing the mark-tags-complete thing (but makes
> us fail in some other controlled way), not paying cost for an extra
> check is the right thing.
>
> Thanks.

Although the scenario I faked in t/t5330-no-lazy-fetch-with-commit-graph.sh
usually does not occur, if we are unfortunate enough to encounter this issue,
I hope it can automatically fix the problem as much as possible without
relying on me to take an extra action.

Thanks.


* Re: [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-10-31 21:43       ` Jonathan Tan
@ 2024-11-01 14:33         ` Taylor Blau
  2024-11-01 17:33           ` Jonathan Tan
  0 siblings, 1 reply; 38+ messages in thread
From: Taylor Blau @ 2024-11-01 14:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Thu, Oct 31, 2024 at 02:43:19PM -0700, Jonathan Tan wrote:
> > Another thought about this whole thing is that we essentially have a
> > code path that says: "I found this object from the commit-graph, but
> > don't know if I actually have it on disk, so mark it to be checked later
> > via has_object()".
> >
> > I wonder if it would be more straightforward to replace the call to
> > lookup_commit_in_graph() with a direct call to has_object() in the
> > deref_without_lazy_fetch() function, which I think would both (a)
> > eliminate the need for a new flag bit to be allocated, and (b) prevent
> > looking up the object twice.
> >
> > Thoughts?
>
> This would undo the optimization in 62b5a35a33 (fetch-pack: optimize
> loading of refs via commit graph, 2021-09-01), and also would not work
> without changes to the fetch negotiation code - I tried to describe it
> in the commit message, perhaps not very clearly, but the issue is that
> even if we emit "want X", the fetch negotiation code would emit "have
> X" (the X is the same in both), and at least for our JGit server at
> $DAYJOB, the combination of "want X" and "have X" results in the server
> sending an empty packfile (reasonable behavior, I think). (And I don't
> think the changes to the fetch negotiation code are worth it.)

Thanks for the clarifications above. What I was trying to poke at here
was... doesn't the change as presented undo that optimization, just in a
different way?

In 62b5a35a33 we taught deref_without_lazy_fetch() to look up commits
through the commit-graph. But in this patch, we now call has_object()
on top of that existing check. Am I missing something obvious?

Thanks,
Taylor


* Re: [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-10-31 21:19     ` [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
  2024-11-01  2:22       ` Junio C Hamano
@ 2024-11-01 15:18       ` Taylor Blau
  2024-11-01 17:49         ` Jonathan Tan
  1 sibling, 1 reply; 38+ messages in thread
From: Taylor Blau @ 2024-11-01 15:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, steadmon, hanxin.hx

On Thu, Oct 31, 2024 at 02:19:01PM -0700, Jonathan Tan wrote:
> diff --git a/t/t5330-no-lazy-fetch-with-commit-graph.sh b/t/t5330-no-lazy-fetch-with-commit-graph.sh
> index 5eb28f0512..feccd58324 100755
> --- a/t/t5330-no-lazy-fetch-with-commit-graph.sh
> +++ b/t/t5330-no-lazy-fetch-with-commit-graph.sh
> @@ -39,7 +39,7 @@ test_expect_success 'fetch any commit from promisor with the usage of the commit
>  	test_commit -C with-commit any-commit &&
>  	anycommit=$(git -C with-commit rev-parse HEAD) &&
>  	GIT_TRACE="$(pwd)/trace.txt" \
> -		git -C with-commit-graph fetch origin $anycommit 2>err &&
> +		test_must_fail git -C with-commit-graph fetch origin $anycommit 2>err &&

It appears that this line breaks CI:

    https://github.com/ttaylorr/git/actions/runs/11631453312/job/32392591229

because you're using a one-shot environment variable assignment before
calling a shell function.

This should instead be:

    test_must_fail env GIT_TRACE="$(pwd)/trace.txt" \
      git -C with-commit-graph fetch origin $anycommit 2>err &&

Thanks,
Taylor


* Re: [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-11-01 14:33         ` Taylor Blau
@ 2024-11-01 17:33           ` Jonathan Tan
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-11-01 17:33 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jonathan Tan, git

Taylor Blau <me@ttaylorr.com> writes:
> On Thu, Oct 31, 2024 at 02:43:19PM -0700, Jonathan Tan wrote:
> > > Another thought about this whole thing is that we essentially have a
> > > code path that says: "I found this object from the commit-graph, but
> > > don't know if I actually have it on disk, so mark it to be checked later
> > > via has_object()".
> > >
> > > I wonder if it would be more straightforward to replace the call to
> > > lookup_commit_in_graph() with a direct call to has_object() in the
> > > deref_without_lazy_fetch() function, which I think would both (a)
> > > eliminate the need for a new flag bit to be allocated, and (b) prevent
> > > looking up the object twice.
> > >
> > > Thoughts?
> >
> > This would undo the optimization in 62b5a35a33 (fetch-pack: optimize
> > loading of refs via commit graph, 2021-09-01), and also would not work
> > without changes to the fetch negotiation code - I tried to describe it
> > in the commit message, perhaps not very clearly, but the issue is that
> > even if we emit "want X", the fetch negotiation code would emit "have
> > X" (the X is the same in both), and at least for our JGit server at
> > $DAYJOB, the combination of "want X" and "have X" results in the server
> > sending an empty packfile (reasonable behavior, I think). (And I don't
> > think the changes to the fetch negotiation code are worth it.)
> 
> Thanks for the clarifications above. What I was trying to poke at here
> was... doesn't the change as presented undo that optimization, just in a
> different way?
> 
> In 62b5a35a33 we taught deref_without_lazy_fetch() to look up commits
> through the commit-graph. But in this patch, we now call has_object()
> on top of that existing check. Am I missing something obvious?
> 
> Thanks,
> Taylor

deref_without_lazy_fetch() is used in these situations:
 (1) to mark things COMPLETE (the 2nd argument is set to 1)
 (2) all other situations (the 2nd argument is set to 0)

62b5a35a33 teaches deref_without_lazy_fetch() to use the commit-graph in
all situations.

The change I have presented in this patch set teaches
deref_without_lazy_fetch() to read both the commit-graph and the object
DB in (1) but not (2). So I'm undoing the optimization, but not for
all situations.

My understanding of your suggestion was to undo the optimization in
all situations.
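
The difference between (1) and (2) can be sketched with a small C
model. This is a simplified illustration under stated assumptions, not
the real fetch-pack.c code: struct toy_repo, toy_deref() and the lookup
counter are hypothetical stand-ins for the commit graph, the object DB
and the cost of has_object().

```c
#include <stdbool.h>

/* Toy repository state; hypothetical, not Git's real data structures. */
struct toy_repo {
	bool in_commit_graph;  /* the commit graph file lists the commit */
	bool in_object_db;     /* the object DB actually contains it */
	int object_db_lookups; /* how often we paid for an object DB check */
};

enum toy_result { TOY_NOT_FOUND, TOY_FOUND, TOY_FATAL };

/*
 * Follows the shape of the patched deref_without_lazy_fetch(): a commit
 * graph hit is trusted as-is unless the caller asks for the extra object
 * DB check (situation (1), marking things COMPLETE).
 */
static enum toy_result toy_deref(struct toy_repo *repo, bool check_obj_db)
{
	if (repo->in_commit_graph) {
		if (check_obj_db) {
			repo->object_db_lookups++;
			if (!repo->in_object_db)
				return TOY_FATAL; /* the real code calls die() */
		}
		return TOY_FOUND; /* situation (2): no extra lookup */
	}

	/* Not in the commit graph: fall back to the object DB. */
	repo->object_db_lookups++;
	return repo->in_object_db ? TOY_FOUND : TOY_NOT_FOUND;
}
```

With a repo where the commit is in the commit graph but not in the
object DB, toy_deref(&repo, true) reports the fatal case after one
object DB lookup, while toy_deref(&repo, false) returns the commit
without touching the object DB at all.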


* Re: [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-11-01  2:22       ` Junio C Hamano
  2024-11-01  4:25         ` Junio C Hamano
@ 2024-11-01 17:36         ` Jonathan Tan
  1 sibling, 0 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-11-01 17:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git, steadmon, me, hanxin.hx

Junio C Hamano <gitster@pobox.com> writes:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >  static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
> > -					       int mark_tags_complete)
> > +					       int mark_tags_complete_and_check_obj_db)
> >  {
> >  	enum object_type type;
> >  	struct object_info info = { .typep = &type };
> >  	struct commit *commit;
> >  
> >  	commit = lookup_commit_in_graph(the_repository, oid);
> > -	if (commit)
> > +	if (commit) {
> > +		if (mark_tags_complete_and_check_obj_db) {
> > +			if (!has_object(the_repository, oid, 0))
> > +				die_in_commit_graph_only(oid);
> > +		}
> >  		return commit;
> > +	}
> 
> Hmph, even when we are not doing the mark-tags-complete thing,
> wouldn't it be a fatal error if the commit graph claims a commit
> exists but we are missing it?

If we can detect this cheaply, yes it would be ideal if every time
we read a commit from the commit graph, we also check that the commit
exists in the object DB. I don't think we can do it cheaply, though.

> It also makes me wonder if it would be sufficient to prevent us from
> saying "have X" if we just pretend as if lookup_commit_in_graph()
> returned NULL in this case.

"have X" is controlled by the packfile negotiation part, so we would
have to change that. (This part controls whether we send "want X" for a
specific OID or not.)

> In any case, infinitely recursing to lazily fetch a single commit is
> definitely worth fixing.  Thanks for digging to the bottom of the
> problem and fixing it.

Thanks.


* Re: [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-11-01  4:25         ` Junio C Hamano
  2024-11-01  8:59           ` [External] " Han Xin
@ 2024-11-01 17:40           ` Jonathan Tan
  2024-11-02  2:08             ` Junio C Hamano
  1 sibling, 1 reply; 38+ messages in thread
From: Jonathan Tan @ 2024-11-01 17:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git, steadmon, me, hanxin.hx

Junio C Hamano <gitster@pobox.com> writes:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> >>  	commit = lookup_commit_in_graph(the_repository, oid);
> >> -	if (commit)
> >> +	if (commit) {
> >> +		if (mark_tags_complete_and_check_obj_db) {
> >> +			if (!has_object(the_repository, oid, 0))
> >> +				die_in_commit_graph_only(oid);
> >> +		}
> >>  		return commit;
> >> +	}
> >
> > Hmph, even when we are not doing the mark-tags-complete thing,
> > wouldn't it be a fatal error if the commit graph claims a commit
> > exists but we are missing it?
> >
> > It also makes me wonder if it would be sufficient to prevent us from
> > saying "have X" if we just pretend as if lookup_commit_in_graph()
> > returned NULL in this case.
> 
> Again, sorry for the noise.
> 
> I think the posted patch is better without either of these two,
> simply because the "commit graph lies" case is a repository
> corruption, and "git fsck" should catch such a corruption (and if
> not, we should make sure it does).
> 
> The normal codepaths should assume a healthy working repository.
> 
> As has_object() is not without cost, an extra check is warranted
> only because not checking will go into infinite recursion.  If it
> does not make us fail in such an unpleasant way if we return such a
> commit when we are not doing the mark-tags-complete thing (but makes
> us fail in some other controlled way), not paying cost for an extra
> check is the right thing.
> 
> Thanks.

Just checking...by "the posted patch is better without either
of these two", do you mean that we should not use has_object()
here? I included it here in as narrow a scope as possible (when
"mark_tags_complete_and_check_obj_db" is true) precisely because not
checking will go into infinite recursion, as you said. (And indeed I did
not expand the scope because I agree that the normal codepaths should
assume a healthy working repository.)

* Re: [External] Re: [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-11-01  8:59           ` [External] " Han Xin
@ 2024-11-01 17:46             ` Jonathan Tan
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-11-01 17:46 UTC (permalink / raw)
  To: Han Xin; +Cc: Jonathan Tan, Junio C Hamano, git, steadmon, me

Han Xin <hanxin.hx@bytedance.com> writes:
> Although the scenario I faked in t/t5330-no-lazy-fetch-with-commit-graph.sh
> usually does not occur, if we are unfortunate enough to encounter this issue,
> I hope it can automatically fix the problem as much as possible without
> relying on me to take an extra action.
> 
> Thanks.

Note that unfortunately the user will still need to take an extra
action.

My goal at first was to try to teach Git to fetch the commit (in the
commit graph file but not in the object DB) anyway, so that (as you
said) the problem will fix itself, but that requires not only a change
in the part of the code that emits "want", but also the part that emits
"have" (the fetch negotiation code). At that point, I decided that it's
better to stop early and warn the user (it was not a fatal error in the
original version of the code, but I have changed it).

* Re: [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-11-01 15:18       ` Taylor Blau
@ 2024-11-01 17:49         ` Jonathan Tan
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-11-01 17:49 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jonathan Tan, git, steadmon, hanxin.hx

Taylor Blau <me@ttaylorr.com> writes:
> On Thu, Oct 31, 2024 at 02:19:01PM -0700, Jonathan Tan wrote:
> > diff --git a/t/t5330-no-lazy-fetch-with-commit-graph.sh b/t/t5330-no-lazy-fetch-with-commit-graph.sh
> > index 5eb28f0512..feccd58324 100755
> > --- a/t/t5330-no-lazy-fetch-with-commit-graph.sh
> > +++ b/t/t5330-no-lazy-fetch-with-commit-graph.sh
> > @@ -39,7 +39,7 @@ test_expect_success 'fetch any commit from promisor with the usage of the commit
> >  	test_commit -C with-commit any-commit &&
> >  	anycommit=$(git -C with-commit rev-parse HEAD) &&
> >  	GIT_TRACE="$(pwd)/trace.txt" \
> > -		git -C with-commit-graph fetch origin $anycommit 2>err &&
> > +		test_must_fail git -C with-commit-graph fetch origin $anycommit 2>err &&
> 
> It appears that this line breaks CI:
> 
>     https://github.com/ttaylorr/git/actions/runs/11631453312/job/32392591229
> 
> because you're using a one-shot environment variable assignment before
> calling a shell function.
> 
> This should instead be:
> 
>     test_must_fail env GIT_TRACE="$(pwd)/trace.txt" \
>       git -C with-commit-graph fetch origin $anycommit 2>err &&
> 
> Thanks,
> Taylor

Ah, thanks. I've made the change locally. There are a few ongoing
conversations about this patch set (thanks everyone for participating)
so I'll wait a while before sending out the next set.
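
For anyone following along, here is a minimal sketch of the pattern Taylor
describes. As far as I understand it, a one-shot "VAR=value" assignment in
front of a shell function is not handled the same way by every shell, while
passing the assignment to env(1) confines it to the command that env runs.
The helper must_fail and the variable DEMO_TRACE are made up for this
illustration; they are not part of Git's test suite.

```shell
#!/bin/sh
# Start from a known state so the last check below is meaningful.
unset DEMO_TRACE

# Hypothetical stand-in for a helper like test_must_fail: runs its
# arguments as a command and succeeds only if that command fails.
must_fail () {
	if "$@"
	then
		return 1
	fi
	return 0
}

# Portable form: the assignment is an argument to env(1), so the variable
# exists only in the environment of the command that env executes.
must_fail env DEMO_TRACE=on sh -c 'echo "child sees: $DEMO_TRACE"; exit 1'
echo "must_fail status: $?"

# The calling shell's own environment is untouched.
echo "parent sees: ${DEMO_TRACE:-unset}"
```

The child command sees the variable and fails on purpose, so the helper
reports success, and the variable never appears in the calling shell.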

* Re: [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db
  2024-11-01 17:40           ` Jonathan Tan
@ 2024-11-02  2:08             ` Junio C Hamano
  0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2024-11-02  2:08 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, steadmon, me, hanxin.hx

Jonathan Tan <jonathantanmy@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>> Junio C Hamano <gitster@pobox.com> writes:
>> 
>> >>  	commit = lookup_commit_in_graph(the_repository, oid);
>> >> -	if (commit)
>> >> +	if (commit) {
>> >> +		if (mark_tags_complete_and_check_obj_db) {
>> >> +			if (!has_object(the_repository, oid, 0))
>> >> +				die_in_commit_graph_only(oid);
>> >> +		}
>> >>  		return commit;
>> >> +	}
>> >
>> > Hmph, even when we are not doing the mark-tags-complete thing,
>> > wouldn't it be a fatal error if the commit graph claims a commit
>> > exists but we are missing it?
>> >
>> > It also makes me wonder if it would be sufficient to prevent us from
>> > saying "have X" if we just pretend as if lookup_commit_in_graph()
>> > returned NULL in this case.
>> 
>> Again, sorry for the noise.
>> 
>> I think the posted patch is better without either of these two,
>> simply because the "commit graph lies" case is a repository
>> corruption, and "git fsck" should catch such a corruption (and if
>> not, we should make sure it does).
>> 
>> The normal codepaths should assume a healthy working repository.
>> 
>> As has_object() is not without cost, an extra check is warranted
>> only because not checking will go into infinite recursion.  If it
>> does not make us fail in such an unpleasant way if we return such a
>> commit when we are not doing the mark-tags-complete thing (but makes
>> us fail in some other controlled way), not paying cost for an extra
>> check is the right thing.
>> 
>> Thanks.
>
> Just checking...by "the posted patch is better without either
> of these two", do you mean that we should not use has_object()
> here?

No, "these two" refers to the two changes I hinted at in my message,
i.e. (1) shouldn't we check with has_object() and die regardless of
mark_tags_complete_and_check_obj_db? and (2) if we set commit = NULL
and keep going, would that be sufficient to fix it?

* [PATCH v3 0/2] When fetching, die if in commit graph but not obj db
  2024-10-29 21:11 ` [PATCH 0/2] When fetching, warn if in commit graph but not obj db Jonathan Tan
                     ` (3 preceding siblings ...)
  2024-10-31 21:18   ` [PATCH v2 0/2] When fetching, die " Jonathan Tan
@ 2024-11-05 19:24   ` Jonathan Tan
  2024-11-05 19:24     ` [PATCH v3 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
                       ` (2 more replies)
  4 siblings, 3 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-11-05 19:24 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

Changes: the commit message title of the second patch, and a change from
grep to test_grep.

Jonathan Tan (2):
  Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
  fetch-pack: die if in commit graph but not obj db

 fetch-pack.c                               | 42 +++++++++++-----------
 t/t5330-no-lazy-fetch-with-commit-graph.sh |  4 +--
 2 files changed, 24 insertions(+), 22 deletions(-)

Range-diff against v2:
1:  34e87b8388 = 1:  34e87b8388 Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
2:  a35e386a0e ! 2:  c92b2c9e50 fetch-pack: warn if in commit graph but not obj db
    @@ Metadata
     Author: Jonathan Tan <jonathantanmy@google.com>
     
      ## Commit message ##
    -    fetch-pack: warn if in commit graph but not obj db
    +    fetch-pack: die if in commit graph but not obj db
     
         When fetching, there is a step in which sought objects are first checked
         against the local repository; only objects that are not in the local
    @@ t/t5330-no-lazy-fetch-with-commit-graph.sh: test_expect_success 'fetch any commi
     -	GIT_TRACE="$(pwd)/trace.txt" \
     +	test_must_fail env GIT_TRACE="$(pwd)/trace.txt" \
      		git -C with-commit-graph fetch origin $anycommit 2>err &&
    - 	! grep "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
    +-	! grep "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
    ++	test_grep ! "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
      	grep "git fetch origin" trace.txt >actual &&
    + 	test_line_count = 1 actual
    + '
-- 
2.47.0.277.g8800431eea-goog


* [PATCH v3 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
  2024-11-05 19:24   ` [PATCH v3 " Jonathan Tan
@ 2024-11-05 19:24     ` Jonathan Tan
  2024-11-05 19:24     ` [PATCH v3 2/2] fetch-pack: die if in commit graph but not obj db Jonathan Tan
  2024-11-06  3:12     ` [PATCH v3 0/2] When fetching, " Junio C Hamano
  2 siblings, 0 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-11-05 19:24 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

This reverts commit a6e65fb39caf18259c660c1c7910d5bf80bc15cb.

This revert simplifies the next patch in this patch set.

The commit message of that commit mentions that the new function "will
be used for the bundle-uri client in a subsequent commit", but it seems
that eventually it wasn't used.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c | 25 +++++++------------------
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index f752da93a8..6728a0d2f5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -122,12 +122,11 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
 		cb(negotiator, cache.items[i]);
 }
 
-static struct commit *deref_without_lazy_fetch_extended(const struct object_id *oid,
-							int mark_tags_complete,
-							enum object_type *type,
-							unsigned int oi_flags)
+static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
+					       int mark_tags_complete)
 {
-	struct object_info info = { .typep = type };
+	enum object_type type;
+	struct object_info info = { .typep = &type };
 	struct commit *commit;
 
 	commit = lookup_commit_in_graph(the_repository, oid);
@@ -136,9 +135,9 @@ static struct commit *deref_without_lazy_fetch_extended(const struct object_id *
 
 	while (1) {
 		if (oid_object_info_extended(the_repository, oid, &info,
-					     oi_flags))
+					     OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK))
 			return NULL;
-		if (*type == OBJ_TAG) {
+		if (type == OBJ_TAG) {
 			struct tag *tag = (struct tag *)
 				parse_object(the_repository, oid);
 
@@ -152,7 +151,7 @@ static struct commit *deref_without_lazy_fetch_extended(const struct object_id *
 		}
 	}
 
-	if (*type == OBJ_COMMIT) {
+	if (type == OBJ_COMMIT) {
 		struct commit *commit = lookup_commit(the_repository, oid);
 		if (!commit || repo_parse_commit(the_repository, commit))
 			return NULL;
@@ -162,16 +161,6 @@ static struct commit *deref_without_lazy_fetch_extended(const struct object_id *
 	return NULL;
 }
 
-
-static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
-					       int mark_tags_complete)
-{
-	enum object_type type;
-	unsigned flags = OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK;
-	return deref_without_lazy_fetch_extended(oid, mark_tags_complete,
-						 &type, flags);
-}
-
 static int rev_list_insert_ref(struct fetch_negotiator *negotiator,
 			       const struct object_id *oid)
 {
-- 
2.47.0.277.g8800431eea-goog


* [PATCH v3 2/2] fetch-pack: die if in commit graph but not obj db
  2024-11-05 19:24   ` [PATCH v3 " Jonathan Tan
  2024-11-05 19:24     ` [PATCH v3 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
@ 2024-11-05 19:24     ` Jonathan Tan
  2024-11-06  3:12     ` [PATCH v3 0/2] When fetching, " Junio C Hamano
  2 siblings, 0 replies; 38+ messages in thread
From: Jonathan Tan @ 2024-11-05 19:24 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

When fetching, there is a step in which sought objects are first checked
against the local repository; only objects that are not in the local
repository are then fetched. This check first looks up the commit graph
file, and returns "present" if the object is in there.

However, the action of first looking up the commit graph file is not
done everywhere in Git, especially if the type of the object at the time
of lookup is not known. This means that in a repo corruption situation,
a user may encounter an "object missing" error, attempt to fetch it, and
still encounter the same error later when they reattempt their original
action, because the object is present in the commit graph file but not in
the object DB.

Therefore, make it a fatal error when this occurs. (Note that we cannot
proceed to include this object in the list of objects to be fetched
without changing at least the fetch negotiation code: what would happen
is that the client will send "want X" and "have X" and when I tested
at $DAYJOB with a work server that uses JGit, the server reasonably
returned an empty packfile. And changing the fetch negotiation code to
only use the object DB when deciding what to report as "have" would be
an unnecessary slowdown, I think.)

This was discovered when a lazy fetch of a missing commit completed with
nothing actually fetched, and the writing of the commit graph file after
every fetch then attempted to read said missing commit, triggering a
lazy fetch of said missing commit, resulting in an infinite loop with no
user-visible indication (until they check the list of processes running
on their computer). With this fix, there is no infinite loop. Note that
although the repo corruption we discovered was caused by a bug in GC in
a partial clone, the behavior that this patch teaches Git to die on
applies to any repo with commit graph enabled and with a missing commit,
whether it is a partial clone or not.

t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in
lookup_commit_in_graph(), 2022-07-01), tests that an interaction between
fetch and the commit graph does not cause an infinite loop. This patch
changes the exit code in that situation, so that test had to be changed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c                               | 19 ++++++++++++++++---
 t/t5330-no-lazy-fetch-with-commit-graph.sh |  4 ++--
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 6728a0d2f5..fe1fb3c1b7 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -122,16 +122,29 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
 		cb(negotiator, cache.items[i]);
 }
 
+static void die_in_commit_graph_only(const struct object_id *oid)
+{
+	die(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database.\n"
+	      "This is probably due to repo corruption.\n"
+	      "If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."),
+	      oid_to_hex(oid));
+}
+
 static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
-					       int mark_tags_complete)
+					       int mark_tags_complete_and_check_obj_db)
 {
 	enum object_type type;
 	struct object_info info = { .typep = &type };
 	struct commit *commit;
 
 	commit = lookup_commit_in_graph(the_repository, oid);
-	if (commit)
+	if (commit) {
+		if (mark_tags_complete_and_check_obj_db) {
+			if (!has_object(the_repository, oid, 0))
+				die_in_commit_graph_only(oid);
+		}
 		return commit;
+	}
 
 	while (1) {
 		if (oid_object_info_extended(the_repository, oid, &info,
@@ -143,7 +156,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 
 			if (!tag->tagged)
 				return NULL;
-			if (mark_tags_complete)
+			if (mark_tags_complete_and_check_obj_db)
 				tag->object.flags |= COMPLETE;
 			oid = &tag->tagged->oid;
 		} else {
diff --git a/t/t5330-no-lazy-fetch-with-commit-graph.sh b/t/t5330-no-lazy-fetch-with-commit-graph.sh
index 5eb28f0512..21f36eb8c3 100755
--- a/t/t5330-no-lazy-fetch-with-commit-graph.sh
+++ b/t/t5330-no-lazy-fetch-with-commit-graph.sh
@@ -38,9 +38,9 @@ test_expect_success 'fetch any commit from promisor with the usage of the commit
 	git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
 	test_commit -C with-commit any-commit &&
 	anycommit=$(git -C with-commit rev-parse HEAD) &&
-	GIT_TRACE="$(pwd)/trace.txt" \
+	test_must_fail env GIT_TRACE="$(pwd)/trace.txt" \
 		git -C with-commit-graph fetch origin $anycommit 2>err &&
-	! grep "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
+	test_grep ! "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
 	grep "git fetch origin" trace.txt >actual &&
 	test_line_count = 1 actual
 '
-- 
2.47.0.277.g8800431eea-goog


* Re: [PATCH v3 0/2] When fetching, die if in commit graph but not obj db
  2024-11-05 19:24   ` [PATCH v3 " Jonathan Tan
  2024-11-05 19:24     ` [PATCH v3 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
  2024-11-05 19:24     ` [PATCH v3 2/2] fetch-pack: die if in commit graph but not obj db Jonathan Tan
@ 2024-11-06  3:12     ` Junio C Hamano
  2 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2024-11-06  3:12 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> Changes: the commit message title of the second patch, and a change from
> grep to test_grep.
>
> Jonathan Tan (2):
>   Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
>   fetch-pack: die if in commit graph but not obj db

I presume that with this the topic should be ready for 'next'.

Thanks.

Thread overview: 38+ messages
2024-10-03 22:35 [RFC PATCH] promisor-remote: always JIT fetch with --refetch Emily Shaffer
2024-10-06 22:43 ` Junio C Hamano
2024-10-07  0:21   ` Robert Coup
2024-10-07  0:37     ` Junio C Hamano
2024-10-11 16:40   ` Emily Shaffer
2024-10-11 17:54     ` Junio C Hamano
2024-10-23  0:28 ` [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object Emily Shaffer
2024-10-23 18:53   ` Emily Shaffer
2024-10-23 20:11   ` Taylor Blau
2024-10-28 22:55     ` Jonathan Tan
2024-10-29 21:11 ` [PATCH 0/2] When fetching, warn if in commit graph but not obj db Jonathan Tan
2024-10-29 21:11   ` [PATCH 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-10-30 21:22     ` Josh Steadmon
2024-10-29 21:11   ` [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
2024-10-30 21:22     ` Josh Steadmon
2024-10-31 21:23       ` Jonathan Tan
2024-10-31 20:59     ` Taylor Blau
2024-10-31 21:43       ` Jonathan Tan
2024-11-01 14:33         ` Taylor Blau
2024-11-01 17:33           ` Jonathan Tan
2024-10-30 21:22   ` [PATCH 0/2] When fetching, " Josh Steadmon
2024-10-31 21:18   ` [PATCH v2 0/2] When fetching, die " Jonathan Tan
2024-10-31 21:19     ` [PATCH v2 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-10-31 21:19     ` [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
2024-11-01  2:22       ` Junio C Hamano
2024-11-01  4:25         ` Junio C Hamano
2024-11-01  8:59           ` [External] " Han Xin
2024-11-01 17:46             ` Jonathan Tan
2024-11-01 17:40           ` Jonathan Tan
2024-11-02  2:08             ` Junio C Hamano
2024-11-01 17:36         ` Jonathan Tan
2024-11-01 15:18       ` Taylor Blau
2024-11-01 17:49         ` Jonathan Tan
2024-10-31 22:33     ` [PATCH v2 0/2] When fetching, die " Josh Steadmon
2024-11-05 19:24   ` [PATCH v3 " Jonathan Tan
2024-11-05 19:24     ` [PATCH v3 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-11-05 19:24     ` [PATCH v3 2/2] fetch-pack: die if in commit graph but not obj db Jonathan Tan
2024-11-06  3:12     ` [PATCH v3 0/2] When fetching, " Junio C Hamano