git treeless-clone + wait + pull → problem, again pull → OK

* git treeless-clone + wait + pull → problem, again pull → OK
From: Дилян Палаузов @ 2025-07-01  9:24 UTC
  To: git

Hello,

The problem: when I make a treeless or blobless clone and run git pull some time later, git prints many, many lines saying that it is trying to fetch data. I interrupt it with Ctrl+C and run git pull again, and then it completes. I had never tried to document this precisely until now:

On 26 June 2025 I ran:

$ git clone --filter=tree:0 https://github.com/git/git.git
…
$ git show --oneline
cf6f63ea6 (HEAD -> master, origin/master, origin/HEAD) The fourth batch


Today I ran:

$ git pull
From https://github.com/git/git
   cf6f63ea6..83014dc05  master     -> origin/master
   74e6fc65d..83e99ddf4  next       -> origin/next
 + bc3287e71...a842a7780 seen       -> origin/seen  (forced update)
   fefffbb31..7af8e2e03  todo       -> origin/todo
fatal: You are attempting to fetch cf6f63ea6bf35173e02e18bdc6a4ba41288acff9, which is in the commit graph file but not in the object database.
This is probably due to repo corruption.
If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object.
fatal: could not fetch 5e66731277a4d791043dc51e2804dc0b496c523b from promisor remote

$ git pull
Updating cf6f63ea6..83014dc05
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (21/21), done.
remote: Compressing objects: 100% (21/21), done.
Receiving objects: 100% (22/22), 137.56 KiB | 8.60 MiB/s, done.
remote: Total 22 (delta 0), reused 1 (delta 0), pack-reused 1 (from 1)
Fast-forward
 Documentation/RelNotes/2.51.0.adoc     |  15 +++
 Documentation/config/merge.adoc        |  14 +-
 Documentation/git-merge.adoc           |   2 +-
 Documentation/git-stash.adoc           |  29 ++++-
 Documentation/merge-options.adoc       |   3 +
 builtin/merge.c                        |  66 +++++++++-
 builtin/pull.c                         |   3 +
 builtin/stash.c                        | 460 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 contrib/coccinelle/commit.cocci        |   3 +-
 hash.h                                 |   1 +
 object-name.c                          |   6 +-
 t/t0021-conversion.sh                  |   4 +-
 t/t0610-reftable-basics.sh             |   6 +-
 t/t0612-reftable-jgit-compatibility.sh |  13 +-
 t/t0613-reftable-write-options.sh      |  24 +---
 t/t1400-update-ref.sh                  |  10 +-
 t/t3903-stash.sh                       | 101 +++++++++++++++
 t/t5004-archive-corner-cases.sh        |   5 +-
 t/t6422-merge-rename-corner-cases.sh   |  10 +-
 t/t7422-submodule-output.sh            |   9 +-
 t/t7600-merge.sh                       |  74 ++++++++++-
 t/test-lib-functions.sh                |  16 ++-
 22 files changed, 789 insertions(+), 85 deletions(-)

As can be seen, after a treeless (or blobless) clone, git pull has to be run twice to complete the operation; the first invocation always fails. By "always" I mean that I have tried this over a longer period of time with many different repositories.

git version 2.50.0

Kind regards
  Дилян

* Re: git treeless-clone + wait + pull → problem, again pull → OK
From: Jeff King @ 2025-07-22  9:17 UTC
  To: Дилян Палаузов
  Cc: Jonathan Tan, git

[+cc Jonathan Tan]

On Tue, Jul 01, 2025 at 12:24:05PM +0300, Дилян Палаузов wrote:

> The problem: when I make a treeless or blobless clone and run git pull
> some time later, git prints many, many lines saying that it is trying
> to fetch data. I interrupt it with Ctrl+C and run git pull again, and
> then it completes. I had never tried to document this precisely until
> now:
> 
> On 26 June 2025 I ran:
> 
> $ git clone --filter=tree:0 https://github.com/git/git.git
> …
> $ git show --oneline
> cf6f63ea6 (HEAD -> master, origin/master, origin/HEAD) The fourth batch
> 
> 
> Today I ran:
> 
> $ git pull
> From https://github.com/git/git
>    cf6f63ea6..83014dc05  master     -> origin/master
>    74e6fc65d..83e99ddf4  next       -> origin/next
>  + bc3287e71...a842a7780 seen       -> origin/seen  (forced update)
>    fefffbb31..7af8e2e03  todo       -> origin/todo
> fatal: You are attempting to fetch cf6f63ea6bf35173e02e18bdc6a4ba41288acff9, which is in the commit graph file but not in the object database.
> This is probably due to repo corruption.
> If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object.
> fatal: could not fetch 5e66731277a4d791043dc51e2804dc0b496c523b from promisor remote

I took a look at this a few weeks ago and came up with a reproducible
recipe:

  git init repo
  cd repo

  url=https://github.com/git/git.git
  git fetch --filter=tree:0 $url cf6f63ea6bf35173e02e18bdc6a4ba41288acff9:refs/heads/foo
  git checkout foo
  git commit-graph write --reachable
  git fetch $url

That final fetch ends up spawning a seemingly endless (though I suspect
actually finite) series of child fetches. Looking at the callstack, it's
coming from fetch_submodules(), which wants to do tree diffs in the
fetched history looking for changed submodules. But of course we don't
have those trees, so we fault them in one by one.

Which certainly seems non-ideal. But what is more interesting is that
while doing so, we eventually hit that same fatal error:

  fatal: You are attempting to fetch cf6f63ea6bf35173e02e18bdc6a4ba41288acff9, which is in the commit graph file but not in the object database.

I tried swapping out $url for a local copy of the repo (to stop
hammering poor GitHub's servers). But it doesn't seem to reproduce. That
plus the apparently-random time of failure makes me think it's a race
condition.

And there is something interesting happening in the background here:
each of those sub-fetches may kick off an asynchronous "git maintenance"
run. Which will find something useful to do because we're building up a
big pile of packs. So it eventually tries to repack.

And so it seems our race is that fetch sees the commit in the commit
graph but sometimes _not_ the actual packfile (because it got repacked).
Usually we'd try to re-scan the packfiles for exactly this case. But the
caller in deref_without_lazy_fetch() does not do so. It calls
odb_has_object() without any flags, avoiding the re-scan, like this:

          commit = lookup_commit_in_graph(the_repository, oid);
          if (commit) {
                  if (mark_tags_complete_and_check_obj_db) {
                          if (!odb_has_object(the_repository->objects, oid, 0))
                                  die_in_commit_graph_only(oid);
                  }
                  return commit;
          }

This is due to 5d4cc78f72 (fetch-pack: die if in commit graph but not
obj db, 2024-11-05). I'm not sure I fully understand all of the details
of that commit, so I don't have a solution. Maybe it should be passing
HAS_OBJECT_RECHECK_PACKED?

Normally I'd worry about performance, since fetch is often asking about
objects we don't expect to have (and that re-scan is expensive and
normally just confirms that no, we don't have the object). But in this
case we'll only hit this check if we called lookup_commit_in_graph().
Which implies we _do_ expect to have it (or it's a weird state where the
graph file is stale). So maybe it would be OK to use that flag in this
call?
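
As a rough illustration of the race and of what a re-check flag buys,
here is a toy model in C. It is only a sketch under stated assumptions:
every name in it (toy_packs, toy_odb, toy_has_object, toy_repack,
TOY_RECHECK_PACKED) is hypothetical and none of it is Git's actual
object-database code. A process keeps a cached snapshot of which pack
holds which object; a concurrent repack moves everything into a new
pack, so the snapshot points at packs that no longer exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for HAS_OBJECT_RECHECK_PACKED. */
#define TOY_RECHECK_PACKED 1u
#define TOY_MAX_ENTRIES 16

/* One entry says "object `oid` is stored in pack `pack_id`". */
struct toy_entry {
	int pack_id;
	int oid;
};

/* A list of entries; used for the disk state and for a snapshot of it. */
struct toy_packs {
	struct toy_entry entry[TOY_MAX_ENTRIES];
	size_t nr;
};

struct toy_odb {
	const struct toy_packs *disk; /* current on-disk state */
	struct toy_packs cached;      /* this process's possibly stale view */
};

static bool pack_exists(const struct toy_packs *disk, int pack_id)
{
	for (size_t i = 0; i < disk->nr; i++)
		if (disk->entry[i].pack_id == pack_id)
			return true;
	return false;
}

/* Re-read the pack list from disk (the expensive "re-scan"). */
static void toy_rescan(struct toy_odb *odb)
{
	odb->cached = *odb->disk;
}

/* Consult only the snapshot; a pack that has vanished cannot be read. */
static bool lookup_cached(const struct toy_odb *odb, int oid)
{
	for (size_t i = 0; i < odb->cached.nr; i++) {
		const struct toy_entry *e = &odb->cached.entry[i];
		if (e->oid == oid && pack_exists(odb->disk, e->pack_id))
			return true;
	}
	return false;
}

/* Without the flag, a stale snapshot yields a false "missing". */
bool toy_has_object(struct toy_odb *odb, int oid, unsigned flags)
{
	assert(odb && odb->disk);
	if (lookup_cached(odb, oid))
		return true;
	if (!(flags & TOY_RECHECK_PACKED))
		return false;
	toy_rescan(odb);
	return lookup_cached(odb, oid);
}

/* Simulate a concurrent repack: every object moves into one new pack. */
void toy_repack(struct toy_packs *disk, int new_pack_id)
{
	for (size_t i = 0; i < disk->nr; i++)
		disk->entry[i].pack_id = new_pack_id;
}
```

In this model, if the snapshot was taken before toy_repack() ran, then
toy_has_object(&odb, oid, 0) reports the object as missing even though
it is on disk, while passing TOY_RECHECK_PACKED pays for one re-scan
and finds it. The re-scan is wasted only when the object truly is
absent, which is the case the commit-graph lookup makes unlikely here.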

I don't think this race is strictly limited to using filters. It's just
that it's a convenient way to start a big string of fetches that will
race with repacks.

Whether or not fetch should avoid kicking off that big string of
fetches, I don't know. Passing --no-recurse-submodules obviously dulls
the pain. Perhaps the default behavior ought to be different in a
tree-less repo. Or maybe those tree diffs should be done with
lazy-fetching turned off (there is no point in recursing for a version
of a submodule whose parent tree we don't even have!). But I think
that's all orthogonal to the race.

Hmm. So I actually was just intending to write up my notes from a few
weeks ago. But I think I may have talked myself into the idea that this
patch is the right fix:

diff --git a/fetch-pack.c b/fetch-pack.c
index 5e74235fc0..7288e2d251 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -142,7 +142,8 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 	commit = lookup_commit_in_graph(the_repository, oid);
 	if (commit) {
 		if (mark_tags_complete_and_check_obj_db) {
-			if (!odb_has_object(the_repository->objects, oid, 0))
+			if (!odb_has_object(the_repository->objects, oid,
+					    HAS_OBJECT_RECHECK_PACKED))
 				die_in_commit_graph_only(oid);
 		}
 		return commit;

I'd like to hear what Jonathan Tan thinks, though.

-Peff

* Re: git treeless-clone + wait + pull → problem, again pull → OK
From: Jeff King @ 2025-07-22  9:24 UTC
  To: Дилян Палаузов
  Cc: Jonathan Tan, git

On Tue, Jul 22, 2025 at 05:17:49AM -0400, Jeff King wrote:

> Whether or not fetch should avoid kicking off that big string of
> fetches, I don't know. Passing --no-recurse-submodules obviously dulls
> the pain. Perhaps the default behavior ought to be different in a
> tree-less repo. Or maybe those tree diffs should be done with
> lazy-fetching turned off (there is no point in recursing for a version
> of a submodule whose parent tree we don't even have!). But I think
> that's all orthogonal to the race.

In an ideal world, I'd imagine that something like this would make
sense:

diff --git a/submodule.c b/submodule.c
index f8373a9ea7..e064fefd9a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1851,7 +1851,12 @@ int fetch_submodules(struct repository *r,
 	strvec_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
-	calculate_changed_submodule_paths(r, &spf.changed_submodule_names);
+	{
+		int save = fetch_if_missing;
+		fetch_if_missing = 0;
+		calculate_changed_submodule_paths(r, &spf.changed_submodule_names);
+		fetch_if_missing = save;
+	}
 	string_list_sort(&spf.changed_submodule_names);
 	run_processes_parallel(&opts);
 

But it doesn't work, because all of the diff code under the hood in the
calculate_changed_submodule_paths() call is not prepared for trees to be
missing. So you just get:

  fatal: unable to read tree (3a112b53a40e2d1240b2a4d01d5e616e0f4f09fd)

or similar. We'd need to teach the diff code some permissive mode where
it quietly ignores trees we don't have locally.
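
To make the idea of a "permissive mode" concrete, here is a toy sketch
in C. It is an assumption-laden model, not Git's diff machinery: the
names toy_tree, toy_missing_mode and toy_count_gitlinks are
hypothetical, and the real code diffs pairs of trees rather than
walking a single one. It only shows the difference between treating a
locally missing tree as fatal and quietly skipping it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy tree: present locally, or missing (promised but never fetched). */
struct toy_tree {
	bool present;
	int nr_gitlinks;                 /* submodule entries directly in this tree */
	const struct toy_tree *child[4]; /* subtrees */
	size_t nr_children;
};

enum toy_missing_mode {
	TOY_MISSING_IS_FATAL,   /* like today's "unable to read tree" */
	TOY_MISSING_IS_IGNORED  /* the hypothetical permissive mode */
};

/*
 * Count gitlink entries reachable from `t`. Returns -1 if a missing
 * tree is encountered in strict mode; in permissive mode a missing
 * tree simply contributes nothing.
 */
int toy_count_gitlinks(const struct toy_tree *t, enum toy_missing_mode mode)
{
	assert(t);
	if (!t->present)
		return mode == TOY_MISSING_IS_IGNORED ? 0 : -1;

	int total = t->nr_gitlinks;
	for (size_t i = 0; i < t->nr_children; i++) {
		int n = toy_count_gitlinks(t->child[i], mode);
		if (n < 0)
			return -1; /* propagate the strict-mode failure */
		total += n;
	}
	return total;
}
```

With a root that has one present subtree and one missing subtree, the
strict mode fails as soon as it reaches the missing one, whereas the
permissive mode reports only the submodule entries it can see locally
and never triggers a fetch.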

-Peff
