All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: commit-graph overflow generation chicken and egg
Date: Wed, 8 Jun 2022 16:08:03 -0400	[thread overview]
Message-ID: <581c7ef2-3de4-eb8a-bfbb-d4bca3522a2d@github.com> (raw)
In-Reply-To: <YqD5dgalb9EPnz85@coredump.intra.peff.net>

On 6/8/2022 3:33 PM, Jeff King wrote:
> I hadn't touched my git.git repository for a while, so I upgraded to the
> most recent version of Git (v2.36.1) and was met with this:
> 
>   $ git rev-list --all
>   fatal: commit-graph requires overflow generation data but has none
> 
> Not very friendly, but OK, maybe my commit graph is out of date. Let's
> regenerate it:
> 
>   $ git gc
>   fatal: commit-graph requires overflow generation data but has none
>   fatal: failed to run repack
> 
> OK, we can't get far enough in the gc to rebuild the commit graph. Let's
> try doing it manually:
> 
>   $ git commit-graph write --reachable
>   fatal: commit-graph requires overflow generation data but has none
>   $ git commit-graph write
>   fatal: commit-graph requires overflow generation data but has none
> 
> Yikes. Here's where it happens within the write process:
> 
>   $ GIT_PROGRESS_DELAY=0 git commit-graph write
>   Finding commits for commit graph among packed objects: 100% (360229/360229), done.
>   Loading known commits in commit graph: 100% (78366/78366), done.
>   Expanding reachable commits in commit graph: 78366, done.
>   Clearing commit marks in commit graph: 100% (78366/78366), done.
>   Finding extra edges in commit graph: 100% (78366/78366), done.
>   Computing commit graph topological levels: 100% (78366/78366), done.
>   Computing commit graph generation numbers: 100% (78366/78366), done.
>   fatal: commit-graph requires overflow generation data but has none
> 
> Now being the enterprising fellow that I am, I was able to get out of it
> like this:
> 
>   $ rm -f objects/info/commit-graph
>   $ git gc
> 
> But I wonder if this is a foot-gun waiting for some other user. I'm not
> sure how I got into the broken state exactly. The repo was last touched
> in December using a version of Git running 'next'. It worked fine with
> versions of Git prior to 6dbf4b8172 (commit-graph: declare bankruptcy on
> GDAT chunks, 2022-03-02). It's entirely possible that the bad state was
> generated by a version of Git that wasn't ever released, and this isn't
> a problem that normal humans would ever run into. It does feel a bit
> unfriendly that neither gc nor commit-graph could unstick things,
> though. Especially because 6dbf4b8172 says:
> 
>   [...]a previous version of Git wrote possibly erroneous data in these
>   chunks with the IDs "GDAT" and "GDOV". By changing the IDs, newer
>   versions of Git will silently ignore those older chunks[...]

You've done enough homework to discover exactly what's going on here,
including talking about the commit that I was going to point out.

> Presumably we _are_ ignoring those chunks, but some other part of the
> commit-graph file has a dependency on them (and of course we don't have
> the new GDA2/GDO2 chunks to read in their place). If that's true, then
> the solution may be a more graceful "we can't use this commit graph"
> error return rather than the "fatal:" message seen above.
> 
> I have a copy of the broken repo state if anybody would care to look at
> it.

I'd love to see the full binary, but for the sake of sharing on the
list, could you give the following output?

	xxd .git/objects/info/commit-graph | head

or any other command that shows the first few hex bytes along with
their ASCII equivalents. Here is one that used Git 2.34.0:

00000000: 4347 5048 0101 0500 4f49 4446 0000 0000  CGPH....OIDF....
00000010: 0000 0050 4f49 444c 0000 0000 0000 0450  ...POIDL.......P
00000020: 4344 4154 0000 0000 002d cf0c 4744 4154  CDAT.....-..GDAT
00000030: 0000 0000 0080 3bf8 4544 4745 0000 0000  ......;.EDGE....
00000040: 0089 6484 0000 0000 0000 0000 0089 6660  ..d...........f`
00000050: 0000 0246 0000 0498 0000 06c8 0000 0908  ...F............
00000060: 0000 0b4b 0000 0d58 0000 0f9e 0000 1207  ...K...X........
00000070: 0000 1423 0000 168e 0000 18d3 0000 1b36  ...#...........6
00000080: 0000 1d67 0000 1f9b 0000 2219 0000 2456  ...g......"...$V
00000090: 0000 26a2 0000 2908 0000 2b5c 0000 2d96  ..&...)...+\..-.

and here is one with latest master:

00000000: 4347 5048 0101 0500 4f49 4446 0000 0000  CGPH....OIDF....
00000010: 0000 0050 4f49 444c 0000 0000 0000 0450  ...POIDL.......P
00000020: 4344 4154 0000 0000 002e 20b0 4744 4132  CDAT...... .GDA2
00000030: 0000 0000 0081 2090 4544 4745 0000 0000  ...... .EDGE....
00000040: 008a 5970 0000 0000 0000 0000 008a 5b4c  ..Yp..........[L
00000050: 0000 024c 0000 04a1 0000 06d6 0000 091a  ...L............
00000060: 0000 0b60 0000 0d70 0000 0fbb 0000 1229  ...`...p.......)
00000070: 0000 1448 0000 16b6 0000 1905 0000 1b6c  ...H...........l
00000080: 0000 1da3 0000 1fd9 0000 225b 0000 249d  .........."[..$.
00000090: 0000 26f0 0000 2956 0000 2bac 0000 2dea  ..&...)V..+...-.

If you have a GDA2 chunk, then we are somehow incorrectly writing the
offset there. However, I don't see this happening in the latest code.

We have this macro in commit.h:

#define GENERATION_NUMBER_V2_OFFSET_MAX ((1ULL << 31) - 1)

and this in commit-graph.c:

#define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW (1ULL << 31)

being used here:

static int write_graph_chunk_generation_data(struct hashfile *f,
					     void *data)
{
	struct write_commit_graph_context *ctx = data;
	int i, num_generation_data_overflows = 0;

	for (i = 0; i < ctx->commits.nr; i++) {
		struct commit *c = ctx->commits.list[i];
		timestamp_t offset;
		repo_parse_commit(ctx->r, c);
		offset = commit_graph_data_at(c)->generation - c->date;
		display_progress(ctx->progress, ++ctx->progress_cnt);

		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) {
			offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows;
			num_generation_data_overflows++;
		}

		hashwrite_be32(f, offset);
	}

	return 0;
}

But this hasn't been changed since eb9071912f (commit-graph: anonymize
data in chunk_write_fn, 2021-02-05) and 90cb1c47c7 (commit-graph: always
parse before commit_graph_data_at(), 2021-02-01) which were not meaningfully
different from the first implementation in e8b63005c4 (commit-graph:
implement generation data chunk, 2021-01-16).

However, the lack of the large offset chunk could be due to the bug fixed by
75979d9460 (commit-graph: fix ordering bug in generation numbers,
2022-03-01). Perhaps that was the thing that was missing from your version?

But otherwise, I'm stumped. I'd be very interested to see a repro from a
fresh repository. That is: what situation do we need to be in to write such
an offset without including the large offset chunk?

Thanks,
-Stolee

  reply	other threads:[~2022-06-08 20:08 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-08 19:33 commit-graph overflow generation chicken and egg Jeff King
2022-06-08 20:08 ` Derrick Stolee [this message]
2022-06-08 23:17   ` Jeff King
2022-07-01 12:06     ` Patrick Steinhardt
2022-07-04 10:46       ` Patrick Steinhardt
2022-07-04 20:50         ` Derrick Stolee
2022-07-05 21:03           ` Will Chandler
2022-07-05 22:28             ` Taylor Blau
2022-07-06  8:52               ` Jeff King
2022-07-06  9:11           ` Jeff King
2022-06-09  7:49   ` Ævar Arnfjörð Bjarmason
2022-06-09 15:26     ` Jeff King
2022-06-09 15:39       ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=581c7ef2-3de4-eb8a-bfbb-d4bca3522a2d@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.