From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: Ilya K <me@0upti.me>,
git@vger.kernel.org,
"brian m. carlson" <sandals@crustytoothpaste.net>
Subject: Re: git 2.46.0 crashes when trying to verify-pack outside of a repo
Date: Mon, 2 Sep 2024 01:45:43 +0200
Message-ID: <ZtT8p06fdTwXO7iX@tanuki>
In-Reply-To: <xmqq7cbvpf8v.fsf@gitster.g>
On Sun, Sep 01, 2024 at 08:26:08AM -0700, Junio C Hamano wrote:
> Ilya K <me@0upti.me> writes:
>
> > We've updated to Git 2.46.0 in NixOS, and encountered an issue
> > with Dulwich (a Python Git implementation) tests failing[0]
> > because it attempts to call `git verify-pack` on a bare pack, with
> > no surrounding repo. This used to work in Git 2.45.x, but in 2.46
> > it simply prints "error: index-pack died of signal 11".
>
> Thanks. This is a fallout from code-wide clean-up in 2.46.0 where
> we do not assume that everybody runs SHA-1.
Yup, indeed. The problem lies deeper than what the patch below fixes,
though. The issue isn't in git-verify-pack(1) but in git-index-pack(1),
and can be fixed like this:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index fd968d673d2..e6edd96d099 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1733,7 +1733,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
unsigned char pack_hash[GIT_MAX_RAWSZ];
unsigned foreign_nr = 1; /* zero is a "good" value, assume bad */
int report_end_of_input = 0;
- int hash_algo = 0;
+ int hash_algo = GIT_HASH_UNKNOWN;
/*
* index-pack never needs to fetch missing objects except when
@@ -1857,6 +1857,9 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
pack_name = arg;
}
+ if (!the_repository->hash_algo && hash_algo == GIT_HASH_UNKNOWN)
+ repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
if (!pack_name && !from_stdin)
usage(index_pack_usage);
if (fix_thin_pack && !from_stdin)
Unfortunately, this once again uncovers a deeper issue: neither the
packfile nor its index encodes the object format it uses. So while
falling back to SHA1 papers over the issue, it means that we misparse
SHA256 indices. Also, we misparse SHA1 indices if we happen to be in a
SHA256 repository. For example, when parsing a SHA256 packfile in a SHA1
repository:
$ git index-pack --verify '/tmp/git-tests/trash directory.t5300-pack-object/repo/.git/objects/pack/pack-aa45f7f08f043c9f0388f1844a2a797587254e249919b35ac9dc2b52c1aada29.pack'
error: wrong index v2 file size in /tmp/git-tests/trash directory.t5300-pack-object/repo/.git/objects/pack/pack-aa45f7f08f043c9f0388f1844a2a797587254e249919b35ac9dc2b52c1aada29.idx
fatal: Cannot open existing pack idx file for '/tmp/git-tests/trash directory.t5300-pack-object/repo/.git/objects/pack/pack-aa45f7f08f043c9f0388f1844a2a797587254e249919b35ac9dc2b52c1aada29.idx'
The error message does not even hint at the actual issue, namely that
the index uses a different object format than the one we assumed.
One potential solution would be to derive the object format from the
length of the hash embedded in the packfile index's name. But that is
quite roundabout and rather ugly, and packfiles need not have that hash
in their name in the first place. It would also become ambiguous if we
were ever to adopt another hash with the same length as either SHA1 or
SHA256.
So we basically have three different options:
- Accept that we just don't handle this case correctly and let the
code error out. This pessimizes all hashes but SHA1.
- Bail out when running outside of a repository and `--object-format=`
wasn't given. This pessimizes all hashes, but gives the user a clear
indication of why things don't work.
- Introduce packfiles v3 and encode the object format into the header.
Then do either the first or the second option on top.
The last option is of course the cleanest, but also the most involved.
Patrick
> ------- >8 -------
> Subject: verify-pack: fall back to SHA-1 outside a repo
>
> In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
> 2024-05-07), we have stopped setting the default hash algorithm for
> `the_repository`. Consequently, code that relies on `the_hash_algo` will
> now crash when it hasn't explicitly been initialized, which may be the
> case when running outside of a Git repository.
>
> As the verify-pack command ought to be able to infer what algorithm
> is used in the input file (and if the input file does not have such
> information, that by itself is a problem), and the command offers
> an option to explicitly tell what algorithm to use in case it cannot
> be guessed from the input file, in theory we shouldn't have to use
> the default algorithm anywhere in the operation of the command, but
> we fail fairly early in the process when run outside a repository
> without any default algorithm set.
>
> Resurrect the setting of the default algorithm just like we used to
> do before 2.46.0.
>
> Reported-by: Ilya K <me@0upti.me>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> builtin/verify-pack.c | 4 ++++
> t/t5300-pack-object.sh | 4 ++++
> 2 files changed, 8 insertions(+)
>
> diff --git c/builtin/verify-pack.c w/builtin/verify-pack.c
> index 011dddd2dc..5b663905ae 100644
> --- c/builtin/verify-pack.c
> +++ w/builtin/verify-pack.c
> @@ -1,6 +1,7 @@
> #include "builtin.h"
> #include "config.h"
> #include "gettext.h"
> +#include "hash.h"
> #include "run-command.h"
> #include "parse-options.h"
> #include "strbuf.h"
> @@ -77,6 +78,9 @@ int cmd_verify_pack(int argc, const char **argv, const char *prefix)
> OPT_END()
> };
>
> + if (!the_hash_algo)
> + repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +
> git_config(git_default_config, NULL);
> argc = parse_options(argc, argv, prefix, verify_pack_options,
> verify_pack_usage, 0);
> diff --git c/t/t5300-pack-object.sh w/t/t5300-pack-object.sh
> index 4ad023c846..d6f45d8923 100755
> --- c/t/t5300-pack-object.sh
> +++ w/t/t5300-pack-object.sh
> @@ -322,6 +322,10 @@ test_expect_success 'verify-pack catches a corrupted sum of the index file itsel
> fi
> '
>
> +test_expect_success 'verify-pack outside a repository' '
> + nongit git verify-pack -v "$(pwd)/test-1-${packname_1}.idx"
> +'
> +
> test_expect_success 'build pack index for an existing pack' '
> cat test-1-${packname_1}.pack >test-3.pack &&
> git index-pack -o tmp.idx test-3.pack &&