From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, peff@peff.net, dstolee@microsoft.com
Subject: Re: [PATCH 4/4] midx: report checksum mismatches during 'verify'
Date: Thu, 11 Nov 2021 00:11:32 +0100 [thread overview]
Message-ID: <20211110231132.GB5811@szeder.dev> (raw)
In-Reply-To: <94e9de44e3b52513c5ab48aecd74f809dc34cbe3.1624473543.git.me@ttaylorr.com>
On Wed, Jun 23, 2021 at 02:39:15PM -0400, Taylor Blau wrote:
> 'git multi-pack-index verify' inspects the data in an existing MIDX for
> correctness by checking that the recorded object offsets are correct,
> and so on.
>
> But it does not check that the file's trailing checksum matches the data
> that it records. So, if an on-disk corruption happened to occur in the
> final few bytes (and all other data was recorded correctly), we would:
>
> - get a clean result from 'git multi-pack-index verify', but
> - be unable to reuse the existing MIDX when writing a new one (since
> we now check for checksum mismatches before reusing a MIDX)
>
> Teach the 'verify' sub-command to recognize corruption in the checksum
> by calling midx_checksum_valid().
>
> Suggested-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> midx.c | 3 +++
> t/t5319-multi-pack-index.sh | 5 +++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/midx.c b/midx.c
> index a12cbbf928..9a35b0255d 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1228,6 +1228,9 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
> return result;
> }
>
> + if (!midx_checksum_valid(m))
> + midx_report(_("incorrect checksum"));
> +
> if (flags & MIDX_PROGRESS)
> progress = start_delayed_progress(_("Looking for referenced packfiles"),
> m->num_packs);
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index d582f370c4..7609f1ea64 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -418,6 +418,11 @@ test_expect_success 'corrupt MIDX is not reused' '
> git multi-pack-index verify
> '
>
> +test_expect_success 'verify incorrect checksum' '
> + pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 1)) &&
> + corrupt_midx_and_verify $pos "\377" $objdir "incorrect checksum"
This test is flaky and can fail with:
...
+ printf \377
+ dd of=.git/objects/pack/multi-pack-index bs=1 seek=3839 conv=notrunc
1+0 records in
1+0 records out
1 byte copied, 5.0656e-05 s, 19.7 kB/s
+ test_must_fail git multi-pack-index verify --object-dir=.git/objects
test_must_fail: command succeeded: git multi-pack-index verify --object-dir=.git/objects
error: last command exited with $?=1
+ mv midx-backup .git/objects/pack/multi-pack-index
not ok 44 - verify incorrect checksum
So the test corrupts the checksum trailer in the 'multi-pack-index'
file by overwriting its last byte with 0xff, but if that byte were
already 0xff, then the file would be left as is, and 'git
multi-pack-index verify' wouldn't find anything amiss.
Since SHA1 is essentially random, there's a 1:256 chance of that
happening, assuming that the file's content is random. That't not
really the case, however, because both the test repository's objects
and the resulting packfiles are deterministic, and, consequently, the
content of the MIDX is somewhat deterministic. Only "somewhat",
though, because several of those objects appear in multiple packfiles,
and the MIDX selects a copy in the most recently modified packfile, so
filesystem mtime resolution and second boundaries become significant,
and cause some variance in the contents of the OOFF chunk.
Recently a laptop crossed my way that is somehow exceptionally good at
generating MIDX files ending with 0xff:
# ad-hoc checksum statistics:
$ git diff
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index f1ee2ce56d..605713b518 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -482,9 +482,13 @@ test_expect_success 'corrupt MIDX is not reused' '
'
test_expect_success 'verify incorrect checksum' '
+ skip=$(($(wc -c <$objdir/pack/multi-pack-index) - 20)) &&
+ printf "checksum: " >&5 &&
+ od -x -w20 -j$skip --endian=big -An "$objdir/pack/multi-pack-index" >&5 &&
pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 1)) &&
corrupt_midx_and_verify $pos "\377" $objdir "incorrect checksum"
'
+test_done
test_expect_success 'repack progress off for redirected stderr' '
GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir repack 2>err &&
$ for i in {1..500} ; do ./t5319-multi-pack-index.sh |sed -n -e 's/^checksum: //p' ; done |sort |uniq -c
31 1a70 3b1c 8ed3 56a6 5101 2a38 057e 698d 6faf fbaa
340 5fc0 552f 0ac0 c876 f229 d9e3 ef13 a314 5847 89ff
129 ce7d 3710 fd21 ef7b 8316 2b99 4e6c e5d5 5e7c 7b08
That's almost 70% failure rate, though I haven't been able to trigger
this failure on any of the other machines that I have access to.
next prev parent reply other threads:[~2021-11-10 23:11 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-23 18:39 [PATCH 0/4] midx: verify MIDX checksum before reusing Taylor Blau
2021-06-23 18:39 ` [PATCH 1/4] csum-file: introduce checksum_valid() Taylor Blau
2021-06-24 19:42 ` Jeff King
2021-06-23 18:39 ` [PATCH 2/4] commit-graph: rewrite to use checksum_valid() Taylor Blau
2021-06-24 19:42 ` Jeff King
2021-06-23 18:39 ` [PATCH 3/4] midx: don't reuse corrupt MIDXs when writing Taylor Blau
2021-06-24 20:00 ` Jeff King
2021-06-23 18:39 ` [PATCH 4/4] midx: report checksum mismatches during 'verify' Taylor Blau
2021-06-24 4:22 ` Bagas Sanjaya
2021-06-24 20:10 ` Jeff King
2021-11-10 23:11 ` SZEDER Gábor [this message]
2021-11-11 10:05 ` Jeff King
2021-11-16 21:10 ` Taylor Blau
2021-11-16 21:38 ` [PATCH] t5319: corrupt more bytes of the midx checksum Jeff King
2021-11-16 21:43 ` Taylor Blau
2021-11-16 22:12 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211110231132.GB5811@szeder.dev \
--to=szeder.dev@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.