From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 1/2] t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure
Date: Thu, 14 Nov 2024 08:42:09 -0500
Message-ID: <d791b7b20c94d637e52bb645ff8f06ea25e4bd77.1731591708.git.me@ttaylorr.com>
In-Reply-To: <cover.1731591708.git.me@ttaylorr.com>

In the multi-pack reuse code, there are two paths for reusing the
on-disk representation of an object, handled by:

  - builtin/pack-objects.c::write_reused_pack_one()
  - builtin/pack-objects.c::write_reused_pack_verbatim()

The former is responsible for copying the bytes for a single object out
of an existing source pack. The latter does the same but for a region of
objects aligned at eword_t boundaries.

Demonstrate a bug whereby write_reused_pack_verbatim() can be tricked
into writing out objects from some source pack, even when those objects
were selected from a different source pack in the MIDX bitmap.

When the caller wants at least one of the objects in that region,
pack-objects will write the same object twice as a result of this bug.
In the other case where the caller doesn't want any of the objects in
the region of interest, we will write out objects that weren't
requested.

Demonstrate this bug by creating two packs, where the preferred one of
those packs contains a single object which also appears in the main
(non-preferred) pack. A separate bug[^1] prevents us from triggering the
main bug when the duplicated object is the last one in the main pack,
but any earlier object will suffice.

We could fix that separate bug, but the following commit will simplify
write_reused_pack_verbatim() and only call it on the preferred pack, so
doing so would have little point.
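
For readers who want to see how the test picks "any earlier object", here is a minimal sketch of the selection pipeline run against made-up input. It assumes the `<offset> <object-id> (<crc32>)` line shape that git show-index prints for a version 2 index; the helper name first_object_by_offset and the abbreviated object IDs are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: given `git show-index`-style lines on stdin,
# print the ID of the object stored at the lowest pack offset.
first_object_by_offset () {
	# Sort numerically on the offset column, keep the first line,
	# then keep only the object ID column.
	sort -nk1 | head -n1 | cut -d" " -f2
}

# Synthetic sample; the offsets, IDs and CRCs below are placeholders,
# not output from a real pack.
first_object_by_offset <<\EOF
1337 bbbb2222 (00000002)
12 aaaa1111 (00000001)
204 cccc3333 (00000003)
EOF
```

With this input the helper prints aaaa1111, the entry at offset 12. Feeding that single ID to git pack-objects is what gives the test a one-object pack whose object also lives early in the main pack.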

[^1]: Because write_reused_pack_verbatim() only reuses bits in the range

        off_t pack_start_off = pack_pos_to_offset(reuse_packfile->p, 0);
        off_t pack_end_off = pack_pos_to_offset(reuse_packfile->p,
                                                pos - reuse_packfile->bitmap_pos);

        written += pos - reuse_packfile->bitmap_pos;

        /* We're recording one chunk, not one object. */
        record_reused_object(pack_start_off,
                             pack_start_off - (hashfile_total(out) - pack_start));

    , or in other words excluding the object beginning at position 'pos -
    reuse_packfile->bitmap_pos' in the source pack. But since
    reuse_packfile->bitmap_pos is '1' in the non-preferred pack
    (accounting for the single-object pack which is preferred), we don't
    actually copy the bytes from the last object.
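
The off-by-one described in the footnote can be worked through with plain shell arithmetic. This is only a rough sketch of one reading of the footnote, not the code in pack-objects. It assumes that eword_t is 64 bits wide, that the main pack holds 192 objects (taken from the count the test expects), that every bit in the main pack's range is set, and that the duplicated object is counted only under the preferred pack. The variable names are illustrative.

```shell
#!/bin/sh
# Illustrative values only; names echo the footnote, not Git's API.
bits_in_eword=64       # assumption: eword_t is a 64-bit word
bitmap_pos=1           # from the footnote: the preferred pack's object is bit 0
main_pack_objects=192  # assumption: the object count the test expects

# Assumption: the duplicated object is attributed to the preferred pack,
# so the main pack contributes one fewer bit to the MIDX bitmap.
bitmap_nr=$((main_pack_objects - 1))

# Round the end of the main pack's bit range down to a word boundary
# (all bits are assumed set, so every whole word qualifies).
end=$((bitmap_pos + bitmap_nr))
pos=$((end - end % bits_in_eword))

# Pack positions [0, pos - bitmap_pos) are the ones copied verbatim.
copied=$((pos - bitmap_pos))
last_position=$((main_pack_objects - 1))

echo "copied pack positions: 0..$((copied - 1))"
echo "last pack position:    $last_position"
```

Under these assumptions the verbatim copy covers pack positions 0 through 190 and stops short of position 191. That range includes the duplicated first object, which is the main bug, and excludes the last object, which is why a duplicate in the last position would not trigger it.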

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t5332-multi-pack-reuse.sh | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index 955ea42769b..d87ea0ae19b 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -259,4 +259,26 @@ test_expect_success 'duplicate objects' '
 	)
 '
 
+test_expect_failure 'duplicate objects with verbatim reuse' '
+	git init duplicate-objects-verbatim &&
+	(
+		cd duplicate-objects-verbatim &&
+
+		git config pack.allowPackReuse multi &&
+
+		test_commit_bulk 64 &&
+
+		# take the first object from the main pack...
+		git show-index <$(ls $packdir/pack-*.idx) >obj.raw &&
+		sort -nk1 <obj.raw | head -n1 | cut -d" " -f2 >in &&
+
+		# ...and create a separate pack containing just that object
+		p="$(git pack-objects $packdir/pack <in)" &&
+
+		git multi-pack-index write --bitmap --preferred-pack=pack-$p.idx &&
+
+		test_pack_objects_reused_all 192 2
+	)
+'
+
 test_done
--
2.46.0.421.g159f2d50e75