* [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
@ 2024-10-09 20:30 Taylor Blau
2024-10-09 20:31 ` [PATCH 01/11] pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()` Taylor Blau
` (12 more replies)
0 siblings, 13 replies; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:30 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
This patch series enables more objects to be candidates for verbatim
reuse when generating a pack with the aide of reachability bitmaps.
By the end of this series, two new classes of objects are now reuse
candidates which were not before. They are:
- Cross-pack deltas. In multi-pack bitmaps, if the delta and base
were selected from different packs, the delta was not reusable.
- Thin deltas. In both single- and multi-pack bitmaps, we did not
consider reusing deltas whose base object appears in the 'haves'
bitmap.
This series teaches the pack-objects machinery to convert cross-pack
and thin deltas to reference deltas so that they may be reused.
The series started off as a couple of patches, each addressing one of
the two classes from the above. But as I started refining these
patches, the series grew longer, as there were a handful of
opportunities to clean up some of the existing pack-reuse bitmap code.
The series is organized as follows:
- The first seven patches are various cleanups that make the
pack-reuse code more readable and prepare us to enable delta reuse
in more situation as above.
pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()`
pack-bitmap.c: avoid unnecessary `offset_to_pack_pos()`
pack-bitmap.c: delay calling 'offset_to_pack_pos()'
pack-bitmap.c: compare `base_offset` to `delta_obj_offset`
pack-bitmap.c: extract `find_base_bitmap_pos()`
pack-bitmap: drop `from_midx` field from `bitmapped_pack`
write_reused_pack_one(): translate bit positions directly
- The next patch is a test fixup:
t5332: enable OFS_DELTAs via test_pack_objects_reused
- The next patch after that enables us to reuse deltas whose base
appears in another pack:
pack-bitmap: enable cross-pack delta reuse
- The final two patches enable the "thin deltas" optimization
above:
pack-bitmap.c: record whether the result was filtered
pack-bitmap: enable reusing deltas with base objects in 'haves' bitmap
I think that the first seven patches are worthwhile on their own. I
considered splitting those out into their own series, but decided
against it to so that we could realize their benefit more readily.
The first optimization (cross-pack deltas) should help clones and
fetches, but the second optimization (thin deltas) will only help
fetches, since the 'haves' bitmap will necessarily be empty for
clones.
Of course, REF_DELTAs have a less compact representation than
OFS_DELTAs, so the resulting packs will trade off some CPU time for a
slightly larger pack. But we're only talking about a handful of extra
bytes, and network bandwidth is pretty cheap, so I think the
optimization is worthwhile here too.
Thanks in advance for your review!
Taylor Blau (11):
pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()`
pack-bitmap.c: avoid unnecessary `offset_to_pack_pos()`
pack-bitmap.c: delay calling 'offset_to_pack_pos()'
pack-bitmap.c: compare `base_offset` to `delta_obj_offset`
pack-bitmap.c: extract `find_base_bitmap_pos()`
pack-bitmap: drop `from_midx` field from `bitmapped_pack`
write_reused_pack_one(): translate bit positions directly
t5332: enable OFS_DELTAs via test_pack_objects_reused
pack-bitmap: enable cross-pack delta reuse
pack-bitmap.c: record whether the result was filtered
pack-bitmap: enable reusing deltas with base objects in 'haves' bitmap
builtin/pack-objects.c | 110 ++++++++++++--------
midx.c | 1 -
pack-bitmap.c | 198 ++++++++++++++++++++++++------------
pack-bitmap.h | 6 +-
t/t5332-multi-pack-reuse.sh | 72 ++++++++++++-
5 files changed, 275 insertions(+), 112 deletions(-)
base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2
--
2.47.0.11.g487258bca34
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 01/11] pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()`
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
@ 2024-10-09 20:31 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 02/11] pack-bitmap.c: avoid unnecessary `offset_to_pack_pos()` Taylor Blau
` (11 subsequent siblings)
12 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
The function `try_partial_reuse()` uses its `pack_pos` parameter to
determine if:
- the object specified by that position is in bounds of the pack
it's operating on, and
- in the case that the specified object is an OFS_DELTA (and we're
operating on a single-pack reachability bitmap) to ensure that the
position of the base object occurs earlier than the delta
But neither of these checks are necessary. The bounds-check is
redundant because we are operating on bit positions which we know
correspond to objects in some pack, and the caller in
reuse_partial_packfile_from_bitmap_1() is smart enough to know to stop
processing bits when it is at the end of some pack.
Likewise, the delta dependency check is also unnecessary, because we
can use the object offsets within a single pack as a proxy for bit
positions in that pack's bitmap.
So let's avoid passing in this redundant piece of information, and
save one call to offset_to_pack_pos(), which is O(log N) in the number
of objects in the pack we're currently processing.
(This all comes from a discussion on a semi-related series from [1]).
[1]: https://lore.kernel.org/git/Zti0xrzCtpWScPjz@nand.local/
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9d9b8c4bfbc..706ff26a7b1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2054,7 +2054,6 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
static int try_partial_reuse(struct bitmap_index *bitmap_git,
struct bitmapped_pack *pack,
size_t bitmap_pos,
- uint32_t pack_pos,
off_t offset,
struct bitmap *reuse,
struct pack_window **w_curs)
@@ -2063,9 +2062,6 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
enum object_type type;
unsigned long size;
- if (pack_pos >= pack->p->num_objects)
- return -1; /* not actually in the pack */
-
delta_obj_offset = offset;
type = unpack_object_header(pack->p, w_curs, &offset, &size);
if (type < 0)
@@ -2121,8 +2117,13 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
* OFS_DELTA is the default). But let's double check to
* make sure the pack wasn't written with odd
* parameters.
+ *
+ * Since we're working on a single-pack bitmap, we can
+ * use the object offset as a proxy for the bit
+ * position, since the bits are ordered by their
+ * positions within the pack.
*/
- if (base_pos >= pack_pos)
+ if (base_offset >= offset)
return 0;
base_bitmap_pos = pack->bitmap_pos + base_pos;
}
@@ -2184,7 +2185,6 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
for (offset = 0; offset < BITS_IN_EWORD; offset++) {
size_t bit_pos;
- uint32_t pack_pos;
off_t ofs;
if (word >> offset == 0)
@@ -2203,12 +2203,9 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
midx_pos = pack_pos_to_midx(bitmap_git->midx, bit_pos);
ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
-
- if (offset_to_pack_pos(pack->p, ofs, &pack_pos) < 0)
- BUG("could not find object in pack %s "
- "at offset %"PRIuMAX" in MIDX",
- pack_basename(pack->p), (uintmax_t)ofs);
} else {
+ uint32_t pack_pos;
+
pack_pos = cast_size_t_to_uint32_t(st_sub(bit_pos, pack->bitmap_pos));
if (pack_pos >= pack->p->num_objects)
BUG("advanced beyond the end of pack %s (%"PRIuMAX" > %"PRIu32")",
@@ -2219,7 +2216,7 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
}
if (try_partial_reuse(bitmap_git, pack, bit_pos,
- pack_pos, ofs, reuse, &w_curs) < 0) {
+ ofs, reuse, &w_curs) < 0) {
/*
* try_partial_reuse indicated we couldn't reuse
* any bits, so there is no point in trying more
--
2.47.0.11.g487258bca34
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 02/11] pack-bitmap.c: avoid unnecessary `offset_to_pack_pos()`
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
2024-10-09 20:31 ` [PATCH 01/11] pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()` Taylor Blau
@ 2024-10-09 20:31 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 03/11] pack-bitmap.c: delay calling 'offset_to_pack_pos()' Taylor Blau
` (10 subsequent siblings)
12 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
In `try_partial_reuse()`, for any object which is either an OFS_DELTA
or REF_DELTA, we need to determine the bitmap position for its base.
We handle the case of single- and multi-pack bitmaps separately:
- For multi-pack bitmaps, we look to see if the MIDX selected the
base we're looking at to represent that object. If the base object
was selected from a different pack in the MIDX, we won't reuse the
delta.
- For single-pack bitmaps, we convert the base object's offset into
a pack-relative position (identified here as 'base_pos') by
calling `offset_to_pack_pos()`.
But we also call `offset_to_pack_pos()` before handling each case
above separately. For the MIDX case, this is unnecessary, since we
don't need to lookup the base object's pack-relative position;
instead, we just care whether the selected copy of that base is from
the same pack we're currently operating on.
For the non-MIDX case, we end up calling offset_to_pack_pos() again,
yielding the same result, making the earlier call wasteful. This
behavior dates back to 519e17ff75 (pack-bitmap: prepare to mark
objects from multiple packs for reuse, 2023-12-14).
Let's correct that by avoiding the unconditional call to
'offset_to_pack_pos()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 706ff26a7b1..5c5c26efe0d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2069,7 +2069,6 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) {
off_t base_offset;
- uint32_t base_pos;
uint32_t base_bitmap_pos;
/*
@@ -2085,8 +2084,6 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
if (!base_offset)
return 0;
- offset_to_pack_pos(pack->p, base_offset, &base_pos);
-
if (bitmap_is_midx(bitmap_git)) {
/*
* Cross-pack deltas are rejected for now, but could
@@ -2105,6 +2102,8 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
return 0;
}
} else {
+ uint32_t base_pos;
+
if (offset_to_pack_pos(pack->p, base_offset,
&base_pos) < 0)
return 0;
--
2.47.0.11.g487258bca34
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 03/11] pack-bitmap.c: delay calling 'offset_to_pack_pos()'
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
2024-10-09 20:31 ` [PATCH 01/11] pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()` Taylor Blau
2024-10-09 20:31 ` [PATCH 02/11] pack-bitmap.c: avoid unnecessary `offset_to_pack_pos()` Taylor Blau
@ 2024-10-09 20:31 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 04/11] pack-bitmap.c: compare `base_offset` to `delta_obj_offset` Taylor Blau
` (9 subsequent siblings)
12 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
When handling single object reuse from a single-pack bitmap, we first
convert the base object's offset into a pack-relative position. Then,
we check if the base object's offset occurs after the delta object's
offset. If that's the case, then we kick the delta'd object back to
the slow path.
But there are a couple of oddities here:
- However unlikely it is that we'll find a delta/base pair with
base_offset >= offset, it doesn't make sense to convert the base's
offset to a pack-relative position if we're going to throw out the
reuse opportunity anyway.
- We "convert" the base object's position into bitmap order by
offsetting it by 'pack->bitmap_pos'. But this makes no sense,
since single-pack bitmaps have only one pack (by definition), and
that pack's first object occurs at bit position 0.
Let's clean up this part of 'try_partial_reuse()' by (a) first seeing
if we can reuse the delta before converting its base object's offset
into a pack position, and (b) avoid an unnecessary conversion with
'pack->bitmap_pos'.
(b) allows us to avoid the intermediate 'base_pos' variable, and
instead write directly into 'base_bitmap_pos', which we also change
here for further clarity.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 5c5c26efe0d..faabc0ba0e9 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2102,11 +2102,6 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
return 0;
}
} else {
- uint32_t base_pos;
-
- if (offset_to_pack_pos(pack->p, base_offset,
- &base_pos) < 0)
- return 0;
/*
* We assume delta dependencies always point backwards.
* This lets us do a single pass, and is basically
@@ -2124,7 +2119,9 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
*/
if (base_offset >= offset)
return 0;
- base_bitmap_pos = pack->bitmap_pos + base_pos;
+ if (offset_to_pack_pos(pack->p, base_offset,
+ &base_bitmap_pos) < 0)
+ return 0;
}
/*
--
2.47.0.11.g487258bca34
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 04/11] pack-bitmap.c: compare `base_offset` to `delta_obj_offset`
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
` (2 preceding siblings ...)
2024-10-09 20:31 ` [PATCH 03/11] pack-bitmap.c: delay calling 'offset_to_pack_pos()' Taylor Blau
@ 2024-10-09 20:31 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 05/11] pack-bitmap.c: extract `find_base_bitmap_pos()` Taylor Blau
` (8 subsequent siblings)
12 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
When comparing the offsets of either half of a delta/base-pair, we
compare `base_offset` to `offset`.
There is nothing functionally wrong with that comparison, but it is
slightly confusing, since `offset` points to just after the delta
object's type header in the pack, whereas `base_offset` points to the
beginning of the header.
In practice, that distinction does not matter, and it is perfectly
fine to compare base_offset to offset.
But we already make a copy of `offset` before it is moved forward by
calling the function `unpack_object_header()`. So let's use that copy
(which points at the beginning of the delta object's header) in the
comparison so that we are comparing like quantities.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index faabc0ba0e9..3e1034cabf3 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2117,7 +2117,7 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
* position, since the bits are ordered by their
* positions within the pack.
*/
- if (base_offset >= offset)
+ if (base_offset >= delta_obj_offset)
return 0;
if (offset_to_pack_pos(pack->p, base_offset,
&base_bitmap_pos) < 0)
--
2.47.0.11.g487258bca34
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 05/11] pack-bitmap.c: extract `find_base_bitmap_pos()`
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
` (3 preceding siblings ...)
2024-10-09 20:31 ` [PATCH 04/11] pack-bitmap.c: compare `base_offset` to `delta_obj_offset` Taylor Blau
@ 2024-10-09 20:31 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 06/11] pack-bitmap: drop `from_midx` field from `bitmapped_pack` Taylor Blau
` (7 subsequent siblings)
12 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Inside of `try_partial_reuse()` is a block of code that handles
specially the case where the object we are attempting reuse on is
either an OFS_DELTA or a REF_DELTA.
Extract that part of the routine out to a separate function, named
`find_base_bitmap_pos()`. This will allow us to make that routine
slightly more complex in order to implement cross-pack and thin-delta
conversion (which we'll describe in detail in subsequent patches)
without compromising the readability of its caller,
`try_partial_reuse()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 90 ++++++++++++++++++++++++++++-----------------------
1 file changed, 49 insertions(+), 41 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 3e1034cabf3..6dbe6a2c5bc 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2047,6 +2047,52 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
return NULL;
}
+static int find_base_bitmap_pos(struct bitmap_index *bitmap_git,
+ struct bitmapped_pack *pack,
+ off_t base_offset,
+ off_t delta_obj_offset,
+ uint32_t *base_bitmap_pos)
+{
+ if (bitmap_is_midx(bitmap_git)) {
+ /*
+ * Cross-pack deltas are rejected for now, but could
+ * theoretically be supported in the future.
+ *
+ * We would need to ensure that we're sending both
+ * halves of the delta/base pair, regardless of whether
+ * or not the two cross a pack boundary. If they do,
+ * then we must convert the delta to an REF_DELTA to
+ * refer back to the base in the other pack.
+ * */
+ if (midx_pair_to_pack_pos(bitmap_git->midx, pack->pack_int_id,
+ base_offset, base_bitmap_pos) < 0)
+ return -1;
+ } else {
+ /*
+ * We assume delta dependencies always point backwards.
+ * This lets us do a single pass, and is basically
+ * always true due to the way OFS_DELTAs work. You would
+ * not typically find REF_DELTA in a bitmapped pack,
+ * since we only bitmap packs we write fresh, and
+ * OFS_DELTA is the default). But let's double check to
+ * make sure the pack wasn't written with odd
+ * parameters.
+ *
+ * Since we're working on a single-pack bitmap, we can
+ * use the object offset as a proxy for the bit
+ * position, since the bits are ordered by their
+ * positions within the pack.
+ */
+ if (base_offset >= delta_obj_offset)
+ return -1;
+ if (offset_to_pack_pos(pack->p, base_offset,
+ base_bitmap_pos) < 0)
+ return -1;
+ }
+
+ return 0;
+}
+
/*
* -1 means "stop trying further objects"; 0 means we may or may not have
* reused, but you can keep feeding bits.
@@ -2081,49 +2127,11 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
*/
base_offset = get_delta_base(pack->p, w_curs, &offset, type,
delta_obj_offset);
- if (!base_offset)
+ if (!base_offset ||
+ find_base_bitmap_pos(bitmap_git, pack, base_offset,
+ delta_obj_offset, &base_bitmap_pos) < 0)
return 0;
- if (bitmap_is_midx(bitmap_git)) {
- /*
- * Cross-pack deltas are rejected for now, but could
- * theoretically be supported in the future.
- *
- * We would need to ensure that we're sending both
- * halves of the delta/base pair, regardless of whether
- * or not the two cross a pack boundary. If they do,
- * then we must convert the delta to an REF_DELTA to
- * refer back to the base in the other pack.
- * */
- if (midx_pair_to_pack_pos(bitmap_git->midx,
- pack->pack_int_id,
- base_offset,
- &base_bitmap_pos) < 0) {
- return 0;
- }
- } else {
- /*
- * We assume delta dependencies always point backwards.
- * This lets us do a single pass, and is basically
- * always true due to the way OFS_DELTAs work. You would
- * not typically find REF_DELTA in a bitmapped pack,
- * since we only bitmap packs we write fresh, and
- * OFS_DELTA is the default). But let's double check to
- * make sure the pack wasn't written with odd
- * parameters.
- *
- * Since we're working on a single-pack bitmap, we can
- * use the object offset as a proxy for the bit
- * position, since the bits are ordered by their
- * positions within the pack.
- */
- if (base_offset >= delta_obj_offset)
- return 0;
- if (offset_to_pack_pos(pack->p, base_offset,
- &base_bitmap_pos) < 0)
- return 0;
- }
-
/*
* And finally, if we're not sending the base as part of our
* reuse chunk, then don't send this object either. The base
--
2.47.0.11.g487258bca34
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 06/11] pack-bitmap: drop `from_midx` field from `bitmapped_pack`
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
` (4 preceding siblings ...)
2024-10-09 20:31 ` [PATCH 05/11] pack-bitmap.c: extract `find_base_bitmap_pos()` Taylor Blau
@ 2024-10-09 20:31 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 07/11] write_reused_pack_one(): translate bit positions directly Taylor Blau
` (6 subsequent siblings)
12 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
This field was added in 41cd4b478f7 (pack-bitmap: tag bitmapped packs
with their corresponding MIDX, 2024-08-27) in order to expose the
bitmap's MIDX in order to translate bit positions correctly from
within 'pack-objects' during pack-reuse (c.f., 125c32605ab
(builtin/pack-objects.c: translate bit positions during pack-reuse,
2024-08-27) for more details).
But another approach would have been to use the `->midx` field of the
`struct bitmap_index *` directly, which feels clearer and avoids
duplicating information.
Unfortunately, we can't access that field directly since it is part of
the `bitmap_index` structure which is static within the pack-bitmap.c
compilation unit.
So let's instead introduce a new function which returns that pointer
to us, and replace the `from_midx` field with uses of that new
function (which we call `bitmap_midx()` here).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 4 ++--
midx.c | 1 -
pack-bitmap.c | 6 +++++-
pack-bitmap.h | 2 +-
4 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0fc0680b402..097bb5ac2ca 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1201,7 +1201,7 @@ static void write_reused_pack(struct bitmapped_pack *reuse_packfile,
if (pos + offset >= reuse_packfile->bitmap_pos + reuse_packfile->bitmap_nr)
goto done;
- if (reuse_packfile->bitmap_pos) {
+ if (bitmap_is_midx(bitmap_git)) {
/*
* When doing multi-pack reuse on a
* non-preferred pack, translate bit positions
@@ -1209,7 +1209,7 @@ static void write_reused_pack(struct bitmapped_pack *reuse_packfile,
* pack-relative positions before attempting
* reuse.
*/
- struct multi_pack_index *m = reuse_packfile->from_midx;
+ struct multi_pack_index *m = bitmap_midx(bitmap_git);
uint32_t midx_pos;
off_t pack_ofs;
diff --git a/midx.c b/midx.c
index 67e0d640046..ca98bfd7c64 100644
--- a/midx.c
+++ b/midx.c
@@ -496,7 +496,6 @@ int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id +
sizeof(uint32_t));
bp->pack_int_id = pack_int_id;
- bp->from_midx = m;
return 0;
}
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 6dbe6a2c5bc..b9ea1fab397 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2326,7 +2326,6 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
packs[packs_nr].pack_int_id = pack_int_id;
packs[packs_nr].bitmap_nr = pack->num_objects;
packs[packs_nr].bitmap_pos = 0;
- packs[packs_nr].from_midx = bitmap_git->midx;
objects_nr = packs[packs_nr++].bitmap_nr;
}
@@ -2981,6 +2980,11 @@ int bitmap_is_midx(struct bitmap_index *bitmap_git)
return !!bitmap_git->midx;
}
+struct multi_pack_index *bitmap_midx(struct bitmap_index *bitmap_git)
+{
+ return bitmap_git->midx;
+}
+
const struct string_list *bitmap_preferred_tips(struct repository *r)
{
const struct string_list *dest;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index d7f4b8b8e95..a1e8c8936c9 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -60,7 +60,6 @@ struct bitmapped_pack {
uint32_t bitmap_pos;
uint32_t bitmap_nr;
- struct multi_pack_index *from_midx; /* MIDX only */
uint32_t pack_int_id; /* MIDX only */
};
@@ -157,6 +156,7 @@ char *midx_bitmap_filename(struct multi_pack_index *midx);
char *pack_bitmap_filename(struct packed_git *p);
int bitmap_is_midx(struct bitmap_index *bitmap_git);
+struct multi_pack_index *bitmap_midx(struct bitmap_index *bitmap_git);
const struct string_list *bitmap_preferred_tips(struct repository *r);
int bitmap_is_preferred_refname(struct repository *r, const char *refname);
--
2.47.0.11.g487258bca34
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 07/11] write_reused_pack_one(): translate bit positions directly
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
` (5 preceding siblings ...)
2024-10-09 20:31 ` [PATCH 06/11] pack-bitmap: drop `from_midx` field from `bitmapped_pack` Taylor Blau
@ 2024-10-09 20:31 ` Taylor Blau
2024-10-11 8:16 ` Jeff King
2024-10-09 20:31 ` [PATCH 08/11] t5332: enable OFS_DELTAs via test_pack_objects_reused Taylor Blau
` (5 subsequent siblings)
12 siblings, 1 reply; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
A future commit will want to deal with bit positions instead of pack
positions from within builtin/pack-objects.c::write_reused_pack_one().
That function at present takes a pack position, so one approach to
accommodating the new functionality would be to add a secondary bit
position parameter, making the function's declaration look something
like:
static void write_reused_pack_one(struct packed_git *reuse_packfile,
size_t pack_pos, size_t bitmap_pos,
struct hashfile *out,
off_t pack_start,
struct pack_window **w_curs);
But because the pack-relative position can be easily derived from the
bit position, it makes senes to just pass the latter and let the
function itself translate it into a pack-relative position.
To do this, extract a new function `bitmap_to_pack_pos()` from the
existing `write_reused_pack()` function. This new routine is responsible
for performing the conversion from bitmap- to pack-relative positions.
Instead of performing that translation in `write_reused_pack()`, instead
call the new function from within `write_reused_pack_one()` so that we
can just pass a single bit position to it.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 78 ++++++++++++++++++++++--------------------
1 file changed, 41 insertions(+), 37 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 097bb5ac2ca..7f50d58a235 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1017,6 +1017,42 @@ static off_t find_reused_offset(off_t where)
return reused_chunks[lo-1].difference;
}
+static uint32_t bitmap_to_pack_pos(struct packed_git *reuse_packfile,
+ size_t pos)
+{
+ if (bitmap_is_midx(bitmap_git)) {
+ /*
+ * When doing multi-pack reuse on a
+ * non-preferred pack, translate bit positions
+ * from the MIDX pseudo-pack order back to their
+ * pack-relative positions before attempting
+ * reuse.
+ */
+ struct multi_pack_index *m = bitmap_midx(bitmap_git);
+ uint32_t midx_pos, pack_pos;
+ off_t pack_ofs;
+
+ if (!m)
+ BUG("non-zero bitmap position without MIDX");
+
+ midx_pos = pack_pos_to_midx(m, pos);
+ pack_ofs = nth_midxed_offset(m, midx_pos);
+
+ if (offset_to_pack_pos(reuse_packfile, pack_ofs, &pack_pos) < 0)
+ BUG("could not find expected object at offset %"PRIuMAX" in pack %s",
+ (uintmax_t)pack_ofs, pack_basename(reuse_packfile));
+
+ return pack_pos;
+ } else {
+ /*
+ * Can use bit positions directly, even for MIDX
+ * bitmaps. See comment in try_partial_reuse()
+ * for why.
+ */
+ return pos;
+ }
+}
+
static void write_reused_pack_one(struct packed_git *reuse_packfile,
size_t pos, struct hashfile *out,
off_t pack_start,
@@ -1025,9 +1061,10 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
off_t offset, next, cur;
enum object_type type;
unsigned long size;
+ uint32_t pack_pos = bitmap_to_pack_pos(reuse_packfile, pos);
- offset = pack_pos_to_offset(reuse_packfile, pos);
- next = pack_pos_to_offset(reuse_packfile, pos + 1);
+ offset = pack_pos_to_offset(reuse_packfile, pack_pos);
+ next = pack_pos_to_offset(reuse_packfile, pack_pos + 1);
record_reused_object(offset,
offset - (hashfile_total(out) - pack_start));
@@ -1191,7 +1228,6 @@ static void write_reused_pack(struct bitmapped_pack *reuse_packfile,
size_t pos = (i * BITS_IN_EWORD);
for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
- uint32_t pack_pos;
if ((word >> offset) == 0)
break;
@@ -1201,40 +1237,8 @@ static void write_reused_pack(struct bitmapped_pack *reuse_packfile,
if (pos + offset >= reuse_packfile->bitmap_pos + reuse_packfile->bitmap_nr)
goto done;
- if (bitmap_is_midx(bitmap_git)) {
- /*
- * When doing multi-pack reuse on a
- * non-preferred pack, translate bit positions
- * from the MIDX pseudo-pack order back to their
- * pack-relative positions before attempting
- * reuse.
- */
- struct multi_pack_index *m = bitmap_midx(bitmap_git);
- uint32_t midx_pos;
- off_t pack_ofs;
-
- if (!m)
- BUG("non-zero bitmap position without MIDX");
-
- midx_pos = pack_pos_to_midx(m, pos + offset);
- pack_ofs = nth_midxed_offset(m, midx_pos);
-
- if (offset_to_pack_pos(reuse_packfile->p,
- pack_ofs, &pack_pos) < 0)
- BUG("could not find expected object at offset %"PRIuMAX" in pack %s",
- (uintmax_t)pack_ofs,
- pack_basename(reuse_packfile->p));
- } else {
- /*
- * Can use bit positions directly, even for MIDX
- * bitmaps. See comment in try_partial_reuse()
- * for why.
- */
- pack_pos = pos + offset;
- }
-
- write_reused_pack_one(reuse_packfile->p, pack_pos, f,
- pack_start, &w_curs);
+ write_reused_pack_one(reuse_packfile->p, pos + offset,
+ f, pack_start, &w_curs);
display_progress(progress_state, ++written);
}
}
--
2.47.0.11.g487258bca34
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 08/11] t5332: enable OFS_DELTAs via test_pack_objects_reused
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
` (6 preceding siblings ...)
2024-10-09 20:31 ` [PATCH 07/11] write_reused_pack_one(): translate bit positions directly Taylor Blau
@ 2024-10-09 20:31 ` Taylor Blau
2024-10-11 8:19 ` Jeff King
2024-10-09 20:31 ` [PATCH 09/11] pack-bitmap: enable cross-pack delta reuse Taylor Blau
` (4 subsequent siblings)
12 siblings, 1 reply; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Back when test_pack_objects_reused was introduced via commit
7c01878eeb (t5332-multi-pack-reuse.sh: extract pack-objects helper
functions, 2024-02-05), we converted bare pack-objects invocations
into one of two wrapped variants, either test_pack_objects_reused or
test_pack_objects_reused_all.
The latter passes `--delta-base-offset`, allowing pack-objects to
generate OFS_DELTAs in its output pack. But the former does not, for
no good reason.
As we do not want to convert OFS_DELTAs to REF_DELTAs unnecessarily,
let's unify these two and pass `--delta-base-offset` to both.
Instrumenting the codepath
where we convert OFS_DELTAs to REF_DELTAs with a BUG() like so:
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0fc0680b402..0f1b22b8674 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1051,6 +1051,8 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
uint32_t base_pos;
struct object_id base_oid;
+ BUG("tested");
+
if (offset_to_pack_pos(reuse_packfile, base_offset, &base_pos) < 0)
die(_("expected object at offset %"PRIuMAX" "
"in pack %s"),
, and seeing what test(s) fail yields the following.
Test Summary Report
-------------------
t5326-multi-pack-bitmaps.sh (Wstat: 256 (exited 1) Tests: 357 Failed: 6)
Failed tests: 46, 91, 167, 220, 265, 341
Non-zero exit status: 1
t5310-pack-bitmaps.sh (Wstat: 256 (exited 1) Tests: 227 Failed: 6)
Failed tests: 46-47, 120-121, 197-198
Non-zero exit status: 1
t5327-multi-pack-bitmaps-rev.sh (Wstat: 256 (exited 1) Tests: 314 Failed: 6)
Failed tests: 46, 91, 157, 203, 248, 314
Non-zero exit status: 1
So the OFS_DELTA to REF_DELTA conversion is still tested thoroughly in
t5310, t5326, and t5327.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t5332-multi-pack-reuse.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index 955ea42769b..8bcb736c75a 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -43,7 +43,7 @@ test_pack_objects_reused_all () {
test_pack_objects_reused () {
: >trace2.txt &&
GIT_TRACE2_EVENT="$PWD/trace2.txt" \
- git pack-objects --stdout --revs >got.pack &&
+ git pack-objects --delta-base-offset --stdout --revs >got.pack &&
test_pack_reused "$1" <trace2.txt &&
test_packs_reused "$2" <trace2.txt &&
--
2.47.0.11.g487258bca34
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 09/11] pack-bitmap: enable cross-pack delta reuse
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
` (7 preceding siblings ...)
2024-10-09 20:31 ` [PATCH 08/11] t5332: enable OFS_DELTAs via test_pack_objects_reused Taylor Blau
@ 2024-10-09 20:31 ` Taylor Blau
2024-10-11 8:31 ` Jeff King
2024-10-09 20:31 ` [PATCH 10/11] pack-bitmap.c: record whether the result was filtered Taylor Blau
` (3 subsequent siblings)
12 siblings, 1 reply; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Back when multi-pack bitmaps were first introduced, we performed
verbatim pack reuse over the MIDX's preferred pack only. Since all
duplicate objects were resolved in favor of that pack, any object from
the preferred pack that was stored as an OFS_DELTA is directly
reusable.
That's not the case when we are reusing objects from multiple packs,
like was introduced via af626ac0e0 (pack-bitmap: enable reuse from
all bitmapped packs, 2023-12-14). When reusing an object stored as a
delta from some pack other than the preferred one, it's possible that
the base was selected from a different pack.
When that's the case, we could have converted those OFS_DELTAs to
REF_DELTAs on the fly (like we do when running 'pack-objects' without
the `--delta-base-offset` option). But af626ac0e0 decided instead to
kick those objects back into the non-verbatim reuse path, meaning that
reusing them was slightly slower.
This patch removes that limitation, and teaches pack-objects to
convert deltas (whose base was selected from a different pack in the
MIDX) from OFS_DELTAs to REF_DELTAs.
In order to do this, the pack-reuse code within pack-bitmap.c marks
bits in a separate bitmap called 'reuse_as_ref_delta'. Objects whose
bits are marked in that bitmap must be converted from OFS_DELTAs to
REF_DELTAs.
To mark bits in that bitmap, we adjust find_base_bitmap_pos() to
return the bitmap position of any delta object's base regardless of
whether or not it came from the same pack. This is done by:
1. First converting the base object's into a pack position (via
`offset_to_pack_pos()`).
2. Then converting from pack position into into lexical order (via
`pack_pos_to_index()`).
3. Then into an object ID (via `nth_packed_object_id()`).
4. Then into a position in the MIDX's lexical order of object IDs
(via `bsearch_midx()`).
5. And finally into a position in the MIDX's pseudo-pack order (via
`midx_pair_to_pack_pos()`).
If we can find that base object in the MIDX, then we use its position
in the MIDX's pseudo-pack order to determine whether or not it was
selected from the same pack. If it is, no adjustment is necessary.
Otherwise, we mark the delta object's position in the new
`reuse_as_ref_delta` bitmap, and convert accordingly from within
`write_reused_pack_one()`.
The only tweak we need to make outside of the above is to teach
`write_reused_pack_verbatim()` to only consider words of the
`reuse_packfile_bitmap` as reusable verbatim if the word does not have
any bits which are marked in the reuse_as_ref_delta bitmap, in which
case they must be converted and cannot be reused verbatim.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 29 ++++++++++++++---
pack-bitmap.c | 63 +++++++++++++++++++++++++++----------
pack-bitmap.h | 1 +
t/t5332-multi-pack-reuse.sh | 4 +--
4 files changed, 75 insertions(+), 22 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7f50d58a235..dbcd632483e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -223,6 +223,7 @@ static size_t reuse_packfiles_nr;
static size_t reuse_packfiles_used_nr;
static uint32_t reuse_packfile_objects;
static struct bitmap *reuse_packfile_bitmap;
+static struct bitmap *reuse_as_ref_delta_packfile_bitmap;
static int use_bitmap_index_default = 1;
static int use_bitmap_index = -1;
@@ -1084,7 +1085,8 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
assert(base_offset != 0);
/* Convert to REF_DELTA if we must... */
- if (!allow_ofs_delta) {
+ if (!allow_ofs_delta ||
+ bitmap_get(reuse_as_ref_delta_packfile_bitmap, pos)) {
uint32_t base_pos;
struct object_id base_oid;
@@ -1160,6 +1162,10 @@ static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile,
if (!bitmap_get(reuse_packfile_bitmap,
word_pos * BITS_IN_EWORD + offset))
return word_pos;
+ if (reuse_as_ref_delta_packfile_bitmap &&
+ bitmap_get(reuse_as_ref_delta_packfile_bitmap,
+ word_pos * BITS_IN_EWORD + offset))
+ return word_pos;
}
pos += BITS_IN_EWORD - (pos % BITS_IN_EWORD);
@@ -1182,10 +1188,24 @@ static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile,
if (pos >= end)
return reuse_packfile->bitmap_pos / BITS_IN_EWORD;
- while (pos < end &&
- reuse_packfile_bitmap->words[pos / BITS_IN_EWORD] == (eword_t)~0)
+ while (pos < end) {
+ size_t wpos = pos / BITS_IN_EWORD;
+ eword_t reuse;
+
+ reuse = reuse_packfile_bitmap->words[wpos];
+ if (reuse_as_ref_delta_packfile_bitmap) {
+ /*
+ * Can't reuse verbatim any objects which need
+ * to be first rewritten as REF_DELTAs.
+ */
+ reuse &= ~reuse_as_ref_delta_packfile_bitmap->words[wpos];
+ }
+
+ if (reuse != (eword_t)~0)
+ break;
+
pos += BITS_IN_EWORD;
-
+ }
if (pos > end)
pos = end;
@@ -4078,6 +4098,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
&reuse_packfiles,
&reuse_packfiles_nr,
&reuse_packfile_bitmap,
+ &reuse_as_ref_delta_packfile_bitmap,
allow_pack_reuse == MULTI_PACK_REUSE);
if (reuse_packfiles) {
diff --git a/pack-bitmap.c b/pack-bitmap.c
index b9ea1fab397..8326a20979e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2054,18 +2054,34 @@ static int find_base_bitmap_pos(struct bitmap_index *bitmap_git,
uint32_t *base_bitmap_pos)
{
if (bitmap_is_midx(bitmap_git)) {
+ struct object_id base_oid;
+ uint32_t base_pack_pos;
+ uint32_t base_idx_pos;
+ uint32_t base_midx_pos;
+
/*
- * Cross-pack deltas are rejected for now, but could
- * theoretically be supported in the future.
- *
- * We would need to ensure that we're sending both
- * halves of the delta/base pair, regardless of whether
- * or not the two cross a pack boundary. If they do,
- * then we must convert the delta to an REF_DELTA to
- * refer back to the base in the other pack.
- * */
- if (midx_pair_to_pack_pos(bitmap_git->midx, pack->pack_int_id,
- base_offset, base_bitmap_pos) < 0)
+ * First convert from the base object's offset
+ * to its pack position in pack order relative
+ * to its source pack. Use that information to
+ * find the base object's OID.
+ */
+ if (offset_to_pack_pos(pack->p, base_offset,
+ &base_pack_pos) < 0)
+ return -1;
+ base_idx_pos = pack_pos_to_index(pack->p, base_pack_pos);
+ if (nth_packed_object_id(&base_oid, pack->p, base_idx_pos) < 0)
+ return -1;
+
+ /*
+ * Now find the base object's lexical position
+ * in the MIDX, and convert that to a bitmap
+ * position in the MIDX's pseudo-pack order.
+ */
+ if (!bsearch_midx(&base_oid, bitmap_git->midx,
+ &base_midx_pos))
+ return -1;
+ if (midx_to_pack_pos(bitmap_git->midx, base_midx_pos,
+ base_bitmap_pos) < 0)
return -1;
} else {
/*
@@ -2102,6 +2118,7 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
size_t bitmap_pos,
off_t offset,
struct bitmap *reuse,
+ struct bitmap *reuse_as_ref_delta,
struct pack_window **w_curs)
{
off_t delta_obj_offset;
@@ -2116,6 +2133,7 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) {
off_t base_offset;
uint32_t base_bitmap_pos;
+ int cross_pack;
/*
* Find the position of the base object so we can look it up
@@ -2129,7 +2147,8 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
delta_obj_offset);
if (!base_offset ||
find_base_bitmap_pos(bitmap_git, pack, base_offset,
- delta_obj_offset, &base_bitmap_pos) < 0)
+ delta_obj_offset,
+ &base_bitmap_pos) < 0)
return 0;
/*
@@ -2142,8 +2161,11 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
*/
if (!bitmap_get(reuse, base_bitmap_pos))
return 0;
- }
+ cross_pack = base_bitmap_pos < pack->bitmap_pos;
+ if (cross_pack)
+ bitmap_set(reuse_as_ref_delta, bitmap_pos);
+ }
/*
* If we got here, then the object is OK to reuse. Mark it.
*/
@@ -2153,7 +2175,8 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git,
struct bitmapped_pack *pack,
- struct bitmap *reuse)
+ struct bitmap *reuse,
+ struct bitmap *reuse_as_ref_delta)
{
struct bitmap *result = bitmap_git->result;
struct pack_window *w_curs = NULL;
@@ -2220,7 +2243,8 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
}
if (try_partial_reuse(bitmap_git, pack, bit_pos,
- ofs, reuse, &w_curs) < 0) {
+ ofs, reuse, reuse_as_ref_delta,
+ &w_curs) < 0) {
/*
* try_partial_reuse indicated we couldn't reuse
* any bits, so there is no point in trying more
@@ -2255,12 +2279,14 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
struct bitmapped_pack **packs_out,
size_t *packs_nr_out,
struct bitmap **reuse_out,
+ struct bitmap **reuse_as_ref_delta_out,
int multi_pack_reuse)
{
struct repository *r = the_repository;
struct bitmapped_pack *packs = NULL;
struct bitmap *result = bitmap_git->result;
struct bitmap *reuse;
+ struct bitmap *reuse_as_ref_delta = NULL;
size_t i;
size_t packs_nr = 0, packs_alloc = 0;
size_t word_alloc;
@@ -2333,14 +2359,18 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
word_alloc = objects_nr / BITS_IN_EWORD;
if (objects_nr % BITS_IN_EWORD)
word_alloc++;
+
reuse = bitmap_word_alloc(word_alloc);
+ reuse_as_ref_delta = bitmap_word_alloc(word_alloc);
for (i = 0; i < packs_nr; i++)
- reuse_partial_packfile_from_bitmap_1(bitmap_git, &packs[i], reuse);
+ reuse_partial_packfile_from_bitmap_1(bitmap_git, &packs[i],
+ reuse, reuse_as_ref_delta);
if (bitmap_is_empty(reuse)) {
free(packs);
bitmap_free(reuse);
+ bitmap_free(reuse_as_ref_delta);
return;
}
@@ -2352,6 +2382,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
*packs_out = packs;
*packs_nr_out = packs_nr;
*reuse_out = reuse;
+ *reuse_as_ref_delta_out = reuse_as_ref_delta;
}
int bitmap_walk_contains(struct bitmap_index *bitmap_git,
diff --git a/pack-bitmap.h b/pack-bitmap.h
index a1e8c8936c9..e7962ac90e8 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -86,6 +86,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
struct bitmapped_pack **packs_out,
size_t *packs_nr_out,
struct bitmap **reuse_out,
+ struct bitmap **reuse_as_ref_delta_out,
int multi_pack_reuse);
int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping,
kh_oid_map_t *reused_bitmaps, int show_progress);
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index 8bcb736c75a..6edff02d124 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -194,7 +194,7 @@ test_expect_success 'omit delta with uninteresting base (same pack)' '
test_pack_objects_reused 3 1 <in
'
-test_expect_success 'omit delta from uninteresting base (cross pack)' '
+test_expect_success 'can retain delta from uninteresting base (cross pack)' '
cat >in <<-EOF &&
$(git rev-parse $base)
^$(git rev-parse $delta)
@@ -207,7 +207,7 @@ test_expect_success 'omit delta from uninteresting base (cross pack)' '
packs_nr="$(find $packdir -type f -name "pack-*.pack" | wc -l)" &&
objects_nr="$(git rev-list --count --all --objects)" &&
- test_pack_objects_reused_all $(($objects_nr - 1)) $packs_nr
+ test_pack_objects_reused_all $objects_nr $packs_nr
'
test_expect_success 'non-omitted delta in MIDX preferred pack' '
--
2.47.0.11.g487258bca34
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 10/11] pack-bitmap.c: record whether the result was filtered
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
` (8 preceding siblings ...)
2024-10-09 20:31 ` [PATCH 09/11] pack-bitmap: enable cross-pack delta reuse Taylor Blau
@ 2024-10-09 20:31 ` Taylor Blau
2024-10-11 8:35 ` Jeff King
2024-10-09 20:31 ` [PATCH 11/11] pack-bitmap: enable reusing deltas with base objects in 'haves' bitmap Taylor Blau
` (2 subsequent siblings)
12 siblings, 1 reply; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
In a following commit, we will prepare to mark objects for reuse which
are stored as deltas, but whose base object wasn't included in the
output pack (e.g., because the client already has it).
We can't, however, take advantage of that optimization when the
traversal removed objects from the result due to either (a) object
filtering, or (b) --unpacked.
That's because in either case, we can't rely on the principle that "if
an object appears in the 'haves' bitmap, that the client already has
that object", since they may not have it as a result of filtering.
Keep track of whether or not we performed any object filtering during
our traversal in preparation to actually implement thin delta
conversion.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8326a20979e..67ea267ed2a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -112,6 +112,12 @@ struct bitmap_index {
/* "have" bitmap from the last performed walk */
struct bitmap *haves;
+ /*
+ * Whether the last performed walk had objects removed from
+ * 'result' due to object filtering.
+ */
+ int filtered;
+
/* Version of the bitmap index */
unsigned int version;
};
@@ -1684,6 +1690,8 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ bitmap_git->filtered = 1;
+
bitmap_free(tips);
}
@@ -1776,6 +1784,8 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ bitmap_git->filtered = 1;
+
bitmap_free(tips);
}
@@ -1892,6 +1902,8 @@ static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git,
if (has_object_pack(&eindex->objects[i]->oid))
bitmap_unset(result, objects_nr + i);
}
+
+ bitmap_git->filtered = 1;
}
struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
--
2.47.0.11.g487258bca34
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 11/11] pack-bitmap: enable reusing deltas with base objects in 'haves' bitmap
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
` (9 preceding siblings ...)
2024-10-09 20:31 ` [PATCH 10/11] pack-bitmap.c: record whether the result was filtered Taylor Blau
@ 2024-10-09 20:31 ` Taylor Blau
2024-10-10 16:46 ` [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Junio C Hamano
2024-10-11 8:38 ` Jeff King
12 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-10-09 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Ever since the current verbatim pack-reuse strategy was devised in
bb514de356 (pack-objects: improve partial packfile reuse, 2019-12-18),
we have skipped sending delta objects whenever we're not sending that
object's base.
But it's fine to send such an object as a REF_DELTA against the base
object, provided the following are true:
- We know that the client has the object already, i.e. it appears
in the 'haves' bitmap.
- The client supports thin packs, i.e. 'pack-objects' was invoked
with the '--thin' option.
- The client did not request object filtering, in which case we
cannot trust that they actually do have all of the objects in the
'haves' bitmap, since we only filter the 'result' bitmap.
When all of those criteria are met, we can send the delta object
verbatim, meaning that we can reuse more objects from existing packs
via the verbatim reuse mechanism, which should be faster than kicking
those objects back to the slow path.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 3 +-
pack-bitmap.c | 52 ++++++++++++++++++++---------
pack-bitmap.h | 3 +-
t/t5332-multi-pack-reuse.sh | 66 +++++++++++++++++++++++++++++++++++++
4 files changed, 107 insertions(+), 17 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index dbcd632483e..027c64f931f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4099,7 +4099,8 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
&reuse_packfiles_nr,
&reuse_packfile_bitmap,
&reuse_as_ref_delta_packfile_bitmap,
- allow_pack_reuse == MULTI_PACK_REUSE);
+ allow_pack_reuse == MULTI_PACK_REUSE,
+ thin);
if (reuse_packfiles) {
reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 67ea267ed2a..e8094453ca3 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2131,6 +2131,7 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
off_t offset,
struct bitmap *reuse,
struct bitmap *reuse_as_ref_delta,
+ int thin_deltas,
struct pack_window **w_curs)
{
off_t delta_obj_offset;
@@ -2145,7 +2146,7 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) {
off_t base_offset;
uint32_t base_bitmap_pos;
- int cross_pack;
+ int wants_base, cross_pack;
/*
* Find the position of the base object so we can look it up
@@ -2164,19 +2165,25 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
return 0;
/*
- * And finally, if we're not sending the base as part of our
- * reuse chunk, then don't send this object either. The base
- * would come after us, along with other objects not
- * necessarily in the pack, which means we'd need to convert
- * to REF_DELTA on the fly. Better to just let the normal
- * object_entry code path handle it.
+ * And finally, if we're not sending the base as part of
+ * our reuse chunk, then either convert the delta to a
+ * REF_DELTA if the client supports thin deltas, or
+ * don't send this object either.
*/
- if (!bitmap_get(reuse, base_bitmap_pos))
- return 0;
-
+ wants_base = bitmap_get(reuse, base_bitmap_pos);
cross_pack = base_bitmap_pos < pack->bitmap_pos;
- if (cross_pack)
+
+ if (!wants_base) {
+ if (!thin_deltas)
+ return 0;
+ if (!bitmap_git->haves ||
+ !bitmap_get(bitmap_git->haves, base_bitmap_pos))
+ return 0;
+
bitmap_set(reuse_as_ref_delta, bitmap_pos);
+ } else if (cross_pack) {
+ bitmap_set(reuse_as_ref_delta, bitmap_pos);
+ }
}
/*
* If we got here, then the object is OK to reuse. Mark it.
@@ -2188,7 +2195,8 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git,
struct bitmapped_pack *pack,
struct bitmap *reuse,
- struct bitmap *reuse_as_ref_delta)
+ struct bitmap *reuse_as_ref_delta,
+ int thin_deltas)
{
struct bitmap *result = bitmap_git->result;
struct pack_window *w_curs = NULL;
@@ -2256,7 +2264,7 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
if (try_partial_reuse(bitmap_git, pack, bit_pos,
ofs, reuse, reuse_as_ref_delta,
- &w_curs) < 0) {
+ thin_deltas, &w_curs) < 0) {
/*
* try_partial_reuse indicated we couldn't reuse
* any bits, so there is no point in trying more
@@ -2292,7 +2300,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
size_t *packs_nr_out,
struct bitmap **reuse_out,
struct bitmap **reuse_as_ref_delta_out,
- int multi_pack_reuse)
+ int multi_pack_reuse,
+ int thin_deltas)
{
struct repository *r = the_repository;
struct bitmapped_pack *packs = NULL;
@@ -2375,9 +2384,22 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
reuse = bitmap_word_alloc(word_alloc);
reuse_as_ref_delta = bitmap_word_alloc(word_alloc);
+ if (bitmap_git->filtered) {
+ /*
+ * If the bitmap traversal filtered objects, then we
+ * can't trust the client actually has all of the
+ * objects that appear in the 'haves' bitmap. In that
+ * case, pretend like we don't support thin-deltas,
+ * since we can't guarantee that the client has all of
+ * the objects we think it has.
+ */
+ thin_deltas = 0;
+ }
+
for (i = 0; i < packs_nr; i++)
reuse_partial_packfile_from_bitmap_1(bitmap_git, &packs[i],
- reuse, reuse_as_ref_delta);
+ reuse, reuse_as_ref_delta,
+ thin_deltas);
if (bitmap_is_empty(reuse)) {
free(packs);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index e7962ac90e8..1a204fec31e 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -87,7 +87,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
size_t *packs_nr_out,
struct bitmap **reuse_out,
struct bitmap **reuse_as_ref_delta_out,
- int multi_pack_reuse);
+ int multi_pack_reuse,
+ int thin_deltas);
int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping,
kh_oid_map_t *reused_bitmaps, int show_progress);
void free_bitmap_index(struct bitmap_index *);
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index 6edff02d124..6237c680ae9 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -51,6 +51,33 @@ test_pack_objects_reused () {
git index-pack --strict -o got.idx got.pack
}
+# test_pack_objects_reused_thin <pack-reused> <packs-reused>
+test_pack_objects_reused_thin () {
+ : >trace2.txt &&
+ GIT_TRACE2_EVENT="$PWD/trace2.txt" \
+ git pack-objects --thin --delta-base-offset --stdout --revs \
+ >got.pack &&
+
+ test_pack_reused "$1" <trace2.txt &&
+ test_packs_reused "$2" <trace2.txt &&
+
+ git index-pack --strict --stdin --fix-thin -o got.idx <got.pack
+}
+
+# test_pack_objects_reused_filter <filter> <pack-reused> <packs-reused>
+test_pack_objects_reused_filter () {
+ : >trace2.txt &&
+ GIT_TRACE2_EVENT="$PWD/trace2.txt" \
+ git pack-objects --thin --delta-base-offset --stdout --revs \
+ --thin --filter="$1" \
+ >got.pack &&
+
+ test_pack_reused "$2" <trace2.txt &&
+ test_packs_reused "$3" <trace2.txt &&
+
+ git index-pack --strict --stdin --fix-thin -o got.idx <got.pack
+}
+
test_expect_success 'preferred pack is reused for single-pack reuse' '
test_config pack.allowPackReuse single &&
@@ -210,6 +237,45 @@ test_expect_success 'can retain delta from uninteresting base (cross pack)' '
test_pack_objects_reused_all $objects_nr $packs_nr
'
+test_expect_success 'converts OFS_DELTA to REF_DELTA when possible' '
+ git init ofs-to-ref-delta &&
+ (
+ cd ofs-to-ref-delta &&
+
+ git config pack.allowPackReuse multi &&
+
+ test_seq 64 >f &&
+ git add f &&
+ test_tick &&
+ git commit -m "base" &&
+ base="$(git rev-parse HEAD)" &&
+
+ test_seq 32 >f &&
+ test_tick &&
+ git commit -a -m "delta" &&
+ delta="$(git rev-parse HEAD)" &&
+
+ git repack -ad &&
+
+ test_commit other &&
+
+ pack=$(git pack-objects --all --unpacked $packdir/pack) &&
+ git multi-pack-index write --bitmap \
+ --preferred-pack=pack-$pack.pack &&
+
+ have_delta "$(git rev-parse $delta:f)" "$(git rev-parse $base:f)" &&
+
+ cat >in <<-EOF &&
+ $delta
+ ^$base
+ EOF
+
+ test_pack_objects_reused_thin 3 1 <in &&
+ test_pack_objects_reused 2 1 <in &&
+ test_pack_objects_reused_filter "blob:none" 2 1 <in
+ )
+'
+
test_expect_success 'non-omitted delta in MIDX preferred pack' '
test_config pack.allowPackReuse single &&
--
2.47.0.11.g487258bca34
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
` (10 preceding siblings ...)
2024-10-09 20:31 ` [PATCH 11/11] pack-bitmap: enable reusing deltas with base objects in 'haves' bitmap Taylor Blau
@ 2024-10-10 16:46 ` Junio C Hamano
2024-10-10 17:07 ` Taylor Blau
2024-10-11 8:38 ` Jeff King
12 siblings, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2024-10-10 16:46 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Patrick Steinhardt
Taylor Blau <me@ttaylorr.com> writes:
> This patch series enables more objects to be candidates for verbatim
> reuse when generating a pack with the aide of reachability bitmaps.
>
> By the end of this series, two new classes of objects are now reuse
> candidates which were not before. They are:
>
> - Cross-pack deltas. In multi-pack bitmaps, if the delta and base
> were selected from different packs, the delta was not reusable.
Hmph. Suppose that you need to send object X, you happen to have X
as a ofs-delta against Y, but Y may appear in multiple packs.
Even if the copy of Y you are going to send together with X is from
the same packfile, you may not be sending all the objects between X
and Y in the original local packfile, so you would need to recompute
the offset to give ofs-delta X to the distance between X and Y in
the resulting packstream, no?
So when you pick the copy of Y out of another pack, what's so
different? After emitting Y to the resulting pack stream (and
remembering where in the packstream you did so), when it is X's turn
to be emitted, shouldn't you be able to compute the distance in the
resulting packstream to represent X as an ofs-delta against Y, which
should already be happening when you had both X and Y in the same
original pack?
> - Thin deltas. In both single- and multi-pack bitmaps, we did not
> consider reusing deltas whose base object appears in the 'haves'
> bitmap.
I hope this optimization does not kick in unless the receiving end
is prepared to do "index-pack --fix-thin".
I've never thought about this specifically, but it is interesting to
realize that by definition "thin" deltas cannot be ofs-deltas.
> Of course, REF_DELTAs have a less compact representation than
> OFS_DELTAs, so the resulting packs will trade off some CPU time for a
> slightly larger pack.
Is comparing ref- vs ofs- delta a meaningful thing to do in the
context of this series?
What does the current code without these patches do in the same
situation? Give up on reusing the existing delta and then? If we
send the base representation instead, the comparison is "we
currently do not use delta, but with this change we can reuse delta
(even though we do not bother recompute the offset and instead use
ref-delta)".
Do we recompute the delta on the fly and show it as an ofs-delta
with the current code? Then the comparison would be "we spend time
doing diff-delta once right now but instead reuse an existing one
(even though we do not bother recompute the offset and instead use
ref-delta)".
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
2024-10-10 16:46 ` [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Junio C Hamano
@ 2024-10-10 17:07 ` Taylor Blau
2024-10-10 20:20 ` Junio C Hamano
0 siblings, 1 reply; 31+ messages in thread
From: Taylor Blau @ 2024-10-10 17:07 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Elijah Newren, Jeff King, Patrick Steinhardt
On Thu, Oct 10, 2024 at 09:46:00AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > This patch series enables more objects to be candidates for verbatim
> > reuse when generating a pack with the aide of reachability bitmaps.
> >
> > By the end of this series, two new classes of objects are now reuse
> > candidates which were not before. They are:
> >
> > - Cross-pack deltas. In multi-pack bitmaps, if the delta and base
> > were selected from different packs, the delta was not reusable.
>
> Hmph. Suppose that you need to send object X, you happen to have X
> as a ofs-delta against Y, but Y may appear in multiple packs.
>
> Even if the copy of Y you are going to send together with X is from
> the same packfile, you may not be sending all the objects between X
> and Y in the original local packfile, so you would need to recompute
> the offset to give ofs-delta X to the distance between X and Y in
> the resulting packstream, no?
That's right. When doing verbatim pack reuse, we mark reused "chunks"
(c.f. 'struct reused_chunk'), where each chunk records the offset of the
beginning of the chunk in the source pack, and the difference between
that value and the corresponding byte in the assembled pack.
So suppose like in your example we have X and Y from the same pack, and
we are sending both objects, but not necessarily every object in between
the two from the source pack. In that case, X and Y will end up in
different chunks.
When we go to actually rewrite the OFS_DELTA for X, we'll compute:
off_t fixup = find_reused_offset(X) - find_reused_offset(Y);
(in the code, X and Y are actually the offsets for each object in the
source pack, but I'm using the object names here for clarity).
When there is a non-zero fixup value, we'll patch the OFS_DELTA to
account for the gap between it and its base object. The details of how
that is done are in builtin/pack-objects.c::write_reused_pack_one(), in
the 'if (fixup) { ... }' block.
> So when you pick the copy of Y out of another pack, what's so
> different? After emitting Y to the resulting pack stream (and
> remembering where in the packstream you did so), when it is X's turn
> to be emitted, shouldn't you be able to compute the distance in the
> resulting packstream to represent X as an ofs-delta against Y, which
> should already be happening when you had both X and Y in the same
> original pack?
Good question. The difference is that if you're reusing X and Y from
same pack, you know that Y occurs some number of bytes *before* X in the
resulting pack.
But if Y comes from a different pack, it may get pushed further back in
the MIDX pseudo-pack order. So in that case the assembled pack may list
X before Y, in which case X cannot be an OFS_DELTA of Y, since offset
deltas require that the base object appears first.
> > - Thin deltas. In both single- and multi-pack bitmaps, we did not
> > consider reusing deltas whose base object appears in the 'haves'
> > bitmap.
>
> I hope this optimization does not kick in unless the receiving end
> is prepared to do "index-pack --fix-thin".
It does. We only allow computing "thin deltas" when pack-objects was
invoked with `--thin`.
> I've never thought about this specifically, but it is interesting to
> realize that by definition "thin" deltas cannot be ofs-deltas.
Yep.
> > Of course, REF_DELTAs have a less compact representation than
> > OFS_DELTAs, so the resulting packs will trade off some CPU time for a
> > slightly larger pack.
>
> Is comparing ref- vs ofs- delta a meaningful thing to do in the
> context of this series?
>
> What does the current code without these patches do in the same
> situation? Give up on reusing the existing delta and then? If we
> send the base representation instead, the comparison is "we
> currently do not use delta, but with this change we can reuse delta
> (even though we do not bother recompute the offset and instead use
> ref-delta)".
The current code does not reuse deltas when we're either (a) not sending
the base, (b) we are sending the base, but it's in a different pack, or
(c) both. When any of those conditions are met, we do not reuse the
existing on-disk representation of the delta object verbatim, and
instead kick it back to the normal pack generation routine.
That might result in us finding a new base, or applying sending the
object's contents as a standalone object.
> Do we recompute the delta on the fly and show it as an ofs-delta
> with the current code? Then the comparison would be "we spend time
> doing diff-delta once right now but instead reuse an existing one
> (even though we do not bother recompute the offset and instead use
> ref-delta)".
Fair.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
2024-10-10 17:07 ` Taylor Blau
@ 2024-10-10 20:20 ` Junio C Hamano
2024-10-10 20:32 ` Taylor Blau
2024-10-11 7:54 ` Jeff King
0 siblings, 2 replies; 31+ messages in thread
From: Junio C Hamano @ 2024-10-10 20:20 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Patrick Steinhardt
Taylor Blau <me@ttaylorr.com> writes:
>> So when you pick the copy of Y out of another pack, what's so
>> different? After emitting Y to the resulting pack stream (and
>> remembering where in the packstream you did so), when it is X's turn
>> to be emitted, shouldn't you be able to compute the distance in the
>> resulting packstream to represent X as an ofs-delta against Y, which
>> should already be happening when you had both X and Y in the same
>> original pack?
>
> Good question. The difference is that if you're reusing X and Y from
> same pack, you know that Y occurs some number of bytes *before* X in the
> resulting pack.
>
> But if Y comes from a different pack, it may get pushed further back in
> the MIDX pseudo-pack order. So in that case the assembled pack may list
> X before Y, in which case X cannot be an OFS_DELTA of Y, since offset
> deltas require that the base object appears first.
That is what we have always done even before we started bitmap based
optimization. If we happen to write Y before X, we consider doing
ofs-delta for X, but otherwise we do ref-delta for X. We do reorder
fairly late in the pipeline when we notice that X that we are about
to write out depends on Y that we haven't emitted to avoid this,
though. All of that the bitmap-based optimization code path should
be able to imitate, I would think.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
2024-10-10 20:20 ` Junio C Hamano
@ 2024-10-10 20:32 ` Taylor Blau
2024-10-11 7:54 ` Jeff King
1 sibling, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-10-10 20:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Elijah Newren, Jeff King, Patrick Steinhardt
On Thu, Oct 10, 2024 at 01:20:06PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >> So when you pick the copy of Y out of another pack, what's so
> >> different? After emitting Y to the resulting pack stream (and
> >> remembering where in the packstream you did so), when it is X's turn
> >> to be emitted, shouldn't you be able to compute the distance in the
> >> resulting packstream to represent X as an ofs-delta against Y, which
> >> should already be happening when you had both X and Y in the same
> >> original pack?
> >
> > Good question. The difference is that if you're reusing X and Y from
> > same pack, you know that Y occurs some number of bytes *before* X in the
> > resulting pack.
> >
> > But if Y comes from a different pack, it may get pushed further back in
> > the MIDX pseudo-pack order. So in that case the assembled pack may list
> > X before Y, in which case X cannot be an OFS_DELTA of Y, since offset
> > deltas require that the base object appears first.
>
> That is what we have always done even before we started bitmap based
> optimization. If we happen to write Y before X, we consider doing
> ofs-delta for X, but otherwise we do ref-delta for X. We do reorder
> fairly late in the pipeline when we notice that X that we are about
> to write out depends on Y that we haven't emitted to avoid this,
> though. All of that the bitmap-based optimization code path should
> be able to imitate, I would think.
I see..., so yeah: before this patch series we would have not processed
X via the bitmap-powered pack reuse mechanism, and after this series we
will.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
2024-10-10 20:20 ` Junio C Hamano
2024-10-10 20:32 ` Taylor Blau
@ 2024-10-11 7:54 ` Jeff King
2024-10-11 8:01 ` Jeff King
1 sibling, 1 reply; 31+ messages in thread
From: Jeff King @ 2024-10-11 7:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Taylor Blau, git, Elijah Newren, Patrick Steinhardt
On Thu, Oct 10, 2024 at 01:20:06PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >> So when you pick the copy of Y out of another pack, what's so
> >> different? After emitting Y to the resulting pack stream (and
> >> remembering where in the packstream you did so), when it is X's turn
> >> to be emitted, shouldn't you be able to compute the distance in the
> >> resulting packstream to represent X as an ofs-delta against Y, which
> >> should already be happening when you had both X and Y in the same
> >> original pack?
> >
> > Good question. The difference is that if you're reusing X and Y from
> > same pack, you know that Y occurs some number of bytes *before* X in the
> > resulting pack.
> >
> > But if Y comes from a different pack, it may get pushed further back in
> > the MIDX pseudo-pack order. So in that case the assembled pack may list
> > X before Y, in which case X cannot be an OFS_DELTA of Y, since offset
> > deltas require that the base object appears first.
>
> That is what we have always done even before we started bitmap based
> optimization. If we happen to write Y before X, we consider doing
> ofs-delta for X, but otherwise we do ref-delta for X. We do reorder
> fairly late in the pipeline when we notice that X that we are about
> to write out depends on Y that we haven't emitted to avoid this,
> though. All of that the bitmap-based optimization code path should
> be able to imitate, I would think.
A small nitpick on your final sentence here. As you note, we do not ever
write Y before X, because compute_write_order() always places bases
before their deltas in the output pack (and we do not allow cycles of
deltas, of course).
And even with bitmaps we'd do the same, as long as those objects are
both fed to the regular pack-writing machinery.
It is only the special verbatim-pack-reuse[1] code that is trying to
blit out the start of an existing pack that is affected. And in theory
there it _could_ try to reorder to produce an ofs delta, but in practice
the whole point is to take a single very cheap pass over the start of
the pack (or multiple packs in the case of the midx). Doing any
reordering would be counterproductive to the "cheap" adjective there (it
does not even keep a list of object ids it is sending), so we are better
to leave those objects for the regular output code (which does make such
a list).
Taylor's series introduces an in-between where we choose not to reorder,
but switch to REF_DELTA. That is still cheap on CPU on the generating
side, though the resulting pack is slightly larger.
-Peff
[1] I wish we had good names to distinguish the various cases, because
the term "reuse" is kind of overloaded. The "slower" regular
object-sending path may still reuse verbatim bytes found in an
on-disk path. But this "blit out matching parts of a pack without
otherwise considering the objects" feature happens outside of that.
We called it "pack reuse" back in 2013, but that was not a good name
even then. I don't have a good suggestion, though.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
2024-10-11 7:54 ` Jeff King
@ 2024-10-11 8:01 ` Jeff King
2024-10-11 16:23 ` Junio C Hamano
0 siblings, 1 reply; 31+ messages in thread
From: Jeff King @ 2024-10-11 8:01 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Taylor Blau, git, Elijah Newren, Patrick Steinhardt
On Fri, Oct 11, 2024 at 03:54:51AM -0400, Jeff King wrote:
> [1] I wish we had good names to distinguish the various cases, because
> the term "reuse" is kind of overloaded. The "slower" regular
> object-sending path may still reuse verbatim bytes found in an
> on-disk path. But this "blit out matching parts of a pack without
> otherwise considering the objects" feature happens outside of that.
> We called it "pack reuse" back in 2013, but that was not a good name
> even then. I don't have a good suggestion, though.
Actually, confusing things more, there are really _three_ layers of
reuse:
1. At the beginning of a pack, we can blit out the bytes for objects
starting from the beginning of the pack that are being sent (we
know any delta will be satisfied since its base comes before it).
2. After that, we process objects one by one, but do so very cheaply
by just deciding if we can blit them out one by one, fixing up
delta base offsets to account for gaps.
3. Otherwise, we generate an object_entry struct in packing_data for
them, try to find new deltas, and so on. We may then reuse the
on-disk bytes after deciding they're suitable.
We call all of these "reuse", and certainly both (1) and (2) are "pack
reuse", but I think that term is sufficiently vague that it could apply
to (3) as well.
-Peff
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 07/11] write_reused_pack_one(): translate bit positions directly
2024-10-09 20:31 ` [PATCH 07/11] write_reused_pack_one(): translate bit positions directly Taylor Blau
@ 2024-10-11 8:16 ` Jeff King
2024-11-04 20:36 ` Taylor Blau
0 siblings, 1 reply; 31+ messages in thread
From: Jeff King @ 2024-10-11 8:16 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Wed, Oct 09, 2024 at 04:31:21PM -0400, Taylor Blau wrote:
> +static uint32_t bitmap_to_pack_pos(struct packed_git *reuse_packfile,
> + size_t pos)
> +{
> + if (bitmap_is_midx(bitmap_git)) {
> + /*
> + * When doing multi-pack reuse on a
> + * non-preferred pack, translate bit positions
> + * from the MIDX pseudo-pack order back to their
> + * pack-relative positions before attempting
> + * reuse.
> + */
> + struct multi_pack_index *m = bitmap_midx(bitmap_git);
> + uint32_t midx_pos, pack_pos;
> + off_t pack_ofs;
> +
> + if (!m)
> + BUG("non-zero bitmap position without MIDX");
The text of this BUG() seems weird: we haven't asserted a non-zero
bitmap position. We're really only checking that bitmap_is_midx() and
bitmap_midx() agree that there is a midx. I was going to suggest that
the former could be implemented with a NULL-check on the latter, but
really, that is already how it works (except that it accesses
bitmap_index->midx directly).
So yes, it truly would be a surprising BUG() to see them disagree. :)
I do not mind keeping the BUG() there if you want to be extra careful,
but I just found the message text confusing.
Ah...hmm. This is all being copied from the earlier function. So I think
the culprit may be patch 6, which swaps:
if (reuse_packfile->bitmap_pos)
for:
if (bitmap_is_midx(bitmap_git))
which is what makes the BUG() text confusing. But then, what about this:
> + } else {
> + /*
> + * Can use bit positions directly, even for MIDX
> + * bitmaps. See comment in try_partial_reuse()
> + * for why.
> + */
> + return pos;
> + }
> +}
This "even for MIDX" is not really accurate, as we know this else block
is for the non-midx case. Are we losing the optimization that the first
pack in the midx can be treated the same as the single-pack case (we
know that its pack positions and the start of the midx bit positions are
identical, which is what the comment it mentions explains)?
-Peff
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 08/11] t5332: enable OFS_DELTAs via test_pack_objects_reused
2024-10-09 20:31 ` [PATCH 08/11] t5332: enable OFS_DELTAs via test_pack_objects_reused Taylor Blau
@ 2024-10-11 8:19 ` Jeff King
2024-11-04 20:50 ` Taylor Blau
0 siblings, 1 reply; 31+ messages in thread
From: Jeff King @ 2024-10-11 8:19 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Wed, Oct 09, 2024 at 04:31:24PM -0400, Taylor Blau wrote:
> Back when test_pack_objects_reused was introduced via commit
> 7c01878eeb (t5332-multi-pack-reuse.sh: extract pack-objects helper
> functions, 2024-02-05), we converted bare pack-objects invocations
> into one of two wrapped variants, either test_pack_objects_reused or
> test_pack_objects_reused_all.
>
> The latter passes `--delta-base-offset`, allowing pack-objects to
> generate OFS_DELTAs in its output pack. But the former does not, for
> no good reason.
>
> As we do not want to convert OFS_DELTAs to REF_DELTAs unnecessarily,
> let's unify these two and pass `--delta-base-offset` to both.
I think that matches what happens in the real world. I am puzzled that
your BUG() instrumenting turns up some conversion cases. Why are we
still converting in those cases?
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 0fc0680b402..0f1b22b8674 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
You need to indent or otherwise comment-out this diff. Otherwise "git
am" will pick it up as the start of the actual diff, adding the bogus
BUG() call to the applied patch (and dropping the rest of your commit
message).
-Peff
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 09/11] pack-bitmap: enable cross-pack delta reuse
2024-10-09 20:31 ` [PATCH 09/11] pack-bitmap: enable cross-pack delta reuse Taylor Blau
@ 2024-10-11 8:31 ` Jeff King
2024-11-04 21:00 ` Taylor Blau
0 siblings, 1 reply; 31+ messages in thread
From: Jeff King @ 2024-10-11 8:31 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Wed, Oct 09, 2024 at 04:31:28PM -0400, Taylor Blau wrote:
> In order to do this, the pack-reuse code within pack-bitmap.c marks
> bits in a separate bitmap called 'reuse_as_ref_delta'. Objects whose
> bits are marked in that bitmap must be converted from OFS_DELTAs to
> REF_DELTAs.
>
> To mark bits in that bitmap, we adjust find_base_bitmap_pos() to
> return the bitmap position of any delta object's base regardless of
> whether or not it came from the same pack. This is done by:
>
> 1. First converting the base object's into a pack position (via
> `offset_to_pack_pos()`).
>
> 2. Then converting from pack position into into lexical order (via
> `pack_pos_to_index()`).
>
> 3. Then into an object ID (via `nth_packed_object_id()`).
>
> 4. Then into a position in the MIDX's lexical order of object IDs
> (via `bsearch_midx()`).
>
> 5. And finally into a position in the MIDX's pseudo-pack order (via
> `midx_pair_to_pack_pos()`).
>
> If we can find that base object in the MIDX, then we use its position
> in the MIDX's pseudo-pack order to determine whether or not it was
> selected from the same pack. If it is, no adjustment is necessary.
> Otherwise, we mark the delta object's position in the new
> `reuse_as_ref_delta` bitmap, and convert accordingly from within
> `write_reused_pack_one()`.
OK, that makes sense. It does feel like a non-trivial amount of work to
do for each delta we're going to (potentially) reuse from a midx'd pack.
Can we recognize the common case that the base is in the same pack and
also being sent/reused without doing the full conversion to oid and the
resulting bsearch?
> @@ -1182,10 +1188,24 @@ static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile,
> if (pos >= end)
> return reuse_packfile->bitmap_pos / BITS_IN_EWORD;
>
> - while (pos < end &&
> - reuse_packfile_bitmap->words[pos / BITS_IN_EWORD] == (eword_t)~0)
> + while (pos < end) {
> + size_t wpos = pos / BITS_IN_EWORD;
> + eword_t reuse;
> +
> + reuse = reuse_packfile_bitmap->words[wpos];
> + if (reuse_as_ref_delta_packfile_bitmap) {
> + /*
> + * Can't reuse verbatim any objects which need
> + * to be first rewritten as REF_DELTAs.
> + */
> + reuse &= ~reuse_as_ref_delta_packfile_bitmap->words[wpos];
> + }
> +
> + if (reuse != (eword_t)~0)
> + break;
> +
> pos += BITS_IN_EWORD;
> -
> + }
This is accessing reuse_as_ref_delta_packfile_bitmap->words directly
using pos/end as limits. But those come from reuse_packfile_bitmap. Are
we guaranteed to have zero-extended the reuse_as_ref_delta bitmap as far
as the original went?
Could we just be calling bitmap_get() here, which would do the length
check for us? Though I guess we would miss out on some whole-word magic
it is doing. So maybe we need to just do that length check ourselves.
-Peff
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 10/11] pack-bitmap.c: record whether the result was filtered
2024-10-09 20:31 ` [PATCH 10/11] pack-bitmap.c: record whether the result was filtered Taylor Blau
@ 2024-10-11 8:35 ` Jeff King
2024-11-04 21:01 ` Taylor Blau
0 siblings, 1 reply; 31+ messages in thread
From: Jeff King @ 2024-10-11 8:35 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Wed, Oct 09, 2024 at 04:31:31PM -0400, Taylor Blau wrote:
> In a following commit, we will prepare to mark objects for reuse which
> are stored as deltas, but whose base object wasn't included in the
> output pack (e.g., because the client already has it).
>
> We can't, however, take advantage of that optimization when the
> traversal removed objects from the result due to either (a) object
> filtering, or (b) --unpacked.
>
> That's because in either case, we can't rely on the principle that "if
> an object appears in the 'haves' bitmap, that the client already has
> that object", since they may not have it as a result of filtering.
I think this is a reasonable precaution, though it does make me wonder
if the non-reuse code paths are so careful. That is, if we have object Y
which is a delta against object X, but we know the other side _could_
have X (because it's reachable from some commit they claim to have),
would we ever send a thin delta against X?
I don't recall seeing any protections for that, though I also wouldn't
be too surprised if it somehow just works out because we never figure
out whether they have X or not. :-/
I wonder, though: should thin deltas just be turned off entirely when
filtering is in play?
Likewise for --unpacked (though in practice I think it would never be
used with --thin, as it is about generating new on-disk packs).
-Peff
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
` (11 preceding siblings ...)
2024-10-10 16:46 ` [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Junio C Hamano
@ 2024-10-11 8:38 ` Jeff King
2024-11-19 23:08 ` Taylor Blau
12 siblings, 1 reply; 31+ messages in thread
From: Jeff King @ 2024-10-11 8:38 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Wed, Oct 09, 2024 at 04:30:52PM -0400, Taylor Blau wrote:
> The first optimization (cross-pack deltas) should help clones and
> fetches, but the second optimization (thin deltas) will only help
> fetches, since the 'haves' bitmap will necessarily be empty for
> clones.
>
> Of course, REF_DELTAs have a less compact representation than
> OFS_DELTAs, so the resulting packs will trade off some CPU time for a
> slightly larger pack. But we're only talking about a handful of extra
> bytes, and network bandwidth is pretty cheap, so I think the
> optimization is worthwhile here too.
It would be nice to see some numbers, even simulated ones from t/perf.
Of course we'd like to show off any reduced server CPU for generating a
fetch response. But I'd also like to see if the extra steps to find the
cross-pack bases have any measurable negative effect (so the ideal there
would be a big midx repo that doesn't have a lot of those bases, as it
would not be helped much by the optimization and would be hurt by the
time spent trying to apply it).
-Peff
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
2024-10-11 8:01 ` Jeff King
@ 2024-10-11 16:23 ` Junio C Hamano
0 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2024-10-11 16:23 UTC (permalink / raw)
To: Jeff King; +Cc: Taylor Blau, git, Elijah Newren, Patrick Steinhardt
Jeff King <peff@peff.net> writes:
> On Fri, Oct 11, 2024 at 03:54:51AM -0400, Jeff King wrote:
>
>> [1] I wish we had good names to distinguish the various cases, because
>> the term "reuse" is kind of overloaded. The "slower" regular
>> object-sending path may still reuse verbatim bytes found in an
>> on-disk path. But this "blit out matching parts of a pack without
>> otherwise considering the objects" feature happens outside of that.
>> We called it "pack reuse" back in 2013, but that was not a good name
>> even then. I don't have a good suggestion, though.
>
> Actually, confusing things more, there are really _three_ layers of
> reuse:
>
> 1. At the beginning of a pack, we can blit out the bytes for objects
> starting from the beginning of the pack that are being sent (we
> know any delta will be satisfied since its base comes before it).
Yes, I wouldn't be worried about that one. The data encoded as an
ofs-delta in this section already point at their base correctly in
the original pack, and in the resulting pack.
> 2. After that, we process objects one by one, but do so very cheaply
> by just deciding if we can blit them out one by one, fixing up
> delta base offsets to account for gaps.
This is the part I said "we have to remember where the base was emitted
and subtract it from the offset of the delta anyway even if we are
reusing delta from the same pack, so what do we need a separate code path
for this?" in my initial response.
I guess, "fixing up" could be done by using the difference between
offsets in the original pack for this step, which would be an
unfortunate design that prevents it from getting reused.
> 3. Otherwise, we generate an object_entry struct in packing_data for
> them, try to find new deltas, and so on. We may then reuse the
> on-disk bytes after deciding they're suitable.
It is a bit unfortunate, if we were to trust the existing delta base
selection in the pack like we did since Feb 2006 [*], we should be
omitting the "try to find new deltas" step. Perhaps that comes for
free as the object_entry knows that our object has a delta_base?
> We call all of these "reuse", and certainly both (1) and (2) are "pack
> reuse", but I think that term is sufficiently vague that it could apply
> to (3) as well.
>
> -Peff
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 07/11] write_reused_pack_one(): translate bit positions directly
2024-10-11 8:16 ` Jeff King
@ 2024-11-04 20:36 ` Taylor Blau
0 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-11-04 20:36 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Oct 11, 2024 at 04:16:15AM -0400, Jeff King wrote:
> Ah...hmm. This is all being copied from the earlier function. So I think
> the culprit may be patch 6, which swaps:
>
> if (reuse_packfile->bitmap_pos)
>
> for:
>
> if (bitmap_is_midx(bitmap_git))
>
> which is what makes the BUG() text confusing. But then, what about this:
>
> > + } else {
> > + /*
> > + * Can use bit positions directly, even for MIDX
> > + * bitmaps. See comment in try_partial_reuse()
> > + * for why.
> > + */
> > + return pos;
> > + }
> > +}
>
> This "even for MIDX" is not really accurate, as we know this else block
> is for the non-midx case. Are we losing the optimization that the first
> pack in the midx can be treated the same as the single-pack case (we
> know that its pack positions and the start of the midx bit positions are
> identical, which is what the comment it mentions explains)?
Great catch.
We indeed lost that optimization when converting "if
(reuse_packfile->bitmap_pos)" to "if (bitmap_is_midx(bitmap_git))".
Let's restore that by keeping the conditional unchanged, which:
- makes the BUG() make sense as written, and
- preserves the optimization where the first pack in a MIDX can be
treated the same as if it came from a single-pack bitmap, and
bypass the bit position translations.
Thanks for spotting.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 08/11] t5332: enable OFS_DELTAs via test_pack_objects_reused
2024-10-11 8:19 ` Jeff King
@ 2024-11-04 20:50 ` Taylor Blau
0 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-11-04 20:50 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Oct 11, 2024 at 04:19:39AM -0400, Jeff King wrote:
> On Wed, Oct 09, 2024 at 04:31:24PM -0400, Taylor Blau wrote:
>
> > Back when test_pack_objects_reused was introduced via commit
> > 7c01878eeb (t5332-multi-pack-reuse.sh: extract pack-objects helper
> > functions, 2024-02-05), we converted bare pack-objects invocations
> > into one of two wrapped variants, either test_pack_objects_reused or
> > test_pack_objects_reused_all.
> >
> > The latter passes `--delta-base-offset`, allowing pack-objects to
> > generate OFS_DELTAs in its output pack. But the former does not, for
> > no good reason.
> >
> > As we do not want to convert OFS_DELTAs to REF_DELTAs unnecessarily,
> > let's unify these two and pass `--delta-base-offset` to both.
>
> I think that matches what happens in the real world. I am puzzled that
> your BUG() instrumenting turns up some conversion cases. Why are we
> still converting in those cases?
These are cases where we're calling pack-objects directly without
passing the --delta-base-offset flag, so all deltas get converted into
REF_DELTAs.
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 0fc0680b402..0f1b22b8674 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
>
> You need to indent or otherwise comment-out this diff. Otherwise "git
> am" will pick it up as the start of the actual diff, adding the bogus
> BUG() call to the applied patch (and dropping the rest of your commit
> message).
Oops, of course. Thanks for spotting again.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 09/11] pack-bitmap: enable cross-pack delta reuse
2024-10-11 8:31 ` Jeff King
@ 2024-11-04 21:00 ` Taylor Blau
0 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-11-04 21:00 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Oct 11, 2024 at 04:31:34AM -0400, Jeff King wrote:
> On Wed, Oct 09, 2024 at 04:31:28PM -0400, Taylor Blau wrote:
>
> > In order to do this, the pack-reuse code within pack-bitmap.c marks
> > bits in a separate bitmap called 'reuse_as_ref_delta'. Objects whose
> > bits are marked in that bitmap must be converted from OFS_DELTAs to
> > REF_DELTAs.
> >
> > To mark bits in that bitmap, we adjust find_base_bitmap_pos() to
> > return the bitmap position of any delta object's base regardless of
> > whether or not it came from the same pack. This is done by:
> >
> > 1. First converting the base object's into a pack position (via
> > `offset_to_pack_pos()`).
> >
> > 2. Then converting from pack position into into lexical order (via
> > `pack_pos_to_index()`).
> >
> > 3. Then into an object ID (via `nth_packed_object_id()`).
> >
> > 4. Then into a position in the MIDX's lexical order of object IDs
> > (via `bsearch_midx()`).
> >
> > 5. And finally into a position in the MIDX's pseudo-pack order (via
> > `midx_pair_to_pack_pos()`).
> >
> > If we can find that base object in the MIDX, then we use its position
> > in the MIDX's pseudo-pack order to determine whether or not it was
> > selected from the same pack. If it is, no adjustment is necessary.
> > Otherwise, we mark the delta object's position in the new
> > `reuse_as_ref_delta` bitmap, and convert accordingly from within
> > `write_reused_pack_one()`.
>
> OK, that makes sense. It does feel like a non-trivial amount of work to
> do for each delta we're going to (potentially) reuse from a midx'd pack.
> Can we recognize the common case that the base is in the same pack and
> also being sent/reused without doing the full conversion to oid and the
> resulting bsearch?
I don't think it ends up saving you anything if you don't find anything
matching the pack/offset pair in the MIDX. If you perform that lookup
with bsearch_midx() and get nothing back, then you have to take the
slower path above anyway.
My figuring here was that it would be better to uniformly take a
slightly slower path instead of taking a hopefully-faster path which
might fail, only to then go back to the slower path on top.
Of course, you could do both, or apply some heuristics like avoiding the
cross-pack lookup if you know you're in the preferred pack, etc. I'm not
sure how much it is worth doing so, TBH.
> > @@ -1182,10 +1188,24 @@ static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile,
> > if (pos >= end)
> > return reuse_packfile->bitmap_pos / BITS_IN_EWORD;
> >
> > - while (pos < end &&
> > - reuse_packfile_bitmap->words[pos / BITS_IN_EWORD] == (eword_t)~0)
> > + while (pos < end) {
> > + size_t wpos = pos / BITS_IN_EWORD;
> > + eword_t reuse;
> > +
> > + reuse = reuse_packfile_bitmap->words[wpos];
> > + if (reuse_as_ref_delta_packfile_bitmap) {
> > + /*
> > + * Can't reuse verbatim any objects which need
> > + * to be first rewritten as REF_DELTAs.
> > + */
> > + reuse &= ~reuse_as_ref_delta_packfile_bitmap->words[wpos];
> > + }
> > +
> > + if (reuse != (eword_t)~0)
> > + break;
> > +
> > pos += BITS_IN_EWORD;
> > -
> > + }
>
> This is accessing reuse_as_ref_delta_packfile_bitmap->words directly
> using pos/end as limits. But those come from reuse_packfile_bitmap. Are
> we guaranteed to have zero-extended the reuse_as_ref_delta bitmap as far
> as the original went?
Yeah, we know this is OK because both are allocated with the same size
in reuse_partial_packfile_from_bitmap(), where the relevant portion is:
word_alloc = objects_nr / BITS_IN_EWORD;
if (objects_nr % BITS_IN_EWORD)
word_alloc++;
reuse = bitmap_word_alloc(word_alloc);
reuse_as_ref_delta = bitmap_word_alloc(word_alloc);
all of the bitmap_set() operations on the former are bounded in
try_partial_reuse(), but adding a length check can be done here as an
extra safety measure.
> Could we just be calling bitmap_get() here, which would do the length
> check for us? Though I guess we would miss out on some whole-word magic
> it is doing. So maybe we need to just do that length check ourselves.
Yeah, we don't use bitmap_get() because we want to access the whole word
at a time.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 10/11] pack-bitmap.c: record whether the result was filtered
2024-10-11 8:35 ` Jeff King
@ 2024-11-04 21:01 ` Taylor Blau
0 siblings, 0 replies; 31+ messages in thread
From: Taylor Blau @ 2024-11-04 21:01 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Oct 11, 2024 at 04:35:45AM -0400, Jeff King wrote:
> I wonder, though: should thin deltas just be turned off entirely when
> filtering is in play?
>
> Likewise for --unpacked (though in practice I think it would never be
> used with --thin, as it is about generating new on-disk packs).
They should; that's what this commit is setting us up to do in the next
one, which does not allocate the "convert to REF_DELTAs" bitmap when
filtering is in play.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
2024-10-11 8:38 ` Jeff King
@ 2024-11-19 23:08 ` Taylor Blau
2024-11-19 23:34 ` Taylor Blau
0 siblings, 1 reply; 31+ messages in thread
From: Taylor Blau @ 2024-11-19 23:08 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Oct 11, 2024 at 04:38:38AM -0400, Jeff King wrote:
> On Wed, Oct 09, 2024 at 04:30:52PM -0400, Taylor Blau wrote:
>
> > The first optimization (cross-pack deltas) should help clones and
> > fetches, but the second optimization (thin deltas) will only help
> > fetches, since the 'haves' bitmap will necessarily be empty for
> > clones.
> >
> > Of course, REF_DELTAs have a less compact representation than
> > OFS_DELTAs, so the resulting packs will trade off some CPU time for a
> > slightly larger pack. But we're only talking about a handful of extra
> > bytes, and network bandwidth is pretty cheap, so I think the
> > optimization is worthwhile here too.
>
> It would be nice to see some numbers, even simulated ones from t/perf.
> Of course we'd like to show off any reduced server CPU for generating a
> fetch response. But I'd also like to see if the extra steps to find the
> cross-pack bases have any measurable negative effect (so the ideal there
> would be a big midx repo that doesn't have a lot of those bases, as it
> would not be helped much by the optimization and would be hurt by the
> time spent trying to apply it).
I put together some loose numbers for this, and there is a definite
measurable slowdown imposed by this series.
The setup is getting a fresh copy of the kernel, and then repacking it
with:
$ git repack -da --cruft --write-midx --write-bitmap-index \
--max-pack-size 1G --max-cruft-size 1G
, which on my machine produces the following pack structure:
$ for idx in .git/objects/pack/*.idx
do
echo $(basename $idx) \
$(git show-index <$idx | wc -l)
done | sort -rnk2
pack-8dc61c00e623765c54cbd332f31795bad7edf142.idx 3385859
pack-d84e88715cde55d4d1b09afb23254617a123e668.idx 2064396
pack-cd408bebd5c02719df7621941d92415f4dbb313c.idx 1805431
pack-b11c6ac57f5ecf338314ee59134b0d724d257494.idx 1036443
pack-a42cc1b409940d7e38a4673bca1150d320392bf0.idx 819701
pack-93d255a1000f438a77528907c480c06b277e225d.idx 755019
pack-600621b90dc29c761ee168cc4237b2ff2956b0e8.idx 609754
Generating a full clone[^1] with multi-pack reuse enabled (based on the tip
of tb/multi-pack-reuse-dupfix) gives me the following timings:
13.94s user 0.27s system 99% cpu 14.207 total
13.93s user 0.30s system 99% cpu 14.223 total
13.90s user 0.33s system 99% cpu 14.229 total
After applying the series, I get:
14.94s user 0.33s system 99% cpu 15.279 total
14.92s user 0.32s system 99% cpu 15.244 total
14.87s user 0.40s system 99% cpu 15.280 total
Or around a ~5% slowdown. I think what's really killing us is all of the
back and forth in pack-bitmap.c::find_base_bitmap_pos(), where we have
to:
- Find the pack-relative position of the base object given its offset.
- Convert that pack-relative position into a lexical position (still
relative to the source pack).
- Convert that lexical position into an object ID.
- Call bsearch_midx() to see if we have a copy of that object, and
record its MIDX-relative lexical position.
- Convert the MIDX-relative lexical position into its pseudo-pack
position.
I think that the first four of those steps are unavoidable. But I think
we can do better on the last step if we compute a forward index from
MIDX lexical position to pseudo-pack position.
A cheap way to run that experiment is to just add a temporary field into
the MIDX to store that mapping, compute it at runtime, and then deduct
the time it took to compute that mapping (presumably you'd do it during
MIDX creation, and store it in a chunk, etc.).
Here's some code to do that:
--- 8< ---
diff --git a/midx.c b/midx.c
index ca98bfd7c64..2bd892f899c 100644
--- a/midx.c
+++ b/midx.c
@@ -988,3 +988,18 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
return verify_midx_error;
}
+
+void midx_populate_forward_index(struct multi_pack_index *m)
+{
+ uint32_t i;
+
+ ALLOC_ARRAY(m->forward_idx, m->num_objects);
+
+ trace2_region_enter("midx", "populate_forward_index", the_repository);
+
+ for (i = 0; i < m->num_objects; i++)
+ if (midx_to_pack_pos(m, i, m->forward_idx + i) < 0)
+ BUG("oops");
+
+ trace2_region_leave("midx", "populate_forward_index", the_repository);
+}
diff --git a/midx.h b/midx.h
index 42d4f8d149e..1ef755fe45b 100644
--- a/midx.h
+++ b/midx.h
@@ -65,6 +65,8 @@ struct multi_pack_index {
const unsigned char *chunk_revindex;
size_t chunk_revindex_len;
+ uint32_t *forward_idx;
+
struct multi_pack_index *base_midx;
uint32_t num_objects_in_base;
uint32_t num_packs_in_base;
@@ -74,6 +76,8 @@ struct multi_pack_index {
char object_dir[FLEX_ARRAY];
};
+void midx_populate_forward_index(struct multi_pack_index *m);
+
#define MIDX_PROGRESS (1 << 0)
#define MIDX_WRITE_REV_INDEX (1 << 1)
#define MIDX_WRITE_BITMAP (1 << 2)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index e8094453ca3..1eb72347c02 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2092,9 +2092,13 @@ static int find_base_bitmap_pos(struct bitmap_index *bitmap_git,
if (!bsearch_midx(&base_oid, bitmap_git->midx,
&base_midx_pos))
return -1;
+#if 0
if (midx_to_pack_pos(bitmap_git->midx, base_midx_pos,
base_bitmap_pos) < 0)
return -1;
+#else
+ *base_bitmap_pos = bitmap_git->midx->forward_idx[base_midx_pos];
+#endif
} else {
/*
* We assume delta dependencies always point backwards.
@@ -2316,6 +2320,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
assert(result);
load_reverse_index(r, bitmap_git);
+ midx_populate_forward_index(bitmap_git->midx);
if (!bitmap_is_midx(bitmap_git) || !bitmap_git->midx->chunk_bitmapped_packs)
multi_pack_reuse = 0;
--- >8 ---
Then when running the same command, we get results that are quite
encouraging. The runtime jumps to 24.213 seconds, which is ~9.73 seconds
slower than the average of the baseline measurements. But it takes
~10.418 seconds on my machine to compute the forward index. So it's
really around 688ms *faster* than the baseline, despite doing a little
more work.
The fact that we can improve on the baseline rather than just meet it
suggests that implementing this forward index may have some benefit
outside of just this series.
Thanks,
Taylor
[^1]: The setup here is:
$ git for-each-ref --format='%(objectname)' refs/heads refs/tags >in
$ git pack-objects --stdout --delta-base-offset --use-bitmap-index --revs <in >/dev/null
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
2024-11-19 23:08 ` Taylor Blau
@ 2024-11-19 23:34 ` Taylor Blau
2024-12-18 12:57 ` Jeff King
0 siblings, 1 reply; 31+ messages in thread
From: Taylor Blau @ 2024-11-19 23:34 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Tue, Nov 19, 2024 at 06:08:24PM -0500, Taylor Blau wrote:
> Then when running the same command, we get results that are quite
> encouraging. The runtime jumps to 24.213 seconds, which is ~9.73 seconds
> slower than the average of the baseline measurements. But it takes
> ~10.418 seconds on my machine to compute the forward index. So it's
> really around 688ms *faster* than the baseline, despite doing a little
> more work.
Sorry, there is a much quicker way to generate the forward index at
runtime, which is the following:
--- 8< ---
diff --git a/midx.c b/midx.c
index 67e0d64004..1023bfa51a 100644
--- a/midx.c
+++ b/midx.c
@@ -989,3 +989,17 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
return verify_midx_error;
}
+
+void midx_populate_forward_index(struct multi_pack_index *m)
+{
+ uint32_t i;
+
+ ALLOC_ARRAY(m->forward_idx, m->num_objects);
+
+ trace2_region_enter("midx", "populate_forward_index", the_repository);
+
+ for (i = 0; i < m->num_objects; i++)
+ m->forward_idx[pack_pos_to_midx(m, i)] = i;
+
+ trace2_region_leave("midx", "populate_forward_index", the_repository);
+}
--- >8 ---
I don't know if this makes as much of a difference for regular
multi-pack reuse, since there we're only making calls to
midx_pair_to_pack_pos(), which already uses midx_pack_order_cmp(), so
there isn't any additional translation to do.
Thanks,
Taylor
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
2024-11-19 23:34 ` Taylor Blau
@ 2024-12-18 12:57 ` Jeff King
0 siblings, 0 replies; 31+ messages in thread
From: Jeff King @ 2024-12-18 12:57 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Tue, Nov 19, 2024 at 06:34:12PM -0500, Taylor Blau wrote:
> On Tue, Nov 19, 2024 at 06:08:24PM -0500, Taylor Blau wrote:
> > Then when running the same command, we get results that are quite
> > encouraging. The runtime jumps to 24.213 seconds, which is ~9.73 seconds
> > slower than the average of the baseline measurements. But it takes
> > ~10.418 seconds on my machine to compute the forward index. So it's
> > really around 688ms *faster* than the baseline, despite doing a little
> > more work.
>
> Sorry, there is a much quicker way to generate the forward index at
> runtime, which is the following:
> [...]
Neat. I didn't see timings for this method, but I'd assume it's quite
fast. So if it shaves off the same 688ms, I'd expect it to be an overall
win. The code itself looks reasonable to me.
I think when you re-post this series it might make sense to add a t/perf
script to demonstrate these timings (not just what we're discussing
here, but also the overall speedup we're hoping to achieve).
-Peff
^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2024-12-18 12:57 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
2024-10-09 20:31 ` [PATCH 01/11] pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()` Taylor Blau
2024-10-09 20:31 ` [PATCH 02/11] pack-bitmap.c: avoid unnecessary `offset_to_pack_pos()` Taylor Blau
2024-10-09 20:31 ` [PATCH 03/11] pack-bitmap.c: delay calling 'offset_to_pack_pos()' Taylor Blau
2024-10-09 20:31 ` [PATCH 04/11] pack-bitmap.c: compare `base_offset` to `delta_obj_offset` Taylor Blau
2024-10-09 20:31 ` [PATCH 05/11] pack-bitmap.c: extract `find_base_bitmap_pos()` Taylor Blau
2024-10-09 20:31 ` [PATCH 06/11] pack-bitmap: drop `from_midx` field from `bitmapped_pack` Taylor Blau
2024-10-09 20:31 ` [PATCH 07/11] write_reused_pack_one(): translate bit positions directly Taylor Blau
2024-10-11 8:16 ` Jeff King
2024-11-04 20:36 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 08/11] t5332: enable OFS_DELTAs via test_pack_objects_reused Taylor Blau
2024-10-11 8:19 ` Jeff King
2024-11-04 20:50 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 09/11] pack-bitmap: enable cross-pack delta reuse Taylor Blau
2024-10-11 8:31 ` Jeff King
2024-11-04 21:00 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 10/11] pack-bitmap.c: record whether the result was filtered Taylor Blau
2024-10-11 8:35 ` Jeff King
2024-11-04 21:01 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 11/11] pack-bitmap: enable reusing deltas with base objects in 'haves' bitmap Taylor Blau
2024-10-10 16:46 ` [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Junio C Hamano
2024-10-10 17:07 ` Taylor Blau
2024-10-10 20:20 ` Junio C Hamano
2024-10-10 20:32 ` Taylor Blau
2024-10-11 7:54 ` Jeff King
2024-10-11 8:01 ` Jeff King
2024-10-11 16:23 ` Junio C Hamano
2024-10-11 8:38 ` Jeff King
2024-11-19 23:08 ` Taylor Blau
2024-11-19 23:34 ` Taylor Blau
2024-12-18 12:57 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).