From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 14/24] pack-objects: keep track of `pack_start` for each reuse pack
Date: Thu, 7 Dec 2023 14:13:24 +0100 [thread overview]
Message-ID: <ZXHE9L7iqXQAit_1@tanuki> (raw)
In-Reply-To: <6f4fba861b59f909148775ee64c3ba89afc872b5.1701198172.git.me@ttaylorr.com>
[-- Attachment #1: Type: text/plain, Size: 4093 bytes --]
On Tue, Nov 28, 2023 at 02:08:32PM -0500, Taylor Blau wrote:
> When reusing objects from a pack, we keep track of a set of one or more
> `reused_chunk`s, corresponding to sections of one or more object(s) from
> a source pack that we are reusing. Each chunk contains two pieces of
> information:
>
> - the offset of the first object in the source pack (relative to the
> beginning of the source pack)
> - the difference between that offset, and the corresponding offset in
> the pack we're generating
>
> The purpose of keeping track of these is so that we can patch an
> OFS_DELTAs that cross over a section of the reuse pack that we didn't
> take.
>
> For instance, consider a hypothetical pack as shown below:
>
> (chunk #2)
> __________...
> /
> /
> +--------+---------+-------------------+---------+
> ... | <base> | <other> | (unused) | <delta> | ...
> +--------+---------+-------------------+---------+
> \ /
> \______________/
> (chunk #1)
>
> Suppose that we are sending objects "base", "other", and "delta", and
> that the "delta" object is stored as an OFS_DELTA, and that its base is
> "base". If we don't send any objects in the "(unused)" range, we can't
> copy the delta'd object directly, since its delta offset includes a
> range of the pack that we didn't copy, so we have to account for that
> difference when patching and reassembling the delta.
>
> In order to compute this value correctly, we need to know not only where
> we are in the packfile we're assembling (with `hashfile_total(f)`) but
> also the position of the first byte of the packfile that we are
> currently reusing.
>
> Together, these two allow us to compute the reused chunk's offset
> difference relative to the start of the reused pack, as desired.
Hm. I'm not quite sure I fully understand the motivation here. Is this
something that was broken all along? Why does it become a problem now?
Sorry if I'm missing the obvious here.
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> builtin/pack-objects.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 7682bd65bb..eb8be514d1 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1016,6 +1016,7 @@ static off_t find_reused_offset(off_t where)
>
> static void write_reused_pack_one(struct packed_git *reuse_packfile,
> size_t pos, struct hashfile *out,
> + off_t pack_start,
> struct pack_window **w_curs)
> {
> off_t offset, next, cur;
> @@ -1025,7 +1026,8 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
> offset = pack_pos_to_offset(reuse_packfile, pos);
> next = pack_pos_to_offset(reuse_packfile, pos + 1);
>
> - record_reused_object(offset, offset - hashfile_total(out));
> + record_reused_object(offset,
> + offset - (hashfile_total(out) - pack_start));
>
> cur = offset;
> type = unpack_object_header(reuse_packfile, w_curs, &cur, &size);
> @@ -1095,6 +1097,7 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
>
> static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile,
> struct hashfile *out,
> + off_t pack_start UNUSED,
> struct pack_window **w_curs)
> {
> size_t pos = 0;
> @@ -1126,10 +1129,12 @@ static void write_reused_pack(struct packed_git *reuse_packfile,
> {
> size_t i = 0;
> uint32_t offset;
> + off_t pack_start = hashfile_total(f) - sizeof(struct pack_header);
Given that this patch in its current state doesn't seem to do anything
yet, am I right in assuming that `hashfile_total(f) - sizeof(struct
pack_header)` is always expected to be zero for now?
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2023-12-07 13:13 UTC|newest]
Thread overview: 107+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-28 19:07 [PATCH 00/24] pack-objects: multi-pack verbatim reuse Taylor Blau
2023-11-28 19:07 ` [PATCH 01/24] pack-objects: free packing_data in more places Taylor Blau
2023-11-30 10:18 ` Patrick Steinhardt
2023-11-30 19:08 ` Taylor Blau
2023-11-28 19:07 ` [PATCH 02/24] pack-bitmap-write: deep-clear the `bb_commit` slab Taylor Blau
2023-11-30 10:18 ` Patrick Steinhardt
2023-11-30 19:11 ` Taylor Blau
2023-12-12 7:04 ` Jeff King
2023-12-12 16:48 ` Taylor Blau
2023-11-28 19:08 ` [PATCH 03/24] pack-bitmap: plug leak in find_objects() Taylor Blau
2023-12-12 7:04 ` Jeff King
2023-11-28 19:08 ` [PATCH 04/24] midx: factor out `fill_pack_info()` Taylor Blau
2023-11-30 10:18 ` Patrick Steinhardt
2023-11-30 19:19 ` Taylor Blau
2023-11-28 19:08 ` [PATCH 05/24] midx: implement `DISP` chunk Taylor Blau
2023-11-30 10:18 ` Patrick Steinhardt
2023-11-30 19:27 ` Taylor Blau
2023-12-03 13:15 ` Junio C Hamano
2023-12-05 19:26 ` Taylor Blau
2023-12-09 1:40 ` Junio C Hamano
2023-12-09 2:30 ` Taylor Blau
2023-12-12 8:03 ` Jeff King
2023-12-13 18:28 ` Taylor Blau
2023-12-13 19:20 ` Junio C Hamano
2023-11-28 19:08 ` [PATCH 06/24] midx: implement `midx_locate_pack()` Taylor Blau
2023-11-28 19:08 ` [PATCH 07/24] midx: implement `--retain-disjoint` mode Taylor Blau
2023-11-30 10:18 ` Patrick Steinhardt
2023-11-30 19:29 ` Taylor Blau
2023-12-01 8:02 ` Patrick Steinhardt
2023-11-28 19:08 ` [PATCH 08/24] pack-objects: implement `--ignore-disjoint` mode Taylor Blau
2023-11-30 10:18 ` Patrick Steinhardt
2023-11-30 19:32 ` Taylor Blau
2023-12-01 8:17 ` Patrick Steinhardt
2023-12-01 19:58 ` Taylor Blau
2023-11-28 19:08 ` [PATCH 09/24] repack: implement `--extend-disjoint` mode Taylor Blau
2023-12-07 13:13 ` Patrick Steinhardt
2023-12-07 20:28 ` Taylor Blau
2023-12-08 8:19 ` Patrick Steinhardt
2023-12-08 22:48 ` Taylor Blau
2023-12-11 8:18 ` Patrick Steinhardt
2023-12-11 19:59 ` Taylor Blau
2023-11-28 19:08 ` [PATCH 10/24] pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions Taylor Blau
2023-12-07 13:13 ` Patrick Steinhardt
2023-12-07 20:34 ` Taylor Blau
2023-12-08 8:19 ` Patrick Steinhardt
2023-11-28 19:08 ` [PATCH 11/24] pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature Taylor Blau
2023-12-07 13:13 ` Patrick Steinhardt
2023-12-07 14:36 ` Taylor Blau
2023-11-28 19:08 ` [PATCH 12/24] pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()` Taylor Blau
2023-11-28 19:08 ` [PATCH 13/24] pack-objects: parameterize pack-reuse routines over a single pack Taylor Blau
2023-11-28 19:08 ` [PATCH 14/24] pack-objects: keep track of `pack_start` for each reuse pack Taylor Blau
2023-12-07 13:13 ` Patrick Steinhardt [this message]
2023-12-07 20:43 ` Taylor Blau
2023-11-28 19:08 ` [PATCH 15/24] pack-objects: pass `bitmapped_pack`'s to pack-reuse functions Taylor Blau
2023-11-28 19:08 ` [PATCH 16/24] pack-objects: prepare `write_reused_pack()` for multi-pack reuse Taylor Blau
2023-12-07 13:13 ` Patrick Steinhardt
2023-12-07 20:47 ` Taylor Blau
2023-11-28 19:08 ` [PATCH 17/24] pack-objects: prepare `write_reused_pack_verbatim()` " Taylor Blau
2023-11-28 19:08 ` [PATCH 18/24] pack-objects: include number of packs reused in output Taylor Blau
2023-11-28 19:08 ` [PATCH 19/24] pack-bitmap: prepare to mark objects from multiple packs for reuse Taylor Blau
2023-11-28 19:08 ` [PATCH 20/24] pack-objects: add tracing for various packfile metrics Taylor Blau
2023-11-28 19:08 ` [PATCH 21/24] t/test-lib-functions.sh: implement `test_trace2_data` helper Taylor Blau
2023-11-28 19:08 ` [PATCH 22/24] pack-objects: allow setting `pack.allowPackReuse` to "single" Taylor Blau
2023-11-28 19:08 ` [PATCH 23/24] pack-bitmap: reuse objects from all disjoint packs Taylor Blau
2023-11-28 19:08 ` [PATCH 24/24] t/perf: add performance tests for multi-pack reuse Taylor Blau
2023-11-30 10:18 ` [PATCH 00/24] pack-objects: multi-pack verbatim reuse Patrick Steinhardt
2023-11-30 19:39 ` Taylor Blau
2023-12-01 8:31 ` Patrick Steinhardt
2023-12-01 20:02 ` Taylor Blau
2023-12-04 8:49 ` Patrick Steinhardt
2023-12-12 8:12 ` Jeff King
2023-12-15 15:37 ` Taylor Blau
2023-12-21 11:13 ` Jeff King
2024-01-04 22:22 ` Taylor Blau
2023-12-14 22:23 ` [PATCH v2 00/26] " Taylor Blau
2023-12-14 22:23 ` [PATCH v2 01/26] pack-objects: free packing_data in more places Taylor Blau
2023-12-14 22:23 ` [PATCH v2 02/26] pack-bitmap-write: deep-clear the `bb_commit` slab Taylor Blau
2023-12-14 22:23 ` [PATCH v2 03/26] pack-bitmap: plug leak in find_objects() Taylor Blau
2023-12-14 22:23 ` [PATCH v2 04/26] midx: factor out `fill_pack_info()` Taylor Blau
2023-12-14 22:23 ` [PATCH v2 05/26] midx: implement `BTMP` chunk Taylor Blau
2023-12-14 22:23 ` [PATCH v2 06/26] midx: implement `midx_locate_pack()` Taylor Blau
2023-12-14 22:23 ` [PATCH v2 07/26] pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions Taylor Blau
2023-12-14 22:23 ` [PATCH v2 08/26] ewah: implement `bitmap_is_empty()` Taylor Blau
2023-12-14 22:24 ` [PATCH v2 09/26] pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature Taylor Blau
2023-12-14 22:24 ` [PATCH v2 10/26] pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()` Taylor Blau
2023-12-14 22:24 ` [PATCH v2 11/26] pack-objects: parameterize pack-reuse routines over a single pack Taylor Blau
2023-12-14 22:24 ` [PATCH v2 12/26] pack-objects: keep track of `pack_start` for each reuse pack Taylor Blau
2023-12-14 22:24 ` [PATCH v2 13/26] pack-objects: pass `bitmapped_pack`'s to pack-reuse functions Taylor Blau
2023-12-14 22:24 ` [PATCH v2 14/26] pack-objects: prepare `write_reused_pack()` for multi-pack reuse Taylor Blau
2023-12-14 22:24 ` [PATCH v2 15/26] pack-objects: prepare `write_reused_pack_verbatim()` " Taylor Blau
2023-12-14 22:24 ` [PATCH v2 16/26] pack-objects: include number of packs reused in output Taylor Blau
2023-12-14 22:24 ` [PATCH v2 17/26] git-compat-util.h: implement checked size_t to uint32_t conversion Taylor Blau
2023-12-14 22:24 ` [PATCH v2 18/26] midx: implement `midx_preferred_pack()` Taylor Blau
2023-12-14 22:24 ` [PATCH v2 19/26] pack-revindex: factor out `midx_key_to_pack_pos()` helper Taylor Blau
2023-12-14 22:24 ` [PATCH v2 20/26] pack-revindex: implement `midx_pair_to_pack_pos()` Taylor Blau
2023-12-14 22:24 ` [PATCH v2 21/26] pack-bitmap: prepare to mark objects from multiple packs for reuse Taylor Blau
2023-12-14 22:24 ` [PATCH v2 22/26] pack-objects: add tracing for various packfile metrics Taylor Blau
2023-12-14 22:24 ` [PATCH v2 23/26] t/test-lib-functions.sh: implement `test_trace2_data` helper Taylor Blau
2023-12-14 22:24 ` [PATCH v2 24/26] pack-objects: allow setting `pack.allowPackReuse` to "single" Taylor Blau
2023-12-14 22:24 ` [PATCH v2 25/26] pack-bitmap: enable reuse from all bitmapped packs Taylor Blau
2023-12-14 22:24 ` [PATCH v2 26/26] t/perf: add performance tests for multi-pack reuse Taylor Blau
2023-12-15 0:06 ` [PATCH v2 00/26] pack-objects: multi-pack verbatim reuse Junio C Hamano
2023-12-15 0:38 ` Taylor Blau
2023-12-15 0:40 ` Junio C Hamano
2023-12-15 1:25 ` Taylor Blau
2023-12-21 10:51 ` Jeff King
2024-01-04 22:24 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZXHE9L7iqXQAit_1@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).