All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH 09/11] pack-bitmap: enable cross-pack delta reuse
Date: Mon, 4 Nov 2024 16:00:08 -0500	[thread overview]
Message-ID: <Zyk12FZ+NcOUALAz@nand.local> (raw)
In-Reply-To: <20241011083134.GG18010@coredump.intra.peff.net>

On Fri, Oct 11, 2024 at 04:31:34AM -0400, Jeff King wrote:
> On Wed, Oct 09, 2024 at 04:31:28PM -0400, Taylor Blau wrote:
>
> > In order to do this, the pack-reuse code within pack-bitmap.c marks
> > bits in a separate bitmap called 'reuse_as_ref_delta'. Objects whose
> > bits are marked in that bitmap must be converted from OFS_DELTAs to
> > REF_DELTAs.
> >
> > To mark bits in that bitmap, we adjust find_base_bitmap_pos() to
> > return the bitmap position of any delta object's base regardless of
> > whether or not it came from the same pack. This is done by:
> >
> >   1. First converting the base object's into a pack position (via
> >      `offset_to_pack_pos()`).
> >
> >   2. Then converting from pack position into into lexical order (via
> >      `pack_pos_to_index()`).
> >
> >   3. Then into an object ID (via `nth_packed_object_id()`).
> >
> >   4. Then into a position in the MIDX's lexical order of object IDs
> >      (via `bsearch_midx()`).
> >
> >   5. And finally into a position in the MIDX's pseudo-pack order (via
> >      `midx_pair_to_pack_pos()`).
> >
> > If we can find that base object in the MIDX, then we use its position
> > in the MIDX's pseudo-pack order to determine whether or not it was
> > selected from the same pack. If it is, no adjustment is necessary.
> > Otherwise, we mark the delta object's position in the new
> > `reuse_as_ref_delta` bitmap, and convert accordingly from within
> > `write_reused_pack_one()`.
>
> OK, that makes sense. It does feel like a non-trivial amount of work to
> do for each delta we're going to (potentially) reuse from a midx'd pack.
> Can we recognize the common case that the base is in the same pack and
> also being sent/reused without doing the full conversion to oid and the
> resulting bsearch?

I don't think it ends up saving you anything if you don't find anything
matching the pack/offset pair in the MIDX. If you perform that lookup
with bsearch_midx() and get nothing back, then you have to take the
slower path above anyway.

My figuring here was that it would be better to uniformly take a
slightly slower path instead of taking a hopefully-faster path which
might fail, only to then go back to the slower path on top.

Of course, you could do both, or apply some heuristics like avoiding the
cross-pack lookup if you know you're in the preferred pack, etc. I'm not
sure how much it is worth doing so, TBH.

> > @@ -1182,10 +1188,24 @@ static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile,
> >  	if (pos >= end)
> >  		return reuse_packfile->bitmap_pos / BITS_IN_EWORD;
> >
> > -	while (pos < end &&
> > -	       reuse_packfile_bitmap->words[pos / BITS_IN_EWORD] == (eword_t)~0)
> > +	while (pos < end) {
> > +		size_t wpos = pos / BITS_IN_EWORD;
> > +		eword_t reuse;
> > +
> > +		reuse = reuse_packfile_bitmap->words[wpos];
> > +		if (reuse_as_ref_delta_packfile_bitmap) {
> > +			/*
> > +			 * Can't reuse verbatim any objects which need
> > +			 * to be first rewritten as REF_DELTAs.
> > +			 */
> > +			reuse &= ~reuse_as_ref_delta_packfile_bitmap->words[wpos];
> > +		}
> > +
> > +		if (reuse != (eword_t)~0)
> > +			break;
> > +
> >  		pos += BITS_IN_EWORD;
> > -
> > +	}
>
> This is accessing reuse_as_ref_delta_packfile_bitmap->words directly
> using pos/end as limits. But those come from reuse_packfile_bitmap. Are
> we guaranteed to have zero-extended the reuse_as_ref_delta bitmap as far
> as the original went?

Yeah, we know this is OK because both are allocated with the same size
in reuse_partial_packfile_from_bitmap(), where the relevant portion is:

    word_alloc = objects_nr / BITS_IN_EWORD;
    if (objects_nr % BITS_IN_EWORD)
            word_alloc++;

    reuse = bitmap_word_alloc(word_alloc);
    reuse_as_ref_delta = bitmap_word_alloc(word_alloc);

all of the bitmap_set() operations on the former are bounded in
try_partial_reuse(), but adding a length check can be done here as an
extra safety measure.

> Could we just be calling bitmap_get() here, which would do the length
> check for us? Though I guess we would miss out on some whole-word magic
> it is doing. So maybe we need to just do that length check ourselves.

Yeah, we don't use bitmap_get() because we want to access the whole word
at a time.

Thanks,
Taylor

  reply	other threads:[~2024-11-04 21:00 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
2024-10-09 20:31 ` [PATCH 01/11] pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()` Taylor Blau
2024-10-09 20:31 ` [PATCH 02/11] pack-bitmap.c: avoid unnecessary `offset_to_pack_pos()` Taylor Blau
2024-10-09 20:31 ` [PATCH 03/11] pack-bitmap.c: delay calling 'offset_to_pack_pos()' Taylor Blau
2024-10-09 20:31 ` [PATCH 04/11] pack-bitmap.c: compare `base_offset` to `delta_obj_offset` Taylor Blau
2024-10-09 20:31 ` [PATCH 05/11] pack-bitmap.c: extract `find_base_bitmap_pos()` Taylor Blau
2024-10-09 20:31 ` [PATCH 06/11] pack-bitmap: drop `from_midx` field from `bitmapped_pack` Taylor Blau
2024-10-09 20:31 ` [PATCH 07/11] write_reused_pack_one(): translate bit positions directly Taylor Blau
2024-10-11  8:16   ` Jeff King
2024-11-04 20:36     ` Taylor Blau
2024-10-09 20:31 ` [PATCH 08/11] t5332: enable OFS_DELTAs via test_pack_objects_reused Taylor Blau
2024-10-11  8:19   ` Jeff King
2024-11-04 20:50     ` Taylor Blau
2024-10-09 20:31 ` [PATCH 09/11] pack-bitmap: enable cross-pack delta reuse Taylor Blau
2024-10-11  8:31   ` Jeff King
2024-11-04 21:00     ` Taylor Blau [this message]
2024-10-09 20:31 ` [PATCH 10/11] pack-bitmap.c: record whether the result was filtered Taylor Blau
2024-10-11  8:35   ` Jeff King
2024-11-04 21:01     ` Taylor Blau
2024-10-09 20:31 ` [PATCH 11/11] pack-bitmap: enable reusing deltas with base objects in 'haves' bitmap Taylor Blau
2024-10-10 16:46 ` [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Junio C Hamano
2024-10-10 17:07   ` Taylor Blau
2024-10-10 20:20     ` Junio C Hamano
2024-10-10 20:32       ` Taylor Blau
2024-10-11  7:54       ` Jeff King
2024-10-11  8:01         ` Jeff King
2024-10-11 16:23           ` Junio C Hamano
2024-10-11  8:38 ` Jeff King
2024-11-19 23:08   ` Taylor Blau
2024-11-19 23:34     ` Taylor Blau
2024-12-18 12:57       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zyk12FZ+NcOUALAz@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.