From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Subject: Re: [PATCH 03/12] reftable/merged: advance subiter on subsequent iteration
Date: Tue, 27 Feb 2024 17:50:01 +0100 [thread overview]
Message-ID: <Zd4SuYKesIE07nlp@ncase> (raw)
In-Reply-To: <ns2yw3icnl3udejbgsv4ojwgzbe7eg57bvsm53kciuemlvmgbr@mpjtogpzdavi>
[-- Attachment #1: Type: text/plain, Size: 5191 bytes --]
On Tue, Feb 20, 2024 at 12:25:10PM -0600, Justin Tobler wrote:
> On 24/02/14 08:45AM, Patrick Steinhardt wrote:
> > When advancing the merged iterator, we pop the top-most entry from its
>
> s/top-most/topmost
>
> > priority queue and then advance the sub-iterator that the entry belongs
> > to, adding the result as a new entry. This is quite sensible in the case
> > where the merged iterator is used to actual iterate through records. But
>
> s/actual/actually
>
> > the merged iterator is also used when we look up a single record, only,
> > so advancing the sub-iterator is wasted effort because we would never
> > even look at the result.
> >
> > Instead of immediately advancing the sub-iterator, we can also defer
> > this to the next iteration of the merged iterator by storing the
> > intent-to-advance. This results in a small speedup when reading many
> > records. The following benchmark creates 10000 refs, which will also end
> > up with many ref lookups:
> >
> > Benchmark 1: update-ref: create many refs (revision = HEAD~)
> > Time (mean ± σ): 337.2 ms ± 7.3 ms [User: 200.1 ms, System: 136.9 ms]
> > Range (min … max): 329.3 ms … 373.2 ms 100 runs
> >
> > Benchmark 2: update-ref: create many refs (revision = HEAD)
> > Time (mean ± σ): 332.5 ms ± 5.9 ms [User: 197.2 ms, System: 135.1 ms]
> > Range (min … max): 327.6 ms … 359.8 ms 100 runs
> >
> > Summary
> > update-ref: create many refs (revision = HEAD) ran
> > 1.01 ± 0.03 times faster than update-ref: create many refs (revision = HEAD~)
> >
> > While this speedup alone isn't really worth it, this refactoring will
> > also allow two additional optimizations in subsequent patches. First, it
> > will allow us to special-case when there is only a single sub-iter left
> > to circumvent the priority queue altogether. And second, it makes it
> > easier to avoid copying records to the caller.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> > reftable/merged.c | 26 ++++++++++++--------------
> > 1 file changed, 12 insertions(+), 14 deletions(-)
> >
> > diff --git a/reftable/merged.c b/reftable/merged.c
> > index 12ebd732e8..9b1ccfff00 100644
> > --- a/reftable/merged.c
> > +++ b/reftable/merged.c
> > @@ -19,11 +19,12 @@ license that can be found in the LICENSE file or at
> >
> > struct merged_iter {
> > struct reftable_iterator *stack;
> > + struct merged_iter_pqueue pq;
> > uint32_t hash_id;
> > size_t stack_len;
> > uint8_t typ;
> > int suppress_deletions;
> > - struct merged_iter_pqueue pq;
> > + ssize_t advance_index;
> > };
> >
> > static int merged_iter_init(struct merged_iter *mi)
> > @@ -96,13 +97,17 @@ static int merged_iter_next_entry(struct merged_iter *mi,
> > struct pq_entry entry = { 0 };
> > int err = 0;
> >
> > + if (mi->advance_index >= 0) {
> > + err = merged_iter_advance_subiter(mi, mi->advance_index);
> > + if (err < 0)
> > + return err;
> > + mi->advance_index = -1;
> > + }
> > +
>
> Without additional context, it isn't immediately clear to me why the
> sub-iterator is condionally advanced at the beginning. Maybe a comment
> could be added to explain as done in the commit message to help with
> clarity?
I tried to mention this in the commit message with the last paragraph.
Adding a comment doesn't make much sense at this point in the patch
seires because a later patch changes how this works.
> > if (merged_iter_pqueue_is_empty(mi->pq))
> > return 1;
> >
> > entry = merged_iter_pqueue_remove(&mi->pq);
> > - err = merged_iter_advance_subiter(mi, entry.index);
> > - if (err < 0)
> > - return err;
> >
> > /*
> > One can also use reftable as datacenter-local storage, where the ref
> > @@ -116,14 +121,6 @@ static int merged_iter_next_entry(struct merged_iter *mi,
> > struct pq_entry top = merged_iter_pqueue_top(mi->pq);
> > int cmp;
> >
> > - /*
> > - * When the next entry comes from the same queue as the current
> > - * entry then it must by definition be larger. This avoids a
> > - * comparison in the most common case.
> > - */
> > - if (top.index == entry.index)
> > - break;
> > -
>
> I'm not quite sure I follow by the above check is removed as part of
> this change. Would you mind clarifying?
The loop that this comparison has been part of was popping all entries
from the priority queue that are being shadowed by the sub-iterator from
which we're about to return the entry. So e.g. in the case of a ref
record, we discard all records with the same refname which are shadowed
by a newer (higher update-index) table.
The removed condition was an optimization was a micro-optimization: when
the next entry in the pqueue is from the same index as the entry we are
about to return, then we know that it cannot have been shadowed. This
allowed us to avoid a key comparison.
But with the change in this commit we don't even add the next record of
the current sub-iter to the pqueue, and thus the condition cannot happen
anymore.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-02-27 16:50 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-14 7:45 [PATCH 00/12] reftable: improve ref iteration performance (pt.2) Patrick Steinhardt
2024-02-14 7:45 ` [PATCH 01/12] reftable/pq: use `size_t` to track iterator index Patrick Steinhardt
2024-02-14 7:45 ` [PATCH 02/12] reftable/merged: make `merged_iter` structure private Patrick Steinhardt
2024-02-20 18:15 ` Justin Tobler
2024-02-27 16:49 ` Patrick Steinhardt
2024-02-14 7:45 ` [PATCH 03/12] reftable/merged: advance subiter on subsequent iteration Patrick Steinhardt
2024-02-20 18:25 ` Justin Tobler
2024-02-27 16:50 ` Patrick Steinhardt [this message]
2024-02-14 7:45 ` [PATCH 04/12] reftable/merged: make subiters own their records Patrick Steinhardt
2024-02-14 7:45 ` [PATCH 05/12] reftable/merged: remove unnecessary null check for subiters Patrick Steinhardt
2024-02-14 7:46 ` [PATCH 06/12] reftable/merged: handle subiter cleanup on close only Patrick Steinhardt
2024-02-14 7:46 ` [PATCH 07/12] reftable/merged: circumvent pqueue with single subiter Patrick Steinhardt
2024-02-14 7:46 ` [PATCH 08/12] reftable/merged: avoid duplicate pqueue emptiness check Patrick Steinhardt
2024-02-27 23:53 ` James Liu
2024-02-14 7:46 ` [PATCH 09/12] reftable/record: reuse refname when decoding Patrick Steinhardt
2024-02-28 0:06 ` James Liu
2024-03-04 10:39 ` Patrick Steinhardt
2024-02-14 7:46 ` [PATCH 10/12] reftable/record: reuse refname when copying Patrick Steinhardt
2024-02-28 0:08 ` James Liu
2024-02-14 7:46 ` [PATCH 11/12] reftable/record: decode keys in place Patrick Steinhardt
2024-02-28 0:13 ` James Liu
2024-03-04 10:39 ` Patrick Steinhardt
2024-02-14 7:46 ` [PATCH 12/12] reftable: allow inlining of a few functions Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 00/13] reftable: improve ref iteration performance (pt.2) Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 01/13] reftable/pq: use `size_t` to track iterator index Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 02/13] reftable/merged: make `merged_iter` structure private Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 03/13] reftable/merged: advance subiter on subsequent iteration Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 04/13] reftable/merged: make subiters own their records Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 05/13] reftable/merged: remove unnecessary null check for subiters Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 06/13] reftable/merged: handle subiter cleanup on close only Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 07/13] reftable/merged: circumvent pqueue with single subiter Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 08/13] reftable/merged: avoid duplicate pqueue emptiness check Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 09/13] reftable/record: reuse refname when decoding Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 10/13] reftable/record: reuse refname when copying Patrick Steinhardt
2024-02-27 15:06 ` [PATCH v2 11/13] reftable/record: decode keys in place Patrick Steinhardt
2024-02-27 15:07 ` [PATCH v2 12/13] reftable: allow inlining of a few functions Patrick Steinhardt
2024-02-27 15:07 ` [PATCH v2 13/13] refs/reftable: precompute prefix length Patrick Steinhardt
2024-03-04 10:48 ` [PATCH v3 00/13] reftable: improve ref iteration performance (pt.2) Patrick Steinhardt
2024-03-04 10:48 ` [PATCH v3 01/13] reftable/pq: use `size_t` to track iterator index Patrick Steinhardt
2024-03-04 10:48 ` [PATCH v3 02/13] reftable/merged: make `merged_iter` structure private Patrick Steinhardt
2024-03-04 10:48 ` [PATCH v3 03/13] reftable/merged: advance subiter on subsequent iteration Patrick Steinhardt
2024-03-04 10:48 ` [PATCH v3 04/13] reftable/merged: make subiters own their records Patrick Steinhardt
2024-03-04 10:49 ` [PATCH v3 05/13] reftable/merged: remove unnecessary null check for subiters Patrick Steinhardt
2024-03-04 10:49 ` [PATCH v3 06/13] reftable/merged: handle subiter cleanup on close only Patrick Steinhardt
2024-03-04 10:49 ` [PATCH v3 07/13] reftable/merged: circumvent pqueue with single subiter Patrick Steinhardt
2024-03-04 10:49 ` [PATCH v3 08/13] reftable/merged: avoid duplicate pqueue emptiness check Patrick Steinhardt
2024-03-04 10:49 ` [PATCH v3 09/13] reftable/record: reuse refname when decoding Patrick Steinhardt
2024-03-04 10:49 ` [PATCH v3 10/13] reftable/record: reuse refname when copying Patrick Steinhardt
2024-03-04 10:49 ` [PATCH v3 11/13] reftable/record: decode keys in place Patrick Steinhardt
2024-03-04 10:49 ` [PATCH v3 12/13] reftable: allow inlining of a few functions Patrick Steinhardt
2024-03-04 10:49 ` [PATCH v3 13/13] refs/reftable: precompute prefix length Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zd4SuYKesIE07nlp@ncase \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).