From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Justin Tobler <jltobler@gmail.com>
Subject: [PATCH v2 00/13] reftable: prepare for re-seekable iterators
Date: Mon, 13 May 2024 10:46:56 +0200 [thread overview]
Message-ID: <cover.1715589670.git.ps@pks.im> (raw)
In-Reply-To: <cover.1715166175.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 6572 bytes --]
Hi,
this is the second version of my patch series that prepares the reftable
iterators to become re-seekable. These refactorings will eventually
allow us to reuse data structures by iterators and optimize for certain
cases.
Changes compared to v1:
- Various fixes to commit messages.
- Fixed a copy & pasted comment to refer to logs instead of refs.
The series continues to build on top of ps/reftable-write-optim. There
is a merge conflict with the in-flight ps/reftable-write-options, but
given that this series has not yet been merged to next and because Junio
has already resolved the conflict in "seen" I decided to not pull it in
as an additional dependency.
Thanks!
Patrick
Patrick Steinhardt (13):
reftable/block: use `size_t` to track restart point index
reftable/reader: avoid copying index iterator
reftable/reader: unify indexed and linear seeking
reftable/reader: separate concerns of table iter and reftable reader
reftable/reader: inline `reader_seek_internal()`
reftable/reader: set up the reader when initializing table iterator
reftable/merged: split up initialization and seeking of records
reftable/merged: simplify indices for subiterators
reftable/generic: move seeking of records into the iterator
reftable/generic: adapt interface to allow reuse of iterators
reftable/reader: adapt interface to allow reuse of iterators
reftable/stack: provide convenience functions to create iterators
reftable/merged: adapt interface to allow reuse of iterators
refs/reftable-backend.c | 48 ++++----
reftable/block.c | 4 +-
reftable/generic.c | 94 +++++++++++----
reftable/generic.h | 9 +-
reftable/iter.c | 23 +++-
reftable/merged.c | 148 ++++++++----------------
reftable/merged.h | 6 +
reftable/merged_test.c | 19 ++-
reftable/reader.c | 218 +++++++++++++++--------------------
reftable/readwrite_test.c | 35 ++++--
reftable/reftable-generic.h | 8 +-
reftable/reftable-iterator.h | 21 ++++
reftable/reftable-merged.h | 15 ---
reftable/reftable-reader.h | 45 ++------
reftable/reftable-stack.h | 18 +++
reftable/stack.c | 29 ++++-
16 files changed, 378 insertions(+), 362 deletions(-)
Range-diff against v1:
1: ca86a8b58d = 1: ca86a8b58d reftable/block: use `size_t` to track restart point index
2: bdbebdbd36 = 2: bdbebdbd36 reftable/reader: avoid copying index iterator
3: 716863a580 = 3: 716863a580 reftable/reader: unify indexed and linear seeking
4: 91db2f18c1 = 4: 91db2f18c1 reftable/reader: separate concerns of table iter and reftable reader
5: 4d498ef342 = 5: 4d498ef342 reftable/reader: inline `reader_seek_internal()`
6: 5a10a11584 = 6: 5a10a11584 reftable/reader: set up the reader when initializing table iterator
7: 21b3e3ab5f ! 7: 12c10fd366 reftable/merged: split up initialization and seeking of records
@@ Commit message
To initialize a `struct merged_iter`, we need to seek all subiterators
to the wanted record and then add their results to the priority queue
used to sort the records. This logic is split up across two functions,
- `merged_table_seek_record()` and `merged_table_iter()`. The scope of
- these functions is somewhat weird though, where `merged_table_iter()` is
+ `merged_table_seek_record()` and `merged_iter_init()`. The scope of
+ these functions is somewhat weird though, where `merged_iter_init()` is
only responsible for adding the records of the subiterators to the
priority queue.
- Clarify the scope of those functions such that `merged_table_iter()` is
+ Clarify the scope of those functions such that `merged_iter_init()` is
only responsible for initializing the iterator's structure. Performing
the subiterator seeks are now part of `merged_table_seek_record()`.
8: f0f42cd56b ! 8: 31bdfc1e34 reftable/merged: simplify indices for subiterators
@@ Commit message
reftable/merged: simplify indices for subiterators
When seeking on a merged table, we perform the seek for each of the
- subiterators. If the subiterator hasa the desired record we add it to
- the priority queue, otherwise we skip it and don't add it to the stack
- of subiterators hosted by the merged table.
+ subiterators. If the subiterator has the desired record we add it to the
+ priority queue, otherwise we skip it and don't add it to the stack of
+ subiterators hosted by the merged table.
The consequence of this is that the index of the subiterator in the
merged table does not necessarily correspond to the index of it in the
9: 859b399e71 ! 9: 3e91c3830e reftable/generic: move seeking of records into the iterator
@@ Commit message
The second and more significant downside is that it is impossible to
reuse iterators for multiple seeks. Whenever you want to look up a
record, you need to re-create the whole infrastructure again, which is
- quite a waste of time. Furthermore, it is impossible to for example
- optimize seeks, for example when seeking the same record multiple times.
+ quite a waste of time. Furthermore, it is impossible to optimize seeks,
+ such as when seeking the same record multiple times.
To address this, we essentially split up the concerns properly such that
the parent data structure is responsible for setting up the iterator via
10: 727b8fa432 = 10: b0641dd800 reftable/generic: adapt interface to allow reuse of iterators
11: f688f8f59a ! 11: 0745951650 reftable/reader: adapt interface to allow reuse of iterators
@@ reftable/reftable-reader.h: struct reftable_table;
+void reftable_reader_init_ref_iterator(struct reftable_reader *r,
+ struct reftable_iterator *it);
+
-+/* Initialize a reftable iterator for reading refs. */
++/* Initialize a reftable iterator for reading logs. */
+void reftable_reader_init_log_iterator(struct reftable_reader *r,
+ struct reftable_iterator *it);
12: 68cc35919b = 12: 51480c4328 reftable/stack: provide convenience functions to create iterators
13: be4da295c6 = 13: 2586e93c44 reftable/merged: adapt interface to allow reuse of iterators
--
2.45.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-05-13 8:47 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
2024-05-08 11:03 ` [PATCH 01/13] reftable/block: use `size_t` to track restart point index Patrick Steinhardt
2024-05-08 11:03 ` [PATCH 02/13] reftable/reader: avoid copying index iterator Patrick Steinhardt
2024-05-08 11:03 ` [PATCH 03/13] reftable/reader: unify indexed and linear seeking Patrick Steinhardt
2024-05-08 11:03 ` [PATCH 04/13] reftable/reader: separate concerns of table iter and reftable reader Patrick Steinhardt
2024-05-08 11:03 ` [PATCH 05/13] reftable/reader: inline `reader_seek_internal()` Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 06/13] reftable/reader: set up the reader when initializing table iterator Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 07/13] reftable/merged: split up initialization and seeking of records Patrick Steinhardt
2024-05-10 19:18 ` Justin Tobler
2024-05-13 8:36 ` Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 08/13] reftable/merged: simplify indices for subiterators Patrick Steinhardt
2024-05-10 19:25 ` Justin Tobler
2024-05-13 8:36 ` Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 09/13] reftable/generic: move seeking of records into the iterator Patrick Steinhardt
2024-05-10 21:44 ` Justin Tobler
2024-05-13 8:36 ` Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 10/13] reftable/generic: adapt interface to allow reuse of iterators Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 11/13] reftable/reader: " Patrick Steinhardt
2024-05-10 21:48 ` Justin Tobler
2024-05-13 8:36 ` Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 12/13] reftable/stack: provide convenience functions to create iterators Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 13/13] reftable/merged: adapt interface to allow reuse of iterators Patrick Steinhardt
2024-05-08 23:42 ` [PATCH 00/13] reftable: prepare for re-seekable iterators Junio C Hamano
2024-05-09 0:16 ` Junio C Hamano
2024-05-10 7:48 ` Patrick Steinhardt
2024-05-10 15:40 ` Junio C Hamano
2024-05-10 16:13 ` Patrick Steinhardt
2024-05-10 17:17 ` Junio C Hamano
2024-05-13 8:46 ` Patrick Steinhardt [this message]
2024-05-13 8:47 ` [PATCH v2 01/13] reftable/block: use `size_t` to track restart point index Patrick Steinhardt
2024-05-21 13:34 ` Karthik Nayak
2024-05-22 7:23 ` Patrick Steinhardt
2024-05-13 8:47 ` [PATCH v2 02/13] reftable/reader: avoid copying index iterator Patrick Steinhardt
2024-05-13 8:47 ` [PATCH v2 03/13] reftable/reader: unify indexed and linear seeking Patrick Steinhardt
2024-05-21 14:41 ` Karthik Nayak
2024-05-22 7:23 ` Patrick Steinhardt
2024-05-22 7:56 ` Karthik Nayak
2024-05-13 8:47 ` [PATCH v2 04/13] reftable/reader: separate concerns of table iter and reftable reader Patrick Steinhardt
2024-05-13 8:47 ` [PATCH v2 05/13] reftable/reader: inline `reader_seek_internal()` Patrick Steinhardt
2024-05-13 8:47 ` [PATCH v2 06/13] reftable/reader: set up the reader when initializing table iterator Patrick Steinhardt
2024-05-13 8:47 ` [PATCH v2 07/13] reftable/merged: split up initialization and seeking of records Patrick Steinhardt
2024-05-13 8:47 ` [PATCH v2 08/13] reftable/merged: simplify indices for subiterators Patrick Steinhardt
2024-05-13 8:47 ` [PATCH v2 09/13] reftable/generic: move seeking of records into the iterator Patrick Steinhardt
2024-05-13 8:47 ` [PATCH v2 10/13] reftable/generic: adapt interface to allow reuse of iterators Patrick Steinhardt
2024-05-13 8:47 ` [PATCH v2 11/13] reftable/reader: " Patrick Steinhardt
2024-05-13 8:47 ` [PATCH v2 12/13] reftable/stack: provide convenience functions to create iterators Patrick Steinhardt
2024-05-13 8:48 ` [PATCH v2 13/13] reftable/merged: adapt interface to allow reuse of iterators Patrick Steinhardt
2024-05-15 20:15 ` [PATCH v2 00/13] reftable: prepare for re-seekable iterators Justin Tobler
2024-05-21 15:31 ` Karthik Nayak
2024-05-22 7:23 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1715589670.git.ps@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jltobler@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).