From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Han-Wen Nienhuys <hanwenn@gmail.com>
Subject: [PATCH v2 00/11] reftable: optimize write performance
Date: Thu, 4 Apr 2024 07:48:05 +0200 [thread overview]
Message-ID: <cover.1712209149.git.ps@pks.im> (raw)
In-Reply-To: <cover.1712078736.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 6776 bytes --]
Hi,
this is the second version of my patch series that aims to optimize
write performance in the reftable backend. I've made the following
changes compared to v2:
- Added a new patch to drop "reftable/refname.{c,h}" and related
completely.
- Reworded the commit message for the committer ident refactorings.
- Added a new patch that unifies the code paths that release reftable
writer resources.
- Fixed a stale comment claiming that we retry deflating, which we
don't.
Thanks!
Patrick
Patrick Steinhardt (11):
refs/reftable: fix D/F conflict error message on ref copy
refs/reftable: perform explicit D/F check when writing symrefs
refs/reftable: skip duplicate name checks
reftable: remove name checks
refs/reftable: don't recompute committer ident
reftable/writer: refactorings for `writer_add_record()`
reftable/writer: refactorings for `writer_flush_nonempty_block()`
reftable/writer: unify releasing memory
reftable/writer: reset `last_key` instead of releasing it
reftable/block: reuse zstream when writing log blocks
reftable/block: reuse compressed array
Makefile | 2 -
refs/reftable-backend.c | 75 +++++++++----
reftable/block.c | 80 ++++++++------
reftable/block.h | 4 +
reftable/error.c | 2 -
reftable/refname.c | 209 -------------------------------------
reftable/refname.h | 29 -----
reftable/refname_test.c | 101 ------------------
reftable/reftable-error.h | 3 -
reftable/reftable-tests.h | 1 -
reftable/reftable-writer.h | 4 -
reftable/stack.c | 67 +-----------
reftable/stack_test.c | 39 -------
reftable/writer.c | 137 +++++++++++++++---------
t/helper/test-reftable.c | 1 -
t/t0610-reftable-basics.sh | 35 ++++++-
16 files changed, 230 insertions(+), 559 deletions(-)
delete mode 100644 reftable/refname.c
delete mode 100644 reftable/refname.h
delete mode 100644 reftable/refname_test.c
Range-diff against v1:
1: 14b4dacd73 = 1: 926e802395 refs/reftable: fix D/F conflict error message on ref copy
2: 55db366e61 = 2: 6190171906 refs/reftable: perform explicit D/F check when writing symrefs
3: ad8210ec65 = 3: 80008cc5e7 refs/reftable: skip duplicate name checks
-: ---------- > 4: 3497a570b4 reftable: remove name checks
4: a9a6795c02 ! 5: f892a3007b refs/reftable: don't recompute committer ident
@@ Commit message
refs/reftable: don't recompute committer ident
In order to write reflog entries we need to compute the committer's
- identity as it becomes encoded in the log record itself. In the reftable
- backend, computing the identity is repeated for every single reflog
- entry which we are about to write in a transaction. Needless to say,
- this can be quite a waste of effort when writing many refs with reflog
- entries in a single transaction.
+ identity as it gets encoded in the log record itself. The reftable
+ backend does this via `git_committer_info()` and `split_ident_line()` in
+ `fill_reftable_log_record()`, which use the Git config as well as
+ environment variables to figure out the identity.
+
+ While most callers would only call `fill_reftable_log_record()` once or
+ twice, `write_transaction_table()` will call it as many times as there
+ are queued ref updates. This can be quite a waste of effort when writing
+ many refs with reflog entries in a single transaction.
Refactor the code to pre-compute the committer information. This results
in a small speedup when writing 100000 refs in a single transaction:
5: 8e9d69e9e6 = 6: 4877ab3921 reftable/writer: refactorings for `writer_add_record()`
6: 1f903afdda = 7: 8f1c5b4169 reftable/writer: refactorings for `writer_flush_nonempty_block()`
-: ---------- > 8: 41db7414e1 reftable/writer: unify releasing memory
9: 6950ae4ea7 ! 9: e5c7dbe417 reftable/writer: reset `last_key` instead of releasing it
@@ reftable/writer.c: static void writer_reinit_block_writer(struct reftable_writer
block_writer_init(&w->block_writer_data, typ, w->block,
w->opts.block_size, block_start,
hash_size(w->opts.hash_id));
-@@ reftable/writer.c: void reftable_writer_free(struct reftable_writer *w)
- block_writer_release(w->block_writer);
- w->block_writer = NULL;
- }
-+ strbuf_release(&w->last_key);
- reftable_free(w->block);
- reftable_free(w);
- }
@@ reftable/writer.c: static int writer_finish_section(struct reftable_writer *w)
bstats->max_index_level = max_level;
7: 86dab54dfe ! 10: 26f422703f reftable/block: reuse zstream when writing log blocks
@@ reftable/block.c: int block_writer_finish(struct block_writer *w)
+ w->zstream->avail_in = src_len;
+
+ /*
-+ * We want to perform all decompression in a single
-+ * step, which is why we can pass Z_FINISH here. Note
-+ * that both `Z_OK` and `Z_BUF_ERROR` indicate that we
-+ * need to retry according to documentation.
-+ *
-+ * If the call fails we retry with a bigger output
-+ * buffer.
++ * We want to perform all decompression in a single step, which
++ * is why we can pass Z_FINISH here. As we have precomputed the
++ * deflated buffer's size via `deflateBound()` this function is
++ * guaranteed to succeed according to the zlib documentation.
+ */
+ ret = deflate(w->zstream, Z_FINISH);
+ if (ret != Z_STREAM_END) {
@@ reftable/block.h: license that can be found in the LICENSE file or at
uint8_t *buf;
uint32_t block_size;
-
- ## reftable/writer.c ##
-@@ reftable/writer.c: void reftable_writer_free(struct reftable_writer *w)
- {
- if (!w)
- return;
-+ if (w->block_writer) {
-+ block_writer_release(w->block_writer);
-+ w->block_writer = NULL;
-+ }
- reftable_free(w->block);
- reftable_free(w);
- }
8: 9899b58dcf ! 11: 4f9df714da reftable/block: reuse compressed array
@@ reftable/block.c: int block_writer_finish(struct block_writer *w)
w->zstream->next_in = w->buf + block_header_skip;
w->zstream->avail_in = src_len;
@@ reftable/block.c: int block_writer_finish(struct block_writer *w)
- * buffer.
+ * guaranteed to succeed according to the zlib documentation.
*/
ret = deflate(w->zstream, Z_FINISH);
- if (ret != Z_STREAM_END) {
--
2.44.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-04-04 5:48 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-02 17:29 [PATCH 0/9] reftable: optimize write performance Patrick Steinhardt
2024-04-02 17:29 ` [PATCH 1/9] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-03 18:28 ` Junio C Hamano
2024-04-02 17:29 ` [PATCH 2/9] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 3/9] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 4/9] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-03 18:58 ` Junio C Hamano
2024-04-04 5:36 ` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 5/9] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 6/9] reftable/writer: refactorings for `writer_flush_nonempty_block()` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 7/9] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-03 19:35 ` Junio C Hamano
2024-04-04 5:36 ` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 8/9] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 9/9] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-04 5:48 ` Patrick Steinhardt [this message]
2024-04-04 5:48 ` [PATCH v2 01/11] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 02/11] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 03/11] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 04/11] reftable: remove " Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 05/11] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 06/11] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-04 6:58 ` Han-Wen Nienhuys
2024-04-04 7:32 ` Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 07/11] reftable/writer: refactorings for `writer_flush_nonempty_block()` Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 08/11] reftable/writer: unify releasing memory Patrick Steinhardt
2024-04-04 7:08 ` Han-Wen Nienhuys
2024-04-04 7:32 ` Patrick Steinhardt
2024-04-04 9:00 ` Han-Wen Nienhuys
2024-04-04 11:43 ` Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 09/11] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 10/11] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 11/11] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-04 7:09 ` [PATCH v2 00/11] reftable: optimize write performance Han-Wen Nienhuys
2024-04-04 7:32 ` Patrick Steinhardt
2024-04-08 12:23 ` [PATCH v3 " Patrick Steinhardt
2024-04-08 12:23 ` [PATCH v3 01/11] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-08 12:23 ` [PATCH v3 02/11] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 03/11] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 04/11] reftable: remove " Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 05/11] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 06/11] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 07/11] reftable/writer: refactorings for `writer_flush_nonempty_block()` Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 08/11] reftable/writer: unify releasing memory Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 09/11] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 10/11] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 11/11] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-09 0:09 ` [PATCH v3 00/11] reftable: optimize write performance Junio C Hamano
2024-04-09 3:16 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1712209149.git.ps@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=hanwenn@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).