From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Han-Wen Nienhuys <hanwenn@gmail.com>
Subject: [PATCH v2 00/11] reftable: optimize write performance
Date: Thu, 4 Apr 2024 07:48:05 +0200 [thread overview]
Message-ID: <cover.1712209149.git.ps@pks.im> (raw)
In-Reply-To: <cover.1712078736.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 6776 bytes --]
Hi,
this is the second version of my patch series that aims to optimize
write performance in the reftable backend. I've made the following
changes compared to v2:
- Added a new patch to drop "reftable/refname.{c,h}" and related
completely.
- Reworded the commit message for the committer ident refactorings.
- Added a new patch that unifies the code paths that release reftable
writer resources.
- Fixed a stale comment claiming that we retry deflating, which we
don't.
Thanks!
Patrick
Patrick Steinhardt (11):
refs/reftable: fix D/F conflict error message on ref copy
refs/reftable: perform explicit D/F check when writing symrefs
refs/reftable: skip duplicate name checks
reftable: remove name checks
refs/reftable: don't recompute committer ident
reftable/writer: refactorings for `writer_add_record()`
reftable/writer: refactorings for `writer_flush_nonempty_block()`
reftable/writer: unify releasing memory
reftable/writer: reset `last_key` instead of releasing it
reftable/block: reuse zstream when writing log blocks
reftable/block: reuse compressed array
Makefile | 2 -
refs/reftable-backend.c | 75 +++++++++----
reftable/block.c | 80 ++++++++------
reftable/block.h | 4 +
reftable/error.c | 2 -
reftable/refname.c | 209 -------------------------------------
reftable/refname.h | 29 -----
reftable/refname_test.c | 101 ------------------
reftable/reftable-error.h | 3 -
reftable/reftable-tests.h | 1 -
reftable/reftable-writer.h | 4 -
reftable/stack.c | 67 +-----------
reftable/stack_test.c | 39 -------
reftable/writer.c | 137 +++++++++++++++---------
t/helper/test-reftable.c | 1 -
t/t0610-reftable-basics.sh | 35 ++++++-
16 files changed, 230 insertions(+), 559 deletions(-)
delete mode 100644 reftable/refname.c
delete mode 100644 reftable/refname.h
delete mode 100644 reftable/refname_test.c
Range-diff against v1:
1: 14b4dacd73 = 1: 926e802395 refs/reftable: fix D/F conflict error message on ref copy
2: 55db366e61 = 2: 6190171906 refs/reftable: perform explicit D/F check when writing symrefs
3: ad8210ec65 = 3: 80008cc5e7 refs/reftable: skip duplicate name checks
-: ---------- > 4: 3497a570b4 reftable: remove name checks
4: a9a6795c02 ! 5: f892a3007b refs/reftable: don't recompute committer ident
@@ Commit message
refs/reftable: don't recompute committer ident
In order to write reflog entries we need to compute the committer's
- identity as it becomes encoded in the log record itself. In the reftable
- backend, computing the identity is repeated for every single reflog
- entry which we are about to write in a transaction. Needless to say,
- this can be quite a waste of effort when writing many refs with reflog
- entries in a single transaction.
+ identity as it gets encoded in the log record itself. The reftable
+ backend does this via `git_committer_info()` and `split_ident_line()` in
+ `fill_reftable_log_record()`, which use the Git config as well as
+ environment variables to figure out the identity.
+
+ While most callers would only call `fill_reftable_log_record()` once or
+ twice, `write_transaction_table()` will call it as many times as there
+ are queued ref updates. This can be quite a waste of effort when writing
+ many refs with reflog entries in a single transaction.
Refactor the code to pre-compute the committer information. This results
in a small speedup when writing 100000 refs in a single transaction:
5: 8e9d69e9e6 = 6: 4877ab3921 reftable/writer: refactorings for `writer_add_record()`
6: 1f903afdda = 7: 8f1c5b4169 reftable/writer: refactorings for `writer_flush_nonempty_block()`
-: ---------- > 8: 41db7414e1 reftable/writer: unify releasing memory
9: 6950ae4ea7 ! 9: e5c7dbe417 reftable/writer: reset `last_key` instead of releasing it
@@ reftable/writer.c: static void writer_reinit_block_writer(struct reftable_writer
block_writer_init(&w->block_writer_data, typ, w->block,
w->opts.block_size, block_start,
hash_size(w->opts.hash_id));
-@@ reftable/writer.c: void reftable_writer_free(struct reftable_writer *w)
- block_writer_release(w->block_writer);
- w->block_writer = NULL;
- }
-+ strbuf_release(&w->last_key);
- reftable_free(w->block);
- reftable_free(w);
- }
@@ reftable/writer.c: static int writer_finish_section(struct reftable_writer *w)
bstats->max_index_level = max_level;
7: 86dab54dfe ! 10: 26f422703f reftable/block: reuse zstream when writing log blocks
@@ reftable/block.c: int block_writer_finish(struct block_writer *w)
+ w->zstream->avail_in = src_len;
+
+ /*
-+ * We want to perform all decompression in a single
-+ * step, which is why we can pass Z_FINISH here. Note
-+ * that both `Z_OK` and `Z_BUF_ERROR` indicate that we
-+ * need to retry according to documentation.
-+ *
-+ * If the call fails we retry with a bigger output
-+ * buffer.
++ * We want to perform all decompression in a single step, which
++ * is why we can pass Z_FINISH here. As we have precomputed the
++ * deflated buffer's size via `deflateBound()` this function is
++ * guaranteed to succeed according to the zlib documentation.
+ */
+ ret = deflate(w->zstream, Z_FINISH);
+ if (ret != Z_STREAM_END) {
@@ reftable/block.h: license that can be found in the LICENSE file or at
uint8_t *buf;
uint32_t block_size;
-
- ## reftable/writer.c ##
-@@ reftable/writer.c: void reftable_writer_free(struct reftable_writer *w)
- {
- if (!w)
- return;
-+ if (w->block_writer) {
-+ block_writer_release(w->block_writer);
-+ w->block_writer = NULL;
-+ }
- reftable_free(w->block);
- reftable_free(w);
- }
8: 9899b58dcf ! 11: 4f9df714da reftable/block: reuse compressed array
@@ reftable/block.c: int block_writer_finish(struct block_writer *w)
w->zstream->next_in = w->buf + block_header_skip;
w->zstream->avail_in = src_len;
@@ reftable/block.c: int block_writer_finish(struct block_writer *w)
- * buffer.
+ * guaranteed to succeed according to the zlib documentation.
*/
ret = deflate(w->zstream, Z_FINISH);
- if (ret != Z_STREAM_END) {
--
2.44.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-04-04 5:48 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-02 17:29 [PATCH 0/9] reftable: optimize write performance Patrick Steinhardt
2024-04-02 17:29 ` [PATCH 1/9] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-03 18:28 ` Junio C Hamano
2024-04-02 17:29 ` [PATCH 2/9] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 3/9] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 4/9] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-03 18:58 ` Junio C Hamano
2024-04-04 5:36 ` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 5/9] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 6/9] reftable/writer: refactorings for `writer_flush_nonempty_block()` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 7/9] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-03 19:35 ` Junio C Hamano
2024-04-04 5:36 ` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 8/9] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 9/9] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-04 5:48 ` Patrick Steinhardt [this message]
2024-04-04 5:48 ` [PATCH v2 01/11] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 02/11] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 03/11] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 04/11] reftable: remove " Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 05/11] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 06/11] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-04 6:58 ` Han-Wen Nienhuys
2024-04-04 7:32 ` Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 07/11] reftable/writer: refactorings for `writer_flush_nonempty_block()` Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 08/11] reftable/writer: unify releasing memory Patrick Steinhardt
2024-04-04 7:08 ` Han-Wen Nienhuys
2024-04-04 7:32 ` Patrick Steinhardt
2024-04-04 9:00 ` Han-Wen Nienhuys
2024-04-04 11:43 ` Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 09/11] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 10/11] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-04 5:48 ` [PATCH v2 11/11] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-04 7:09 ` [PATCH v2 00/11] reftable: optimize write performance Han-Wen Nienhuys
2024-04-04 7:32 ` Patrick Steinhardt
2024-04-08 12:23 ` [PATCH v3 " Patrick Steinhardt
2024-04-08 12:23 ` [PATCH v3 01/11] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-08 12:23 ` [PATCH v3 02/11] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 03/11] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 04/11] reftable: remove " Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 05/11] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 06/11] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 07/11] reftable/writer: refactorings for `writer_flush_nonempty_block()` Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 08/11] reftable/writer: unify releasing memory Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 09/11] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 10/11] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-08 12:24 ` [PATCH v3 11/11] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-09 0:09 ` [PATCH v3 00/11] reftable: optimize write performance Junio C Hamano
2024-04-09 3:16 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1712209149.git.ps@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=hanwenn@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.