public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Matt Smiley <msmiley@gitlab.com>,
	"brian m. carlson" <sandals@crustytoothpaste.net>
Subject: Re: [PATCH v2 03/10] upload-pack: reduce lock contention when writing packfile data
Date: Tue, 10 Mar 2026 13:14:41 +0100	[thread overview]
Message-ID: <abALMbYE0C0UlwbD@pks.im> (raw)
In-Reply-To: <20260305011638.GC4943@coredump.intra.peff.net>

On Wed, Mar 04, 2026 at 08:16:38PM -0500, Jeff King wrote:
> On Tue, Mar 03, 2026 at 04:00:18PM +0100, Patrick Steinhardt wrote:
> 
> > Extend our use of the buffering infrastructure so that we soak up bytes
> > until the buffer is filled up at least 2/3rds of its capacity. The
> > change is relatively simple to implement as we already know to flush the
> > buffer in `create_pack_file()` after git-pack-objects(1) has finished.
> 
> This 2/3rds feels kind of arbitrary. Isn't our best bet to try to fill
> pkt-lines? Later you say:
> 
> > Now we could of course go even further and make sure that we always fill
> > up the whole buffer. But this might cause an increase in read(3p)
> > syscalls, and some tests show that this only reduces the number of
> > write(3p) syscalls from 130,000 to 100,000. So overall this doesn't seem
> > worth it.
> 
> But I am not clear how it increases the number of read() calls. I guess
> you are concerned that we'll get 50k, and then do a read for the
> remaining 14k, and then read 50k, and then 14k, and so on. But I'm
> unconvinced that 2/3 is really any better here, as it depends on the
> buffering patterns of the upstream writer. They could be writing 1 byte
> less than 2/3, and we'd wait to buffer, then read half their next
> packet, write it, then read the second of of their next packet, wait to
> buffer, and so on.
> 
> Even just doing:
> 
>   git clone --upload-pack='strace -e write git-upload-pack' --bare --no-local . foo.git
> 
> with this patch (and not the later one to increase the buffer size of
> pack-objects), I see an interesting flip-flop between packets of size
> 65515 and 61461. But we never send a single full-size one, even though
> pack-objects should be outpacing us (because we're slowed by running
> under strace). That's probably an OK loss of efficiency in practice, but
> it's very dependent on pack-objects buffering.

The big question though is whether it's a loss of efficiency compared to
the previous state where we didn't buffer at all.

> I'm still a little bit negative on the whole concept of buffering in
> upload-pack, just because the interactions between buffering layers can
> be so subtle. But I guess I'm not really making any argument that I
> didn't make in v1, and you kept this in v2, so I suppose you are not
> swayed by it. ;)

Yeah. Its usefulness highly depends on the downstream command indeed, so
it may or may not help. What I'm not convinced of though is that there
are likely scenarios where it'll perform _worse_ than before. If we can
come up with such a scenario then I'll be happy to drop this patch.

For downstream git-pack-objects(1) it shouldn't really make much of a
difference at the end of this series anyway, as it starts to write data
at pktline lengh. But it may help users of the pack-objects hook.

> If we are going to buffer in upload-pack, there is an obvious
> optimization that I didn't see in your series. When we send a keepalive,
> we should just send whatever we have in os->buffer (even if it is
> nothing).  If we are wasting 5 bytes of pkt-line header and a write()
> call to send the keepalive, we may as well send what data we do have.
> 
> I don't know how much it would help in practice, though. Most keepalives
> will come before the pack data starts, as once pack-objects starts
> producing data, it tends to do so pretty consistently. And of course we
> can't send os->buffer before we see the PACK header, because the whole
> point is to buffer the early bit waiting for packfile uris.
> 
> So it might not be worth adding.

It's an interesting idea indeed. We also need to be mindful that we try
to hold one byte back so that we can "fail" downstream callers in case
git-pack-objects(1) fails.

So we need to repeat some conditions here, but overall it's not too bad:

diff --git a/upload-pack.c b/upload-pack.c
index 1b1c81ea63..199bbd1ad6 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -484,7 +484,16 @@ static void create_pack_file(struct upload_pack_data *pack_data,
 		 */
 		if (!ret && pack_data->use_sideband) {
 			static const char buf[] = "0005\1";
-			write_or_die(1, buf, 5);
+
+			if (output_state->packfile_started && output_state->used > 1) {
+				send_client_data(1, output_state->buffer, output_state->used - 1,
+						 pack_data->use_sideband);
+				output_state->buffer[0] = output_state->buffer[output_state->used - 1];
+				output_state->used = 1;
+			} else {
+				write_or_die(1, buf, 5);
+			}
+
 			last_sent_ms = now_ms;
 		}
 	}

I'll include this in the next version, thanks!

Patrick

  reply	other threads:[~2026-03-10 12:14 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-27 11:22 [PATCH 0/2] upload-pack: reduce lock contention when writing packfile data Patrick Steinhardt
2026-02-27 11:23 ` [PATCH 1/2] upload-pack: fix debug statement when flushing " Patrick Steinhardt
2026-02-27 11:23 ` [PATCH 2/2] upload-pack: reduce lock contention when writing " Patrick Steinhardt
2026-02-27 13:04   ` brian m. carlson
2026-02-27 18:14     ` Patrick Steinhardt
2026-02-27 17:29   ` Junio C Hamano
2026-02-27 19:37   ` Jeff King
2026-03-02 12:12     ` Patrick Steinhardt
2026-03-02 18:20       ` Jeff King
2026-03-03  9:31         ` Patrick Steinhardt
2026-03-03 13:35           ` Jeff King
2026-03-03 13:47             ` Patrick Steinhardt
2026-03-03 15:00 ` [PATCH v2 00/10] " Patrick Steinhardt
2026-03-03 15:00   ` [PATCH v2 01/10] upload-pack: fix debug statement when flushing " Patrick Steinhardt
2026-03-03 15:00   ` [PATCH v2 02/10] upload-pack: adapt keepalives based on buffering Patrick Steinhardt
2026-03-05  0:56     ` Jeff King
2026-03-10 12:08       ` Patrick Steinhardt
2026-03-03 15:00   ` [PATCH v2 03/10] upload-pack: reduce lock contention when writing packfile data Patrick Steinhardt
2026-03-05  1:16     ` Jeff King
2026-03-10 12:14       ` Patrick Steinhardt [this message]
2026-03-03 15:00   ` [PATCH v2 04/10] git-compat-util: introduce `cast_size_t_to_ssize_t()` Patrick Steinhardt
2026-03-03 15:00   ` [PATCH v2 05/10] compat/posix: introduce writev(3p) wrapper Patrick Steinhardt
2026-03-04 22:01     ` Junio C Hamano
2026-03-05  0:37       ` Jeff King
2026-03-05  2:16         ` brian m. carlson
2026-03-05  6:39           ` Johannes Sixt
2026-03-05 22:22             ` brian m. carlson
2026-03-10 12:09               ` Patrick Steinhardt
2026-03-03 15:00   ` [PATCH v2 06/10] wrapper: introduce writev(3p) wrappers Patrick Steinhardt
2026-03-03 15:00   ` [PATCH v2 07/10] sideband: use writev(3p) to send pktlines Patrick Steinhardt
2026-03-04 22:05     ` Junio C Hamano
2026-03-03 15:00   ` [PATCH v2 08/10] csum-file: introduce `hashfd_ext()` Patrick Steinhardt
2026-03-04 22:11     ` Junio C Hamano
2026-03-10 12:09       ` Patrick Steinhardt
2026-03-03 15:00   ` [PATCH v2 09/10] csum-file: drop `hashfd_throughput()` Patrick Steinhardt
2026-03-03 15:00   ` [PATCH v2 10/10] builtin/pack-objects: reduce lock contention when writing packfile data Patrick Steinhardt
2026-03-10 13:24 ` [PATCH v3 00/10] upload-pack: " Patrick Steinhardt
2026-03-10 13:24   ` [PATCH v3 01/10] upload-pack: fix debug statement when flushing " Patrick Steinhardt
2026-03-10 13:24   ` [PATCH v3 02/10] upload-pack: adapt keepalives based on buffering Patrick Steinhardt
2026-03-10 13:24   ` [PATCH v3 03/10] upload-pack: prefer flushing data over sending keepalive Patrick Steinhardt
2026-03-10 17:09     ` Junio C Hamano
2026-03-10 17:43       ` Patrick Steinhardt
2026-03-10 13:25   ` [PATCH v3 04/10] upload-pack: reduce lock contention when writing packfile data Patrick Steinhardt
2026-03-10 13:25   ` [PATCH v3 05/10] compat/posix: introduce writev(3p) wrapper Patrick Steinhardt
2026-03-10 16:59     ` Junio C Hamano
2026-03-10 13:25   ` [PATCH v3 06/10] wrapper: introduce writev(3p) wrappers Patrick Steinhardt
2026-03-10 13:25   ` [PATCH v3 07/10] sideband: use writev(3p) to send pktlines Patrick Steinhardt
2026-03-10 13:25   ` [PATCH v3 08/10] csum-file: introduce `hashfd_ext()` Patrick Steinhardt
2026-03-10 13:25   ` [PATCH v3 09/10] csum-file: drop `hashfd_throughput()` Patrick Steinhardt
2026-03-10 13:25   ` [PATCH v3 10/10] builtin/pack-objects: reduce lock contention when writing packfile data Patrick Steinhardt
2026-03-10 17:11   ` [PATCH v3 00/10] upload-pack: " Junio C Hamano
2026-03-10 20:56   ` Johannes Sixt
2026-03-11  6:27     ` Patrick Steinhardt
2026-03-13  6:45 ` [PATCH v4 " Patrick Steinhardt
2026-03-13  6:45   ` [PATCH v4 01/10] upload-pack: fix debug statement when flushing " Patrick Steinhardt
2026-03-13  6:45   ` [PATCH v4 02/10] upload-pack: adapt keepalives based on buffering Patrick Steinhardt
2026-03-13  6:45   ` [PATCH v4 03/10] upload-pack: prefer flushing data over sending keepalive Patrick Steinhardt
2026-03-13  6:45   ` [PATCH v4 04/10] upload-pack: reduce lock contention when writing packfile data Patrick Steinhardt
2026-03-13  6:45   ` [PATCH v4 05/10] compat/posix: introduce writev(3p) wrapper Patrick Steinhardt
2026-03-13  6:45   ` [PATCH v4 06/10] wrapper: introduce writev(3p) wrappers Patrick Steinhardt
2026-03-13  6:45   ` [PATCH v4 07/10] sideband: use writev(3p) to send pktlines Patrick Steinhardt
2026-03-13  6:45   ` [PATCH v4 08/10] csum-file: introduce `hashfd_ext()` Patrick Steinhardt
2026-03-13  6:45   ` [PATCH v4 09/10] csum-file: drop `hashfd_throughput()` Patrick Steinhardt
2026-03-13  6:45   ` [PATCH v4 10/10] builtin/pack-objects: reduce lock contention when writing packfile data Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=abALMbYE0C0UlwbD@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=msmiley@gitlab.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox