From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Derrick Stolee <stolee@gmail.com>,
Scott Bauersfeld via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Subject: Re: [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Date: Mon, 27 Apr 2026 22:09:33 -0400
Message-ID: <20260428020933.GA660154@coredump.intra.peff.net>
In-Reply-To: <xmqqecjz26wr.fsf@gitster.g>

On Tue, Apr 28, 2026 at 10:46:44AM +0900, Junio C Hamano wrote:
> The application may have produced only 2kB before it issues a
> "flush". Whether the buffer size is 4kB or 128kB, such a flush will
> only write out 2kB, and the larger buffer size does not help at all.
> But if the application has produced 90kB before it issues a "flush",
> the larger buffer size would give us a great improvement. With 4kB
> buffer, before such an application level "flush", we would have seen
> 22 = floor(90/4) calls of write(2) to flush the buffer, plus a 2kB
> write(2). With 128kB buffer, we would see a single 90kB write(2).
>
> So the apparently lower improvement than I naively expected may
> be attributable to the fact that many application level "flush"
> calls were not large enough to benefit from the 128kB buffer? How
> much of the total number of bytes written came in large batches, vs
> tiny ones?

The input to index-pack in a fetch is the sideband stream as demuxed by
git-fetch. So it's probably flushing 64k or less each time (because
that's the max size of a packet), and unless index-pack is going much
slower than the input, that caps how much it will get in a single read.
Depending on the source, though, it may be possible to go faster than
index-pack (which has to at least update the pack checksum for every
byte, and may even zlib inflate and hash the object itself if it's a
non-delta). In which case the sideband demuxer would start filling the
pipe and index-pack may get larger reads.
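
To put rough numbers on that, here is a minimal sketch of a toy model,
not anything taken from the patch. The function name count_reads, the
1 MiB total, and the rounded 64 KiB packet size are all assumptions for
illustration, and it assumes index-pack always drains one packet before
the next one arrives.

```python
def count_reads(total_bytes, packet_size, buf_size):
    """Count read() calls in a toy model where the reader keeps up.

    Assumption: each packet is fully drained before the next arrives,
    so no single read() can return more than one packet's worth.
    """
    reads = 0
    remaining = total_bytes
    while remaining > 0:
        packet = min(packet_size, remaining)
        reads += -(-packet // buf_size)  # ceiling division
        remaining -= packet
    return reads


KIB = 1024
total = 1024 * KIB  # illustrative 1 MiB of pack data (assumption)
packet = 64 * KIB   # "64k" rounded up; real packets are a bit smaller

reads_4k = count_reads(total, packet, 4 * KIB)
reads_128k = count_reads(total, packet, 128 * KIB)
print(reads_4k, reads_128k)  # prints "256 16"
```

Under those assumptions the larger buffer cuts reads by 16x rather than
the 32x the buffer sizes alone would suggest, because each read is
limited by the packet and not by the buffer.
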

We could actually reduce the number of syscalls further if index-pack
did the demuxing itself, and we just handed it the descriptor. That
probably doesn't help all that much in this case, though, if the problem
is not raw reads/writes on pipes, but rather the ones that go to the
slow FUSE filesystem. And as long as those pipe reads/writes are "wide"
(allowing the eventual filesystem writes to also be wide), then the
exact number of syscalls may not be as important.

But the demuxing may also explain why the total number of writes did not
decrease as much as you expected, since the demuxer's own writes to the
pipe will probably not be reduced by the patch in question. So the
improvement applies to only a portion of the total (but not necessarily
half, because the demuxer's writes may have been larger, and so fewer,
in the first place).
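
As a rough illustration of that last point, here is a sketch with
made-up sizes, not measurements. The helper total_writes is
hypothetical, and it assumes both the demuxer's pipe writes and
index-pack's file writes show up in the same count, and that index-pack
writes out in chunks about as large as what it read.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def total_writes(total_bytes, demux_write_size, index_pack_write_size):
    """Return (demux writes, index-pack writes) for total_bytes.

    Assumption: every write is full-sized except possibly the last.
    """
    demux = ceil_div(total_bytes, demux_write_size)
    index_pack = ceil_div(total_bytes, index_pack_write_size)
    return demux, index_pack


KIB = 1024
total = 1024 * KIB  # illustrative 1 MiB transfer (assumption)

# Before: 64 KiB demux writes, 4 KiB index-pack writes.
before = total_writes(total, 64 * KIB, 4 * KIB)
# After: demux writes unchanged; index-pack capped near the packet size.
after = total_writes(total, 64 * KIB, 64 * KIB)

index_pack_drop = 1 - after[1] / before[1]
overall_drop = 1 - sum(after) / sum(before)
print(before, after)
print(f"{index_pack_drop:.1%} {overall_drop:.1%}")
```

With these toy numbers the demuxer accounts for 16 of the 272 writes
beforehand, nowhere near half, so the overall drop stays close to the
drop in index-pack's own writes. Different sizes shift the split.
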
-Peff