From: Junio C Hamano <gitster@pobox.com>
To: "Scott Bauersfeld via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>,
Jeff King <peff@peff.net>,
Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Subject: Re: [PATCH v4] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Date: Tue, 12 May 2026 14:51:16 +0900
Message-ID: <xmqqy0hpnpkb.fsf@gitster.g>
In-Reply-To: <pull.2282.v4.git.git.1777387660841.gitgitgadget@gmail.com> (Scott Bauersfeld via GitGitGadget's message of "Tue, 28 Apr 2026 14:47:40 +0000")
"Scott Bauersfeld via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
>
> index-pack and unpack-objects both read pack data from stdin through
> a 4 KiB static buffer. In index-pack, each fill() flushes consumed
> bytes to the pack file via write_or_die(), capping every write(2)
> at 4 KiB. unpack-objects uses the same buffer pattern for reads.
>
> On FUSE-backed filesystems every write(2) is a synchronous round
> trip through the FUSE protocol (userspace -> kernel -> userspace ->
> back), so the 4 KiB buffer turns a clone into many unnecessary tiny
> writes with noticeable latency overhead.
>
> Increase the buffer from 4 KiB to 128 KiB. Introduce a shared
> DEFAULT_IO_BUFFER_SIZE constant in git-compat-util.h (next to
> MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the
> hashfile layer in csum-file (which already used 128 KiB but
> hardcoded the value).
>
> Pack file writes to a FUSE filesystem with writeback caching
> disabled during HTTPS clones of git/git (~293 MB pack):
>
> 74,958 -> 4,687 (94% fewer)
>
> Wall-clock time of git clone over HTTPS onto a FUSE passthrough
> filesystem with writeback caching disabled, 3 runs per variant:
>
> vscode (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
> git/git (~306 MB pack): 22.6s -> 20.0s avg (11% faster)
>
> Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
> ---
>...
>
> Changes since v3
> ================
>
> * Replaced strace-based syscall measurements with FUSE daemon write
> logging. The earlier strace numbers (72,465 → 24,943, 65% reduction)
> were unreliable: strace -f uses ptrace to intercept every syscall in
> all traced processes, and that overhead skewed the measurements.
> The FUSE daemon logging captures write sizes without perturbing the
> processes being measured, showing the true reduction is 94%
> (74,958 → 4,687).
> * Note: Why 4,687 writes instead of the ~2k that would be expected
> with a 128 KiB buffer size? It appears that fill() is calling xread()
> on a pipe, and the default pipe buffer size on Linux is 64 KiB. I
> also tested using fcntl(F_SETPIPE_SZ) to increase the pipe's buffer
> size to 128 KiB, which does indeed reduce total pack file writes to
> ~2.4K.
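
The write counts quoted above can be roughly reproduced with a ceiling
division. This is only a back-of-the-envelope sketch, not code from the
patch: the helper name writes_needed() is hypothetical, it assumes the
"~293 MB" pack is about 293 MiB, and it assumes every write(2) carries
a full chunk.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of git): number of write(2) calls
 * needed to store `total` bytes when each call carries at most
 * `chunk` bytes. Plain ceiling division.
 */
uint64_t writes_needed(uint64_t total, uint64_t chunk)
{
	assert(chunk > 0);
	return (total + chunk - 1) / chunk;
}
```

For a 293 MiB pack this gives 75,008 writes at 4 KiB, 4,688 at 64 KiB,
and 2,344 at 128 KiB. Those figures sit close to the measured 74,958,
4,687, and ~2.4K, which fits the explanation that each xread() from the
pipe returns at most 64 KiB unless the pipe buffer is enlarged.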
It seems that everybody was happy with v3 already, so let's merge it
down to 'next'.
Thanks.
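
For readers curious about the fcntl(F_SETPIPE_SZ) experiment mentioned
in the quoted note, here is a minimal Linux-only sketch of enlarging a
pipe's buffer. It is not taken from the patch; the helper name
grow_pipe() is hypothetical, and the fallback constants are the Linux
values, used only if the headers did not already expose them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes F_SETPIPE_SZ / F_GETPIPE_SZ in <fcntl.h> */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Linux values, as a fallback if <fcntl.h> was included without _GNU_SOURCE. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/*
 * Hypothetical helper (not part of git): make sure the pipe behind
 * `fd` can hold at least `want` bytes. Returns the resulting
 * capacity in bytes, or -1 on error (for example when fd is not a
 * pipe, or `want` exceeds /proc/sys/fs/pipe-max-size for an
 * unprivileged process).
 */
int grow_pipe(int fd, int want)
{
	int cur;

	assert(want > 0);
	cur = fcntl(fd, F_GETPIPE_SZ);
	if (cur < 0)
		return -1;
	if (cur >= want)
		return cur; /* already large enough */
	if (fcntl(fd, F_SETPIPE_SZ, want) < 0)
		return -1;
	return fcntl(fd, F_GETPIPE_SZ);
}
```

A caller reading pack data from stdin might try grow_pipe(0, 128 * 1024)
and simply carry on with the default size if it returns -1. The kernel
may round the request up, so the returned capacity can exceed `want`.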
Thread overview: 12+ messages
2026-04-24 19:14 [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB Scott Bauersfeld via GitGitGadget
2026-04-25 10:21 ` Junio C Hamano
2026-04-27 12:36 ` Derrick Stolee
2026-04-28 1:46 ` Junio C Hamano
2026-04-28 2:09 ` Jeff King
2026-04-27 16:08 ` [PATCH v2] " Scott Bauersfeld via GitGitGadget
2026-04-27 17:23 ` Derrick Stolee
2026-04-27 19:26 ` [PATCH v3] " Scott Bauersfeld via GitGitGadget
2026-04-27 20:12 ` Derrick Stolee
2026-04-28 1:47 ` Junio C Hamano
2026-04-28 14:47 ` [PATCH v4] " Scott Bauersfeld via GitGitGadget
2026-05-12 5:51 ` Junio C Hamano [this message]