Git development
From: Junio C Hamano <gitster@pobox.com>
To: "Scott Bauersfeld via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Derrick Stolee <stolee@gmail.com>,
	 Jeff King <peff@peff.net>,
	 Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Subject: Re: [PATCH v4] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Date: Tue, 12 May 2026 14:51:16 +0900
Message-ID: <xmqqy0hpnpkb.fsf@gitster.g>
In-Reply-To: <pull.2282.v4.git.git.1777387660841.gitgitgadget@gmail.com> (Scott Bauersfeld via GitGitGadget's message of "Tue, 28 Apr 2026 14:47:40 +0000")

"Scott Bauersfeld via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
>
> index-pack and unpack-objects both read pack data from stdin through
> a 4 KiB static buffer. In index-pack, each fill() flushes consumed
> bytes to the pack file via write_or_die(), capping every write(2)
> at 4 KiB. unpack-objects uses the same buffer pattern for reads.
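The fill()/use pattern described here can be sketched as follows. This is a minimal illustration modeled loosely on that description, not git's actual index-pack code. The names sketch_input, sketch_fill, sketch_use, sketch_flush and SKETCH_IO_BUFFER_SIZE are hypothetical. It shows why the buffer size caps each write(2): consumed bytes are only ever flushed out of the buffer itself, so a single flush can never carry more than one buffer's worth.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the shared DEFAULT_IO_BUFFER_SIZE constant. */
#define SKETCH_IO_BUFFER_SIZE (128 * 1024)

struct sketch_input {
	int in_fd;      /* where pack data arrives, e.g. stdin */
	int out_fd;     /* where consumed bytes are copied; -1 to only read */
	size_t offset;  /* bytes at the front of buf already consumed */
	size_t len;     /* unconsumed bytes starting at buf + offset */
	size_t writes;  /* number of write(2) calls issued, for illustration */
	unsigned char buf[SKETCH_IO_BUFFER_SIZE];
};

/* Write all n bytes, retrying short writes and EINTR; -1 on error. */
static int sketch_write_all(int fd, const unsigned char *p, size_t n,
			    size_t *writes)
{
	while (n > 0) {
		ssize_t w = write(fd, p, n);
		if (w < 0 && errno == EINTR)
			continue;
		if (w <= 0)
			return -1;
		(*writes)++;
		p += w;
		n -= (size_t)w;
	}
	return 0;
}

/*
 * Copy the consumed prefix of buf to out_fd (if any) and slide the
 * unconsumed tail to the front. One flush never exceeds sizeof(buf).
 */
static int sketch_flush(struct sketch_input *in)
{
	if (in->offset && in->out_fd >= 0 &&
	    sketch_write_all(in->out_fd, in->buf, in->offset, &in->writes) < 0)
		return -1;
	memmove(in->buf, in->buf + in->offset, in->len);
	in->offset = 0;
	return 0;
}

/* Ensure at least min unconsumed bytes are buffered; NULL on error/EOF. */
static const unsigned char *sketch_fill(struct sketch_input *in, size_t min)
{
	assert(min <= sizeof(in->buf));
	if (min <= in->len)
		return in->buf + in->offset;
	if (sketch_flush(in) < 0)
		return NULL;
	while (in->len < min) {
		ssize_t r = read(in->in_fd, in->buf + in->len,
				 sizeof(in->buf) - in->len);
		if (r < 0 && errno == EINTR)
			continue;
		if (r <= 0)
			return NULL;  /* read error or premature EOF */
		in->len += (size_t)r;
	}
	return in->buf;
}

/* Mark n buffered bytes as consumed. */
static void sketch_use(struct sketch_input *in, size_t n)
{
	assert(n <= in->len);
	in->offset += n;
	in->len -= n;
}
```

A caller would fill() enough bytes for the next object header, parse it, call sketch_use(), and call sketch_flush() once at the end. Setting out_fd to -1 makes the sketch read-only, which roughly corresponds to the way the text describes unpack-objects reusing the pattern purely for reads.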
>
> On FUSE-backed filesystems every write(2) is a synchronous round
> trip through the FUSE protocol (userspace -> kernel -> userspace ->
> back), so the 4 KiB buffer turns a clone into many unnecessary tiny
> writes with noticeable latency overhead.
>
> Increase the buffer from 4 KiB to 128 KiB. Introduce a shared
> DEFAULT_IO_BUFFER_SIZE constant in git-compat-util.h (next to
> MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the
> hashfile layer in csum-file (which already used 128 KiB but
> hardcoded the value).
>
> Pack file writes to a FUSE filesystem with writeback caching
> disabled during HTTPS clones of git/git (~293 MB pack):
>
>   74,958 -> 4,687 (94% fewer)
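A back-of-the-envelope check of these counts divides the pack size by the number of bytes each write carries. The sketch below assumes, without confirmation from the thread, that "~293 MB" means 293 MiB. It also assumes that, when input arrives through a pipe, each fill() flushes roughly min(buffer size, pipe capacity) bytes. estimate_writes() and effective_chunk() are hypothetical helpers, not git functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the write(2) calls needed to copy total_bytes when each write
 * carries about `chunk` bytes (ceiling division).
 */
static uint64_t estimate_writes(uint64_t total_bytes, uint64_t chunk)
{
	assert(chunk > 0);
	return (total_bytes + chunk - 1) / chunk;
}

/*
 * When input arrives through a pipe, each read(2) returns at most the
 * pipe's capacity, so the bytes flushed per fill() tend to be capped by
 * the smaller of the user-space buffer and the pipe capacity.
 */
static uint64_t effective_chunk(uint64_t buffer_size, uint64_t pipe_capacity)
{
	return buffer_size < pipe_capacity ? buffer_size : pipe_capacity;
}
```

With 293 MiB (307,232,768 bytes), the estimates work out to 75,008 writes at 4 KiB, 2,344 at 128 KiB, and 4,688 when a 64 KiB pipe caps each read. These figures are close to the measured 74,958 and 4,687, and to the ~2.4k reported in the notes with an enlarged pipe.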
>
> Wall-clock time of git clone over HTTPS onto a FUSE passthrough
> filesystem with writeback caching disabled, 3 runs per variant:
>
>   vscode (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
>   git/git (~306 MB pack):  22.6s -> 20.0s avg (11% faster)
>
> Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
> ---
>...
>     
>     Changes since v3
>     ================
>     
>      * Replaced strace-based syscall measurements with FUSE daemon write
>        logging. The earlier strace numbers (72,465 → 24,943, a 65%
>        reduction) were unreliable: strace -f uses ptrace to intercept
>        every syscall in all traced processes, and that overhead skewed
>        the measurements. Logging in the FUSE daemon records write sizes
>        without perturbing the traced processes and shows that the true
>        reduction is 94% (74,958 → 4,687).
>      * Note: why 4,687 writes rather than the ~2k one would expect with
>        a 128 KiB buffer? It appears that fill() calls xread() on a pipe,
>        and the default Linux pipe buffer is 64 KiB, so each read returns
>        at most 64 KiB. Using fcntl(F_SETPIPE_SZ) to grow the pipe's
>        buffer to 128 KiB does indeed reduce total pack file writes to
>        ~2.4k.
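The F_SETPIPE_SZ experiment mentioned in the note can be reproduced with a small helper like the one below. It is a sketch, not code from the patch, and grow_pipe() is a hypothetical name. F_GETPIPE_SZ and F_SETPIPE_SZ are Linux-specific fcntl(2) commands (available since Linux 2.6.35), so on other platforms the helper fails with ENOSYS.

```c
#define _GNU_SOURCE  /* exposes the Linux-only F_GETPIPE_SZ / F_SETPIPE_SZ */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>  /* pipe(), STDIN_FILENO */

/*
 * Try to make the pipe behind fd hold at least `want` bytes.
 * Returns the resulting capacity, or -1 with errno set if fd is not a
 * pipe, the platform lacks F_SETPIPE_SZ, or the kernel refused (e.g.
 * EPERM when an unprivileged caller asks for more than
 * /proc/sys/fs/pipe-max-size).
 */
static int grow_pipe(int fd, int want)
{
#ifdef F_SETPIPE_SZ
	int cur = fcntl(fd, F_GETPIPE_SZ);

	if (cur < 0)
		return -1;     /* not a pipe, or unsupported */
	if (cur >= want)
		return cur;    /* already large enough */
	/* The kernel may round up; on success it returns the new capacity. */
	return fcntl(fd, F_SETPIPE_SZ, want);
#else
	(void)fd;
	(void)want;
	errno = ENOSYS;
	return -1;
#endif
}
```

For example, a reader could call grow_pipe(STDIN_FILENO, 128 * 1024) before its first read. If the call returns -1, the reader can simply continue at the default capacity, because the enlargement is only an optimization.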

It seems that everybody was happy with v3 already, so let's merge it
down to 'next'.

Thanks.

Thread overview: 12+ messages
2026-04-24 19:14 [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB Scott Bauersfeld via GitGitGadget
2026-04-25 10:21 ` Junio C Hamano
2026-04-27 12:36   ` Derrick Stolee
2026-04-28  1:46     ` Junio C Hamano
2026-04-28  2:09       ` Jeff King
2026-04-27 16:08 ` [PATCH v2] " Scott Bauersfeld via GitGitGadget
2026-04-27 17:23   ` Derrick Stolee
2026-04-27 19:26   ` [PATCH v3] " Scott Bauersfeld via GitGitGadget
2026-04-27 20:12     ` Derrick Stolee
2026-04-28  1:47       ` Junio C Hamano
2026-04-28 14:47     ` [PATCH v4] " Scott Bauersfeld via GitGitGadget
2026-05-12  5:51       ` Junio C Hamano [this message]
