From: "Scott Bauersfeld via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <stolee@gmail.com>, Jeff King <peff@peff.net>,
	Scott Bauersfeld <sbauersfeld@g.ucla.edu>,
	Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Subject: [PATCH v4] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Date: Tue, 28 Apr 2026 14:47:40 +0000
Message-ID: <pull.2282.v4.git.git.1777387660841.gitgitgadget@gmail.com>
In-Reply-To: <pull.2282.v3.git.git.1777317998098.gitgitgadget@gmail.com>

From: Scott Bauersfeld <sbauersfeld@g.ucla.edu>

index-pack and unpack-objects both read pack data from stdin through
a 4 KiB static buffer. In index-pack, each fill() flushes consumed
bytes to the pack file via write_or_die(), capping every write(2)
at 4 KiB. unpack-objects uses the same buffer pattern for reads.

On FUSE-backed filesystems every write(2) is a synchronous round
trip through the FUSE protocol (userspace -> kernel -> userspace ->
back), so the 4 KiB buffer turns a clone into many unnecessary tiny
writes with noticeable latency overhead.

Increase the buffer from 4 KiB to 128 KiB. Introduce a shared
DEFAULT_IO_BUFFER_SIZE constant in git-compat-util.h (next to
MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the
hashfile layer in csum-file (which already used 128 KiB but
hardcoded the value).

Pack file writes to a FUSE filesystem with writeback caching
disabled during HTTPS clones of git/git (~293 MB pack):

  74,958 -> 4,687 (94% fewer)

Wall-clock time of git clone over HTTPS onto a FUSE passthrough
filesystem with writeback caching disabled, 3 runs per variant:

  vscode (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
  git/git (~306 MB pack):  22.6s -> 20.0s avg (11% faster)

Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
---
    index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
    
    index-pack and unpack-objects read pack data from stdin through a 4 KiB
    static buffer. In index-pack, each fill() flushes consumed bytes to the
    pack file via write_or_die(), capping every write(2) at 4 KiB.
    unpack-objects uses the same buffer pattern for reads.
    
    On FUSE-backed filesystems every write(2) is a synchronous round trip
    through the FUSE protocol (userspace → kernel → userspace → back), so
    the 4 KiB buffer turns a clone into many unnecessary tiny writes with
    noticeable latency overhead.
    
    Increase the buffer from 4 KiB to 128 KiB. Introduce a shared
    DEFAULT_IO_BUFFER_SIZE constant in git-compat-util.h (next to
    MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the hashfile
    layer in csum-file (which already used 128 KiB but hardcoded the value).
    
    
    Pack file write reduction
    =========================
    
    Pack file writes to a FUSE filesystem with writeback caching disabled
    during HTTPS clones of git/git (~293 MB pack):
    
    Unpatched avg | Patched avg | Change
    --------------|-------------|-------
    74,958        | 4,687       | −94%
    
    Write counts measured by logging writes in a FUSE passthrough daemon
    (libfuse 3.10.5, writeback cache off).
    
    
    Wall-clock time on FUSE
    =======================
    
    Measured wall-clock time of git clone over HTTPS onto a FUSE passthrough
    filesystem with writeback caching disabled. 3 runs per variant:
    
    Repo                             | Unpatched avg | Patched avg | Change
    ---------------------------------|---------------|-------------|-------
    microsoft/vscode (~1.26 GB pack) | 84.5s         | 75.7s       | −10%
    git/git (~306 MB pack)           | 22.6s         | 20.0s       | −11%
    
    
    Changes since v3
    ================
    
     * Replaced strace-based syscall measurements with FUSE daemon write
       logging. The earlier strace numbers (72,465 → 24,943, 65% reduction)
       were unreliable: strace -f uses ptrace to intercept every syscall in
       all traced processes, and that overhead skewed the counts. The FUSE
       daemon logs write sizes without perturbing the processes being
       measured, and shows the actual reduction to be 94% (74,958 → 4,687).
     * Note: Why 4,687 writes instead of the ~2k writes that would be
       expected with a 128 KiB buffer? It appears that fill() calls xread()
       on a pipe, and the Linux default pipe buffer size is 64 KiB, so each
       read returns at most 64 KiB. I also tested using
       fcntl(F_SETPIPE_SZ) to increase the pipe's buffer size to 128 KiB,
       which does reduce total pack file writes to ~2.4K.
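
    The write counts above can be sanity-checked with a ceiling division.
    This is only a back-of-the-envelope sketch: writes_needed() is a
    hypothetical helper, and the byte total used below (74,958 × 4096 =
    307,027,968) is an assumption that every unpatched write carried a
    full 4 KiB.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of writes needed to move total_bytes
 * when each write carries at most chunk bytes (ceiling division).
 * chunk must be non-zero.
 */
static uint64_t writes_needed(uint64_t total_bytes, uint64_t chunk)
{
	return (total_bytes + chunk - 1) / chunk;
}
```

    Under that assumption, 307,027,968 bytes need 4,685 writes at 64 KiB
    per write and 2,343 writes at 128 KiB, which is close to the measured
    4,687 and the ~2.4K seen with the enlarged pipe.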
    
    
    Changes since v2
    ================
    
     * Renamed DEFAULT_PACKFILE_BUFFER_SIZE → DEFAULT_IO_BUFFER_SIZE per
       Stolee's feedback. The constant is not packfile-specific, since it is
       also used by the hashfile layer.
     * Stolee noted that WRITE_BUFFER_SIZE in read-cache.c could be
       consolidated. That constant was already removed in f6e2cd0625
       ("read-cache: delete unused hashing methods", 2021-05-18) when
       read-cache.c was converted to use the hashfile API, so there is
       nothing left to unify. The rename to DEFAULT_IO_BUFFER_SIZE helps
       account for the multiple usages of this constant.
    
    
    Changes since v1
    ================
    
     * Introduced shared DEFAULT_PACKFILE_BUFFER_SIZE constant in
       git-compat-util.h (next to MAX_IO_SIZE), replacing per-file #define
       and the hardcoded value in csum-file.c. Placed here rather than
       environment.h since it is an I/O buffer size, not an environment
       variable or repo config.
     * Added wall-clock timing on a FUSE filesystem.
     * Cleaned up the commit description a bit.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2282%2Fsbauersfeld%2Fsb%2Fincrease-index-pack-input-buffer-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2282/sbauersfeld/sb/increase-index-pack-input-buffer-v4
Pull-Request: https://github.com/git/git/pull/2282

Range-diff vs v3:

 1:  df754ac879 ! 1:  146b1846a5 index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
     @@ Commit message
          hashfile layer in csum-file (which already used 128 KiB but
          hardcoded the value).
      
     -    Syscall counts via strace on HTTPS clones of git/git (~296 MB pack,
     -    5 runs per variant, isolated builds from the same v2.54.0 source):
     +    Pack file writes to a FUSE filesystem with writeback caching
     +    disabled during HTTPS clones of git/git (~293 MB pack):
      
     -      index-pack pack file writes: 72,465 -> 24,943 avg (65% fewer)
     -      total write() syscalls:     310,192 -> 259,530 avg (16% fewer)
     -      writes of exactly 4096 bytes: ~40,077 -> 0
     +      74,958 -> 4,687 (94% fewer)
      
          Wall-clock time of git clone over HTTPS onto a FUSE passthrough
          filesystem with writeback caching disabled, 3 runs per variant:


 builtin/index-pack.c     | 3 +--
 builtin/unpack-objects.c | 3 +--
 csum-file.c              | 2 +-
 git-compat-util.h        | 6 ++++++
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ca7784dc2c..bb3639641c 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -145,8 +145,7 @@ static int check_self_contained_and_connected;
 
 static struct progress *progress;
 
-/* We always read in 4kB chunks. */
-static unsigned char input_buffer[4096];
+static unsigned char input_buffer[DEFAULT_IO_BUFFER_SIZE];
 static unsigned int input_offset, input_len;
 static off_t consumed_bytes;
 static off_t max_input_size;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index e01cf6e360..af67d1a1d3 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -23,8 +23,7 @@
 static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
 
-/* We always read in 4kB chunks. */
-static unsigned char buffer[4096];
+static unsigned char buffer[DEFAULT_IO_BUFFER_SIZE];
 static unsigned int offset, len;
 static off_t consumed_bytes;
 static off_t max_input_size;
diff --git a/csum-file.c b/csum-file.c
index 9558177a11..d7a682c2b6 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -178,7 +178,7 @@ struct hashfile *hashfd_ext(const struct git_hash_algo *algop,
 	f->algop = unsafe_hash_algo(algop);
 	f->algop->init_fn(&f->ctx);
 
-	f->buffer_len = opts->buffer_len ? opts->buffer_len : 128 * 1024;
+	f->buffer_len = opts->buffer_len ? opts->buffer_len : DEFAULT_IO_BUFFER_SIZE;
 	f->buffer = xmalloc(f->buffer_len);
 	f->check_buffer = NULL;
 
diff --git a/git-compat-util.h b/git-compat-util.h
index ae1bdc90a4..5024814bd4 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -712,6 +712,12 @@ static inline uint64_t u64_add(uint64_t a, uint64_t b)
 # endif
 #endif
 
+/*
+ * Default buffer size for buffered I/O in index-pack, unpack-objects,
+ * and the hashfile layer in csum-file.
+ */
+#define DEFAULT_IO_BUFFER_SIZE (128 * 1024)
+
 #ifdef HAVE_ALLOCA_H
 # include <alloca.h>
 # define xalloca(size)      (alloca(size))

base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
-- 
gitgitgadget
