Git development
From: "Scott Bauersfeld via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <stolee@gmail.com>,
	Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Subject: [PATCH v2] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Date: Mon, 27 Apr 2026 16:08:34 +0000
Message-ID: <pull.2282.v2.git.git.1777306114914.gitgitgadget@gmail.com>
In-Reply-To: <pull.2282.git.git.1777058098756.gitgitgadget@gmail.com>

From: Scott Bauersfeld <sbauersfeld@g.ucla.edu>

index-pack and unpack-objects both read pack data from stdin through
a 4 KiB static buffer. In index-pack, each fill() flushes consumed
bytes to the pack file via write_or_die(), capping every write(2)
at 4 KiB. unpack-objects uses the same buffer pattern for reads.

On FUSE-backed filesystems every write(2) is a synchronous round
trip through the FUSE protocol (userspace -> kernel -> userspace ->
back), so the 4 KiB buffer turns a clone into many unnecessary tiny
writes with noticeable latency overhead.

Increase the buffer from 4 KiB to 128 KiB. Introduce a shared
DEFAULT_PACKFILE_BUFFER_SIZE constant in git-compat-util.h (next to
MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the
hashfile layer in csum-file (which already used 128 KiB but
hardcoded the value).

Syscall counts via strace on HTTPS clones of git/git (~296 MB pack,
5 runs per variant, isolated builds from the same v2.54.0 source):

  index-pack pack file writes: 72,465 -> 24,943 avg (65% fewer)
  total write() syscalls:     310,192 -> 259,530 avg (16% fewer)
  writes of exactly 4096 bytes: ~40,077 -> 0

Wall-clock time of git clone over HTTPS onto a FUSE passthrough
filesystem with writeback caching disabled, 3 runs per variant:

  vscode (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
  git/git (~306 MB pack):  22.6s -> 20.0s avg (11% faster)

Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
---
    Syscall reduction
    =================
    
    Measured via strace -f on HTTPS clones of git/git (~296 MB pack, 5 runs
    per variant, isolated builds from the same v2.54.0 source):
    
    Metric                                   Unpatched (4 KiB)  Patched (128 KiB)  Change
    index-pack writes to pack file           72,465 avg         24,943 avg         −65%
    Total write() syscalls (all processes)   310,192 avg        259,530 avg        −16%
    Writes of exactly 4096 bytes             ~40,077 avg        0                  eliminated
    HEAD / file count / fsck                 ✓                  ✓                  identical
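
For intuition on why the buffer size bounds the number of writes, here is a
minimal sketch (not Git's code) of a read-then-flush copy loop that counts
its write(2) calls. The helpers temp_file_of_size() and count_writes() are
hypothetical names invented for this illustration, and the loop is a
simplification of the fill()/flush pattern described above: it omits
hashing, partial consumption, and EINTR handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not Git code): an unlinked temp file of `size` bytes. */
static int temp_file_of_size(size_t size)
{
	char path[] = "/tmp/bufdemo-XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path); /* the open descriptor keeps the file alive */
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Copy `total` bytes between two temp files through a buffer of
 * `bufsize` bytes. Returns the number of write(2) calls, or -1 on error.
 */
static long count_writes(size_t total, size_t bufsize)
{
	int in_fd = temp_file_of_size(total);
	int out_fd = temp_file_of_size(0);
	unsigned char *buf;
	long nwrites = 0;
	ssize_t got = 0;

	assert(bufsize > 0);
	buf = malloc(bufsize);
	if (in_fd < 0 || out_fd < 0 || !buf)
		nwrites = -1;

	/* Each read fills at most `bufsize` bytes, so no write can exceed it. */
	while (nwrites >= 0 && (got = read(in_fd, buf, bufsize)) > 0) {
		size_t done = 0;

		while (done < (size_t)got) {
			ssize_t put = write(out_fd, buf + done, (size_t)got - done);

			if (put <= 0) {
				nwrites = -1;
				break;
			}
			done += (size_t)put;
			nwrites++;
		}
	}
	if (got < 0)
		nwrites = -1;

	free(buf);
	if (in_fd >= 0)
		close(in_fd);
	if (out_fd >= 0)
		close(out_fd);
	return nwrites;
}
```

On a regular file, where reads normally return a full buffer, copying 1 MiB
this way should take 256 writes with a 4 KiB buffer and 8 with a 128 KiB
buffer. The measured reduction above is smaller than that 32x ratio; one
plausible reason (an assumption, not something measured here) is that reads
from a pipe or socket often return less than the requested size.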
    
    
    Wall-clock time on FUSE
    =======================
    
    Measured wall-clock time of git clone over HTTPS onto a FUSE passthrough
    filesystem with writeback caching disabled. 3 runs per variant:
    
    Repo                               Unpatched avg  Patched avg  Change
    microsoft/vscode (~1.26 GB pack)   84.5s          75.7s        −10%
    git/git (~306 MB pack)             22.6s          20.0s        −11%
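
The Change columns follow directly from the reported averages. As a small
sketch of that arithmetic, the hypothetical helper percent_reduction() below
(a name introduced only for this example) computes the relative decrease.
How the published figures were rounded is not stated, so the sketch returns
the unrounded value.

```c
#include <assert.h>

/* Relative decrease from `before` to `after`, in percent. */
static double percent_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

For example, percent_reduction(72465, 24943) gives roughly 65.6 and
percent_reduction(84.5, 75.7) gives roughly 10.4, consistent with the
figures in the tables.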
    
    
    Changes since v1
    ================
    
     * Introduced shared DEFAULT_PACKFILE_BUFFER_SIZE constant in
       git-compat-util.h (next to MAX_IO_SIZE), replacing per-file #define
       and the hardcoded value in csum-file.c. Placed here rather than
       environment.h since it is an I/O buffer size, not an environment
       variable or repo config.
     * Added wall-clock timing on a FUSE filesystem.
     * Cleaned up the commit description a bit.
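
To illustrate the first item, here is a minimal standalone sketch of one
constant serving both as a static array size and as a fallback default.
Only DEFAULT_PACKFILE_BUFFER_SIZE and its value come from this patch; the
names demo_input_buffer, demo_opts, demo_buffer_len() and
demo_input_capacity() are hypothetical stand-ins, not Git's actual types.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as the constant this patch adds to git-compat-util.h. */
#define DEFAULT_PACKFILE_BUFFER_SIZE (128 * 1024)

static_assert(DEFAULT_PACKFILE_BUFFER_SIZE == 32 * 4096,
	      "128 KiB is 32 times the old 4 KiB buffer");

/* Static buffer sized by the shared constant (stand-in for input_buffer). */
static unsigned char demo_input_buffer[DEFAULT_PACKFILE_BUFFER_SIZE];

/* Hypothetical options struct: a buffer_len of 0 means "use the default". */
struct demo_opts {
	size_t buffer_len;
};

/* Fall back to the shared default when the caller did not pick a size. */
static size_t demo_buffer_len(const struct demo_opts *opts)
{
	return opts->buffer_len ? opts->buffer_len : DEFAULT_PACKFILE_BUFFER_SIZE;
}

/* Capacity of the static buffer, derived from the same constant. */
static size_t demo_input_capacity(void)
{
	return sizeof(demo_input_buffer);
}
```

With a single definition, changing the default later means touching one
line instead of three call sites.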

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2282%2Fsbauersfeld%2Fsb%2Fincrease-index-pack-input-buffer-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2282/sbauersfeld/sb/increase-index-pack-input-buffer-v2
Pull-Request: https://github.com/git/git/pull/2282

Range-diff vs v1:

 1:  c388e1dc2f ! 1:  ac2559ccb5 index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
     @@ Metadata
       ## Commit message ##
          index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
      
     -    Both index-pack and unpack-objects read pack data from stdin through
     -    a 4 KiB static buffer (input_buffer[4096]). On each fill(), consumed
     -    bytes are flushed to the output pack file via write_or_die(), so
     -    every write(2) moves at most 4 KiB.
     +    index-pack and unpack-objects both read pack data from stdin through
     +    a 4 KiB static buffer. In index-pack, each fill() flushes consumed
     +    bytes to the pack file via write_or_die(), capping every write(2)
     +    at 4 KiB. unpack-objects uses the same buffer pattern for reads.
      
          On FUSE-backed filesystems every write(2) is a synchronous round
          trip through the FUSE protocol (userspace -> kernel -> userspace ->
          back), so the 4 KiB buffer turns a clone into many unnecessary tiny
          writes with noticeable latency overhead.
      
     -    Increase the buffer from 4 KiB to 128 KiB, matching the default
     -    already used by the hashfile layer in csum-file.c.
     +    Increase the buffer from 4 KiB to 128 KiB. Introduce a shared
     +    DEFAULT_PACKFILE_BUFFER_SIZE constant in git-compat-util.h (next to
     +    MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the
     +    hashfile layer in csum-file (which already used 128 KiB but
     +    hardcoded the value).
      
     -    Testing with strace on HTTPS clones of git/git (~296 MB pack, 5 runs
     -    per variant, isolated builds from the same v2.54.0 source) shows:
     +    Syscall counts via strace on HTTPS clones of git/git (~296 MB pack,
     +    5 runs per variant, isolated builds from the same v2.54.0 source):
      
     -      index-pack pack file writes: 72,465 -> 24,943 avg (66% reduction)
     -      total write() syscalls:     310,192 -> 259,530 avg (17% reduction)
     -      writes of exactly 4096 bytes: ~40,077 -> 0 (eliminated)
     +      index-pack pack file writes: 72,465 -> 24,943 avg (65% fewer)
     +      total write() syscalls:     310,192 -> 259,530 avg (16% fewer)
     +      writes of exactly 4096 bytes: ~40,077 -> 0
      
     -    All clones produce identical HEAD, file count, and pass fsck.
     +    Wall-clock time of git clone over HTTPS onto a FUSE passthrough
     +    filesystem with writeback caching disabled, 3 runs per variant:
     +
     +      vscode (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
     +      git/git (~306 MB pack):  22.6s -> 20.0s avg (11% faster)
      
          Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
      
     @@ builtin/index-pack.c: static int check_self_contained_and_connected;
       
      -/* We always read in 4kB chunks. */
      -static unsigned char input_buffer[4096];
     -+#define INPUT_BUFFER_SIZE (128 * 1024)
     -+static unsigned char input_buffer[INPUT_BUFFER_SIZE];
     ++static unsigned char input_buffer[DEFAULT_PACKFILE_BUFFER_SIZE];
       static unsigned int input_offset, input_len;
       static off_t consumed_bytes;
       static off_t max_input_size;
     @@ builtin/unpack-objects.c
       
      -/* We always read in 4kB chunks. */
      -static unsigned char buffer[4096];
     -+#define INPUT_BUFFER_SIZE (128 * 1024)
     -+static unsigned char buffer[INPUT_BUFFER_SIZE];
     ++static unsigned char buffer[DEFAULT_PACKFILE_BUFFER_SIZE];
       static unsigned int offset, len;
       static off_t consumed_bytes;
       static off_t max_input_size;
     +
     + ## csum-file.c ##
     +@@ csum-file.c: struct hashfile *hashfd_ext(const struct git_hash_algo *algop,
     + 	f->algop = unsafe_hash_algo(algop);
     + 	f->algop->init_fn(&f->ctx);
     + 
     +-	f->buffer_len = opts->buffer_len ? opts->buffer_len : 128 * 1024;
     ++	f->buffer_len = opts->buffer_len ? opts->buffer_len : DEFAULT_PACKFILE_BUFFER_SIZE;
     + 	f->buffer = xmalloc(f->buffer_len);
     + 	f->check_buffer = NULL;
     + 
     +
     + ## git-compat-util.h ##
     +@@ git-compat-util.h: static inline uint64_t u64_add(uint64_t a, uint64_t b)
     + # endif
     + #endif
     + 
     ++/*
     ++ * Default buffer size for buffered I/O in pack file operations (index-pack,
     ++ * unpack-objects) and the hashfile layer in csum-file.
     ++ */
     ++#define DEFAULT_PACKFILE_BUFFER_SIZE (128 * 1024)
     ++
     + #ifdef HAVE_ALLOCA_H
     + # include <alloca.h>
     + # define xalloca(size)      (alloca(size))


 builtin/index-pack.c     | 3 +--
 builtin/unpack-objects.c | 3 +--
 csum-file.c              | 2 +-
 git-compat-util.h        | 6 ++++++
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ca7784dc2c..d86476676f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -145,8 +145,7 @@ static int check_self_contained_and_connected;
 
 static struct progress *progress;
 
-/* We always read in 4kB chunks. */
-static unsigned char input_buffer[4096];
+static unsigned char input_buffer[DEFAULT_PACKFILE_BUFFER_SIZE];
 static unsigned int input_offset, input_len;
 static off_t consumed_bytes;
 static off_t max_input_size;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index e01cf6e360..da8ec83d9f 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -23,8 +23,7 @@
 static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
 
-/* We always read in 4kB chunks. */
-static unsigned char buffer[4096];
+static unsigned char buffer[DEFAULT_PACKFILE_BUFFER_SIZE];
 static unsigned int offset, len;
 static off_t consumed_bytes;
 static off_t max_input_size;
diff --git a/csum-file.c b/csum-file.c
index 9558177a11..c1aeaf587a 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -178,7 +178,7 @@ struct hashfile *hashfd_ext(const struct git_hash_algo *algop,
 	f->algop = unsafe_hash_algo(algop);
 	f->algop->init_fn(&f->ctx);
 
-	f->buffer_len = opts->buffer_len ? opts->buffer_len : 128 * 1024;
+	f->buffer_len = opts->buffer_len ? opts->buffer_len : DEFAULT_PACKFILE_BUFFER_SIZE;
 	f->buffer = xmalloc(f->buffer_len);
 	f->check_buffer = NULL;
 
diff --git a/git-compat-util.h b/git-compat-util.h
index ae1bdc90a4..a2f037811c 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -712,6 +712,12 @@ static inline uint64_t u64_add(uint64_t a, uint64_t b)
 # endif
 #endif
 
+/*
+ * Default buffer size for buffered I/O in pack file operations (index-pack,
+ * unpack-objects) and the hashfile layer in csum-file.
+ */
+#define DEFAULT_PACKFILE_BUFFER_SIZE (128 * 1024)
+
 #ifdef HAVE_ALLOCA_H
 # include <alloca.h>
 # define xalloca(size)      (alloca(size))

base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
-- 
gitgitgadget

Thread overview: 12+ messages
2026-04-24 19:14 [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB Scott Bauersfeld via GitGitGadget
2026-04-25 10:21 ` Junio C Hamano
2026-04-27 12:36   ` Derrick Stolee
2026-04-28  1:46     ` Junio C Hamano
2026-04-28  2:09       ` Jeff King
2026-04-27 16:08 ` Scott Bauersfeld via GitGitGadget [this message]
2026-04-27 17:23   ` [PATCH v2] " Derrick Stolee
2026-04-27 19:26   ` [PATCH v3] " Scott Bauersfeld via GitGitGadget
2026-04-27 20:12     ` Derrick Stolee
2026-04-28  1:47       ` Junio C Hamano
2026-04-28 14:47     ` [PATCH v4] " Scott Bauersfeld via GitGitGadget
2026-05-12  5:51       ` Junio C Hamano
