Git development
* [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
@ 2026-04-24 19:14 Scott Bauersfeld via GitGitGadget
  2026-04-25 10:21 ` Junio C Hamano
  2026-04-27 16:08 ` [PATCH v2] " Scott Bauersfeld via GitGitGadget
  0 siblings, 2 replies; 12+ messages in thread
From: Scott Bauersfeld via GitGitGadget @ 2026-04-24 19:14 UTC
  To: git; +Cc: Scott Bauersfeld, Scott Bauersfeld

From: Scott Bauersfeld <sbauersfeld@g.ucla.edu>

Both index-pack and unpack-objects read pack data from stdin through
a 4 KiB static buffer (input_buffer[4096]). On each fill(), consumed
bytes are flushed to the output pack file via write_or_die(), so
every write(2) moves at most 4 KiB.

On FUSE-backed filesystems every write(2) is a synchronous round
trip through the FUSE protocol (userspace -> kernel -> userspace ->
back), so the 4 KiB buffer turns a clone into many unnecessary tiny
writes with noticeable latency overhead.

Increase the buffer from 4 KiB to 128 KiB, matching the default
already used by the hashfile layer in csum-file.c.

Testing with strace on HTTPS clones of git/git (~296 MB pack, 5 runs
per variant, isolated builds from the same v2.54.0 source) shows:

  index-pack pack file writes: 72,465 -> 24,943 avg (66% reduction)
  total write() syscalls:     310,192 -> 259,530 avg (17% reduction)
  writes of exactly 4096 bytes: ~40,077 -> 0 (eliminated)

All clones produce identical HEAD, file count, and pass fsck.

Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
---
    index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
    
    Both index-pack and unpack-objects read pack data from stdin through a 4
    KiB static buffer (input_buffer[4096]). On each fill(), consumed bytes
    are flushed to the output pack file via write_or_die(), so every
    write(2) moves at most 4 KiB.
    
    On FUSE-backed filesystems every write(2) is a synchronous round trip
    through the FUSE protocol (userspace → kernel → userspace → back), so
    the 4 KiB buffer turns a clone into many unnecessary tiny writes with
    noticeable latency overhead.
    
    This change increases the buffer from 4 KiB to 128 KiB, matching the
    default already used by the hashfile layer in csum-file.c.
    
    Benchmarked with 5 HTTPS clones per version of
    https://github.com/sbauersfeld/git.git (~296 MB pack), using strace -f
    to count write() syscalls. Both binaries built from the same v2.54.0
    source tree in isolated directories to ensure the bin-wrappers resolve
    to the correct binary.
    
    Correctness verified via git fsck --no-dangling, rev-parse HEAD, and
    working tree file count — all 10 clones match.
    
    Results:
    
    Metric                                  Unpatched (4 KiB)  Patched (128 KiB)  Change
    index-pack writes to pack file          72,465 avg         24,943 avg         −66%
    Total write() syscalls (all processes)  310,192 avg        259,530 avg        −17%
    Writes of exactly 4096 bytes            ~40,077 avg        0                  eliminated
    HEAD / file count / fsck                ✓                  ✓                  None
    
    Raw data:
    
    unpatched (input_buffer[4096]):
      run 1: total_writes=311787 ip_pack_writes=72353 ip_4k=35311
      run 2: total_writes=310252 ip_pack_writes=72348 ip_4k=38024
      run 3: total_writes=309737 ip_pack_writes=72303 ip_4k=43003
      run 4: total_writes=309801 ip_pack_writes=72661 ip_4k=42349
      run 5: total_writes=309383 ip_pack_writes=72662 ip_4k=41702
    
    patched (input_buffer[128 * 1024]):
      run 1: total_writes=264659 ip_pack_writes=26605 ip_4k=0
      run 2: total_writes=264276 ip_pack_writes=26568 ip_4k=0
      run 3: total_writes=227796 ip_pack_writes=9762 ip_4k=0
      run 4: total_writes=262464 ip_pack_writes=27830 ip_4k=0
      run 5: total_writes=278455 ip_pack_writes=33952 ip_4k=0

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2282%2Fsbauersfeld%2Fsb%2Fincrease-index-pack-input-buffer-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2282/sbauersfeld/sb/increase-index-pack-input-buffer-v1
Pull-Request: https://github.com/git/git/pull/2282

 builtin/index-pack.c     | 4 ++--
 builtin/unpack-objects.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ca7784dc2c..81a628bf34 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -145,8 +145,8 @@ static int check_self_contained_and_connected;
 
 static struct progress *progress;
 
-/* We always read in 4kB chunks. */
-static unsigned char input_buffer[4096];
+#define INPUT_BUFFER_SIZE (128 * 1024)
+static unsigned char input_buffer[INPUT_BUFFER_SIZE];
 static unsigned int input_offset, input_len;
 static off_t consumed_bytes;
 static off_t max_input_size;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index e01cf6e360..535c019f82 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -23,8 +23,8 @@
 static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
 
-/* We always read in 4kB chunks. */
-static unsigned char buffer[4096];
+#define INPUT_BUFFER_SIZE (128 * 1024)
+static unsigned char buffer[INPUT_BUFFER_SIZE];
 static unsigned int offset, len;
 static off_t consumed_bytes;
 static off_t max_input_size;

base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
-- 
gitgitgadget

Thread overview: 12+ messages
2026-04-24 19:14 [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB Scott Bauersfeld via GitGitGadget
2026-04-25 10:21 ` Junio C Hamano
2026-04-27 12:36   ` Derrick Stolee
2026-04-28  1:46     ` Junio C Hamano
2026-04-28  2:09       ` Jeff King
2026-04-27 16:08 ` [PATCH v2] " Scott Bauersfeld via GitGitGadget
2026-04-27 17:23   ` Derrick Stolee
2026-04-27 19:26   ` [PATCH v3] " Scott Bauersfeld via GitGitGadget
2026-04-27 20:12     ` Derrick Stolee
2026-04-28  1:47       ` Junio C Hamano
2026-04-28 14:47     ` [PATCH v4] " Scott Bauersfeld via GitGitGadget
2026-05-12  5:51       ` Junio C Hamano
