From: "Scott Bauersfeld via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Scott Bauersfeld <sbauersfeld@g.ucla.edu>,
	Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Subject: [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Date: Fri, 24 Apr 2026 19:14:58 +0000
Message-ID: <pull.2282.git.git.1777058098756.gitgitgadget@gmail.com>

From: Scott Bauersfeld <sbauersfeld@g.ucla.edu>

Both index-pack and unpack-objects read pack data from stdin through
a 4 KiB static buffer (input_buffer[4096]). On each fill(), consumed
bytes are flushed to the output pack file via write_or_die(), so
every write(2) moves at most 4 KiB.
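
The fill()/flush() interaction described above can be modeled in a few lines of C. This is a simplified, hypothetical sketch and not git's implementation. The names `struct input_model`, `sim_read()`, `use_bytes()`, and `count_pack_writes()` are invented. No data is actually copied. The model assumes the consumer drains everything buffered before asking for more, and that each simulated read(2) returns at most `read_chunk` bytes. It only counts how many pack-file writes a given buffer size produces.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the fill()/flush() pattern; not git's code.
 * Only byte counts are tracked, no data is copied.
 */
struct input_model {
	size_t cap;        /* buffer size, e.g. 4096 or 128 * 1024 */
	size_t offset;     /* bytes consumed but not yet flushed */
	size_t len;        /* bytes read but not yet consumed */
	size_t src_left;   /* bytes still waiting on stdin */
	size_t read_chunk; /* most bytes one simulated read(2) returns */
	size_t writes;     /* simulated write(2) calls to the pack file */
	size_t written;    /* total bytes flushed to the pack file */
};

/* Flush consumed bytes: one write(2) of at most cap bytes. */
static void flush_consumed(struct input_model *m)
{
	if (m->offset) {
		m->writes++;
		m->written += m->offset;
		/* the real code would also move the unconsumed tail to the front */
		m->offset = 0;
	}
}

/* Simulated read(2): returns min(want, read_chunk, src_left). */
static size_t sim_read(struct input_model *m, size_t want)
{
	size_t n = want;

	if (n > m->read_chunk)
		n = m->read_chunk;
	if (n > m->src_left)
		n = m->src_left;
	m->src_left -= n;
	return n;
}

/* Ensure at least min bytes are buffered; returns 0 at EOF. */
static int fill_model(struct input_model *m, size_t min)
{
	assert(min <= m->cap);
	if (min <= m->len)
		return 1;
	flush_consumed(m); /* consumed bytes go out before refilling */
	while (m->len < min) {
		size_t got = sim_read(m, m->cap - m->len);

		if (!got)
			return 0;
		m->len += got;
	}
	return 1;
}

/* Mark bytes as consumed; they are written out on the next flush. */
static void use_bytes(struct input_model *m, size_t bytes)
{
	assert(bytes <= m->len);
	m->offset += bytes;
	m->len -= bytes;
}

/* Count pack-file writes needed to stream total bytes through cap. */
static size_t count_pack_writes(size_t cap, size_t total, size_t read_chunk)
{
	struct input_model m = { cap, 0, 0, total, read_chunk, 0, 0 };

	while (fill_model(&m, 1))
		use_bytes(&m, m.len); /* consume everything currently buffered */
	flush_consumed(&m);
	assert(m.written == total);
	return m.writes;
}
```

Under those assumptions, streaming 1 MiB with reads of up to 64 KiB takes 256 writes with a 4096-byte buffer (`count_pack_writes(4096, 1 << 20, 64 * 1024)`). With a 128 KiB buffer it takes only 16, because each write can carry a whole 64 KiB read instead of being capped at 4 KiB.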

On FUSE-backed filesystems every write(2) is a synchronous round
trip through the FUSE protocol (userspace -> kernel -> userspace ->
back), so the 4 KiB buffer turns a clone into many unnecessary tiny
writes with noticeable latency overhead.

Increase the buffer from 4 KiB to 128 KiB, matching the default
already used by the hashfile layer in csum-file.c.
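
Every flush writes at most one buffer's worth of data. A buffer of B bytes therefore needs at least S/B write(2) calls (rounded up) to store a pack of S bytes. Raising B from 4096 to 131072 lowers that floor by a factor of 32. Here is a worked version, assuming the ~296 MB pack measured below means S = 296 × 10^6 bytes:

```latex
N_{\text{write}} \ge \left\lceil \frac{S}{B} \right\rceil,
\qquad
\left\lceil \frac{296 \times 10^{6}}{4096} \right\rceil = 72{,}266,
\qquad
\left\lceil \frac{296 \times 10^{6}}{131072} \right\rceil = 2{,}259
```

The unpatched runs average 72,465 pack-file writes, which is consistent with sitting just above the 4 KiB floor.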

Testing with strace on HTTPS clones of git/git (~296 MB pack, 5 runs
per variant, isolated builds from the same v2.54.0 source) shows:

  index-pack pack file writes: 72,465 -> 24,943 avg (66% reduction)
  total write() syscalls:     310,192 -> 259,530 avg (16% reduction)
  writes of exactly 4096 bytes: ~40,077 -> 0 (eliminated)
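
The averages and percentages above can be recomputed from the per-run counts in the raw data further down. The following is a minimal C sketch. The arrays are transcribed from that data, and the helper names `mean()`, `pct_reduction()`, and `NRUNS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Per-run counts transcribed from the raw benchmark data. */
static const long unpatched_total[] = { 311787, 310252, 309737, 309801, 309383 };
static const long patched_total[]   = { 264659, 264276, 227796, 262464, 278455 };
static const long unpatched_pack[]  = { 72353, 72348, 72303, 72661, 72662 };
static const long patched_pack[]    = { 26605, 26568, 9762, 27830, 33952 };

#define NRUNS (sizeof(unpatched_total) / sizeof(unpatched_total[0]))

/* Arithmetic mean of n per-run counts. */
static double mean(const long *v, size_t n)
{
	long sum = 0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return (double)sum / (double)n;
}

/* Percentage reduction going from before to after. */
static double pct_reduction(double before, double after)
{
	assert(before > 0);
	return 100.0 * (before - after) / before;
}
```

For the pack-file writes this gives averages of 72,465.4 and 24,943.4, a reduction of about 65.6%. For total writes it gives 310,192 and 259,530, a reduction of about 16.3%.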

All clones produce the same HEAD and file count, and all pass fsck.
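
The patch does not include the scripts behind these strace counts. The following is a hypothetical sketch of how such a tally could work. It assumes strace's usual one-call-per-line output, where a completed call ends in `) = <return value>`. It ignores split `<unfinished ...>`/resumed records and failed writes. It also does not attribute writes to index-pack or to the pack file descriptor, which the "index-pack pack file writes" metric would require. The names `strace_write_result()`, `tally_line()`, and `struct write_tally` are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct write_tally {
	size_t total;      /* successful write(2) calls seen */
	size_t exactly_4k; /* of those, calls that returned 4096 */
};

/*
 * Return the result of a completed write(2) line from strace output,
 * or -1 for any other line (other syscalls, unfinished calls, errors).
 */
static long strace_write_result(const char *line)
{
	const char *call, *p, *ret = NULL;

	assert(line != NULL);
	call = strstr(line, "write(");
	if (!call)
		return -1;
	/* the name must start the line or follow a pid prefix */
	if (call != line && call[-1] != ' ' && call[-1] != ']')
		return -1;
	/* the last ") = " on the line introduces the return value */
	for (p = call; (p = strstr(p, ") = ")) != NULL; p++)
		ret = p;
	if (!ret)
		return -1;
	return strtol(ret + 4, NULL, 10);
}

/* Count one strace line toward the tally if it is a successful write. */
static void tally_line(struct write_tally *t, const char *line)
{
	long n = strace_write_result(line);

	if (n < 0)
		return;
	t->total++;
	if (n == 4096)
		t->exactly_4k++;
}
```

Feeding every line of a trace through `tally_line()` yields the kind of "total writes" and "writes of exactly 4096 bytes" figures quoted above.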

Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
---
    index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
    
    Both index-pack and unpack-objects read pack data from stdin through a 4
    KiB static buffer (input_buffer[4096]). On each fill(), consumed bytes
    are flushed to the output pack file via write_or_die(), so every
    write(2) moves at most 4 KiB.
    
    On FUSE-backed filesystems every write(2) is a synchronous round trip
    through the FUSE protocol (userspace → kernel → userspace → back), so
    the 4 KiB buffer turns a clone into many unnecessary tiny writes with
    noticeable latency overhead.
    
    This change increases the buffer from 4 KiB to 128 KiB, matching the
    default already used by the hashfile layer in csum-file.c.
    
    Benchmarked with 5 HTTPS clones per version of
    https://github.com/sbauersfeld/git.git (~296 MB pack), using strace -f
    to count write() syscalls. Both binaries were built from the same
    v2.54.0 source tree in isolated directories, so that each set of
    bin-wrappers resolves to the correct binary.
    
    Correctness was verified with git fsck --no-dangling, git rev-parse
    HEAD, and the working-tree file count; all 10 clones match.
    
    Results:
    
    Metric                                  Unpatched (4 KiB)  Patched (128 KiB)  Change
    index-pack writes to pack file          72,465 avg         24,943 avg         −66%
    Total write() syscalls (all processes)  310,192 avg        259,530 avg        −16%
    Writes of exactly 4096 bytes            ~40,077 avg        0                  eliminated
    HEAD / file count / fsck                ✓                  ✓                  None
    
    Raw data:
    
    unpatched (input_buffer[4096]):
      run 1: total_writes=311787 ip_pack_writes=72353 ip_4k=35311
      run 2: total_writes=310252 ip_pack_writes=72348 ip_4k=38024
      run 3: total_writes=309737 ip_pack_writes=72303 ip_4k=43003
      run 4: total_writes=309801 ip_pack_writes=72661 ip_4k=42349
      run 5: total_writes=309383 ip_pack_writes=72662 ip_4k=41702

    patched (input_buffer[128 * 1024]):
      run 1: total_writes=264659 ip_pack_writes=26605 ip_4k=0
      run 2: total_writes=264276 ip_pack_writes=26568 ip_4k=0
      run 3: total_writes=227796 ip_pack_writes=9762 ip_4k=0
      run 4: total_writes=262464 ip_pack_writes=27830 ip_4k=0
      run 5: total_writes=278455 ip_pack_writes=33952 ip_4k=0

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2282%2Fsbauersfeld%2Fsb%2Fincrease-index-pack-input-buffer-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2282/sbauersfeld/sb/increase-index-pack-input-buffer-v1
Pull-Request: https://github.com/git/git/pull/2282

 builtin/index-pack.c     | 4 ++--
 builtin/unpack-objects.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ca7784dc2c..81a628bf34 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -145,8 +145,8 @@ static int check_self_contained_and_connected;
 
 static struct progress *progress;
 
-/* We always read in 4kB chunks. */
-static unsigned char input_buffer[4096];
+#define INPUT_BUFFER_SIZE (128 * 1024)
+static unsigned char input_buffer[INPUT_BUFFER_SIZE];
 static unsigned int input_offset, input_len;
 static off_t consumed_bytes;
 static off_t max_input_size;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index e01cf6e360..535c019f82 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -23,8 +23,8 @@
 static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
 
-/* We always read in 4kB chunks. */
-static unsigned char buffer[4096];
+#define INPUT_BUFFER_SIZE (128 * 1024)
+static unsigned char buffer[INPUT_BUFFER_SIZE];
 static unsigned int offset, len;
 static off_t consumed_bytes;
 static off_t max_input_size;

base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
-- 
gitgitgadget

Thread overview: 12+ messages
2026-04-24 19:14 Scott Bauersfeld via GitGitGadget [this message]
2026-04-25 10:21 ` [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB Junio C Hamano
2026-04-27 12:36   ` Derrick Stolee
2026-04-28  1:46     ` Junio C Hamano
2026-04-28  2:09       ` Jeff King
2026-04-27 16:08 ` [PATCH v2] " Scott Bauersfeld via GitGitGadget
2026-04-27 17:23   ` Derrick Stolee
2026-04-27 19:26   ` [PATCH v3] " Scott Bauersfeld via GitGitGadget
2026-04-27 20:12     ` Derrick Stolee
2026-04-28  1:47       ` Junio C Hamano
2026-04-28 14:47     ` [PATCH v4] " Scott Bauersfeld via GitGitGadget
2026-05-12  5:51       ` Junio C Hamano
