From: "Scott Bauersfeld via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Scott Bauersfeld <sbauersfeld@g.ucla.edu>,
Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Subject: [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Date: Fri, 24 Apr 2026 19:14:58 +0000
Message-ID: <pull.2282.git.git.1777058098756.gitgitgadget@gmail.com>
From: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Both index-pack and unpack-objects read pack data from stdin through
a 4 KiB static buffer (input_buffer[4096]). On each fill(), consumed
bytes are flushed to the output pack file via write_or_die(), so
every write(2) moves at most 4 KiB.

On FUSE-backed filesystems every write(2) is a synchronous round
trip through the FUSE protocol (userspace -> kernel -> userspace ->
back), so the 4 KiB buffer turns a clone into many unnecessary tiny
writes with noticeable latency overhead.

Increase the buffer from 4 KiB to 128 KiB, matching the default
already used by the hashfile layer in csum-file.c.

Testing with strace on HTTPS clones of git/git (~296 MB pack, 5 runs
per variant, isolated builds from the same v2.54.0 source) shows:

  index-pack pack file writes:   72,465 -> 24,943 avg  (66% reduction)
  total write() syscalls:        310,192 -> 259,530 avg (16% reduction)
  writes of exactly 4096 bytes:  ~40,077 -> 0           (eliminated)

All clones produce identical HEAD, file count, and pass fsck.

Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
---
index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Both index-pack and unpack-objects read pack data from stdin through a 4
KiB static buffer (input_buffer[4096]). On each fill(), consumed bytes
are flushed to the output pack file via write_or_die(), so every
write(2) moves at most 4 KiB.
On FUSE-backed filesystems every write(2) is a synchronous round trip
through the FUSE protocol (userspace → kernel → userspace → back), so
the 4 KiB buffer turns a clone into many unnecessary tiny writes with
noticeable latency overhead.
This change increases the buffer from 4 KiB to 128 KiB, matching the
default already used by the hashfile layer in csum-file.c.
Benchmarked with 5 HTTPS clones per version of
https://github.com/sbauersfeld/git.git (~296 MB pack), using strace -f
to count write() syscalls. Both binaries built from the same v2.54.0
source tree in isolated directories to ensure the bin-wrappers resolve
to the correct binary.
Correctness verified via git fsck --no-dangling, rev-parse HEAD, and
working tree file count — all 10 clones match.
Results:
Metric                                  Unpatched (4 KiB)  Patched (128 KiB)  Change
index-pack writes to pack file          72,465 avg         24,943 avg         −66%
Total write() syscalls (all processes)  310,192 avg        259,530 avg        −16%
Writes of exactly 4096 bytes            ~40,077 avg        0                  eliminated
HEAD / file count / fsck                ✓                  ✓                  None
Raw data:

unpatched (input_buffer[4096]):
  run 1: total_writes=311787 ip_pack_writes=72353 ip_4k=35311
  run 2: total_writes=310252 ip_pack_writes=72348 ip_4k=38024
  run 3: total_writes=309737 ip_pack_writes=72303 ip_4k=43003
  run 4: total_writes=309801 ip_pack_writes=72661 ip_4k=42349
  run 5: total_writes=309383 ip_pack_writes=72662 ip_4k=41702

patched (input_buffer[128 * 1024]):
  run 1: total_writes=264659 ip_pack_writes=26605 ip_4k=0
  run 2: total_writes=264276 ip_pack_writes=26568 ip_4k=0
  run 3: total_writes=227796 ip_pack_writes=9762 ip_4k=0
  run 4: total_writes=262464 ip_pack_writes=27830 ip_4k=0
  run 5: total_writes=278455 ip_pack_writes=33952 ip_4k=0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2282%2Fsbauersfeld%2Fsb%2Fincrease-index-pack-input-buffer-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2282/sbauersfeld/sb/increase-index-pack-input-buffer-v1
Pull-Request: https://github.com/git/git/pull/2282
builtin/index-pack.c | 4 ++--
builtin/unpack-objects.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ca7784dc2c..81a628bf34 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -145,8 +145,8 @@ static int check_self_contained_and_connected;
static struct progress *progress;
-/* We always read in 4kB chunks. */
-static unsigned char input_buffer[4096];
+#define INPUT_BUFFER_SIZE (128 * 1024)
+static unsigned char input_buffer[INPUT_BUFFER_SIZE];
static unsigned int input_offset, input_len;
static off_t consumed_bytes;
static off_t max_input_size;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index e01cf6e360..535c019f82 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -23,8 +23,8 @@
static int dry_run, quiet, recover, has_errors, strict;
static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
-/* We always read in 4kB chunks. */
-static unsigned char buffer[4096];
+#define INPUT_BUFFER_SIZE (128 * 1024)
+static unsigned char buffer[INPUT_BUFFER_SIZE];
static unsigned int offset, len;
static off_t consumed_bytes;
static off_t max_input_size;
base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
--
gitgitgadget
Thread overview: 12+ messages
2026-04-24 19:14 Scott Bauersfeld via GitGitGadget [this message]
2026-04-25 10:21 ` [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB Junio C Hamano
2026-04-27 12:36 ` Derrick Stolee
2026-04-28 1:46 ` Junio C Hamano
2026-04-28 2:09 ` Jeff King
2026-04-27 16:08 ` [PATCH v2] " Scott Bauersfeld via GitGitGadget
2026-04-27 17:23 ` Derrick Stolee
2026-04-27 19:26 ` [PATCH v3] " Scott Bauersfeld via GitGitGadget
2026-04-27 20:12 ` Derrick Stolee
2026-04-28 1:47 ` Junio C Hamano
2026-04-28 14:47 ` [PATCH v4] " Scott Bauersfeld via GitGitGadget
2026-05-12 5:51 ` Junio C Hamano