From: "Scott Bauersfeld via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
Derrick Stolee <stolee@gmail.com>,
Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Subject: [PATCH v3] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Date: Mon, 27 Apr 2026 19:26:38 +0000
Message-ID: <pull.2282.v3.git.git.1777317998098.gitgitgadget@gmail.com>
In-Reply-To: <pull.2282.v2.git.git.1777306114914.gitgitgadget@gmail.com>
From: Scott Bauersfeld <sbauersfeld@g.ucla.edu>

index-pack and unpack-objects both read pack data from stdin through
a 4 KiB static buffer. In index-pack, each fill() flushes consumed
bytes to the pack file via write_or_die(), capping every write(2)
at 4 KiB. unpack-objects uses the same buffer pattern for reads.

On FUSE-backed filesystems every write(2) is a synchronous round
trip through the FUSE protocol (userspace -> kernel -> userspace ->
back), so the 4 KiB buffer turns a clone into many unnecessary tiny
writes with noticeable latency overhead.

Increase the buffer from 4 KiB to 128 KiB. Introduce a shared
DEFAULT_IO_BUFFER_SIZE constant in git-compat-util.h (next to
MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the
hashfile layer in csum-file (which already used 128 KiB but
hardcoded the value).

Syscall counts via strace on HTTPS clones of git/git (~296 MB pack,
5 runs per variant, isolated builds from the same v2.54.0 source):

    index-pack pack file writes:   72,465 ->  24,943 avg (65% fewer)
    total write() syscalls:       310,192 -> 259,530 avg (16% fewer)
    writes of exactly 4096 bytes: ~40,077 ->       0

Wall-clock time of git clone over HTTPS onto a FUSE passthrough
filesystem with writeback caching disabled, 3 runs per variant:

    vscode  (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
    git/git (~306 MB pack):  22.6s -> 20.0s avg (11% faster)

Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
---
index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB

index-pack and unpack-objects read pack data from stdin through a 4 KiB
static buffer. In index-pack, each fill() flushes consumed bytes to the
pack file via write_or_die(), capping every write(2) at 4 KiB.
unpack-objects uses the same buffer pattern for reads.

On FUSE-backed filesystems every write(2) is a synchronous round trip
through the FUSE protocol (userspace → kernel → userspace → back), so
the 4 KiB buffer turns a clone into many unnecessary tiny writes with
noticeable latency overhead.

Increase the buffer from 4 KiB to 128 KiB. Introduce a shared
DEFAULT_IO_BUFFER_SIZE constant in git-compat-util.h (next to
MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the hashfile
layer in csum-file (which already used 128 KiB but hardcoded the value).

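
To illustrate why the buffer size caps the size of each write, here is a
minimal sketch of a fill-and-flush copy loop. It is not the code in
index-pack: the helper name copy_counting_flushes() is hypothetical, it
flushes only when the buffer is full or at EOF, and it does not retry on
EINTR. It only shows that the number of flushes scales inversely with
the buffer size.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper, not git code: copy everything from in_fd to
 * out_fd through a heap buffer of buf_size bytes. The buffer is
 * flushed each time it fills up, and once more at EOF if it still
 * holds data, so no single write can exceed buf_size bytes.
 * Returns the number of flushes, or -1 on error.
 */
static long copy_counting_flushes(int in_fd, int out_fd, size_t buf_size)
{
	unsigned char *buf = malloc(buf_size);
	size_t len = 0;
	long flushes = 0;

	if (!buf)
		return -1;
	for (;;) {
		ssize_t n = read(in_fd, buf + len, buf_size - len);

		if (n < 0) {
			free(buf);
			return -1;
		}
		len += (size_t)n;
		/* Flush when the buffer is full, or at EOF with leftovers. */
		if (len == buf_size || (n == 0 && len > 0)) {
			size_t off = 0;

			while (off < len) {
				ssize_t w = write(out_fd, buf + off, len - off);

				if (w < 0) {
					free(buf);
					return -1;
				}
				off += (size_t)w;
			}
			flushes++;
			len = 0;
		}
		if (n == 0)
			break;
	}
	free(buf);
	return flushes;
}
```

Under these assumptions, copying 1 MiB takes 256 flushes with a 4096-byte
buffer and 8 flushes with a 128 KiB buffer. The real reduction measured
below is smaller because fill() flushes whatever has been consumed so
far, which is often less than a full buffer.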
Syscall reduction
=================
Measured via strace -f on HTTPS clones of git/git (~296 MB pack, 5 runs
per variant, isolated builds from the same v2.54.0 source):

Metric                                 | Unpatched (4 KiB) | Patched (128 KiB) | Change
---------------------------------------|-------------------|-------------------|-----------
index-pack writes to pack file         | 72,465 avg        | 24,943 avg        | −65%
Total write() syscalls (all processes) | 310,192 avg       | 259,530 avg       | −16%
Writes of exactly 4096 bytes           | ~40,077 avg       | 0                 | eliminated
HEAD / file count / fsck               | ✓                 | ✓                 | identical

Wall-clock time on FUSE
=======================
Measured wall-clock time of git clone over HTTPS onto a FUSE passthrough
filesystem with writeback caching disabled. 3 runs per variant:

Repo                              | Unpatched avg | Patched avg | Change
----------------------------------|---------------|-------------|-------
microsoft/vscode (~1.26 GB pack)  | 84.5s         | 75.7s       | −10%
git/git (~306 MB pack)            | 22.6s         | 20.0s       | −11%

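
The Change figures in the two sets of measurements above follow from the
averages as (before - after) / before. A minimal sketch of that
arithmetic, using a hypothetical helper name:

```c
#include <stdio.h>

/*
 * Hypothetical helper: relative reduction from "before" to "after",
 * expressed as a percentage of "before".
 */
static double percent_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Print one row in the same "before -> after" shape used above. */
static void print_reduction(const char *label, double before, double after)
{
	printf("%-32s %10.1f -> %10.1f (%.1f%% lower)\n",
	       label, before, after, percent_reduction(before, after));
}
```

For example, percent_reduction(72465, 24943) evaluates to roughly 65.6,
which is reported above as 65%.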
Changes since v2
================
* Renamed DEFAULT_PACKFILE_BUFFER_SIZE → DEFAULT_IO_BUFFER_SIZE per
Stolee's feedback. The constant is not packfile-specific, since it is
also used by the hashfile layer.
* Stolee noted that WRITE_BUFFER_SIZE in read-cache.c could be
consolidated. That constant was already removed in f6e2cd0625
("read-cache: delete unused hashing methods", 2021-05-18) when
read-cache.c was converted to use the hashfile API, so there is
nothing left to unify. The rename to DEFAULT_IO_BUFFER_SIZE helps
account for the multiple usages of this constant.
Changes since v1
================
* Introduced shared DEFAULT_PACKFILE_BUFFER_SIZE constant in
git-compat-util.h (next to MAX_IO_SIZE), replacing per-file #define
and the hardcoded value in csum-file.c. Placed here rather than
environment.h since it is an I/O buffer size, not an environment
variable or repo config.
* Added wall-clock timing on a FUSE filesystem.
* Cleaned up the commit description a bit.
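
As a sketch of the pattern the shared constant enables: one definition
sizes the static input buffers and also serves as the fallback when a
caller leaves the buffer length unset. Only DEFAULT_IO_BUFFER_SIZE and
its value come from the patch. The names writer_opts, writer, and
writer_init() are hypothetical stand-ins for the hashfile options
handling, and malloc() stands in for xmalloc().

```c
#include <stddef.h>
#include <stdlib.h>

/* Same name and value as in the patch; everything below is illustrative. */
#define DEFAULT_IO_BUFFER_SIZE (128 * 1024)

/* Static input buffer sized by the shared constant. */
static unsigned char input_buffer[DEFAULT_IO_BUFFER_SIZE];

/* Hypothetical options struct: a buffer_len of 0 means "use the default". */
struct writer_opts {
	size_t buffer_len;
};

/* Hypothetical writer that owns a heap-allocated buffer. */
struct writer {
	unsigned char *buffer;
	size_t buffer_len;
};

/* Pick the caller's size if given, else the shared default. */
static int writer_init(struct writer *w, const struct writer_opts *opts)
{
	w->buffer_len = opts->buffer_len ? opts->buffer_len
					 : DEFAULT_IO_BUFFER_SIZE;
	w->buffer = malloc(w->buffer_len);
	return w->buffer ? 0 : -1;
}
```

With a single definition, changing the default later is a one-line edit
and the static buffers and the heap-allocated default cannot drift apart.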
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2282%2Fsbauersfeld%2Fsb%2Fincrease-index-pack-input-buffer-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2282/sbauersfeld/sb/increase-index-pack-input-buffer-v3
Pull-Request: https://github.com/git/git/pull/2282
Range-diff vs v2:
1: ac2559ccb5 ! 1: df754ac879 index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
@@ Commit message
writes with noticeable latency overhead.
Increase the buffer from 4 KiB to 128 KiB. Introduce a shared
- DEFAULT_PACKFILE_BUFFER_SIZE constant in git-compat-util.h (next to
+ DEFAULT_IO_BUFFER_SIZE constant in git-compat-util.h (next to
MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the
hashfile layer in csum-file (which already used 128 KiB but
hardcoded the value).
@@ builtin/index-pack.c: static int check_self_contained_and_connected;
-/* We always read in 4kB chunks. */
-static unsigned char input_buffer[4096];
-+static unsigned char input_buffer[DEFAULT_PACKFILE_BUFFER_SIZE];
++static unsigned char input_buffer[DEFAULT_IO_BUFFER_SIZE];
static unsigned int input_offset, input_len;
static off_t consumed_bytes;
static off_t max_input_size;
@@ builtin/unpack-objects.c
-/* We always read in 4kB chunks. */
-static unsigned char buffer[4096];
-+static unsigned char buffer[DEFAULT_PACKFILE_BUFFER_SIZE];
++static unsigned char buffer[DEFAULT_IO_BUFFER_SIZE];
static unsigned int offset, len;
static off_t consumed_bytes;
static off_t max_input_size;
@@ csum-file.c: struct hashfile *hashfd_ext(const struct git_hash_algo *algop,
f->algop->init_fn(&f->ctx);
- f->buffer_len = opts->buffer_len ? opts->buffer_len : 128 * 1024;
-+ f->buffer_len = opts->buffer_len ? opts->buffer_len : DEFAULT_PACKFILE_BUFFER_SIZE;
++ f->buffer_len = opts->buffer_len ? opts->buffer_len : DEFAULT_IO_BUFFER_SIZE;
f->buffer = xmalloc(f->buffer_len);
f->check_buffer = NULL;
@@ git-compat-util.h: static inline uint64_t u64_add(uint64_t a, uint64_t b)
#endif
+/*
-+ * Default buffer size for buffered I/O in pack file operations (index-pack,
-+ * unpack-objects) and the hashfile layer in csum-file.
++ * Default buffer size for buffered I/O in index-pack, unpack-objects,
++ * and the hashfile layer in csum-file.
+ */
-+#define DEFAULT_PACKFILE_BUFFER_SIZE (128 * 1024)
++#define DEFAULT_IO_BUFFER_SIZE (128 * 1024)
+
#ifdef HAVE_ALLOCA_H
# include <alloca.h>
builtin/index-pack.c | 3 +--
builtin/unpack-objects.c | 3 +--
csum-file.c | 2 +-
git-compat-util.h | 6 ++++++
4 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ca7784dc2c..bb3639641c 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -145,8 +145,7 @@ static int check_self_contained_and_connected;
static struct progress *progress;
-/* We always read in 4kB chunks. */
-static unsigned char input_buffer[4096];
+static unsigned char input_buffer[DEFAULT_IO_BUFFER_SIZE];
static unsigned int input_offset, input_len;
static off_t consumed_bytes;
static off_t max_input_size;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index e01cf6e360..af67d1a1d3 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -23,8 +23,7 @@
static int dry_run, quiet, recover, has_errors, strict;
static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
-/* We always read in 4kB chunks. */
-static unsigned char buffer[4096];
+static unsigned char buffer[DEFAULT_IO_BUFFER_SIZE];
static unsigned int offset, len;
static off_t consumed_bytes;
static off_t max_input_size;
diff --git a/csum-file.c b/csum-file.c
index 9558177a11..d7a682c2b6 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -178,7 +178,7 @@ struct hashfile *hashfd_ext(const struct git_hash_algo *algop,
f->algop = unsafe_hash_algo(algop);
f->algop->init_fn(&f->ctx);
- f->buffer_len = opts->buffer_len ? opts->buffer_len : 128 * 1024;
+ f->buffer_len = opts->buffer_len ? opts->buffer_len : DEFAULT_IO_BUFFER_SIZE;
f->buffer = xmalloc(f->buffer_len);
f->check_buffer = NULL;
diff --git a/git-compat-util.h b/git-compat-util.h
index ae1bdc90a4..5024814bd4 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -712,6 +712,12 @@ static inline uint64_t u64_add(uint64_t a, uint64_t b)
# endif
#endif
+/*
+ * Default buffer size for buffered I/O in index-pack, unpack-objects,
+ * and the hashfile layer in csum-file.
+ */
+#define DEFAULT_IO_BUFFER_SIZE (128 * 1024)
+
#ifdef HAVE_ALLOCA_H
# include <alloca.h>
# define xalloca(size) (alloca(size))
base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
--
gitgitgadget