From: Derrick Stolee <stolee@gmail.com>
To: Scott Bauersfeld via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Subject: Re: [PATCH v2] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Date: Mon, 27 Apr 2026 13:23:06 -0400 [thread overview]
Message-ID: <5498637e-178f-48aa-8cdc-adc38b100627@gmail.com> (raw)
In-Reply-To: <pull.2282.v2.git.git.1777306114914.gitgitgadget@gmail.com>
On 4/27/2026 12:08 PM, Scott Bauersfeld via GitGitGadget wrote:
> From: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
...
> Wall-clock time of git clone over HTTPS onto a FUSE passthrough
> filesystem with writeback caching disabled, 3 runs per variant:
>
> vscode (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
> git/git (~306 MB pack): 22.6s -> 20.0s avg (11% faster)
Wow! This is much higher than I expected. Great find.
I imagine that other platforms or non-FUSE setups will not
have the same benefits. As long as they aren't _regressions_
then this is a great find.
> -/* We always read in 4kB chunks. */
> -static unsigned char input_buffer[4096];
> +static unsigned char input_buffer[DEFAULT_PACKFILE_BUFFER_SIZE];
> -/* We always read in 4kB chunks. */
> -static unsigned char buffer[4096];
> +static unsigned char buffer[DEFAULT_PACKFILE_BUFFER_SIZE];
These changes are what I expected in v2.
> diff --git a/csum-file.c b/csum-file.c
> index 9558177a11..c1aeaf587a 100644
> --- a/csum-file.c
> +++ b/csum-file.c
> @@ -178,7 +178,7 @@ struct hashfile *hashfd_ext(const struct git_hash_algo *algop,
> f->algop = unsafe_hash_algo(algop);
> f->algop->init_fn(&f->ctx);
>
> - f->buffer_len = opts->buffer_len ? opts->buffer_len : 128 * 1024;
> + f->buffer_len = opts->buffer_len ? opts->buffer_len : DEFAULT_PACKFILE_BUFFER_SIZE;
> f->buffer = xmalloc(f->buffer_len);
> f->check_buffer = NULL;
This one surprised me, as this hunk wasn't in your v1 patch.
I think using this replacement makes sense, since it _is_ an
exact value. It did make me think as to how we landed on 128K
for this example.
The previous line is due to a1118c0a446 (csum-file: introduce
`hashfd_ext()`, 2026-03-13), but it only moved the 128K default
from hashfd(). Notably, hashfd_throughput() still uses an 8K
setting in opt->buffer_len.
Hilariously, I went spelunking for the original reason for the
128K and it was 2ca245f8be5 (csum-file.h: increase hashfile
buffer size, 2021-05-18) written by...me. The motivation was
due to using the hashfile logic for the .git/index file which
also used 128K buffers in f279894 (read-cache: make the index
write buffer size 128K, 2021-02-18).
All this is to say that we now have two constants of identical
value, where WRITE_BUFFER_SIZE in read-cache.c could be replaced
with your new DEFAULT_PACKFILE_BUFFER_SIZE.
This does make me think that maybe DEFAULT_PACKFILE_BUFFER_SIZE
is misnamed? Should it be DEFAULT_HASHFILE_BUFFER_SIZE or
DEFAULT_FILESYSTEM_BUFFER_SIZE to better fit this size value
being used in both packfiles and index files?
> diff --git a/git-compat-util.h b/git-compat-util.h
> index ae1bdc90a4..a2f037811c 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -712,6 +712,12 @@ static inline uint64_t u64_add(uint64_t a, uint64_t b)
> # endif
> #endif
>
> +/*
> + * Default buffer size for buffered I/O in pack file operations (index-pack,
> + * unpack-objects) and the hashfile layer in csum-file.
> + */
> +#define DEFAULT_PACKFILE_BUFFER_SIZE (128 * 1024)
> +
I see. Putting this in git-compat-util.h makes the rest
of the changes good without any need to add a new include.
Thanks,
-Stolee
next prev parent reply other threads:[~2026-04-27 17:23 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-24 19:14 [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB Scott Bauersfeld via GitGitGadget
2026-04-25 10:21 ` Junio C Hamano
2026-04-27 12:36 ` Derrick Stolee
2026-04-28 1:46 ` Junio C Hamano
2026-04-28 2:09 ` Jeff King
2026-04-27 16:08 ` [PATCH v2] " Scott Bauersfeld via GitGitGadget
2026-04-27 17:23 ` Derrick Stolee [this message]
2026-04-27 19:26 ` [PATCH v3] " Scott Bauersfeld via GitGitGadget
2026-04-27 20:12 ` Derrick Stolee
2026-04-28 1:47 ` Junio C Hamano
2026-04-28 14:47 ` [PATCH v4] " Scott Bauersfeld via GitGitGadget
2026-05-12 5:51 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5498637e-178f-48aa-8cdc-adc38b100627@gmail.com \
--to=stolee@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=sbauersfeld@g.ucla.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.