From: Derrick Stolee <stolee@gmail.com>
To: Scott Bauersfeld via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Subject: Re: [PATCH v2] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Date: Mon, 27 Apr 2026 13:23:06 -0400 [thread overview]
Message-ID: <5498637e-178f-48aa-8cdc-adc38b100627@gmail.com> (raw)
In-Reply-To: <pull.2282.v2.git.git.1777306114914.gitgitgadget@gmail.com>
On 4/27/2026 12:08 PM, Scott Bauersfeld via GitGitGadget wrote:
> From: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
...
> Wall-clock time of git clone over HTTPS onto a FUSE passthrough
> filesystem with writeback caching disabled, 3 runs per variant:
>
> vscode (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
> git/git (~306 MB pack): 22.6s -> 20.0s avg (11% faster)
Wow! This is much higher than I expected. Great find.
I imagine that other platforms or non-FUSE setups will not
have the same benefits. As long as they aren't _regressions_
then this is a great find.
> -/* We always read in 4kB chunks. */
> -static unsigned char input_buffer[4096];
> +static unsigned char input_buffer[DEFAULT_PACKFILE_BUFFER_SIZE];
> -/* We always read in 4kB chunks. */
> -static unsigned char buffer[4096];
> +static unsigned char buffer[DEFAULT_PACKFILE_BUFFER_SIZE];
These changes are what I expected in v2.
> diff --git a/csum-file.c b/csum-file.c
> index 9558177a11..c1aeaf587a 100644
> --- a/csum-file.c
> +++ b/csum-file.c
> @@ -178,7 +178,7 @@ struct hashfile *hashfd_ext(const struct git_hash_algo *algop,
> f->algop = unsafe_hash_algo(algop);
> f->algop->init_fn(&f->ctx);
>
> - f->buffer_len = opts->buffer_len ? opts->buffer_len : 128 * 1024;
> + f->buffer_len = opts->buffer_len ? opts->buffer_len : DEFAULT_PACKFILE_BUFFER_SIZE;
> f->buffer = xmalloc(f->buffer_len);
> f->check_buffer = NULL;
This one surprised me, as this hunk wasn't in your v1 patch.
I think using this replacement makes sense, since it _is_ an
exact value. It did make me think as to how we landed on 128K
for this example.
The previous line is due to a1118c0a446 (csum-file: introduce
`hashfd_ext()`, 2026-03-13), but it only moved the 128K default
from hashfd(). Notably, hashfd_throughput() still uses an 8K
setting in opt->buffer_len.
Hilariously, I went spelunking for the original reason for the
128K and it was 2ca245f8be5 (csum-file.h: increase hashfile
buffer size, 2021-05-18) written by...me. The motivation was
due to using the hashfile logic for the .git/index file which
also used 128K buffers in f279894 (read-cache: make the index
write buffer size 128K, 2021-02-18).
All this is to say that we now have two constants of identical
value, where WRITE_BUFFER_SIZE in read-cache.c could be replaced
with your new DEFAULT_PACKFILE_BUFFER_SIZE.
This does make me think that maybe DEFAULT_PACKFILE_BUFFER_SIZE
is misnamed? Should it be DEFAULT_HASHFILE_BUFFER_SIZE or
DEFAULT_FILESYSTEM_BUFFER_SIZE to better fit this size value
being used in both packfiles and index files?
> diff --git a/git-compat-util.h b/git-compat-util.h
> index ae1bdc90a4..a2f037811c 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -712,6 +712,12 @@ static inline uint64_t u64_add(uint64_t a, uint64_t b)
> # endif
> #endif
>
> +/*
> + * Default buffer size for buffered I/O in pack file operations (index-pack,
> + * unpack-objects) and the hashfile layer in csum-file.
> + */
> +#define DEFAULT_PACKFILE_BUFFER_SIZE (128 * 1024)
> +
I see. Putting this in git-compat-util.h makes the rest
of the changes good without any need to add a new include.
Thanks,
-Stolee
next prev parent reply other threads:[~2026-04-27 17:23 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-24 19:14 [PATCH] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB Scott Bauersfeld via GitGitGadget
2026-04-25 10:21 ` Junio C Hamano
2026-04-27 12:36 ` Derrick Stolee
2026-04-28 1:46 ` Junio C Hamano
2026-04-28 2:09 ` Jeff King
2026-04-27 16:08 ` [PATCH v2] " Scott Bauersfeld via GitGitGadget
2026-04-27 17:23 ` Derrick Stolee [this message]
2026-04-27 19:26 ` [PATCH v3] " Scott Bauersfeld via GitGitGadget
2026-04-27 20:12 ` Derrick Stolee
2026-04-28 1:47 ` Junio C Hamano
2026-04-28 14:47 ` [PATCH v4] " Scott Bauersfeld via GitGitGadget
2026-05-12 5:51 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5498637e-178f-48aa-8cdc-adc38b100627@gmail.com \
--to=stolee@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=sbauersfeld@g.ucla.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox