Git development
From: Derrick Stolee <stolee@gmail.com>
To: Scott Bauersfeld via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Subject: Re: [PATCH v2] index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Date: Mon, 27 Apr 2026 13:23:06 -0400
Message-ID: <5498637e-178f-48aa-8cdc-adc38b100627@gmail.com>
In-Reply-To: <pull.2282.v2.git.git.1777306114914.gitgitgadget@gmail.com>

On 4/27/2026 12:08 PM, Scott Bauersfeld via GitGitGadget wrote:
> From: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
...
> Wall-clock time of git clone over HTTPS onto a FUSE passthrough
> filesystem with writeback caching disabled, 3 runs per variant:
> 
>   vscode (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
>   git/git (~306 MB pack):  22.6s -> 20.0s avg (11% faster)

Wow! This is much higher than I expected. Great find.

I imagine that other platforms or non-FUSE setups will not
see the same benefit. As long as they don't _regress_, this
is a worthwhile change.

> -/* We always read in 4kB chunks. */
> -static unsigned char input_buffer[4096];
> +static unsigned char input_buffer[DEFAULT_PACKFILE_BUFFER_SIZE];

> -/* We always read in 4kB chunks. */
> -static unsigned char buffer[4096];
> +static unsigned char buffer[DEFAULT_PACKFILE_BUFFER_SIZE];

These changes are what I expected in v2.

> diff --git a/csum-file.c b/csum-file.c
> index 9558177a11..c1aeaf587a 100644
> --- a/csum-file.c
> +++ b/csum-file.c
> @@ -178,7 +178,7 @@ struct hashfile *hashfd_ext(const struct git_hash_algo *algop,
>  	f->algop = unsafe_hash_algo(algop);
>  	f->algop->init_fn(&f->ctx);
>  
> -	f->buffer_len = opts->buffer_len ? opts->buffer_len : 128 * 1024;
> +	f->buffer_len = opts->buffer_len ? opts->buffer_len : DEFAULT_PACKFILE_BUFFER_SIZE;
>  	f->buffer = xmalloc(f->buffer_len);
>  	f->check_buffer = NULL;

This one surprised me, as this hunk wasn't in your v1 patch.

I think using this replacement makes sense, since it _is_ the
same value. It did make me wonder how we landed on 128K
for this default.

The removed line comes from a1118c0a446 (csum-file: introduce
`hashfd_ext()`, 2026-03-13), but that commit only moved the 128K
default from hashfd(). Notably, hashfd_throughput() still uses an 8K
setting in opt->buffer_len.

Hilariously, I went spelunking for the original reason for the
128K and it was 2ca245f8be5 (csum-file.h: increase hashfile
buffer size, 2021-05-18) written by...me. The motivation was
reusing the hashfile logic for the .git/index file, which
already used 128K buffers as of f279894 (read-cache: make the
index write buffer size 128K, 2021-02-18).

All this is to say that we now have two constants of identical
value, where WRITE_BUFFER_SIZE in read-cache.c could be replaced
with your new DEFAULT_PACKFILE_BUFFER_SIZE.

This does make me think that maybe DEFAULT_PACKFILE_BUFFER_SIZE
is misnamed? Should it be DEFAULT_HASHFILE_BUFFER_SIZE or
DEFAULT_FILESYSTEM_BUFFER_SIZE to better fit this size value
being used in both packfiles and index files?

> diff --git a/git-compat-util.h b/git-compat-util.h
> index ae1bdc90a4..a2f037811c 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -712,6 +712,12 @@ static inline uint64_t u64_add(uint64_t a, uint64_t b)
>  # endif
>  #endif
>  
> +/*
> + * Default buffer size for buffered I/O in pack file operations (index-pack,
> + * unpack-objects) and the hashfile layer in csum-file.
> + */
> +#define DEFAULT_PACKFILE_BUFFER_SIZE (128 * 1024)
> +

I see. Putting this in git-compat-util.h lets the rest
of the changes work without needing a new include.

Thanks,
-Stolee

