git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Haylin Moore <hmoore@qumulo.com>,
	git@vger.kernel.org
Subject: Re: git only writing 4k at a time
Date: Tue, 1 Jul 2025 13:58:13 +0200
Message-ID: <aGPNVRxZQiFpCIFs@pks.im>
In-Reply-To: <aFM9Uh0K2TSAuoHb@fruit.crustytoothpaste.net>

On Wed, Jun 18, 2025 at 10:27:30PM +0000, brian m. carlson wrote:
> On 2025-06-18 at 20:58:52, Haylin Moore wrote:
> > Hiya list,
> > 
> > I've been investigating some performance issues around git clones over
> > network mounts. We have noticed that git is only writing 4k at a time.
> > These small serial writes are making it such that even though each
> > write is only a 3ms operation, the total time balloons. Looking around
> > the source code I found that reftable_writer is initialized by default
> > (though I cannot find the block_size argument being supplied in my
> > cursory look) always to DEFAULT_BLOCK_SIZE (4096). Is there some way
> > to increase/configure this block size such that larger writes happen?
> > In git/Documentation/config/reftable.adoc this block size is mentioned
> > in a manner that almost feels configurable, but I'm not sure if this
> > is just internal for development.
> 
> It's fine to adjust reftable.blockSize upwards if you'd like, which
> controls the block size for reftable writes (which is what you're seeing
> if the writes are from the reftable code).  I think at least some
> versions of JGit use 64 KiB for various reasons.

Yup. The default block size of 4kB was picked because that is the block
size most filesystems use. Google uses 64kB because, to the best of my
knowledge, they store the tables in Spanner. At least that's what I
recall from past conversations.

> As the documentation describes, there may be some performance penalties
> during reads since more refs will have to be read, so reading a single
> ref will likely be more expensive.  However, you may find that
> acceptable and you can adjust the values such that they provide the
> right balance in your environment.  I would definitely recommend a
> power-of-two block size, though.

So this kind of depends on the filesystem's block size. If yours uses
bigger blocks it's definitely recommended to adjust as needed. The block
size is ultimately a tradeoff, and the best value heavily depends on
both your system and on your use case.
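
For illustration, a minimal sketch of raising the block size via the
reftable.blockSize setting mentioned above. The scratch repository and
the 64 KiB value are assumptions for the example, not a recommendation;
substitute your own repository path and whatever size suits your
filesystem.

```shell
# Hypothetical scratch repository; use the path to your own repository instead.
repo=$(mktemp -d)
git init --quiet "$repo"

# Raise the reftable block size from the default of 4096 bytes to 64 KiB.
# This only matters if the repository uses the reftable backend.
git -C "$repo" config reftable.blockSize 65536

# Read the value back to confirm what was stored.
block_size=$(git -C "$repo" config --get reftable.blockSize)
echo "reftable.blockSize=$block_size"
```

I would expect this to affect only tables written after the change;
tables that already exist should keep the block size they were written
with until they get rewritten.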

I'm curious though -- are you sure that this is actually the bottleneck?
Reftables are only used if you explicitly opted into them, and I would
be very surprised if a clone is really slowed down significantly by
reftable writes.
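
A quick way to check which ref backend a repository uses is to look at
the extensions.refStorage setting in its local config. The sketch below
assumes that a repository without that setting uses the "files"
backend; the helper name ref_backend is made up for this example.

```shell
# Print the ref storage backend of the repository at path $1 (default: cwd).
# Assumption: a repository without extensions.refStorage uses "files".
ref_backend() {
    git -C "${1:-.}" config --local --get extensions.refStorage || echo files
}
```

Running `ref_backend /path/to/repo` prints "reftable" only if the
repository was created or migrated with that format. As far as I know,
Git 2.44 and newer can report the same via
`git rev-parse --show-ref-format`.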

Patrick

Thread overview: 3+ messages
2025-06-18 20:58 git only writing 4k at a time Haylin Moore
2025-06-18 22:27 ` brian m. carlson
2025-07-01 11:58   ` Patrick Steinhardt [this message]
