public inbox for git@vger.kernel.org
From: Patrick Steinhardt <ps@pks.im>
To: Alan Braithwaite via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, christian.couder@gmail.com,
	jonathantanmy@google.com, me@ttaylorr.com, gitster@pobox.com,
	Alan Braithwaite <alan@braithwaite.dev>
Subject: Re: [PATCH] fetch, clone: add fetch.blobSizeLimit config
Date: Mon, 2 Mar 2026 12:53:32 +0100
Message-ID: <aaV6PLJCrpb2mQnq@pks.im>
In-Reply-To: <pull.2058.git.1772383499900.gitgitgadget@gmail.com>

On Sun, Mar 01, 2026 at 04:44:59PM +0000, Alan Braithwaite via GitGitGadget wrote:
> From: Alan Braithwaite <alan@braithwaite.dev>
> 
> External tools like git-lfs and git-fat use the filter clean/smudge
> mechanism to manage large binary objects, but this requires pointer
> files, a separate storage backend, and careful coordination. Git's
> partial clone infrastructure provides a more native approach: large
> blobs can be excluded at the protocol level during fetch and lazily
> retrieved on demand. However, enabling this requires passing
> `--filter=blob:limit=<size>` on every clone, which is not
> discoverable and cannot be set as a global default.

I'm not sure that we should make blob size limiting the default. The
problem with specifying a limit is that it is comparatively expensive
to compute on the server side: we have to look up each blob to
determine its size. Unfortunately, such requests cannot currently be
optimized via bitmaps or any other cache that we have.

So if we want to make any filter the default, I'd propose that we
instead consider filters that are computationally less expensive, for
example `--filter=blob:none`, which can be computed efficiently via
bitmaps.
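To make the cost difference concrete, here is a toy model in Python. It is
not Git's implementation: `ToyObjectStore`, `filter_blob_none` and
`filter_blob_limit` are hypothetical names, and the precomputed set of blob
IDs only loosely stands in for the per-type information that bitmaps
provide. The point it illustrates is that `blob:none` needs only an object's
type, while `blob:limit=<n>` (which omits blobs of at least n bytes) must
read the size of every candidate blob.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    oid: str
    kind: str  # "commit", "tree" or "blob"
    size: int


class ToyObjectStore:
    """Hypothetical object store that counts per-object size lookups."""

    def __init__(self, objects):
        self._objects = {o.oid: o for o in objects}
        # Precomputed type index; loosely models bitmap type information.
        self.blob_oids = frozenset(o.oid for o in objects if o.kind == "blob")
        self.size_lookups = 0

    def size_of(self, oid):
        # Models reading an object's header just to learn its size.
        self.size_lookups += 1
        return self._objects[oid].size


def filter_blob_none(store, wanted):
    # Dropping all blobs needs only the type index, no per-object reads.
    return [oid for oid in wanted if oid not in store.blob_oids]


def filter_blob_limit(store, wanted, limit):
    # Every blob has to be inspected to compare its size against the limit.
    kept = []
    for oid in wanted:
        if oid in store.blob_oids and store.size_of(oid) >= limit:
            continue  # omitted: blob is at least `limit` bytes
        kept.append(oid)
    return kept


if __name__ == "__main__":
    objs = [
        Obj("c1", "commit", 200),
        Obj("t1", "tree", 100),
        Obj("b1", "blob", 10),
        Obj("b2", "blob", 5_000_000),
        Obj("b3", "blob", 300),
    ]
    store = ToyObjectStore(objs)
    wanted = [o.oid for o in objs]
    print(filter_blob_none(store, wanted), store.size_lookups)
    # ['c1', 't1'] 0
    print(filter_blob_limit(store, wanted, 1024), store.size_lookups)
    # ['c1', 't1', 'b1', 'b3'] 3
```

In this model the number of size lookups for `blob:limit` grows with the
number of blobs in the request, whereas `blob:none` performs none.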

The downside is of course that in this case we have to do many more
backfill fetches compared to the case where we only leave out a couple
of large blobs. But unless we figure out a way to serve the size limit
filter more efficiently, I'm not sure there are proper alternatives.

Another question to consider: is it really sensible to set this
globally? Whether it works depends heavily on the forge that you're
connecting to, as some forges may not allow object filters at all, or
only a subset of them.

Thanks!

Patrick


Thread overview: 28+ messages
2026-03-01 16:44 [PATCH] fetch, clone: add fetch.blobSizeLimit config Alan Braithwaite via GitGitGadget
2026-03-02 11:53 ` Patrick Steinhardt [this message]
2026-03-02 18:28   ` Jeff King
2026-03-02 18:57   ` Junio C Hamano
2026-03-02 21:36     ` Alan Braithwaite
2026-03-03  6:30       ` Patrick Steinhardt
2026-03-03 14:00         ` Alan Braithwaite
2026-03-03 15:08           ` Patrick Steinhardt
2026-03-03 17:58             ` Junio C Hamano
2026-03-04  5:07               ` Patrick Steinhardt
2026-03-03 17:05         ` Junio C Hamano
2026-03-03 14:34       ` Jeff King
2026-03-05  0:57 ` [PATCH v2] clone: add clone.<url>.defaultObjectFilter config Alan Braithwaite via GitGitGadget
2026-03-05 19:01   ` Junio C Hamano
2026-03-05 23:11     ` Alan Braithwaite
2026-03-06  6:55   ` [PATCH v3] " Alan Braithwaite via GitGitGadget
2026-03-06 10:39     ` brian m. carlson
2026-03-06 19:33       ` Junio C Hamano
2026-03-06 21:50         ` Alan Braithwaite
2026-03-06 21:47     ` [PATCH v4] " Alan Braithwaite via GitGitGadget
2026-03-06 22:18       ` Junio C Hamano
2026-03-07  1:04         ` Alan Braithwaite
2026-03-07  1:33       ` [PATCH v5] " Alan Braithwaite via GitGitGadget
2026-03-11  7:44         ` Patrick Steinhardt
2026-03-15  1:33           ` Alan Braithwaite
2026-03-15  5:37         ` [PATCH v6] " Alan Braithwaite via GitGitGadget
2026-03-15 21:32           ` Junio C Hamano
2026-03-16  7:47           ` Patrick Steinhardt
