public inbox for git@vger.kernel.org
* [PATCH] fetch, clone: add fetch.blobSizeLimit config
@ 2026-03-01 16:44 Alan Braithwaite via GitGitGadget
  2026-03-02 11:53 ` Patrick Steinhardt
  2026-03-05  0:57 ` [PATCH v2] clone: add clone.<url>.defaultObjectFilter config Alan Braithwaite via GitGitGadget
  0 siblings, 2 replies; 28+ messages in thread
From: Alan Braithwaite via GitGitGadget @ 2026-03-01 16:44 UTC (permalink / raw)
  To: git
  Cc: ps, christian.couder, jonathantanmy, me, gitster,
	Alan Braithwaite, Alan Braithwaite

From: Alan Braithwaite <alan@braithwaite.dev>

External tools like git-lfs and git-fat use the clean/smudge filter
mechanism to manage large binary files, but this requires pointer
files, a separate storage backend, and careful coordination. Git's
partial clone infrastructure provides a more native approach: large
blobs can be excluded at the protocol level during fetch and lazily
retrieved on demand. However, enabling this requires passing
`--filter=blob:limit=<size>` on every clone, which is not
discoverable and cannot be set as a global default.

Add a new `fetch.blobSizeLimit` configuration option that enables
size-based partial clone behavior globally. When set, both `git
clone` and `git fetch` automatically apply a `blob:limit=<size>`
filter. Blobs larger than the threshold that are not needed for the
current worktree are excluded from the transfer and lazily fetched
on demand when needed (e.g., during checkout, diff, or merge).

This makes it easy to work with repositories that have accumulated
large binary files in their history, without downloading all of
them upfront.

The precedence order is:
  1. Explicit `--filter=` on the command line (highest)
  2. Existing `remote.<name>.partialclonefilter`
  3. `fetch.blobSizeLimit` (new, lowest)

Once a clone or fetch applies this setting, the remote is registered
as a promisor remote with the corresponding filter spec, so
subsequent fetches inherit it automatically. If the server does not
support object filtering, the setting is silently ignored.
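
For example (the URL below is only illustrative):

    $ git config --global fetch.blobSizeLimit 1m
    $ git clone https://example.com/big-repo.git

The clone records `remote.origin.partialclonefilter`, so later
fetches in that repository need no extra configuration.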

Signed-off-by: Alan Braithwaite <alan@braithwaite.dev>
---
    fetch, clone: add fetch.blobSizeLimit config

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2058%2Fabraithwaite%2Falan%2Ffetch-blob-size-limit-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2058/abraithwaite/alan/fetch-blob-size-limit-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2058
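
For reviewers, a quick sketch of the precedence rules exercised by the
tests below (the server repository path is made up; the server side
needs uploadpack.allowFilter enabled):

    # the config alone applies a blob:limit filter
    $ git -c fetch.blobSizeLimit=1k clone file:///tmp/srv.git a
    $ git -C a config remote.origin.partialclonefilter
    blob:limit=1024

    # an explicit --filter wins over the config
    $ git -c fetch.blobSizeLimit=1k clone --filter=blob:none \
        file:///tmp/srv.git b
    $ git -C b config remote.origin.partialclonefilter
    blob:none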

 Documentation/config/fetch.adoc | 19 +++++++++++
 builtin/clone.c                 | 13 +++++++
 builtin/fetch.c                 | 45 +++++++++++++++++++------
 t/t5616-partial-clone.sh        | 60 +++++++++++++++++++++++++++++++++
 4 files changed, 127 insertions(+), 10 deletions(-)

diff --git a/Documentation/config/fetch.adoc b/Documentation/config/fetch.adoc
index cd40db0cad..4165354dd9 100644
--- a/Documentation/config/fetch.adoc
+++ b/Documentation/config/fetch.adoc
@@ -103,6 +103,25 @@ config setting.
 	file helps performance of many Git commands, including `git merge-base`,
 	`git push -f`, and `git log --graph`. Defaults to `false`.
 
+`fetch.blobSizeLimit`::
+	When set to a size value (e.g., `1m`, `100k`, `1g`), both
+	linkgit:git-clone[1] and linkgit:git-fetch[1] will automatically
+	use `--filter=blob:limit=<value>` to enable partial clone
+	behavior. Blobs larger than this threshold are excluded from the
+	initial transfer and lazily fetched on demand when needed (e.g.,
+	during checkout).
++
+This provides a convenient way to enable size-based partial clones
+globally without passing `--filter` on every command. Once a clone or
+fetch applies this setting, the remote is registered as a promisor
+remote with the corresponding filter, so subsequent fetches inherit
+the filter automatically.
++
+An explicit `--filter` option on the command line takes precedence over
+this config. An existing `remote.<name>.partialclonefilter` also takes
+precedence. If the server does not support object filtering, the
+setting is silently ignored.
+
 `fetch.bundleURI`::
 	This value stores a URI for downloading Git object data from a bundle
 	URI before performing an incremental fetch from the origin Git server.
diff --git a/builtin/clone.c b/builtin/clone.c
index 45d8fa0eed..1e3261b623 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -78,6 +78,7 @@ static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
 static int config_filter_submodules = -1;    /* unspecified */
+static char *config_blob_size_limit;
 static int option_remote_submodules;
 
 static int recurse_submodules_cb(const struct option *opt,
@@ -753,6 +754,10 @@ static int git_clone_config(const char *k, const char *v,
 		config_reject_shallow = git_config_bool(k, v);
 	if (!strcmp(k, "clone.filtersubmodules"))
 		config_filter_submodules = git_config_bool(k, v);
+	if (!strcmp(k, "fetch.blobsizelimit")) {
+		free(config_blob_size_limit);
+		git_config_string(&config_blob_size_limit, k, v);
+	}
 
 	return git_default_config(k, v, ctx, cb);
 }
@@ -1010,6 +1015,13 @@ int cmd_clone(int argc,
 	argc = parse_options(argc, argv, prefix, builtin_clone_options,
 			     builtin_clone_usage, 0);
 
+	if (!filter_options.choice && config_blob_size_limit) {
+		struct strbuf buf = STRBUF_INIT;
+		strbuf_addf(&buf, "blob:limit=%s", config_blob_size_limit);
+		parse_list_objects_filter(&filter_options, buf.buf);
+		strbuf_release(&buf);
+	}
+
 	if (argc > 2)
 		usage_msg_opt(_("Too many arguments."),
 			builtin_clone_usage, builtin_clone_options);
@@ -1634,6 +1646,7 @@ int cmd_clone(int argc,
 		       ref_storage_format);
 
 	list_objects_filter_release(&filter_options);
+	free(config_blob_size_limit);
 
 	string_list_clear(&option_not, 0);
 	string_list_clear(&option_config, 0);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 573c295241..ff898cb6f4 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -109,6 +109,7 @@ struct fetch_config {
 	int recurse_submodules;
 	int parallel;
 	int submodule_fetch_jobs;
+	char *blob_size_limit;
 };
 
 static int git_fetch_config(const char *k, const char *v,
@@ -160,6 +161,9 @@ static int git_fetch_config(const char *k, const char *v,
 		return 0;
 	}
 
+	if (!strcmp(k, "fetch.blobsizelimit"))
+		return git_config_string(&fetch_config->blob_size_limit, k, v);
+
 	if (!strcmp(k, "fetch.output")) {
 		if (!v)
 			return config_error_nonbool(k);
@@ -2342,7 +2346,8 @@ static int fetch_multiple(struct string_list *list, int max_children,
  * or inherit the default filter-spec from the config.
  */
 static inline void fetch_one_setup_partial(struct remote *remote,
-					   struct list_objects_filter_options *filter_options)
+					   struct list_objects_filter_options *filter_options,
+					   const struct fetch_config *config)
 {
 	/*
 	 * Explicit --no-filter argument overrides everything, regardless
@@ -2352,10 +2357,12 @@ static inline void fetch_one_setup_partial(struct remote *remote,
 		return;
 
 	/*
-	 * If no prior partial clone/fetch and the current fetch DID NOT
-	 * request a partial-fetch, do a normal fetch.
+	 * If no prior partial clone/fetch, the current fetch did not
+	 * request a partial-fetch, and no global blob size limit is
+	 * configured, do a normal fetch.
 	 */
-	if (!repo_has_promisor_remote(the_repository) && !filter_options->choice)
+	if (!repo_has_promisor_remote(the_repository) &&
+	    !filter_options->choice && !config->blob_size_limit)
 		return;
 
 	/*
@@ -2372,11 +2379,27 @@ static inline void fetch_one_setup_partial(struct remote *remote,
 	/*
 	 * Do a partial-fetch from the promisor remote using either the
 	 * explicitly given filter-spec or inherit the filter-spec from
-	 * the config.
+	 * the per-remote config.
+	 */
+	if (!filter_options->choice &&
+	    repo_has_promisor_remote(the_repository)) {
+		partial_clone_get_default_filter_spec(filter_options,
+						      remote->name);
+		if (filter_options->choice)
+			return;
+	}
+
+	/*
+	 * Fall back to the global fetch.blobSizeLimit config. This
+	 * enables partial clone behavior without requiring --filter
+	 * on the command line or a pre-existing promisor remote.
 	 */
-	if (!filter_options->choice)
-		partial_clone_get_default_filter_spec(filter_options, remote->name);
-	return;
+	if (!filter_options->choice && config->blob_size_limit) {
+		struct strbuf buf = STRBUF_INIT;
+		strbuf_addf(&buf, "blob:limit=%s", config->blob_size_limit);
+		parse_list_objects_filter(filter_options, buf.buf);
+		strbuf_release(&buf);
+		partial_clone_register(remote->name, filter_options);
+	}
 }
 
 static int fetch_one(struct remote *remote, int argc, const char **argv,
@@ -2762,9 +2785,10 @@ int cmd_fetch(int argc,
 		oidset_clear(&acked_commits);
 		trace2_region_leave("fetch", "negotiate-only", the_repository);
 	} else if (remote) {
-		if (filter_options.choice || repo_has_promisor_remote(the_repository)) {
+		if (filter_options.choice || repo_has_promisor_remote(the_repository) ||
+		    config.blob_size_limit) {
 			trace2_region_enter("fetch", "setup-partial", the_repository);
-			fetch_one_setup_partial(remote, &filter_options);
+			fetch_one_setup_partial(remote, &filter_options, &config);
 			trace2_region_leave("fetch", "setup-partial", the_repository);
 		}
 		trace2_region_enter("fetch", "fetch-one", the_repository);
@@ -2876,5 +2900,6 @@ int cmd_fetch(int argc,
  cleanup:
 	string_list_clear(&list, 0);
 	list_objects_filter_release(&filter_options);
+	free(config.blob_size_limit);
 	return result;
 }
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 1e354e057f..44b41f315f 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -722,6 +722,66 @@ test_expect_success 'after fetching descendants of non-promisor commits, gc work
 	git -C partial gc --prune=now
 '
 
+# Test fetch.blobSizeLimit config
+
+test_expect_success 'setup for fetch.blobSizeLimit tests' '
+	git init blob-limit-src &&
+	echo "small" >blob-limit-src/small.txt &&
+	dd if=/dev/zero of=blob-limit-src/large.bin bs=1024 count=100 2>/dev/null &&
+	git -C blob-limit-src add . &&
+	git -C blob-limit-src commit -m "initial" &&
+
+	git clone --bare "file://$(pwd)/blob-limit-src" blob-limit-srv.bare &&
+	git -C blob-limit-srv.bare config --local uploadpack.allowfilter 1 &&
+	git -C blob-limit-srv.bare config --local uploadpack.allowanysha1inwant 1
+'
+
+test_expect_success 'clone with fetch.blobSizeLimit config applies filter' '
+	git -c fetch.blobSizeLimit=1k clone \
+		"file://$(pwd)/blob-limit-srv.bare" blob-limit-clone &&
+
+	test "$(git -C blob-limit-clone config --local remote.origin.promisor)" = "true" &&
+	test "$(git -C blob-limit-clone config --local remote.origin.partialclonefilter)" = "blob:limit=1024"
+'
+
+test_expect_success 'clone with --filter overrides fetch.blobSizeLimit' '
+	git -c fetch.blobSizeLimit=1k clone --filter=blob:none \
+		"file://$(pwd)/blob-limit-srv.bare" blob-limit-override &&
+
+	test "$(git -C blob-limit-override config --local remote.origin.partialclonefilter)" = "blob:none"
+'
+
+test_expect_success 'fetch with fetch.blobSizeLimit registers promisor remote' '
+	git clone --no-checkout "file://$(pwd)/blob-limit-srv.bare" blob-limit-fetch &&
+
+	# Sanity: not yet a partial clone
+	test_must_fail git -C blob-limit-fetch config --local remote.origin.promisor &&
+
+	# Add a new commit to the server
+	echo "new-small" >blob-limit-src/new-small.txt &&
+	dd if=/dev/zero of=blob-limit-src/new-large.bin bs=1024 count=100 2>/dev/null &&
+	git -C blob-limit-src add . &&
+	git -C blob-limit-src commit -m "second" &&
+	git -C blob-limit-src push "file://$(pwd)/blob-limit-srv.bare" HEAD &&
+
+	# Fetch with the config set
+	git -C blob-limit-fetch -c fetch.blobSizeLimit=1k fetch origin &&
+
+	test "$(git -C blob-limit-fetch config --local remote.origin.promisor)" = "true" &&
+	test "$(git -C blob-limit-fetch config --local remote.origin.partialclonefilter)" = "blob:limit=1024"
+'
+
+test_expect_success 'fetch.blobSizeLimit does not override existing partialclonefilter' '
+	git clone --filter=blob:none \
+		"file://$(pwd)/blob-limit-srv.bare" blob-limit-existing &&
+
+	test "$(git -C blob-limit-existing config --local remote.origin.partialclonefilter)" = "blob:none" &&
+
+	# Fetch with a different blobSizeLimit; existing filter should win
+	git -C blob-limit-existing -c fetch.blobSizeLimit=1k fetch origin &&
+
+	test "$(git -C blob-limit-existing config --local remote.origin.partialclonefilter)" = "blob:none"
+'
 
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd

base-commit: 7b2bccb0d58d4f24705bf985de1f4612e4cf06e5
-- 
gitgitgadget


end of thread, other threads:[~2026-03-16  7:47 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-01 16:44 [PATCH] fetch, clone: add fetch.blobSizeLimit config Alan Braithwaite via GitGitGadget
2026-03-02 11:53 ` Patrick Steinhardt
2026-03-02 18:28   ` Jeff King
2026-03-02 18:57   ` Junio C Hamano
2026-03-02 21:36     ` Alan Braithwaite
2026-03-03  6:30       ` Patrick Steinhardt
2026-03-03 14:00         ` Alan Braithwaite
2026-03-03 15:08           ` Patrick Steinhardt
2026-03-03 17:58             ` Junio C Hamano
2026-03-04  5:07               ` Patrick Steinhardt
2026-03-03 17:05         ` Junio C Hamano
2026-03-03 14:34       ` Jeff King
2026-03-05  0:57 ` [PATCH v2] clone: add clone.<url>.defaultObjectFilter config Alan Braithwaite via GitGitGadget
2026-03-05 19:01   ` Junio C Hamano
2026-03-05 23:11     ` Alan Braithwaite
2026-03-06  6:55   ` [PATCH v3] " Alan Braithwaite via GitGitGadget
2026-03-06 10:39     ` brian m. carlson
2026-03-06 19:33       ` Junio C Hamano
2026-03-06 21:50         ` Alan Braithwaite
2026-03-06 21:47     ` [PATCH v4] " Alan Braithwaite via GitGitGadget
2026-03-06 22:18       ` Junio C Hamano
2026-03-07  1:04         ` Alan Braithwaite
2026-03-07  1:33       ` [PATCH v5] " Alan Braithwaite via GitGitGadget
2026-03-11  7:44         ` Patrick Steinhardt
2026-03-15  1:33           ` Alan Braithwaite
2026-03-15  5:37         ` [PATCH v6] " Alan Braithwaite via GitGitGadget
2026-03-15 21:32           ` Junio C Hamano
2026-03-16  7:47           ` Patrick Steinhardt
