public inbox for git@vger.kernel.org
From: "Alan Braithwaite via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: ps@pks.im, christian.couder@gmail.com, jonathantanmy@google.com,
	me@ttaylorr.com, gitster@pobox.com, Jeff King <peff@peff.net>,
	Alan Braithwaite <alan@braithwaite.dev>
Subject: [PATCH v2] clone: add clone.<url>.defaultObjectFilter config
Date: Thu, 05 Mar 2026 00:57:31 +0000	[thread overview]
Message-ID: <pull.2058.v2.git.1772672251281.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2058.git.1772383499900.gitgitgadget@gmail.com>

From: Alan Braithwaite <alan@braithwaite.dev>

Add a new configuration option that lets users specify a default
partial clone filter per URL pattern.  When cloning a repository
whose URL matches a configured pattern, git-clone automatically
applies the filter, equivalent to passing --filter on the command
line.

    [clone "https://github.com/"]
        defaultObjectFilter = blob:limit=5m

    [clone "https://internal.corp.com/large-project/"]
        defaultObjectFilter = blob:none

URL matching uses the existing urlmatch_config_entry() infrastructure,
following the same rules as http.<url>.* — you can match a domain,
a namespace path, or a specific project, and the most specific match
wins.

The config only affects the initial clone.  Once the clone completes,
the filter is recorded in remote.<name>.partialCloneFilter, so
subsequent fetches inherit it automatically.  An explicit --filter
flag on the command line takes precedence.

Only the URL-qualified form (clone.<url>.defaultObjectFilter) is
honored; a bare clone.defaultObjectFilter without a URL subsection
is ignored.

Signed-off-by: Alan Braithwaite <alan@braithwaite.dev>
---
    fetch, clone: add fetch.blobSizeLimit config

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2058%2Fabraithwaite%2Falan%2Ffetch-blob-size-limit-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2058/abraithwaite/alan/fetch-blob-size-limit-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2058
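
As an illustration only (a toy model, not git's actual urlmatch_config_entry()
logic, which normalizes URLs and compares host and path components
separately), the matching and precedence rules described in the commit
message can be sketched as longest-prefix matching:

```python
def choose_filter(clone_url, cli_filter, config):
    """Pick the effective object filter for a clone (toy model).

    config maps configured URL patterns to filter specs; the empty
    pattern stands in for a bare clone.defaultObjectFilter, which is
    ignored.  An explicit --filter (cli_filter) always wins; otherwise
    the most specific (here: longest) matching URL pattern is used.
    """
    if cli_filter is not None:
        return cli_filter
    best, best_len = None, -1
    for pattern, spec in config.items():
        if not pattern:
            continue  # bare clone.defaultObjectFilter: ignored
        if clone_url.startswith(pattern) and len(pattern) > best_len:
            best, best_len = spec, len(pattern)
    return best

config = {
    "https://internal.corp.com/": "blob:limit=5m",
    "https://internal.corp.com/large-project/": "blob:none",
    "": "tree:0",  # bare key, ignored
}
print(choose_filter("https://internal.corp.com/large-project/x.git", None, config))
# prints: blob:none
```

In the patch itself, the chosen spec is handed to
parse_list_objects_filter(), the same path an explicit --filter takes.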

Range-diff vs v1:

 1:  818b64e2e2 ! 1:  4a73edd2e8 fetch, clone: add fetch.blobSizeLimit config
     @@ Metadata
      Author: Alan Braithwaite <alan@braithwaite.dev>
      
       ## Commit message ##
     -    fetch, clone: add fetch.blobSizeLimit config
     +    clone: add clone.<url>.defaultObjectFilter config
      
     -    External tools like git-lfs and git-fat use the filter clean/smudge
     -    mechanism to manage large binary objects, but this requires pointer
     -    files, a separate storage backend, and careful coordination. Git's
     -    partial clone infrastructure provides a more native approach: large
     -    blobs can be excluded at the protocol level during fetch and lazily
     -    retrieved on demand. However, enabling this requires passing
     -    `--filter=blob:limit=<size>` on every clone, which is not
     -    discoverable and cannot be set as a global default.
     +    Add a new configuration option that lets users specify a default
     +    partial clone filter per URL pattern.  When cloning a repository
     +    whose URL matches a configured pattern, git-clone automatically
     +    applies the filter, equivalent to passing --filter on the command
     +    line.
      
     -    Add a new `fetch.blobSizeLimit` configuration option that enables
     -    size-based partial clone behavior globally. When set, both `git
     -    clone` and `git fetch` automatically apply a `blob:limit=<size>`
     -    filter. Blobs larger than the threshold that are not needed for the
     -    current worktree are excluded from the transfer and lazily fetched
     -    on demand when needed (e.g., during checkout, diff, or merge).
     +        [clone "https://github.com/"]
     +            defaultObjectFilter = blob:limit=5m
      
     -    This makes it easy to work with repositories that have accumulated
     -    large binary files in their history, without downloading all of
     -    them upfront.
     +        [clone "https://internal.corp.com/large-project/"]
     +            defaultObjectFilter = blob:none
      
     -    The precedence order is:
     -      1. Explicit `--filter=` on the command line (highest)
     -      2. Existing `remote.<name>.partialclonefilter`
     -      3. `fetch.blobSizeLimit` (new, lowest)
     +    URL matching uses the existing urlmatch_config_entry() infrastructure,
     +    following the same rules as http.<url>.* — you can match a domain,
     +    a namespace path, or a specific project, and the most specific match
     +    wins.
      
     -    Once a clone or fetch applies this setting, the remote is registered
     -    as a promisor remote with the corresponding filter spec, so
     -    subsequent fetches inherit it automatically. If the server does not
     -    support object filtering, the setting is silently ignored.
     +    The config only affects the initial clone.  Once the clone completes,
     +    the filter is recorded in remote.<name>.partialCloneFilter, so
     +    subsequent fetches inherit it automatically.  An explicit --filter
     +    flag on the command line takes precedence.
     +
     +    Only the URL-qualified form (clone.<url>.defaultObjectFilter) is
     +    honored; a bare clone.defaultObjectFilter without a URL subsection
     +    is ignored.
      
          Signed-off-by: Alan Braithwaite <alan@braithwaite.dev>
      
     - ## Documentation/config/fetch.adoc ##
     -@@ Documentation/config/fetch.adoc: config setting.
     - 	file helps performance of many Git commands, including `git merge-base`,
     - 	`git push -f`, and `git log --graph`. Defaults to `false`.
     - 
     -+`fetch.blobSizeLimit`::
     -+	When set to a size value (e.g., `1m`, `100k`, `1g`), both
     -+	linkgit:git-clone[1] and linkgit:git-fetch[1] will automatically
     -+	use `--filter=blob:limit=<value>` to enable partial clone
     -+	behavior. Blobs larger than this threshold are excluded from the
     -+	initial transfer and lazily fetched on demand when needed (e.g.,
     -+	during checkout).
     + ## Documentation/config/clone.adoc ##
     +@@ Documentation/config/clone.adoc: endif::[]
     + 	If a partial clone filter is provided (see `--filter` in
     + 	linkgit:git-rev-list[1]) and `--recurse-submodules` is used, also apply
     + 	the filter to submodules.
     ++
     ++`clone.<url>.defaultObjectFilter`::
     ++	When set to a filter spec string (e.g., `blob:limit=1m`,
     ++	`blob:none`, `tree:0`), linkgit:git-clone[1] will automatically
     ++	use `--filter=<value>` when the clone URL matches `<url>`.
     ++	Objects matching the filter are excluded from the initial
     ++	transfer and lazily fetched on demand (e.g., during checkout).
     ++	Subsequent fetches inherit the filter via the per-remote config
     ++	that is written during the clone.
      ++
     -+This provides a convenient way to enable size-based partial clones
     -+globally without passing `--filter` on every command. Once a clone or
     -+fetch applies this setting, the remote is registered as a promisor
     -+remote with the corresponding filter, so subsequent fetches inherit
     -+the filter automatically.
     ++The URL matching follows the same rules as `http.<url>.*` (see
     ++linkgit:git-config[1]).  The most specific URL match wins.  You can
     ++match a complete domain, a namespace, or a specific project:
      ++
     -+An explicit `--filter` option on the command line takes precedence over
     -+this config. An existing `remote.<name>.partialclonefilter` also takes
     -+precedence. If the server does not support object filtering, the
     -+setting is silently ignored.
     ++----
     ++[clone "https://github.com/"]
     ++    defaultObjectFilter = blob:limit=5m
      +
     - `fetch.bundleURI`::
     - 	This value stores a URI for downloading Git object data from a bundle
     - 	URI before performing an incremental fetch from the origin Git server.
     ++[clone "https://internal.corp.com/large-project/"]
     ++    defaultObjectFilter = blob:none
     ++----
     +++
     ++An explicit `--filter` option on the command line takes precedence
     ++over this config.  This setting only affects the initial clone; it
     ++has no effect on later fetches into an existing repository.  If the
     ++server does not support object filtering, it is silently ignored.
      
       ## builtin/clone.c ##
     -@@ builtin/clone.c: static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
     - static int max_jobs = -1;
     - static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
     - static int config_filter_submodules = -1;    /* unspecified */
     -+static char *config_blob_size_limit;
     - static int option_remote_submodules;
     - 
     - static int recurse_submodules_cb(const struct option *opt,
     +@@
     + #include "path.h"
     + #include "pkt-line.h"
     + #include "list-objects-filter-options.h"
     ++#include "urlmatch.h"
     + #include "hook.h"
     + #include "bundle.h"
     + #include "bundle-uri.h"
      @@ builtin/clone.c: static int git_clone_config(const char *k, const char *v,
     - 		config_reject_shallow = git_config_bool(k, v);
     - 	if (!strcmp(k, "clone.filtersubmodules"))
     - 		config_filter_submodules = git_config_bool(k, v);
     -+	if (!strcmp(k, "fetch.blobsizelimit")) {
     -+		free(config_blob_size_limit);
     -+		git_config_string(&config_blob_size_limit, k, v);
     -+	}
     - 
       	return git_default_config(k, v, ctx, cb);
       }
     -@@ builtin/clone.c: int cmd_clone(int argc,
     - 	argc = parse_options(argc, argv, prefix, builtin_clone_options,
     - 			     builtin_clone_usage, 0);
       
     -+	if (!filter_options.choice && config_blob_size_limit) {
     -+		struct strbuf buf = STRBUF_INIT;
     -+		strbuf_addf(&buf, "blob:limit=%s", config_blob_size_limit);
     -+		parse_list_objects_filter(&filter_options, buf.buf);
     -+		strbuf_release(&buf);
     ++struct clone_filter_data {
     ++	char *default_object_filter;
     ++};
     ++
     ++static int clone_filter_collect(const char *var, const char *value,
     ++				const struct config_context *ctx UNUSED,
     ++				void *cb)
     ++{
     ++	struct clone_filter_data *data = cb;
     ++
     ++	if (!strcmp(var, "clone.defaultobjectfilter")) {
     ++		free(data->default_object_filter);
     ++		data->default_object_filter = xstrdup(value);
      +	}
     ++	return 0;
     ++}
     ++
     ++/*
     ++ * Look up clone.<url>.defaultObjectFilter using the urlmatch
     ++ * infrastructure.  Only URL-qualified forms are supported; a bare
     ++ * clone.defaultObjectFilter (without a URL) is ignored.
     ++ */
     ++static char *get_default_object_filter(const char *url)
     ++{
     ++	struct urlmatch_config config = URLMATCH_CONFIG_INIT;
     ++	struct clone_filter_data data = { 0 };
     ++	struct string_list_item *item;
     ++	char *normalized_url;
     ++
     ++	config.section = "clone";
     ++	config.key = "defaultobjectfilter";
     ++	config.collect_fn = clone_filter_collect;
     ++	config.cascade_fn = git_clone_config;
     ++	config.cb = &data;
     ++
     ++	normalized_url = url_normalize(url, &config.url);
     ++
     ++	repo_config(the_repository, urlmatch_config_entry, &config);
     ++	free(normalized_url);
      +
     - 	if (argc > 2)
     - 		usage_msg_opt(_("Too many arguments."),
     - 			builtin_clone_usage, builtin_clone_options);
     -@@ builtin/clone.c: int cmd_clone(int argc,
     - 		       ref_storage_format);
     - 
     - 	list_objects_filter_release(&filter_options);
     -+	free(config_blob_size_limit);
     - 
     - 	string_list_clear(&option_not, 0);
     - 	string_list_clear(&option_config, 0);
     -
     - ## builtin/fetch.c ##
     -@@ builtin/fetch.c: struct fetch_config {
     - 	int recurse_submodules;
     - 	int parallel;
     - 	int submodule_fetch_jobs;
     -+	char *blob_size_limit;
     - };
     - 
     - static int git_fetch_config(const char *k, const char *v,
     -@@ builtin/fetch.c: static int git_fetch_config(const char *k, const char *v,
     - 		return 0;
     - 	}
     - 
     -+	if (!strcmp(k, "fetch.blobsizelimit"))
     -+		return git_config_string(&fetch_config->blob_size_limit, k, v);
     -+
     - 	if (!strcmp(k, "fetch.output")) {
     - 		if (!v)
     - 			return config_error_nonbool(k);
     -@@ builtin/fetch.c: static int fetch_multiple(struct string_list *list, int max_children,
     -  * or inherit the default filter-spec from the config.
     -  */
     - static inline void fetch_one_setup_partial(struct remote *remote,
     --					   struct list_objects_filter_options *filter_options)
     -+					   struct list_objects_filter_options *filter_options,
     -+					   const struct fetch_config *config)
     - {
     - 	/*
     - 	 * Explicit --no-filter argument overrides everything, regardless
     -@@ builtin/fetch.c: static inline void fetch_one_setup_partial(struct remote *remote,
     - 		return;
     - 
     - 	/*
     --	 * If no prior partial clone/fetch and the current fetch DID NOT
     --	 * request a partial-fetch, do a normal fetch.
     -+	 * If no prior partial clone/fetch, the current fetch did not
     -+	 * request a partial-fetch, and no global blob size limit is
     -+	 * configured, do a normal fetch.
     - 	 */
     --	if (!repo_has_promisor_remote(the_repository) && !filter_options->choice)
     -+	if (!repo_has_promisor_remote(the_repository) &&
     -+	    !filter_options->choice && !config->blob_size_limit)
     - 		return;
     - 
     - 	/*
     -@@ builtin/fetch.c: static inline void fetch_one_setup_partial(struct remote *remote,
     - 	/*
     - 	 * Do a partial-fetch from the promisor remote using either the
     - 	 * explicitly given filter-spec or inherit the filter-spec from
     --	 * the config.
     -+	 * the per-remote config.
     ++	/*
     ++	 * Reject the bare form clone.defaultObjectFilter (no URL
     ++	 * subsection).  urlmatch stores the best match in vars with
     ++	 * hostmatch_len == 0 for non-URL-qualified entries; discard
     ++	 * the result if that is what we got.
      +	 */
     -+	if (repo_has_promisor_remote(the_repository)) {
     -+		partial_clone_get_default_filter_spec(filter_options,
     -+						      remote->name);
     -+		if (filter_options->choice)
     -+			return;
     ++	item = string_list_lookup(&config.vars, "defaultobjectfilter");
     ++	if (item) {
     ++		const struct urlmatch_item *m = item->util;
     ++		if (!m->hostmatch_len && !m->pathmatch_len) {
     ++			FREE_AND_NULL(data.default_object_filter);
     ++		}
      +	}
      +
     -+	/*
     -+	 * Fall back to the global fetch.blobSizeLimit config. This
     -+	 * enables partial clone behavior without requiring --filter
     -+	 * on the command line or a pre-existing promisor remote.
     - 	 */
     --	if (!filter_options->choice)
     --		partial_clone_get_default_filter_spec(filter_options, remote->name);
     --	return;
     -+	if (!filter_options->choice && config->blob_size_limit) {
     -+		struct strbuf buf = STRBUF_INIT;
     -+		strbuf_addf(&buf, "blob:limit=%s", config->blob_size_limit);
     -+		parse_list_objects_filter(filter_options, buf.buf);
     -+		strbuf_release(&buf);
     -+		partial_clone_register(remote->name, filter_options);
     -+	}
     - }
     ++	urlmatch_config_release(&config);
     ++
     ++	return data.default_object_filter;
     ++}
     ++
     + static int write_one_config(const char *key, const char *value,
     + 			    const struct config_context *ctx,
     + 			    void *data)
     +@@ builtin/clone.c: int cmd_clone(int argc,
     + 	} else
     + 		die(_("repository '%s' does not exist"), repo_name);
       
     - static int fetch_one(struct remote *remote, int argc, const char **argv,
     -@@ builtin/fetch.c: int cmd_fetch(int argc,
     - 		oidset_clear(&acked_commits);
     - 		trace2_region_leave("fetch", "negotiate-only", the_repository);
     - 	} else if (remote) {
     --		if (filter_options.choice || repo_has_promisor_remote(the_repository)) {
     -+		if (filter_options.choice || repo_has_promisor_remote(the_repository) ||
     -+		    config.blob_size_limit) {
     - 			trace2_region_enter("fetch", "setup-partial", the_repository);
     --			fetch_one_setup_partial(remote, &filter_options);
     -+			fetch_one_setup_partial(remote, &filter_options, &config);
     - 			trace2_region_leave("fetch", "setup-partial", the_repository);
     - 		}
     - 		trace2_region_enter("fetch", "fetch-one", the_repository);
     -@@ builtin/fetch.c: int cmd_fetch(int argc,
     -  cleanup:
     - 	string_list_clear(&list, 0);
     - 	list_objects_filter_release(&filter_options);
     -+	free(config.blob_size_limit);
     - 	return result;
     - }
     ++	if (!filter_options.choice) {
     ++		char *config_filter = get_default_object_filter(repo);
     ++		if (config_filter) {
     ++			parse_list_objects_filter(&filter_options, config_filter);
     ++			free(config_filter);
     ++		}
     ++	}
     ++
     + 	/* no need to be strict, transport_set_option() will validate it again */
     + 	if (option_depth && atoi(option_depth) < 1)
     + 		die(_("depth %s is not a positive number"), option_depth);
      
       ## t/t5616-partial-clone.sh ##
      @@ t/t5616-partial-clone.sh: test_expect_success 'after fetching descendants of non-promisor commits, gc work
       	git -C partial gc --prune=now
       '
       
     -+# Test fetch.blobSizeLimit config
     ++# Test clone.<url>.defaultObjectFilter config
     ++
     ++test_expect_success 'setup for clone.defaultObjectFilter tests' '
     ++	git init default-filter-src &&
     ++	echo "small" >default-filter-src/small.txt &&
     ++	dd if=/dev/zero of=default-filter-src/large.bin bs=1024 count=100 2>/dev/null &&
     ++	git -C default-filter-src add . &&
     ++	git -C default-filter-src commit -m "initial" &&
     ++
     ++	git clone --bare "file://$(pwd)/default-filter-src" default-filter-srv.bare &&
     ++	git -C default-filter-srv.bare config --local uploadpack.allowfilter 1 &&
     ++	git -C default-filter-srv.bare config --local uploadpack.allowanysha1inwant 1
     ++'
      +
     -+test_expect_success 'setup for fetch.blobSizeLimit tests' '
     -+	git init blob-limit-src &&
     -+	echo "small" >blob-limit-src/small.txt &&
     -+	dd if=/dev/zero of=blob-limit-src/large.bin bs=1024 count=100 2>/dev/null &&
     -+	git -C blob-limit-src add . &&
     -+	git -C blob-limit-src commit -m "initial" &&
     ++test_expect_success 'clone with clone.<url>.defaultObjectFilter applies filter' '
     ++	SERVER_URL="file://$(pwd)/default-filter-srv.bare" &&
     ++	git -c "clone.$SERVER_URL.defaultObjectFilter=blob:limit=1k" clone \
     ++		"$SERVER_URL" default-filter-clone &&
      +
     -+	git clone --bare "file://$(pwd)/blob-limit-src" blob-limit-srv.bare &&
     -+	git -C blob-limit-srv.bare config --local uploadpack.allowfilter 1 &&
     -+	git -C blob-limit-srv.bare config --local uploadpack.allowanysha1inwant 1
     ++	test "$(git -C default-filter-clone config --local remote.origin.promisor)" = "true" &&
     ++	test "$(git -C default-filter-clone config --local remote.origin.partialclonefilter)" = "blob:limit=1024"
      +'
      +
     -+test_expect_success 'clone with fetch.blobSizeLimit config applies filter' '
     -+	git -c fetch.blobSizeLimit=1k clone \
     -+		"file://$(pwd)/blob-limit-srv.bare" blob-limit-clone &&
     ++test_expect_success 'clone with --filter overrides clone.<url>.defaultObjectFilter' '
     ++	SERVER_URL="file://$(pwd)/default-filter-srv.bare" &&
     ++	git -c "clone.$SERVER_URL.defaultObjectFilter=blob:limit=1k" \
     ++		clone --filter=blob:none "$SERVER_URL" default-filter-override &&
      +
     -+	test "$(git -C blob-limit-clone config --local remote.origin.promisor)" = "true" &&
     -+	test "$(git -C blob-limit-clone config --local remote.origin.partialclonefilter)" = "blob:limit=1024"
     ++	test "$(git -C default-filter-override config --local remote.origin.partialclonefilter)" = "blob:none"
      +'
      +
     -+test_expect_success 'clone with --filter overrides fetch.blobSizeLimit' '
     -+	git -c fetch.blobSizeLimit=1k clone --filter=blob:none \
     -+		"file://$(pwd)/blob-limit-srv.bare" blob-limit-override &&
     ++test_expect_success 'clone with clone.<url>.defaultObjectFilter=blob:none works' '
     ++	SERVER_URL="file://$(pwd)/default-filter-srv.bare" &&
     ++	git -c "clone.$SERVER_URL.defaultObjectFilter=blob:none" clone \
     ++		"$SERVER_URL" default-filter-blobnone &&
      +
     -+	test "$(git -C blob-limit-override config --local remote.origin.partialclonefilter)" = "blob:none"
     ++	test "$(git -C default-filter-blobnone config --local remote.origin.promisor)" = "true" &&
     ++	test "$(git -C default-filter-blobnone config --local remote.origin.partialclonefilter)" = "blob:none"
      +'
      +
     -+test_expect_success 'fetch with fetch.blobSizeLimit registers promisor remote' '
     -+	git clone --no-checkout "file://$(pwd)/blob-limit-srv.bare" blob-limit-fetch &&
     ++test_expect_success 'clone.<url>.defaultObjectFilter with tree:0 works' '
     ++	SERVER_URL="file://$(pwd)/default-filter-srv.bare" &&
     ++	git -c "clone.$SERVER_URL.defaultObjectFilter=tree:0" clone \
     ++		"$SERVER_URL" default-filter-tree0 &&
      +
     -+	# Sanity: not yet a partial clone
     -+	test_must_fail git -C blob-limit-fetch config --local remote.origin.promisor &&
     -+
     -+	# Add a new commit to the server
     -+	echo "new-small" >blob-limit-src/new-small.txt &&
     -+	dd if=/dev/zero of=blob-limit-src/new-large.bin bs=1024 count=100 2>/dev/null &&
     -+	git -C blob-limit-src add . &&
     -+	git -C blob-limit-src commit -m "second" &&
     -+	git -C blob-limit-src push "file://$(pwd)/blob-limit-srv.bare" main &&
     ++	test "$(git -C default-filter-tree0 config --local remote.origin.promisor)" = "true" &&
     ++	test "$(git -C default-filter-tree0 config --local remote.origin.partialclonefilter)" = "tree:0"
     ++'
      +
     -+	# Fetch with the config set
     -+	git -C blob-limit-fetch -c fetch.blobSizeLimit=1k fetch origin &&
     ++test_expect_success 'most specific URL match wins for clone.defaultObjectFilter' '
     ++	SERVER_URL="file://$(pwd)/default-filter-srv.bare" &&
     ++	git \
     ++		-c "clone.file://.defaultObjectFilter=blob:limit=1k" \
     ++		-c "clone.$SERVER_URL.defaultObjectFilter=blob:none" \
     ++		clone "$SERVER_URL" default-filter-url-specific &&
      +
     -+	test "$(git -C blob-limit-fetch config --local remote.origin.promisor)" = "true" &&
     -+	test "$(git -C blob-limit-fetch config --local remote.origin.partialclonefilter)" = "blob:limit=1024"
     ++	test "$(git -C default-filter-url-specific config --local remote.origin.partialclonefilter)" = "blob:none"
      +'
      +
     -+test_expect_success 'fetch.blobSizeLimit does not override existing partialclonefilter' '
     -+	git clone --filter=blob:none \
     -+		"file://$(pwd)/blob-limit-srv.bare" blob-limit-existing &&
     ++test_expect_success 'non-matching URL does not apply clone.defaultObjectFilter' '
     ++	git \
     ++		-c "clone.https://other.example.com/.defaultObjectFilter=blob:none" \
     ++		clone "file://$(pwd)/default-filter-srv.bare" default-filter-url-nomatch &&
      +
     -+	test "$(git -C blob-limit-existing config --local remote.origin.partialclonefilter)" = "blob:none" &&
     ++	test_must_fail git -C default-filter-url-nomatch config --local remote.origin.promisor
     ++'
      +
     -+	# Fetch with a different blobSizeLimit; existing filter should win
     -+	git -C blob-limit-existing -c fetch.blobSizeLimit=1k fetch origin &&
     ++test_expect_success 'bare clone.defaultObjectFilter without URL is ignored' '
     ++	git -c clone.defaultObjectFilter=blob:none \
     ++		clone "file://$(pwd)/default-filter-srv.bare" default-filter-bare-key &&
      +
     -+	test "$(git -C blob-limit-existing config --local remote.origin.partialclonefilter)" = "blob:none"
     ++	test_must_fail git -C default-filter-bare-key config --local remote.origin.promisor
      +'
       
       . "$TEST_DIRECTORY"/lib-httpd.sh


 Documentation/config/clone.adoc | 26 ++++++++++++
 builtin/clone.c                 | 68 ++++++++++++++++++++++++++++++
 t/t5616-partial-clone.sh        | 73 +++++++++++++++++++++++++++++++++
 3 files changed, 167 insertions(+)

diff --git a/Documentation/config/clone.adoc b/Documentation/config/clone.adoc
index 0a10efd174..5805ab51c2 100644
--- a/Documentation/config/clone.adoc
+++ b/Documentation/config/clone.adoc
@@ -21,3 +21,29 @@ endif::[]
 	If a partial clone filter is provided (see `--filter` in
 	linkgit:git-rev-list[1]) and `--recurse-submodules` is used, also apply
 	the filter to submodules.
+
+`clone.<url>.defaultObjectFilter`::
+	When set to a filter spec string (e.g., `blob:limit=1m`,
+	`blob:none`, `tree:0`), linkgit:git-clone[1] will automatically
+	use `--filter=<value>` when the clone URL matches `<url>`.
+	Objects matching the filter are excluded from the initial
+	transfer and lazily fetched on demand (e.g., during checkout).
+	Subsequent fetches inherit the filter via the per-remote config
+	that is written during the clone.
++
+The URL matching follows the same rules as `http.<url>.*` (see
+linkgit:git-config[1]).  The most specific URL match wins.  You can
+match a complete domain, a namespace, or a specific project:
++
+----
+[clone "https://github.com/"]
+    defaultObjectFilter = blob:limit=5m
+
+[clone "https://internal.corp.com/large-project/"]
+    defaultObjectFilter = blob:none
+----
++
+An explicit `--filter` option on the command line takes precedence
+over this config.  This setting only affects the initial clone; it
+has no effect on later fetches into an existing repository.  If the
+server does not support object filtering, it is silently ignored.
diff --git a/builtin/clone.c b/builtin/clone.c
index 45d8fa0eed..5e20b5343d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -44,6 +44,7 @@
 #include "path.h"
 #include "pkt-line.h"
 #include "list-objects-filter-options.h"
+#include "urlmatch.h"
 #include "hook.h"
 #include "bundle.h"
 #include "bundle-uri.h"
@@ -757,6 +758,65 @@ static int git_clone_config(const char *k, const char *v,
 	return git_default_config(k, v, ctx, cb);
 }
 
+struct clone_filter_data {
+	char *default_object_filter;
+};
+
+static int clone_filter_collect(const char *var, const char *value,
+				const struct config_context *ctx UNUSED,
+				void *cb)
+{
+	struct clone_filter_data *data = cb;
+
+	if (!strcmp(var, "clone.defaultobjectfilter")) {
+		free(data->default_object_filter);
+		data->default_object_filter = xstrdup(value);
+	}
+	return 0;
+}
+
+/*
+ * Look up clone.<url>.defaultObjectFilter using the urlmatch
+ * infrastructure.  Only URL-qualified forms are supported; a bare
+ * clone.defaultObjectFilter (without a URL) is ignored.
+ */
+static char *get_default_object_filter(const char *url)
+{
+	struct urlmatch_config config = URLMATCH_CONFIG_INIT;
+	struct clone_filter_data data = { 0 };
+	struct string_list_item *item;
+	char *normalized_url;
+
+	config.section = "clone";
+	config.key = "defaultobjectfilter";
+	config.collect_fn = clone_filter_collect;
+	config.cascade_fn = git_clone_config;
+	config.cb = &data;
+
+	normalized_url = url_normalize(url, &config.url);
+
+	repo_config(the_repository, urlmatch_config_entry, &config);
+	free(normalized_url);
+
+	/*
+	 * Reject the bare form clone.defaultObjectFilter (no URL
+	 * subsection).  urlmatch stores the best match in vars with
+	 * hostmatch_len == 0 for non-URL-qualified entries; discard
+	 * the result if that is what we got.
+	 */
+	item = string_list_lookup(&config.vars, "defaultobjectfilter");
+	if (item) {
+		const struct urlmatch_item *m = item->util;
+		if (!m->hostmatch_len && !m->pathmatch_len) {
+			FREE_AND_NULL(data.default_object_filter);
+		}
+	}
+
+	urlmatch_config_release(&config);
+
+	return data.default_object_filter;
+}
+
 static int write_one_config(const char *key, const char *value,
 			    const struct config_context *ctx,
 			    void *data)
@@ -1057,6 +1117,14 @@ int cmd_clone(int argc,
 	} else
 		die(_("repository '%s' does not exist"), repo_name);
 
+	if (!filter_options.choice) {
+		char *config_filter = get_default_object_filter(repo);
+		if (config_filter) {
+			parse_list_objects_filter(&filter_options, config_filter);
+			free(config_filter);
+		}
+	}
+
 	/* no need to be strict, transport_set_option() will validate it again */
 	if (option_depth && atoi(option_depth) < 1)
 		die(_("depth %s is not a positive number"), option_depth);
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 1e354e057f..33010f3b7d 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -722,6 +722,79 @@ test_expect_success 'after fetching descendants of non-promisor commits, gc work
 	git -C partial gc --prune=now
 '
 
+# Test clone.<url>.defaultObjectFilter config
+
+test_expect_success 'setup for clone.defaultObjectFilter tests' '
+	git init default-filter-src &&
+	echo "small" >default-filter-src/small.txt &&
+	dd if=/dev/zero of=default-filter-src/large.bin bs=1024 count=100 2>/dev/null &&
+	git -C default-filter-src add . &&
+	git -C default-filter-src commit -m "initial" &&
+
+	git clone --bare "file://$(pwd)/default-filter-src" default-filter-srv.bare &&
+	git -C default-filter-srv.bare config --local uploadpack.allowfilter 1 &&
+	git -C default-filter-srv.bare config --local uploadpack.allowanysha1inwant 1
+'
+
+test_expect_success 'clone with clone.<url>.defaultObjectFilter applies filter' '
+	SERVER_URL="file://$(pwd)/default-filter-srv.bare" &&
+	git -c "clone.$SERVER_URL.defaultObjectFilter=blob:limit=1k" clone \
+		"$SERVER_URL" default-filter-clone &&
+
+	test "$(git -C default-filter-clone config --local remote.origin.promisor)" = "true" &&
+	test "$(git -C default-filter-clone config --local remote.origin.partialclonefilter)" = "blob:limit=1024"
+'
+
+test_expect_success 'clone with --filter overrides clone.<url>.defaultObjectFilter' '
+	SERVER_URL="file://$(pwd)/default-filter-srv.bare" &&
+	git -c "clone.$SERVER_URL.defaultObjectFilter=blob:limit=1k" \
+		clone --filter=blob:none "$SERVER_URL" default-filter-override &&
+
+	test "$(git -C default-filter-override config --local remote.origin.partialclonefilter)" = "blob:none"
+'
+
+test_expect_success 'clone with clone.<url>.defaultObjectFilter=blob:none works' '
+	SERVER_URL="file://$(pwd)/default-filter-srv.bare" &&
+	git -c "clone.$SERVER_URL.defaultObjectFilter=blob:none" clone \
+		"$SERVER_URL" default-filter-blobnone &&
+
+	test "$(git -C default-filter-blobnone config --local remote.origin.promisor)" = "true" &&
+	test "$(git -C default-filter-blobnone config --local remote.origin.partialclonefilter)" = "blob:none"
+'
+
+test_expect_success 'clone.<url>.defaultObjectFilter with tree:0 works' '
+	SERVER_URL="file://$(pwd)/default-filter-srv.bare" &&
+	git -c "clone.$SERVER_URL.defaultObjectFilter=tree:0" clone \
+		"$SERVER_URL" default-filter-tree0 &&
+
+	test "$(git -C default-filter-tree0 config --local remote.origin.promisor)" = "true" &&
+	test "$(git -C default-filter-tree0 config --local remote.origin.partialclonefilter)" = "tree:0"
+'
+
+test_expect_success 'most specific URL match wins for clone.defaultObjectFilter' '
+	SERVER_URL="file://$(pwd)/default-filter-srv.bare" &&
+	git \
+		-c "clone.file://.defaultObjectFilter=blob:limit=1k" \
+		-c "clone.$SERVER_URL.defaultObjectFilter=blob:none" \
+		clone "$SERVER_URL" default-filter-url-specific &&
+
+	test "$(git -C default-filter-url-specific config --local remote.origin.partialclonefilter)" = "blob:none"
+'
+
+test_expect_success 'non-matching URL does not apply clone.defaultObjectFilter' '
+	git \
+		-c "clone.https://other.example.com/.defaultObjectFilter=blob:none" \
+		clone "file://$(pwd)/default-filter-srv.bare" default-filter-url-nomatch &&
+
+	test_must_fail git -C default-filter-url-nomatch config --local remote.origin.promisor
+'
+
+test_expect_success 'bare clone.defaultObjectFilter without URL is ignored' '
+	git -c clone.defaultObjectFilter=blob:none \
+		clone "file://$(pwd)/default-filter-srv.bare" default-filter-bare-key &&
+
+	test_must_fail git -C default-filter-bare-key config --local remote.origin.promisor
+'
 
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd

base-commit: 7b2bccb0d58d4f24705bf985de1f4612e4cf06e5
-- 
gitgitgadget

Thread overview: 28+ messages
2026-03-01 16:44 [PATCH] fetch, clone: add fetch.blobSizeLimit config Alan Braithwaite via GitGitGadget
2026-03-02 11:53 ` Patrick Steinhardt
2026-03-02 18:28   ` Jeff King
2026-03-02 18:57   ` Junio C Hamano
2026-03-02 21:36     ` Alan Braithwaite
2026-03-03  6:30       ` Patrick Steinhardt
2026-03-03 14:00         ` Alan Braithwaite
2026-03-03 15:08           ` Patrick Steinhardt
2026-03-03 17:58             ` Junio C Hamano
2026-03-04  5:07               ` Patrick Steinhardt
2026-03-03 17:05         ` Junio C Hamano
2026-03-03 14:34       ` Jeff King
2026-03-05  0:57 ` Alan Braithwaite via GitGitGadget [this message]
2026-03-05 19:01   ` [PATCH v2] clone: add clone.<url>.defaultObjectFilter config Junio C Hamano
2026-03-05 23:11     ` Alan Braithwaite
2026-03-06  6:55   ` [PATCH v3] " Alan Braithwaite via GitGitGadget
2026-03-06 10:39     ` brian m. carlson
2026-03-06 19:33       ` Junio C Hamano
2026-03-06 21:50         ` Alan Braithwaite
2026-03-06 21:47     ` [PATCH v4] " Alan Braithwaite via GitGitGadget
2026-03-06 22:18       ` Junio C Hamano
2026-03-07  1:04         ` Alan Braithwaite
2026-03-07  1:33       ` [PATCH v5] " Alan Braithwaite via GitGitGadget
2026-03-11  7:44         ` Patrick Steinhardt
2026-03-15  1:33           ` Alan Braithwaite
2026-03-15  5:37         ` [PATCH v6] " Alan Braithwaite via GitGitGadget
2026-03-15 21:32           ` Junio C Hamano
2026-03-16  7:47           ` Patrick Steinhardt
