All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Alan Braithwaite via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  ps@pks.im,  christian.couder@gmail.com,
	jonathantanmy@google.com,  me@ttaylorr.com,
	 Jeff King <peff@peff.net>,
	Alan Braithwaite <alan@braithwaite.dev>
Subject: Re: [PATCH v2] clone: add clone.<url>.defaultObjectFilter config
Date: Thu, 05 Mar 2026 11:01:43 -0800	[thread overview]
Message-ID: <xmqqcy1i3xt4.fsf@gitster.g> (raw)
In-Reply-To: <pull.2058.v2.git.1772672251281.gitgitgadget@gmail.com> (Alan Braithwaite via GitGitGadget's message of "Thu, 05 Mar 2026 00:57:31 +0000")

"Alan Braithwaite via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Alan Braithwaite <alan@braithwaite.dev>
>
> Add a new configuration option that lets users specify a default
> partial clone filter per URL pattern.  When cloning a repository
> whose URL matches a configured pattern, git-clone automatically
> applies the filter, equivalent to passing --filter on the command
> line.
>
>     [clone "https://github.com/"]
>         defaultObjectFilter = blob:limit=5m
>
>     [clone "https://internal.corp.com/large-project/"]
>         defaultObjectFilter = blob:none
>
> URL matching uses the existing urlmatch_config_entry() infrastructure,
> following the same rules as http.<url>.* — you can match a domain,
> a namespace path, or a specific project, and the most specific match
> wins.
>
> The config only affects the initial clone.  Once the clone completes,
> the filter is recorded in remote.<name>.partialCloneFilter, so
> subsequent fetches inherit it automatically.  An explicit --filter
> flag on the command line takes precedence.

The motivation behind the change is clearly described.  Reusing the
existing urlmatch_config_entry() infrastructure is very appropriate
as it makes the feature intuitive for those familiar with
http.<url>.* settings.

> Only the URL-qualified form (clone.<url>.defaultObjectFilter) is
> honored; a bare clone.defaultObjectFilter without a URL subsection
> is ignored.

This is unlike how http.<url>.<var> configuration variables work,
and while I can see that server operators may not want to see users
set clone.defaultObjectFilter and affect traffic with _all_ sites, I
am afraid that this design choice may appear a bit counter-intuitive
to end users.


> Signed-off-by: Alan Braithwaite <alan@braithwaite.dev>

>  Documentation/config/clone.adoc | 26 ++++++++++++
>  builtin/clone.c                 | 68 ++++++++++++++++++++++++++++++
>  t/t5616-partial-clone.sh        | 73 +++++++++++++++++++++++++++++++++
>  3 files changed, 167 insertions(+)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 45d8fa0eed..5e20b5343d 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -44,6 +44,7 @@
>  #include "path.h"
>  #include "pkt-line.h"
>  #include "list-objects-filter-options.h"
> +#include "urlmatch.h"
>  #include "hook.h"
>  #include "bundle.h"
>  #include "bundle-uri.h"
> @@ -757,6 +758,65 @@ static int git_clone_config(const char *k, const char *v,
>  	return git_default_config(k, v, ctx, cb);
>  }
>  
> +struct clone_filter_data {
> +	char *default_object_filter;
> +};
> +
> +static int clone_filter_collect(const char *var, const char *value,
> +				const struct config_context *ctx UNUSED,
> +				void *cb)
> +{
> +	struct clone_filter_data *data = cb;
> +
> +	if (!strcmp(var, "clone.defaultobjectfilter")) {
> +		free(data->default_object_filter);
> +		data->default_object_filter = xstrdup(value);
> +	}
> +	return 0;
> +}

This will segfault with a "value-less truth", i.e.,

	[clone "<URL>"]
		defaultObjectFilter

so there should be 

		if (!value)
			return config_error_nonbool(var);

in it.

I cannot convince myself that a new structure only to hold a single
"char *" member is not over-engineering.  Wouldn't it work equally
well (unless you have an immediate plan to add more members to the
struct, that is):

	char **filter_spec_p = cb;

	if (!strcmp(var, "clone.defaultobjectfilter")) {
		if (!value)
			retgurn config_error_nonbool(var);
		free(*filter_spec_p);
		*filter_spec_p = xstrdup(value);
	}
	return 0;

> +/*
> + * Look up clone.<url>.defaultObjectFilter using the urlmatch
> + * infrastructure.  Only URL-qualified forms are supported; a bare
> + * clone.defaultObjectFilter (without a URL) is ignored.
> + */
> +static char *get_default_object_filter(const char *url)
> +{
> +	struct urlmatch_config config = URLMATCH_CONFIG_INIT;
> +	struct clone_filter_data data = { 0 };
> +	struct string_list_item *item;
> +	char *normalized_url;
> +
> +	config.section = "clone";
> +	config.key = "defaultobjectfilter";
> +	config.collect_fn = clone_filter_collect;
> +	config.cascade_fn = git_clone_config;
> +	config.cb = &data;
> +
> +	normalized_url = url_normalize(url, &config.url);
> +
> +	repo_config(the_repository, urlmatch_config_entry, &config);
> +	free(normalized_url);

This forces a second full scan of the configuration space.  But it
cannot be avoided, because the existing repo_config() call has to
happen early before we call parse_options() to give us the
configured default to overwrite with the command line, and we would
not know what our URL is before we called parse_options().

However, I thihk you want to leave the .cascade_fn NULL; you do not
want urlmatch_config_entry() to call git_clone_config() AGAIN on the
configuration variables, as the first call to repo_config() before
we call parse_options() should have already handled them, no?

Thanks.

  reply	other threads:[~2026-03-05 19:01 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-01 16:44 [PATCH] fetch, clone: add fetch.blobSizeLimit config Alan Braithwaite via GitGitGadget
2026-03-02 11:53 ` Patrick Steinhardt
2026-03-02 18:28   ` Jeff King
2026-03-02 18:57   ` Junio C Hamano
2026-03-02 21:36     ` Alan Braithwaite
2026-03-03  6:30       ` Patrick Steinhardt
2026-03-03 14:00         ` Alan Braithwaite
2026-03-03 15:08           ` Patrick Steinhardt
2026-03-03 17:58             ` Junio C Hamano
2026-03-04  5:07               ` Patrick Steinhardt
2026-03-03 17:05         ` Junio C Hamano
2026-03-03 14:34       ` Jeff King
2026-03-05  0:57 ` [PATCH v2] clone: add clone.<url>.defaultObjectFilter config Alan Braithwaite via GitGitGadget
2026-03-05 19:01   ` Junio C Hamano [this message]
2026-03-05 23:11     ` Alan Braithwaite
2026-03-06  6:55   ` [PATCH v3] " Alan Braithwaite via GitGitGadget
2026-03-06 10:39     ` brian m. carlson
2026-03-06 19:33       ` Junio C Hamano
2026-03-06 21:50         ` Alan Braithwaite
2026-03-06 21:47     ` [PATCH v4] " Alan Braithwaite via GitGitGadget
2026-03-06 22:18       ` Junio C Hamano
2026-03-07  1:04         ` Alan Braithwaite
2026-03-07  1:33       ` [PATCH v5] " Alan Braithwaite via GitGitGadget
2026-03-11  7:44         ` Patrick Steinhardt
2026-03-15  1:33           ` Alan Braithwaite
2026-03-15  5:37         ` [PATCH v6] " Alan Braithwaite via GitGitGadget
2026-03-15 21:32           ` Junio C Hamano
2026-03-16  7:47           ` Patrick Steinhardt
2026-05-11  2:38             ` Junio C Hamano
2026-05-11  7:30               ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqcy1i3xt4.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=alan@braithwaite.dev \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.