git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Christian Couder <christian.couder@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Taylor Blau <me@ttaylorr.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Karthik Nayak <karthik.188@gmail.com>,
	Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>,
	"brian m . carlson" <sandals@crustytoothpaste.net>,
	"Randall S . Becker" <rsbecker@nexbridge.com>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH v4 4/6] Add 'promisor-remote' capability to protocol v2
Date: Thu, 30 Jan 2025 11:51:07 +0100	[thread overview]
Message-ID: <Z5tZm2dNCPt6Y37D@pks.im> (raw)
In-Reply-To: <20250127151701.2321341-5-christian.couder@gmail.com>

On Mon, Jan 27, 2025 at 04:16:59PM +0100, Christian Couder wrote:
> When a server S knows that some objects from a repository are available
> from a promisor remote X, S might want to suggest to a client C cloning
> or fetching the repo from S that C may use X directly instead of S for
> these objects.

A lot of the commit message seems to be duplicated with the technical
documentation that you add. I wonder whether it would make sense to
simply refer to that instead of repeating all of it? That would make it
easier to spot the actually-important bits in the commit message that
add context to the patch.

One very important bit of context that I was lacking is what exactly we
wire up and where we do so. I have been searching for longer than I want
to admit where the client ends up using the promisor remotes, until I
eventually figured out that the client-side isn't wired up at all. It
makes sense in retrospect, but it would've been nice if the reader was
guided a bit.

> diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
> index 1652fef3ae..f25a9a6ad8 100644
> --- a/Documentation/gitprotocol-v2.txt
> +++ b/Documentation/gitprotocol-v2.txt
> @@ -781,6 +781,60 @@ retrieving the header from a bundle at the indicated URI, and thus
>  save themselves and the server(s) the request(s) needed to inspect the
>  headers of that bundle or bundles.
>  
> +promisor-remote=<pr-infos>
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The server may advertise some promisor remotes it is using or knows
> +about to a client which may want to use them as its promisor remotes,
> +instead of this repository. In this case <pr-infos> should be of the
> +form:
> +
> +	pr-infos = pr-info | pr-infos ";" pr-info
> +
> +	pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> +
> +where `pr-name` is the urlencoded name of a promisor remote, and
> +`pr-url` the urlencoded URL of that promisor remote.
> +
> +In this case, if the client decides to use one or more promisor
> +remotes the server advertised, it can reply with
> +"promisor-remote=<pr-names>" where <pr-names> should be of the form:
> +
> +	pr-names = pr-name | pr-names ";" pr-name
> +
> +where `pr-name` is the urlencoded name of a promisor remote the server
> +advertised and the client accepts.
> +
> +Note that, everywhere in this document, `pr-name` MUST be a valid
> +remote name, and the ';' and ',' characters MUST be encoded if they
> +appear in `pr-name` or `pr-url`.
> +
> +If the server doesn't know any promisor remote that could be good for
> +a client to use, or prefers a client not to use any promisor remote it
> +uses or knows about, it shouldn't advertise the "promisor-remote"
> +capability at all.
> +
> +In this case, or if the client doesn't want to use any promisor remote
> +the server advertised, the client shouldn't advertise the
> +"promisor-remote" capability at all in its reply.
> +
> +The "promisor.advertise" and "promisor.acceptFromServer" configuration
> +options can be used on the server and client side respectively to

s/respectively//, as you already say that in the next line.

> +control what they advertise or accept respectively. See the
> +documentation of these configuration options for more information.
> +
> +Note that in the future it would be nice if the "promisor-remote"
> +protocol capability could be used by the server, when responding to
> +`git fetch` or `git clone`, to advertise better-connected remotes that
> +the client can use as promisor remotes, instead of this repository, so
> +that the client can lazily fetch objects from these other
> +better-connected remotes. This would require the server to omit in its
> +response the objects available on the better-connected remotes that
> +the client has accepted. This hasn't been implemented yet though. So
> +for now this "promisor-remote" capability is useful only when the
> +server advertises some promisor remotes it already uses to borrow
> +objects from.

I'd leave away this bit as it doesn't really add a lot to the document.
It's a possibility for the future, but without it being implemented
anywhere it's not that helpful from my point of view.

> diff --git a/promisor-remote.c b/promisor-remote.c
> index c714f4f007..5ac282ed27 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -292,3 +306,185 @@ void promisor_remote_get_direct(struct repository *repo,
>  	if (to_free)
>  		free(remaining_oids);
>  }
> +
> +static int allow_unsanitized(char ch)
> +{
> +	if (ch == ',' || ch == ';' || ch == '%')
> +		return 0;
> +	return ch > 32 && ch < 127;
> +}

Isn't this too lenient? It would allow also allow e.g. '=' and all kinds
of other characters. This does make sense for URLs, but it doesn't make
sense for remote names as they aren't supposed to contain punctuation in
the first place. So for these remote names I'd think we should be way
stricter and return an error in case they contain non-alphanumeric data.

> +static void promisor_info_vecs(struct repository *repo,
> +			       struct strvec *names,
> +			       struct strvec *urls)

I wonder whether it would make more sense to track these as a strmap
instead of two arrays which are expected to have related entries in the
same place.

> +{
> +	struct promisor_remote *r;
> +
> +	promisor_remote_init(repo);
> +
> +	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> +		char *url;
> +		char *url_key = xstrfmt("remote.%s.url", r->name);
> +
> +		strvec_push(names, r->name);
> +		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
> +
> +		free(url);
> +		free(url_key);
> +	}
> +}
> +
> +char *promisor_remote_info(struct repository *repo)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	int advertise_promisors = 0;
> +	struct strvec names = STRVEC_INIT;
> +	struct strvec urls = STRVEC_INIT;
> +
> +	git_config_get_bool("promisor.advertise", &advertise_promisors);
> +
> +	if (!advertise_promisors)
> +		return NULL;
> +
> +	promisor_info_vecs(repo, &names, &urls);
> +
> +	if (!names.nr)
> +		return NULL;
> +
> +	for (size_t i = 0; i < names.nr; i++) {
> +		if (i)
> +			strbuf_addch(&sb, ';');
> +		strbuf_addstr(&sb, "name=");
> +		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
> +		if (urls.v[i]) {
> +			strbuf_addstr(&sb, ",url=");
> +			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
> +		}
> +	}
> +
> +	redact_non_printables(&sb);

So here we replace non-printable characters with dots as far as I
understand. But didn't we just URL-encode the strings? So is there ever
a possibility for non-printable characters here?

> +	strvec_clear(&names);
> +	strvec_clear(&urls);
> +
> +	return strbuf_detach(&sb, NULL);
> +}
> +
> +enum accept_promisor {
> +	ACCEPT_NONE = 0,
> +	ACCEPT_ALL
> +};
> +
> +static int should_accept_remote(enum accept_promisor accept,
> +				const char *remote_name UNUSED,
> +				const char *remote_url UNUSED)
> +{
> +	if (accept == ACCEPT_ALL)
> +		return 1;
> +
> +	BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
> +}
> +
> +static void filter_promisor_remote(struct strvec *accepted, const char *info)
> +{
> +	struct strbuf **remotes;
> +	const char *accept_str;
> +	enum accept_promisor accept = ACCEPT_NONE;
> +
> +	if (!git_config_get_string_tmp("promisor.acceptfromserver", &accept_str)) {
> +		if (!accept_str || !*accept_str || !strcasecmp("None", accept_str))
> +			accept = ACCEPT_NONE;
> +		else if (!strcasecmp("All", accept_str))
> +			accept = ACCEPT_ALL;
> +		else
> +			warning(_("unknown '%s' value for '%s' config option"),
> +				accept_str, "promisor.acceptfromserver");
> +	}
> +
> +	if (accept == ACCEPT_NONE)
> +		return;
> +
> +	/* Parse remote info received */
> +
> +	remotes = strbuf_split_str(info, ';', 0);
> +
> +	for (size_t i = 0; remotes[i]; i++) {
> +		struct strbuf **elems;
> +		const char *remote_name = NULL;
> +		const char *remote_url = NULL;
> +		char *decoded_name = NULL;
> +		char *decoded_url = NULL;
> +
> +		strbuf_strip_suffix(remotes[i], ";");
> +		elems = strbuf_split(remotes[i], ',');
> +
> +		for (size_t j = 0; elems[j]; j++) {
> +			int res;
> +			strbuf_strip_suffix(elems[j], ",");
> +			res = skip_prefix(elems[j]->buf, "name=", &remote_name) ||
> +				skip_prefix(elems[j]->buf, "url=", &remote_url);
> +			if (!res)
> +				warning(_("unknown element '%s' from remote info"),
> +					elems[j]->buf);
> +		}
> +
> +		if (remote_name)
> +			decoded_name = url_percent_decode(remote_name);
> +		if (remote_url)
> +			decoded_url = url_percent_decode(remote_url);

This is data we have received from a potentially-untrusted remote, so we
should double-check that the data we have received doesn't contain any
weird characters:

  - For the remote name we should verify that it consists only of
    alphanumeric characters.

  - For the remote URL we need to verify that it's a proper URL without
    any newlines, non-printable characters or anything else.

We'll eventually end up storing that data in the configuration, so these
verifications are quite important so that an adversarial server cannot
perform config-injection and thus cause remote code execution.

[snip]
> +void mark_promisor_remotes_as_accepted(struct repository *r, const char *remotes)
> +{
> +	struct strbuf **accepted_remotes = strbuf_split_str(remotes, ';', 0);
> +
> +	for (size_t i = 0; accepted_remotes[i]; i++) {
> +		struct promisor_remote *p;
> +		char *decoded_remote;
> +
> +		strbuf_strip_suffix(accepted_remotes[i], ";");
> +		decoded_remote = url_percent_decode(accepted_remotes[i]->buf);
> +
> +		p = repo_promisor_remote_find(r, decoded_remote);
> +		if (p)
> +			p->accepted = 1;
> +		else
> +			warning(_("accepted promisor remote '%s' not found"),
> +				decoded_remote);

My initial understanding of this code was that it is about the
client-side accepting a remote, but this is about the server-side and
tracks whether a promisor remote has been accepted by the client. It
feels a bit weird to modify semi-global state for this, as I'd have
rather expected that we pass around a vector of accepted remotes
instead.

But I guess ultimately this isn't too bad. It would be nice though if
it was more obvious whether we're on the server- or client-side.

> diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
> new file mode 100755
> index 0000000000..0390c1dbad
> --- /dev/null
> +++ b/t/t5710-promisor-remote-capability.sh
> @@ -0,0 +1,244 @@
[snip]
> +initialize_server () {
> +	count="$1"
> +	missing_oids="$2"
> +
> +	# Repack everything first
> +	git -C server -c repack.writebitmaps=false repack -a -d &&
> +
> +	# Remove promisor file in case they exist, useful when reinitializing
> +	rm -rf server/objects/pack/*.promisor &&
> +
> +	# Repack without the largest object and create a promisor pack on server
> +	git -C server -c repack.writebitmaps=false repack -a -d \
> +	    --filter=blob:limit=5k --filter-to="$(pwd)/pack" &&
> +	promisor_file=$(ls server/objects/pack/*.pack | sed "s/\.pack/.promisor/") &&
> +	>"$promisor_file" &&
> +
> +	# Check objects missing on the server
> +	check_missing_objects server "$count" "$missing_oids"
> +}
> +
> +copy_to_server2 () {

Nit: `server2` could be renamed to `promisor` to make the relation
between the two servers more obvious.

> diff --git a/upload-pack.c b/upload-pack.c
> index 728b2477fc..7498b45e2e 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -319,6 +320,8 @@ static void create_pack_file(struct upload_pack_data *pack_data,
>  		strvec_push(&pack_objects.args, "--delta-base-offset");
>  	if (pack_data->use_include_tag)
>  		strvec_push(&pack_objects.args, "--include-tag");
> +	if (repo_has_accepted_promisor_remote(the_repository))
> +		strvec_push(&pack_objects.args, "--missing=allow-promisor");

This is nice and simple, I like it.

Patrick

  reply	other threads:[~2025-01-30 10:51 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-31 13:40 [PATCH 0/4] Introduce a "promisor-remote" capability Christian Couder
2024-07-31 13:40 ` [PATCH 1/4] version: refactor strbuf_sanitize() Christian Couder
2024-07-31 17:18   ` Junio C Hamano
2024-08-20 11:29     ` Christian Couder
2024-07-31 13:40 ` [PATCH 2/4] strbuf: refactor strbuf_trim_trailing_ch() Christian Couder
2024-07-31 17:29   ` Junio C Hamano
2024-07-31 21:49     ` Taylor Blau
2024-08-20 11:29       ` Christian Couder
2024-08-20 11:29     ` Christian Couder
2024-07-31 13:40 ` [PATCH 3/4] Add 'promisor-remote' capability to protocol v2 Christian Couder
2024-07-31 15:40   ` Taylor Blau
2024-08-20 11:32     ` Christian Couder
2024-08-20 17:01       ` Junio C Hamano
2024-09-10 16:32         ` Christian Couder
2024-07-31 16:16   ` Taylor Blau
2024-08-20 11:32     ` Christian Couder
2024-08-20 16:55       ` Junio C Hamano
2024-09-10 16:32       ` Christian Couder
2024-09-10 17:46         ` Junio C Hamano
2024-07-31 18:25   ` Junio C Hamano
2024-07-31 19:34     ` Junio C Hamano
2024-08-20 12:21     ` Christian Couder
2024-08-05 13:48   ` Patrick Steinhardt
2024-08-19 20:00     ` Junio C Hamano
2024-09-10 16:31     ` Christian Couder
2024-07-31 13:40 ` [PATCH 4/4] promisor-remote: check advertised name or URL Christian Couder
2024-07-31 18:35   ` Junio C Hamano
2024-09-10 16:32     ` Christian Couder
2024-07-31 16:01 ` [PATCH 0/4] Introduce a "promisor-remote" capability Junio C Hamano
2024-07-31 16:17 ` Taylor Blau
2024-09-10 16:29 ` [PATCH v2 " Christian Couder
2024-09-10 16:29   ` [PATCH v2 1/4] version: refactor strbuf_sanitize() Christian Couder
2024-09-10 16:29   ` [PATCH v2 2/4] strbuf: refactor strbuf_trim_trailing_ch() Christian Couder
2024-09-10 16:29   ` [PATCH v2 3/4] Add 'promisor-remote' capability to protocol v2 Christian Couder
2024-09-30  7:56     ` Patrick Steinhardt
2024-09-30 13:28       ` Christian Couder
2024-10-01 10:14         ` Patrick Steinhardt
2024-10-01 18:47           ` Junio C Hamano
2024-11-06 14:04     ` Patrick Steinhardt
2024-11-28  5:47     ` Junio C Hamano
2024-11-28 15:31       ` Christian Couder
2024-11-29  1:31         ` Junio C Hamano
2024-09-10 16:30   ` [PATCH v2 4/4] promisor-remote: check advertised name or URL Christian Couder
2024-09-30  7:57     ` Patrick Steinhardt
2024-09-26 18:09   ` [PATCH v2 0/4] Introduce a "promisor-remote" capability Junio C Hamano
2024-09-27  9:15     ` Christian Couder
2024-09-27 22:48       ` Junio C Hamano
2024-09-27 23:31         ` rsbecker
2024-09-28 10:56           ` Kristoffer Haugsbakk
2024-09-30  7:57         ` Patrick Steinhardt
2024-09-30  9:17           ` Christian Couder
2024-09-30 16:52             ` Junio C Hamano
2024-10-01 10:14             ` Patrick Steinhardt
2024-09-30 16:34           ` Junio C Hamano
2024-09-30 21:26           ` brian m. carlson
2024-09-30 22:27             ` Junio C Hamano
2024-10-01 10:13               ` Patrick Steinhardt
2024-12-06 12:42   ` [PATCH v3 0/5] " Christian Couder
2024-12-06 12:42     ` [PATCH v3 1/5] version: refactor strbuf_sanitize() Christian Couder
2024-12-07  6:21       ` Junio C Hamano
2025-01-27 15:07         ` Christian Couder
2024-12-06 12:42     ` [PATCH v3 2/5] strbuf: refactor strbuf_trim_trailing_ch() Christian Couder
2024-12-07  6:35       ` Junio C Hamano
2025-01-27 15:07         ` Christian Couder
2024-12-16 11:47       ` karthik nayak
2024-12-06 12:42     ` [PATCH v3 3/5] Add 'promisor-remote' capability to protocol v2 Christian Couder
2024-12-07  7:59       ` Junio C Hamano
2025-01-27 15:08         ` Christian Couder
2024-12-06 12:42     ` [PATCH v3 4/5] promisor-remote: check advertised name or URL Christian Couder
2024-12-06 12:42     ` [PATCH v3 5/5] doc: add technical design doc for large object promisors Christian Couder
2024-12-10  1:28       ` Junio C Hamano
2025-01-27 15:12         ` Christian Couder
2024-12-10 11:43       ` Junio C Hamano
2024-12-16  9:00         ` Patrick Steinhardt
2025-01-27 15:11         ` Christian Couder
2025-01-27 18:02           ` Junio C Hamano
2025-02-18 11:42             ` Christian Couder
2024-12-09  8:04     ` [PATCH v3 0/5] Introduce a "promisor-remote" capability Junio C Hamano
2024-12-09 10:40       ` Christian Couder
2024-12-09 10:42         ` Christian Couder
2024-12-09 23:01         ` Junio C Hamano
2025-01-27 15:05           ` Christian Couder
2025-01-27 19:38             ` Junio C Hamano
2025-01-27 15:16     ` [PATCH v4 0/6] " Christian Couder
2025-01-27 15:16       ` [PATCH v4 1/6] version: replace manual ASCII checks with isprint() for clarity Christian Couder
2025-01-27 15:16       ` [PATCH v4 2/6] version: refactor redact_non_printables() Christian Couder
2025-01-27 15:16       ` [PATCH v4 3/6] version: make redact_non_printables() non-static Christian Couder
2025-01-30 10:51         ` Patrick Steinhardt
2025-02-18 11:42           ` Christian Couder
2025-01-27 15:16       ` [PATCH v4 4/6] Add 'promisor-remote' capability to protocol v2 Christian Couder
2025-01-30 10:51         ` Patrick Steinhardt [this message]
2025-02-18 11:41           ` Christian Couder
2025-01-27 15:17       ` [PATCH v4 5/6] promisor-remote: check advertised name or URL Christian Couder
2025-01-27 23:48         ` Junio C Hamano
2025-01-28  0:01           ` Junio C Hamano
2025-01-30 10:51           ` Patrick Steinhardt
2025-02-18 11:41             ` Christian Couder
2025-02-18 11:42           ` Christian Couder
2025-01-27 15:17       ` [PATCH v4 6/6] doc: add technical design doc for large object promisors Christian Couder
2025-01-27 21:14       ` [PATCH v4 0/6] Introduce a "promisor-remote" capability Junio C Hamano
2025-02-18 11:40         ` Christian Couder
2025-02-18 11:32       ` [PATCH v5 0/3] " Christian Couder
2025-02-18 11:32         ` [PATCH v5 1/3] Add 'promisor-remote' capability to protocol v2 Christian Couder
2025-02-18 11:32         ` [PATCH v5 2/3] promisor-remote: check advertised name or URL Christian Couder
2025-02-18 11:32         ` [PATCH v5 3/3] doc: add technical design doc for large object promisors Christian Couder
2025-02-21  8:33           ` Patrick Steinhardt
2025-03-03 16:58             ` Junio C Hamano
2025-02-18 19:07         ` [PATCH v5 0/3] Introduce a "promisor-remote" capability Junio C Hamano
2025-02-21  8:34         ` Patrick Steinhardt
2025-02-21 18:40           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5tZm2dNCPt6Y37D@pks.im \
    --to=ps@pks.im \
    --cc=chriscool@tuxfamily.org \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=karthik.188@gmail.com \
    --cc=kristofferhaugsbakk@fastmail.com \
    --cc=me@ttaylorr.com \
    --cc=rsbecker@nexbridge.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).