From: Junio C Hamano <gitster@pobox.com>
To: "Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Ivan Frade <ifrade@google.com>,
Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH v4 1/2] fetch-pack: redact packfile urls in traces
Date: Wed, 27 Oct 2021 18:01:48 -0700 [thread overview]
Message-ID: <xmqq35omt11f.fsf@gitster.g> (raw)
In-Reply-To: <973a250752c39c3fe835d69f3fbe8f009fc4fa74.1635288599.git.gitgitgadget@gmail.com> (Ivan Frade via GitGitGadget's message of "Tue, 26 Oct 2021 22:49:58 +0000")
"Ivan Frade via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Ivan Frade <ifrade@google.com>
>
> In some setups, packfile uris act as bearer token. It is not
> recommended to expose them plainly in logs, although in special
> circunstances (e.g. debug) it makes sense to write them.
>
> Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT
> variable is set to false. This mimics the redacting of the Authorization
> header in HTTP.
>
> Signed-off-by: Ivan Frade <ifrade@google.com>
> ---
> Documentation/git.txt | 5 +++--
> fetch-pack.c | 3 +++
> pkt-line.c | 40 ++++++++++++++++++++++++++++++++-
> pkt-line.h | 1 +
> t/t5702-protocol-v2.sh | 51 ++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 97 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index d63c65e67d8..f64c8ce5183 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -832,8 +832,9 @@ for full details.
>
> `GIT_TRACE_REDACT`::
> By default, when tracing is activated, Git redacts the values of
> - cookies, the "Authorization:" header, and the "Proxy-Authorization:"
> - header. Set this variable to `0` to prevent this redaction.
> + cookies, the "Authorization:" header, the "Proxy-Authorization:"
> + header and packfile URLs. Set this variable to `0` to prevent this
> + redaction.
Just a curiosity. Do we call these packfile URI, or packfile URL?
> diff --git a/pkt-line.c b/pkt-line.c
> index 2dc8ac274bd..ba0a2d65f0c 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4])
> return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2);
> }
>
> +static char *find_url_path(const char* buffer, int *path_len)
> +{
> + const char *URL_MARK = "://";
> + char *path = strstr(buffer, URL_MARK);
> + if (!path)
> + return NULL;
Hmph, the format we expect is "<hash> <uri>"; don't we need to
validate the leading <hash> followed by SP?
len = strspn(buffer, "0123456789abcdefABCDEF");
if (len != 40 || len != 64 || buffer[len] != ' ')
return NULL; /* required "<hash> SP" not seen */
path = strstr(buffer + len + 1, URL_MARK);
or somesuch?
> + path += strlen(URL_MARK);
OK.
> + while (*path && *path != '/')
> + path++;
strchr()?
> + if (!*path || !*(path + 1))
> + return NULL;
OK.
> + // position after '/'
No // comments in our codebase, please. Unless it is a borrowed
code, that is.
> + path++;
> +
> + if (path_len) {
> + char *url_end = strchrnul(path, ' ');
Is this because SP is not a valid character in packfile URI, or at
this point in the callchain it would be encoded or something? The
format we expect is "<hash> <uri>", so we shouldn't even have to
look for SP but just redact everything to the end, no?
Apparently we are assuming that there won't be more than one such
URL-path that needs redacting in the packet, but that is perfectly
fine, as the sole goal of this helper is to identify the packfile
URI packet and redact it in the log.
> + *path_len = url_end - path;
> + }
> +
> + return path;
> +}
> - packet_trace(buffer, len, 0);
> + if (options & PACKET_READ_REDACT_URL_PATH &&
> + (url_path_start = find_url_path(buffer, &url_path_len))) {
> + const char *redacted = "<redacted>";
> + struct strbuf tracebuf = STRBUF_INIT;
> + strbuf_insert(&tracebuf, 0, buffer, len);
> + strbuf_splice(&tracebuf, url_path_start - buffer,
> + url_path_len, redacted, strlen(redacted));
> + packet_trace(tracebuf.buf, tracebuf.len, 0);
> + strbuf_release(&tracebuf);
I briefly wondered if the repeated allocation (and more
fundamentally, preparing the redacted copy of packet whether we are
actually tracing the packet in the first place) is blindly wasting
the resources too much, but this only happens in the protocol header
part, so it might be OK.
Even if that is not the case, we should be able to update
fetch_pack.c::do_fetch_pack_v2() so that the REDACT_URL_PATH bit is
turned on in a much narrower region of code, right? Enable when we
enter the GET_PACK state and drop the bit when we are done with the
packfile URI packets, or something?
Thanks for working on this.
> + } else {
> + packet_trace(buffer, len, 0);
> + }
next prev parent reply other threads:[~2021-10-28 1:01 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-08 16:03 [PATCH 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-08 16:03 ` [PATCH 1/2] " Ivan Frade via GitGitGadget
2021-10-08 19:36 ` Ævar Arnfjörð Bjarmason
2021-10-08 23:15 ` Ivan Frade
2021-10-08 16:03 ` [PATCH 2/2] Documentation: packfile-uri hash can be longer than 40 hex chars Ivan Frade via GitGitGadget
2021-10-08 19:43 ` Ævar Arnfjörð Bjarmason
2021-10-09 2:20 ` [PATCH v2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-11 20:39 ` Junio C Hamano
2021-10-26 19:32 ` Ivan Frade
2021-10-19 22:57 ` [PATCH v3] " Ivan Frade via GitGitGadget
2021-10-20 11:41 ` Ævar Arnfjörð Bjarmason
2021-10-26 22:49 ` [PATCH v4 0/2] " Ivan Frade via GitGitGadget
2021-10-26 22:49 ` [PATCH v4 1/2] " Ivan Frade via GitGitGadget
2021-10-28 1:01 ` Junio C Hamano [this message]
2021-10-28 22:15 ` Ivan Frade
2021-10-28 22:46 ` Junio C Hamano
2021-10-26 22:49 ` [PATCH v4 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-10-28 16:39 ` Ævar Arnfjörð Bjarmason
2021-10-28 17:25 ` Eric Sunshine
2021-10-28 22:44 ` Ivan Frade
2021-10-28 22:41 ` Ivan Frade
2021-10-29 23:18 ` Junio C Hamano
2021-11-09 1:54 ` Ævar Arnfjörð Bjarmason
2021-10-28 22:51 ` [PATCH v5 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-28 22:51 ` [PATCH v5 1/2] " Ivan Frade via GitGitGadget
2021-10-28 23:21 ` Junio C Hamano
2021-10-29 18:42 ` Ivan Frade
2021-10-29 19:59 ` Junio C Hamano
2021-11-08 22:43 ` Jonathan Tan
2021-10-28 22:51 ` [PATCH v5 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-10-29 18:42 ` [PATCH v6 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-29 18:42 ` [PATCH v6 1/2] " Ivan Frade via GitGitGadget
2021-11-08 23:01 ` Jonathan Tan
2021-11-09 1:36 ` Ævar Arnfjörð Bjarmason
2021-11-10 23:44 ` Ivan Frade
2021-11-11 0:01 ` Ævar Arnfjörð Bjarmason
2021-11-10 21:18 ` Ivan Frade
2021-10-29 18:42 ` [PATCH v6 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-11-08 23:06 ` Jonathan Tan
2021-11-10 23:51 ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-11-10 23:51 ` [PATCH v7 1/2] " Ivan Frade via GitGitGadget
2021-11-10 23:51 ` [PATCH v7 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-11-12 4:43 ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq35omt11f.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=ifrade@google.com \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).