git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: gitgitgadget@gmail.com, git@vger.kernel.org,
	sunshine@sunshineco.com, ifrade@google.com
Subject: Re: [PATCH v6 1/2] fetch-pack: redact packfile urls in traces
Date: Tue, 09 Nov 2021 02:36:58 +0100	[thread overview]
Message-ID: <211109.86mtmedrhr.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <20211108230111.1101434-1-jonathantanmy@google.com>


On Mon, Nov 08 2021, Jonathan Tan wrote:

>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index a9604f35a3e..62ea90541c5 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -1653,8 +1653,12 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>>  				receive_wanted_refs(&reader, sought, nr_sought);
>>  
>>  			/* get the pack(s) */
>> +			if (git_env_bool("GIT_TRACE_REDACT", 1))
>> +				reader.options |= PACKET_READ_REDACT_URI_PATH;
>>  			if (process_section_header(&reader, "packfile-uris", 1))
>>  				receive_packfile_uris(&reader, &packfile_uris);
>> +			reader.options &= ~PACKET_READ_REDACT_URI_PATH;
>
> Probably worth commenting why you're resetting the flag (avoid the
> relatively expensive URI check when we don't need it).

...yeah...

>> diff --git a/pkt-line.c b/pkt-line.c
>> index 2dc8ac274bd..5a69ddc2e77 100644
>> --- a/pkt-line.c
>> +++ b/pkt-line.c
>> @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4])
>>  	return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2);
>>  }
>>  
>> +static char *find_packfile_uri_path(const char *buffer)
>> +{
>> +	const char *URI_MARK = "://";
>> +	char *path;
>> +	int len;
>> +
>> +	/* First char is sideband mark */
>> +	buffer += 1;
>> +
>> +	len = strspn(buffer, "0123456789abcdefABCDEF");
>> +	if (len != (int)the_hash_algo->hexsz || buffer[len] != ' ')
>> +		return NULL; /* required "<hash>SP" not seen */
>
> Optional: As I said in my reply (just sent out), checking for both SHA-1
> and SHA-256 lengths is reasonable too.
>
> [1] https://lore.kernel.org/git/20211108224335.569596-1-jonathantanmy@google.com/

Correct me if I'm wrong, but I find it really strange that we're trying
to parse things in pkt-line.c like this.

We'll only get these from a client in code that's invoked in
fetch-pack.c, specifically the process_section_header() quoted above,
no?

From there we'll call packet_reader_read(), which will call
packet_read_with_status(), and from there we'll call packet_trace().

Then right after all this happens we've got a loop that parses out these
packfile URIs, including this being-done-first-here parsing of the hex
value just for logging, except in pkt-line.c we've lost the information
about what hash algorithm length we should be using, which fetch-pack.c
of course needs to know.

Why can't that process_section_header() in fetch-pack.c just be made to
call some pkt-line.c API saying "don't log yet", i.e. something like
this pseudocode:

diff --git a/fetch-pack.c b/fetch-pack.c
index a9604f35a3e..31f5ee7fc6b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1518,14 +1518,18 @@ static void receive_wanted_refs(struct packet_reader *reader,
 static void receive_packfile_uris(struct packet_reader *reader,
                                  struct string_list *uris)
 {
+       struct string_list log = STRING_LIST_INIT_DUP;
+
        process_section_header(reader, "packfile-uris", 0);
-       while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
+       while (packet_reader_read_log_to(reader, &log) == PACKET_READ_NORMAL) {
                if (reader->pktlen < the_hash_algo->hexsz ||
                    reader->line[the_hash_algo->hexsz] != ' ')
                        die("expected '<hash> <uri>', got: %s\n", reader->line);
 
+               /* move the parsing of the URLs here */
                string_list_append(uris, reader->line);
        }
+       log_stuff(&log);
        if (reader->status != PACKET_READ_DELIM)
                die("expected DELIM");
 }

I.e. we'll eventually call trace_strbuf() in pkt-line.c, and we know
that we're doing these packfile-uris, and we know that we're just about
to parse them. Let's just:

 1. Start reading the section
 2. Turn off tracing
 3. Parse the URIs as we go
 3. When done (or on the fly), scrub URIs, log any backlog suppressed trace, and turn on tracing again

Instead of:

 1. Set a flag to scrub stuff
 2. Because of the disconnect between fetch-pack.c and pkt-line.c,
    effectively implement a new parser for data we're already going to be
    parsing some microseconds later during the course of the request.

That "turn off the trace" could be passing down a string_list/strbuf, or
even doing the same via a nev member in "struct packet_reader", both
would be simpler than needing to re-do the parse. Probably simplest is just:

    struct string_list log = STRING_LIST_INIT_DUP;

    reader.deferred_trace = &log;
    /* packet_reader_read() etc. code, unchanged from now */
    /* parse URIs (just move the existing code around a bit) */
    packet_reader.deferred_trace = NULL;
    for_each...(item, &log)
        trace_strbuf(...);

  reply	other threads:[~2021-11-09  1:53 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-08 16:03 [PATCH 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-08 16:03 ` [PATCH 1/2] " Ivan Frade via GitGitGadget
2021-10-08 19:36   ` Ævar Arnfjörð Bjarmason
2021-10-08 23:15     ` Ivan Frade
2021-10-08 16:03 ` [PATCH 2/2] Documentation: packfile-uri hash can be longer than 40 hex chars Ivan Frade via GitGitGadget
2021-10-08 19:43   ` Ævar Arnfjörð Bjarmason
2021-10-09  2:20 ` [PATCH v2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-11 20:39   ` Junio C Hamano
2021-10-26 19:32     ` Ivan Frade
2021-10-19 22:57   ` [PATCH v3] " Ivan Frade via GitGitGadget
2021-10-20 11:41     ` Ævar Arnfjörð Bjarmason
2021-10-26 22:49     ` [PATCH v4 0/2] " Ivan Frade via GitGitGadget
2021-10-26 22:49       ` [PATCH v4 1/2] " Ivan Frade via GitGitGadget
2021-10-28  1:01         ` Junio C Hamano
2021-10-28 22:15           ` Ivan Frade
2021-10-28 22:46             ` Junio C Hamano
2021-10-26 22:49       ` [PATCH v4 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-10-28 16:39         ` Ævar Arnfjörð Bjarmason
2021-10-28 17:25           ` Eric Sunshine
2021-10-28 22:44             ` Ivan Frade
2021-10-28 22:41           ` Ivan Frade
2021-10-29 23:18           ` Junio C Hamano
2021-11-09  1:54             ` Ævar Arnfjörð Bjarmason
2021-10-28 22:51       ` [PATCH v5 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-28 22:51         ` [PATCH v5 1/2] " Ivan Frade via GitGitGadget
2021-10-28 23:21           ` Junio C Hamano
2021-10-29 18:42             ` Ivan Frade
2021-10-29 19:59               ` Junio C Hamano
2021-11-08 22:43                 ` Jonathan Tan
2021-10-28 22:51         ` [PATCH v5 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-10-29 18:42         ` [PATCH v6 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-10-29 18:42           ` [PATCH v6 1/2] " Ivan Frade via GitGitGadget
2021-11-08 23:01             ` Jonathan Tan
2021-11-09  1:36               ` Ævar Arnfjörð Bjarmason [this message]
2021-11-10 23:44                 ` Ivan Frade
2021-11-11  0:01                   ` Ævar Arnfjörð Bjarmason
2021-11-10 21:18               ` Ivan Frade
2021-10-29 18:42           ` [PATCH v6 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-11-08 23:06             ` Jonathan Tan
2021-11-10 23:51           ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Ivan Frade via GitGitGadget
2021-11-10 23:51             ` [PATCH v7 1/2] " Ivan Frade via GitGitGadget
2021-11-10 23:51             ` [PATCH v7 2/2] http-fetch: redact url on die() message Ivan Frade via GitGitGadget
2021-11-12  4:43             ` [PATCH v7 0/2] fetch-pack: redact packfile urls in traces Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211109.86mtmedrhr.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=ifrade@google.com \
    --cc=jonathantanmy@google.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).