All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Daudt <me@ikke.info>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Swift Geek <swiftgeek@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] mailinfo: unescape quoted-pair in header fields
Date: Mon, 19 Sep 2016 12:51:33 +0200	[thread overview]
Message-ID: <20160919105133.GA10901@ikke.info> (raw)
In-Reply-To: <20160916222206.jz2d4gpaxxccia5p@sigill.intra.peff.net>

Thanks for the review

On Fri, Sep 16, 2016 at 03:22:06PM -0700, Jeff King wrote:
> On Fri, Sep 16, 2016 at 11:02:04PM +0200, Kevin Daudt wrote:
> 
> >  mailinfo.c                 | 54 ++++++++++++++++++++++++++++++++++++++++++++++
> >  t/t5100-mailinfo.sh        |  6 ++++++
> >  t/t5100/quoted-pair.expect |  5 +++++
> >  t/t5100/quoted-pair.in     |  9 ++++++++
> >  4 files changed, 74 insertions(+)
> >  create mode 100644 t/t5100/quoted-pair.expect
> >  create mode 100644 t/t5100/quoted-pair.in
> > 
> > diff --git a/mailinfo.c b/mailinfo.c
> > index e19abe3..04036f3 100644
> > --- a/mailinfo.c
> > +++ b/mailinfo.c
> > @@ -54,15 +54,69 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
> >  	get_sane_name(&mi->name, &mi->name, &mi->email);
> >  }
> >  
> > +static int unquote_quoted_string(struct strbuf *line)
> > +{
> > +	struct strbuf outbuf;
> > +	const char *in = line->buf;
> > +	int c, take_next_literally = 0;
> > +	int found_error = 0;
> > +	char escape_context=0;
> 
> Style: whitespace around "=".
> 
> I had to wonder why we needed both escape_context and
> take_next_literally; shouldn't we just need a single state bit. But
> escape_context is not "escape the next character", it is "we are
> currently in a mode where we should be escaping".
> 
> Could we give it a more descriptive name? I guess it is more than just
> "we are in a mode", but rather "here is the character that will end the
> escaped mode". Maybe a comment would be more appropriate.
> 

Yes, your analysis is right, we need to know what character would end
the 'escape context'. I'll add a comment.

> > +	while ((c = *in++) != 0) {
> > +		if (take_next_literally) {
> > +			take_next_literally = 0;
> > +		} else {
> 
> OK, so that means the previous one was backslash-quoted, and we don't do
> any other cleverness. Good.
> 
> > +			switch (c) {
> > +			case '"':
> > +				if (!escape_context)
> > +					escape_context = '"';
> > +				else if (escape_context == '"')
> > +					escape_context = 0;
> > +				continue;
> 
> And here we open or close the quoted portion, depending. Makes sense.
> 
> > +			case '\\':
> > +				if (escape_context) {
> > +					take_next_literally = 1;
> > +					continue;
> > +				}
> > +				break;
> 
> I didn't look in the RFC. Is:
> 
>   From: my \"name\" <foo@example.com>
> 
> really the same as:
> 
>   From: "my \\\"name\\\"" <foo@example.com>
> 
> ? That seems weird, but I think it may be that the former is simply
> bogus (you are not supposed to use backslashes outside of the quoted
> section at all).

Correct, the quoted-pair (escape sequence) can only occur in a quoted
string or a comment. Even more so, the display name *needs* to be quoted
when consisting of more then one word according to the RFC.

> 
> > +			case '(':
> > +				if (!escape_context)
> > +					escape_context = '(';
> > +				else if (escape_context == '(')
> > +					found_error = 1;
> > +				break;
> 
> Hmm. Is:
> 
>   From: Name (Comment with (another comment))
> 
> really disallowed? RFC2822 seems to say that "comment" can contain
> "ccontent", which can itself be a comment.

Yes, you are right, it is allowed, I was just looking at the ctext when
adding this, but failed to see that comments can be nested at that time.

> 
> This is obviously getting pretty silly, but if we are going to follow
> the RFC, I think you actually have to do a recursive parse, and keep
> track of an arbitrary depth of context.
> 
> I dunno. This method probably covers most cases in practice, and it's
> easy to reason about.

The problem is, how do you differentiate between nested comments, and
escaped braces within a comment after one run?
> 
> > +			case ')':
> > +				if (escape_context == '(')
> > +					escape_context = 0;
> > +				break;
> > +			}
> > +		}
> > +
> > +		strbuf_addch(&outbuf, c);
> > +	}
> > +
> > +	strbuf_reset(line);
> > +	strbuf_addbuf(line, &outbuf);
> > +	strbuf_release(&outbuf);
> 
> I think you can use strbuf_swap() here to avoid copying the line an
> extra time, like:
> 
>   strbuf_swap(line, &outbuf);
>   strbuf_release(&outbuf);
> 
> Another option would be to just:
> 
>   in = strbuf_detach(&line);
> 
> at the beginning, and then output back into "line".
> 

Thanks, I just looked at what other functions were doing, but this is
much better indeed.

> > +	return found_error;
> 
> What happens when we get here and take_next_literally is set? I.e., a
> backslash at the end of the string. We'll silently print nothing, which
> seems reasonable to me (the other option is to print a literal
> backslash).
> 
> Ditto, what if escape_context is non-zero? We're in the middle of an
> unterminated quoted string (or comment).
> 
> I'm fine with silently continuing, but it seems weird that we notice
> embedded comments (and return an error), but not these other conditions.
> 

I agree. I'm thinking it's better to just be lenient in this method. If
a quote wasn't properly closed, there would be no e-mail adress for
example. I think it would do little harm, and I'd remove the checking
for the opening brace too.

> >  static void handle_from(struct mailinfo *mi, const struct strbuf *from)
> >  {
> >  	char *at;
> >  	size_t el;
> >  	struct strbuf f;
> >  
> > +
> >  	strbuf_init(&f, from->len);
> >  	strbuf_addbuf(&f, from);
> 
> Funny extra line?

Ugh

> 
> > +test_expect_success 'mailinfo unescapes rfc2822 quoted-string' '
> > +    mkdir quoted-pair &&
> > +    git mailinfo /dev/null /dev/null <"$TEST_DIRECTORY"/t5100/quoted-pair.in >quoted-pair/info &&
> > +    test_cmp "$TEST_DIRECTORY"/t5100/quoted-pair.expect quoted-pair/info
> > +'
> 
> We usually break long lines with backslash-escapes. Like:
> 
>   git mailinfo /dev/null /dev/null \
> 	<"$TEST_DIRECTORY"/t5100/quoted-pair.in \
> 	>quoted-pair/info
> 
> I'd also wonder if things might be made much more readable by putting
> "$TEST_DIRECTORY/t5100" into a shorter variable like $data or something.
> That would be best done as a preparatory patch which updates all of the
> tests.
> 
> > --- /dev/null
> > +++ b/t/t5100/quoted-pair.in
> > @@ -0,0 +1,9 @@
> > +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> > +From: "Author \"The Author\" Name" <somebody@example.com>
> > +Date: Sun, 25 May 2008 00:38:18 -0700
> > +Subject: [PATCH] testing quoted-pair
> 
> I do not care that much about the "()" comment behavior myself, but if
> we are going to implement it, it probably makes sense to protect it from
> regression with a test.

Yeah, good idea.
> 
> -Peff

  reply	other threads:[~2016-09-19 10:51 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-16 21:02 [PATCH] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-16 22:22 ` Jeff King
2016-09-19 10:51   ` Kevin Daudt [this message]
2016-09-20  3:57     ` Jeff King
2016-09-21 16:07       ` Junio C Hamano
2016-09-19 18:54 ` [PATCH v2 0/2] Handle escape characters in From field Kevin Daudt
2016-09-25 21:08   ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
2016-09-25 21:08     ` [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-26 19:11       ` Junio C Hamano
2016-09-26 19:26         ` Junio C Hamano
2016-09-26 19:44           ` Kevin Daudt
2016-09-26 22:23             ` Junio C Hamano
2016-09-27 10:26               ` Kevin Daudt
2016-09-26 19:06     ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Junio C Hamano
2016-09-28 19:49     ` [PATCH v4 0/2] Handle RFC2822 quoted-pairs in From header Kevin Daudt
2016-09-28 19:52       ` [PATCH v4 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
2016-09-28 20:21         ` Junio C Hamano
2016-09-28 20:27           ` Kevin Daudt
2016-09-28 19:52       ` [PATCH v4 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-19 18:54 ` [PATCH v2 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
2016-09-19 21:16   ` Junio C Hamano
2016-09-20  3:59     ` Jeff King
2016-09-19 18:54 ` [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-19 21:24   ` Junio C Hamano
2016-09-19 22:04     ` Junio C Hamano
2016-09-20  4:28   ` Jeff King
2016-09-21 11:09   ` Jeff King
2016-09-22 22:17     ` Junio C Hamano
2016-09-23  4:15       ` Jeff King
2016-09-25 20:17         ` Kevin Daudt
2016-09-25 22:38           ` Jakub Narębski
2016-09-26  5:02             ` Kevin Daudt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160919105133.GA10901@ikke.info \
    --to=me@ikke.info \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=swiftgeek@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.