From: Jeff King <peff@peff.net>
To: "Роман Донченко" <dpb@corrigendum.ru>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Jay Soffian" <jaysoffian@gmail.com>,
"Thomas Rast" <tr@thomasrast.ch>
Subject: Re: [PATCH] send-email: handle adjacent RFC 2047-encoded words properly
Date: Mon, 24 Nov 2014 10:36:09 -0500 [thread overview]
Message-ID: <20141124153609.GA25912@peff.net> (raw)
In-Reply-To: <1416786604-4988-1-git-send-email-dpb@corrigendum.ru>
On Mon, Nov 24, 2014 at 02:50:04AM +0300, Роман Донченко wrote:
> The RFC says that they are to be concatenated after decoding (i.e. the
> intervening whitespace is ignored).
>
> I change the sender's name to an all-Cyrillic string in the tests so that
> its encoded form goes over the 76 characters in a line limit, forcing
> format-patch to split it into multiple encoded words.
>
> Since I have to modify the regular expression for an encoded word anyway,
> I take the opportunity to bring it closer to the spec, most notably
> disallowing embedded spaces and making it case-insensitive (thus allowing
> the encoding to be specified as both "q" and "Q").
The overall goal makes sense to me. Thanks for working on this. I have a
few questions/comments, though.
> sub unquote_rfc2047 {
> local ($_) = @_;
> +
> + my $et = qr/[!->@-~]+/; # encoded-text from RFC 2047
> + my $sep = qr/[ \t]+/;
> + my $encoded_word = qr/=\?($et)\?q\?($et)\?=/i;
The first $et in $encoded_word is actually the charset, which is defined
by RFC 2047 as:
charset = token ; see section 3
token = 1*<Any CHAR except SPACE, CTLs, and especials>
especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
<"> / "/" / "[" / "]" / "?" / "." / "="
Your regex is a little more liberal. I doubt that it is a big deal in
practice (actually, in practice, I suspect [a-zA-Z0-9-] would be fine).
But if we are tightening things up in general, it may make sense to do
so here (and I notice that is_rfc2047_quoted does a more thorough $token
definition, and it probably makes sense for the two functions to be
consistent).
For your definition of encoded-text, RFC 2047 says:
encoded-text = 1*<Any printable ASCII character other than "?"
or SPACE>
It looks like you pulled the definition of $et from is_rfc2047_quoted,
but I am not clear on where that original came from (it is from a3a8262,
but that commit message does not explain the regex).
Also, I note that we handle 'q'-style encodings here, but not 'b'. I
wonder if it is worth adding that in while we are in the area (it is not
a big deal if you always send-email git-generated patches, as we never
generate it).
> + s{$encoded_word(?:$sep$encoded_word)+}{
If I am reading this right, it requires at least two $encoded_words.
Should this "+" be a "*"?
> + my @words = split $sep, $&;
> + foreach (@words) {
> + m/$encoded_word/;
> + $encoding = $1;
> + $_ = $2;
> + s/_/ /g;
> + s/=([0-9A-F]{2})/chr(hex($1))/eg;
In the spirit of your earlier change, should this final regex be
case-insensitive? RFC 2047 says only "Upper case should be used for
hexadecimal digits "A" through "F." but that does not seem like a "MUST"
to me.
-Peff
next prev parent reply other threads:[~2014-11-24 15:36 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-23 23:50 [PATCH] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
2014-11-24 7:27 ` Junio C Hamano
2014-11-24 15:44 ` Jeff King
2014-11-24 18:09 ` Роман Донченко
2014-11-24 15:36 ` Jeff King [this message]
2014-11-24 18:26 ` Роман Донченко
2014-11-24 23:03 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141124153609.GA25912@peff.net \
--to=peff@peff.net \
--cc=avarab@gmail.com \
--cc=dpb@corrigendum.ru \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jaysoffian@gmail.com \
--cc=tr@thomasrast.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.