git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Роман Донченко" <dpb@corrigendum.ru>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jay Soffian" <jaysoffian@gmail.com>, "Jeff King" <peff@peff.net>,
	"Thomas Rast" <tr@thomasrast.ch>
Subject: Re: [PATCH] send-email: handle adjacent RFC 2047-encoded words properly
Date: Sun, 23 Nov 2014 23:27:51 -0800	[thread overview]
Message-ID: <CAPc5daVjNDg5CcWsMwfn=DZhwpCBdU2LYXOpFWZwhx2p8hLRww@mail.gmail.com> (raw)
In-Reply-To: <1416786604-4988-1-git-send-email-dpb@corrigendum.ru>

On Sun, Nov 23, 2014 at 3:50 PM, Роман Донченко <dpb@corrigendum.ru> wrote:
> The RFC says that they are to be concatenated after decoding (i.e. the
> intervening whitespace is ignored).
>
> I change the sender's name to an all-Cyrillic string in the tests so that
> its encoded form goes over the 76 characters in a line limit, forcing
> format-patch to split it into multiple encoded words.
>
> Since I have to modify the regular expression for an encoded word anyway,
> I take the opportunity to bring it closer to the spec, most notably
> disallowing embedded spaces and making it case-insensitive (thus allowing
> the encoding to be specified as both "q" and "Q").
>
> Signed-off-by: Роман Донченко <dpb@corrigendum.ru>

This sounds like a worthy thing to do in general.

I wonder if the C implementation we have for mailinfo needs similar
update, though. I vaguely recall that we have case-insensitive start for
q/b segments, but do not remember the details offhand.

Was the change to the test to use Cyrillic really necessary, or did it
suffice if you simply extended the existsing "Funny Name" spelled with
strange accents, but you substituted the whole string anyway?

Until I found out what the new string says by running web-based
translation on it, I felt somewhat uneasy. As I do not read
Cyrillic/Russian, we may have been adding some profanity without
knowing. It turns out that the string just says "Cyrillic Name", so I am
not against using the new string, but it simply looked odd to replace the
string whole-sale when you merely need a longer string. It made it look
as if a bug was specific to Cyrillic when it wasn't.

As you may notice by reading "git log --no-merges" from recent history,
we tend not to say "I did X, I did Y". If the tone of the above message
were more similar to them, it may have been easier to read.

But other than these minor nits, the change looks good from
a cursory read.

Thanks.

> ---
>  git-send-email.perl   | 21 +++++++++++++++------
>  t/t9001-send-email.sh | 18 +++++++++---------
>  2 files changed, 24 insertions(+), 15 deletions(-)
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index 9949db0..4bb9f6f 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -913,13 +913,22 @@ $time = time - scalar $#files;
>
>  sub unquote_rfc2047 {
>         local ($_) = @_;
> +
> +       my $et = qr/[!->@-~]+/; # encoded-text from RFC 2047
> +       my $sep = qr/[ \t]+/;
> +       my $encoded_word = qr/=\?($et)\?q\?($et)\?=/i;
> +
>         my $encoding;
> -       s{=\?([^?]+)\?q\?(.*?)\?=}{
> -               $encoding = $1;
> -               my $e = $2;
> -               $e =~ s/_/ /g;
> -               $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
> -               $e;
> +       s{$encoded_word(?:$sep$encoded_word)+}{
> +               my @words = split $sep, $&;
> +               foreach (@words) {
> +                       m/$encoded_word/;
> +                       $encoding = $1;
> +                       $_ = $2;
> +                       s/_/ /g;
> +                       s/=([0-9A-F]{2})/chr(hex($1))/eg;
> +               }
> +               join '', @words;
>         }eg;
>         return wantarray ? ($_, $encoding) : $_;
>  }
> diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
> index 19a3ced..318b870 100755
> --- a/t/t9001-send-email.sh
> +++ b/t/t9001-send-email.sh
> @@ -236,7 +236,7 @@ test_expect_success $PREREQ 'self name with dot is suppressed' "
>  "
>
>  test_expect_success $PREREQ 'non-ascii self name is suppressed' "
> -       test_suppress_self_quoted 'Füñný Nâmé' 'odd_?=mail@example.com' \
> +       test_suppress_self_quoted 'Кириллическое Имя' 'odd_?=mail@example.com' \
>                 'non_ascii_self_suppressed'
>  "
>
> @@ -946,25 +946,25 @@ test_expect_success $PREREQ 'utf8 author is correctly passed on' '
>         clean_fake_sendmail &&
>         test_commit weird_author &&
>         test_when_finished "git reset --hard HEAD^" &&
> -       git commit --amend --author "Füñný Nâmé <odd_?=mail@example.com>" &&
> -       git format-patch --stdout -1 >funny_name.patch &&
> +       git commit --amend --author "Кириллическое Имя <odd_?=mail@example.com>" &&
> +       git format-patch --stdout -1 >nonascii_name.patch &&
>         git send-email --from="Example <nobody@example.com>" \
>           --to=nobody@example.com \
>           --smtp-server="$(pwd)/fake.sendmail" \
> -         funny_name.patch &&
> -       grep "^From: Füñný Nâmé <odd_?=mail@example.com>" msgtxt1
> +         nonascii_name.patch &&
> +       grep "^From: Кириллическое Имя <odd_?=mail@example.com>" msgtxt1
>  '
>
>  test_expect_success $PREREQ 'utf8 sender is not duplicated' '
>         clean_fake_sendmail &&
>         test_commit weird_sender &&
>         test_when_finished "git reset --hard HEAD^" &&
> -       git commit --amend --author "Füñný Nâmé <odd_?=mail@example.com>" &&
> -       git format-patch --stdout -1 >funny_name.patch &&
> -       git send-email --from="Füñný Nâmé <odd_?=mail@example.com>" \
> +       git commit --amend --author "Кириллическое Имя <odd_?=mail@example.com>" &&
> +       git format-patch --stdout -1 >nonascii_name.patch &&
> +       git send-email --from="Кириллическое Имя <odd_?=mail@example.com>" \
>           --to=nobody@example.com \
>           --smtp-server="$(pwd)/fake.sendmail" \
> -         funny_name.patch &&
> +         nonascii_name.patch &&
>         grep "^From: " msgtxt1 >msgfrom &&
>         test_line_count = 1 msgfrom
>  '
> --
> 2.1.1
>

  reply	other threads:[~2014-11-24  7:28 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-23 23:50 [PATCH] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
2014-11-24  7:27 ` Junio C Hamano [this message]
2014-11-24 15:44   ` Jeff King
2014-11-24 18:09   ` Роман Донченко
2014-11-24 15:36 ` Jeff King
2014-11-24 18:26   ` Роман Донченко
2014-11-24 23:03     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPc5daVjNDg5CcWsMwfn=DZhwpCBdU2LYXOpFWZwhx2p8hLRww@mail.gmail.com' \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=dpb@corrigendum.ru \
    --cc=git@vger.kernel.org \
    --cc=jaysoffian@gmail.com \
    --cc=peff@peff.net \
    --cc=tr@thomasrast.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).