* [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec
@ 2014-12-14 15:59 Роман Донченко
2014-12-14 15:59 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
2014-12-15 23:13 ` [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Junio C Hamano
0 siblings, 2 replies; 4+ messages in thread
From: Роман Донченко @ 2014-12-14 15:59 UTC (permalink / raw)
To: gitster; +Cc: git
More specifically:
* Add "\" to the list of characters not allowed in a token (see RFC 2047
errata).
* Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides
removing duplication, this also makes unquote_rfc2047 more stringent.
* Allow both "q" and "Q" to identify the encoding.
* Allow lowercase hexadecimal digits in the "Q" encoding.
And, more on the cosmetic side:
* Change the "encoded-text" regex to exclude rather than include characters,
for clarity and consistency with "token".
Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
Acked-by: Jeff King <peff@peff.net>
---
git-send-email.perl | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..d461ffb 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -145,6 +145,11 @@ my $have_mail_address = eval { require Mail::Address; 1 };
my $smtp;
my $auth;
+# Regexes for RFC 2047 productions.
+my $re_token = qr/[^][()<>@,;:\\"\/?.= \000-\037\177-\377]+/;
+my $re_encoded_text = qr/[^? \000-\037\177-\377]+/;
+my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
+
# Variables we fill in automatically, or via prompting:
my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
$initial_reply_to,$initial_subject,@files,
@@ -913,15 +918,20 @@ $time = time - scalar $#files;
sub unquote_rfc2047 {
local ($_) = @_;
- my $encoding;
- s{=\?([^?]+)\?q\?(.*?)\?=}{
- $encoding = $1;
- my $e = $2;
- $e =~ s/_/ /g;
- $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
- $e;
+ my $charset;
+ s{$re_encoded_word}{
+ $charset = $1;
+ my $encoding = $2;
+ my $text = $3;
+ if ($encoding eq 'q' || $encoding eq 'Q') {
+ $text =~ s/_/ /g;
+ $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
+ $text;
+ } else {
+ $&; # other encodings not supported yet
+ }
}eg;
- return wantarray ? ($_, $encoding) : $_;
+ return wantarray ? ($_, $charset) : $_;
}
sub quote_rfc2047 {
@@ -934,10 +944,8 @@ sub quote_rfc2047 {
sub is_rfc2047_quoted {
my $s = shift;
- my $token = qr/[^][()<>@,;:"\/?.= \000-\037\177-\377]+/;
- my $encoded_text = qr/[!->@-~]+/;
length($s) <= 75 &&
- $s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
+ $s =~ m/^(?:"[[:ascii:]]*"|$re_encoded_word)$/o;
}
sub subject_needs_rfc2047_quoting {
--
2.1.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly
2014-12-14 15:59 [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Роман Донченко
@ 2014-12-14 15:59 ` Роман Донченко
2014-12-15 23:13 ` [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Junio C Hamano
1 sibling, 0 replies; 4+ messages in thread
From: Роман Донченко @ 2014-12-14 15:59 UTC (permalink / raw)
To: gitster; +Cc: git
The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).
Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
Acked-by: Jeff King <peff@peff.net>
---
git-send-email.perl | 26 ++++++++++++++++----------
t/t9001-send-email.sh | 7 +++++++
2 files changed, 23 insertions(+), 10 deletions(-)
diff --git a/git-send-email.perl b/git-send-email.perl
index d461ffb..7d5cc8a 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -919,17 +919,23 @@ $time = time - scalar $#files;
sub unquote_rfc2047 {
local ($_) = @_;
my $charset;
- s{$re_encoded_word}{
- $charset = $1;
- my $encoding = $2;
- my $text = $3;
- if ($encoding eq 'q' || $encoding eq 'Q') {
- $text =~ s/_/ /g;
- $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
- $text;
- } else {
- $&; # other encodings not supported yet
+ my $sep = qr/[ \t]+/;
+ s{$re_encoded_word(?:$sep$re_encoded_word)*}{
+ my @words = split $sep, $&;
+ foreach (@words) {
+ m/$re_encoded_word/;
+ $charset = $1;
+ my $encoding = $2;
+ my $text = $3;
+ if ($encoding eq 'q' || $encoding eq 'Q') {
+ $_ = $text;
+ s/_/ /g;
+ s/=([0-9A-F]{2})/chr(hex($1))/egi;
+ } else {
+ # other encodings not supported yet
+ }
}
+ join '', @words;
}eg;
return wantarray ? ($_, $charset) : $_;
}
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 19a3ced..fa965ff 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -240,6 +240,13 @@ test_expect_success $PREREQ 'non-ascii self name is suppressed' "
'non_ascii_self_suppressed'
"
+# This name is long enough to force format-patch to split it into multiple
+# encoded-words, assuming it uses UTF-8 with the "Q" encoding.
+test_expect_success $PREREQ 'long non-ascii self name is suppressed' "
+ test_suppress_self_quoted 'Ƒüñníęř €. Nâṁé' 'odd_?=mail@example.com' \
+ 'long_non_ascii_self_suppressed'
+"
+
test_expect_success $PREREQ 'sanitized self name is suppressed' "
test_suppress_self_unquoted '\"A U. Thor\"' 'author@example.com' \
'self_name_sanitized_suppressed'
--
2.1.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec
2014-12-14 15:59 [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Роман Донченко
2014-12-14 15:59 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
@ 2014-12-15 23:13 ` Junio C Hamano
1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2014-12-15 23:13 UTC (permalink / raw)
To: Роман Донченко
Cc: git
Thanks; queued both patches.
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec
@ 2014-12-06 19:36 Роман Донченко
0 siblings, 0 replies; 4+ messages in thread
From: Роман Донченко @ 2014-12-06 19:36 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
Jay Soffian, Jeff King, Thomas Rast
More specifically:
* Add "\" to the list of characters not allowed in a token (see RFC 2047
errata).
* Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides
removing duplication, this also makes unquote_rfc2047 more stringent.
* Allow both "q" and "Q" to identify the encoding.
* Allow lowercase hexadecimal digits in the "Q" encoding.
And, more on the cosmetic side:
* Change the "encoded-text" regex to exclude rather than include characters,
for clarity and consistency with "token".
Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
---
git-send-email.perl | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..d461ffb 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -145,6 +145,11 @@ my $have_mail_address = eval { require Mail::Address; 1 };
my $smtp;
my $auth;
+# Regexes for RFC 2047 productions.
+my $re_token = qr/[^][()<>@,;:\\"\/?.= \000-\037\177-\377]+/;
+my $re_encoded_text = qr/[^? \000-\037\177-\377]+/;
+my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
+
# Variables we fill in automatically, or via prompting:
my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
$initial_reply_to,$initial_subject,@files,
@@ -913,15 +918,20 @@ $time = time - scalar $#files;
sub unquote_rfc2047 {
local ($_) = @_;
- my $encoding;
- s{=\?([^?]+)\?q\?(.*?)\?=}{
- $encoding = $1;
- my $e = $2;
- $e =~ s/_/ /g;
- $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
- $e;
+ my $charset;
+ s{$re_encoded_word}{
+ $charset = $1;
+ my $encoding = $2;
+ my $text = $3;
+ if ($encoding eq 'q' || $encoding eq 'Q') {
+ $text =~ s/_/ /g;
+ $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
+ $text;
+ } else {
+ $&; # other encodings not supported yet
+ }
}eg;
- return wantarray ? ($_, $encoding) : $_;
+ return wantarray ? ($_, $charset) : $_;
}
sub quote_rfc2047 {
@@ -934,10 +944,8 @@ sub quote_rfc2047 {
sub is_rfc2047_quoted {
my $s = shift;
- my $token = qr/[^][()<>@,;:"\/?.= \000-\037\177-\377]+/;
- my $encoded_text = qr/[!->@-~]+/;
length($s) <= 75 &&
- $s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
+ $s =~ m/^(?:"[[:ascii:]]*"|$re_encoded_word)$/o;
}
sub subject_needs_rfc2047_quoting {
--
2.1.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-12-15 23:13 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-14 15:59 [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Роман Донченко
2014-12-14 15:59 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
2014-12-15 23:13 ` [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Junio C Hamano
-- strict thread matches above, loose matches on Subject: below --
2014-12-06 19:36 Роман Донченко
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).