git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec
@ 2014-12-06 19:36 Роман Донченко
  0 siblings, 0 replies; 4+ messages in thread
From: Роман Донченко @ 2014-12-06 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Jay Soffian, Jeff King, Thomas Rast

More specifically:

* Add "\" to the list of characters not allowed in a token (see RFC 2047
  errata).

* Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides
  removing duplication, this also makes unquote_rfc2047 more stringent.

* Allow both "q" and "Q" to identify the encoding.

* Allow lowercase hexadecimal digits in the "Q" encoding.

And, more on the cosmetic side:

* Change the "encoded-text" regex to exclude rather than include characters,
  for clarity and consistency with "token".

Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
---
 git-send-email.perl | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..d461ffb 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -145,6 +145,11 @@ my $have_mail_address = eval { require Mail::Address; 1 };
 my $smtp;
 my $auth;
 
+# Regexes for RFC 2047 productions.
+my $re_token = qr/[^][()<>@,;:\\"\/?.= \000-\037\177-\377]+/;
+my $re_encoded_text = qr/[^? \000-\037\177-\377]+/;
+my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
+
 # Variables we fill in automatically, or via prompting:
 my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
 	$initial_reply_to,$initial_subject,@files,
@@ -913,15 +918,20 @@ $time = time - scalar $#files;
 
 sub unquote_rfc2047 {
 	local ($_) = @_;
-	my $encoding;
-	s{=\?([^?]+)\?q\?(.*?)\?=}{
-		$encoding = $1;
-		my $e = $2;
-		$e =~ s/_/ /g;
-		$e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
-		$e;
+	my $charset;
+	s{$re_encoded_word}{
+		$charset = $1;
+		my $encoding = $2;
+		my $text = $3;
+		if ($encoding eq 'q' || $encoding eq 'Q') {
+			$text =~ s/_/ /g;
+			$text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
+			$text;
+		} else {
+			$&; # other encodings not supported yet
+		}
 	}eg;
-	return wantarray ? ($_, $encoding) : $_;
+	return wantarray ? ($_, $charset) : $_;
 }
 
 sub quote_rfc2047 {
@@ -934,10 +944,8 @@ sub quote_rfc2047 {
 
 sub is_rfc2047_quoted {
 	my $s = shift;
-	my $token = qr/[^][()<>@,;:"\/?.= \000-\037\177-\377]+/;
-	my $encoded_text = qr/[!->@-~]+/;
 	length($s) <= 75 &&
-	$s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
+	$s =~ m/^(?:"[[:ascii:]]*"|$re_encoded_word)$/o;
 }
 
 sub subject_needs_rfc2047_quoting {
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec
@ 2014-12-14 15:59 Роман Донченко
  2014-12-14 15:59 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
  2014-12-15 23:13 ` [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Junio C Hamano
  0 siblings, 2 replies; 4+ messages in thread
From: Роман Донченко @ 2014-12-14 15:59 UTC (permalink / raw)
  To: gitster; +Cc: git

More specifically:

* Add "\" to the list of characters not allowed in a token (see RFC 2047
  errata).

* Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides
  removing duplication, this also makes unquote_rfc2047 more stringent.

* Allow both "q" and "Q" to identify the encoding.

* Allow lowercase hexadecimal digits in the "Q" encoding.

And, more on the cosmetic side:

* Change the "encoded-text" regex to exclude rather than include characters,
  for clarity and consistency with "token".

Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
Acked-by: Jeff King <peff@peff.net>
---
 git-send-email.perl | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..d461ffb 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -145,6 +145,11 @@ my $have_mail_address = eval { require Mail::Address; 1 };
 my $smtp;
 my $auth;
 
+# Regexes for RFC 2047 productions.
+my $re_token = qr/[^][()<>@,;:\\"\/?.= \000-\037\177-\377]+/;
+my $re_encoded_text = qr/[^? \000-\037\177-\377]+/;
+my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
+
 # Variables we fill in automatically, or via prompting:
 my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
 	$initial_reply_to,$initial_subject,@files,
@@ -913,15 +918,20 @@ $time = time - scalar $#files;
 
 sub unquote_rfc2047 {
 	local ($_) = @_;
-	my $encoding;
-	s{=\?([^?]+)\?q\?(.*?)\?=}{
-		$encoding = $1;
-		my $e = $2;
-		$e =~ s/_/ /g;
-		$e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
-		$e;
+	my $charset;
+	s{$re_encoded_word}{
+		$charset = $1;
+		my $encoding = $2;
+		my $text = $3;
+		if ($encoding eq 'q' || $encoding eq 'Q') {
+			$text =~ s/_/ /g;
+			$text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
+			$text;
+		} else {
+			$&; # other encodings not supported yet
+		}
 	}eg;
-	return wantarray ? ($_, $encoding) : $_;
+	return wantarray ? ($_, $charset) : $_;
 }
 
 sub quote_rfc2047 {
@@ -934,10 +944,8 @@ sub quote_rfc2047 {
 
 sub is_rfc2047_quoted {
 	my $s = shift;
-	my $token = qr/[^][()<>@,;:"\/?.= \000-\037\177-\377]+/;
-	my $encoded_text = qr/[!->@-~]+/;
 	length($s) <= 75 &&
-	$s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
+	$s =~ m/^(?:"[[:ascii:]]*"|$re_encoded_word)$/o;
 }
 
 sub subject_needs_rfc2047_quoting {
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly
  2014-12-14 15:59 [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Роман Донченко
@ 2014-12-14 15:59 ` Роман Донченко
  2014-12-15 23:13 ` [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Junio C Hamano
  1 sibling, 0 replies; 4+ messages in thread
From: Роман Донченко @ 2014-12-14 15:59 UTC (permalink / raw)
  To: gitster; +Cc: git

The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).

Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
Acked-by: Jeff King <peff@peff.net>
---
 git-send-email.perl   | 26 ++++++++++++++++----------
 t/t9001-send-email.sh |  7 +++++++
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index d461ffb..7d5cc8a 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -919,17 +919,23 @@ $time = time - scalar $#files;
 sub unquote_rfc2047 {
 	local ($_) = @_;
 	my $charset;
-	s{$re_encoded_word}{
-		$charset = $1;
-		my $encoding = $2;
-		my $text = $3;
-		if ($encoding eq 'q' || $encoding eq 'Q') {
-			$text =~ s/_/ /g;
-			$text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
-			$text;
-		} else {
-			$&; # other encodings not supported yet
+	my $sep = qr/[ \t]+/;
+	s{$re_encoded_word(?:$sep$re_encoded_word)*}{
+		my @words = split $sep, $&;
+		foreach (@words) {
+			m/$re_encoded_word/;
+			$charset = $1;
+			my $encoding = $2;
+			my $text = $3;
+			if ($encoding eq 'q' || $encoding eq 'Q') {
+				$_ = $text;
+				s/_/ /g;
+				s/=([0-9A-F]{2})/chr(hex($1))/egi;
+			} else {
+				# other encodings not supported yet
+			}
 		}
+		join '', @words;
 	}eg;
 	return wantarray ? ($_, $charset) : $_;
 }
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 19a3ced..fa965ff 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -240,6 +240,13 @@ test_expect_success $PREREQ 'non-ascii self name is suppressed' "
 		'non_ascii_self_suppressed'
 "
 
+# This name is long enough to force format-patch to split it into multiple
+# encoded-words, assuming it uses UTF-8 with the "Q" encoding.
+test_expect_success $PREREQ 'long non-ascii self name is suppressed' "
+	test_suppress_self_quoted 'Ƒüñníęř €. Nâṁé' 'odd_?=mail@example.com' \
+		'long_non_ascii_self_suppressed'
+"
+
 test_expect_success $PREREQ 'sanitized self name is suppressed' "
 	test_suppress_self_unquoted '\"A U. Thor\"' 'author@example.com' \
 		'self_name_sanitized_suppressed'
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec
  2014-12-14 15:59 [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Роман Донченко
  2014-12-14 15:59 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
@ 2014-12-15 23:13 ` Junio C Hamano
  1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2014-12-15 23:13 UTC (permalink / raw)
  To: Роман Донченко
  Cc: git

Thanks; queued both patches.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-12-15 23:13 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-14 15:59 [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Роман Донченко
2014-12-14 15:59 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
2014-12-15 23:13 ` [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2014-12-06 19:36 Роман Донченко

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).