All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec
@ 2014-12-06 19:36 Роман Донченко
  2014-12-06 19:36 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
  0 siblings, 1 reply; 8+ messages in thread
From: Роман Донченко @ 2014-12-06 19:36 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Jay Soffian, Jeff King, Thomas Rast

More specifically:

* Add "\" to the list of characters not allowed in a token (see RFC 2047
  errata).

* Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides
  removing duplication, this also makes unquote_rfc2047 more stringent.

* Allow both "q" and "Q" to identify the encoding.

* Allow lowercase hexadecimal digits in the "Q" encoding.

And, more on the cosmetic side:

* Change the "encoded-text" regex to exclude rather than include characters,
  for clarity and consistency with "token".

Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
---
 git-send-email.perl | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..d461ffb 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -145,6 +145,11 @@ my $have_mail_address = eval { require Mail::Address; 1 };
 my $smtp;
 my $auth;
 
+# Regexes for RFC 2047 productions.
+my $re_token = qr/[^][()<>@,;:\\"\/?.= \000-\037\177-\377]+/;
+my $re_encoded_text = qr/[^? \000-\037\177-\377]+/;
+my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
+
 # Variables we fill in automatically, or via prompting:
 my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
 	$initial_reply_to,$initial_subject,@files,
@@ -913,15 +918,20 @@ $time = time - scalar $#files;
 
 sub unquote_rfc2047 {
 	local ($_) = @_;
-	my $encoding;
-	s{=\?([^?]+)\?q\?(.*?)\?=}{
-		$encoding = $1;
-		my $e = $2;
-		$e =~ s/_/ /g;
-		$e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
-		$e;
+	my $charset;
+	s{$re_encoded_word}{
+		$charset = $1;
+		my $encoding = $2;
+		my $text = $3;
+		if ($encoding eq 'q' || $encoding eq 'Q') {
+			$text =~ s/_/ /g;
+			$text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
+			$text;
+		} else {
+			$&; # other encodings not supported yet
+		}
 	}eg;
-	return wantarray ? ($_, $encoding) : $_;
+	return wantarray ? ($_, $charset) : $_;
 }
 
 sub quote_rfc2047 {
@@ -934,10 +944,8 @@ sub quote_rfc2047 {
 
 sub is_rfc2047_quoted {
 	my $s = shift;
-	my $token = qr/[^][()<>@,;:"\/?.= \000-\037\177-\377]+/;
-	my $encoded_text = qr/[!->@-~]+/;
 	length($s) <= 75 &&
-	$s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
+	$s =~ m/^(?:"[[:ascii:]]*"|$re_encoded_word)$/o;
 }
 
 sub subject_needs_rfc2047_quoting {
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread
* [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec
@ 2014-12-14 15:59 Роман Донченко
  2014-12-14 15:59 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
  0 siblings, 1 reply; 8+ messages in thread
From: Роман Донченко @ 2014-12-14 15:59 UTC (permalink / raw)
  To: gitster; +Cc: git

More specifically:

* Add "\" to the list of characters not allowed in a token (see RFC 2047
  errata).

* Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides
  removing duplication, this also makes unquote_rfc2047 more stringent.

* Allow both "q" and "Q" to identify the encoding.

* Allow lowercase hexadecimal digits in the "Q" encoding.

And, more on the cosmetic side:

* Change the "encoded-text" regex to exclude rather than include characters,
  for clarity and consistency with "token".

Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
Acked-by: Jeff King <peff@peff.net>
---
 git-send-email.perl | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..d461ffb 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -145,6 +145,11 @@ my $have_mail_address = eval { require Mail::Address; 1 };
 my $smtp;
 my $auth;
 
+# Regexes for RFC 2047 productions.
+my $re_token = qr/[^][()<>@,;:\\"\/?.= \000-\037\177-\377]+/;
+my $re_encoded_text = qr/[^? \000-\037\177-\377]+/;
+my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
+
 # Variables we fill in automatically, or via prompting:
 my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
 	$initial_reply_to,$initial_subject,@files,
@@ -913,15 +918,20 @@ $time = time - scalar $#files;
 
 sub unquote_rfc2047 {
 	local ($_) = @_;
-	my $encoding;
-	s{=\?([^?]+)\?q\?(.*?)\?=}{
-		$encoding = $1;
-		my $e = $2;
-		$e =~ s/_/ /g;
-		$e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
-		$e;
+	my $charset;
+	s{$re_encoded_word}{
+		$charset = $1;
+		my $encoding = $2;
+		my $text = $3;
+		if ($encoding eq 'q' || $encoding eq 'Q') {
+			$text =~ s/_/ /g;
+			$text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
+			$text;
+		} else {
+			$&; # other encodings not supported yet
+		}
 	}eg;
-	return wantarray ? ($_, $encoding) : $_;
+	return wantarray ? ($_, $charset) : $_;
 }
 
 sub quote_rfc2047 {
@@ -934,10 +944,8 @@ sub quote_rfc2047 {
 
 sub is_rfc2047_quoted {
 	my $s = shift;
-	my $token = qr/[^][()<>@,;:"\/?.= \000-\037\177-\377]+/;
-	my $encoded_text = qr/[!->@-~]+/;
 	length($s) <= 75 &&
-	$s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
+	$s =~ m/^(?:"[[:ascii:]]*"|$re_encoded_word)$/o;
 }
 
 sub subject_needs_rfc2047_quoting {
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-12-14 16:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-06 19:36 [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Роман Донченко
2014-12-06 19:36 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
2014-12-07  9:18   ` Jeff King
2014-12-07 14:35     ` Роман Донченко
2014-12-07 17:48       ` Philip Oakley
2014-12-07 18:17         ` Роман Донченко
2014-12-10 22:21           ` Philip Oakley
  -- strict thread matches above, loose matches on Subject: below --
2014-12-14 15:59 [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Роман Донченко
2014-12-14 15:59 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.