git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH resent] send-email: Honor multi-part email messages
@ 2013-01-25 15:28 Alexey Shumkin
  2013-01-25 15:28 ` [PATCH] " Alexey Shumkin
  2013-01-25 18:14 ` [PATCH resent] " Junio C Hamano
  0 siblings, 2 replies; 5+ messages in thread
From: Alexey Shumkin @ 2013-01-25 15:28 UTC (permalink / raw)
  To: git; +Cc: Alexey Shumkin, Junio C Hamano, Krzysztof Mazur

Let's try to involve Krzysztof Mazur <krzysiek@podlesie.net>
who have met the similar problem recently
(according to this thread http://thread.gmane.org/gmane.comp.version-control.git/208297/focus=208310)
I recitate myself:
>Well, as I understand "current" algorithm:
>1. It assumes that file is one-part email message
>2. Function searches non-ASCII characters in Subject header
>3. If none then it looks non-ASCII characters at message body
>
>my changes are to skip looking at message body of a multi-part
>message as it has parts with their own Content-Type headers
>
>The said above in details:
>1. To set flag when we meet Content-Type: multipart/mixed header
>2. After we processed all headers and did not found non-ASCII characters
>in a Subject we take a look at this flag and exit with 0
>if it is a multi-part message


>>I think your patch is wrong.  What happens when we see a Subject:
>>line with a non-ascii on it that causes an early return of the loop
>>before your new code has a chance to see Content-Type: header?
This function is used to determine "broken" (non-ASCII) headers (to be encode them)
The problem is if "Subject" is not broken, but message body contains non-ASCII chars,
subject is marked as broken and encoded again.

P.S.
To involved: the beginning of thread is here http://thread.gmane.org/gmane.comp.version-control.git/181743

Alexey Shumkin (1):
  send-email: Honor multi-part email messages

 git-send-email.perl   |  5 ++++
 t/t9001-send-email.sh | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

-- 
1.8.1.1.10.g9255f3f

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] send-email: Honor multi-part email messages
  2013-01-25 15:28 [PATCH resent] send-email: Honor multi-part email messages Alexey Shumkin
@ 2013-01-25 15:28 ` Alexey Shumkin
  2013-01-25 17:47   ` Krzysztof Mazur
  2013-01-25 18:14 ` [PATCH resent] " Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Alexey Shumkin @ 2013-01-25 15:28 UTC (permalink / raw)
  To: git; +Cc: Alexey Shumkin, Junio C Hamano, Krzysztof Mazur

"git format-patch --attach/--inline" generates multi-part messages.
Every part of such messages can contain non-ASCII characters with its own
"Content-Type" and "Content-Transfer-Encoding" headers.
But git-send-mail script interprets a patch-file as one-part message
and does not recognize multi-part messages.
So already quoted printable email subject may be encoded as quoted printable
again. Due to this bug email subject looks corrupted in email clients.

Signed-off-by: Alexey Shumkin <Alex.Crezoff@gmail.com>
---
 git-send-email.perl   |  5 ++++
 t/t9001-send-email.sh | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/git-send-email.perl b/git-send-email.perl
index 94c7f76..d49befe 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1499,12 +1499,17 @@ sub file_has_nonascii {
 
 sub body_or_subject_has_nonascii {
 	my $fn = shift;
+	my $multipart = 0;
 	open(my $fh, '<', $fn)
 		or die "unable to open $fn: $!\n";
 	while (my $line = <$fh>) {
 		last if $line =~ /^$/;
+		if ($line =~ /^Content-Type:\s*multipart\/mixed.*$/) {
+			$multipart = 1;
+		}
 		return 1 if $line =~ /^Subject.*[^[:ascii:]]/;
 	}
+	return 0 if $multipart;
 	while (my $line = <$fh>) {
 		return 1 if $line =~ /[^[:ascii:]]/;
 	}
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 97d6f4c..c7ed370 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -1323,4 +1323,70 @@ test_expect_success $PREREQ 'sendemail.aliasfile=~/.mailrc' '
 	grep "^!someone@example\.org!$" commandline1
 '
 
+test_expect_success $PREREQ 'setup multi-part message' '
+cat >multi-part-email-using-8bit <<EOF
+From fe6ecc66ece37198fe5db91fa2fc41d9f4fe5cc4 Mon Sep 17 00:00:00 2001
+Message-Id: <bogus-message-id@example.com>
+From: author@example.com
+Date: Sat, 12 Jun 2010 15:53:58 +0200
+Subject: [PATCH] =?UTF-8?q?=D0=94=D0=BE=D0=B1=D0=B0=D0=B2=D0=BB=D0=B5=D0=BD=20?=
+ =?UTF-8?q?=D1=84=D0=B0=D0=B9=D0=BB?=
+MIME-Version: 1.0
+Content-Type: multipart/mixed; boundary="------------123"
+
+This is a multi-part message in MIME format.
+--------------1.7.6.3.4.gf71f
+Content-Type: text/plain; charset=UTF-8; format=fixed
+Content-Transfer-Encoding: 8bit
+
+This is a message created with "git format-patch --attach=123"
+---
+ master   |    1 +
+ файл |    1 +
+ 2 files changed, 2 insertions(+), 0 deletions(-)
+ create mode 100644 master
+ create mode 100644 файл
+
+
+--------------123
+Content-Type: text/x-patch; name="0001-.patch"
+Content-Transfer-Encoding: 8bit
+Content-Disposition: attachment; filename="0001-.patch"
+
+diff --git a/master b/master
+new file mode 100644
+index 0000000..1f7391f
+--- /dev/null
++++ b/master
+@@ -0,0 +1 @@
++master
+diff --git a/файл b/файл
+new file mode 100644
+index 0000000..44e5cfe
+--- /dev/null
++++ b/файл
+@@ -0,0 +1 @@
++содержимое файла
+
+--------------123--
+EOF
+'
+
+test_expect_success $PREREQ 'setup expect' '
+cat >expected <<EOF
+Subject: [PATCH] =?UTF-8?q?=D0=94=D0=BE=D0=B1=D0=B0=D0=B2=D0=BB=D0=B5=D0=BD=20?= =?UTF-8?q?=D1=84=D0=B0=D0=B9=D0=BB?=
+EOF
+'
+
+test_expect_success $PREREQ '--attach/--inline also treats subject' '
+	clean_fake_sendmail &&
+	echo bogus |
+	git send-email --from=author@example.com --to=nobody@example.com \
+			--smtp-server="$(pwd)/fake.sendmail" \
+			--8bit-encoding=UTF-8 \
+			multi-part-email-using-8bit >stdout &&
+	grep "Subject" msgtxt1 >actual &&
+	test_cmp expected actual
+'
+
 test_done
-- 
1.8.1.1.10.g9255f3f

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] send-email: Honor multi-part email messages
  2013-01-25 15:28 ` [PATCH] " Alexey Shumkin
@ 2013-01-25 17:47   ` Krzysztof Mazur
  2013-01-25 22:24     ` Jeff King
  0 siblings, 1 reply; 5+ messages in thread
From: Krzysztof Mazur @ 2013-01-25 17:47 UTC (permalink / raw)
  To: Alexey Shumkin; +Cc: git, Junio C Hamano

On Fri, Jan 25, 2013 at 07:28:54PM +0400, Alexey Shumkin wrote:
> "git format-patch --attach/--inline" generates multi-part messages.
> Every part of such messages can contain non-ASCII characters with its own
> "Content-Type" and "Content-Transfer-Encoding" headers.
> But git-send-mail script interprets a patch-file as one-part message
> and does not recognize multi-part messages.
> So already quoted printable email subject may be encoded as quoted printable
> again. Due to this bug email subject looks corrupted in email clients.

I don't think that the problem with the Subject is multi-part message
specific. The real problem with the Subject is probably that
is_rfc2047_quoted() does not detect that the Subject is already quoted.

Of course we still need that explicit multi-part message support to
avoid "Which 8bit encoding should I declare [UTF-8]? " message.

> 
> diff --git a/git-send-email.perl b/git-send-email.perl
> index 94c7f76..d49befe 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -1499,12 +1499,17 @@ sub file_has_nonascii {
>  
>  sub body_or_subject_has_nonascii {
>  	my $fn = shift;
> +	my $multipart = 0;
>  	open(my $fh, '<', $fn)
>  		or die "unable to open $fn: $!\n";
>  	while (my $line = <$fh>) {
>  		last if $line =~ /^$/;
> +		if ($line =~ /^Content-Type:\s*multipart\/mixed.*$/) {
> +			$multipart = 1;
> +		}
>  		return 1 if $line =~ /^Subject.*[^[:ascii:]]/;
>  	}
> +	return 0 if $multipart;
>  	while (my $line = <$fh>) {
>  		return 1 if $line =~ /[^[:ascii:]]/;
>  	}

After this change the function name is no longer appropriate.
Maybe we should join body_or_subject_has_nonascii()
and file_declares_8bit_cte() because in case of multi-part messages
	"next unless (body_or_subject_has_nonascii($f)
		     && !file_declares_8bit_cte($f));"
is not valid anymore. We could also check for broken_encoding
in single pass.

Thanks,

Krzysiek

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH resent] send-email: Honor multi-part email messages
  2013-01-25 15:28 [PATCH resent] send-email: Honor multi-part email messages Alexey Shumkin
  2013-01-25 15:28 ` [PATCH] " Alexey Shumkin
@ 2013-01-25 18:14 ` Junio C Hamano
  1 sibling, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2013-01-25 18:14 UTC (permalink / raw)
  To: Alexey Shumkin; +Cc: git, Krzysztof Mazur

Alexey Shumkin <Alex.Crezoff@gmail.com> writes:

> This function is used to determine "broken" (non-ASCII) headers (to be encode them)
> The problem is if "Subject" is not broken, but message body contains non-ASCII chars,
> subject is marked as broken and encoded again.

I think that is not a "problem" but is a mere symptom.

The remainder of the codeflow of send-email, AFAICS (it's not my
code), is not prepared to deal with multipart messages at all.  In
order to handle multi-part properly, you may still have to fix
broken Subject: of the whole thing, and you may also want to fix
broken headers inside one part while keeping correctly formatted
part intact.

Your patch just stops an early error checking that is meant for a
non multi-part message that happens to trigger on a multi-part
message in your test case from triggering (i.e. masking a symptom)
and let the remaining lines of the multi-part message to codepath
that does not do anything special to handle multi-part messages
correctly, letting it do whatever it happens to do to a message
assuming it is not a multi-part message, no?

In other words, making send-email capable of handling a multi-part
might be a worthy thing to do, but I do not think your patch is a
good first step for doing so.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] send-email: Honor multi-part email messages
  2013-01-25 17:47   ` Krzysztof Mazur
@ 2013-01-25 22:24     ` Jeff King
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff King @ 2013-01-25 22:24 UTC (permalink / raw)
  To: Krzysztof Mazur; +Cc: Alexey Shumkin, git, Junio C Hamano

On Fri, Jan 25, 2013 at 06:47:00PM +0100, Krzysztof Mazur wrote:

> On Fri, Jan 25, 2013 at 07:28:54PM +0400, Alexey Shumkin wrote:
> > "git format-patch --attach/--inline" generates multi-part messages.
> > Every part of such messages can contain non-ASCII characters with its own
> > "Content-Type" and "Content-Transfer-Encoding" headers.
> > But git-send-mail script interprets a patch-file as one-part message
> > and does not recognize multi-part messages.
> > So already quoted printable email subject may be encoded as quoted printable
> > again. Due to this bug email subject looks corrupted in email clients.
> 
> I don't think that the problem with the Subject is multi-part message
> specific. The real problem with the Subject is probably that
> is_rfc2047_quoted() does not detect that the Subject is already quoted.

I have not even looked at this problem at all, but seeing this function
name:

> >  sub body_or_subject_has_nonascii {

Makes me think something is very wrong. The subject line should not have
anything to do whatsoever with a content-type or
content-transfer-encoding header. It should either be rfc2047 encoded or
not, and the encoding used does not have to correspond to what is used
elsewhere in the message. rfc2047 is very clear that other MIME headers
are not necessary to interpret encoded words in headers.

So this loop:

	foreach my $f (@files) {
	        next unless (body_or_subject_has_nonascii($f)
	                     && !file_declares_8bit_cte($f));
	        $broken_encoding{$f} = 1;
	}

does not seem right at all. Only the body depends on the 8bit CTE.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-01-25 22:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-25 15:28 [PATCH resent] send-email: Honor multi-part email messages Alexey Shumkin
2013-01-25 15:28 ` [PATCH] " Alexey Shumkin
2013-01-25 17:47   ` Krzysztof Mazur
2013-01-25 22:24     ` Jeff King
2013-01-25 18:14 ` [PATCH resent] " Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).