* [BUG] encoding problem with format-patch + send-email
@ 2007-11-15 10:57 Uwe Kleine-König
2007-11-16 10:49 ` Jeff King
0 siblings, 1 reply; 9+ messages in thread
From: Uwe Kleine-König @ 2007-11-15 10:57 UTC (permalink / raw)
To: git; +Cc: Brian Swetland, Russell King - ARM Linux, Nicolas Pitre
Hello,
Brian just stumbled over a problem with format-patch + send-email.
format-patch only adds Content-Type and Content-Transfer-Encoding
headers iff the body needs it.
send-email adds "From: A. U. Thor <author@tld>" to the body if sender
and From: in the patch to send differ.
Both is just fine, but if the author has some non-ascii characters in
her name but the body is ascii-only the resulting mail is broken.
What about adding the Content-Type and Content-Transfer-Encoding headers
in any case?
Best regards
Uwe
--
Uwe Kleine-König, Software Engineer
Digi International GmbH Branch Breisach, Küferstrasse 8, 79206 Breisach, Germany
Tax: 315/5781/0242 / VAT: DE153662976 / Reg. Amtsgericht Dortmund HRB 13962
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] encoding problem with format-patch + send-email
2007-11-15 10:57 [BUG] encoding problem with format-patch + send-email Uwe Kleine-König
@ 2007-11-16 10:49 ` Jeff King
2007-11-16 11:14 ` Uwe Kleine-König
2007-11-17 0:48 ` Junio C Hamano
0 siblings, 2 replies; 9+ messages in thread
From: Jeff King @ 2007-11-16 10:49 UTC (permalink / raw)
To: Uwe Kleine-König, Junio C Hamano
Cc: git, Brian Swetland, Russell King - ARM Linux, Nicolas Pitre
On Thu, Nov 15, 2007 at 11:57:26AM +0100, Uwe Kleine-König wrote:
> send-email adds "From: A. U. Thor <author@tld>" to the body if sender
> and From: in the patch to send differ.
>
> Both is just fine, but if the author has some non-ascii characters in
> her name but the body is ascii-only the resulting mail is broken.
I posted an untested fix to this and discussed the issue in
http://article.gmane.org/gmane.comp.version-control.git/64426
http://article.gmane.org/gmane.comp.version-control.git/64436
but nobody seems to have been interested after that (I don't even use
git-send-email myself).
Below is an updated patch (there was a typo in one of the regexes in the
original) that meets my limited testing for the all-utf8 case (I don't
know how people actually use alternate encodings with git, if at all, so
I don't know that I can put together a good test case). My test case was
something like:
git-clone git test && cd test
echo junk >>Makefile
git-commit -m junk --author 'Uwe Kleine-König <peff@peff.net>' -a
git-format-patch HEAD^
git-send-email 0001-junk.patch
> What about adding the Content-Type and Content-Transfer-Encoding headers
> in any case?
You could probably add them unconditionally, but it would be nice to
have them match the encoding, so you'd still want to pick them out of
the rfc2047 encoding in the from header.
-Peff
-- >8 --
git-send-email: add charset header if we add encoded 'From'
We sometimes pick out the original rfc822 'From' header and
include it in the body of the message. If the original
author's name needs encoding, then we should specify that in
the content-type header.
If we already had a content-type header in the mail, then we
may need to re-encode. The logic is there to detect
this case, but it doesn't actually do the re-encoding.
Signed-off-by: Jeff King <peff@peff.net>
---
git-send-email.perl | 34 +++++++++++++++++++++++++++++++---
1 files changed, 31 insertions(+), 3 deletions(-)
diff --git a/git-send-email.perl b/git-send-email.perl
index f9bd2e5..fd0a4ad 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -514,11 +514,13 @@ $time = time - scalar $#files;
sub unquote_rfc2047 {
local ($_) = @_;
- if (s/=\?utf-8\?q\?(.*)\?=/$1/g) {
+ my $encoding;
+ if (s/=\?([^?]+)\?q\?(.*)\?=/$2/g) {
+ $encoding = $1;
s/_/ /g;
s/=([0-9A-F]{2})/chr(hex($1))/eg;
}
- return "$_";
+ return wantarray ? ($_, $encoding) : $_;
}
# use the simplest quoting being able to handle the recipient
@@ -667,6 +669,9 @@ foreach my $t (@files) {
open(F,"<",$t) or die "can't open file $t";
my $author = undef;
+ my $author_encoding;
+ my $has_content_type;
+ my $body_encoding;
@cc = @initial_cc;
@xh = ();
my $input_format = undef;
@@ -692,12 +697,20 @@ foreach my $t (@files) {
next if ($suppress_from);
}
elsif ($1 eq 'From') {
- $author = unquote_rfc2047($2);
+ ($author, $author_encoding)
+ = unquote_rfc2047($2);
}
printf("(mbox) Adding cc: %s from line '%s'\n",
$2, $_) unless $quiet;
push @cc, $2;
}
+ elsif (/^Content-type:/i) {
+ $has_content_type = 1;
+ if (/charset="?[^ "]+/) {
+ $body_encoding = $1;
+ }
+ push @xh, $_;
+ }
elsif (!/^Date:\s/ && /^[-A-Za-z]+:\s+\S/) {
push @xh, $_;
}
@@ -756,6 +769,21 @@ foreach my $t (@files) {
if (defined $author) {
$message = "From: $author\n\n$message";
+ if (defined $author_encoding) {
+ if ($has_content_type) {
+ if ($body_encoding eq $author_encoding) {
+ # ok, we already have the right encoding
+ }
+ else {
+ # uh oh, we should re-encode
+ }
+ }
+ else {
+ push @xh,
+ 'MIME-Version: 1.0',
+ "Content-Type: text/plain; charset=$author_encoding";
+ }
+ }
}
send_message();
--
1.5.3.1.47.g88b7d-dirty
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [BUG] encoding problem with format-patch + send-email
2007-11-16 10:49 ` Jeff King
@ 2007-11-16 11:14 ` Uwe Kleine-König
2007-11-17 0:48 ` Junio C Hamano
1 sibling, 0 replies; 9+ messages in thread
From: Uwe Kleine-König @ 2007-11-16 11:14 UTC (permalink / raw)
To: Jeff King
Cc: Junio C Hamano, git, Brian Swetland, Russell King - ARM Linux,
Nicolas Pitre
Hello Jeff,
> sub unquote_rfc2047 {
> local ($_) = @_;
> - if (s/=\?utf-8\?q\?(.*)\?=/$1/g) {
> + my $encoding;
> + if (s/=\?([^?]+)\?q\?(.*)\?=/$2/g) {
> + $encoding = $1;
> s/_/ /g;
> s/=([0-9A-F]{2})/chr(hex($1))/eg;
> }
> - return "$_";
> + return wantarray ? ($_, $encoding) : $_;
> }
I don't know perl very well, but that wantarray seems hacky. (Something
in my head wants to have it always return ($_, $encoding) and fix all
callers. :-)
> [...]
> @@ -756,6 +769,21 @@ foreach my $t (@files) {
>
> if (defined $author) {
> $message = "From: $author\n\n$message";
> + if (defined $author_encoding) {
> + if ($has_content_type) {
> + if ($body_encoding eq $author_encoding) {
> + # ok, we already have the right encoding
> + }
> + else {
> + # uh oh, we should re-encode
IMHO we should bail here or do the recoding (and bail if that fails).
OTH this patch improves send-emails behaviour because currently it
doesn't bother at all and with this patch it could at least fix the
common cases.
So
Acked-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
but note I only read the code, I didn't run it.
> + }
> + }
> + else {
> + push @xh,
> + 'MIME-Version: 1.0',
> + "Content-Type: text/plain; charset=$author_encoding";
> + }
> + }
> }
Best regards
Uwe
--
Uwe Kleine-König, Software Engineer
Digi International GmbH Branch Breisach, Küferstrasse 8, 79206 Breisach, Germany
Tax: 315/5781/0242 / VAT: DE153662976 / Reg. Amtsgericht Dortmund HRB 13962
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] encoding problem with format-patch + send-email
2007-11-16 10:49 ` Jeff King
2007-11-16 11:14 ` Uwe Kleine-König
@ 2007-11-17 0:48 ` Junio C Hamano
2007-11-19 10:49 ` Uwe Kleine-König
1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2007-11-17 0:48 UTC (permalink / raw)
To: Jeff King
Cc: Uwe Kleine-König, git, Brian Swetland,
Russell King - ARM Linux, Nicolas Pitre
Jeff King <peff@peff.net> writes:
> git-send-email: add charset header if we add encoded 'From'
Thanks.
> If we already had a content-type header in the mail, then we
> may need to re-encode. The logic is there to detect
> this case, but it doesn't actually do the re-encoding.
Although the charset on rfc2047 encoded header fields can be
independent of the charset in the body, I think you need to do
crazy things to do so.
Will queue.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] encoding problem with format-patch + send-email
2007-11-17 0:48 ` Junio C Hamano
@ 2007-11-19 10:49 ` Uwe Kleine-König
2007-11-19 10:58 ` Brian Swetland
2007-11-20 12:54 ` [PATCH] send-email: add transfer encoding header with content-type Jeff King
0 siblings, 2 replies; 9+ messages in thread
From: Uwe Kleine-König @ 2007-11-19 10:49 UTC (permalink / raw)
To: Jeff King
Cc: Junio C Hamano, git, Brian Swetland, Russell King - ARM Linux,
Nicolas Pitre
Hello Jeff
Brian sent another mail to the linux-arm-kernel mailing list, now
spotting:
Content-Type: text/plain; charset=UTF-8
but no Content-Transfer-Encoding:. This yield a 7bit mail with 8bit
characters.
I think we should add
Content-Transfer-Encoding: 8bit
, too.
Best regards
Uwe
--
Uwe Kleine-König
$ dc << EOF
[d1-d1<a]sa99d1<a1[rdn555760928P*pz1<a]salax
EOF
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] encoding problem with format-patch + send-email
2007-11-19 10:49 ` Uwe Kleine-König
@ 2007-11-19 10:58 ` Brian Swetland
2007-11-20 12:54 ` [PATCH] send-email: add transfer encoding header with content-type Jeff King
1 sibling, 0 replies; 9+ messages in thread
From: Brian Swetland @ 2007-11-19 10:58 UTC (permalink / raw)
To: Uwe Kleine-König, Jeff King, Junio C Hamano, git
[Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>]
> Hello Jeff
>
> Brian sent another mail to the linux-arm-kernel mailing list, now
> spotting:
>
> Content-Type: text/plain; charset=UTF-8
>
> but no Content-Transfer-Encoding:. This yield a 7bit mail with 8bit
> characters.
I actually tacked the Content-Type on by hand on that one.
I haven't had a chance to try the updated send-email, so it may well
do the right thing.
Brian
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] send-email: add transfer encoding header with content-type
2007-11-19 10:49 ` Uwe Kleine-König
2007-11-19 10:58 ` Brian Swetland
@ 2007-11-20 12:54 ` Jeff King
2007-11-20 13:49 ` Uwe Kleine-König
2007-11-21 6:58 ` Junio C Hamano
1 sibling, 2 replies; 9+ messages in thread
From: Jeff King @ 2007-11-20 12:54 UTC (permalink / raw)
To: Uwe Kleine-König
Cc: Junio C Hamano, git, Brian Swetland, Russell King - ARM Linux,
Nicolas Pitre
We add the content-type header only when we have non-7bit
characters from the 'From' header, so we really need to
specify the encoding (in other cases, where the commit text
needed a content-type, git-format-patch will already have
added the encoding header).
Signed-off-by: Jeff King <peff@peff.net>
---
On Mon, Nov 19, 2007 at 11:49:50AM +0100, Uwe Kleine-König wrote:
> but no Content-Transfer-Encoding:. This yield a 7bit mail with 8bit
> characters.
>
> I think we should add
>
> Content-Transfer-Encoding: 8bit
Even though Brian's mail turned out to be hand-generated, this problem
does exist in git-send-email. I don't know why I didn't add the encoding
header in the first place, since it is clearly required.
Junio, I think this is maint-worthy.
git-send-email.perl | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/git-send-email.perl b/git-send-email.perl
index fd0a4ad..d7b8391 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -781,7 +781,8 @@ foreach my $t (@files) {
else {
push @xh,
'MIME-Version: 1.0',
- "Content-Type: text/plain; charset=$author_encoding";
+ "Content-Type: text/plain; charset=$author_encoding",
+ 'Content-Transfer-Encoding: 8bit';
}
}
}
--
1.5.3.6.1784.gd1b1d-dirty
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] send-email: add transfer encoding header with content-type
2007-11-20 12:54 ` [PATCH] send-email: add transfer encoding header with content-type Jeff King
@ 2007-11-20 13:49 ` Uwe Kleine-König
2007-11-21 6:58 ` Junio C Hamano
1 sibling, 0 replies; 9+ messages in thread
From: Uwe Kleine-König @ 2007-11-20 13:49 UTC (permalink / raw)
To: Jeff King
Cc: Junio C Hamano, git, Brian Swetland, Russell King - ARM Linux,
Nicolas Pitre
Hello,
Jeff King wrote:
> We add the content-type header only when we have non-7bit
> characters from the 'From' header, so we really need to
> specify the encoding (in other cases, where the commit text
> needed a content-type, git-format-patch will already have
> added the encoding header).
>
> Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
--
Uwe Kleine-König
$ dc << EOF
[d1-d1<a]sa99d1<a1[rdn555760928P*pz1<a]salax
EOF
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] send-email: add transfer encoding header with content-type
2007-11-20 12:54 ` [PATCH] send-email: add transfer encoding header with content-type Jeff King
2007-11-20 13:49 ` Uwe Kleine-König
@ 2007-11-21 6:58 ` Junio C Hamano
1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2007-11-21 6:58 UTC (permalink / raw)
To: Jeff King
Cc: Uwe Kleine-König, git, Brian Swetland,
Russell King - ARM Linux, Nicolas Pitre
Jeff King <peff@peff.net> writes:
>> I think we should add
>>
>> Content-Transfer-Encoding: 8bit
>
> Even though Brian's mail turned out to be hand-generated, this problem
> does exist in git-send-email. I don't know why I didn't add the encoding
> header in the first place, since it is clearly required.
>
> Junio, I think this is maint-worthy.
Yeah, looks good.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2007-11-21 6:59 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-15 10:57 [BUG] encoding problem with format-patch + send-email Uwe Kleine-König
2007-11-16 10:49 ` Jeff King
2007-11-16 11:14 ` Uwe Kleine-König
2007-11-17 0:48 ` Junio C Hamano
2007-11-19 10:49 ` Uwe Kleine-König
2007-11-19 10:58 ` Brian Swetland
2007-11-20 12:54 ` [PATCH] send-email: add transfer encoding header with content-type Jeff King
2007-11-20 13:49 ` Uwe Kleine-König
2007-11-21 6:58 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).