* Re: testing vger handling of charsets (part 2)
2007-05-16 6:34 ` Junio C Hamano
@ 2007-05-16 9:29 ` Jan Hudec
2007-05-16 10:57 ` Jeff King
2007-05-16 10:55 ` Jeff King
1 sibling, 1 reply; 7+ messages in thread
From: Jan Hudec @ 2007-05-16 9:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, git, kha, bfields
On Tue, May 15, 2007 at 11:34:02PM -0700, Junio C Hamano wrote:
> botched one:
>
> outgoing:
> body in utf-8
> Content-type: text/plain; charset=utf-8
> no MIME-Version: header
>
> vger relayed to recipients:
> body untouched
> Content-type: text/plain; charset=iso-8859-1
> MIME-Version: 1.0
The strange thing is that I got it from vger -- with
Content-type: text/plain; charset=utf-8
Therefore either:
- It's not vger but some other mail software that munges it.
- Some software on my side correctly guesses that it should have been
utf-8, but I don't really believe that.
--------------------------------------------------------------------------------
- Jan Hudec `Bulb' <bulb@ucw.cz>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: testing vger handling of charsets (part 2)
2007-05-16 6:34 ` Junio C Hamano
2007-05-16 9:29 ` Jan Hudec
@ 2007-05-16 10:55 ` Jeff King
1 sibling, 0 replies; 7+ messages in thread
From: Jeff King @ 2007-05-16 10:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, kha, bfields
On Tue, May 15, 2007 at 11:34:02PM -0700, Junio C Hamano wrote:
> I think you are trying to figure out how vger adds/munges the
> headers, and the above is not very useful for people but
> yourself unless you explicitly say what headers you gave on your
> end in the body of the message, is it?
Yes, I'm sorry that the message itself looked a bit vague. It was
actually about the 4th or 5th such message I sent, as the list filter
kept blocking the previous ones, so with each iteration I made the
message shorter and shorter to try to remove any offending text.
So Karl and Bruce actually received several explanatory messages that
everyone else didn't, and I really only expected them to be replying.
> Judging from the list responses, I am guessing the situation is
> like this. Does that match your understanding?
Yes, this is close.
> outgoing:
> body in utf-8
> Content-type: text/plain; charset=utf-8
> no MIME-Version: header
>
> vger relayed to recipients:
> body untouched
> Content-type: text/plain; charset=iso-8859-1
> MIME-Version: 1.0
There is also a "Content-Transfer-Encoding: 8bit" that gets switched to
quoted-printable (and the body is actually encoded as QP). However, the
change of charset is the problem.
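To make it concrete why the charset change is the harmful part and the QP re-encoding is not, here is a minimal sketch using Python's standard `quopri` module. The sample text is made up for illustration and is not taken from any message in this thread.

```python
import quopri

# Illustrative sample text (an assumption, not from the thread).
original = "J\u00fcrgen"

# Sender side: the body is utf-8 bytes, sent as 8bit.
wire_8bit = original.encode("utf-8")

# Relay side: the body is re-encoded as quoted-printable.
# This is lossless; the result is pure 7-bit ASCII.
wire_qp = quopri.encodestring(wire_8bit)

# Recipient side: undo QP, then decode with a charset.
raw = quopri.decodestring(wire_qp)
as_latin1 = raw.decode("iso-8859-1")  # what the rewritten header claims
as_utf8 = raw.decode("utf-8")         # what the sender actually used

print(wire_qp)
print(as_latin1)
print(as_utf8)
```

The QP round trip returns exactly the original bytes, so only the wrong charset label causes the recipient to see mojibake.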
> I am not sure what exactly you meant by with/without "the right
> mime header", but the above is based on my guess that you meant
> only MIME-VERSION header.
Yes, the two messages differed _only_ in the presence of a MIME-Version
header.
So now that I have the data, let me explain the sequence of events in
the bug, which should hopefully explain what everyone has seen.
1. Bruce generates a message containing utf8 characters in the body and
the following headers:
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
and _no_ MIME-Version header. This is produced by git-format-patch on a
commit with non-ascii in the body. The message is sent to vger, and cc'd
to some recipients, and we move to step 2.
2a. The cc'd recipients receive one copy of the message intact; none of
the mailservers along the route do any munging. This message is done.
2b. vger sends the message to each list member in turn. What the user
sees depends on the mail route. We move to step 3.
3a. If the next hop from vger advertises 8BITMIME in the SMTP session,
then vger submits the message intact. This is the case for me, so I see
all messages intact (which is why I needed responses from others --
specifically, I knew Karl and Bruce were seeing the problem). This
message is done.
3b. If the next hop does not advertise 8BITMIME, vger must convert the
message to a 7bit encoding (it chooses quoted-printable). Continue to
step 4.
4a. If the message has valid MIME headers, then vger can simply encode,
re-writing the content-transfer-encoding to quoted-printable and
encoding the body. vger considers the MIME headers valid when both a
MIME-Version header and a Content-Type header are present. This is the
case for the second message I sent, which appears correctly to all
recipients.
4b. If the message doesn't have valid MIME headers, then vger adds the
headers. Without a MIME-Version header, it ignores the content-type and
guesses at a suitable one, using text/plain with some totally arbitrary
local charset (in this case "iso-8859-1"). This message has now been
incorrectly munged (claims latin1 charset, but has utf8 characters).
vger puts an explanation into the X-Warning headers of the munged
message (the only unexplained thing that I had to test is that
MIME-Version is critical to vger believing the current content-type).
So recipients see the bug IFF
the original has utf8 characters
AND the original lacks a MIME-Version header
AND their mailserver doesn't claim 8BITMIME
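
The steps above can be sketched as a small Python model. This is only an illustration of the behavior described in this message, not vger's actual code: the function name `relay`, its parameters, and the plain-dict header representation are all hypothetical, and the early return for pure-ASCII bodies is an assumption (the thread only covers bodies with utf8 characters).

```python
import quopri

def relay(headers, body, next_hop_8bitmime):
    """Hypothetical model of the relay steps 3a-4b described above.

    headers: dict of header name -> value (case-sensitive here for
             simplicity; real mail headers are case-insensitive).
    body:    raw message body as bytes.
    """
    headers = dict(headers)  # do not mutate the caller's copy
    has_8bit = any(b > 0x7F for b in body)

    # Step 3a: the next hop accepts 8-bit data, so pass it through.
    # Assumption: a body with no 8-bit bytes is also left alone.
    if next_hop_8bitmime or not has_8bit:
        return headers, body

    # Step 4b: without both headers, the existing Content-Type is
    # ignored and a local default charset is substituted.
    if not ("MIME-Version" in headers and "Content-Type" in headers):
        headers["MIME-Version"] = "1.0"
        headers["Content-Type"] = "text/plain; charset=iso-8859-1"

    # Steps 4a and 4b both end with a quoted-printable body.
    headers["Content-Transfer-Encoding"] = "quoted-printable"
    return headers, quopri.encodestring(body)


# Example: the "botched" case (no MIME-Version, next hop lacks 8BITMIME).
sent = {
    "Content-Type": "text/plain; charset=utf-8",
    "Content-Transfer-Encoding": "8bit",
}
got_headers, got_body = relay(sent, "na\u00efve\n".encode("utf-8"), False)
print(got_headers["Content-Type"])
```

Adding a MIME-Version header to `sent`, or passing `True` for `next_hop_8bitmime`, leaves the utf-8 charset label in place, which matches the three-part condition.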
Interestingly, rfc1428 claims that in this case vger should actually set
the charset to "unknown-8bit":
If no information about the character set in use is available, the
gateway should upgrade the content by using the character set
"unknown-8bit". The unknown-8bit value of the charset parameter
indicates only that no reliable information about the character set(s)
used in the message was available.
Though that really just pushes the problem to the recipient's MUA, and I
have no idea what the handling of "unknown-8bit" is like there.
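
For illustration only, here is one fallback a recipient-side program could apply to a body labeled "unknown-8bit". This is a hypothetical heuristic, not something rfc1428 prescribes and not a description of what any particular MUA does; the function name `decode_unknown_8bit` is made up.

```python
def decode_unknown_8bit(raw: bytes) -> str:
    """Hypothetical fallback for a body whose charset is unknown.

    Try strict utf-8 first, since arbitrary latin1 text with 8-bit
    bytes is rarely valid utf-8; otherwise fall back to iso-8859-1,
    which can decode any byte sequence.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# Both encodings of the same (made-up) sample text decode to the same string.
print(decode_unknown_8bit("J\u00fcrgen".encode("utf-8")))
print(decode_unknown_8bit("J\u00fcrgen".encode("iso-8859-1")))
```

The guess can still be wrong for other 8-bit charsets, which is the sense in which the problem is only moved to the recipient.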
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread