git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Robin Rosenberg <robin.rosenberg.lists@dewire.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 2/2] send-email: rfc2047-quote subject lines with non-ascii characters
Date: Sat, 29 Mar 2008 05:11:45 -0400
Message-ID: <20080329091145.GA19501@coredump.intra.peff.net>
In-Reply-To: <200803291002.43768.robin.rosenberg.lists@dewire.com>

On Sat, Mar 29, 2008 at 10:02:43AM +0100, Robin Rosenberg wrote:

> My proof is entirely empirical. What happens is that attempting to decode a 
> non-UTF-8 string will put a unicode surrogate pair into the (now Unicode) 
> string and encoding will just encode the surrogate pair into UTF-8 and not 
> the original. As a result, the encode(decode($x)) eq $x *only* if $x is a
> valid UTF-8 octet sequence. Why would you not get the original back if
> you start with valid UTF-8?

Because some characters can be written as more than one byte sequence
(an overlong encoding that a lax decoder accepts, for example), and that
information may be lost by whatever intermediate form is the result of
decode($x). In practice, I don't know if this happens or not.
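
To make the round-trip test concrete, here is a minimal sketch. It is
written in Python rather than Perl, so it is only an analogy to Encode's
decode/encode and not the send-email code. The helper name roundtrips is
made up for illustration, and errors="replace" is assumed to stand in for
a lenient decode that substitutes U+FFFD for malformed input.

```python
def roundtrips(raw: bytes) -> bool:
    """Return True if decoding and re-encoding gives back the same bytes."""
    # errors="replace" turns each malformed byte into U+FFFD instead of raising,
    # so invalid input decodes "successfully" but cannot be recovered.
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8") == raw


valid = "Ros\u00e9n".encode("utf-8")     # b'Ros\xc3\xa9n'
latin1 = "Ros\u00e9n".encode("latin-1")  # b'Ros\xe9n', not valid UTF-8
overlong = b"\xc0\xaf"                   # overlong two-byte form of '/'

print(roundtrips(valid), roundtrips(latin1), roundtrips(overlong))
# prints: True False False
```

Python's decoder rejects overlong forms outright, so in this sketch
anything that decodes cleanly re-encodes to the same bytes. Whether
Perl's decode behaves the same way is the open question above.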

Though it looks like there is an Encode::is_utf8 function (which is also
utf8::is_utf8, but only in perl >= 5.8.1). So we could use that, but it
needs the utf-8 flag turned on for the string. Maybe utf8::valid is
actually what we want.
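
For comparison, a direct validity check on raw octets can be sketched as
a strict decode that either succeeds or fails. This is again Python, not
Perl, and is_valid_utf8 is a hypothetical helper name. As far as I can
tell from the perl documentation, utf8::is_utf8 and utf8::valid both
report on a string's internal state, so neither is necessarily a drop-in
equivalent of this check.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 octet sequence."""
    try:
        # bytes.decode is strict by default and raises on malformed input
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"Ros\xc3\xa9n"))  # prints: True
print(is_valid_utf8(b"Ros\xe9n"))      # prints: False
```

The point of the sketch is only that the test operates on bytes before
any decoding has happened, which is the situation a subject header is in
when we have to decide how to quote it.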

But there is still a larger question. You have some binary bytes that
will go in a subject header. There are non-ascii bytes. There are
non-utf8 sequences. What do you do?

-Peff

