From: Robin Rosenberg <robin.rosenberg.lists@dewire.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 2/2] send-email: rfc2047-quote subject lines with non-ascii characters
Date: Sat, 29 Mar 2008 10:02:43 +0100 [thread overview]
Message-ID: <200803291002.43768.robin.rosenberg.lists@dewire.com> (raw)
In-Reply-To: <20080329084947.GB19200@coredump.intra.peff.net>
Den Saturday 29 March 2008 09.49.48 skrev Jeff King:
> On Sat, Mar 29, 2008 at 09:41:53AM +0100, Robin Rosenberg wrote:
> > > OK. Do you have an example function that guesses with high probability
> > > whether a string is utf-8? If there are non-ascii characters but we
> > > _don't_ guess utf-8, what should we do?
> >
> > Any test for valid UTF-8 will do that with a very high probability. The
> > perl UTF-8 "api" is a mess. I couldn't find such a routine!?. Calling
> > decode/encode and see if you get the original string works, but that is
> > too clumsy, IMHO.
>
> Does that work? I would think you would have to compare the normalized
> versions of each string, since decode(encode($x)) is not, AIUI,
> guaranteed to produce $x.
I don't claim to understand it either. Hopefully some perl guru will step
forward and just explain how to do this in perl.
My proof is entirely empirical. What happens is that attempting to decode a
non-UTF-8 string will put a unicode surrogate pair into the (now Unicode)
string and encoding will just encode the surrogate pair into UTF-8 and not
the original. As a result, the encode(decode($x)) eq $x *only* if $x is a
valid UTF-8 octet sequence. Why would you not get the original back if
you start with valid UTF-8?
-- robin
next prev parent reply other threads:[~2008-03-29 9:04 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-03-28 6:30 [ANNOUNCE] GIT 1.5.5-rc2 Junio C Hamano
2008-03-28 18:13 ` Jeff King
2008-03-28 21:05 ` Junio C Hamano
2008-03-28 21:23 ` Jeff King
2008-03-28 21:27 ` Jeff King
2008-03-28 21:28 ` [PATCH 1/2] send-email: specify content-type of --compose body Jeff King
2008-03-28 21:29 ` [PATCH 2/2] send-email: rfc2047-quote subject lines with non-ascii characters Jeff King
2008-03-29 7:19 ` Robin Rosenberg
2008-03-29 7:22 ` Jeff King
2008-03-29 8:41 ` Robin Rosenberg
2008-03-29 8:49 ` Jeff King
2008-03-29 9:02 ` Robin Rosenberg [this message]
2008-03-29 9:11 ` Jeff King
2008-03-29 9:39 ` Robin Rosenberg
2008-03-29 9:43 ` Jeff King
2008-03-29 12:54 ` Robin Rosenberg
2008-03-29 21:45 ` Jeff King
2008-03-30 3:40 ` Sam Vilain
2008-03-30 4:39 ` Jeff King
2008-03-30 23:47 ` Junio C Hamano
2008-03-29 8:44 ` Robin Rosenberg
2008-03-29 8:53 ` Jeff King
2008-03-29 9:38 ` Robin Rosenberg
2008-03-29 9:52 ` Jeff King
2008-03-29 12:54 ` Robin Rosenberg
2008-03-29 21:18 ` Jeff King
2008-03-29 21:43 ` Robin Rosenberg
2008-03-29 22:00 ` Jeff King
2008-03-30 2:12 ` Sam Vilain
2008-03-30 4:31 ` Jeff King
2008-05-21 19:39 ` Junio C Hamano
2008-05-21 19:47 ` Jeff King
[not found] <7caf19ae394accab538d2f94953bb62b55a2c79f.1206486012.git.peff@peff.net>
2008-03-25 23:03 ` Jeff King
2008-03-26 5:59 ` Teemu Likonen
2008-03-26 6:20 ` Jeff King
2008-03-26 8:30 ` Teemu Likonen
2008-03-26 8:39 ` Jeff King
2008-03-26 9:23 ` Teemu Likonen
2008-03-26 9:32 ` Teemu Likonen
2008-03-26 9:35 ` Jeff King
2008-03-26 9:33 ` Jeff King
2008-03-27 7:38 ` Jeff King
2008-03-27 19:44 ` Todd Zullinger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200803291002.43768.robin.rosenberg.lists@dewire.com \
--to=robin.rosenberg.lists@dewire.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).