From: Junio C Hamano <gitster@pobox.com>
To: Robin Rosenberg <robin.rosenberg.lists@dewire.com>
Cc: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: [PATCH 2/2] send-email: rfc2047-quote subject lines with non-ascii characters
Date: Sun, 30 Mar 2008 16:47:16 -0700
Message-ID: <7vwsnjwz97.fsf@gitster.siamese.dyndns.org>
In-Reply-To: <200803290941.54091.robin.rosenberg.lists@dewire.com>

Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:

> On Saturday 29 March 2008 at 08.22.03, Jeff King wrote:
>> On Sat, Mar 29, 2008 at 08:19:07AM +0100, Robin Rosenberg wrote:
>> > On Friday 28 March 2008 at 22.29.01, Jeff King wrote:
>> > > We always use 'utf-8' as the encoding, since we currently
>> > > have no way of getting the information from the user.
>> >
>> > Don't set encoding to UTF-8 unless it actually looks like UTF-8.
>>
>> OK. Do you have an example function that guesses with high probability
>> whether a string is utf-8? If there are non-ascii characters but we
>> _don't_ guess utf-8, what should we do?
>
> Any test for valid UTF-8 will do that with a very high probability. The
> perl UTF-8 "api" is a mess. I couldn't find such a routine!? Calling
> decode/encode and seeing if you get the original string back works, but
> that is too clumsy, IMHO.

Decoding and then re-encoding tests both whether the string is valid and
whether it is canonically encoded, which is testing too much.  You only
want to check if it is valid, and do not care about normalization.

I see this in perluniintro.pod:

    =item *

    How Do I Detect Data That's Not Valid In a Particular Encoding?

    Use the C<Encode> package to try converting it.
    For example,

        use Encode 'decode_utf8';
        if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
            # valid
        } else {
            # invalid
        }
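
One caveat about the snippet above: as far as I know, decode_utf8() called
without a CHECK argument substitutes U+FFFD for malformed bytes instead of
failing, so its return value is true for almost any input. Below is a
minimal sketch of a stricter test, assuming the Encode module's FB_CROAK
mode. The helper name is_valid_utf8 is hypothetical and is not part of
git-send-email.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if $bytes is a well-formed UTF-8 byte
# sequence, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # With a CHECK argument decode() may modify its input in place,
    # so work on a copy and leave the caller's string untouched.
    my $copy = $bytes;
    # FB_CROAK makes decode() die on malformed input; eval catches that.
    return eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Usage: the first string is UTF-8 encoded, the second is Latin-1.
for my $s ("Bj\xC3\xB6rn", "Bj\xF6rn") {
    printf "%s\n", is_valid_utf8($s) ? "valid" : "invalid";
}
```

This checks validity only. It does not re-encode the string, so it does not
also test for canonical encoding.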

For commit log messages, we traditionally use a similar idea: guess by
checking whether the message looks like a UTF-8 encoded string, and otherwise
assume Latin-1 (and I think we still do if the user does not tell us).
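
The guess described above could be sketched roughly as follows. This is an
illustration only, not the code git uses: guess_charset is a hypothetical
name, and returning 'us-ascii' for pure-ASCII input is an added assumption
so that such strings are not labelled at all.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: guess a charset label for a string of bytes.
sub guess_charset {
    my ($bytes) = @_;
    # Assumption: pure ASCII needs no guessing.
    return 'us-ascii' unless $bytes =~ /[^\x00-\x7F]/;
    # Strict decode on a copy; FB_CROAK dies on malformed UTF-8.
    my $copy = $bytes;
    my $ok = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    # Valid UTF-8 is taken as UTF-8; anything else falls back to Latin-1.
    return $ok ? 'UTF-8' : 'ISO-8859-1';
}

print guess_charset("Bj\xF6rn"), "\n";
```

The fallback can be wrong, since any byte sequence is "valid" Latin-1; that
is the case where asking the user may be the better option.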

If this issue is only about the --compose part of send-email, perhaps you
can interactively ask instead of "otherwise assume Latin-1"?
