From: "Torsten Bögershausen" <tboegi@web.de>
To: Jeff King <peff@peff.net>
Cc: Johannes Sixt <j.sixt@viscovery.net>,
Junio C Hamano <gitster@pobox.com>,
Thomas Haller <thom311@gmail.com>, Git List <git@vger.kernel.org>
Subject: Re: [PATCH ] t4210-log-i18n: spell encoding name "UTF-8" correctly
Date: Mon, 25 Feb 2013 22:00:46 +0100
Message-ID: <512BD0FE.5040108@web.de>
In-Reply-To: <20130225151916.GA7725@sigill.intra.peff.net>

On 25.02.13 16:19, Jeff King wrote:
> On Mon, Feb 25, 2013 at 09:37:50AM +0100, Johannes Sixt wrote:
>
>> From: Johannes Sixt <j6t@kdbg.org>
>>
>> iconv on Windows does not know the encoding name "utf8", and does not
>> re-encode log messages when this name is given. Request "UTF-8" encoding.
>>
>> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
>> ---
>> I'm not sure whether I'm right to say that "UTF-8" is the correct
>> spelling. Anyway, 'iconv -l' on my old Linux box lists "UTF8", but on
>> Windows it does not.
>
> UTF-8 is correct according to:
>
> https://en.wikipedia.org/wiki/Utf8#Official_name_and_variants
>
>> A more correct fix would probably be to use is_encoding_utf8() in more
>> places, but it's outside my time budget to look after it.
>
> Yeah, I wonder if this is a symptom of a deeper issue, which is that
> utf-8 has many synonyms, and we would prefer to canonicalize the
> encoding name before generating an object to avoid inconsistencies (of
> course we cannot do so for every imaginable encoding, but utf-8 is a
> pretty obvious one we handle already). We _should_ be generating commits
> with no encoding header at all for utf-8, though.
>
> And indeed, it looks like that is the case. commit_tree_extended has:
>
> 	/* Not having i18n.commitencoding is the same as having utf-8 */
> 	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
>
> 	[...]
>
> 	if (!encoding_is_utf8)
> 		strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
>
>
> which makes me think that this first hunk...
>
>> diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
>> index 52a7472..b1956e2 100755
>> --- a/t/t4210-log-i18n.sh
>> +++ b/t/t4210-log-i18n.sh
>> @@ -15,7 +15,7 @@ test_expect_success 'create commits in different encodings' '
>> t${utf8_e}st
>> EOF
>> git add msg &&
>> - git -c i18n.commitencoding=utf8 commit -F msg &&
>> + git -c i18n.commitencoding=UTF-8 commit -F msg &&
>> cat >msg <<-EOF &&
>> latin1
>
> ...should be a no-op; the utf8 there should never be seen by anybody but
> git. Can you confirm that is the case?
>
>> @@ -30,7 +30,7 @@ test_expect_success 'log --grep searches in log output encoding (utf8)' '
>> latin1
>> utf8
>> EOF
>> - git log --encoding=utf8 --format=%s --grep=$utf8_e >actual &&
>> + git log --encoding=UTF-8 --format=%s --grep=$utf8_e >actual &&
>> test_cmp expect actual
>> '
>
> This one will feed it to iconv, though, because the latin1 commit will
> need to be re-encoded. I think the simplest thing would just be:
>
> diff --git a/utf8.c b/utf8.c
> index 1087870..8d42b50 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -507,6 +507,17 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
>
>  	if (!in_encoding)
>  		return NULL;
> +
> +	/*
> +	 * Some platforms do not have the variously spelled variants of
> +	 * UTF-8, so let us feed iconv the most official spelling, which
> +	 * should hopefully be accepted everywhere.
> +	 */
> +	if (is_encoding_utf8(in_encoding))
> +		in_encoding = "UTF-8";
> +	if (is_encoding_utf8(out_encoding))
> +		out_encoding = "UTF-8";
> +
>  	conv = iconv_open(out_encoding, in_encoding);
>  	if (conv == (iconv_t) -1)
>  		return NULL;
>
> Does that fix the tests for you? It's a larger change, but I think it
> makes git friendlier all around for people on Windows.
>
> -Peff
> --
Thanks, I'm OK with your version.
A test run on Cygwin passed with the new t4210.
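
For anyone following along, here is a minimal standalone sketch of the
normalization idea in the quoted utf8.c hunk: map every recognized
spelling of UTF-8 to the one official name before handing it to iconv.
The helper names looks_like_utf8() and canonical_encoding() are made up
for illustration and are not git's code; the set of accepted spellings
("utf-8" and "utf8", case-insensitive) is likewise an assumption.

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/*
 * Hypothetical helper, not git's implementation: return 1 if 'name'
 * is a spelling of UTF-8 that we recognize.  NULL is treated as
 * UTF-8 here, in the spirit of "no encoding set means utf-8".
 */
static int looks_like_utf8(const char *name)
{
	if (!name)
		return 1;
	return !strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8");
}

/*
 * Hypothetical helper: map any recognized UTF-8 spelling to the
 * official name "UTF-8"; pass every other encoding name through
 * unchanged.
 */
static const char *canonical_encoding(const char *name)
{
	return looks_like_utf8(name) ? "UTF-8" : name;
}
```

A caller would then use something like
iconv_open(canonical_encoding(out_encoding), canonical_encoding(in_encoding)),
so that "utf8" never reaches an iconv that only knows "UTF-8".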