git.vger.kernel.org archive mirror
* Odd encoding issue with UTF-8 + gettext yields ? on non-ASCII
@ 2010-08-28 21:17 Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2010-08-28 21:17 UTC
  To: Git Mailing List; +Cc: Marcin Cieslak

I'm having an odd encoding issue with gettext on my
gettextize-git-mainporcelain branch. It hadn't turned up before
because none of the existing translations contained non-ASCII characters.

With this in is.po (full version at [is.po]):

    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"

I do:

    $ msgfmt -o /opt/git/next-gettext/share/locale/is/LC_MESSAGES/git.mo is.po

Which, under an Icelandic locale, gives me:

    $ rm -rf /tmp/meh; LANGUAGE= LC_ALL= LANG=is_IS.UTF-8 git init /tmp/meh
    Bj? til t?ma Git lind ? /tmp/meh/.git/

Those "?" characters are actual ASCII question marks.

But if I don't specify an encoding (i.e. I drop the Content-Type line), msgfmt will complain:

    $ msgfmt -o /opt/git/next-gettext/share/locale/is/LC_MESSAGES/git.mo is.po
    is.po: warning: Charset missing in header.
                    Message conversion to user's charset will not work.

Git will then emit the non-ASCII characters from its message
catalogue, though, probably because some component no longer tries to
be smart about the encoding.

    $ rm -rf /tmp/meh; LANGUAGE= LC_ALL= LANG=is_IS.UTF-8 git init /tmp/meh
    Bjó til tóma Git lind í /tmp/meh/.git/

That'd probably break under a non-UTF-8 locale such as an ISO-8859-1
one, though.
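
To illustrate the hypothesis above (that some component converts the catalogue text to the output charset only when the catalogue declares a charset), here is a toy model in Python. It is not libintl's code: the function name `emit`, the assumption that the output charset ends up being ASCII, and the choice to substitute `?` for unrepresentable characters are all assumptions picked to match the output seen in this report.

```python
def emit(msgstr_bytes, catalog_charset, output_charset):
    """Toy model of a charset conversion step; NOT libintl's actual code."""
    if catalog_charset is None:
        # No declared charset: there is nothing to convert from,
        # so the stored bytes are passed through untouched.
        return msgstr_bytes
    text = msgstr_bytes.decode(catalog_charset)
    # Assumption: characters the output charset cannot represent become "?".
    return text.encode(output_charset, errors="replace")


# The translated message from the report, stored as UTF-8 in the catalogue.
raw = "Bjó til tóma Git lind í /tmp/meh/.git/".encode("utf-8")

# Case 1: charset declared, output charset assumed to be ASCII.
with_header = emit(raw, "UTF-8", "ascii")
print(with_header)  # b'Bj? til t?ma Git lind ? /tmp/meh/.git/'

# Case 2: no charset declared, bytes pass through unchanged.
without_header = emit(raw, None, "ascii")
print(without_header == raw)  # True

# Case 3: the same pass-through bytes as a Latin-1 terminal would read them.
as_latin1 = without_header.decode("latin-1")
print(as_latin1 == raw.decode("utf-8"))  # False
```

Under this model the third case is the ISO-8859-1 concern: the untouched UTF-8 bytes would be displayed as mojibake by a Latin-1 terminal, since nothing converted them.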

A `hexdump -C` of the two `.mo` files is identical apart from the
charset header, i.e. both contain valid UTF-8 sequences. So the
issue is somewhere between the `*.mo` file being read and the message
being emitted by `libintl`'s `gettext` function.
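
As an alternative to reading a hexdump, the charset a `.mo` file declares can be pulled out directly: in the GNU `.mo` format the catalogue header is stored as the translation of the empty msgid. The following is a minimal Python sketch of that check. The helper names `build_mo` and `mo_charset` and the msgid `greeting` are hypothetical, and `build_mo` only exists to make the example self-contained; real catalogues would come from `msgfmt`.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number of a GNU .mo file (little-endian here)


def build_mo(entries):
    """Build a minimal little-endian .mo image from {msgid: msgstr} bytes."""
    items = sorted(entries.items())       # msgids must be sorted; b"" comes first
    n = len(items)
    orig_table = 28                       # the fixed header is 7 * 4 bytes
    trans_table = orig_table + 8 * n
    base = trans_table + 8 * n            # string data follows both tables
    orig_idx, trans_idx, blob = [], [], b""
    for msgid, _ in items:
        orig_idx.append((len(msgid), base + len(blob)))
        blob += msgid + b"\0"             # strings are NUL-terminated
    for _, msgstr in items:
        trans_idx.append((len(msgstr), base + len(blob)))
        blob += msgstr + b"\0"
    header = struct.pack("<7I", MO_MAGIC, 0, n, orig_table, trans_table, 0, 0)
    tables = b"".join(struct.pack("<2I", *e) for e in orig_idx + trans_idx)
    return header + tables + blob


def mo_charset(data):
    """Return the charset named in the catalogue header, or None if absent."""
    magic, _rev, n, orig_table, trans_table = struct.unpack_from("<5I", data, 0)
    if magic != MO_MAGIC:
        raise ValueError("not a little-endian .mo file")
    for i in range(n):
        mlen, _moff = struct.unpack_from("<2I", data, orig_table + 8 * i)
        if mlen != 0:
            continue                      # only the empty msgid carries the header
        tlen, toff = struct.unpack_from("<2I", data, trans_table + 8 * i)
        header = data[toff:toff + tlen].decode("ascii", "replace")
        for line in header.splitlines():
            if line.lower().startswith("content-type:") and "charset=" in line:
                return line.split("charset=", 1)[1].strip()
    return None


translated = "Bjó til tóma Git lind í %s%s\n".encode("utf-8")
with_charset = build_mo({
    b"": b"Content-Type: text/plain; charset=UTF-8\n"
         b"Content-Transfer-Encoding: 8bit\n",
    b"greeting": translated,
})
without_charset = build_mo({
    b"": b"Content-Transfer-Encoding: 8bit\n",
    b"greeting": translated,
})

print(mo_charset(with_charset))     # UTF-8
print(mo_charset(without_charset))  # None
```

Both images carry the same UTF-8 bytes for the translation; only the header entry differs, which is the situation the hexdump comparison describes.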

As far as I can see, we're not doing anything odd in our [gettext.c]
that could explain this.

To reproduce it, do:

    git clone --reference ~/g/git git://github.com/avar/git.git next-gettext
    cd next-gettext
    git checkout -t origin/gettextize-git-mainporcelain
    make -j 4 prefix=/tmp/git all install
    rm -rf /tmp/meh; LANGUAGE= LANG=is_IS.utf8 /tmp/git/bin/git init /tmp/meh

Which'll give (as mentioned above):

    Bj? til t?ma Git lind ? /tmp/meh/.git/

But editing out the Content-Type line gives:

    Bjó til tóma Git lind í /tmp/meh/.git/

[gettextize-git-mainporcelain]: http://github.com/avar/git/tree/gettextize-git-mainporcelain
[is.po]: http://github.com/avar/git/blob/gettextize-git-mainporcelain/po/is.po
[gettext.c]: http://github.com/avar/git/blob/gettextize-git-mainporcelain/gettext.c
