From: Eric Blake <eblake@redhat.com>
To: Markus Armbruster <armbru@redhat.com>, Alberto Garcia <berto@igalia.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Gerd Hoffmann <kraxel@redhat.com>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] gtk: use setlocale() for LC_MESSAGES only
Date: Mon, 21 Dec 2015 10:49:27 -0700
Message-ID: <56783BA7.7030700@redhat.com>
In-Reply-To: <878u4ry1jw.fsf@blackfin.pond.sub.org>
On 12/18/2015 12:55 PM, Markus Armbruster wrote:
> Alberto Garcia <berto@igalia.com> writes:
>
>>>>> We do however have translations for a few simple strings for the GTK+
>>>>> menu items, so in order to run QEMU using the C locale, and yet have a
>>>>> translated UI let's use setlocale() for LC_MESSAGES only.
>>>>>
>>>> Not sure why I noticed it only now and if it's related to any recent
>>>> package upgrade on my side (using RHEL 7), but I noticed that
>>>> non-ASCII characters in the GTK UI strings are broken for me and git
>>>> bisect pointed to this commit.
>>>
>>> I guess we need to set LC_CTYPE too.
>>
>> That affects functions in ctype.h (isalpha(), islower(), isupper(), ...)
>> I guess that's safe?
Gnulib provides functions named c_isalpha(), c_islower(), and so
forth, which behave identically regardless of the current locale,
precisely because locale-dependent definitions of which byte sequences
form a valid character can cause undesirable behavior. I don't know
whether glib does the same, but setting LC_CTYPE does indeed have the
potential to affect us, at least in util/id.c:id_wellformed(). It would
be weird to let the user's choice of locale determine which ids they
can create.
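
To illustrate the concern, here is a minimal sketch of a locale-independent
id check in the spirit of gnulib's c_isalpha(). The helper names and the
exact id rule (a letter first, then letters, digits, '-', '.', '_') are
assumptions made for illustration, not a copy of util/id.c, and the range
comparisons assume an ASCII-compatible execution character set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers: ASCII-only classification that never consults
 * the current locale, unlike isalpha()/isdigit() from <ctype.h>.
 * Assumes an ASCII-compatible execution character set. */
static bool ascii_isalpha(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool ascii_isdigit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* Hypothetical id rule, assumed for illustration: the first byte must be
 * an ASCII letter; later bytes may be ASCII letters, digits, '-', '.'
 * or '_'. The result does not depend on LC_CTYPE. */
bool example_id_wellformed(const char *id)
{
    if (!ascii_isalpha((unsigned char)id[0])) {
        return false;
    }
    for (size_t i = 1; id[i] != '\0'; i++) {
        unsigned char c = (unsigned char)id[i];
        if (!ascii_isalpha(c) && !ascii_isdigit(c)
            && c != '-' && c != '.' && c != '_') {
            return false;
        }
    }
    return true;
}
```

With a check written this way, a byte such as 0xE9 is rejected no matter
which locale is active, whereas isalpha() may accept it under a Latin-1
LC_CTYPE.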
>
> If we're guessing, then I guess it isn't. But we shouldn't be guessing.
>
> "LC_CTYPE affects the behavior of the character handling functions and
> the multibyte and wide character functions."
>
> I doubt there's much use for the latter in QEMU itself, but in
> libraries, all bets are off. I guess this is what actually screws up
> GTK.
>
> We do use the former. LC_CTYPE set to some sufficiently funky locale is
> bound to upset these uses.
>
> In short: nope, we can't just set LC_CTYPE, at least not without further
> analysis.
In fact, if LC_CTYPE and LC_COLLATE are incompatible, then strcoll() has
undefined behavior. GNU coreutils warns:

    Unless otherwise specified, all comparisons use the character
    collating sequence specified by the ‘LC_COLLATE’ locale.(1)
    [...]
    (1) If you use a non-POSIX locale (e.g., by setting ‘LC_ALL’ to
    ‘en_US’), then ‘sort’ may produce output that is sorted differently than
    you’re accustomed to. In that case, set the ‘LC_ALL’ environment
    variable to ‘C’. Note that setting only ‘LC_COLLATE’ has two problems.
    First, it is ineffective if ‘LC_ALL’ is also set. Second, it has
    undefined behavior if ‘LC_CTYPE’ (or ‘LANG’, if ‘LC_CTYPE’ is unset) is
    set to an incompatible value. For example, you get undefined behavior
    if ‘LC_CTYPE’ is ‘ja_JP.PCK’ but ‘LC_COLLATE’ is ‘en_US.UTF-8’.

Off-hand, we are specifically NOT calling setlocale() for the categories
that we want to leave in the C locale, so we don't have to worry about
LC_ALL throwing us off. And I'm hard-pressed to think of an example
where LC_COLLATE=C while LC_CTYPE names a multibyte character set will
cause unusual sorting artifacts (the case that coreutils is warning
against is when two incompatible multibyte character sets are involved,
whereas our case is a multibyte character set for display but a unibyte
set for collation). But it is indeed a can of worms that requires
careful analysis.
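
As a rough sketch of the first point, the function below sets only
LC_MESSAGES from the environment and then queries LC_COLLATE. The function
name is hypothetical, and the sketch assumes a POSIX system (LC_MESSAGES
is not part of ISO C) and a process that has not called setlocale() before.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if LC_COLLATE is still the C/POSIX
 * locale after LC_MESSAGES alone is taken from the environment. */
int collate_stays_c_after_messages(void)
{
    /* Only the named category is changed; LC_ALL or LANG in the
     * environment influence just this one category here. The call may
     * return NULL if the environment names an unknown locale, in which
     * case nothing changes. */
    setlocale(LC_MESSAGES, "");

    /* A NULL locale argument queries the category without changing it. */
    const char *collate = setlocale(LC_COLLATE, NULL);

    return collate != NULL
        && (strcmp(collate, "C") == 0 || strcmp(collate, "POSIX") == 0);
}
```

Applying the same pattern to LC_CTYPE would change character
classification as well, which is the part that needs the analysis
discussed above.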
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org