From: Michal Marek <mmarek@suse.cz>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Roland Dreier <rdreier@cisco.com>,
Sergei Trofimovich <slyich@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org,
Sergei Trofimovich <slyfox@inbox.ru>
Subject: Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
Date: Mon, 04 Jan 2010 15:44:41 +0100
Message-ID: <4B41FED9.1060601@suse.cz>
In-Reply-To: <4B366C69.9010700@zytor.com>
On 26.12.2009 21:04, H. Peter Anvin wrote:
> On 12/25/2009 05:17 PM, Roland Dreier wrote:
>>
>> > The whole reason with only setting some LC_* to C was to be able to
>> > leave LC_MESSAGES intact, but it seems it breaks on too many real-life
>> > systems.
>>
>> > As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
>>
>> Seems unfortunate to lose localized error messages. (Although in my
>> en_US.UTF-8 case, all I get is non-ASCII quote characters)
>>
>> This all started because of the awk invocation in arch/x86/lib. Maybe
>> the best idea would be to confine the locale monkeying to that one
>> place?
>>
>
> It is also possible that setting only LC_COLLATE will solve the most
> fundamental problem, which is the one of character ranges. LC_COLLATE
> probably will interfere less with LC_MESSAGES than the setting of LC_CTYPE.
We need LC_COLLATE=C so that [a-z] really means lowercase ASCII letters
and nothing else (most importantly not uppercase letters) in awk, sed
and the shell. If we stay with LC_CTYPE=$userdefined, the meaning of
[[:classes:]] becomes nondeterministic, and so does the mapping between
lowercase and uppercase characters:
$ echo iI | LC_CTYPE=tr_TR.UTF-8 awk '{ print $0 " " toupper($0) " " tolower($0) }'
iI İI iı
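For comparison, here is a rough sketch of the same one-liner under the C locale, plus a quick check of what [a-z] matches there. The variable names out and stripped are only for illustration, and LC_ALL=C is set per command rather than exported:

```shell
# Sketch only: force the C locale for these two commands, not for the whole build.
# In the C locale, toupper()/tolower() use the plain ASCII mapping.
out=$(echo iI | LC_ALL=C awk '{ print $0 " " toupper($0) " " tolower($0) }')
echo "$out"    # iI II ii

# With C collation, [a-z] covers only the 26 lowercase ASCII letters,
# so the uppercase B survives the deletion.
stripped=$(echo aBc | LC_ALL=C sed 's/[a-z]//g')
echo "$stripped"    # B
```

Because the assignment is a prefix on each command, nothing else in the calling shell sees a changed locale.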
Character classes are probably not a big issue (modulo the fact that
mawk doesn't seem to support them), because the input is ASCII text
anyway. Regarding the tolower()/toupper() functions, I found one
potential troublemaker:
$ git grep -E 'to(lower|upper)' | grep -v '\.[ch]:'
arch/sh/tools/gen-mach-types: tolower(mach[i]), mach[i]);
Maybe this awk script should be run with LC_ALL=C; people mostly care
about (localized) messages from gcc, not from awk.
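A minimal sketch of what confining the locale to the awk call could look like. The awk_c helper and the MACH_TYPE_FOO input are hypothetical, not taken from gen-mach-types:

```shell
# Hypothetical helper: run awk in the C locale without exporting anything,
# so gcc and other tools still see the user's LC_MESSAGES.
awk_c() {
    LC_ALL=C awk "$@"
}

# Hypothetical stand-in for a machine-type name passed through tolower().
lowered=$(echo 'MACH_TYPE_FOO' | awk_c '{ print tolower($0) }')
echo "$lowered"    # mach_type_foo
```

The override lives only in the environment of that one awk process, so the rest of the build keeps whatever locale the user has set.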
Michal