From: Michal Marek <mmarek@suse.cz>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Roland Dreier <rdreier@cisco.com>,
	Sergei Trofimovich <slyich@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Sergei Trofimovich <slyfox@inbox.ru>
Subject: Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
Date: Mon, 04 Jan 2010 15:44:41 +0100
Message-ID: <4B41FED9.1060601@suse.cz>
In-Reply-To: <4B366C69.9010700@zytor.com>

On 26.12.2009 21:04, H. Peter Anvin wrote:
> On 12/25/2009 05:17 PM, Roland Dreier wrote:
>>
>>  > The whole reason with only setting some LC_* to C was to be able to
>>  > leave LC_MESSAGES intact, but it seems it breaks on too many real-life
>>  > systems.
>>
>>  > As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
>>
>> Seems unfortunate to lose localized error messages.  (Although in my
>> en_US.UTF-8 case, all I get is non-ASCII quote characters)
>>
>> This all started because of the awk invocation in arch/x86/lib.  Maybe
>> the best idea would be to confine the locale monkeying to that one
>> place?
>>
> 
> It is also possible that setting only LC_COLLATE will solve the most
> fundamental problem, which is the one of character ranges.  LC_COLLATE
> probably will interfere less with LC_MESSAGES than the setting of LC_CTYPE.

We need LC_COLLATE=C so that [a-z] really means lowercase ASCII letters
and nothing else (most importantly not uppercase letters) in awk, sed
and the shell. If we stay with LC_CTYPE=$userdefined, the meaning of
[[:classes:]] becomes nondeterministic, and so does the mapping between
lowercase and uppercase characters:

$ echo iI | LC_CTYPE=tr_TR.UTF-8 awk '{ print $0 " " toupper($0) " " tolower($0) }'
iI İI iı
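
For contrast, here is a minimal sketch of the range case with the locale
pinned. The helper name ascii_lower_lines is made up for illustration,
and it sets LC_ALL=C rather than LC_COLLATE=C alone, on the assumption
that the caller may already have LC_ALL set (LC_ALL overrides the
individual LC_* variables):

```shell
# Hypothetical helper, not part of Kbuild: keep only lines made up
# entirely of lowercase ASCII letters. LC_ALL=C pins collation so the
# [a-z] range cannot pick up uppercase or non-ASCII letters.
ascii_lower_lines() {
    LC_ALL=C awk '/^[a-z]+$/'
}

# Sample input: only the all-lowercase line should pass the filter.
printf 'abc\nABC\naBc\n' | ascii_lower_lines
```

With the locale pinned this way, only "abc" passes the filter.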

Character classes are probably not a big issue (modulo the fact that
mawk doesn't seem to support them), because the input is ASCII text
anyway. Regarding the tolower()/toupper() functions, I found one
potential troublemaker:

$ git grep -E 'to(lower|upper)' | grep -v '\.[ch]:'
arch/sh/tools/gen-mach-types:            tolower(mach[i]), mach[i]);

Maybe this awk script should be run with LC_ALL=C; people mostly care
about (localized) messages from gcc, not from awk.
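
A rough sketch of what confining the locale to a single awk invocation
could look like. This is not the actual gen-mach-types rule; the helper
name lower_ascii and the sample input TITAN are made up for
illustration:

```shell
# Hypothetical wrapper: run only this awk call in the C locale, so that
# tolower() maps 'I' to 'i' whatever LC_CTYPE the user has exported.
# The rest of the build (gcc included) keeps the user's locale.
lower_ascii() {
    LC_ALL=C awk '{ print tolower($0) }'
}

# Sample input containing 'I', the letter affected in the tr_TR case.
echo 'TITAN' | lower_ascii
```

Since the variable is set only for that one command, messages from
other tools stay localized.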

Michal

Thread overview: 7+ messages
2009-12-25 17:13 [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is) Sergei Trofimovich
2009-12-25 23:36 ` H. Peter Anvin
2009-12-26  1:17   ` Roland Dreier
2009-12-26  1:30     ` H. Peter Anvin
2009-12-26  6:58       ` Roland Dreier
2009-12-26 20:04     ` H. Peter Anvin
2010-01-04 14:44       ` Michal Marek [this message]