From: Michal Marek <mmarek@suse.cz>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Roland Dreier <rdreier@cisco.com>,
	Sergei Trofimovich <slyich@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Sergei Trofimovich <slyfox@inbox.ru>
Subject: Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
Date: Mon, 04 Jan 2010 15:44:41 +0100
Message-ID: <4B41FED9.1060601@suse.cz>
In-Reply-To: <4B366C69.9010700@zytor.com>

On 26.12.2009 21:04, H. Peter Anvin wrote:
> On 12/25/2009 05:17 PM, Roland Dreier wrote:
>>
>>  > The whole reason with only setting some LC_* to C was to be able to
>>  > leave LC_MESSAGES intact, but it seems it breaks on too many real-life
>>  > systems.
>>
>>  > As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
>>
>> Seems unfortunate to lose localized error messages.  (Although in my
>> en_US.UTF-8 case, all I get is non-ASCII quote characters)
>>
>> This all started because of the awk invocation in arch/x86/lib.  Maybe
>> the best idea would be to confine the locale monkeying to that one
>> place?
>>
> 
> It is also possible that setting only LC_COLLATE will solve the most
> fundamental problem, which is the one of character ranges.  LC_COLLATE
> probably will interfere less with LC_MESSAGES than the setting of LC_CTYPE.

We need LC_COLLATE=C so that [a-z] really means lowercase ASCII letters
and nothing else (most importantly not uppercase letters) in awk, sed
and the shell. If we stay with LC_CTYPE=$userdefined, the meaning of
[[:classes:]] becomes nondeterministic, and so does the mapping between
lowercase and uppercase characters:

$ echo iI | LC_CTYPE=tr_TR.UTF-8 awk '{ print $0 " " toupper($0) " " tolower($0) }'
iI İI iı
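
For comparison, here is a minimal sketch of how a range behaves once the
locale is pinned. The helper name strip_lower is made up for illustration,
and LC_ALL=C is used only because it implies LC_COLLATE=C regardless of
what the caller's environment sets:

```shell
# Hypothetical helper (not part of Kbuild): delete everything matched by
# [a-z] with the locale forced to C, so the range covers only the ASCII
# lowercase letters a through z.
strip_lower() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[a-z]//g'
}

strip_lower aBcD    # prints: BD
```

With a user-defined collation order the same range may also match
uppercase letters, which is the case LC_COLLATE=C is meant to rule out.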

Character classes are probably not a big issue (modulo the fact that
mawk doesn't seem to support them), because the input is ASCII text
anyway. Regarding the tolower()/toupper() functions, I found one
potential troublemaker:

$ git grep -E 'to(lower|upper)' | grep -v '\.[ch]:'
arch/sh/tools/gen-mach-types:            tolower(mach[i]), mach[i]);

Maybe this awk script should be run with LC_ALL=C; people mostly care
about (localized) messages from gcc, not from awk.
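
A rough sketch of what that would look like for a single awk invocation.
The function name ascii_case is an assumption for illustration and is not
how gen-mach-types is actually invoked:

```shell
# Hypothetical wrapper: set LC_ALL=C for this one awk process only, so
# toupper()/tolower() use the ASCII mapping while the rest of the build
# (gcc included) keeps the user's locale and localized messages.
ascii_case() {
    printf '%s\n' "$1" | LC_ALL=C awk '{ print toupper($0) " " tolower($0) }'
}

ascii_case iI    # prints: II ii
```

Because the variable assignment prefixes only the awk command, nothing is
exported to other processes.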

Michal

Thread overview: 7+ messages
2009-12-25 17:13 [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is) Sergei Trofimovich
2009-12-25 23:36 ` H. Peter Anvin
2009-12-26  1:17   ` Roland Dreier
2009-12-26  1:30     ` H. Peter Anvin
2009-12-26  6:58       ` Roland Dreier
2009-12-26 20:04     ` H. Peter Anvin
2010-01-04 14:44       ` Michal Marek [this message]
