* Re: JFS default behavior
2004-02-13 18:31 ` Richard B. Johnson
@ 2004-02-13 18:50 ` Ulrich Drepper
0 siblings, 0 replies; 29+ messages in thread
From: Ulrich Drepper @ 2004-02-13 18:50 UTC (permalink / raw)
To: root; +Cc: viro, Nicolas Mailhot, Jamie Lokier, linux-kernel
Richard B. Johnson wrote:
> I think that all ASCII characters below 0x20 are forbidden in
> Unix file-names
Not true. Filenames in Unix are defined as
3.169 Filename
A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
characters composing the name may be selected from the set of all
character values excluding the slash character and the null byte. The
filenames dot and dot-dot have special meaning. A filename is
sometimes referred to as a pathname component.
Only NUL and / are special.
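A minimal Python sketch of the quoted definition, assuming NAME_MAX is 255 (a common Linux value; the real limit depends on the filesystem and can be queried with os.pathconf). The helper name is_valid_filename is hypothetical. It treats the name purely as bytes, so control characters below 0x20 pass, while the slash and the null byte do not.

```python
NAME_MAX = 255  # assumption: typical Linux value; os.pathconf(path, "PC_NAME_MAX") gives the real one


def is_valid_filename(name: bytes) -> bool:
    """Check a single filename (pathname component) against the quoted definition."""
    if not 1 <= len(name) <= NAME_MAX:
        return False
    # Only the slash (0x2f) and the null byte (0x00) are excluded.
    # "." and ".." pass this check; their special meaning applies during path lookup.
    return b"/" not in name and b"\x00" not in name


print(is_valid_filename(b"tab\there\x1b[0m"))  # True: control bytes are legal
print(is_valid_filename(b"a/b"))               # False: contains a slash
```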
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: JFS default behavior
[not found] <04Feb13.163954est.41760@gpu.utcc.utoronto.ca>
@ 2004-02-14 14:27 ` Nicolas Mailhot
2004-02-14 15:40 ` viro
0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Mailhot @ 2004-02-14 14:27 UTC (permalink / raw)
To: chris.siebenmann; +Cc: linux-kernel
Chris Siebenmann wrote:
> You write:
> | So what ?
> | Do you think an app that expects utf-8 filenames won't crash today when
> | served a byte sequence that's invalid UTF-8 ? (or an app that expects
> | ascii when served utf-8 oddities)
>
> Such apps are buggy and need to be fixed.
Well, that means every single Java app right now, at least.
> This is not Unix's problem,
The Y2K problem was mostly at the application level.
It would not have been the OS's responsibility to fix it.
*However*, since the Unix time conventions were a bit saner than those
of other OSes, the damage was smaller.
> any more than it is Unix's problem if an application frees memory twice,
> writes over unallocated memory, or destroys its stack.
The core OS responsibility is to share resources sanely between apps.
Filenames are a shared resource.
When encodings start to be incompatible, resulting in application
crashes, it's the OS's job to define and enforce sane conventions so apps
can coexist.
Past oversights should not mean the problem should not be fixed
(especially if solutions exist, even if they are not totally painless).
There is no more justification to keep encoding undefined than there is to
keep time zone undefined. Last I've seen we're all pretty happy system
time actually means something on unix (unlike other systems where it can
be anything depending on the location where the initial installation was
performed).
> If all you care about is the future, you need no kernel support.
> Declare that all filesystem names are written in UTF-8, and make your
> tools deal with it. (Most will not care. A few will have to be fixed a
> bit.)
Tools won't change unless they're forced to. That's a plain fact.
As you wrote there shouldn't be a lot of fixups to do, since apps that
can't deal with utf-8 now use ascii-only filenames anyway, but the few
fixups that are needed won't happen without a little OS prodding.
(and without OS enforcement illegal utf-8 filename injection will remain
a security risk)
And I write utf8 here, but any unicode form is fine with me as long as
it's clearly defined and enforced by the FSs.
Cheers,
--
Nicolas Mailhot
* Re: JFS default behavior
2004-02-14 14:27 ` Nicolas Mailhot
@ 2004-02-14 15:40 ` viro
2004-02-14 17:47 ` Nicolas Mailhot
2004-02-14 23:06 ` Robin Rosenberg
0 siblings, 2 replies; 29+ messages in thread
From: viro @ 2004-02-14 15:40 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: chris.siebenmann, linux-kernel
On Sat, Feb 14, 2004 at 03:27:50PM +0100, Nicolas Mailhot wrote:
> There is no more justification to keep encoding undefined than there is to
> keep time zone undefined. Last I've seen we're all pretty happy system
> time actually means something on unix (unlike other systems where it can
> be anything depending on the location where the initial installation was
> performed).
"System time" is amount of time elapsed since the epoch. Period. What does
it have to do with any timezone?
The only place where timezone enters the picture is conversion of time to
year:month:day:hours:minutes:seconds and that's
a) process-dependent and
b) done outside of kernel
The same goes for file names. Filename is a sequence of bytes, no more and
no less. Anything beyond that belongs to applications.
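To illustrate the point, a small Python sketch: the kernel keeps only a count of seconds since the epoch, and turning it into a calendar date is a per-process presentation step in user space. The value below corresponds to 2004-02-14 15:40:00 UTC; fixed UTC offsets are used instead of named zones so the example does not depend on a tz database.

```python
from datetime import datetime, timedelta, timezone

SECONDS_SINCE_EPOCH = 1076773200  # 2004-02-14 15:40:00 UTC


def render(seconds: int, utc_offset_hours: int) -> str:
    """Format an epoch timestamp for a viewer at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz).strftime("%Y-%m-%d %H:%M:%S")


# Same stored value, three different human-readable renderings.
for offset in (0, 1, -5):
    print(offset, render(SECONDS_SINCE_EPOCH, offset))
```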
* Re: JFS default behavior
2004-02-14 15:40 ` viro
@ 2004-02-14 17:47 ` Nicolas Mailhot
2004-02-14 17:59 ` Nicolas Mailhot
2004-02-14 23:06 ` Robin Rosenberg
1 sibling, 1 reply; 29+ messages in thread
From: Nicolas Mailhot @ 2004-02-14 17:47 UTC (permalink / raw)
To: viro; +Cc: chris.siebenmann, linux-kernel
viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Sat, Feb 14, 2004 at 03:27:50PM +0100, Nicolas Mailhot wrote:
>
>>There is no more justification to keep encoding undefined than there is to
>>keep time zone undefined. Last I've seen we're all pretty happy system
>>time actually means something on unix (unlike other systems where it can
>>be anything depending on the location where the initial installation was
>>performed).
>
>
> "System time" is amount of time elapsed since the epoch. Period. What does
> it have to do with any timezone?
And everyone agrees on the epoch, and that's why it works
(just like sensor output is not just any numerical value but has a
well-defined unit).
With filenames we have a value but what it means exactly is a matter of
conjecture. That's the problem.
(it wouldn't be if filenames were just magic cookies that never needed
to be interpreted but there's a lot of actors, be it apps or humans that
need to agree on what the byte string)
Cheers,
--
Nicolas Mailhot
* Re: JFS default behavior
2004-02-14 17:47 ` Nicolas Mailhot
@ 2004-02-14 17:59 ` Nicolas Mailhot
0 siblings, 0 replies; 29+ messages in thread
From: Nicolas Mailhot @ 2004-02-14 17:59 UTC (permalink / raw)
To: linux-kernel; +Cc: viro, chris.siebenmann
Nicolas Mailhot wrote:
> to be interpreted but there's a lot of actors, be it apps or humans that
> need to agree on what the byte string)
... actually means
(bad proofreading, sorry)
--
Nicolas Mailhot
* Re: JFS default behavior
2004-02-14 15:40 ` viro
2004-02-14 17:47 ` Nicolas Mailhot
@ 2004-02-14 23:06 ` Robin Rosenberg
2004-02-14 23:29 ` viro
1 sibling, 1 reply; 29+ messages in thread
From: Robin Rosenberg @ 2004-02-14 23:06 UTC (permalink / raw)
To: viro; +Cc: Linux kernel
On Saturday 14 February 2004 16.40, you wrote:
> The same goes for file names. Filename is a sequence of bytes, no more and
> no less. Anything beyond that belongs to applications.
Should be a sequence of characters since humans are supposed to use them and
it should be the same characters whenever possible regardless of user's locale.
The "sequence of bytes" idea is a legacy from prehistoric times when byte == character
was true. That is no longer the case and actually hasn't been for quite a while in
some parts of the world. Interchange is important. The application cannot handle
this since it cannot know what characters a byte string represents. Fixing it in the
kernel is the simple solution since it knows the locale. It's also a small change, I
believe. Having an iocharset option for all file systems makes it backward compatible
and creates a migration path to UTF-8 as system default locale.
-- robin
* Re: JFS default behavior
2004-02-14 23:06 ` Robin Rosenberg
@ 2004-02-14 23:29 ` viro
2004-02-15 0:07 ` Robin Rosenberg
0 siblings, 1 reply; 29+ messages in thread
From: viro @ 2004-02-14 23:29 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: Linux kernel
On Sun, Feb 15, 2004 at 12:06:23AM +0100, Robin Rosenberg wrote:
> On Saturday 14 February 2004 16.40, you wrote:
> > The same goes for file names. Filename is a sequence of bytes, no more and
> > no less. Anything beyond that belongs to applications.
>
> Should be a sequence of characters since humans are supposed to use them and
> it should be the same characters whenever possible regardless of user's locale.
> The "sequence of bytes" idea is a legacy from prehistoric times when byte == character
> was true.
Bullshit. It has _nothing_ to do with characters, wide or not. For the system,
filenames are opaque. The only things that have special meanings are:
octet 0x2f ('/') splits the pathname into components
"." as a component has a special meaning
".." as a component has a special meaning.
That's it. The rest is never interpreted by the kernel.
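A rough user-space model of the rule described above, not the kernel's actual path walk (which also deals with symlinks, mount points and permissions). split_path and component_kind are hypothetical names. Both a Latin-1 name and a UTF-8 name pass through untouched.

```python
def split_path(path: bytes) -> list:
    """Split a pathname on the octet 0x2f; empty components (from '//') are dropped."""
    return [part for part in path.split(b"/") if part]


def component_kind(component: bytes) -> str:
    if component == b".":
        return "current directory"
    if component == b"..":
        return "parent directory"
    return "opaque name"  # any other byte sequence is never interpreted


for part in split_path(b"/home//zo\xe9/./../caf\xc3\xa9"):
    print(part, component_kind(part))
```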
> Having an iocharset option for all file systems makes it backward compatible
> and creates a migration path to UTF-8 as system default locale.
Try to realize that different users CAN HAVE DIFFERENT LOCALES. On the same
system. And have files on the same fs. Moreover, homedirs that used to be
on different filesystems can end up on the same fs. What iocharset would
you use, then? Sigh...
Again, there is no such thing as iocharset of filesystem - it varies between
users and users can and do share filesystems. Think of /home; think of /tmp.
It isn't feasible. At all. Just as timezone doesn't belong in kernel, locales
have no place there.
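A small Python illustration of why a single per-filesystem charset cannot be right for everyone: the directory entry holds one byte string, and users whose locales use different charsets read different names from it. The charsets chosen here are only examples.

```python
ON_DISK = b"caf\xe9"  # the only thing the filesystem actually stores


def as_seen_by(raw: bytes, charset: str) -> str:
    """How a user whose locale uses `charset` would read the stored bytes."""
    return raw.decode(charset)


views = {cs: as_seen_by(ON_DISK, cs) for cs in ("latin-1", "cp1251", "koi8_r")}
for charset, shown in views.items():
    print(charset, shown)
```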
* Re: JFS default behavior
2004-02-14 23:29 ` viro
@ 2004-02-15 0:07 ` Robin Rosenberg
2004-02-15 2:41 ` Linus Torvalds
0 siblings, 1 reply; 29+ messages in thread
From: Robin Rosenberg @ 2004-02-15 0:07 UTC (permalink / raw)
To: viro; +Cc: Linux kernel
On Sunday 15 February 2004 00.29, you wrote:
> On Sun, Feb 15, 2004 at 12:06:23AM +0100, Robin Rosenberg wrote:
> > The "sequence of bytes" idea is a legacy from prehistoric times when byte == character
> > was true.
>
> Bullshit. It has _nothing_ to do with characters, wide or not. For the system,
> filenames are opaque. The only things that have special meanings are:
> octet 0x2f ('/') splits the pathname into components
> "." as a component has a special meaning
> ".." as a component has a special meaning.
> That's it. The rest is never interpreted by the kernel.
I know how it is (to some degree), and it's wrong. The user sees inside the filename
and sees a string of characters, not a byte sequence.
> Try to realize that different users CAN HAVE DIFFERENT LOCALES. On the same
> system. And have files on the same fs. Moreover, homedirs that used to be
> on different filesystems can end up on the same fs. What iocharset would
> you use, then? Sigh...
Ok, I've got the iocharset option wrong, god knows why. The problem
however remains.
It seems you simply don't want to understand the problem, which is that users
CAN HAVE DIFFERENT LOCALES on the same system and on different system.
Sigh...
I'm less concerned with which solution than that a solution should be found. So it
seems no file system has a solution today. Still, an iocharset option would relieve
the problem for removable media and multi-boot systems. Most Linux machines
are essentially single-user and either have the same locale for all users, or all
users are using UTF-8 with their locale. It's not the locale, but the charset used
for encoding the locale. The rest cannot be helped.
-- robin
* Re: JFS default behavior
2004-02-15 0:07 ` Robin Rosenberg
@ 2004-02-15 2:41 ` Linus Torvalds
2004-02-15 3:33 ` Matthias Urlichs
0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2004-02-15 2:41 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: viro, Linux kernel
On Sun, 15 Feb 2004, Robin Rosenberg wrote:
> >
> > Bullshit. It has _nothing_ to do with characters, wide or not. For the system,
> > filenames are opaque. The only things that have special meanings are:
> > octet 0x2f ('/') splits the pathname into components
> > "." as a component has a special meaning
> > ".." as a component has a special meaning.
> > That's it. The rest is never interpreted by the kernel.
>
> I know how it is (to some degree), and it's wrong. The user sees inside the filename
> and sees a string of characters, not a byte sequence.
Yes, the user sees a string of characters, but the octet 0x2f ('/') and
the terminating NUL character '\0' are still perfectly normal characters
and there is no confusion.
The reason: UTF-8. It's the only sane encoding (apart from a pure extended
ASCII setup, which is also sane, but is obviously unacceptable for a large
portion of the world).
If some misguided person has told you about UCS-2 and horrors like UTF-9,
just ignore them. They are crazy and deluded, and - perhaps more
importantly - stupid.
In short: the kernel talks bytestreams, and that implies that if you want
to talk to the kernel, you HAVE TO USE UTF-8.
At which point there are no locale issues any more. The only locale issue
you can have is user space mistaking a stream of bytes as extended ASCII,
which will cause all your pretty UTF-8 characters to be shown as strange
latin1 (or other) squiggles.
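A quick Python check of the property this relies on: every byte in the UTF-8 encoding of a non-ASCII character has the high bit set, so 0x2f and 0x00 can only ever mean '/' and NUL. The second half reproduces the "strange latin1 squiggles" that appear when user space decodes UTF-8 bytes as Latin-1. The function names are illustrative only.

```python
def high_bit_only(ch: str) -> bool:
    """True if every byte of ch's UTF-8 encoding is >= 0x80."""
    return all(byte >= 0x80 for byte in ch.encode("utf-8"))


def all_non_ascii_safe() -> bool:
    # Every code point above ASCII, skipping surrogates (not encodable on their own).
    return all(
        high_bit_only(chr(cp))
        for cp in range(0x80, 0x110000)
        if not 0xD800 <= cp <= 0xDFFF
    )


print(all_non_ascii_safe())  # True

name = "zoë.txt"
mojibake = name.encode("utf-8").decode("latin-1")  # misreading as "extended ASCII"
print(mojibake)  # zoÃ«.txt
```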
> It seems you simply don't want to understand the problem, which is that users
> CAN HAVE DIFFERENT LOCALES on the same system and on different system.
> Sigh...
People understand the problem. And UTF-8 is the solution.
It's getting there. I think even Microsoft has seen the light, and is
phasing out their crapola (UCS-2LE? Whatever).
> I'm less concerned with which solution than that a solution should be found. So it
> seems no file system has a solution today. Still, an iocharset option would relieve
> the problem for removable media and multi-boot systems.
No. Things like "iocharset" are not the solution. They are literally the
_problem_. The solution is to use something that not only acts as ASCII,
but also has a wide enough range to cover the whole required space (UCS-2
fails _both_ of these fundamental tests). At which point "iocharset" makes
no sense any more, and only exists as a way to translate legacy crap into
the one true format.
And that one true format is UTF-8. End of story. If you try to talk to the
kernel in UCS-2 or anything else, you _will_ fail.
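A short Python sketch of why UCS-2 fails the "acts as ASCII" test. Python has no UCS-2 codec; UTF-16-LE is used here because it produces the same bytes as UCS-2LE for characters in the Basic Multilingual Plane.

```python
name = "a/b"

ucs2 = name.encode("utf-16-le")
print(ucs2)             # b'a\x00/\x00b\x00'
print(b"\x00" in ucs2)  # True: a C string passed to open() would end after "a"

# The byte 0x2f also shows up inside characters that are not a slash:
print("\u012f".encode("utf-16-le"))  # b'/\x01' (LATIN SMALL LETTER I WITH OGONEK)

# Outside the BMP a surrogate pair is needed, which UCS-2 cannot express.
print(len("\U0001F600".encode("utf-16-le")))  # 4

# UTF-8 keeps ASCII bytes meaning exactly what they mean in ASCII.
print(name.encode("utf-8"))  # b'a/b'
```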
Linus
* Re: JFS default behavior
2004-02-15 2:41 ` Linus Torvalds
@ 2004-02-15 3:33 ` Matthias Urlichs
2004-02-15 4:04 ` viro
0 siblings, 1 reply; 29+ messages in thread
From: Matthias Urlichs @ 2004-02-15 3:33 UTC (permalink / raw)
To: linux-kernel
Hi, Linus Torvalds wrote:
> In short: the kernel talks bytestreams, and that implies that if you want
> to talk to the kernel, you HAVE TO USE UTF-8.
>
> At which point there are no locale issues any more.
Not locale, but normalization problems and identical-glyph problems.
Which is actually worse, because you don't have filenames which look
like crap -- instead you have filenames which look perfectly sane, but
they still do not work. Example: is an á one character, or is it an a
followed by a combining ´?
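A Python sketch of the ambiguity, using the standard unicodedata module: the two spellings of á are different byte strings, so the kernel, which compares bytes, treats them as different names. names_equivalent is a hypothetical user-space helper that compares after NFC normalization; which normal form to standardize on is exactly the open question.

```python
import unicodedata

precomposed = "\u00e1"  # á as one code point
decomposed = "a\u0301"  # a + COMBINING ACUTE ACCENT

print(precomposed.encode("utf-8"))  # b'\xc3\xa1'
print(decomposed.encode("utf-8"))   # b'a\xcc\x81'


def names_equivalent(a: bytes, b: bytes) -> bool:
    """Compare two UTF-8 filenames after NFC normalization (user-space policy)."""
    try:
        na = unicodedata.normalize("NFC", a.decode("utf-8"))
        nb = unicodedata.normalize("NFC", b.decode("utf-8"))
    except UnicodeDecodeError:
        return a == b  # not UTF-8: fall back to plain byte comparison
    return na == nb


print(names_equivalent(precomposed.encode("utf-8"), decomposed.encode("utf-8")))  # True
```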
Mac OSX, just as an example, only uses decomposed filenames. I don't know
the current situation, but 10.2 has major problems when you try to access
files with composite characters in their name (across NFS for instance).
I wonder if Linux, i.e. Linus ;-) should decree one single standard
normalization. (I am NOT saying that enforcing this would be the kernel's
job!)
--
Matthias Urlichs
* Re: JFS default behavior
2004-02-15 3:33 ` Matthias Urlichs
@ 2004-02-15 4:04 ` viro
2004-02-15 9:48 ` Robin Rosenberg
2004-02-15 18:26 ` yodaiken
0 siblings, 2 replies; 29+ messages in thread
From: viro @ 2004-02-15 4:04 UTC (permalink / raw)
To: Matthias Urlichs; +Cc: linux-kernel
On Sun, Feb 15, 2004 at 04:33:48AM +0100, Matthias Urlichs wrote:
> Mac OSX, just as an example, only uses decomposed filenames.
So how long does it take for a filename to decompose? ;-)
* Re: JFS default behavior
2004-02-15 4:04 ` viro
@ 2004-02-15 9:48 ` Robin Rosenberg
2004-02-15 18:26 ` yodaiken
1 sibling, 0 replies; 29+ messages in thread
From: Robin Rosenberg @ 2004-02-15 9:48 UTC (permalink / raw)
To: viro; +Cc: Linux kernel
On Sunday 15 February 2004 05.04, you wrote:
> On Sun, Feb 15, 2004 at 04:33:48AM +0100, Matthias Urlichs wrote:
>
> > Mac OSX, just as an example, only uses decomposed filenames.
>
> So how long does it take for a filename to decompose?
As long as it takes to switch locale to UTF-8 :) or vice versa.
-- robin
* Re: JFS default behavior
@ 2004-02-15 14:48 Pascal Schmidt
2004-02-16 14:24 ` Eduard Bloch
0 siblings, 1 reply; 29+ messages in thread
From: Pascal Schmidt @ 2004-02-15 14:48 UTC (permalink / raw)
To: Jamie Lokier; +Cc: linux-kernel
>> Then I did unicode_stop. Guess what: it put the display back in
>> iso-8859-1 for that virtual terminal, but the keyboard remained
>> stuck in UTF-8 for _all_ virtual terminals.
> kbd_mode -a to reset to ASCII mode.
And as I just figured out, loadkeys has to be invoked again, also.
I can go to utf-8 with:
setfont lat0-16
kbd_mode -u
loadkeys de-latin1-nodeadkeys
and return to latin-1 with:
setfont lat1-16
kbd_mode -a
loadkeys de-latin1-nodeadkeys
Without the loadkeys after returning to latin-1 mode, I can no longer
input umlauts and other special characters correctly.
--
Ciao,
Pascal
* Re: JFS default behavior
2004-02-15 4:04 ` viro
2004-02-15 9:48 ` Robin Rosenberg
@ 2004-02-15 18:26 ` yodaiken
1 sibling, 0 replies; 29+ messages in thread
From: yodaiken @ 2004-02-15 18:26 UTC (permalink / raw)
To: viro; +Cc: Matthias Urlichs, linux-kernel
On Sun, Feb 15, 2004 at 04:04:58AM +0000, viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Sun, Feb 15, 2004 at 04:33:48AM +0100, Matthias Urlichs wrote:
>
> > Mac OSX, just as an example, only uses decomposed filenames.
>
> So how long does it take for a filename to decompose? ;-)
Depends on whether it is junk or not.
* Re: JFS default behavior
@ 2004-02-15 23:03 Nicolas Mailhot
2004-02-16 3:45 ` Jan Knutar
2004-02-16 6:21 ` jw schultz
0 siblings, 2 replies; 29+ messages in thread
From: Nicolas Mailhot @ 2004-02-15 23:03 UTC (permalink / raw)
To: linux-kernel
| Linus Torvalds pointed the way of Tux :
| In short: the kernel talks bytestreams, and that implies that if you
| want to talk to the kernel, you HAVE TO USE UTF-8.
In that case :
- should the kernel allow apps to write filenames that are invalid
UTF-8 and will crash UTF-8 apps ?
- should this UTF-8 rule be noted somewhere (in a FAQ/man page/LSB spec/
whatever) so apps authors know they are supposed to read and write UTF-8
filenames and not apply locale rules to kernel objects ?
- what happens to already existing invalid UTF-8 filenames ? Should the
kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What
should happen if someone plugs an unconverted FS into such a system
afterwards ?
These are the questions people have been asking.
* Re: JFS default behavior
2004-02-15 23:03 Nicolas Mailhot
@ 2004-02-16 3:45 ` Jan Knutar
2004-02-16 8:30 ` Nicolas Mailhot
2004-02-16 6:21 ` jw schultz
1 sibling, 1 reply; 29+ messages in thread
From: Jan Knutar @ 2004-02-16 3:45 UTC (permalink / raw)
To: Nicolas Mailhot, linux-kernel
> - what happens to already existing invalid UTF-8 filenames ? Should
> the kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess
> ? What should happen if someone plugs an unconverted FS into such a
> system afterwards ?
What I would like would be a userspace tool, that would recurse and
convert filename encodings from specified locale to UTF-8. Something
like "any2utf8 -from iso8859-1 -recurse /mnt/myoldmp3disk".
Does anyone know if such a tool exists already?
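A rough Python sketch of such a tool, under loud assumptions: names that already decode as UTF-8 are left alone (a heuristic that can misfire on rare Latin-1 names), every other name is assumed to be in the single legacy encoding given, and the default is a dry run. "any2utf8" above is Jan's hypothetical name; plan_renames and convert here are hypothetical too. The walk is bottom-up, so entries inside a directory are renamed before the directory itself.

```python
import os


def plan_renames(root: bytes, legacy: str = "iso8859-1"):
    """Yield (old, new) byte paths for entries whose names are not valid UTF-8."""
    # topdown=False: a directory's contents are listed before the directory itself.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
                continue  # already valid UTF-8 (or plain ASCII): leave it alone
            except UnicodeDecodeError:
                pass
            new_name = name.decode(legacy).encode("utf-8")
            yield os.path.join(dirpath, name), os.path.join(dirpath, new_name)


def convert(root: bytes, legacy: str = "iso8859-1", dry_run: bool = True) -> None:
    # Materialize the plan first so renames do not disturb the directory walk.
    for old, new in list(plan_renames(root, legacy)):
        if os.path.lexists(new):
            print("skipping, target exists:", old, "->", new)
        elif dry_run:
            print("would rename:", old, "->", new)
        else:
            os.rename(old, new)
```

Usage would be convert(b"/mnt/myoldmp3disk") to review the plan, then convert(b"/mnt/myoldmp3disk", dry_run=False) to apply it.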
* Re: JFS default behavior
2004-02-15 23:03 Nicolas Mailhot
2004-02-16 3:45 ` Jan Knutar
@ 2004-02-16 6:21 ` jw schultz
2004-02-16 15:55 ` Jamie Lokier
1 sibling, 1 reply; 29+ messages in thread
From: jw schultz @ 2004-02-16 6:21 UTC (permalink / raw)
To: linux-kernel
On Mon, Feb 16, 2004 at 12:03:03AM +0100, Nicolas Mailhot wrote:
> | Linus Torvalds pointed the way of Tux :
>
> | In short: the kernel talks bytestreams, and that implies that if you
> | want to talk to the kernel, you HAVE TO USE UTF-8.
>
> In that case :
> - should the kernel allow apps to write filenames that are invalid
> UTF-8 and will crash UTF-8 apps ?
Yes. The kernel interface specifies it as a bytestream with
0x00 and 0x2f having special meaning. That is a constraint,
not a policy. It is user space that determines the policy
of UTF-8.
> UTF-8 and will crash UTF-8 apps ?
Fix the broken apps. Crashing because of "invalid" UTF-8 is
no more excusable than crashing because of a string longer
than expected (buffer overrun). Filenames as read from the
filesystem should be treated just like any other untrusted
input.
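One way to follow that advice, as a Python sketch: treat the name as raw bytes, decode UTF-8 leniently, and escape anything that could not be decoded or that would act as a terminal control sequence, instead of crashing. printable_name is a hypothetical helper.

```python
def printable_name(raw: bytes) -> str:
    """Render an untrusted filename for display without ever raising."""
    # Undecodable bytes become visible escapes such as "\xe9" instead of an exception.
    text = raw.decode("utf-8", errors="backslashreplace")
    # Escape control characters (ESC, newline, ...) so they cannot drive the terminal.
    return "".join(ch if ch.isprintable() else ascii(ch)[1:-1] for ch in text)


print(printable_name("café.txt".encode("utf-8")))  # café.txt
print(printable_name(b"zo\xe9\x1b[2J.txt"))         # zo\xe9\x1b[2J.txt
```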
> - should this UTF-8 rule be noted somewhere (in a FAQ/man page/LSB spec/
> whatever) so apps authors know they are supposed to read and write UTF-8
> filenames and not apply locale rules to kernel objects ?
Since the LSB spec describes user space it might be a
suitable place.
> - what happens to already existing invalid UTF-8 filenames ? Should the
> kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What
If you have a filesystem with filenames that don't conform
to your policy, write userspace tools to detect and/or fix
them. If you have programs creating non-conforming
filenames, fix or rm those programs.
> kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What
> should happen if someone plugs an unconverted FS into such a system
> afterwards ?
The kernel won't care. Any user space code that treats the
filenames as something other than bytestreams should be able
to cope with any sequence of bytes.
> These are the questions people have been asking.
OK. The questions have been asked and answered.
Asking again and again and again won't change the answer.
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw@pegasys.ws
Remember Cernan and Schmitt
* Re: JFS default behavior
2004-02-16 3:45 ` Jan Knutar
@ 2004-02-16 8:30 ` Nicolas Mailhot
2004-02-16 8:54 ` Valdis.Kletnieks
0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Mailhot @ 2004-02-16 8:30 UTC (permalink / raw)
To: Jan Knutar; +Cc: linux-kernel
On Mon, 16/02/2004 at 05:45 +0200, Jan Knutar wrote:
> > - what happens to already existing invalid UTF-8 filenames ? Should
> > the kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess
> > ? What should happen if someone plugs an unconverted FS into such a
> > system afterwards ?
>
> What I would like would be a userspace tool, that would recurse and
> convert filename encodings from specified locale to UTF-8. Something
> like "any2utf8 -from iso8859-1 -recurse /mnt/myoldmp3disk".
> Does anyone know if such a tool exists already?
One can do find + recode magic now.
The questions are:
- can this be automated?
- how can one recognise an unconverted fs?
- how can one guess the encoding(s) that have been used before on such
an fs?
You're assuming the situation is merely an iso8859-1 to utf-8 migration.
Far from it. The core problem is that everyone wrote whatever damn well pleased them
without considering future readers.
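A Python sketch of the part that can be automated, recognising names that are not valid UTF-8. It deliberately does not guess which legacy encoding produced them, since the same bytes decode without error under many 8-bit charsets. classify_name and survey are hypothetical names.

```python
import os
from collections import Counter


def classify_name(name: bytes) -> str:
    if all(byte < 0x80 for byte in name):
        return "ascii"
    try:
        name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8"  # some legacy encoding; the bytes alone do not say which


def survey(root: bytes) -> Counter:
    """Count directory entries under root by classification."""
    counts = Counter()
    for _dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            counts[classify_name(name)] += 1
    return counts


# Usage: print(survey(b"/mnt/myoldmp3disk"))
```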
Cheers,
--
Nicolas Mailhot
* Re: JFS default behavior
2004-02-16 8:30 ` Nicolas Mailhot
@ 2004-02-16 8:54 ` Valdis.Kletnieks
0 siblings, 0 replies; 29+ messages in thread
From: Valdis.Kletnieks @ 2004-02-16 8:54 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: Jan Knutar, linux-kernel
On Mon, 16 Feb 2004 09:30:41 +0100, Nicolas Mailhot said:
> You're assuming the situation is merely an iso8859-1 to utf-8 migration.
> Far from it. The core problem is that everyone wrote whatever damn well pleased them
> without considering future readers.
Given the fact that there isn't in general any way for the kernel to know what
was intended, I don't see how any kernel policy other than "NUL and / are
special, but if you use anything other than UTF-8 it will eventually come back
to haunt you" can possibly be made to work.
For that matter, I have seen actual production code that intentionally created
fairly deep directory trees and terminal file names that were basically hashes
written in radix-254 and blatted out in binary. Lots of them. The original
problem report I got was along the lines of "We installed XYZ, and the file
system appears corrupted - ls -R weirded the screen out, and 'find | wc -l' is
127,000 different than what 'df -i' reports".
I was ready to strangle the guilty party - radix-64 wouldn't have been a big
efficiency hit and at least the uuencode/base-64 charset doesn't weird your
terminal out. :)
So it's not even always possible to make the assumption that the filename is
supposed to make sense in *any* charset. This one requires fixing in some
combination of userspace and meatspace....
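For comparison, a Python sketch of the radix-64 alternative mentioned above, assuming SHA-1 as the hash (the message does not say which was used). URL-safe base64 uses only A-Z, a-z, 0-9, '-' and '_', so it never produces '/', NUL or terminal control bytes. A 20-byte digest becomes 27 characters, versus 21 bytes in radix-254, which fits the "not a big efficiency hit" remark. hashed_name is a hypothetical helper.

```python
import base64
import hashlib


def hashed_name(key: bytes) -> bytes:
    """Map an arbitrary key to a filename that is safe to store and to display."""
    digest = hashlib.sha1(key).digest()  # 20 raw bytes: may contain 0x00, 0x2f, ESC...
    # The urlsafe alphabet replaces '+' and '/' with '-' and '_'; drop the '=' padding.
    return base64.urlsafe_b64encode(digest).rstrip(b"=")


name = hashed_name(b"some object key")
print(len(name), name)
```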
* Re: JFS default behavior
2004-02-15 14:48 Pascal Schmidt
@ 2004-02-16 14:24 ` Eduard Bloch
0 siblings, 0 replies; 29+ messages in thread
From: Eduard Bloch @ 2004-02-16 14:24 UTC (permalink / raw)
To: Pascal Schmidt; +Cc: Jamie Lokier, linux-kernel
Hi Pascal!
Pascal Schmidt wrote on Sunday, 15 February 2004:
> >> iso-8859-1 for that virtual terminal, but the keyboard remained
> >> stuck in UTF-8 for _all_ virtual terminals.
> > kbd_mode -a to reset to ASCII mode.
>
> And as I just figured out, loadkeys has to be invoked again, also.
>
> I can go to utf-8 with:
>
> setfont lat0-16
> kbd_mode -u
> loadkeys de-latin1-nodeadkeys
When I do this, I still cannot enter unicode chars "as usual". I see
them, mutt (for example) displays everything correctly with a UTF-8
locale. However, I cannot insert them correctly. When I use vim, I have
to press another key (eg. Space) 2..4 times after an umlaut was pressed,
only then the char appears.
Needless to say that the same applications work fine in X with the same
UTF-8 locale.
Regards,
Eduard.
--
Praise is a mighty driving force whose magic never fails to
take effect.
-- Andor Foldes
* Re: JFS default behavior
[not found] ` <1pRVj-2am-29@gated-at.bofh.it>
@ 2004-02-16 15:32 ` Pascal Schmidt
2004-02-16 19:05 ` Eduard Bloch
0 siblings, 1 reply; 29+ messages in thread
From: Pascal Schmidt @ 2004-02-16 15:32 UTC (permalink / raw)
To: Eduard Bloch; +Cc: linux-kernel
On Mon, 16 Feb 2004 15:30:21 +0100, you wrote in linux.kernel:
> When I do this, I still cannot enter unicode chars "as usual". I see
> them, mutt (for example) displays everything correctly with a UTF-8
> locale. However, I cannot insert them correctly. When I use vim, I have
> to press another key (eg. Space) 2..4 times after an umlaut was pressed,
> only then the char appears.
You're right, inputting UTF-8 (in joe) doesn't work, but that's an
application problem, I think, because it works just fine on a shell
prompt.
--
Ciao,
Pascal
* Re: JFS default behavior
2004-02-16 6:21 ` jw schultz
@ 2004-02-16 15:55 ` Jamie Lokier
2004-02-17 6:47 ` jw schultz
0 siblings, 1 reply; 29+ messages in thread
From: Jamie Lokier @ 2004-02-16 15:55 UTC (permalink / raw)
To: jw schultz, linux-kernel
jw schultz wrote:
> If you have a filesystem with filenames that don't conform
> to your policy, write userspace tools to detect and/or fix
> them. If you have programs creating non-conforming
> filenames, fix or rm those programs.
You do understand that GNU coreutils, bash etc. are among those
programs, right? As in "touch zöe.txt" creates a non-conforming
filename...
> OK. The questions have been asked and answered.
> Asking again and again and again won't change the answer.
The question of what a program like this should do has not been
answered:
perl -e 'for (glob "*") { rename $_, "ņi-".$_ or die "rename: $!\n"; }'
(NB: The prefix string is N WITH CEDILLA followed by "i-").
Hint: it mangles perfectly fine non-ASCII file names, instead of just
prefixing the prefix string. If you change the program to correctly
prepend the prefix string, then it mangles non-UTF-8 names, which is
arguably correct, but can result in you losing some files.
This _is_ a userspace problem, but it is a genuine problem for which
no good answer is yet apparent.
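A Python model of the dilemma (a sketch, not a description of Perl's internals): prepending at the byte level preserves every name, while any approach that first turns names into text has to guess an encoding. Decoding as Latin-1 double-encodes UTF-8 names, which is one plausible way the mangling described above arises; decoding as UTF-8 with replacement can map two distinct legacy names to the same new name, which is how files get lost. The three function names are hypothetical.

```python
PREFIX = "\u0146i-"  # LATIN SMALL LETTER N WITH CEDILLA, then "i-"


def prefix_bytes(name: bytes) -> bytes:
    # Byte level: prepend the UTF-8 encoding of the prefix, leave the name as-is.
    return PREFIX.encode("utf-8") + name


def prefix_via_latin1(name: bytes) -> bytes:
    # Treat the raw name as Latin-1 text, then write the result back as UTF-8.
    return (PREFIX + name.decode("latin-1")).encode("utf-8")


def prefix_via_utf8(name: bytes) -> bytes:
    # Treat the raw name as UTF-8 text; undecodable bytes become U+FFFD.
    return (PREFIX + name.decode("utf-8", errors="replace")).encode("utf-8")


utf8_name = "zoë.txt".encode("utf-8")
print(prefix_bytes(utf8_name).decode("utf-8"))       # ņi-zoë.txt
print(prefix_via_latin1(utf8_name).decode("utf-8"))  # ņi-zoÃ«.txt
print(prefix_via_utf8(b"zo\xe9") == prefix_via_utf8(b"zo\xe8"))  # True: collision
```

For what it's worth, Python's own compromise is the surrogateescape error handler (PEP 383) behind os.fsdecode and os.fsencode, which lets arbitrary filename bytes round-trip through text.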
-- Jamie
* Re: JFS default behavior
2004-02-16 15:32 ` JFS default behavior Pascal Schmidt
@ 2004-02-16 19:05 ` Eduard Bloch
0 siblings, 0 replies; 29+ messages in thread
From: Eduard Bloch @ 2004-02-16 19:05 UTC (permalink / raw)
To: Pascal Schmidt; +Cc: linux-kernel
Hi Pascal!
Pascal Schmidt wrote on Monday, 16 February 2004:
> > When I do this, I still cannot enter unicode chars "as usual". I see
> > them, mutt (for example) displays everything correctly with a UTF-8
> > locale. However, I cannot insert them correctly. When I use vim, I have
> > to press another key (eg. Space) 2..4 times after an umlaut was pressed,
> > only then the char appears.
>
> You're right, inputting UTF-8 (in joe) doesn't work, but that's an
> application problem, I think, because it works just fine on a shell
> prompt.
No, it does not work for me either (up-to-date bash). General multibyte
support in joe is a different problem.
Regards,
Eduard.
--
How not to name your child:
Mario Hanna
* Re: JFS default behavior
2004-02-16 15:55 ` Jamie Lokier
@ 2004-02-17 6:47 ` jw schultz
2004-02-17 21:37 ` Jamie Lokier
0 siblings, 1 reply; 29+ messages in thread
From: jw schultz @ 2004-02-17 6:47 UTC (permalink / raw)
To: linux-kernel
On Mon, Feb 16, 2004 at 03:55:34PM +0000, Jamie Lokier wrote:
> jw schultz wrote:
> > If you have a filesystem with filenames that don't conform
> > to your policy write userspace tools to detect and/or fix
> > them. If you have programs creating non-conforming
> > filenames, fix or rm those programs.
>
> You do understand that GNU coreutils, bash etc. are among those
Doesn't matter where they come from.
> programs, right? As in "touch zöe.txt" creates a non-conforming
> filename...
Your concrete example is a good one. Where did that
filename come from? It would seem to have come from the
keyboard via a tty (or simulator) which also had to display
it. I'd say this is an argument for the terminal to display
UTF-8 and convert input into UTF-8. That is something that
seems not to be done consistently as yet. Ultimately it
seems to be a responsibility of the user interface, whether
tty or GUI. Until that happens the shells might be able to
fill the gap, however poorly.
Perhaps the utilities that don't attempt to interpret
filenames should treat filenames exactly like the kernel
does.
> > OK. The questions have been asked and answered.
> > Asking again and again and again won't change the answer.
>
> The question of what a program like this should do has not been
> answered:
>
> perl -e 'for (glob "*") { rename $_, "ņi-".$_ or die "rename: $!\n"; }'
>
> (NB: The prefix string is N WITH CEDILLA followed by "i-").
>
> Hint: it mangles perfectly fine non-ASCII file names, instead of just
> prefixing the prefix string. If you change the program to correctly
> prepend the prefix string, then it mangles non-UTF-8 names, which is
> arguably correct, but can result in you losing some files.
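The three outcomes Jamie describes can be modelled in Python, under one assumption about where the mangling comes from: the program holds the prefix as decoded characters but treats each existing name's raw bytes as Latin-1 (the usual way a UTF-8 name ends up encoded twice). The function names are hypothetical, and this does not reproduce Perl's actual internals.

```python
# Prefix from Jamie's example: N WITH CEDILLA (U+0145) followed by "i-".
PREFIX = "\u0145i-"


def prefix_latin1_upgrade(name: bytes) -> bytes:
    # Assumed failure mode: the raw name bytes are read as Latin-1
    # characters, joined to the character-string prefix, and the result is
    # written out as UTF-8. A UTF-8 "ö" (two bytes) becomes "Ã¶".
    return (PREFIX + name.decode("latin-1")).encode("utf-8")


def prefix_utf8_decode(name: bytes) -> bytes:
    # Correct for valid UTF-8 names, but every undecodable byte turns into
    # U+FFFD, so distinct non-UTF-8 names can map to the same new name.
    return (PREFIX + name.decode("utf-8", errors="replace")).encode("utf-8")


def prefix_bytes(name: bytes) -> bytes:
    # Kernel-style: plain byte concatenation. Every name stays distinct,
    # but a Latin-1 name ends up as a mix of UTF-8 and Latin-1 bytes.
    return PREFIX.encode("utf-8") + name
```

With two Latin-1 names such as "café" and "cafè" in the directory, the second version gives both the same target name, and since rename(2) silently replaces an existing target, one file is lost. The byte-level version keeps them apart at the cost of mixed encodings.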
Then, if there is incorrect behavior, is it the shell, the tty
or perl that is getting things wrong here?
> This _is_ a userspace problem, but it is a genuine problem for which
> no good answer is yet apparent.
I'll buy that. Then the first question to ask is: "What is
the correct forum for resolving this?"
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw@pegasys.ws
Remember Cernan and Schmitt
* Re: JFS default behavior
2004-02-17 6:47 ` jw schultz
@ 2004-02-17 21:37 ` Jamie Lokier
2004-02-17 22:12 ` Linus Torvalds
0 siblings, 1 reply; 29+ messages in thread
From: Jamie Lokier @ 2004-02-17 21:37 UTC
To: jw schultz, linux-kernel
jw schultz wrote:
> Your concrete example is a good one. Where did that
> filename come from? It would seem to have come from the
> keyboard via a tty (or simulator) which also had to display
> it. I'd say this is an argument for the terminal to display
> UTF-8 and convert input into UTF-8. That is something that
> seems to be not consistently done as yet. Ultimately it
> seems to be a responsibility of the user interface, whether
> tty or GUI. Until that happens the shells might be able to
> fill the gap, however poorly.
Many terminals will not ever display UTF-8. Think: all the serial terminals.
This is why I think "stty utf8" or something along those lines would
be useful. The terminal itself doesn't have to talk UTF-8; however,
the applications talking with /dev/tty would always see UTF-8.
That seems to solve most of the practical user interface problems of
the command line, in one single clean place.
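To make the proposal concrete: "stty utf8" is a hypothetical mode, and in its simplest form it would re-encode data in each direction between the terminal's native encoding and UTF-8. A minimal Python model, assuming the terminal's encoding is known in advance (Latin-1 by default); the function names are invented for illustration.

```python
def terminal_to_app(raw: bytes, terminal_encoding: str = "latin-1") -> bytes:
    """Bytes typed on the terminal -> UTF-8 as the application would see it."""
    # Undecodable input (impossible for Latin-1, possible for e.g. ASCII)
    # becomes U+FFFD rather than raising.
    return raw.decode(terminal_encoding, errors="replace").encode("utf-8")


def app_to_terminal(data: bytes, terminal_encoding: str = "latin-1") -> bytes:
    """UTF-8 written by the application -> bytes the terminal can display."""
    # Invalid UTF-8 becomes U+FFFD; characters the terminal cannot show
    # (including U+FFFD on a Latin-1 terminal) are written as "?".
    text = data.decode("utf-8", errors="replace")
    return text.encode(terminal_encoding, errors="replace")
```

This only covers terminals with a fixed single-byte encoding; composing multi-key or CJK input needs more than a byte translation table.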
-- Jamie
* Re: JFS default behavior
2004-02-17 21:37 ` Jamie Lokier
@ 2004-02-17 22:12 ` Linus Torvalds
2004-02-18 9:59 ` Jamie Lokier
0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2004-02-17 22:12 UTC
To: Jamie Lokier; +Cc: jw schultz, linux-kernel
On Tue, 17 Feb 2004, Jamie Lokier wrote:
>
> Many terminals will not ever display UTF-8. Think: all the serial terminals.
>
> This is why I think "stty utf8" or something along those lines would
> be useful. The terminal itself doesn't have to talk UTF-8; however,
> the applications talking with /dev/tty would always see UTF-8.
>
> That seems to solve most of the practical user interface problems of
> the command line, in one single clean place.
Doesn't "screen" already do this? I don't think you want to have the
locale handling in the kernel, along with translation of multi-key
characters (and from things like CJK terminals? I don't know what format
they send). Sounds like you should use a user-mode thing that knows about
locales...
Linus
* Re: JFS default behavior
2004-02-17 22:12 ` Linus Torvalds
@ 2004-02-18 9:59 ` Jamie Lokier
2004-02-18 15:54 ` Linus Torvalds
0 siblings, 1 reply; 29+ messages in thread
From: Jamie Lokier @ 2004-02-18 9:59 UTC
To: Linus Torvalds; +Cc: jw schultz, linux-kernel
Linus Torvalds wrote:
> Doesn't "screen" already do this? I don't think you want to have the
> locale handling in the kernel, along with translation of multi-key
> characters (and from things like CJK terminals? I don't know what format
> they send). Sounds like you should use a user-mode thing that knows about
> locales...
Yes. I was thinking in a rather DEC VT100/PuTTY/xterm-centric way
for a moment; please excuse the slip.
It's irritating that logging in from the wrong kind of terminal
doesn't just provide the right "user experience" for the command line
automatically. It's also a pain that ssh doesn't inform the remote
end whether the local terminal is UTF-8, so everything seems to be
working fine until one day you discover typing "£" in an editor just
beeps. Grr.. Oh well.
These are all solvable in userspace. Then again, so were most of the
other stty options; didn't stop them from being implemented in the kernel :)
-- Jamie
* Re: JFS default behavior
2004-02-18 9:59 ` Jamie Lokier
@ 2004-02-18 15:54 ` Linus Torvalds
2004-02-18 23:58 ` Jamie Lokier
0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2004-02-18 15:54 UTC
To: Jamie Lokier; +Cc: jw schultz, linux-kernel
On Wed, 18 Feb 2004, Jamie Lokier wrote:
>
> It's irritating that logging in from the wrong kind of terminal
> doesn't just provide the right "user experience" for the command line
> automatically.
Well, you should be able to just start something "screen"-equivalent
directly, either by making it your default shell or with a fix to "login".
The thing is, the kernel tty layer is happy to work with utf-8 (well,
modulo the issues of erase etc - and Andries posted that patch already,
and there are probably others like it) if your terminal supports it, but
if your terminal doesn't have CJK support internally, then you need
something to do the multi-character translations anyway in order to be
able to input them in the first place.
And that is _not_ an stty option.
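The erase problem mentioned above can be modelled in a few lines. In canonical mode a byte-oriented erase removes one byte, which on UTF-8 input leaves a dangling lead byte; a UTF-8-aware erase also removes the trailing continuation bytes (bit pattern 10xxxxxx). This is a Python sketch of the idea, not the kernel's code or Andries' patch. (Linux later added an IUTF8 input flag, set with `stty iutf8`, that enables this kind of erase handling.)

```python
def erase_byte(buf: bytearray) -> None:
    """Byte-oriented erase: remove exactly one byte from the line buffer."""
    if buf:
        buf.pop()


def erase_utf8_char(buf: bytearray) -> None:
    """UTF-8-aware erase: remove the last whole character.

    Continuation bytes (0b10xxxxxx) are dropped first, then the lead byte
    (or the single ASCII byte) in front of them.
    """
    while buf and (buf[-1] & 0xC0) == 0x80:
        buf.pop()
    if buf:
        buf.pop()
```

For "zö" typed on a UTF-8 terminal, `erase_byte` leaves the invalid sequence `b"z\xc3"`, whereas `erase_utf8_char` leaves `b"z"`.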
Btw, from the screen man-page it appears that screen is not able to do
that either. You can put screen into utf-8 mode, but it sounds like it
just means that it passes UTF-8 through, not that it does any translation
from "latin1 vt100 to utf-8".
I think there are a few editors that actually do ("mined" looks like it
should do it).
Linus
* Re: JFS default behavior
2004-02-18 15:54 ` Linus Torvalds
@ 2004-02-18 23:58 ` Jamie Lokier
0 siblings, 0 replies; 29+ messages in thread
From: Jamie Lokier @ 2004-02-18 23:58 UTC
To: Linus Torvalds; +Cc: jw schultz, linux-kernel
Linus Torvalds wrote:
> Btw, from the screen man-page it appears that screen is not able to do
> that either. You can put screen into utf-8 mode, but it sounds like it
> just means that it passes UTF-8 through, not that it does any translation
> from "latin1 vt100 to utf-8".
Screen works nicely. Do this:
echo 'defutf8 on' >> ~/.screenrc
Then screen presents a UTF-8 interface to the shell and other
programs, regardless of what kind of terminal you connect from :)
(It's a bit overkill, no actually it's a lot overkill, and you have the
annoyance of screen intercepting at least one commonly used editing key.)
(Just remember to set the LANG environment variable to include
".UTF-8" so that screen-oriented programs know to display properly. I
do it automatically using a script which queries the current terminal,
to work around ssh not forwarding LANG.)
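A rough sketch of the environment side of such a check, under stated assumptions: it follows the POSIX precedence for the character-type category (a non-empty LC_ALL, then LC_CTYPE, then LANG) and treats a codeset of "UTF-8" or "utf8" as UTF-8. It does not query the terminal the way Jamie's script does, and `locale_is_utf8` is a hypothetical name.

```python
from typing import Mapping


def locale_is_utf8(env: Mapping[str, str]) -> bool:
    """Guess whether the LC_CTYPE locale selected by `env` uses UTF-8."""
    # POSIX precedence: a non-empty LC_ALL wins, then LC_CTYPE, then LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            break
    else:
        return False  # nothing set: the "C"/"POSIX" locale, not UTF-8
    # Locale names look like language_TERRITORY.codeset@modifier.
    codeset = value.split("@", 1)[0].partition(".")[2]
    return codeset.replace("-", "").lower() == "utf8"
```

Over an ssh session that did not carry LANG across, `locale_is_utf8(os.environ)` would typically return False, which is the case such a script has to repair.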
> I think there are a few editors that actually do ("mined" looks like it
> should do it).
Emacs does, of course.
-- Jamie
end of thread, other threads:[~2004-02-18 23:59 UTC | newest]
Thread overview: 29+ messages
[not found] <1pvUz-6j-1@gated-at.bofh.it>
[not found] ` <1pRVj-2am-29@gated-at.bofh.it>
2004-02-16 15:32 ` JFS default behavior Pascal Schmidt
2004-02-16 19:05 ` Eduard Bloch
2004-02-15 23:03 Nicolas Mailhot
2004-02-16 3:45 ` Jan Knutar
2004-02-16 8:30 ` Nicolas Mailhot
2004-02-16 8:54 ` Valdis.Kletnieks
2004-02-16 6:21 ` jw schultz
2004-02-16 15:55 ` Jamie Lokier
2004-02-17 6:47 ` jw schultz
2004-02-17 21:37 ` Jamie Lokier
2004-02-17 22:12 ` Linus Torvalds
2004-02-18 9:59 ` Jamie Lokier
2004-02-18 15:54 ` Linus Torvalds
2004-02-18 23:58 ` Jamie Lokier
-- strict thread matches above, loose matches on Subject: below --
2004-02-15 14:48 Pascal Schmidt
2004-02-16 14:24 ` Eduard Bloch
[not found] <04Feb13.163954est.41760@gpu.utcc.utoronto.ca>
2004-02-14 14:27 ` Nicolas Mailhot
2004-02-14 15:40 ` viro
2004-02-14 17:47 ` Nicolas Mailhot
2004-02-14 17:59 ` Nicolas Mailhot
2004-02-14 23:06 ` Robin Rosenberg
2004-02-14 23:29 ` viro
2004-02-15 0:07 ` Robin Rosenberg
2004-02-15 2:41 ` Linus Torvalds
2004-02-15 3:33 ` Matthias Urlichs
2004-02-15 4:04 ` viro
2004-02-15 9:48 ` Robin Rosenberg
2004-02-15 18:26 ` yodaiken
2004-02-12 16:50 JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) Nicolas Mailhot
2004-02-13 3:03 ` Jamie Lokier
2004-02-13 18:06 ` Nicolas Mailhot
2004-02-13 18:15 ` viro
2004-02-13 18:31 ` Richard B. Johnson
2004-02-13 18:50 ` JFS default behavior Ulrich Drepper