public inbox for linux-kernel@vger.kernel.org
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
@ 2004-02-12 16:50 Nicolas Mailhot
  2004-02-12 18:12 ` Robin Rosenberg
  2004-02-13  3:03 ` Jamie Lokier
  0 siblings, 2 replies; 40+ messages in thread
From: Nicolas Mailhot @ 2004-02-12 16:50 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]

Not specifying the file name encoding (either per fs type, per partition
or per filename) is plain dangerous. It is not a userspace problem:
flash/hotplug disks move, users on the same system can have different
locales and try to share files, and a user can change his locale to
another one (hear the screams of RH users forcibly converted to utf8, who
had to fix years of stored files whose names were suddenly borked).
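
A minimal Python sketch of the failure described above, assuming a name
typed under a Latin-1 locale and later read under a UTF-8 one. The sample
name and the helper shown_under() are hypothetical, chosen only for
illustration; the stored bytes never change, only their interpretation.

```python
# A filename typed under a Latin-1 locale is stored as raw Latin-1 bytes.
name_typed = "r\u00e9sum\u00e9.txt"          # hypothetical sample name
stored_bytes = name_typed.encode("latin-1")  # what ends up on disk


def shown_under(raw: bytes, charset: str) -> str:
    """Decode the stored bytes the way a tool running under `charset` would."""
    return raw.decode(charset, errors="replace")


# Same bytes, two locales: the first decodes cleanly, the second does not.
print(shown_under(stored_bytes, "latin-1"))
print(shown_under(stored_bytes, "utf-8"))
```

Nothing on disk was modified between the two calls, which is why the
breakage only shows up after the locale switch.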

See also the Sun zip encoding bug: everyone uses zip files in Java, zip
authors thought a filename is "just a bunch of bytes" and didn't put
filename encoding info in the zip format, and now Java zip handling goes
boom since numerous encodings are Unicode-incompatible. It's slowly
making its way into the top 25 most-reported Java bugs.

(of course, as usual, US users/coders are not hit and do not feel
concerned)

The only reason we have got by so far is that Linux localisation was poor,
and systems didn't scale high enough to permit a high number of users per
system (which reduced the risk of locale collisions).

The only reason we might get by in the future is everyone will be using
utf8.

But that's not a reason not to fix the core problem - I don't want to
spend hours fixing filenames next time someone comes up with a new
encoding. Please put valid encoding info somewhere or declare filenames
are utf-8 or utf-16 only - changing user locale should not corrupt old
data.

Cheers,

-- 
Nicolas Mailhot

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
  2004-02-12 16:50 JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) Nicolas Mailhot
@ 2004-02-12 18:12 ` Robin Rosenberg
  2004-02-13  3:03 ` Jamie Lokier
  1 sibling, 0 replies; 40+ messages in thread
From: Robin Rosenberg @ 2004-02-12 18:12 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: linux-kernel

On Thursday 12 February 2004 17.50, you wrote:
> But that's not a reason not to fix the core problem - I don't want to
> spend hours fixing filenames next time someone comes up with a new
> encoding. Please put valid encoding info somewhere or declare filenames
> are utf-8 or utf-16 only - changing user locale should not corrupt old
> data.

Yes! 

-- robin


* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
  2004-02-12 16:50 JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) Nicolas Mailhot
  2004-02-12 18:12 ` Robin Rosenberg
@ 2004-02-13  3:03 ` Jamie Lokier
  2004-02-13 10:07   ` Robin Rosenberg
  2004-02-13 18:06   ` Nicolas Mailhot
  1 sibling, 2 replies; 40+ messages in thread
From: Jamie Lokier @ 2004-02-13  3:03 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: linux-kernel

Nicolas Mailhot wrote:
> But that's not a reason not to fix the core problem - I don't want to
> spend hours fixing filenames next time someone comes up with a new
> encoding. Please put valid encoding info somewhere or declare filenames
> are utf-8 or utf-16 only - changing user locale should not corrupt old
> data.

If you attach encoding to names for a whole filesystem, you will get
really unpleasant bugs including security holes because some names
won't be writable, so the fs will either return error codes when those
names are used, or silently alter the names.

-- Jamie



* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
  2004-02-13  3:03 ` Jamie Lokier
@ 2004-02-13 10:07   ` Robin Rosenberg
  2004-02-13 18:06   ` Nicolas Mailhot
  1 sibling, 0 replies; 40+ messages in thread
From: Robin Rosenberg @ 2004-02-13 10:07 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Nicolas Mailhot, linux-kernel

On Friday 13 February 2004 04.03, Jamie Lokier wrote:
> Nicolas Mailhot wrote:
> > But that's not a reason not to fix the core problem - I don't want to
> > spend hours fixing filenames next time someone comes up with a new
> > encoding. Please put valid encoding info somewhere or declare filenames
> > are utf-8 or utf-16 only - changing user locale should not corrupt old
> > data.
> 
> If you attach encoding to names for a whole filesystem, you will get
> really unpleasant bugs including security holes because some names
> won't be writable, so the fs will either return error codes when those
> names are used, or silently alter the names.

Depends on how those undecodable file names are handled. Non-ASCII filenames are
probably a security issue (negative characters) with some apps. Making them inaccessible
is definitely not ok. I proposed one version, although it might be a good idea to look at the file
systems that already handle the problem, so that a uniform solution could be used that keeps all filenames
accessible regardless of which characters are used and doesn't cause unnecessary
confusion about what the name is.

-- robin


* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
  2004-02-13  3:03 ` Jamie Lokier
  2004-02-13 10:07   ` Robin Rosenberg
@ 2004-02-13 18:06   ` Nicolas Mailhot
  2004-02-13 18:15     ` viro
  1 sibling, 1 reply; 40+ messages in thread
From: Nicolas Mailhot @ 2004-02-13 18:06 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1326 bytes --]

On Fri, 13/02/2004 at 03:03 +0000, Jamie Lokier wrote:
> Nicolas Mailhot wrote:
> > But that's not a reason not to fix the core problem - I don't want to
> > spend hours fixing filenames next time someone comes up with a new
> > encoding. Please put valid encoding info somewhere or declare filenames
> > are utf-8 or utf-16 only - changing user locale should not corrupt old
> > data.
> 
> If you attach encoding to names for a whole filesystem, you will get
> really unpleasant bugs including security holes because some names
> won't be writable, so the fs will either return error codes when those
> names are used, or silently alter the names.

You can have security holes now just by tricking an app into reading
files written by another app which disagreed on the locale.

And as for the filename problems :
- just mangle existing invalid filenames when a default encoding is
agreed upon
- refuse to write new files with invalid filenames just like you would
with the few names forbidden in ascii - apps will learn to cope.

Some convention is needed, and expecting it to materialise without OS
enforcement is deluding oneself. Getting a change like this in place
will definitely be painful, but the current situation is far from
painless for a lot of people.

Regards,

-- 
Nicolas Mailhot

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
  2004-02-13 18:06   ` Nicolas Mailhot
@ 2004-02-13 18:15     ` viro
  2004-02-13 18:24       ` Valdis.Kletnieks
  2004-02-13 18:31       ` Richard B. Johnson
  0 siblings, 2 replies; 40+ messages in thread
From: viro @ 2004-02-13 18:15 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: Jamie Lokier, linux-kernel

On Fri, Feb 13, 2004 at 07:06:46PM +0100, Nicolas Mailhot wrote:
> And as for the filename problems :
> - just mangle existing invalid filenames when a default encoding is
> agreed upon
> - refuse to write new files with invalid filenames just like you would
> with the few names forbidden in ascii - apps will learn to cope.

What names forbidden in ASCII?


* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
  2004-02-13 18:15     ` viro
@ 2004-02-13 18:24       ` Valdis.Kletnieks
  2004-02-13 18:31         ` viro
  2004-02-13 18:31       ` Richard B. Johnson
  1 sibling, 1 reply; 40+ messages in thread
From: Valdis.Kletnieks @ 2004-02-13 18:24 UTC (permalink / raw)
  To: viro; +Cc: Nicolas Mailhot, Jamie Lokier, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 150 bytes --]

On Fri, 13 Feb 2004 18:15:42 GMT, viro@parcelfarce.linux.theplanet.co.uk said:

> What names forbidden in ASCII?

Anything with a / or a \0 in it. ;)

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]


* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
  2004-02-13 18:24       ` Valdis.Kletnieks
@ 2004-02-13 18:31         ` viro
  2004-02-13 20:27           ` Jamie Lokier
  0 siblings, 1 reply; 40+ messages in thread
From: viro @ 2004-02-13 18:31 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Nicolas Mailhot, Jamie Lokier, linux-kernel

On Fri, Feb 13, 2004 at 01:24:33PM -0500, Valdis.Kletnieks@vt.edu wrote:
> On Fri, 13 Feb 2004 18:15:42 GMT, viro@parcelfarce.linux.theplanet.co.uk said:
> 
> > What names forbidden in ASCII?
> 
> Anything with a / or a \0 in it. ;)

You try and pass something _without_ \0 in it to the kernel ;-)


* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
  2004-02-13 18:15     ` viro
  2004-02-13 18:24       ` Valdis.Kletnieks
@ 2004-02-13 18:31       ` Richard B. Johnson
  2004-02-13 18:50         ` JFS default behavior Ulrich Drepper
  2004-02-13 22:39         ` JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) Robin Rosenberg
  1 sibling, 2 replies; 40+ messages in thread
From: Richard B. Johnson @ 2004-02-13 18:31 UTC (permalink / raw)
  To: viro; +Cc: Nicolas Mailhot, Jamie Lokier, linux-kernel

On Fri, 13 Feb 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:

> On Fri, Feb 13, 2004 at 07:06:46PM +0100, Nicolas Mailhot wrote:
> > And as for the filename problems :
> > - just mangle existing invalid filenames when a default encoding is
> > agreed upon
> > - refuse to write new files with invalid filenames just like you would
> > with the few names forbidden in ascii - apps will learn to cope.
>
> What names forbidden in ASCII?

I think that all ASCII characters below 0x20 are forbidden in
Unix file-names, and that others shown in the reference cited are
"disapproved".

http://www.med.nyu.edu/rcr/rcr/nyu_vms/unixfileanddirectorynames.htm


Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.




* Re: JFS default behavior
  2004-02-13 18:31       ` Richard B. Johnson
@ 2004-02-13 18:50         ` Ulrich Drepper
  2004-02-13 22:39         ` JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) Robin Rosenberg
  1 sibling, 0 replies; 40+ messages in thread
From: Ulrich Drepper @ 2004-02-13 18:50 UTC (permalink / raw)
  To: root; +Cc: viro, Nicolas Mailhot, Jamie Lokier, linux-kernel

Richard B. Johnson wrote:

> I think that all ASCII characters below 0x20 are forbidden in
> Unix file-names

Not true.  Filenames in Unix are defined as

3.169 Filename
  A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
  characters composing the name may be selected from the set of all
  character values excluding the slash character and the null byte. The
  filenames dot and dot-dot have special meaning. A filename is
  sometimes referred to as a "pathname component".


Only NUL and / are special.
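
The definition quoted above can be sketched as a small Python check. The
function name is_valid_filename() is hypothetical, and NAME_MAX = 255 is
an assumption (a common Linux value; the real limit depends on the
filesystem).

```python
NAME_MAX = 255  # assumption: common Linux value, actually filesystem-dependent


def is_valid_filename(name: bytes, name_max: int = NAME_MAX) -> bool:
    """True if `name` fits the quoted definition: 1 to NAME_MAX bytes,
    with no slash and no null byte anywhere in it."""
    if not 1 <= len(name) <= name_max:
        return False
    return b"/" not in name and b"\x00" not in name


# Control characters and high bytes pass; only '/' and NUL are rejected.
print(is_valid_filename(b"tab\there\nnewline"))
print(is_valid_filename(b"a/b"))
```

Note that the check says nothing about encodings: any other byte value,
including bytes that are not valid UTF-8, is accepted.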

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
  2004-02-13 18:31         ` viro
@ 2004-02-13 20:27           ` Jamie Lokier
  0 siblings, 0 replies; 40+ messages in thread
From: Jamie Lokier @ 2004-02-13 20:27 UTC (permalink / raw)
  To: viro; +Cc: Valdis.Kletnieks, Nicolas Mailhot, linux-kernel

viro@parcelfarce.linux.theplanet.co.uk wrote:
> You try and pass something _without_ \0 in it to the kernel ;-)

:)

But seriously, even that is a security issue when someone requests a
URL containing "%00", or some text contains a filename to operate on
and the name contains \0.

For example, if I write a Perl regular expression to reject paths from
the outside world containing "..": m{(?:/|^)\.\.(?:/|\z)}, it will
fail to notice when given the path "..\0" that the kernel will treat
it identically to "..".  Potential security hole, depending on the context.
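
The mismatch described above can be sketched in Python. The pattern is a
bytes translation of the Perl one (\Z in Python plays the role of Perl's
\z); the helper names are hypothetical, and as_c_string() only models how
a NUL-terminated interface would see the path.

```python
import re

# Python counterpart of the Perl pattern quoted above.
DOTDOT = re.compile(rb"(?:/|^)\.\.(?:/|\Z)")


def filter_rejects(path: bytes) -> bool:
    """The naive filter: reject any path with a '..' component."""
    return DOTDOT.search(path) is not None


def as_c_string(path: bytes) -> bytes:
    """Model of a NUL-terminated view: everything before the first NUL."""
    return path.split(b"\x00", 1)[0]


def safer_rejects(path: bytes) -> bool:
    """One possible fix: refuse embedded NULs before looking for '..'."""
    return b"\x00" in path or filter_rejects(path)


hostile = b"..\x00"
print(filter_rejects(hostile))   # the filter looks at all three bytes
print(as_c_string(hostile))      # the C-string view stops at the NUL
print(safer_rejects(hostile))
```

The filter and the C-string view disagree about where the path ends, and
that disagreement is the potential hole.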

-- Jamie


* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
  2004-02-13 18:31       ` Richard B. Johnson
  2004-02-13 18:50         ` JFS default behavior Ulrich Drepper
@ 2004-02-13 22:39         ` Robin Rosenberg
  1 sibling, 0 replies; 40+ messages in thread
From: Robin Rosenberg @ 2004-02-13 22:39 UTC (permalink / raw)
  To: root; +Cc: viro, Nicolas Mailhot, Jamie Lokier, linux-kernel

On Friday 13 February 2004 19.31, Richard B. Johnson wrote:
> On Fri, 13 Feb 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:
> 
> > On Fri, Feb 13, 2004 at 07:06:46PM +0100, Nicolas Mailhot wrote:
> > > And as for the filename problems :
> > > - just mangle existing invalid filenames when a default encoding is
> > > agreed upon
> > > - refuse to write new files with invalid filenames just like you would
> > > with the few names forbidden in ascii - apps will learn to cope.
> >
> > What names forbidden in ASCII?
> 
> I think that all ASCII characters below 0x20 are forbidden in
> Unix file-names, and that others shown in the reference cited are
> "disapproved".
> 
> http://www.med.nyu.edu/rcr/rcr/nyu_vms/unixfileanddirectorynames.htm

That's not really a formal definition of what's allowed. It's a recommendation
to users on how to avoid tripping up applications that cannot handle all file names,
i.e. buggy applications. Try

	touch "$(/bin/ls -1|head)"

and you will find apps that can handle the nice filename and those that cannot. I'm
definitely not endorsing such names, and it would probably be wise to implement a system policy that
allows administrators to ban them, as they represent security holes and all sorts of
problems.

Some filesystems forbid these names, but unix doesn't.
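
A minimal Python sketch of why such names trip up line-oriented tools,
assuming a listing that prints one name per line. The sample names are
hypothetical; no files are created.

```python
# Two file names; the second one contains a newline, which Unix allows.
names = [b"report.txt", b"notes\nfinal.txt"]

# A one-name-per-line listing, then the naive way of parsing it back.
line_listing = b"\n".join(names) + b"\n"
naive = line_listing.splitlines()

# A NUL-delimited listing is unambiguous, since NUL cannot occur in a name.
nul_listing = b"\x00".join(names) + b"\x00"
robust = nul_listing.split(b"\x00")[:-1]

print(len(names), len(naive), len(robust))
```

The line-based parse reports one name too many, while the NUL-delimited
one round-trips exactly.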

-- robin


* Re: JFS default behavior
       [not found] <04Feb13.163954est.41760@gpu.utcc.utoronto.ca>
@ 2004-02-14 14:27 ` Nicolas Mailhot
  2004-02-14 15:40   ` viro
  0 siblings, 1 reply; 40+ messages in thread
From: Nicolas Mailhot @ 2004-02-14 14:27 UTC (permalink / raw)
  To: chris.siebenmann; +Cc: linux-kernel

Chris Siebenmann wrote:

> You write:
> | So what ?
> | Do you think an app that expects utf-8 filenames won't crash today when
> | served a byte sequence that's invalid UTF-8 ? (or an app that expects
> | ascii when served utf-8 oddities)
> 
>  Such apps are buggy and need to be fixed. 

Well, this means every single java app right now at least.

> This is not Unix's problem,

The Y2K problem was mostly at the app level.
It would not have been the OS's responsibility to fix it.
*However*, since the unix time conventions were a bit saner than those
of other OSes, the damage was less.

> any more than it is Unix's problem if an application frees memory twice,
> writes over unallocated memory, or destroys its stack.

The core OS responsibility is to share resources sanely between apps.
Filenames are a shared resource.
When encodings start to be incompatible, resulting in application
crashes, it's the OS's job to define and enforce sane conventions so apps
can coexist.

Past oversights should not mean the problem should not be fixed 
(especially if solutions exist, even if they are not totally painless).

There is no more justification to keep encoding undefined than there is to
keep time zone undefined. Last I've seen we're all pretty happy system 
time actually means something on unix (unlike other systems where it can 
be anything depending on the location where the initial installation was 
performed).

>  If all you care about is the future, you need no kernel support.
> Declare that all filesystem names are written in UTF-8, and make your
> tools deal with it. (Most will not care. A few will have to be fixed a
> bit.)

Tools won't change unless they're forced to. That's a plain fact.
As you wrote there shouldn't be a lot of fixups to do, since apps that 
can't deal with utf-8 now use ascii-only filenames anyway, but the few 
fixups that are needed won't happen without a little OS prodding.

(and without OS enforcement illegal utf-8 filename injection will remain 
a security risk)

And I write utf8 here, but any unicode form is fine with me as long as 
it's clearly defined and enforced by the FSs.

Cheers,

-- 
Nicolas Mailhot




* Re: JFS default behavior
  2004-02-14 14:27 ` JFS default behavior Nicolas Mailhot
@ 2004-02-14 15:40   ` viro
  2004-02-14 17:47     ` Nicolas Mailhot
  2004-02-14 23:06     ` Robin Rosenberg
  0 siblings, 2 replies; 40+ messages in thread
From: viro @ 2004-02-14 15:40 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: chris.siebenmann, linux-kernel

On Sat, Feb 14, 2004 at 03:27:50PM +0100, Nicolas Mailhot wrote:
> There is no more justification to keep encoding undefined than there is to
> keep time zone undefined. Last I've seen we're all pretty happy system 
> time actually means something on unix (unlike other systems where it can 
> be anything depending on the location where the initial installation was 
> performed).

"System time" is amount of time elapsed since the epoch.  Period.  What does
it have to do with any timezone?

The only place where timezone enters the picture is conversion of time to
year:month:day:hours:minutes:seconds and that's
	a) process-dependent and
	b) done outside of kernel

The same goes for file names.  Filename is a sequence of bytes, no more and
no less.  Anything beyond that belongs to applications.
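
The split described above can be sketched in Python: one counter of
seconds since the epoch, rendered differently per process. The sample
value of `stamp`, the fixed UTC offsets (stand-ins for real timezone
rules) and the helper render() are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample value: seconds elapsed since 1970-01-01 00:00:00 UTC.
stamp = 1_076_773_200

# Fixed offsets stand in for real timezone rules (an assumption for brevity).
zones = {
    "UTC": timezone.utc,
    "UTC+1": timezone(timedelta(hours=1)),
    "UTC-5": timezone(timedelta(hours=-5)),
}


def render(seconds: int, tz: timezone) -> str:
    """Userspace step: turn the counter into a calendar date and time."""
    return datetime.fromtimestamp(seconds, tz=tz).strftime("%Y-%m-%d %H:%M:%S")


# The counter is identical everywhere; only its rendering differs.
for label, tz in zones.items():
    print(label, render(stamp, tz))
```

The analogy drawn in the message is that file name bytes play the role of
`stamp`, and their display plays the role of render().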


* Re: JFS default behavior
  2004-02-14 15:40   ` viro
@ 2004-02-14 17:47     ` Nicolas Mailhot
  2004-02-14 17:59       ` Nicolas Mailhot
  2004-02-14 23:06     ` Robin Rosenberg
  1 sibling, 1 reply; 40+ messages in thread
From: Nicolas Mailhot @ 2004-02-14 17:47 UTC (permalink / raw)
  To: viro; +Cc: chris.siebenmann, linux-kernel

viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Sat, Feb 14, 2004 at 03:27:50PM +0100, Nicolas Mailhot wrote:
> 
>>There is no more justification to keep encoding undefined than there is to
>>keep time zone undefined. Last I've seen we're all pretty happy system 
>>time actually means something on unix (unlike other systems where it can 
>>be anything depending on the location where the initial installation was 
>>performed).
> 
> 
> "System time" is amount of time elapsed since the epoch.  Period.  What does
> it have to do with any timezone?

And everyone agrees on the epoch and that's why it works.

(just like sensors output is not just any numerical value but has a 
well-defined unit)

With filenames we have a value but what it means exactly is a matter of 
conjecture. That's the problem.
(it wouldn't be if filenames were just magic cookies that never needed 
to be interpreted but there's a lot of actors, be it apps or humans that 
need to agree on what the byte string)

Cheers,

-- 
Nicolas Mailhot




* Re: JFS default behavior
  2004-02-14 17:47     ` Nicolas Mailhot
@ 2004-02-14 17:59       ` Nicolas Mailhot
  0 siblings, 0 replies; 40+ messages in thread
From: Nicolas Mailhot @ 2004-02-14 17:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, chris.siebenmann

Nicolas Mailhot wrote:

> to be interpreted but there's a lot of actors, be it apps or humans that 
> need to agree on what the byte string)

... actually means

(bad proofreading, sorry)

-- 
Nicolas Mailhot




* Re: JFS default behavior
  2004-02-14 15:40   ` viro
  2004-02-14 17:47     ` Nicolas Mailhot
@ 2004-02-14 23:06     ` Robin Rosenberg
  2004-02-14 23:29       ` viro
  1 sibling, 1 reply; 40+ messages in thread
From: Robin Rosenberg @ 2004-02-14 23:06 UTC (permalink / raw)
  To: viro; +Cc: Linux kernel

On Saturday 14 February 2004 16.40, you wrote:
> The same goes for file names.  Filename is a sequence of bytes, no more and
> no less.  Anything beyond that belongs to applications.

Should be a sequence of characters since humans are supposed to use them and
it should be the same characters whenever possible regardless of user's locale.

The  "sequence of bytes" idea is a legacy from prehistoric times when byte == character
was true. That is no longer the case and actually hasn't been for quite a while in
some parts of the world. Interchange is important. The application cannot handle
this since it cannot know what characters a byte string represents. Fixing it in the
kernel is the simple solution since it knows the locale. It's also a small change, I
believe. Having an iocharset option for all file systems makes it backward compatible
and creates a migration path to UTF-8 as system default locale.

-- robin



* Re: JFS default behavior
  2004-02-14 23:06     ` Robin Rosenberg
@ 2004-02-14 23:29       ` viro
  2004-02-15  0:07         ` Robin Rosenberg
  0 siblings, 1 reply; 40+ messages in thread
From: viro @ 2004-02-14 23:29 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: Linux kernel

On Sun, Feb 15, 2004 at 12:06:23AM +0100, Robin Rosenberg wrote:
> On Saturday 14 February 2004 16.40, you wrote:
> > The same goes for file names.  Filename is a sequence of bytes, no more and
> > no less.  Anything beyond that belongs to applications.
> 
> Should be a sequence of characters since humans are supposed to use them and
> it should be the same characters whenever possible regardless of user's locale.
 
> The  "sequence of bytes" idea is a legacy from prehistoric times when byte == character
> was true.

Bullshit.  It has _nothing_ to do with characters, wide or not.  For the system, filenames
are opaque.  The only things that have special meanings are:
	octet 0x2f ('/') splits the pathname into components
	"." as a component has a special meaning
	".." as a component has a special meaning.
That's it.  The rest is never interpreted by the kernel.
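
A simplified Python sketch of that rule, assuming nothing beyond what is
stated above: split on octet 0x2f, treat "." and ".." specially, and
leave every other component as uninterpreted bytes. The function name is
hypothetical, and this is not how the kernel's path lookup is actually
implemented (it ignores symlinks, mount points and so on).

```python
def classify_components(path: bytes) -> list:
    """Split a pathname on octet 0x2f and label each component."""
    result = []
    for comp in path.split(b"/"):
        if comp == b"":
            continue  # empty pieces from leading, trailing or doubled slashes
        if comp == b".":
            result.append((comp, "current directory"))
        elif comp == b"..":
            result.append((comp, "parent directory"))
        else:
            result.append((comp, "opaque"))  # never interpreted further
    return result


# The Latin-1 bytes in the third component are passed through untouched.
for comp, kind in classify_components(b"/home/./\xe9t\xe9/../x"):
    print(comp, kind)
```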

> Having an iocharset option for all file systems makes it backward compatible
> and creates a migration path to UTF-8 as system default locale.

Try to realize that different users CAN HAVE DIFFERENT LOCALES.  On the same
system.  And have files on the same fs.  Moreover, homedirs that used to be
on different filesystems can end up on the same fs.  What iocharset would
you use, then?  Sigh...

Again, there is no such thing as iocharset of filesystem - it varies between
users and users can and do share filesystems.  Think of /home; think of /tmp.

It isn't feasible.  At all.  Just as timezone doesn't belong in kernel, locales
have no place there.


* Re: JFS default behavior
  2004-02-14 23:29       ` viro
@ 2004-02-15  0:07         ` Robin Rosenberg
  2004-02-15  2:41           ` Linus Torvalds
  0 siblings, 1 reply; 40+ messages in thread
From: Robin Rosenberg @ 2004-02-15  0:07 UTC (permalink / raw)
  To: viro; +Cc: Linux kernel

On Sunday 15 February 2004 00.29, you wrote:
> On Sun, Feb 15, 2004 at 12:06:23AM +0100, Robin Rosenberg wrote:
> > The  "sequence of bytes" idea is a legacy from prehistoric times when byte == character
> > was true.
> 
> Bullshit.  It has _nothing_ to do with characters, wide or not.  For the system, filenames
> are opaque.  The only things that have special meanings are:
> 	octet 0x2f ('/') splits the pathname into components
> 	"." as a component has a special meaning
> 	".." as a component has a special meaning.
> That's it.  The rest is never interpreted by the kernel.
I know how it is (to some degree), and it's wrong. The user sees inside the filename
and sees a string of characters, not a byte sequence.

> Try to realize that different users CAN HAVE DIFFERENT LOCALES.  On the same
> system.  And have files on the same fs.  Moreover, homedirs that used to be
> on different filesystems can end up on the same fs.  What iocharset would
> you use, then?  Sigh...
Ok, I've got the iocharset option wrong, god knows why. The problem 
however remains.

It seems you simply don't want to understand the problem, which is that users 
CAN HAVE DIFFERENT LOCALES on the same system and on different systems.
Sigh...

I'm less concerned with which solution is chosen than that a solution should be found. So it
seems no file system has a solution today. Still, an iocharset option would relieve
the problem for removable media and multi-boot systems. Most linux machines
are essentially single user and have either the same locale for all users or all
users are using UTF-8 with their locale. It's not the locale, but the charset used
for encoding the locale. The rest cannot be helped.

-- robin


* Re: JFS default behavior
  2004-02-15  0:07         ` Robin Rosenberg
@ 2004-02-15  2:41           ` Linus Torvalds
  2004-02-15  3:33             ` Matthias Urlichs
  0 siblings, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2004-02-15  2:41 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: viro, Linux kernel



On Sun, 15 Feb 2004, Robin Rosenberg wrote:
> > 
> > Bullshit.  It has _nothing_ to do with characters, wide or not.  For the system, filenames
> > are opaque.  The only things that have special meanings are:
> > 	octet 0x2f ('/') splits the pathname into components
> > 	"." as a component has a special meaning
> > 	".." as a component has a special meaning.
> > That's it.  The rest is never interpreted by the kernel.
>
> I know how it is (to some degree), and it's wrong. The user sees inside the filename
> and sees a string of characters, not a byte sequence.

Yes, the user sees a string of characters, but the octet 0x2f ('/') and 
the terminating NUL character '\0' are still perfectly normal characters 
and there is no confusion.

The reason: UTF-8. It's the only sane encoding (apart from a pure extended
ASCII setup, which is also sane, but is obviously unacceptable for a large
portion of the world).

If some misguided person has told you about UCS-2 and horrors like UTF-9,
just ignore them. They are crazy and deluded, and - perhaps more
importantly - stupid.

In short: the kernel talks bytestreams, and that implies that if you want 
to talk to the kernel, you HAVE TO USE UTF-8.

At which point there are no locale issues any more. The only locale issue 
you can have is user space mistaking a stream of bytes as extended ASCII, 
which will cause all your pretty UTF-8 characters to be shown as strange 
latin1 (or other) squiggles.

> It seems you simply don't want to understand the problem, which is that users 
> CAN HAVE DIFFERENT LOCALES on the same system and on different systems.
> Sigh...

People understand the problem. And UTF-8 is the solution.

It's getting there. I think even Microsoft has seen the light, and is
phasing out their crapola (UCS-2LE? Whatever). 

> I'm less concerned with which solution is chosen than that a solution should be found. So it
> seems no file system has a solution today. Still, an iocharset option would relieve
> the problem for removable media and multi-boot systems.

No. Things like "iocharset" are not the solution. They are literally the
_problem_. The solution is to use something that not only acts as ASCII,
but also has a wide enough range to cover the whole required space (UCS-2
fails _both_ of these fundamental tests). At which point "iocharset" makes 
no sense any more, and only exists as a way to translate legacy crap into 
the one true format.

And that one true format is UTF-8. End of story. If you try to talk to the 
kernel in UCS-2 or anything else, you _will_ fail.
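
The property this argument relies on can be checked with a short Python
sketch: in UTF-8 the bytes 0x2f and 0x00 only ever stand for '/' and NUL,
while a 16-bit encoding produces them inside other characters. The sample
string and the helper name are hypothetical, and Python's "utf-16-le"
codec is used as a stand-in for UCS-2LE (identical for the characters
chosen here).

```python
def has_reserved_bytes(text: str, encoding: str) -> bool:
    """True if encoding `text` yields a 0x2f ('/') or 0x00 (NUL) byte."""
    raw = text.encode(encoding)
    return 0x2F in raw or 0x00 in raw


# Hypothetical sample: no '/' and no NUL among its characters.
sample = "na\u00efve \u022f\u4e2f"

print(has_reserved_bytes(sample, "utf-8"))      # bytes safe for the kernel
print(has_reserved_bytes(sample, "utf-16-le"))  # contains 0x2f and 0x00

# ASCII text is byte-for-byte unchanged in UTF-8.
print("readme.txt".encode("utf-8") == b"readme.txt")

# The user-space mistake mentioned above: UTF-8 bytes read as latin1.
print("\u00e9".encode("utf-8").decode("latin-1"))
```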

			Linus


* Re: JFS default behavior
  2004-02-15  2:41           ` Linus Torvalds
@ 2004-02-15  3:33             ` Matthias Urlichs
  2004-02-15  4:04               ` viro
  0 siblings, 1 reply; 40+ messages in thread
From: Matthias Urlichs @ 2004-02-15  3:33 UTC (permalink / raw)
  To: linux-kernel

Hi, Linus Torvalds wrote:

> In short: the kernel talks bytestreams, and that implies that if you want
> to talk to the kernel, you HAVE TO USE UTF-8.
> 
> At which point there are no locale issues any more.

Not locale, but normalization problems and identical-glyph problems.

Which is actually worse, because you don't have filenames which look
like crap -- instead you have filenames which look perfectly sane, but
they still do not work. Example: is an á one character, or is it an a
followed by a composing ´?

Mac OSX, just as an example, only uses decomposed filenames. I don't know
the current situation, but 10.2 has major problems when you try to access
files with composite characters in their name (across NFS for instance).

I wonder if Linux, i.e. Linus ;-) should decree one single standard
normalization. (I am NOT saying that enforcing this would be the kernel's
job!)
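
The normalization problem described above can be reproduced in a few
lines of Python with the standard unicodedata module. The helper
same_name() is hypothetical; it shows one way user space could compare
names, not something the kernel does.

```python
import unicodedata

composed = "\u00e1"      # the accented letter as a single code point
decomposed = "a\u0301"   # 'a' followed by U+0301 COMBINING ACUTE ACCENT


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after bringing both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Both render identically, but the byte sequences handed to the kernel differ.
print(composed.encode("utf-8"), decomposed.encode("utf-8"))
print(composed == decomposed)
print(same_name(composed, decomposed))
```

A byte-wise lookup treats the two spellings as different files, which is
why the names look fine and still do not work.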

-- 
Matthias Urlichs


* Re: JFS default behavior
  2004-02-15  3:33             ` Matthias Urlichs
@ 2004-02-15  4:04               ` viro
  2004-02-15  9:48                 ` Robin Rosenberg
  2004-02-15 18:26                 ` yodaiken
  0 siblings, 2 replies; 40+ messages in thread
From: viro @ 2004-02-15  4:04 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-kernel

On Sun, Feb 15, 2004 at 04:33:48AM +0100, Matthias Urlichs wrote:

> Mac OSX, just as an example, only uses decomposed filenames.

So how long does it take for a filename to decompose? ;-)


* Re: JFS default behavior
  2004-02-15  4:04               ` viro
@ 2004-02-15  9:48                 ` Robin Rosenberg
  2004-02-15 18:26                 ` yodaiken
  1 sibling, 0 replies; 40+ messages in thread
From: Robin Rosenberg @ 2004-02-15  9:48 UTC (permalink / raw)
  To: viro; +Cc: Linux kernel

On Sunday 15 February 2004 05.04, you wrote:
> On Sun, Feb 15, 2004 at 04:33:48AM +0100, Matthias Urlichs wrote:
> 
> > Mac OSX, just as an example, only uses decomposed filenames.
> 
> So how long does it take for a filename to decompose?

As long as it takes to switch locale to UTF-8 :) or vice versa.

-- robin



* Re: JFS default behavior
@ 2004-02-15 14:48 Pascal Schmidt
  2004-02-16 14:24 ` Eduard Bloch
  0 siblings, 1 reply; 40+ messages in thread
From: Pascal Schmidt @ 2004-02-15 14:48 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-kernel


>>    Then I did unicode_stop.  Guess what: it put the display back in
>>    iso-8859-1 for that virtual terminal, but the keyboard remained
>>    stuck in UTF-8 for _all_ virtual terminals.
> kbd_mode -a to reset to ASCII mode.

And as I just figured out, loadkeys has to be invoked again, also.

I can go to utf-8 with:

	setfont lat0-16
	kbd_mode -u
	loadkeys de-latin1-nodeadkeys

and return to latin-1 with:

	setfont lat1-16
	kbd_mode -a
	loadkeys de-latin1-nodeadkeys

Without the loadkeys after returning to latin-1 mode, I can no longer
input umlauts and other special characters correctly.

-- 
Ciao,
Pascal


* Re: JFS default behavior
  2004-02-15  4:04               ` viro
  2004-02-15  9:48                 ` Robin Rosenberg
@ 2004-02-15 18:26                 ` yodaiken
  1 sibling, 0 replies; 40+ messages in thread
From: yodaiken @ 2004-02-15 18:26 UTC (permalink / raw)
  To: viro; +Cc: Matthias Urlichs, linux-kernel

On Sun, Feb 15, 2004 at 04:04:58AM +0000, viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Sun, Feb 15, 2004 at 04:33:48AM +0100, Matthias Urlichs wrote:
> 
> > Mac OSX, just as an example, only uses decomposed filenames.
> 
> So how long does it take for a filename to decompose? ;-)

Depends on whether it is junk or not.





* Re: JFS default behavior
@ 2004-02-15 23:03 Nicolas Mailhot
  2004-02-16  3:45 ` Jan Knutar
  2004-02-16  6:21 ` jw schultz
  0 siblings, 2 replies; 40+ messages in thread
From: Nicolas Mailhot @ 2004-02-15 23:03 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 784 bytes --]

| Linus Torvalds pointed the way of Tux :

| In short: the kernel talks bytestreams, and that implies that if you 
| want to talk to the kernel, you HAVE TO USE UTF-8.

In that case :
- should the kernel allow apps to write filenames that are invalid 
  UTF-8 and will crash UTF-8 apps ?
- should this UTF-8 rule be noted somewhere (in a FAQ/man page/LSB spec/
whatever) so apps authors know they are supposed to read and write UTF-8
filenames and not apply locale rules to kernel objects ?
- what happens to already existing invalid UTF-8 filenames ? Should the
kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What
should happen if someone plugs an unconverted FS in such a system
afterwards ?

These are the questions people have been asking.





* Re: JFS default behavior
  2004-02-15 23:03 Nicolas Mailhot
@ 2004-02-16  3:45 ` Jan Knutar
  2004-02-16  8:30   ` Nicolas Mailhot
  2004-02-16  6:21 ` jw schultz
  1 sibling, 1 reply; 40+ messages in thread
From: Jan Knutar @ 2004-02-16  3:45 UTC (permalink / raw)
  To: Nicolas Mailhot, linux-kernel

> - what happens to already existing invalid UTF-8 filenames ? Should
> the kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess
> ? What should happen if someone plugs an unconverted FS in such a
> system afterwards ?

What I would like would be a userspace tool, that would recurse and 
convert filename encodings from specified locale to UTF-8. Something 
like "any2utf8 -from iso8859-1 -recurse /mnt/myoldmp3disk". 
Does anyone know if such a tool exists already?
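
A minimal sketch of what such a tool could look like, assuming Python 3
on a filesystem that accepts arbitrary byte names. The function names
(to_utf8_name, convert_names) and the dry_run parameter are made up for
illustration; this is not an existing utility. It works on bytes
throughout, leaves names that already decode as UTF-8 alone, and
refuses to overwrite existing entries.

```python
import os
from typing import List, Optional, Tuple


def to_utf8_name(name: bytes, source_encoding: str) -> Optional[bytes]:
    """Return the UTF-8 form of a legacy name, or None if no change is needed."""
    try:
        name.decode("utf-8")
        return None  # already valid UTF-8 (this includes plain ASCII)
    except UnicodeDecodeError:
        pass
    # May raise UnicodeDecodeError if source_encoding was guessed wrongly.
    return name.decode(source_encoding).encode("utf-8")


def convert_names(root: bytes, source_encoding: str,
                  dry_run: bool = True) -> List[Tuple[bytes, bytes]]:
    """Walk root bottom-up and plan (or perform) renames to UTF-8."""
    planned = []
    # topdown=False visits children before their parent directory, so a
    # directory is only renamed after everything below it was handled.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new_name = to_utf8_name(name, source_encoding)
            if new_name is None:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if os.path.lexists(new_path):
                continue  # never clobber an existing entry
            planned.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return planned
```

Calling convert_names(b"/mnt/myoldmp3disk", "iso8859-1") only reports
what would be renamed; passing dry_run=False performs the renames. The
source encoding still has to be supplied by the user, since the
filesystem does not record it.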



* Re: JFS default behavior
  2004-02-15 23:03 Nicolas Mailhot
  2004-02-16  3:45 ` Jan Knutar
@ 2004-02-16  6:21 ` jw schultz
  2004-02-16 15:55   ` Jamie Lokier
  1 sibling, 1 reply; 40+ messages in thread
From: jw schultz @ 2004-02-16  6:21 UTC (permalink / raw)
  To: linux-kernel

On Mon, Feb 16, 2004 at 12:03:03AM +0100, Nicolas Mailhot wrote:
> | Linus Torvalds pointed the way of Tux :
> 
> | In short: the kernel talks bytestreams, and that implies that if you 
> | want to talk to the kernel, you HAVE TO USE UTF-8.
> 
> In that case :
> - should the kernel allow apps to write filenames that are invalid 
>   UTF-8 and will crash UTF-8 apps ?

Yes.  The kernel interface specifies it as a bytestream with
0x00 and 0x2f having special meaning.  That is a constraint,
not a policy.  It is user space that determines the policy
of UTF-8.

>   UTF-8 and will crash UTF-8 apps ?

Fix the broken apps.  Crashing because of "invalid" UTF-8 is
no more excusable than crashing because of a string longer
than expected (buffer overrun).  Filenames as read from the
filesystem should be treated just like any other untrusted
input.
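
A minimal sketch of the distinction, assuming Python 3. The helper
names are hypothetical. The first function checks only the kernel-level
constraint described above (no 0x00, no 0x2f); the second shows one way
for an application to display a name without ever raising on malformed
UTF-8.

```python
def is_legal_component(name: bytes) -> bool:
    """Kernel-level constraint only: non-empty, no NUL (0x00), no '/' (0x2f)."""
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


def safe_display(name: bytes) -> str:
    """Decode for display; undecodable bytes become \\xNN escapes instead of errors."""
    return name.decode("utf-8", errors="backslashreplace")
```

A Latin-1 name such as b"z\xf6e.txt" passes is_legal_component() and is
shown by safe_display() with the offending byte escaped, so the
application keeps running and the user can still see which file is
meant.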

> - should this UTF-8 rule be noted somewhere (in a FAQ/man page/LSB spec/
> whatever) so apps authors know they are supposed to read and write UTF-8
> filenames and not apply locale rules to kernel objects ?

Since the LSB spec describes user space it might be a
suitable place.

> - what happens to already existing invalid UTF-8 filenames ? Should the
> kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What

If you have a filesystem with filenames that don't conform
to your policy write userspace tools to detect and/or fix
them.  If you have programs creating non-conforming
filenames, fix or rm those programs.

> kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What
> should happen if someone plugs an unconverted FS in such a system
> afterwards ?

The kernel won't care.  Any user space code that treats the
filenames as something other than bytestreams should be able
to cope with any sequence of bytes.

> These are the questions people have been asking.

OK.  The questions have been asked and answered.
Asking again and again and again won't change the answer.



-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt


* Re: JFS default behavior
  2004-02-16  3:45 ` Jan Knutar
@ 2004-02-16  8:30   ` Nicolas Mailhot
  2004-02-16  8:54     ` Valdis.Kletnieks
  0 siblings, 1 reply; 40+ messages in thread
From: Nicolas Mailhot @ 2004-02-16  8:30 UTC (permalink / raw)
  To: Jan Knutar; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 997 bytes --]

On Mon, 16/02/2004 at 05:45 +0200, Jan Knutar wrote:
> > - what happens to already existing invalid UTF-8 filenames ? Should
> > the kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess
> > ? What should happen if someone plugs an unconverted FS in such a
> > system afterwards ?
> 
> What I would like would be a userspace tool, that would recurse and 
> convert filename encodings from specified locale to UTF-8. Something 
> like "any2utf8 -from iso8859-1 -recurse /mnt/myoldmp3disk". 
> Does anyone know if such a tool exists already?

One can do find + recode magic now.

The question is :
- can this be automated ?
- how can one recognise an unconverted fs ?
- how can one guess the encoding(s) that have been used before on such
an fs ?
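
A rough sketch of the "recognise" part, assuming Python 3. The names
classify and survey are made up for illustration. It can only separate
names that are pure ASCII, names that happen to be valid UTF-8, and
everything else; it cannot tell which legacy encoding produced the
"everything else" bucket, and a legacy name can occasionally be valid
UTF-8 by coincidence.

```python
from collections import Counter


def classify(name: bytes) -> str:
    """Sort a raw filename into 'ascii', 'utf-8' or 'unknown-legacy'."""
    if all(byte < 0x80 for byte in name):
        return "ascii"
    try:
        name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-legacy"


def survey(names) -> Counter:
    """Count how many names fall into each class."""
    return Counter(classify(name) for name in names)
```

Feeding survey() the output of os.listdir(b"/some/dir") gives a quick
picture of how mixed a directory is. Any non-zero "unknown-legacy"
count means a human still has to say which encoding was in use.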

You're assuming the situation is merely a iso8859-1 to utf-8 migration.
Far from it. The core problem is that everyone wrote whatever they damn well
pleased without considering future readers.

Cheers,

-- 
Nicolas Mailhot



* Re: JFS default behavior
  2004-02-16  8:30   ` Nicolas Mailhot
@ 2004-02-16  8:54     ` Valdis.Kletnieks
  0 siblings, 0 replies; 40+ messages in thread
From: Valdis.Kletnieks @ 2004-02-16  8:54 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: Jan Knutar, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1325 bytes --]

On Mon, 16 Feb 2004 09:30:41 +0100, Nicolas Mailhot said:

> You're assuming the situation is merely a iso8859-1 to utf-8 migration.
> Far from it. The core problem is that everyone wrote whatever they damn well
> pleased without considering future readers.

Given the fact that there isn't in general any way for the kernel to know what
was intended, I don't see how any kernel policy other than "NUL and / are
special, but if you use anything other than UTF-8 it will eventually come back
to haunt you" can possibly be made to work.

For that matter, I have seen actual production code that intentionally created
fairly deep directory trees and terminal file names that were basically hashes
written in radix-254 and blatted out in binary.  Lots of them.  The original
problem report I got was along the lines of "We installed XYZ, and the file
system appears corrupted - ls -R weirds the screen out, and 'find | wc -l' is
127,000 different than what 'df -i' reports".

I was ready to strangle the guilty party - radix-64 wouldn't have been a big
efficiency hit and at least the uuencode/base-64 charset doesn't weird your
terminal out. :)

So it's not even always possible to make the assumption that the filename is
supposed to make sense in *any* charset. This one requires fixing in some
combination of userspace and meatspace....
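
For illustration, a sketch of the terminal-friendly alternative,
assuming Python 3 and SHA-1 as a stand-in hash (the actual hash in that
product is not known). The function name hash_name is hypothetical. One
caveat: the standard base-64 alphabet contains '/', which is not
allowed in a filename component, so the sketch uses the URL-safe
variant.

```python
import base64
import hashlib


def hash_name(data: bytes) -> str:
    """Return a printable, '/'-free filename derived from a hash of data."""
    digest = hashlib.sha1(data).digest()  # 20 raw bytes, may contain anything
    # URL-safe base64 uses '-' and '_' in place of '+' and '/'.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    return encoded.rstrip("=")  # padding carries no information here
```

A 20-byte digest becomes a 27-character ASCII name, against roughly 20
characters for a raw radix-254 dump, which is the modest efficiency hit
mentioned above.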




* Re: JFS default behavior
  2004-02-15 14:48 Pascal Schmidt
@ 2004-02-16 14:24 ` Eduard Bloch
  0 siblings, 0 replies; 40+ messages in thread
From: Eduard Bloch @ 2004-02-16 14:24 UTC (permalink / raw)
  To: Pascal Schmidt; +Cc: Jamie Lokier, linux-kernel

Hi Pascal!
Pascal Schmidt wrote on Sunday, 15 February 2004:

> >>    iso-8859-1 for that virtual terminal, but the keyboard remained
> >>    stuck in UTF-8 for _all_ virtual terminals.
> > kbd_mode -a to reset to ASCII mode.
> 
> And as I just figured out, loadkeys has to be invoked again, also.
> 
> I can go to utf-8 with:
> 
> 	setfont lat0-16
> 	kbd_mode -u
> 	loadkeys de-latin1-nodeadkeys

When I do this, I still cannot enter unicode chars "as usual". I see
them, mutt (for example) displays everything correct with a UTF-8
locale. However, I cannot insert them correctly. When I use vim, I have
to press another key (eg. Space) 2..4 times after an umlaut was pressed,
only then the char appears.

Needless to say that the same applications work fine in X with the same
UTF-8 locale.

Regards,
Eduard.
-- 
Praise is a tremendous driving force whose magic never fails to take
effect.
		-- Andor Foldes


* Re: JFS default behavior
       [not found] ` <1pRVj-2am-29@gated-at.bofh.it>
@ 2004-02-16 15:32   ` Pascal Schmidt
  2004-02-16 19:05     ` Eduard Bloch
  0 siblings, 1 reply; 40+ messages in thread
From: Pascal Schmidt @ 2004-02-16 15:32 UTC (permalink / raw)
  To: Eduard Bloch; +Cc: linux-kernel

On Mon, 16 Feb 2004 15:30:21 +0100, you wrote in linux.kernel:

> When I do this, I still cannot enter unicode chars "as usual". I see
> them, mutt (for example) displays everything correct with a UTF-8
> locale. However, I cannot insert them correctly. When I use vim, I have
> to press another key (eg. Space) 2..4 times after an umlaut was pressed,
> only then the char appears.

You're right, inputting UTF-8 (in joe) doesn't work, but that's an
application problem, I think, because it works just fine on a shell
prompt.

-- 
Ciao,
Pascal


* Re: JFS default behavior
  2004-02-16  6:21 ` jw schultz
@ 2004-02-16 15:55   ` Jamie Lokier
  2004-02-17  6:47     ` jw schultz
  0 siblings, 1 reply; 40+ messages in thread
From: Jamie Lokier @ 2004-02-16 15:55 UTC (permalink / raw)
  To: jw schultz, linux-kernel

jw schultz wrote:
> If you have a filesystem with filenames that don't conform
> to your policy write userspace tools to detect and/or fix
> them.  If you have programs creating non-conforming
> filenames, fix or rm those programs.

You do understand that GNU coreutils, bash etc. are among those
programs, right?  As in "touch zöe.txt" creates a non-conforming
filename...

> OK.  The questions have been asked and answered.
> Asking again and again and again won't change the answer.

The question of what a program like this should do has not been
answered:

   perl -e 'for (glob "*") { rename $_, "ņi-".$_ or die "rename: $!\n"; }'

   (NB: The prefix string is N WITH CEDILLA followed by "i-").

Hint: it mangles perfectly fine non-ASCII file names, instead of just
prefixing the prefix string.  If you change the program to correctly
prepend the prefix string, then it mangles non-UTF-8 names, which is
arguably correct, but can result in you losing some files.

This _is_ a userspace problem, but it is a genuine problem for which
no good answer is yet apparent.
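
To make the trade-off concrete, here is a sketch of the "treat names as
bytes" option, assuming Python 3. The function names are hypothetical,
and this is not a description of what Perl does. The prefix is encoded
as UTF-8 once and glued onto the raw name, so no existing byte is
reinterpreted.

```python
import os

PREFIX = "\u0146i-"  # N WITH CEDILLA followed by "i-"


def prefixed(name: bytes, prefix: str = PREFIX) -> bytes:
    """Prepend the UTF-8 encoded prefix without touching the original bytes."""
    return prefix.encode("utf-8") + name


def prefix_all(directory: bytes, prefix: str = PREFIX) -> None:
    """Rename every entry in directory; a bytes path makes listdir return bytes."""
    for name in os.listdir(directory):
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, prefixed(name, prefix)))
```

Nothing is mangled or lost this way, but a Latin-1 name ends up with a
UTF-8 prefix in front of Latin-1 bytes, so the result is not valid in
either encoding. That is the unanswered part of the question: the
program can preserve bytes or produce clean UTF-8, not both, unless it
knows what encoding the old name was in.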

-- Jamie


* Re: JFS default behavior
  2004-02-16 15:32   ` Pascal Schmidt
@ 2004-02-16 19:05     ` Eduard Bloch
  0 siblings, 0 replies; 40+ messages in thread
From: Eduard Bloch @ 2004-02-16 19:05 UTC (permalink / raw)
  To: Pascal Schmidt; +Cc: linux-kernel

Hi Pascal!
Pascal Schmidt wrote on Monday, 16 February 2004:

> > When I do this, I still cannot enter unicode chars "as usual". I see
> > them, mutt (for example) displays everything correct with a UTF-8
> > locale. However, I cannot insert them correctly. When I use vim, I have
> > to press another key (eg. Space) 2..4 times after an umlaut was pressed,
> > only then the char appears.
> 
> You're right, inputting UTF-8 (in joe) doesn't work, but that's an
> application problem, I think, because it works just fine on a shell
> prompt.

No, does not work for me either (up-to-date bash). General multibyte
support in joe is a different problem.

Regards,
Eduard.
-- 
What not to name your child: 
  Mario Hanna 


* Re: JFS default behavior
  2004-02-16 15:55   ` Jamie Lokier
@ 2004-02-17  6:47     ` jw schultz
  2004-02-17 21:37       ` Jamie Lokier
  0 siblings, 1 reply; 40+ messages in thread
From: jw schultz @ 2004-02-17  6:47 UTC (permalink / raw)
  To: linux-kernel

On Mon, Feb 16, 2004 at 03:55:34PM +0000, Jamie Lokier wrote:
> jw schultz wrote:
> > If you have a filesystem with filenames that don't conform
> > to your policy write userspace tools to detect and/or fix
> > them.  If you have programs creating non-conforming
> > filenames, fix or rm those programs.
> 
> You do understand that GNU coreutils, bash etc. are among those

Doesn't matter where they come from.

> programs, right?  As in "touch zöe.txt" creates a non-conforming
> filename...

Your concrete example is a good one.  Where did that
filename come from?  It would seem to have come from the
keyboard via a tty (or simulator) which also had to display
it.  I'd say this is an argument for the terminal to display
UTF-8 and convert input into UTF-8.  That is something that
seems to be not consistently done as yet.  Ultimately it
seems to be a responsibility of the user interface, whether
tty or GUI.  Until that happens the shells might be able to
fill the gap, however poorly.

Perhaps the utilities that don't attempt to interpret
filenames should treat filenames exactly like the kernel
does.

> > OK.  The questions have been asked and answered.
> > Asking again and again and again won't change the answer.
> 
> The question of what a program like this should do has not been
> answered:
> 
>    perl -e 'for (glob "*") { rename $_, "ņi-".$_ or die "rename: $!\n"; }'
> 
>    (NB: The prefix string is N WITH CEDILLA followed by "i-").
> 
> Hint: it mangles perfectly fine non-ASCII file names, instead of just
> prefixing the prefix string.  If you change the program to correctly
> prepend the prefix string, then it mangles non-UTF-8 names, which is
> arguably correct, but can result in you losing some files.

Then if there is incorrect behavior, is it the shell, tty or perl that is
getting things wrong here?

> This _is_ a userspace problem, but it is a genuine problem for which
> no good answer is yet apparent.

I'll buy that.  Then the first question to ask is "what is
the correct forum for resolving this".

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt


* Re: JFS default behavior
  2004-02-17  6:47     ` jw schultz
@ 2004-02-17 21:37       ` Jamie Lokier
  2004-02-17 22:12         ` Linus Torvalds
  0 siblings, 1 reply; 40+ messages in thread
From: Jamie Lokier @ 2004-02-17 21:37 UTC (permalink / raw)
  To: jw schultz, linux-kernel

jw schultz wrote:
> Your concrete example is a good one.  Where did that
> filename come from?  It would seem to have come from the
> keyboard via a tty (or simulator) which also had to display
> it.  I'd say this is an argument for the terminal to display
> UTF-8 and convert input into UTF-8.  That is something that
> seems to be not consistently done as yet.  Ultimately it
> seems to be a responsibility of the user interface, whether
> tty or GUI.  Until that happens the shells might be able to
> fill the gap, however poorly.

Many terminals will not ever display UTF-8.  Think: all the serial terminals.

This is why I think "stty utf8" or something along those lines would
be useful.  The terminal itself doesn't have to talk UTF-8; however,
the applications talking with /dev/tty would always see UTF-8.

That seems to solve most of the practical user interface problems of
the command line, in one single clean place.

-- Jamie


* Re: JFS default behavior
  2004-02-17 21:37       ` Jamie Lokier
@ 2004-02-17 22:12         ` Linus Torvalds
  2004-02-18  9:59           ` Jamie Lokier
  0 siblings, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2004-02-17 22:12 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: jw schultz, linux-kernel



On Tue, 17 Feb 2004, Jamie Lokier wrote:
> 
> Many terminals will not ever display UTF-8.  Think: all the serial terminals.
> 
> This is why I think "stty utf8" or something along those lines would
> be useful.  The terminal itself doesn't have to talk UTF-8; however,
> the applications talking with /dev/tty would always see UTF-8.
> 
> That seems to solve most of the practical user interface problems of
> the command line, in one single clean place.

Doesn't "screen" already do this? I don't think you want to have the
locale handling in the kernel, along with translation of multi-key
characters (and from things like CJK terminals? I don't know what format
they send).  Sounds like you should use a user-mode thing that knows about
locales...

		Linus


* Re: JFS default behavior
  2004-02-17 22:12         ` Linus Torvalds
@ 2004-02-18  9:59           ` Jamie Lokier
  2004-02-18 15:54             ` Linus Torvalds
  0 siblings, 1 reply; 40+ messages in thread
From: Jamie Lokier @ 2004-02-18  9:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jw schultz, linux-kernel

Linus Torvalds wrote:
> Doesn't "screen" already do this? I don't think you want to have the
> locale handling in the kernel, along with translation of multi-key
> characters (and from things like CJK terminals? I don't know what format
> they send).  Sounds like you should use a user-mode thing that knows about
> locales...

Yes.  I was thinking in a rather DEC VT100/Putty/xterm- centric way
for a moment; please excuse the slip.

It's irritating that logging in from the wrong kind of terminal
doesn't just provide the right "user experience" for the command line
automatically.  It's also a pain that ssh doesn't inform the remote
end whether the local terminal is UTF-8, so everything seems to be
working fine until one day you discover typing "£" in an editor just
beeps.  Grr..  Oh well.

These are all solvable in userspace.  Then again, so were most of the
other stty options; didn't stop them from being implemented in the kernel :)

-- Jamie


* Re: JFS default behavior
  2004-02-18  9:59           ` Jamie Lokier
@ 2004-02-18 15:54             ` Linus Torvalds
  2004-02-18 23:58               ` Jamie Lokier
  0 siblings, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2004-02-18 15:54 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: jw schultz, linux-kernel



On Wed, 18 Feb 2004, Jamie Lokier wrote:
> 
> It's irritating that logging in from the wrong kind of terminal
> doesn't just provide the right "user experience" for the command line
> automatically.

Well, you should be able to just start something "screen"-equivalent 
directly by just making it your default shell or have a fix to "login". 

The thing is, the kernel tty layer is happy to work with utf-8 (well,
modulo the issues of erase etc - and Andries posted that patch already,
and there are probably others like it) if your terminal supports it, but
if your terminal doesn't have CJK support internally, then you need
something to do the multi-character translations anyway in order to be 
able to input them in the first place.

And that is _not_ an stty option.

Btw, from the screen man-page it appears that screen is not able to do 
that either. You can put screen into utf-8 mode, but it sounds like it 
just means that it passes UTF-8 through, not that it does any translation 
from "latin1 vt100 to utf-8".
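
As a sketch of what a "latin1 vt100 to utf-8" translation involves for
the simple single-byte case, assuming Python 3. The names are
hypothetical, and this says nothing about how screen or any other
program is implemented. The input direction is stateless; the output
direction needs state, because a multi-byte UTF-8 sequence can be split
across two reads.

```python
import codecs


def keyboard_to_app(data: bytes) -> bytes:
    """Latin-1 bytes from the terminal, re-encoded as UTF-8 for the application."""
    return data.decode("latin-1").encode("utf-8")


class AppToScreen:
    """UTF-8 from the application, translated to Latin-1 for the terminal."""

    def __init__(self) -> None:
        # The incremental decoder buffers an incomplete multi-byte sequence
        # until the remaining bytes arrive in a later chunk.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def feed(self, chunk: bytes) -> bytes:
        text = self._decoder.decode(chunk)
        # Characters with no Latin-1 equivalent are sent as '?'.
        return text.encode("latin-1", errors="replace")
```

This covers only one-byte-per-character terminals. Composing CJK input
from several keystrokes is a much larger job and is not attempted here.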

I think there are a few editors that actually do ("mined" looks like it 
should do it).

		Linus


* Re: JFS default behavior
  2004-02-18 15:54             ` Linus Torvalds
@ 2004-02-18 23:58               ` Jamie Lokier
  0 siblings, 0 replies; 40+ messages in thread
From: Jamie Lokier @ 2004-02-18 23:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jw schultz, linux-kernel

Linus Torvalds wrote:
> Btw, from the screen man-page it appears that screen is not able to do 
> that either. You can put screen into utf-8 mode, but it sounds like it 
> just means that it passes UTF-8 through, not that it does any translation 
> from "latin1 vt100 to utf-8".

Screen works nicely.  Do this:

    echo 'defutf8 on' >> ~/.screenrc

Then screen presents a UTF-8 interface to the shell and other
programs, regardless of what kind of terminal you connect from :)

(It's a bit overkill, no actually it's a lot overkill, and you have the
annoyance of screen intercepting at least one commonly used editing key.)

(Just remember to set the LANG environment variable to include
".UTF-8" so that screen-oriented programs know to display properly.  I
do it automatically using a script which queries the current terminal,
to workaround ssh not forwarding LANG).
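
A small sketch of the LANG check only, assuming Python 3. It does not
query the terminal as the script mentioned above does, and the function
name is made up. It follows the POSIX precedence LC_ALL, then LC_CTYPE,
then LANG, and looks at the codeset part of the first one that is set
and non-empty.

```python
def locale_charset_is_utf8(env) -> bool:
    """Return True if the effective character-type locale names a UTF-8 codeset."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            # Locale names look like language_TERRITORY.codeset@modifier.
            codeset = value.partition(".")[2].partition("@")[0]
            return codeset.lower().replace("-", "") == "utf8"
    return False
```

Passing os.environ checks the current session. A login script could
use the result to decide whether ".UTF-8" still needs to be appended to
LANG.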

> I think there are a few editors that actually do ("mined" looks like it 
> should do it).

Emacs does, of course.

-- Jamie


Thread overview: 40+ messages
2004-02-12 16:50 JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) Nicolas Mailhot
2004-02-12 18:12 ` Robin Rosenberg
2004-02-13  3:03 ` Jamie Lokier
2004-02-13 10:07   ` Robin Rosenberg
2004-02-13 18:06   ` Nicolas Mailhot
2004-02-13 18:15     ` viro
2004-02-13 18:24       ` Valdis.Kletnieks
2004-02-13 18:31         ` viro
2004-02-13 20:27           ` Jamie Lokier
2004-02-13 18:31       ` Richard B. Johnson
2004-02-13 18:50         ` JFS default behavior Ulrich Drepper
2004-02-13 22:39         ` JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) Robin Rosenberg
     [not found] <04Feb13.163954est.41760@gpu.utcc.utoronto.ca>
2004-02-14 14:27 ` JFS default behavior Nicolas Mailhot
2004-02-14 15:40   ` viro
2004-02-14 17:47     ` Nicolas Mailhot
2004-02-14 17:59       ` Nicolas Mailhot
2004-02-14 23:06     ` Robin Rosenberg
2004-02-14 23:29       ` viro
2004-02-15  0:07         ` Robin Rosenberg
2004-02-15  2:41           ` Linus Torvalds
2004-02-15  3:33             ` Matthias Urlichs
2004-02-15  4:04               ` viro
2004-02-15  9:48                 ` Robin Rosenberg
2004-02-15 18:26                 ` yodaiken
  -- strict thread matches above, loose matches on Subject: below --
2004-02-15 14:48 Pascal Schmidt
2004-02-16 14:24 ` Eduard Bloch
2004-02-15 23:03 Nicolas Mailhot
2004-02-16  3:45 ` Jan Knutar
2004-02-16  8:30   ` Nicolas Mailhot
2004-02-16  8:54     ` Valdis.Kletnieks
2004-02-16  6:21 ` jw schultz
2004-02-16 15:55   ` Jamie Lokier
2004-02-17  6:47     ` jw schultz
2004-02-17 21:37       ` Jamie Lokier
2004-02-17 22:12         ` Linus Torvalds
2004-02-18  9:59           ` Jamie Lokier
2004-02-18 15:54             ` Linus Torvalds
2004-02-18 23:58               ` Jamie Lokier
     [not found] <1pvUz-6j-1@gated-at.bofh.it>
     [not found] ` <1pRVj-2am-29@gated-at.bofh.it>
2004-02-16 15:32   ` Pascal Schmidt
2004-02-16 19:05     ` Eduard Bloch
