* UTF-8 in file systems? xfs/extfs/etc.
@ 2004-02-09 11:58 Nico Schottelius
From: Nico Schottelius @ 2004-02-09 11:58 UTC (permalink / raw)
To: linux-kernel
Morning!
What Linux supported filesystems support UTF-8 filenames?
Looks like at least xfs and reiserfs are not capable of handling them,
as Apache with UTF-8 as default charset delivers wrong names when
accessing files with German umlauts.
Is it somehow planned to enable it?
Or are you waiting for patches which do that job?
Greetings,
Nico
--
Keep it simple & stupid, use what's available.
pgp: 8D0E E27A | Nico Schottelius
http://nerd-hosting.net | http://linux.schottelius.org
* Re: UTF-8 in file systems? xfs/extfs/etc.
From: Måns Rullgård @ 2004-02-09 12:26 UTC
To: linux-kernel
Nico Schottelius <nico-kernel@schottelius.org> writes:
> Morning!
>
> What Linux supported filesystems support UTF-8 filenames?
AFAIK, the filesystems don't care what you put in the filenames.
They just treat it as a sequence of bytes.
> Looks like at least xfs and reiserfs are not able of handling them,
> as Apache with UTF-8 as default charset delievers wrong names, when
> accessing files with German umlauts.
Wrong in what way? How did you create the filenames?
--
Måns Rullgård
mru@kth.se
* Re: UTF-8 in file systems? xfs/extfs/etc.
From: Hugo Mills @ 2004-02-09 12:28 UTC
To: linux-kernel, Nico Schottelius
On Mon, Feb 09, 2004 at 12:58:52PM +0100, Nico Schottelius wrote:
> Morning!
>
> What Linux supported filesystems support UTF-8 filenames?
>
> Looks like at least xfs and reiserfs are not able of handling them,
I'm using ReiserFS on my media drive, and I can tell you that the
UTF-8 works fine on the filesystem:
hrm@vlad:Les-Granges-Brulées $ ls
Descente-Au-Village.ogg             Les-Granges-Brulées.ogg
Générique.ogg                       L'Hélicoptère.ogg
Hésitation.ogg                      Reconstitution.ogg
La-Chanson-Des-Grange-Brulées.ogg   Rose.ogg
La-Perquisition-Et-Les-Paysans.ogg  Théme-De-L'Argent.ogg
La-Vérité.ogg                       typescript
Le-Car+Le-Chasse-Neige.ogg          Une-Morte-Dans-La-Neige.ogg
Le-Juge.ogg                         Zig-Zag.ogg
Le-Pays-de-Rose.ogg
> As Apache with UTF-8 as default charset delievers wrong names, when
> accessing files with German umlauts.
I'd suspect a problem with Apache more than a problem with the
filesystem.
Hugo.
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- There's an infinite number of monkeys outside who want to ---
talk to us about this new script for Hamlet they've worked out!
* Re: UTF-8 in file systems? xfs/extfs/etc. 2004-02-09 11:58 UTF-8 in file systems? xfs/extfs/etc Nico Schottelius 2004-02-09 12:26 ` Måns Rullgård 2004-02-09 12:28 ` Hugo Mills @ 2004-02-09 13:04 ` Matthew Reppert 2004-02-09 13:36 ` Matthias Urlichs ` (2 subsequent siblings) 5 siblings, 0 replies; 81+ messages in thread From: Matthew Reppert @ 2004-02-09 13:04 UTC (permalink / raw) To: Nico Schottelius; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 462 bytes --] On Mon, 2004-02-09 at 05:58, Nico Schottelius wrote: > Morning! > > What Linux supported filesystems support UTF-8 filenames? > > Looks like at least xfs and reiserfs are not able of handling them, > as Apache with UTF-8 as default charset delievers wrong names, when > accessing files with German umlauts. I have no problem creating and accessing files with Chinese names using ls, vi, cat, etc. in (u)xterm on my reiserfs /tmp partition. Matt [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: UTF-8 in file systems? xfs/extfs/etc. 2004-02-09 11:58 UTF-8 in file systems? xfs/extfs/etc Nico Schottelius ` (2 preceding siblings ...) 2004-02-09 13:04 ` Matthew Reppert @ 2004-02-09 13:36 ` Matthias Urlichs 2004-02-10 4:32 ` Mike Fedyk 2004-02-09 15:06 ` Matthew Garrett 2004-02-11 6:39 ` Tim Connors 5 siblings, 1 reply; 81+ messages in thread From: Matthias Urlichs @ 2004-02-09 13:36 UTC (permalink / raw) To: linux-kernel Hi, Nico Schottelius wrote: > What Linux supported filesystems support UTF-8 filenames? Filenames, to the kernel, are a sequence of 8-bit things commonly called "bytes" or "octets", excluding '/' and '\0'. => Answer: "All of them". (Or at least ext2/reiser ;-) > Looks like at least xfs and reiserfs are not able of handling them, as > Apache with UTF-8 as default charset delievers wrong names, when accessing > files with German umlauts. That's an Apache bug, and/or a problem with your Apache configuration. Talk to Apache people. -- Matthias Urlichs ^ permalink raw reply [flat|nested] 81+ messages in thread
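The rule stated in the message above (a name is any byte sequence that avoids '/' and NUL) can be sketched as a small check. This is only an illustration: `is_valid_component` is a hypothetical helper, not a kernel or library function, and it ignores length limits and any extra restrictions an individual filesystem might add.

```python
def is_valid_component(name: bytes) -> bool:
    """Return True if `name` could be a single path component.

    Hypothetical helper for illustration only; it models the
    "bytes excluding '/' and NUL" rule and nothing else.
    """
    # Empty names and the special entries "." and ".." cannot be
    # used as names for new files.
    if name in (b"", b".", b".."):
        return False
    # NUL terminates the string, '/' separates path components.
    return b"\x00" not in name and b"/" not in name


# The same human-readable name gives different byte strings depending
# on the charset userspace picked; the rule above accepts both.
utf8_name = "Générique.ogg".encode("utf-8")
latin1_name = "Générique.ogg".encode("iso-8859-1")
```

Under this model the UTF-8 and ISO-8859-1 spellings are both acceptable and are simply two different names.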
* Re: UTF-8 in file systems? xfs/extfs/etc. 2004-02-09 13:36 ` Matthias Urlichs @ 2004-02-10 4:32 ` Mike Fedyk 2004-02-10 4:53 ` Matthias Urlichs ` (2 more replies) 0 siblings, 3 replies; 81+ messages in thread From: Mike Fedyk @ 2004-02-10 4:32 UTC (permalink / raw) To: Matthias Urlichs; +Cc: linux-kernel On Mon, Feb 09, 2004 at 02:36:24PM +0100, Matthias Urlichs wrote: > Hi, Nico Schottelius wrote: > > > What Linux supported filesystems support UTF-8 filenames? > > Filenames, to the kernel, are a sequence of 8-bit things commonly > called "bytes" or "octets", excluding '/' and '\0'. > You can have "/" in the filename also, though that could be encoded somehow... ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: UTF-8 in file systems? xfs/extfs/etc. 2004-02-10 4:32 ` Mike Fedyk @ 2004-02-10 4:53 ` Matthias Urlichs 2004-02-10 9:46 ` Robin Rosenberg 2004-02-10 23:04 ` jw schultz 2 siblings, 0 replies; 81+ messages in thread From: Matthias Urlichs @ 2004-02-10 4:53 UTC (permalink / raw) To: linux-kernel Hi, Mike Fedyk: > > You can have "/" in the filename also, though that could be encoded somehow... Such encoding isn't valid UTF-8. Of course you could use ⁄ instead (fractional slash, U+2044). Or perhaps ∕ (division slash, U+2215). How to visually distinguish these from a / (U+002F) is left as an exercise to the reader. :-/ The fun part about this email is that I'm writing it with plain old vi (ummm.... I _do_ know that there's nothing "plain old" about vim ;-) and I don't see silly square boxes here. -- Matthias Urlichs ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: UTF-8 in file systems? xfs/extfs/etc. 2004-02-10 4:32 ` Mike Fedyk 2004-02-10 4:53 ` Matthias Urlichs @ 2004-02-10 9:46 ` Robin Rosenberg 2004-02-10 23:04 ` jw schultz 2 siblings, 0 replies; 81+ messages in thread From: Robin Rosenberg @ 2004-02-10 9:46 UTC (permalink / raw) To: Matthias Urlichs, linux-kernel On Tuesday 10 February 2004 05.32, Mike Fedyk wrote: > You can have "/" in the filename also, though that could be encoded somehow... Maybe you are thinking of KDE's convention with %-encoding, e.g. if I save a web link in KDE it may look like "http://kernel.org/.desktop" in Konqueror, but "http:%2f%2fkernel.org%2f.desktop" with ls. That's on top of whatever character encoding is being used for regular characters. -- robin ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: UTF-8 in file systems? xfs/extfs/etc. 2004-02-10 4:32 ` Mike Fedyk 2004-02-10 4:53 ` Matthias Urlichs 2004-02-10 9:46 ` Robin Rosenberg @ 2004-02-10 23:04 ` jw schultz 2004-02-10 23:17 ` viro ` (2 more replies) 2 siblings, 3 replies; 81+ messages in thread From: jw schultz @ 2004-02-10 23:04 UTC (permalink / raw) To: linux-kernel; +Cc: Matthias Urlichs On Mon, Feb 09, 2004 at 08:32:12PM -0800, Mike Fedyk wrote: > On Mon, Feb 09, 2004 at 02:36:24PM +0100, Matthias Urlichs wrote: > > Hi, Nico Schottelius wrote: > > > > > What Linux supported filesystems support UTF-8 filenames? > > > > Filenames, to the kernel, are a sequence of 8-bit things commonly > > called "bytes" or "octets", excluding '/' and '\0'. > > > > You can have "/" in the filename also, though that could be encoded somehow... You might be able to have a non-ASCII character that looks like / but not 0x2f. I for one do not want open("/var/tpm/diddle", O_WRONLY | O_CREAT) to create a file "tpm/diddle" in /var just because /var/tpm doesn't exist. Fortunately what happens is it fails with ENOENT. I expect UTF-8 to have no multi-byte sequences containing NUL but it might be awkward if a multi-byte sequence contained 0x2F (/). I would hope that the committees chose to avoid using symbol and punctuation byte-codes for alphanumeric sequences. -- ________________________________________________________________ J.W. Schultz Pegasystems Technologies email address: jw@pegasys.ws Remember Cernan and Schmitt ^ permalink raw reply [flat|nested] 81+ messages in thread
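The ENOENT behaviour described above can be tried out without touching /var. The sketch below uses a temporary directory in place of /var; `try_create` is a name made up for this example.

```python
import errno
import os
import tempfile


def try_create(path: str) -> int:
    """Attempt open(path, O_WRONLY | O_CREAT); return 0 or the errno."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as base:
    # "tpm" does not exist under base, so the path cannot be resolved.
    missing_parent = try_create(os.path.join(base, "tpm", "diddle"))
    # Nothing named "tpm/diddle" appeared in base as a side effect.
    leftovers = os.listdir(base)
```

Here `missing_parent` should compare equal to `errno.ENOENT` and `leftovers` should be empty.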
* Re: UTF-8 in file systems? xfs/extfs/etc. 2004-02-10 23:04 ` jw schultz @ 2004-02-10 23:17 ` viro 2004-02-10 23:23 ` Måns Rullgård 2004-02-11 0:02 ` Mike Fedyk 2 siblings, 0 replies; 81+ messages in thread From: viro @ 2004-02-10 23:17 UTC (permalink / raw) To: jw schultz, linux-kernel, Matthias Urlichs On Tue, Feb 10, 2004 at 03:04:52PM -0800, jw schultz wrote: > I expect UTF-8 to have no multi-byte sequences containing NUL > but it might be awkward if a multi-byte sequence contained > 0x2F (/). I would hope that the committees chose to avoid > using symbol and punctuation byte-codes for alphanumeric > sequences. UTF-8 single-byte sequences are in range 0--127 with obvious mapping to ASCII. All bytes in UTF-8 multi-byte sequences are in range 128--255. ^ permalink raw reply [flat|nested] 81+ messages in thread
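The byte ranges given above are easy to spot-check. The sketch below encodes a handful of sample characters (chosen arbitrarily for this example, including the two slash lookalikes mentioned earlier in the thread) and checks that single-byte encodings stay below 128 while every byte of a multi-byte encoding is 128 or above.

```python
def utf8_bytes_ok(ch: str) -> bool:
    """Check the byte-range property for one character."""
    encoded = ch.encode("utf-8")
    if len(encoded) == 1:
        # Single-byte sequences map directly onto ASCII.
        return encoded[0] < 0x80
    # Lead bytes and continuation bytes are all >= 0x80.
    return all(b >= 0x80 for b in encoded)


# Arbitrary samples: ASCII, Latin letters with diacritics, the
# fraction slash (U+2044), the division slash (U+2215), a CJK
# character and a character outside the BMP.
samples = ["A", "/", "ü", "é", "\u2044", "\u2215", "\u4e2d", "\U0001F600"]
results = {ch: utf8_bytes_ok(ch) for ch in samples}

# The fraction slash looks like '/', but contains no 0x2F byte.
slash_lookalike = "\u2044".encode("utf-8")
```

So a byte-wise scan for 0x2F or 0x00 cannot be fooled by a multi-byte character, which is why the kernel does not need to know the encoding to split paths.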
* Re: UTF-8 in file systems? xfs/extfs/etc. 2004-02-10 23:04 ` jw schultz 2004-02-10 23:17 ` viro @ 2004-02-10 23:23 ` Måns Rullgård 2004-02-11 0:02 ` Mike Fedyk 2 siblings, 0 replies; 81+ messages in thread From: Måns Rullgård @ 2004-02-10 23:23 UTC (permalink / raw) To: linux-kernel jw schultz <jw@pegasys.ws> writes: > On Mon, Feb 09, 2004 at 08:32:12PM -0800, Mike Fedyk wrote: >> On Mon, Feb 09, 2004 at 02:36:24PM +0100, Matthias Urlichs wrote: >> > Hi, Nico Schottelius wrote: >> > >> > > What Linux supported filesystems support UTF-8 filenames? >> > >> > Filenames, to the kernel, are a sequence of 8-bit things commonly >> > called "bytes" or "octets", excluding '/' and '\0'. >> > >> >> You can have "/" in the filename also, though that could be encoded >> somehow... > > You might be able to have a non-ASCII character that looks > like / but not 0x2f. > > I for one do not want open("/var/tpm/diddle", O_WRONLY | O_CREAT) > to create a file "tpm/diddle" in /var just because /var/tpm > doesn't exist. Just imagine all the possibilities for ambiguous file names it would open up. > I expect UTF-8 to have no multi-byte sequences containing NUL > but it might be awkward if a multi-byte sequence contained > 0x2F (/). I would hope that the committees chose to avoid > using symbol and punctuation byte-codes for alphanumeric > sequences. IIRC, UTF-8 doesn't use bytes <128 in any multi-byte sequences. -- Måns Rullgård mru@kth.se ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: UTF-8 in file systems? xfs/extfs/etc. 2004-02-10 23:04 ` jw schultz 2004-02-10 23:17 ` viro 2004-02-10 23:23 ` Måns Rullgård @ 2004-02-11 0:02 ` Mike Fedyk 2 siblings, 0 replies; 81+ messages in thread From: Mike Fedyk @ 2004-02-11 0:02 UTC (permalink / raw) To: jw schultz, linux-kernel, Matthias Urlichs On Tue, Feb 10, 2004 at 03:04:52PM -0800, jw schultz wrote: > On Mon, Feb 09, 2004 at 08:32:12PM -0800, Mike Fedyk wrote: > > On Mon, Feb 09, 2004 at 02:36:24PM +0100, Matthias Urlichs wrote: > > > Hi, Nico Schottelius wrote: > > > > > > > What Linux supported filesystems support UTF-8 filenames? > > > > > > Filenames, to the kernel, are a sequence of 8-bit things commonly > > > called "bytes" or "octets", excluding '/' and '\0'. > > > > > > > You can have "/" in the filename also, though that could be encoded somehow... > > You might be able to have a non-ASCII character that looks > like / but not 0x2f. > > I for one do not want open("/var/tpm/diddle", O_WRONLY | O_CREAT) > to create a file "tpm/diddle" in /var just because /var/tpm > doesn't exist. Fortunately what happens is it fails with > ENOENT. OK, I stand corrected. Thanks. ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: UTF-8 in file systems? xfs/extfs/etc. 2004-02-09 11:58 UTF-8 in file systems? xfs/extfs/etc Nico Schottelius ` (3 preceding siblings ...) 2004-02-09 13:36 ` Matthias Urlichs @ 2004-02-09 15:06 ` Matthew Garrett 2004-02-11 6:39 ` Tim Connors 5 siblings, 0 replies; 81+ messages in thread From: Matthew Garrett @ 2004-02-09 15:06 UTC (permalink / raw) To: linux-kernel Matthias Urlichs wrote: >Looks like at least xfs and reiserfs are not able of handling them, >as Apache with UTF-8 as default charset delievers wrong names, when >accessing files with German umlauts. Are you sure your filenames are in UTF-8 rather than ISO8859-1? If not, then they'll appear as an invalid UTF-8 string and code that expects UTF-8 will be unhappy. -- Matthew Garrett | mjg59-chiark.mail.linux-rutgers.kernel@srcf.ucam.org ^ permalink raw reply [flat|nested] 81+ messages in thread
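The ISO-8859-1 versus UTF-8 point above can be illustrated with a name containing umlauts. The file name is invented for this example, and `looks_like_utf8` is a hypothetical helper built on Python's strict decoder.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "Grüße.html"  # made-up example name
as_utf8 = name.encode("utf-8")
as_latin1 = name.encode("iso-8859-1")
```

The ISO-8859-1 bytes are a perfectly good file name as far as the filesystem is concerned, but a program that expects UTF-8 names will reject or mangle them.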
* Re: UTF-8 in file systems? xfs/extfs/etc. 2004-02-09 11:58 UTF-8 in file systems? xfs/extfs/etc Nico Schottelius ` (4 preceding siblings ...) 2004-02-09 15:06 ` Matthew Garrett @ 2004-02-11 6:39 ` Tim Connors 2004-02-11 16:35 ` JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) Dave Kleikamp 5 siblings, 1 reply; 81+ messages in thread From: Tim Connors @ 2004-02-11 6:39 UTC (permalink / raw) To: linux-kernel Nico Schottelius <nico-kernel@schottelius.org> said on Mon, 9 Feb 2004 12:58:52 +0100: > > --GRPZ8SYKNexpdSJ7 > Content-Type: text/plain; charset=us-ascii > Content-Disposition: inline > Content-Transfer-Encoding: quoted-printable > > Morning! > > What Linux supported filesystems support UTF-8 filenames? > > Looks like at least xfs and reiserfs are not able of handling them, > as Apache with UTF-8 as default charset delievers wrong names, when > accessing files with German umlauts. I submitted a bug to the jfs people, because jfs incorrectly returns -EINVAL (this isn't even documented in man pages as a valid return from open()) from an open() on a filename with UTF-8 in it. See http://www-124.ibm.com/developerworks/bugs/?func=detailbug&bug_id=3838&group_id=35 and http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=229308 This was triggered just by upgrading the console-utils package in debian (the problem existed all along, except that when I first made the filesystem a jfs one, I reinstalled from backups, rather than reinstalling debian from scratch) -- TimC -- http://astronomy.swin.edu.au/staff/tconnors/ Just don't create a file called -rf. :-) -- Larry Wall in <11393@jpl-devvax.JPL.NASA.GOV> ^ permalink raw reply [flat|nested] 81+ messages in thread
* JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-11 6:39 ` Tim Connors @ 2004-02-11 16:35 ` Dave Kleikamp 2004-02-12 0:45 ` Andy Isaacson 0 siblings, 1 reply; 81+ messages in thread From: Dave Kleikamp @ 2004-02-11 16:35 UTC (permalink / raw) To: Tim Connors; +Cc: linux-kernel, JFS Discussion On Wed, 2004-02-11 at 00:39, Tim Connors wrote: > I submitted a bug to the jfs people, because jfs incorrectly returns > -EINVAL (this isn't even documented in man pages as a valid return > from open()) from an open() on a filename with UTF-8 in it. > > See http://www-124.ibm.com/developerworks/bugs/?func=detailbug&bug_id=3838&group_id=35 > and http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=229308 > > This was triggered just by upgrading the console-utils package in > debian (the problem existed all along, except that when I first made > the filesystem a jfs one, I reinstalled from backups, rather than > reinstalling debian from scratch) Yeah, JFS has poor default behavior based on CONFIG_NLS_DEFAULT. I attempted to explain why it works that way in the first bug listed above if anyone is curious. I think the right thing for JFS to do is to change the default behavior to simply store the bytes as they are seen, and to only do charset conversion when the iocharset mount option is explicitly set. This may impact some current users, but they will be able to get the old behavior by setting iocharset to whatever CONFIG_NLS_DEFAULT is set to in the running kernel. I intend to make this change soon if there are no objections. Thanks, Shaggy -- David Kleikamp IBM Linux Technology Center ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-11 16:35 ` JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) Dave Kleikamp @ 2004-02-12 0:45 ` Andy Isaacson 2004-02-12 1:19 ` Tim Connors ` (4 more replies) 0 siblings, 5 replies; 81+ messages in thread From: Andy Isaacson @ 2004-02-12 0:45 UTC (permalink / raw) To: Dave Kleikamp; +Cc: linux-kernel On Wed, Feb 11, 2004 at 10:35:10AM -0600, Dave Kleikamp wrote: > Yeah, JFS has poor default behavior based on CONFIG_NLS_DEFAULT. I > attempted to explain why it works that way in the first bug listed above > if anyone is curious. I think your suggested fix is good, but it begs the question: Why on earth is JFS worried about the filename, anyways? Why has it *ever* had *any* behavior other than "string of bytes, delimited with /, terminated with \0" ? I read your response about OS/2, and maybe I'm just slow, but I don't see what that has to do with anything. Does JFS on AIX have the same buggy behavior? What behavior was the code originally designed to implement, on OS/2? Why was that behavior chosen rather than "filenames are a string of bytes"? Feel free to point to a "Design of the OS/2 JFS interface" document if such exists and answers my question. :) -andy ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-12 0:45 ` Andy Isaacson @ 2004-02-12 1:19 ` Tim Connors 2004-02-12 3:54 ` jw schultz ` (3 subsequent siblings) 4 siblings, 0 replies; 81+ messages in thread From: Tim Connors @ 2004-02-12 1:19 UTC (permalink / raw) To: linux-kernel Andy Isaacson <adi@hexapodia.org> said on Wed, 11 Feb 2004 18:45:32 -0600: > On Wed, Feb 11, 2004 at 10:35:10AM -0600, Dave Kleikamp wrote: > > Yeah, JFS has poor default behavior based on CONFIG_NLS_DEFAULT. I > > attempted to explain why it works that way in the first bug listed above > > if anyone is curious. > > I think your suggested fix is good, but it begs the question: > > Why on earth is JFS worried about the filename, anyways? Why has it > *ever* had *any* behavior other than "string of bytes, delimited with /, > terminated with \0" ? Thanks for wording my question better. That was *precisely* the question I was trying to ask :) -- TimC -- http://astronomy.swin.edu.au/staff/tconnors/ Disclaimer: This post owned by the owner ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-12 0:45 ` Andy Isaacson 2004-02-12 1:19 ` Tim Connors @ 2004-02-12 3:54 ` jw schultz 2004-02-12 12:03 ` Robin Rosenberg 2004-02-12 8:54 ` Jamie Lokier ` (2 subsequent siblings) 4 siblings, 1 reply; 81+ messages in thread From: jw schultz @ 2004-02-12 3:54 UTC (permalink / raw) To: linux-kernel On Wed, Feb 11, 2004 at 06:45:32PM -0600, Andy Isaacson wrote: > On Wed, Feb 11, 2004 at 10:35:10AM -0600, Dave Kleikamp wrote: > > Yeah, JFS has poor default behavior based on CONFIG_NLS_DEFAULT. I > > attempted to explain why it works that way in the first bug listed above > > if anyone is curious. > > I think your suggested fix is good, but it begs the question: > > Why on earth is JFS worried about the filename, anyways? Why has it > *ever* had *any* behavior other than "string of bytes, delimited with /, > terminated with \0" ? > > I read your response about OS/2, and maybe I'm just slow, but I don't > see what that has to do with anything. > > Does JFS on AIX have the same buggy behavior? > > What behavior was the code originally designed to implement, on OS/2? > Why was that behavior chosen rather than "filenames are a string of > bytes"? > > Feel free to point to a "Design of the OS/2 JFS interface" document if > such exists and answers my question. :) His first link almost explains it. | In OS/2, the kernel had access to each process's locale | information, and converting the pathnames from the user's | charset to unicode made access to the filesystem very | transparent, even when users used different character sets | on the same computer. | | Unfortunately, in Linux the kernel has no per-process | information to go on, so it uses the charset specified by | CONFIG_NLS_DEFAULT when the kernel is built. Obviously, | this is neither intuitive or generally useful. 
| | I am considering changing the default behavior to | trivially convert the user-supplied pathnames to utf-16 | when stored in on-disk. This default behavior could be | overridden by specifying the iocharset= mount flag. Apparently in OS2 they implemented a policy of utf-16 into the kernel so that applications would not have to be as locale aware. This could be called kernel pollution. For Linux there is no policy except perhaps in userspace. It is up to userspace to determine what the policy will be regarding charset for filename storage. Common practice seems to be utf-8. -- ________________________________________________________________ J.W. Schultz Pegasystems Technologies email address: jw@pegasys.ws Remember Cernan and Schmitt ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-12 3:54 ` jw schultz @ 2004-02-12 12:03 ` Robin Rosenberg 0 siblings, 0 replies; 81+ messages in thread From: Robin Rosenberg @ 2004-02-12 12:03 UTC (permalink / raw) To: jw schultz, linux-kernel On Thursday 12 February 2004 04.54, jw schultz wrote: > For Linux there is no policy except perhaps in userspace. > It is up to userspace to determine what the policy will be > regarding charset for filename storage. Common practice > seems to be utf-8. Isn't it is the user's locale, whatever that is? I believe my file names use ISO-8859-1 (except ntfs, vfat). In northern/western europe ISO-8859-1 is common. (Sometimes ISO-8859-15 which for all practical purposes is backwards compatible with 8859-1). UTF-8 is gaining terrain though since it is now the default in some distributions even for Nordic languages (causing big problems for those not expecting it). -- robin ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-12 0:45 ` Andy Isaacson 2004-02-12 1:19 ` Tim Connors 2004-02-12 3:54 ` jw schultz @ 2004-02-12 8:54 ` Jamie Lokier 2004-02-12 15:55 ` Robin Rosenberg 2004-02-12 13:28 ` Dave Kleikamp 2004-02-12 15:26 ` Valdis.Kletnieks 4 siblings, 1 reply; 81+ messages in thread From: Jamie Lokier @ 2004-02-12 8:54 UTC (permalink / raw) To: Andy Isaacson; +Cc: Dave Kleikamp, linux-kernel Andy Isaacson wrote: > Why on earth is JFS worried about the filename, anyways? Why has it > *ever* had *any* behavior other than "string of bytes, delimited with /, > terminated with \0" ? Perhaps for the same reason that these other in-tree filesystems are sensitive to the character encoding: Joliet (ISO-9660 extension), FAT/VFAT, NTFS, BeFS, SMBFS, CIFS. Those filesystems will also fail, or give unexpected behaviour (such as bytes being changed to '?'), if you pass them names which are not in the appropriate encoding. -- Jamie ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-12 8:54 ` Jamie Lokier @ 2004-02-12 15:55 ` Robin Rosenberg 2004-02-12 16:17 ` John Bradford 2004-02-13 0:38 ` Jamie Lokier 0 siblings, 2 replies; 81+ messages in thread From: Robin Rosenberg @ 2004-02-12 15:55 UTC (permalink / raw) To: Jamie Lokier; +Cc: Linux kernel On Thursday 12 February 2004 09.54, you wrote: > Andy Isaacson wrote: > > Why on earth is JFS worried about the filename, anyways? Why has it > > *ever* had *any* behavior other than "string of bytes, delimited with /, > > terminated with \0" ? > > Perhaps for the same reason that these other in-tree filesystems are > sensitive to the character encoding: > > Joliet (ISO-9660 extension), FAT/VFAT, NTFS, BeFS, SMBFS, CIFS. > > Those filesystems will also fail, or give unexpected behaviour (such > as bytes being changed to '?'), if you pass them names which are not > in the appropriate encoding. Definitely a good reason. It seem many assume file names are a local thing, but this is not so. Now consider the case with an external firewire disk or memory stick created on a machine with iso-8859-1 as the system character set and e.g xfs as the file system. What happens when I hook it up to a new redhat installation that thinks file names are best stored as utf8? Most non-ascii file names aren't even legal in utf8. -- robin ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-12 15:55 ` Robin Rosenberg @ 2004-02-12 16:17 ` John Bradford 2004-02-12 16:40 ` Robin Rosenberg 2004-02-13 0:17 ` Jamie Lokier 2004-02-13 0:38 ` Jamie Lokier 1 sibling, 2 replies; 81+ messages in thread From: John Bradford @ 2004-02-12 16:17 UTC (permalink / raw) To: Robin Rosenberg, Jamie Lokier; +Cc: Linux kernel > Definitely a good reason. It seem many assume file names are a local thing, > but this is not so. Now consider the case with an external firewire > disk or memory stick created on a machine with iso-8859-1 as the system character > set and e.g xfs as the file system. What happens when I hook it up to a new redhat > installation that thinks file names are best stored as utf8? Most non-ascii > file names aren't even legal in utf8. Another thing to consider is that you can encode the same character in several ways using utf8, so two filenames could have different byte strings, but evaluate to the same set of unicode characters. John. ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-12 16:17 ` John Bradford @ 2004-02-12 16:40 ` Robin Rosenberg 2004-02-12 17:16 ` John Bradford 2004-02-13 0:17 ` Jamie Lokier 1 sibling, 1 reply; 81+ messages in thread From: Robin Rosenberg @ 2004-02-12 16:40 UTC (permalink / raw) To: John Bradford; +Cc: Linux kernel On Thursday 12 February 2004 17.17, you wrote: > Another thing to consider is that you can encode the same character in > several ways using utf8, so two filenames could have different byte > strings, but evaluate to the same set of unicode characters. No. That's not UTF-8. -- robin ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-12 16:40 ` Robin Rosenberg @ 2004-02-12 17:16 ` John Bradford 2004-02-12 18:06 ` Robin Rosenberg 0 siblings, 1 reply; 81+ messages in thread From: John Bradford @ 2004-02-12 17:16 UTC (permalink / raw) To: Robin Rosenberg; +Cc: Linux kernel Quote from Robin Rosenberg <robin.rosenberg.lists@dewire.com>: > On Thursday 12 February 2004 17.17, you wrote: > > Another thing to consider is that you can encode the same character in > > several ways using utf8, so two filenames could have different byte > > strings, but evaluate to the same set of unicode characters. > > No. That's not UTF-8. Please don't break the CC list on replies. I'm not sure whether it's valid UTF-8 or not, but it's certainly possible to code, for example, an 'A', (decimal 65), via an escape to a 31-bit character representation. Presumably the majority of UTF-8 parsers would decode the sequence as 65, rather than emit an error. Also, even ignoring that, how do you handle things like accented characters which can be represented as single characters, or as sequences containing combining characters? Some applications might convert the sequence containing combining characters in to the single character, and others might not. John. ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
From: Robin Rosenberg @ 2004-02-12 18:06 UTC
To: John Bradford; +Cc: Linux kernel
On Thursday 12 February 2004 18.16, John Bradford wrote:
> I'm not sure whether it's valid UTF-8 or not, but it's certainly
> possible to code, for example, an 'A', (decimal 65), via an escape to
> a 31-bit character representation. Presumably the majority of UTF-8
> parsers would decode the sequence as 65, rather than emit an error.
There are many ways of getting things wrong. The algorithm for encoding
UTF-8 doesn't give you the option of encoding 65 as two bytes; any UCS-4
character with code 0-0x7F must result in a one-byte sequence, and the same
principle goes for every other character: the Unicode standard forbids the
use of anything but the shortest possible sequence.
> Also, even ignoring that, how do you handle things like accented
> characters which can be represented as single characters, or as
> sequences containing combining characters? Some applications might
> convert the sequence containing combining characters in to the single
> character, and others might not.
In UTF-8 you cannot represent à as `a. I can have both in a file name and they
are different. An application that assumes `a is the same as à (in UTF-8) is broken
and should be fixed.
-- robin
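To make the shortest-form rule concrete, here is a small sketch using Python's built-in decoder, which is strict about this. It says nothing about how other decoders of the time behaved; `strict_decode` is a name invented for the example.

```python
from typing import Optional


def strict_decode(raw: bytes) -> Optional[str]:
    """Decode strict UTF-8, returning None for invalid input."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Two-byte "overlong" forms: the payload bits would spell 0x41 ('A')
# and 0x2F ('/'), but neither is the shortest possible sequence.
overlong_a = b"\xc1\x81"
overlong_slash = b"\xc0\xaf"

decoded_plain = strict_decode(b"A")
decoded_overlong_a = strict_decode(overlong_a)
decoded_overlong_slash = strict_decode(overlong_slash)
```

A strict decoder refuses both overlong forms, so they never turn into 'A' or '/'. A lenient decoder that accepted them would be the source of the security concern raised in this subthread.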
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
From: John Bradford @ 2004-02-12 19:08 UTC
To: Robin Rosenberg; +Cc: Linux kernel
> > I'm not sure whether it's valid UTF-8 or not, but it's certainly
> > possible to code, for example, an 'A', (decimal 65), via an escape to
> > a 31-bit character representation. Presumably the majority of UTF-8
> > parsers would decode the sequence as 65, rather than emit an error.
>
> There are many ways of getting things wrong. The algorithm for encoding
> UTF-8 doesn't give you the option of encoding 65 as two bytes; any UCS-4
> character with code 0-0x7F must result in a one-byte sequence, and the same
> principle goes for every other character and the Unicode standard forbids
> the use of anything but the shortest possible sequence.
The recommended encoding algorithm forbids anything but the shortest
sequence, yes, but what will the majority of decoders do? I suspect
that at least some will follow the usual networking rule of "be liberal
in what you accept", which for filenames may well cause all sorts of
security holes.
> > Also, even ignoring that, how do you handle things like accented
> > characters which can be represented as single characters, or as
> > sequences containing combining characters? Some applications might
> > convert the sequence containing combining characters in to the single
> > character, and others might not.
>
> In UTF-8 you cannot represent à as `a. I can have both in a file name and they
> are different. An application that assumes `a is the same as à (in UTF-8) is broken
> and should be fixed.
Well, as long as every userspace implementation gets it correct, we'll
be OK. Personally, I doubt they all will, especially those that
convert from legacy encodings to Unicode, although quite possibly the
above scenario with combining characters is not likely to happen for
filenames. Or is it? What about copying a file from a filesystem
with a UTF-8 encoding to a filesystem with a legacy encoding, and then
back again?
However, I am less concerned about this second scenario than the first.
John.
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 19:08 ` John Bradford
@ 2004-02-12 19:39 ` Robin Rosenberg
2004-02-12 21:13 ` John Bradford
2004-02-14 15:24 ` Eduard Bloch
1 sibling, 1 reply; 81+ messages in thread
From: Robin Rosenberg @ 2004-02-12 19:39 UTC (permalink / raw)
To: John Bradford; +Cc: Linux kernel
On Thursday 12 February 2004 20.08, you wrote:
> > There are many ways of getting things wrong. The algorithm for encoding
> > UTF-8 doesn't give you the option of encoding 65 as two bytes; any UCS-4
> > character with code 0-0x7F must result in a one-byte sequence, and the
> > same principle goes for every other character and the Unicode standard
> > forbids the use of anything but the shortest possible sequence.
>
> The recommended encoding algorithm forbids anything but the shortest

That algorithm is the /definition/ of UTF-8, not just an example. Sure,
you can implement it another way, but the result is uniquely defined
(or else it's not UTF-8).

> Well, as long as every userspace implementation gets it correct, we'll
> be OK. Personally, I doubt they all will, especially those that
> convert from legacy encodings to Unicode, although quite possibly the
> above scenario with combining characters is not likely to happen for
> filenames. Or is it? What about copying a file from a filesystem
> with a UTF-8 encoding to a filesystem with a legacy encoding, and then
> back again?

Sounds like you think we want to invent a new problem. The problem is
here and it's real (not in the U.S., but in the rest of the world).
There are network file systems (Samba in particular), partitions
belonging to other OSes (NTFS, FAT, or even another Linux installation
on the same machine), removable devices, etc.

Microsoft introduced a kludge for managing long file names in a short
filename context. Since Linux doesn't have the length limit, a nicer
kludge could be used to represent Unicode as non-Unicode in userspace,
like a Uxxxxx. When there is a mismatch there has to be a kludge, but
it's still many times better than a bunch of characters that look like
garbage (and cause legacy applications to choke).

-- robin
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 19:39 ` Robin Rosenberg
@ 2004-02-12 21:13 ` John Bradford
2004-02-12 22:29 ` Robin Rosenberg
2004-02-13 3:15 ` Jamie Lokier
0 siblings, 2 replies; 81+ messages in thread
From: John Bradford @ 2004-02-12 21:13 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: Linux kernel
Quote from Robin Rosenberg <robin.rosenberg.lists@dewire.com>:
> On Thursday 12 February 2004 20.08, you wrote:
> > > There are many ways of getting things wrong. The algorithm for encoding
> > > UTF-8 doesn't give you the option of encoding 65 as two bytes; any UCS-4
> > > character with code 0-0x7F must result in a one-byte sequence, and the
> > > same principle goes for every other character and the Unicode standard
> > > forbids the use of anything but the shortest possible sequence.
> >
> > The recommended encoding algorithm forbids anything but the shortest
> That algorithm is the /definition/ of UTF-8, not just an example. Sure you can actually
> do it another way, but the result is uniquely defined (or else it's not UTF-8).

I know what you're saying, there is only one way to encode the data
correctly. I totally agree with that.

However, we both know that UTF-8 provides escapes from the 7-bit
encoding, and although it goes against the standard to encode 7-bit
characters using such sequences, in the real world don't you think
that there will be a lot of decoders which decode the multi-byte
sequence back, rather than report an error? This is not something
that will be happening in the kernel - it will be up to userspace to
do it, so there may well be many different implementations.

Imagine you have two files, with the following filename bytes:

11000001 10000001 00000000
01000001 00000000

..and a _real world_ application, which is not necessarily completely
UTF-8 conformant, tries to open the file with filename 'A'. Which one
is it going to open?

> > Well, as long as every userspace implementation gets it correct, we'll
> > be OK. Personally, I doubt they all will, especially those that
> > convert from legacy encodings to Unicode, although quite possibly the
> > above scenario with combining characters is not likely to happen for
> > filenames. Or is it? What about copying a file from a filesystem
> > with a UTF-8 encoding to a filesystem with a legacy encoding, and then
> > back again?
>
> Sounds like you think we want to invent a new problem.

I am aware that similar problems already exist. However, most legacy
encodings don't suffer from the first issue we discussed above, where
multiple byte sequences could be decoded to the same character codes.

I don't think that the issue with combining characters is likely to be
an issue, I only mentioned it as an example. As you pointed out a
single accented character, and a two character combination are
distinct, and converting the combination to the corresponding single
character in a filename would definitely be wrong, in my opinion.
However, that doesn't mean that software won't do it.

John.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 21:13 ` John Bradford
@ 2004-02-12 22:29 ` Robin Rosenberg
2004-02-12 22:50 ` Valdis.Kletnieks
2004-02-13 2:58 ` Jamie Lokier
2004-02-13 3:15 ` Jamie Lokier
1 sibling, 2 replies; 81+ messages in thread
From: Robin Rosenberg @ 2004-02-12 22:29 UTC (permalink / raw)
To: John Bradford; +Cc: Linux kernel
On Thursday 12 February 2004 22.13, you wrote:
> I know what you're saying, there is only one way to encode the data
> correctly. I totally agree with that.
>
> However, we both know that UTF-8 provides escapes from the 7-bit
> encoding, and although it goes against the standard to encode 7-bit
> characters using such sequences, in the real world don't you think
> that there will be a lot of decoders which decode the multi-byte
> sequence back, rather than report an error? This is not something
> that will be happening in the kernel - it will be up to userspace to
> do it, so there may well be many different implementations.

Oh, I wasn't thinking of fixing *every* application out there, but of
making the kernel APIs convert between the user locale and the file
system locale, thus restricting the problems to places that can be
fixed. An alternative would be glibc, since it's used by most apps, but
then there could be funny and inefficient interactions with filesystems
that already do the job. The "future" common case would be utf-utf
conversion for all native file systems, i.e. no work.

[... ]

> I don't think that the issue with combining characters is likely to be
> an issue, I only mentioned it as an example. As you pointed out a
> single accented character, and a two character combination are
> distinct, and converting the combination to the corresponding single
> character in a filename would definitely be wrong, in my opinion.
> However, that doesn't mean that software won't do it.

Some applications break if I put in any non-ascii characters, but they
are few enough that I can afford the loss. Most shell scripts break if
I even have a space in a filename. This shouldn't be any worse than
that. The space issue is really serious (but I don't think that can be
fixed other than teaching people to program properly, and possibly
improving bash's knowledge of the difference between a space and
argument separator).

-- robin
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 22:29 ` Robin Rosenberg
@ 2004-02-12 22:50 ` Valdis.Kletnieks
2004-02-13 2:58 ` Jamie Lokier
1 sibling, 0 replies; 81+ messages in thread
From: Valdis.Kletnieks @ 2004-02-12 22:50 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: Linux kernel
[-- Attachment #1: Type: text/plain, Size: 877 bytes --]
On Thu, 12 Feb 2004 23:29:11 +0100, Robin Rosenberg said:
> a space in a filename. This shouldn't be any worse than that. The space
> issue is really serious (but I don't think that can be fixed other than teaching
> people to program properly, and possibly improving bash's knowledge of the
> difference between a space and argument separator).

Other than allocating a key and bytecode for non-breaking-white-space
as a separator (Hmm.. allocate 'left-windows' purely for ironic value?
;), how do you propose to actually improve its knowledge of the
distinction?

The basic problem is that we're overloading x'20' as both space and
separator, and then end up disambiguating based on context and syntax.
And quite frankly, I don't see much hope for improving things as long
as x'20' is overloaded.

Could go the VMS command/this/that/the/other/thing route, I guess? :)
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 22:29 ` Robin Rosenberg
2004-02-12 22:50 ` Valdis.Kletnieks
@ 2004-02-13 2:58 ` Jamie Lokier
2004-02-13 9:48 ` Robin Rosenberg
1 sibling, 1 reply; 81+ messages in thread
From: Jamie Lokier @ 2004-02-13 2:58 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: John Bradford, Linux kernel
Robin Rosenberg wrote:
> Most shell scripts break if I even have a space in a filename. This
> shouldn't be any worse than that. The space issue is really serious
> (but I don't think that can be fixed other than teaching people to
> program properly, and possibly improving bash's knowledge of the
> difference between a space and argument separator).

Space works fine for me. Completion, wildcard expansion, variable
substitution etc. all fine. Bash doesn't need changing - your scripts do.

-- Jamie
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 2:58 ` Jamie Lokier
@ 2004-02-13 9:48 ` Robin Rosenberg
0 siblings, 0 replies; 81+ messages in thread
From: Robin Rosenberg @ 2004-02-13 9:48 UTC (permalink / raw)
To: Jamie Lokier; +Cc: John Bradford, Linux kernel
On Friday 13 February 2004 03.58, Jamie Lokier wrote:
> Robin Rosenberg wrote:
> > Most shell scripts break if I even have a space in a filename. This
> > shouldn't be any worse than that. The space issue is really serious
> > (but I don't think that can be fixed other than teaching people to
> > program properly, and possibly improving bash's knowledge of the
> > difference between a space and argument separator).
>
> Space works fine for me. Completion, wildcard expansion, variable
> substitution etc. all fine. Bash doesn't need changing - your scripts do.

I'm thinking about the many scripts in the wild. My own scripts
(usually) handle spaces well, but it is sometimes awkward, although
quoting usually resolves the issue. Never mind what happens with
filenames containing quotes, newlines and other garbage, though even
those work sometimes. Fortunately these are rare, very rare, and
usually the result of a programming mistake elsewhere :-)

On the command line there is no problem. With the other scripting
languages I use, this is rarely an issue.

-- robin
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 21:13 ` John Bradford
2004-02-12 22:29 ` Robin Rosenberg
@ 2004-02-13 3:15 ` Jamie Lokier
1 sibling, 0 replies; 81+ messages in thread
From: Jamie Lokier @ 2004-02-13 3:15 UTC (permalink / raw)
To: John Bradford; +Cc: Robin Rosenberg, Linux kernel
John Bradford wrote:
> in the real world don't you think that there will be a lot of
> decoders which decode the multi-byte sequence back, rather than
> report an error?

There will be decoders which convert ASCII "a" to "A" too. We can't
fix broken code; at least we can make it clear to anyone writing a
decoder what is acceptable, and that being "liberal" in what's decoded
is not acceptable and considered a security flaw.

An app author only writes the UTF-8 decoder once; it isn't at all hard
to convert non-minimal forms to the replacement char U+FFFD. (Although
that could be a security hole in some cases, it's much better than
allowing non-zero characters to decode to NUL or "/" or ".").

Rejecting a non-minimal form is often hard, because the UTF-8 decoder
is often used in a place which cannot flag errors.

> Imagine you have two files, with the following filename bytes:
>
> 11000001 10000001 00000000
> 01000001 00000000
>
> ..and a _real world_ application, which is not necessarily completely
> UTF-8 conformant, tries to open the file with filename 'A'. Which one
> is it going to open?

The one which "ls" and other programs show as "A". The other one will
typically show as "?" or a diamond or something.

> I don't think that the issue with combining characters is likely to be
> an issue, I only mentioned it as an example. As you pointed out a
> single accented character, and a two character combination are
> distinct, and converting the combination to the corresponding single
> character in a filename would definitely be wrong, in my opinion.
> However, that doesn't mean that software won't do it.

Indeed some software will do it, and worse than that: they may look the
same in an editor or file selector. (See recent problems with
misleading URLs for why that sort of thing can be a security hole).

The combining char problem is similar to case folding: some filesystems
and programs treat "a" and "A" as equivalent too.

If the kernel had an encoding converter, and the filesystem stored
iso-8859-1 while userspace was presented with utf-8, it is likely that
several Unicode characters would be mapped to "a", causing similar
problems to automatic case folding in filesystems.

In other words, there is no clear solution to this problem.

-- Jamie
^ permalink raw reply [flat|nested] 81+ messages in thread
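[Editorial illustration, not part of the thread.] The combining-character
point discussed above can be sketched in a few lines of Python using only
the standard library. This is a minimal sketch, not anything proposed by
the posters: it assumes the two names in question are precomposed à
(U+00E0) and "a" followed by COMBINING GRAVE ACCENT (U+0300). A filesystem
that treats names as bytes sees two different names, while a program that
normalizes to NFC would silently merge them.

```python
import unicodedata

# Two spellings of what a reader sees as the same letter.
precomposed = "\u00e0"    # à as a single code point
decomposed = "a\u0300"    # "a" followed by COMBINING GRAVE ACCENT

# What would be stored as filename bytes in each case.
pre_bytes = precomposed.encode("utf-8")
dec_bytes = decomposed.encode("utf-8")

# The byte strings differ, so both names can coexist in one directory.
print(pre_bytes, dec_bytes, pre_bytes == dec_bytes)

# An application that normalizes to NFC maps one spelling onto the other.
merged = unicodedata.normalize("NFC", decomposed) == precomposed
print(merged)
```

Whether an application should normalize at all is exactly the
disagreement in the messages above; the sketch only shows that the two
forms are distinct on disk.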
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 19:08 ` John Bradford
2004-02-12 19:39 ` Robin Rosenberg
@ 2004-02-14 15:24 ` Eduard Bloch
1 sibling, 0 replies; 81+ messages in thread
From: Eduard Bloch @ 2004-02-14 15:24 UTC (permalink / raw)
To: John Bradford; +Cc: Linux kernel
#include <hallo.h>
* John Bradford [Thu, Feb 12 2004, 07:08:06PM]:
> Well, as long as every userspace implementation gets it correct, we'll
> be OK. Personally, I doubt they all will, especially those that
> convert from legacy encodings to Unicode, although quite possibly the
> above scenario with combining characters is not likely to happen for
> filenames. Or is it? What about copying a file from a filesystem
> with a UTF-8 encoding to a filesystem with a legacy encoding, and then
> back again?

I always wondered why there is no "iocharset" option for unixoid
filesystems. IMO there could be an easy migration path for existing
installations to UTF-8:

- convert all filenames to UTF-8 (or any other Unicode encoding)
- mount the FS with "iocharset=UTF-8,charset=latin1" (for current
  Latin1 users). Users can continue to use their latin1 names while
  they are stored in Unicode on the disk (this is what currently
  happens with VFAT, a very nice solution IMHO)
- when enough applications are ready for multibyte encodings, remove
  the charset/iocharset workaround and make people use .UTF-8 locales

However, the ultimate solution for steps 2 and 3 would be the
Microsoft-like way:

- convert the filenames in libc (from $locale to UTF-8), depending on
  which locale the user has set

This sounds like cheating, but it would be the most flexible approach
and the most compatible with encoding-ignoring applications.

Eduard.
-- 
We are nothing; what we seek is everything.
-- Johann Christian Friedrich Hölderlin
^ permalink raw reply [flat|nested] 81+ messages in thread
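[Editorial illustration, not part of the thread.] The conversion layer
proposed above, where names are stored as UTF-8 and converted to and from
the user's locale encoding, can be sketched roughly as follows. The
function names `to_disk_name` and `from_disk_name` are hypothetical and
exist only for this sketch; they are not a libc or kernel interface, and
the real proposal would live in C. The example assumes a Latin-1 user and
the file name "für.txt".

```python
def to_disk_name(name, user_encoding):
    """Hypothetical helper: user-encoded name bytes -> UTF-8 bytes for storage."""
    return name.decode(user_encoding).encode("utf-8")


def from_disk_name(name, user_encoding):
    """Hypothetical helper: stored UTF-8 bytes -> bytes in the user's encoding."""
    return name.decode("utf-8").encode(user_encoding)


# A Latin-1 user types "für.txt"; 0xFC is u-umlaut in ISO-8859-1.
typed = b"f\xfcr.txt"
stored = to_disk_name(typed, "latin-1")
shown = from_disk_name(stored, "latin-1")
print(stored, shown)

# The weak spot: a stored name the user's encoding cannot express.
cyrillic = "\u0436.txt".encode("utf-8")
try:
    from_disk_name(cyrillic, "latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```

The last case is where a scheme like this needs some fallback (an escape
such as the Uxxxxx idea mentioned earlier in the thread, or a replacement
character); the sketch deliberately leaves that choice open.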
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 16:17 ` John Bradford
2004-02-12 16:40 ` Robin Rosenberg
@ 2004-02-13 0:17 ` Jamie Lokier
1 sibling, 0 replies; 81+ messages in thread
From: Jamie Lokier @ 2004-02-13 0:17 UTC (permalink / raw)
To: John Bradford; +Cc: Robin Rosenberg, Linux kernel
John Bradford wrote:
> > Definitely a good reason. It seem many assume file names are a local thing,
> > but this is not so. Now consider the case with an external firewire
> > disk or memory stick created on a machine with iso-8859-1 as the system character
> > set and e.g xfs as the file system. What happens when I hook it up to a new redhat
> > installation that thinks file names are best stored as utf8? Most non-ascii
> > file names aren't even legal in utf8.
>
> Another thing to consider is that you can encode the same character in
> several ways using utf8,

No, you can't. Only the shortest encoding of a character is valid
UTF-8, and any program which claims to comply with Unicode is
_required_ to reject all other encodings, citing security as the main
reason.

That means any code which transcodes UTF-8 to another encoding (such
as iso-8859-1) must reject the non-minimal forms as invalid
characters, in whatever way that is done. If there's any transcoding
code in Linux which doesn't do that, it's a potential security hole
and should be fixed.

> so two filenames could have different byte strings, but evaluate to
> the same set of unicode characters.

That's true in some other encodings I think (the iso-2022 ones), but
not UTF-8.

-- Jamie
^ permalink raw reply [flat|nested] 81+ messages in thread
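[Editorial illustration, not part of the thread.] The rule stated above,
that a conforming decoder must reject non-minimal forms, can be checked
against one real decoder. The sketch below uses Python's built-in strict
UTF-8 codec as an example of a conforming implementation; the bytes are
the 11000001 10000001 example given elsewhere in the thread, which a
lenient decoder would turn into 'A' (decimal 65). The helper name
`strict_decode` is made up for this sketch.

```python
# The non-minimal two-byte form from the thread, and plain ASCII 'A'.
overlong_a = bytes([0b11000001, 0b10000001])
plain_a = bytes([0b01000001])


def strict_decode(raw):
    """Return the decoded text, or None if raw is not well-formed UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(strict_decode(plain_a))      # the ordinary one-byte form decodes
print(strict_decode(overlong_a))   # the overlong form is refused

# Decoding with replacement yields U+FFFD characters, never 'A'.
lenient = overlong_a.decode("utf-8", errors="replace")
print("A" in lenient)
```

This covers only one decoder. The concern raised in the thread is about
hand-written decoders that skip the shortest-form check, which this
sketch cannot speak for.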
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 15:55 ` Robin Rosenberg
2004-02-12 16:17 ` John Bradford
@ 2004-02-13 0:38 ` Jamie Lokier
2004-02-13 1:16 ` Robin Rosenberg
1 sibling, 1 reply; 81+ messages in thread
From: Jamie Lokier @ 2004-02-13 0:38 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: Linux kernel
Robin Rosenberg wrote:
> Now consider the case with an external firewire
> disk or memory stick created on a machine with iso-8859-1 as the system character
> set and e.g xfs as the file system. What happens when I hook it up to a new redhat
> installation that thinks file names are best stored as utf8? Most non-ascii
> file names aren't even legal in utf8.

It goes wrong. This happens both with filesystems that know nothing
about encodings, e.g. ext3, and filesystems that need to be told what
to transcode to/from utf-8, e.g. ntfs.

It is also a problem that some applications access the filesystem
assuming utf-8 and some don't. Nothing in the filesystem can make the
different applications cooperate regarding these. E.g. I have
filenames that look fine in "ls" containing things like c-cedilla, but
xmms displays them wrongly.

-- Jamie
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 0:38 ` Jamie Lokier
@ 2004-02-13 1:16 ` Robin Rosenberg
2004-02-13 1:23 ` Jamie Lokier
2004-02-13 2:29 ` viro
0 siblings, 2 replies; 81+ messages in thread
From: Robin Rosenberg @ 2004-02-13 1:16 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Linux kernel
On Friday 13 February 2004 01.38, Jamie Lokier wrote:
> Robin Rosenberg wrote:
> > Now consider the case with an external firewire
> > disk or memory stick created on a machine with iso-8859-1 as the system character
> > set and e.g xfs as the file system. What happens when I hook it up to a new redhat
> > installation that thinks file names are best stored as utf8? Most non-ascii
> > file names aren't even legal in utf8.
>
> It goes wrong. This happens both with filesystems that know nothing
> about encodings, e.g. ext3, and filesystems that need to be told what
> to transcode to/from utf-8, e.g. ntfs.

Yes, so ext3 & co. should be equipped with charset options just like
the others, so it can be fixed by the user or, in some cases, by the
mount tools.

Is there a place to store character set information in these file
systems?

> It is also a problem that some applications access the filesystem
> assuming utf-8 and some don't. Nothing in the filesystem can make the
> different applications cooperate regarding these. E.g. I have
> filenames that look fine in "ls" containing things like c-cedilla, but
> xmms displays them wrongly.

Some apps simply don't think non-ascii is relevant. Xmms is one,
although it doesn't crash at least. My guess was that it was a font
problem, since it looks like XMMS uses some special fonts. Even new
apps (like gedit) have character set problems. These apps have to be
fixed since they don't work properly anywhere outside the US. But that
is a pure userspace problem, not a kernel one.

-- robin
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 1:16 ` Robin Rosenberg
@ 2004-02-13 1:23 ` Jamie Lokier
2004-02-13 1:46 ` Robin Rosenberg
2004-02-13 2:29 ` viro
1 sibling, 1 reply; 81+ messages in thread
From: Jamie Lokier @ 2004-02-13 1:23 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: Linux kernel
Robin Rosenberg wrote:
> Is there a place to store character set information in these file systems?

Please don't confuse character set with character encoding. The
problem we are talking about here is about character encoding.
Once upon a time the two were muddled; that's why MIME and HTTP use
"charset" to mean character encoding.

And the answer is: yes, you can store it wherever you want :)

> Some apps simply don't think non-ascii is relevant. Xmms is one, although
> it doesn't crash at least. My guess was that it was a font problem since it
> looks like XMMS uses some special fonts.

It's not a font problem. XMMS simply displays each byte as a separate
character because that's what it assumes it should do. No font will
fix that.

-- Jamie
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 1:23 ` Jamie Lokier
@ 2004-02-13 1:46 ` Robin Rosenberg
0 siblings, 0 replies; 81+ messages in thread
From: Robin Rosenberg @ 2004-02-13 1:46 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Linux kernel
On Friday 13 February 2004 02.23, Jamie Lokier wrote:
> Robin Rosenberg wrote:
> > Is there a place to store character set information in these file systems?
>
> Please don't confuse character set with character encoding. The
> problem we are talking about here is about character encoding.
> Once upon a time the two were muddled; that's why MIME and HTTP use
> "charset" to mean character encoding.

I shall try not to mix them in the future. The reason for the name in
MIME is probably that a (mime)charset does specify a character set
(+encoding), while the mime-encoding only specifies raw bytes.

> And the answer is: yes, you can store it wherever you want :)

I was thinking of the file system metadata, so that mount, the kernel
or the fs could handle it.

> > Some apps simply don't think non-ascii is relevant. Xmms is one, although
> > it doesn't crash at least. My guess was that it was a font problem since it
> > looks like XMMS uses some special fonts.
>
> It's not a font problem. XMMS simply displays each byte as a separate
> character because that's what it assumes it should do. No font will fix that.

I assumed a font problem because my machine is using ISO-8859-1 and
XMMS doesn't display those non-ascii characters I use; of course it
could be both.

-- robin
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 1:16 ` Robin Rosenberg
2004-02-13 1:23 ` Jamie Lokier
@ 2004-02-13 2:29 ` viro
2004-02-13 3:23 ` Jamie Lokier
2004-02-13 10:03 ` Robin Rosenberg
1 sibling, 2 replies; 81+ messages in thread
From: viro @ 2004-02-13 2:29 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: Jamie Lokier, Linux kernel
On Fri, Feb 13, 2004 at 02:16:53AM +0100, Robin Rosenberg wrote:
> Yes, so ext3 & co. should be equipped with charset options just like the others so
> it can be fixed by the user or in some cases the mount tools.
>
> Is there a place to store character set information in these file systems?

Bullshit. Just as there is no timezone common for all users, there is
no charset common for all of them. Charset of _machine_ doesn't make
any sense at all - toy operating systems notwithstanding.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 2:29 ` viro
@ 2004-02-13 3:23 ` Jamie Lokier
2004-02-14 15:09 ` Eduard Bloch
2004-02-13 10:03 ` Robin Rosenberg
1 sibling, 1 reply; 81+ messages in thread
From: Jamie Lokier @ 2004-02-13 3:23 UTC (permalink / raw)
To: viro; +Cc: Robin Rosenberg, Linux kernel
viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Fri, Feb 13, 2004 at 02:16:53AM +0100, Robin Rosenberg wrote:
> > Yes, so ext3 & co. should be equipped with charset options just like the others so
> > it can be fixed by the user or in some cases the mount tools.
> >
> > Is there a place to store character set information in these file systems?
>
> Bullshit. Just as there is no timezone common for all users, there is no
> charset common for all of them. Charset of _machine_ doesn't make any sense
> at all - toy operating systems notwithstanding.

Charset of a filename does make sense, though. That's not per user,
it's per filename.

A name which one user entered as "£10.txt" should ideally display as
that sequence of characters to all users who want to display the name.

I already have this problem on my filesystems: some programs show the
names assuming UTF-8, other programs show them assuming iso-8859-1.

But it's worse than that. On my filesystem, names are stored in UTF-8
as is recommended these days. "ls" on some terminals shows the names
as I wrote them. But on other terminals it shows the wrong names.

If I create a file using a shell command, what I get depends on which
terminal I used to create it. If I am using a terminal which displays
UTF-8 but ssh to another machine, the other machine assumes the
terminal is displaying iso-8859-1 even though the other machine's
default locale is UTF-8. And so on.

-- Jamie
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 3:23 ` Jamie Lokier
@ 2004-02-14 15:09 ` Eduard Bloch
2004-02-15 1:01 ` Jamie Lokier
0 siblings, 1 reply; 81+ messages in thread
From: Eduard Bloch @ 2004-02-14 15:09 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Linux kernel
#include <hallo.h>
* Jamie Lokier [Fri, Feb 13 2004, 03:23:05AM]:
> If I create a file using a shell command, what I get depends on which
> terminal I used to create it. If I am using a terminal which displays
> UTF-8 but ssh to another machine, the other machine assumes the
> terminal is displaying iso-8859-1 even though the other machine's
> default locale is UTF-8. And so on.

Then you have something wrong in the shell configuration of the remote
machine. I do not see any problems in having a ssh shell opened from a
UTF-8 terminal to a machine where the shell environment is also
configured to use a UTF-8 environment.

The only problem that may appear is if you deliberately configured the
user environment on the other side for latin1; then you would have to
fix it in some way, e.g. by configuring LANG depending on SSH*
variables in .bashrc.

Regards,
Eduard.
-- 
The mark of a small person is that he becomes arrogant when he notices
that he is needed.
-- Friedl Beutelrock
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-14 15:09 ` Eduard Bloch
@ 2004-02-15 1:01 ` Jamie Lokier
2004-02-16 14:03 ` Eduard Bloch
0 siblings, 1 reply; 81+ messages in thread
From: Jamie Lokier @ 2004-02-15 1:01 UTC (permalink / raw)
To: Eduard Bloch; +Cc: Linux kernel
Eduard Bloch wrote:
> > If I create a file using a shell command, what I get depends on which
> > terminal I used to create it. If I am using a terminal which displays
> > UTF-8 but ssh to another machine, the other machine assumes the
> > terminal is displaying iso-8859-1 even though the other machine's
> > default locale is UTF-8. And so on.
>
> Then you have something wrong in the shell configuration of the remote
> machine. I do not see any problems in having a ssh shell opened from a
> UTF-8 terminal to a machine where the shell environment is also
> configured to use UTF-8 environment.

Of course that's fine. What goes wrong is when you connect to that
same machine from another terminal which is not UTF-8.

There are in fact two different problems, and you have ignored them
both :)

Firstly, "ls", editors, filenames:

The shell configuration is irrelevant. If I create a file name like
"£100.txt" (that's POUND followed by "100.txt") when I'm connected
from a UTF-8 terminal, it creates a filename encoded in UTF-8 and
displays it fine.

If I then log in to the same machine from another terminal which
displays latin1, then "ls" will _not_ display the name correctly
_regardless_ of shell or locale configuration.

If I then create a file called "£100.txt" (same name) using the
terminal which displays latin1, it creates a filename encoded in
latin1.

When I log in using the UTF-8 terminal, "ls" won't display the second
name as it was entered. Neither will GNOME or KDE.

Unfortunately, to be compatible with shell utilities, programs like
Mutt and Emacs which _are_ aware of the display and input encodings
will use the current terminal's encoding when accessing the
filesystem. So even those programs create file names with the wrong
encoding, if you log in from the wrong kind of terminal.

When I open a file in Emacs, and the file contains UTF-8, that
displays just fine on either kind of terminal (provided the terminal
can display the characters). But Emacs, and many other programs, will
display the wrong file _names_ when logged in from the wrong kind of
terminal.

Secondly, message locale and the shell:

There is no mechanism for SSH to convey which character encoding the
remote machine must use for displaying and inputting text, yet client
terminals come in different flavours. That is the problem.

(On my laptop, for example, which is a standard RH9, Gnome terminal
windows are UTF-8 but console is latin1. These are both fine locally.
There is no configuration on a remote machine which is right for both
of them, though.)

I think this is because the character encoding used by the terminal
should be in the TERM environment variable, but it is in LANG instead.

> The only problem that may appear if you deliberatedly configured the
> user environment on the other side for latin1, then you would have to
> fix it in some way. Eg. configuring LANG depending on SSH* variables in
> .bashrc.

No. If I have a plain shell with no configuration at all, then both
charset-aware programs like Mutt and Emacs, and non-charset-aware code
like filename display from "ls" do _not_ automatically display
filenames properly on both kinds of client terminal.

In the former case it is because SSH does not automatically convey the
appropriate setting for LANG, which (rather dubiously) includes
whether to use UTF-8 for display.

In the latter case, "ls" and such, there is nothing SSH can do. (And
that's what makes this relevant to linux-kernel - "ls" has no way to
display names correctly on both terminal types precisely because it
does not have any information about the character encoding of the
filenames returned by readdir()).

The result of all this is that everything works fine as long as you
only log in from the kind of terminal which matches the remote
machine.

Unfortunately, while the modern GUIs all use UTF-8 (this is a good
thing in the long run), both the default Linux console, and most
non-Linux terminals, do not use UTF-8.

Therefore file names are generally created and displayed in UTF-8 when
using any of the modern GUIs, including GUI terminals, but file names
are generally created and displayed in a locale-specific encoding
(usually iso-8859-1) when using any console, external terminal, or ssh
from an older client.

Btw, as a practical matter, it took me about a year before I figured
out how to enter a "£" (POUND) symbol into a message being edited with
Mutt and Emacs on a remote server. Until I learned to explicitly set
"LANG=en_GB.utf8" on the remote server when I logged in from GNOME
Terminal (it was a RH9 box which by default set LANG=en_GB, which is
_correct_ for most clients), typing "£" just didn't enter anything.

Third problem (a straightforward Linux bug):

I just did unicode_start on the console, which turns on UTF-8 for that
virtual terminal - for display and for keyboard input.

Then I did unicode_stop. Guess what: it put the display back in
iso-8859-1 for that virtual terminal, but the keyboard remained stuck
in UTF-8 for _all_ virtual terminals.

Once in that state, I had difficulty typing the pound sign which
appears earlier in this message, and in fact I don't know how to
restore the console without rebooting the client machine. "reset"
doesn't work; using a different virtual terminal doesn't work.

-- Jamie
^ permalink raw reply [flat|nested] 81+ messages in thread
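[Editorial illustration, not part of the thread.] The "£100.txt" case
described above can be reproduced without any terminals by looking at the
bytes each terminal would hand to the filesystem. The sketch below is a
minimal Python model under the assumption that the filesystem stores
exactly the bytes it is given and that "ls" passes them back unchanged;
the variable names are invented for the illustration.

```python
name = "\u00a3100.txt"    # "£100.txt" as the user intends it

# Bytes written when the name is typed on each kind of terminal.
from_utf8_terminal = name.encode("utf-8")
from_latin1_terminal = name.encode("latin-1")
print(from_utf8_terminal, from_latin1_terminal)

# A latin1 terminal renders each byte of the UTF-8 name as one character.
seen_on_latin1 = from_utf8_terminal.decode("latin-1")
print(seen_on_latin1)

# A UTF-8 terminal finds the lone 0xA3 byte invalid and shows a placeholder.
seen_on_utf8 = from_latin1_terminal.decode("utf-8", errors="replace")
print(seen_on_utf8)
```

Both files are meant to carry the same name, yet they are different byte
strings on disk, and neither terminal can display both correctly, because
nothing records which encoding each name was written in.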
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-15 1:01 ` Jamie Lokier
@ 2004-02-16 14:03 ` Eduard Bloch
2004-02-16 14:28 ` Jamie Lokier
` (3 more replies)
0 siblings, 4 replies; 81+ messages in thread
From: Eduard Bloch @ 2004-02-16 14:03 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Linux kernel

#include <hallo.h>
* Jamie Lokier [Sun, Feb 15 2004, 01:01:50AM]:
> > Then you have something wrong in the shell configuration of the remote
> > machine. I do not see any problems in having a ssh shell opened from a
> > UTF-8 terminal to a machine where the shell environment is also
> > configured to use UTF-8 environment.
>
> Of course that's fine. What goes wrong is when you connect to that
> same machine from another terminal which is not UTF-8.
>
> There are in fact two different problems, and you have ignored them both :)
>
> Firstly, "ls", editors, filenames:
>
> The shell configuration is irrelevant. If I create a file name
> like "£100.txt" (that's POUND followed by "100.txt") when I'm

Sure, sure, I can read it since I use UTF-8 too.

> connected from a UTF-8 terminal, it creates a filename encoded in
> UTF-8 and displays it fine.
>
> If I then log in to the same machine from another terminal which
> displays latin1, then "ls" will _not_ display the name correctly
> _regardless_ of shell or locale configuration.

I know what you mean and that is why I already proposed a radical
solution. Let me repeat it:

 - convert all files from the previous charset to UTF-8 overnight
   if the previous charset was unknown, first make sure that you can
   guess it for all users and contact users that have files with
   suspicious filenames (eg. not convertible from Latin1). Or look through
   their shell/X config files (*)
 - in libc, implement a recoding function to convert file names from
   LC_CTYPE to the underlying UTF-8 encoding

Done.

(*) There is no other way. Linux developers ignored the diversity of
charset/encodings over many years and now the needed information is lost
(not stored anywhere in the filesystem)

> If I then create a file called "£100.txt" (same name) using the
> terminal which displays latin1, it creates a filename encoded in
> latin1.

Of course. That is why the conversion should be done in userspace
(libc). The kernel itself does not know about the locale in use.

> Unfortunately, to be compatible with shell utilities, programs like
> Mutt and Emacs which _are_ aware of the display and input encodings
> will use the current terminal's encoding when accessing the

That is the correct way, though.

> filesystem. So even those programs create file names with the
> wrong encoding, if you log in from the wrong kind of terminal.

It is the _right_ encoding at the moment when they create it.

> Secondly, message locale and the shell:
>
> There is no mechanism for SSH to convey which character encoding
> the remote machine must use for displaying and inputting text, yet
> client terminals come in different flavours. That is the problem.
>
> (On my laptop, for example, which is a standard RH9, Gnome terminal
> windows are UTF-8 but console is latin1). These are both fine
> locally. There is no configuration on a remote machine which is right
> for both of them, though.)

Yup, I know that problem. At least to display them correctly, you can
either run unicode_start (to enable console's own conversion) which
sucks when they are chars from completely different language groups, eg.
latin and cyrillic. I used dynafont for a while which worked well for
displaying characters.

> I think this is because the character encoding used by the terminal
> should be in the TERM environment variable, but it is in LANG instead.

No. TERM does not have anything to do with locales (LANG).

Regards,
Eduard.
--
Selflessness is fully matured egoism.
-- Herbert Spencer
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-16 14:03 ` Eduard Bloch @ 2004-02-16 14:28 ` Jamie Lokier 2004-02-16 19:22 ` Eduard Bloch 2004-02-16 15:18 ` Valdis.Kletnieks ` (2 subsequent siblings) 3 siblings, 1 reply; 81+ messages in thread From: Jamie Lokier @ 2004-02-16 14:28 UTC (permalink / raw) To: Eduard Bloch; +Cc: Linux kernel Eduard Bloch wrote: > > I think this is because the character encoding used by the terminal > > should be in the TERM environment variable, but it is in LANG instead. > > No. TERM does not have anything to do with locales (LANG). No. The locale should not have anything to do with the appropriate byte sequences need to make the terminal display characters. It is wrong that LANG must have a different value depending on whether I log in using a DEC VT100 or a Gnome Terminal, even though I wish to see exactly the same language, dialect, messages, number formats, currency formats, dates and times. It is acceptable that LANG may control the encoding stored in files and filenames, but this should be independent of the terminal type. It is especially wrong that libraries which should be locale-independent - such as curses, slang and readline - must read the LANG variable in addition to TERM. If curses does not read and parse LANG, simple things like the box around a dialog will not line up correctly. This is wrong - it is a terminal characteristic. -- Jamie ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-16 14:28 ` Jamie Lokier
@ 2004-02-16 19:22 ` Eduard Bloch
2004-02-16 21:44 ` Jamie Lokier
0 siblings, 1 reply; 81+ messages in thread
From: Eduard Bloch @ 2004-02-16 19:22 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Linux kernel

#include <hallo.h>
* Jamie Lokier [Mon, Feb 16 2004, 02:28:07PM]:
> > > I think this is because the character encoding used by the terminal
> > > should be in the TERM environment variable, but it is in LANG instead.
> >
> > No. TERM does not have anything to do with locales (LANG).
>
> No. The locale should not have anything to do with the appropriate
> byte sequences need to make the terminal display characters.

Heh. It would be very nice if we had the situation that you describe,
but that is actually not the case.

TERM specifies the general capabilities of the terminal. It does
_not_ tell the application inside which FONT encoding is used, nor
whether it is compatible with multibyte input.

> It is wrong that LANG must have a different value depending on whether
> I log in using a DEC VT100 or a Gnome Terminal, even though I wish to
> see exactly the same language, dialect, messages, number formats,
> currency formats, dates and times.

Nonsense, sorry. How should your application know how to encode its
output? How should it know which font is used? I have heard about some
magic strings that an application can send to the Xterm (when
TERM=xterm) to tell it to change the font encoding (similar to the
string used to set the window title used by mc, for example). But this
is an extension, not mandatory for a general implementation of a
"terminal".

> It is acceptable that LANG may control the encoding stored in files
> and filenames, but this should be independent of the terminal type.

And what controls the font setting? (see above)

> It is especially wrong that libraries which should be
> locale-independent - such as curses, slang and readline - must read
> the LANG variable in addition to TERM. If curses does not read and

See above. Especially since different chars are used to draw graphical
characters (lines, boxes, ...), they _must_ know which font encoding
they have to expect.

Regards,
Eduard.
--
Coincidences are the means of fate, through which it carries out its
most important plans for us.
-- Charles Tschopp
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-16 19:22 ` Eduard Bloch
@ 2004-02-16 21:44 ` Jamie Lokier
0 siblings, 0 replies; 81+ messages in thread
From: Jamie Lokier @ 2004-02-16 21:44 UTC (permalink / raw)
To: Eduard Bloch; +Cc: Linux kernel

Eduard Bloch wrote:
> TERM specifies the general capabilities of the terminal. It does
> _not_ tell the application inside which FONT encoding is used, nor
> whether it is compatible with multibyte input.

It should - especially the multibyte encoding.

The font is irrelevant; our trouble here is *character encoding* which
has nothing to do with fonts. Please don't use the incorrect term as
there is widespread confusion over it already.

That isn't just about which glyph is displayed in response to each
byte. UTF-8 affects terminal escape sequence parsing, and also the
relationship between number of non-control bytes transmitted and the
distance moved by the cursor.

If I write a UTF-8 string to a VT220-like terminal (such as xterm
approximates), some text characters are interpreted as terminal
commands. (Hint: 0x9b (which can occur in UTF-8 text) is equivalent
to 0x1b 0x5b, the control sequence introducer; there are others too).

When you edit a line with the unix terminal line editor, when you type
DEL, it writes BACKSPACE-SPACE-BACKSPACE and removes one byte from the
input. That utterly fails to do the right thing on UTF-8 terminals.

For example, run the command "cat" by itself, then type "£££", then
hit DEL twice - it will show one pound sterling sign. Press enter,
and cat will echo the line containing _two_ pound sterling signs.
No setting of LANG or TERM makes that behave correctly.

So, do you think the kernel's line editor should be locale-aware too? :)

> > It is wrong that LANG must have a different value depending on whether
> > I log in using a DEC VT100 or a Gnome Terminal, even though I wish to
> > see exactly the same language, dialect, messages, number formats,
> > currency formats, dates and times.

NB: It's wrong because LANG should be for terminal-independent locale
properties, such as which languages I want to use and how I want text
files stored. If I log into a remote machine, I want characters
displayed according to the local terminal's requirements, but I want
text files and filenames to use the remote machine's locale, naturally.

> Nonsense, sorry. How should your application know how to encode its
> output?

Increasingly I'm thinking UTF-8-ness should be a terminal capability,
like ocrnl. The kernel's own line editor needs to know this property
anyway, and it would really help with moving filenames and everything
else over to UTF-8 - with no change to the simple unix programs such
as the shell utilities.

> > It is especially wrong that libraries which should be
> > locale-independent - such as curses, slang and readline - must
> > read the LANG variable in addition to TERM.
>
> See above. Especially since different chars are used to draw graphical
> characters (lines, boxes, ...), they _must_ know which font encoding
> they have to expect.

See "acsc" in the terminfo(5) database. Line & box drawing characters
have been treated as a terminal capability for a long time.
Case made :)

-- Jamie
^ permalink raw reply [flat|nested] 81+ messages in thread
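The two terminal-level effects described in this message can be sketched in a few lines, assuming Python 3. This is a model, not the kernel's line discipline: erase_last_bytes is a hypothetical helper that only mimics "remove one byte per DEL", and U+00DB is just one example of a character whose UTF-8 form contains the byte 0x9b.

```python
# Illustration only; erase_last_bytes is a made-up helper for this sketch.
CSI_8BIT = 0x9B  # 8-bit control sequence introducer, same as ESC [ (0x1b 0x5b)

# U+00DB (LATIN CAPITAL LETTER U WITH CIRCUMFLEX) is 0xc3 0x9b in UTF-8, so a
# terminal that is not in UTF-8 mode sees a CSI byte inside ordinary text.
encoded = "\u00db".encode("utf-8")
has_csi_byte = CSI_8BIT in encoded


def erase_last_bytes(line: bytes, presses: int) -> bytes:
    """Model a line editor that drops exactly one byte per DEL keypress."""
    return line[:-presses] if presses else line


typed = "\u00a3\u00a3\u00a3".encode("utf-8")  # three pound signs, six bytes
after_del = erase_last_bytes(typed, 2)          # DEL pressed twice
remaining = after_del.decode("utf-8")           # two pound signs are left
```

Two DEL presses erase two screen cells (so one pound sign stays visible) but only two bytes, which is one character, so the input buffer still holds two pound signs.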
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-16 14:03 ` Eduard Bloch 2004-02-16 14:28 ` Jamie Lokier @ 2004-02-16 15:18 ` Valdis.Kletnieks 2004-02-16 15:32 ` Jamie Lokier 2004-02-16 15:46 ` John Bradford 2004-02-16 15:27 ` Jamie Lokier 2004-02-16 15:44 ` Robin Rosenberg 3 siblings, 2 replies; 81+ messages in thread From: Valdis.Kletnieks @ 2004-02-16 15:18 UTC (permalink / raw) To: Eduard Bloch; +Cc: Jamie Lokier, Linux kernel [-- Attachment #1: Type: text/plain, Size: 957 bytes --] On Mon, 16 Feb 2004 15:03:38 +0100, Eduard Bloch said: > - convert all files from the previous charset to UTF-8 overnight > if the previous charset was unknown, first make sure that you can > guess it for all users and contact users that have files with > suspicous filenames (eg. not convertable from Latin1). Or look trough > their shell/X config files (*) Hazardous. > - in libc, implement a recoding function to convert file names from > LC_CTYPE to the underlying UTF-8 encoding Hmm.. could be fun if somebody is calling 'open', and the UTF-8 encoding requires the insertion of extra characters to encode it - what do you do then? That looks like a security hole just waiting to happen. Probably has lots of lurking corner cases too - what if you creat() a file, then do a readdir() and strcmp() each entry looking for your file (while comparing a filename smashed to UTF-8 to the original unsmashed string)? [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-16 15:18 ` Valdis.Kletnieks
@ 2004-02-16 15:32 ` Jamie Lokier
2004-02-16 19:13 ` Eduard Bloch
2004-02-16 15:46 ` John Bradford
1 sibling, 1 reply; 81+ messages in thread
From: Jamie Lokier @ 2004-02-16 15:32 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: Eduard Bloch, Linux kernel

Valdis.Kletnieks@vt.edu wrote:
> > - in libc, implement a recoding function to convert file names from
> > LC_CTYPE to the underlying UTF-8 encoding
>
> Hmm.. could be fun if somebody is calling 'open', and the UTF-8 encoding
> requires the insertion of extra characters to encode it - what do you do then?
> That looks like a security hole just waiting to happen. Probably
> has lots of lurking corner cases too - what if you creat() a file,
> then do a readdir() and strcmp() each entry looking for your file
> (while comparing a filename smashed to UTF-8 to the original
> unsmashed string)?

Actually, following Eduard's proposal, that would work fine. The file
name would be passed to libc in the current encoding, created in
UTF-8, libc's readdir() would convert it back (which is always
possible without mangling), and strcmp() would be fine.

The real problem comes when you readdir() a directory which contains
non-UTF-8 names. Even if you change your local filesystem, when you
go travelling a remotely-mounted filesystem elsewhere may have them.
What does Eduard's libc do then? Ignore the names? Mangle them?

Not to mention the extremely unpleasant performance implications.

-- Jamie
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-16 15:32 ` Jamie Lokier
@ 2004-02-16 19:13 ` Eduard Bloch
0 siblings, 0 replies; 81+ messages in thread
From: Eduard Bloch @ 2004-02-16 19:13 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Valdis.Kletnieks, Linux kernel

#include <hallo.h>
* Jamie Lokier [Mon, Feb 16 2004, 03:32:24PM]:
> > That looks like a security hole just waiting to happen. Probably
> > has lots of lurking corner cases too - what if you creat() a file,
> > then do a readdir() and strcmp() each entry looking for your file
> > (while comparing a filename smashed to UTF-8 to the original
> > unsmashed string)?
>
> Actually, following Eduard's proposal, that would work fine. The file
> name would be passed to libc in the current encoding, created in
> UTF-8, libc's readdir() would convert it back (which is always
> possible without mangling), and strcmp() would be fine.
>
> The real problem comes when you readdir() a directory which contains
> non-UTF-8 names. Even if you change your local filesystem, when you
> go travelling a remotely-mounted filesystem elsewhere may have them.
> What does Eduard's libc do then? Ignore the names? Mangle them?

Just pass the unconverted strings then. Please note that this is
exactly what happens today - every application running in a UTF-8
locale and facing incompatible filenames has to deal with this problem.
I wonder why so many people pretend that the current situation is
"more or less okay".

> Not to mention the extremely unpleasant performance implications.

You always lose a bit of performance when dealing with Unicode. Just
accept it.

Regards,
Eduard.
--
Long is the way through precepts, short and effective through examples.
-- Lucius Annaeus Seneca (4-65 AD)
^ permalink raw reply [flat|nested] 81+ messages in thread
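The proposal discussed in this sub-thread (recode between the locale charset and UTF-8 in libc, and pass unconvertible names through unchanged) can be modelled roughly as follows, assuming Python 3. The names to_disk and from_disk are hypothetical and stand in for whatever libc would do inside open() and readdir(); no real libc offers these functions.

```python
# Hypothetical helpers: to_disk/from_disk are not real libc functions.
def to_disk(name: bytes, charset: str) -> bytes:
    """Recode a name from the caller's locale charset to UTF-8 for storage."""
    return name.decode(charset).encode("utf-8")


def from_disk(stored: bytes, charset: str) -> bytes:
    """Recode a stored name for the caller; pass it through if that fails."""
    try:
        return stored.decode("utf-8").encode(charset)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Not valid UTF-8, or not representable in charset: return raw bytes.
        return stored


original = b"\xa3100.txt"                  # latin1 bytes for POUND + "100.txt"
stored = to_disk(original, "latin-1")       # UTF-8 form kept on disk
round_trip = from_disk(stored, "latin-1")   # what readdir() would hand back

legacy = b"caf\xe9"                         # latin1 name that was never converted
passed_through = from_disk(legacy, "latin-1")
```

The round trip returns the original bytes, so the creat()/readdir()/strcmp() case works in this model. The pass-through branch is the open question raised above: the caller cannot tell from the result whether a name was recoded or handed back untouched.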
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-16 15:18 ` Valdis.Kletnieks
2004-02-16 15:32 ` Jamie Lokier
@ 2004-02-16 15:46 ` John Bradford
2004-02-16 15:48 ` viro
2004-02-16 16:25 ` Robin Rosenberg
1 sibling, 2 replies; 81+ messages in thread
From: John Bradford @ 2004-02-16 15:46 UTC (permalink / raw)
To: Valdis.Kletnieks, Eduard Bloch; +Cc: Jamie Lokier, Linux kernel

> > - convert all files from the previous charset to UTF-8 overnight
> > if the previous charset was unknown, first make sure that you can
> > guess it for all users and contact users that have files with
> > suspicious filenames (eg. not convertible from Latin1). Or look through
> > their shell/X config files (*)
>
> Hazardous.
>
> > - in libc, implement a recoding function to convert file names from
> > LC_CTYPE to the underlying UTF-8 encoding
>
> Hmm.. could be fun if somebody is calling 'open', and the UTF-8 encoding
> requires the insertion of extra characters to encode it - what do you do then?
> That looks like a security hole just waiting to happen. Probably has lots of
> lurking corner cases too - what if you creat() a file, then do a readdir() and
> strcmp() each entry looking for your file (while comparing a filename smashed
> to UTF-8 to the original unsmashed string)?

The current situation is that so many applications simply treat
filenames as arbitrary sequences of bytes. With many encodings, this
simply happens to work, and an encoding mis-match will result in some
incorrect characters being displayed for byte values > 127. However,
some encodings, such as UTF-8, are simply _not_ compatible with the
'you can also treat it like an arbitrary byte string' model, and there
is a very real potential for security holes in bad implementations if
we go down the "it's an arbitrary byte string, but you _should_ store
UTF-8 there" route.

Maybe we should forget filename encoding altogether, and start
thinking of filenames as arbitrary sequences of _32-bit words_.
Existing applications can store their arbitrary byte sequences in the
low byte, and new calls can be added to provide Unicode-aware
userspace applications with access to the 32-bit space, which _must_
be used for UCS-4.

John.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-16 15:46 ` John Bradford @ 2004-02-16 15:48 ` viro 2004-02-16 16:43 ` John Bradford 2004-02-16 16:25 ` Robin Rosenberg 1 sibling, 1 reply; 81+ messages in thread From: viro @ 2004-02-16 15:48 UTC (permalink / raw) To: John Bradford; +Cc: Valdis.Kletnieks, Eduard Bloch, Jamie Lokier, Linux kernel On Mon, Feb 16, 2004 at 03:46:21PM +0000, John Bradford wrote: > The current situation is that so many applications simply treat > filenames as arbitrary sequences of bytes. With many encodings, this > simply happens to work, and an encoding mis-match will result in some > incorrect characters being displayed for byte values > 127. However, > some encodings, such as UTF-8, are simply _not_ compatible with the > 'you can also treat it like an arbitrary byte string model', and there Excuse me? Would you fscking mind explaining what, in your opinion, UTF-8 is and what makes "simply _not_ compatible" with aforementioned model? ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-16 15:48 ` viro @ 2004-02-16 16:43 ` John Bradford 0 siblings, 0 replies; 81+ messages in thread From: John Bradford @ 2004-02-16 16:43 UTC (permalink / raw) To: viro; +Cc: Valdis.Kletnieks, Eduard Bloch, Jamie Lokier, Linux kernel, john Quote from viro@parcelfarce.linux.theplanet.co.uk: > On Mon, Feb 16, 2004 at 03:46:21PM +0000, John Bradford wrote: > > The current situation is that so many applications simply treat > > filenames as arbitrary sequences of bytes. With many encodings, this > > simply happens to work, and an encoding mis-match will result in some > > incorrect characters being displayed for byte values > 127. However, > > some encodings, such as UTF-8, are simply _not_ compatible with the > > 'you can also treat it like an arbitrary byte string model', and there > > Excuse me? Would you fscking mind explaining what, in your opinion, > UTF-8 is Read the UTF-8 manual page. > and what makes "simply _not_ compatible" with aforementioned > model? Byte values > 127 in UTF-8 don't map to single characters, but instead many of them form part of an escape to a larger set of values. The net effect is that if you have filenames in an existing 8-bit encoding, such as any of the ISO-8859- encodings, and treat them as being in another, similar encoding, you may get some incorrect characters. This is not ideal, of course, but it is not very confusing for end users. You can more or less store arbitrary bytes in the filename, and usually at least get a displayable, re-typeable, somewhat usable result out. Note that _many_ current applications expect to be able to do just that. However, with UTF-8, random bytes > 127 may not map to any valid character sequence at all, or may map to a sequence that is not permitted by the spec, but which is, for example, a 31-bit representation of a value < 128. These are a potential source of security vulnerabilities for badly written decoders. 
Now, this problem is not limited to UTF-8 - many 16 bit encodings may have similar issues with 'random' byte streams. However, with my proposed solution, Unicode-aware applications can be adapted to write their filenames as UCS-4, and existing applications which continue to see 8-bit byte streams which they can interpret as they like, will see 7-bit ascii for characters which can be represented in it, and a random character for those which can't. Assuming that those applications treat the byte sequence as an ISO-8859- type character set, (not UTF-8, or a 16-bit character set), this shouldn't be too much of a problem, except where the low byte of the UCS-4 character is \0 or /. We can work around this by replacing such bytes with another character in the 8-bit read routine, (which isn't expected to deal with anything other than 7-bit ASCII 100% correctly, (which is no worse than what we have at the moment, as far as I can see)). Applications which do treat the 8-bit byte stream as UTF-8 or an existing 16-bit encoding should have only one additional thing to deal with over what they have to deal with today, and that is the potential for a filename created from truncated 32-bit UCS-4 values to contain \0 or /. I suggested above that the kernel could deal with that by substituting another value, but obviously UTF-8 and 16-bit encodings are more sensitive to what that substitute value is, than ISO-8859- type encodings are. John. ^ permalink raw reply [flat|nested] 81+ messages in thread
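Two of the points in this message can be checked with a short sketch, assuming Python 3 and its built-in strict UTF-8 decoder. The helper is_valid_utf8 is a made-up name; the byte strings are illustrative, and the "low byte" line only models the 8-bit view of the proposed 32-bit names, which is not an existing interface.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


overlong_slash = b"\xc0\xaf"     # two-byte (overlong) spelling of '/', forbidden
latin1_name = b"\xa3100.txt"     # latin1 bytes, not a valid UTF-8 sequence
utf8_name = b"\xc2\xa3100.txt"   # the same name correctly encoded as UTF-8

results = [is_valid_utf8(b) for b in (overlong_slash, latin1_name, utf8_name)]

# Low byte of each code point, as an 8-bit reader of 32-bit names would see it:
# U+0100 gives 0x00 (NUL) and U+012F gives 0x2f ('/').
low_bytes = bytes(ord(ch) & 0xFF for ch in "\u0100\u012f")
```

A decoder that accepted the overlong form would let a '/' slip past any check done on the raw bytes, which is the kind of hole the message warns about; the last line shows why truncated UCS-4 values need a substitute for NUL and '/'.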
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-16 15:46 ` John Bradford 2004-02-16 15:48 ` viro @ 2004-02-16 16:25 ` Robin Rosenberg 1 sibling, 0 replies; 81+ messages in thread From: Robin Rosenberg @ 2004-02-16 16:25 UTC (permalink / raw) To: John Bradford; +Cc: Valdis.Kletnieks, Eduard Bloch, Jamie Lokier, Linux kernel On Monday 16 February 2004 16.46, John Bradford wrote: > Maybe we should forget filename encoding altogether, and start > thinking of filenames as arbitrary sequences of _32-bit words_. > Existing applications can store their arbitrary byte sequences in the > low byte, and new calls can be added to provide Unicode-aware > userspace applications with access to the 32-bit space, which _must_ > be used for UCS-4. You forgot a :-). Right :-/ -- robin ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-16 14:03 ` Eduard Bloch 2004-02-16 14:28 ` Jamie Lokier 2004-02-16 15:18 ` Valdis.Kletnieks @ 2004-02-16 15:27 ` Jamie Lokier 2004-02-16 15:44 ` Robin Rosenberg 3 siblings, 0 replies; 81+ messages in thread From: Jamie Lokier @ 2004-02-16 15:27 UTC (permalink / raw) To: Eduard Bloch; +Cc: Linux kernel Eduard Bloch wrote: > Yup, I know that problem. At least to display them correctly, you can > either run unicode_start (to enable console's own conversion) which > sucks when they are chars from completely different language groups, eg. > latin and cyrillic. I used dynafont for a while which worked well for > displaying characters. Sorry, unicode_start doesn't work on most terminals (e.g. the VT100 downstairs or the Putty in the internet cafe), and it's also very antisocial to do when I log in from someone else's Linux console. -- Jamie ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-16 14:03 ` Eduard Bloch
` (2 preceding siblings ...)
2004-02-16 15:27 ` Jamie Lokier
@ 2004-02-16 15:44 ` Robin Rosenberg
3 siblings, 0 replies; 81+ messages in thread
From: Robin Rosenberg @ 2004-02-16 15:44 UTC (permalink / raw)
To: Eduard Bloch; +Cc: Jamie Lokier, Linux kernel

On Monday 16 February 2004 15.03, Eduard Bloch wrote:
> I know what you mean and that is why I already proposed a radical
> solution. Let me repeat it:
>
> - convert all files from the previous charset to UTF-8 overnight
> if the previous charset was unknown, first make sure that you can
> guess it for all users and contact users that have files with
> suspicious filenames (eg. not convertible from Latin1). Or look through
> their shell/X config files (*)

Thankfully ISO Latin-1 (and all other encodings in use AFAIK) can be
converted to UTF-8. ISO Latin-1 is also extremely simple to convert.

-- robin
^ permalink raw reply [flat|nested] 81+ messages in thread
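To show why the Latin-1 case is simple, here is a minimal hand-written converter, assuming Python 3. Each Latin-1 byte is also the Unicode code point of its character, so bytes below 0x80 are copied and every other byte becomes a two-byte UTF-8 sequence. The function name latin1_to_utf8 is made up for this sketch.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8 without any lookup table."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # ASCII is unchanged
        else:
            out.append(0xC0 | (b >> 6))    # lead byte: 0xc2 or 0xc3
            out.append(0x80 | (b & 0x3F))  # continuation byte
    return bytes(out)


converted = latin1_to_utf8(b"\xa3100.txt")  # latin1 POUND + "100.txt"
```

The conversion never fails, which is also its weakness for the overnight-migration idea: any byte string "converts", so a successful result says nothing about whether the name really was Latin-1.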
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 2:29 ` viro
2004-02-13 3:23 ` Jamie Lokier
@ 2004-02-13 10:03 ` Robin Rosenberg
2004-02-13 10:22 ` vda
1 sibling, 1 reply; 81+ messages in thread
From: Robin Rosenberg @ 2004-02-13 10:03 UTC (permalink / raw)
To: viro; +Cc: Jamie Lokier, Linux kernel

On Friday 13 February 2004 03.29, you wrote:
> On Fri, Feb 13, 2004 at 02:16:53AM +0100, Robin Rosenberg wrote:
> > Yes, so ext3&co. should be equipped with charset options just the other so
> > it can be fixed by the user or in some cases the mount tools.
> >
> > Is there a place to store character set information in these file systems?
>
> Bullshit. Just as there is no timezone common for all users, there is no
> charset common for all of them. Charset of _machine_ doesn't make any sense
> at all - toy operating systems notwithstanding.

For us using toy languages, we see characters in filenames, not byte
sequences, and whenever possible users should see the same name
regardless of locale.

-- robin
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-13 10:03 ` Robin Rosenberg @ 2004-02-13 10:22 ` vda 2004-02-13 10:29 ` Robin Rosenberg 0 siblings, 1 reply; 81+ messages in thread From: vda @ 2004-02-13 10:22 UTC (permalink / raw) To: Robin Rosenberg, viro; +Cc: Jamie Lokier, Linux kernel On Friday 13 February 2004 12:03, Robin Rosenberg wrote: > On Friday 13 February 2004 03.29, you wrote: > > On Fri, Feb 13, 2004 at 02:16:53AM +0100, Robin Rosenberg wrote: > > > Yes, so ext3&co. should be equipped with charset options just the other > > > so it can be fixed by the user or in some cases the mount tools. > > > > > > Is there a place to store character set information in these file > > > systems? > > > > Bullshit. Just as there is no timezone common for all users, there is no > > charset common for all of them. Charset of _machine_ doesn't make any > > sense at all - toy operating systems nonwithstanding. > > For us using toy languages, we see characters in filenames, not byte > sequences, and if whenever possible users should see the same name > regardless of locale. Al says that there can be a hundred of users on the box _simultaneously_, each with different locale. fs should store filenames in locale-agnostic way. -- vda ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-13 10:22 ` vda @ 2004-02-13 10:29 ` Robin Rosenberg 0 siblings, 0 replies; 81+ messages in thread From: Robin Rosenberg @ 2004-02-13 10:29 UTC (permalink / raw) To: vda; +Cc: viro, Jamie Lokier, Linux kernel On Friday 13 February 2004 11.22, vda wrote: > Al says that there can be a hundred of users on the box _simultaneously_, > each with different locale. fs should store filenames > in locale-agnostic way. I assume we agree then :-) -- robin ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-12 0:45 ` Andy Isaacson ` (2 preceding siblings ...) 2004-02-12 8:54 ` Jamie Lokier @ 2004-02-12 13:28 ` Dave Kleikamp 2004-02-12 15:26 ` Valdis.Kletnieks 4 siblings, 0 replies; 81+ messages in thread From: Dave Kleikamp @ 2004-02-12 13:28 UTC (permalink / raw) To: Andy Isaacson; +Cc: linux-kernel On Wed, 2004-02-11 at 18:45, Andy Isaacson wrote: > Why on earth is JFS worried about the filename, anyways? Why has it > *ever* had *any* behavior other than "string of bytes, delimited with /, > terminated with \0" ? The problem that was addressed in OS/2 was that one user using locale A would create some files using non-ascii characters. Then a user using locale B would access these files, but the characters in those names did not make sense in his locale. Storing the file names in unicode allowed the characters to always translate to the correct characters in the user's locale, when the charset allowed it. I'm not familiar enough with the European locales to give specific examples. It was never an issue in the U.S. :^) The OS/2 kernel has locale information for each process, so this actually works very well there. I will admit that it was a mistake not to change the default behavior when we ported this to Linux. > I read your response about OS/2, and maybe I'm just slow, but I don't > see what that has to do with anything. > > Does JFS on AIX have the same buggy behavior? I know that JFS1 did not. I'm not sure about JFS2, since it was ported from the same OS/2 code base. > What behavior was the code originally designed to implement, on OS/2? > Why was that behavior chosen rather than "filenames are a string of > bytes"? I hope I explained that well enough above. > Feel free to point to a "Design of the OS/2 JFS interface" document if > such exists and answers my question. :) > > -andy -- David Kleikamp IBM Linux Technology Center ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-12 0:45 ` Andy Isaacson ` (3 preceding siblings ...) 2004-02-12 13:28 ` Dave Kleikamp @ 2004-02-12 15:26 ` Valdis.Kletnieks 2004-02-12 15:41 ` Dave Kleikamp 4 siblings, 1 reply; 81+ messages in thread From: Valdis.Kletnieks @ 2004-02-12 15:26 UTC (permalink / raw) To: Andy Isaacson; +Cc: Dave Kleikamp, linux-kernel [-- Attachment #1: Type: text/plain, Size: 484 bytes --] On Wed, 11 Feb 2004 18:45:32 CST, Andy Isaacson said: > Does JFS on AIX have the same buggy behavior? Nope, it's been tolerant of all 254 bit patterns except \0 and '/' since at least AIX 3.1.2 back in the early 90s. It doesn't even have a concept of "UTF-8 filename" - it considers that a userspace issue. Now, over the last 15 years I've tripped over a number of *userspace* things that did really stupid things when handed non-ASCII filenames, but that's a different issue... [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) 2004-02-12 15:26 ` Valdis.Kletnieks @ 2004-02-12 15:41 ` Dave Kleikamp 0 siblings, 0 replies; 81+ messages in thread From: Dave Kleikamp @ 2004-02-12 15:41 UTC (permalink / raw) To: Valdis.Kletnieks; +Cc: Andy Isaacson, linux-kernel On Thu, 2004-02-12 at 09:26, Valdis.Kletnieks@vt.edu wrote: > On Wed, 11 Feb 2004 18:45:32 CST, Andy Isaacson said: > Now, over the last 15 years I've tripped over a number of *userspace* > things that did really stupid things when handed non-ASCII filenames, > but that's a different issue... That's the problem that OS/2 addressed. In OS/2 each application would see the correct charset for its locale, no matter what the locale of the application that created the file was. In Linux, the file system simply doesn't have the information needed to do this, so it was a mistake to try to imitate it. -- David Kleikamp IBM Linux Technology Center ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
@ 2004-02-12 16:50 Nicolas Mailhot
2004-02-12 18:12 ` Robin Rosenberg
2004-02-13 3:03 ` Jamie Lokier
0 siblings, 2 replies; 81+ messages in thread
From: Nicolas Mailhot @ 2004-02-12 16:50 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]
Not specifying the file name encoding (either per fs type, per partition
or per filename) is plain dangerous. It is not a userspace problem -
flash/hotplug disks move, users on the same system can have different
locales and try to share files, a user can change his locale to another
one (hear the screams of RH users forcibly converted to utf8 who had to
fix years of storage whose filenames were suddenly borked).

See also the sun zip encoding bug - everyone uses zip files in Java, zip
authors thought a filename is "just a bunch of bytes" and didn't put
filename encoding info in the zip format, and now java zip handling goes
boom since numerous encodings are unicode-incompatible. It's slowly
making its way to the top-25 most reported java bugs. (of course as
usual US users/coders are not hit and do not feel concerned)

The only reason we got by with it so far is linux localisation was poor,
and systems didn't scale high enough to permit high number of users per
system (reducing locale collision risks).

The only reason we might get by in the future is everyone will be using
utf8.

But that's not a reason not to fix the core problem - I don't want to
spend hours fixing filenames next time someone comes up with a new
encoding. Please put valid encoding info somewhere or declare filenames
are utf-8 or utf-16 only - changing user locale should not corrupt old
data.

Cheers,
--
Nicolas Mailhot
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
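A minimal Python sketch of the locale collision described in the message above: two users who type the same name under different locales store different bytes, and each sees the other's file wrongly. The name "Müller.txt" is an arbitrary example chosen for illustration, not taken from the thread.

```python
name = "Müller.txt"

latin1_bytes = name.encode("iso-8859-1")  # what an ISO-8859-1 user stores
utf8_bytes = name.encode("utf-8")         # what a UTF-8 user stores

# The two users store different byte sequences for the "same" name.
print(latin1_bytes)  # b'M\xfcller.txt'
print(utf8_bytes)    # b'M\xc3\xbcller.txt'

# A UTF-8 name read by an ISO-8859-1 application shows up as two odd
# characters where the umlaut should be.
mojibake = utf8_bytes.decode("iso-8859-1")
print(ascii(mojibake))  # 'M\xc3\xbcller.txt'

# An ISO-8859-1 name is not even well-formed UTF-8.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
print(latin1_is_valid_utf8)  # False
```

Nothing on disk changes when the user switches locale; only the interpretation of the stored bytes does, which is why the old names appear "borked".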
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 16:50 Nicolas Mailhot
@ 2004-02-12 18:12 ` Robin Rosenberg
2004-02-13 3:03 ` Jamie Lokier
1 sibling, 0 replies; 81+ messages in thread
From: Robin Rosenberg @ 2004-02-12 18:12 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: linux-kernel
On Thursday 12 February 2004 17.50, you wrote:
> But that's not a reason not to fix the core problem - I don't want to
> spend hours fixing filenames next time someone comes up with a new
> encoding. Please put valid encoding info somewhere or declare filenames
> are utf-8 or utf-16 only - changing user locale should not corrupt old
> data.

Yes!
-- robin
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 16:50 Nicolas Mailhot
2004-02-12 18:12 ` Robin Rosenberg
@ 2004-02-13 3:03 ` Jamie Lokier
2004-02-13 10:07 ` Robin Rosenberg
2004-02-13 18:06 ` Nicolas Mailhot
1 sibling, 2 replies; 81+ messages in thread
From: Jamie Lokier @ 2004-02-13 3:03 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: linux-kernel
Nicolas Mailhot wrote:
> But that's not a reason not to fix the core problem - I don't want to
> spend hours fixing filenames next time someone comes up with a new
> encoding. Please put valid encoding info somewhere or declare filenames
> are utf-8 or utf-16 only - changing user locale should not corrupt old
> data.

If you attach encoding to names for a whole filesystem, you will get
really unpleasant bugs including security holes because some names
won't be writable, so the fs will either return error codes when those
names are used, or silently alter the names.
-- Jamie
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 3:03 ` Jamie Lokier
@ 2004-02-13 10:07 ` Robin Rosenberg
2004-02-13 18:06 ` Nicolas Mailhot
1 sibling, 0 replies; 81+ messages in thread
From: Robin Rosenberg @ 2004-02-13 10:07 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Nicolas Mailhot, linux-kernel
On Friday 13 February 2004 04.03, Jamie Lokier wrote:
> Nicolas Mailhot wrote:
> > But that's not a reason not to fix the core problem - I don't want to
> > spend hours fixing filenames next time someone comes up with a new
> > encoding. Please put valid encoding info somewhere or declare filenames
> > are utf-8 or utf-16 only - changing user locale should not corrupt old
> > data.
>
> If you attach encoding to names for a whole filesystem, you will get
> really unpleasant bugs including security holes because some names
> won't be writable, so the fs will either return error codes when those
> names are used, or silently alter the names.

Depends on how to handle those undecodable file names. Non-ASCII
filenames are probably a security issue (negative characters) with some
apps. Making them inaccessible is definitely not ok. I proposed one
version, although it might be a good idea to look at those file systems
that handle the problem already so a uniform solution could be used that
makes all filenames accessible regardless of which characters are used
and doesn't cause unnecessary confusion as to what is the name.
-- robin
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 3:03 ` Jamie Lokier
2004-02-13 10:07 ` Robin Rosenberg
@ 2004-02-13 18:06 ` Nicolas Mailhot
2004-02-13 18:15 ` viro
1 sibling, 1 reply; 81+ messages in thread
From: Nicolas Mailhot @ 2004-02-13 18:06 UTC (permalink / raw)
To: Jamie Lokier; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1326 bytes --]
On Fri, 13/02/2004 at 03:03 +0000, Jamie Lokier wrote:
> Nicolas Mailhot wrote:
> > But that's not a reason not to fix the core problem - I don't want to
> > spend hours fixing filenames next time someone comes up with a new
> > encoding. Please put valid encoding info somewhere or declare filenames
> > are utf-8 or utf-16 only - changing user locale should not corrupt old
> > data.
>
> If you attach encoding to names for a whole filesystem, you will get
> really unpleasant bugs including security holes because some names
> won't be writable, so the fs will either return error codes when those
> names are used, or silently alter the names.

You can have security holes now just by tricking an app into reading
files written by another app which disagreed on the locale.

And as for the filename problems:
- just mangle existing invalid filenames when a default encoding is
agreed upon
- refuse to write new files with invalid filenames just like you would
with the few names forbidden in ascii - apps will learn to cope.

Some convention is needed, and expecting it to materialise without os
enforcement is deluding oneself. Getting a change like this in place
will definitely be painful, but the current situation is far from
painless for a lot of people.

Regards,
--
Nicolas Mailhot
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 18:06 ` Nicolas Mailhot
@ 2004-02-13 18:15 ` viro
2004-02-13 18:24 ` Valdis.Kletnieks
2004-02-13 18:31 ` Richard B. Johnson
0 siblings, 2 replies; 81+ messages in thread
From: viro @ 2004-02-13 18:15 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: Jamie Lokier, linux-kernel
On Fri, Feb 13, 2004 at 07:06:46PM +0100, Nicolas Mailhot wrote:
> And as for the filename problems :
> - just mangle existing invalid filenames when a default encoding is
> agreed upon
> - refuse to write new files with invalid filenames just like you would
> with the few names forbidden in ascii - apps will learn to cope.

What names forbidden in ASCII?
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 18:15 ` viro
@ 2004-02-13 18:24 ` Valdis.Kletnieks
2004-02-13 18:31 ` viro
2004-02-13 18:31 ` Richard B. Johnson
1 sibling, 1 reply; 81+ messages in thread
From: Valdis.Kletnieks @ 2004-02-13 18:24 UTC (permalink / raw)
To: viro; +Cc: Nicolas Mailhot, Jamie Lokier, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 150 bytes --]
On Fri, 13 Feb 2004 18:15:42 GMT, viro@parcelfarce.linux.theplanet.co.uk said:
> What names forbidden in ASCII?

Anything with a / or a \0 in it. ;)
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 18:24 ` Valdis.Kletnieks
@ 2004-02-13 18:31 ` viro
2004-02-13 20:27 ` Jamie Lokier
0 siblings, 1 reply; 81+ messages in thread
From: viro @ 2004-02-13 18:31 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: Nicolas Mailhot, Jamie Lokier, linux-kernel
On Fri, Feb 13, 2004 at 01:24:33PM -0500, Valdis.Kletnieks@vt.edu wrote:
> On Fri, 13 Feb 2004 18:15:42 GMT, viro@parcelfarce.linux.theplanet.co.uk said:
>
> > What names forbidden in ASCII?
>
> Anything with a / or a \0 in it. ;)

You try and pass something _without_ \0 in it to the kernel ;-)
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 18:31 ` viro
@ 2004-02-13 20:27 ` Jamie Lokier
0 siblings, 0 replies; 81+ messages in thread
From: Jamie Lokier @ 2004-02-13 20:27 UTC (permalink / raw)
To: viro; +Cc: Valdis.Kletnieks, Nicolas Mailhot, linux-kernel
viro@parcelfarce.linux.theplanet.co.uk wrote:
> You try and pass something _without_ \0 in it to the kernel ;-)

:)

But seriously, even that is a security issue when someone requests a
URL containing "%00", or some text contains a filename to operate on
and the name contains \0.

For example, if I write a Perl regular expression to reject paths from
the outside world containing "..": m{(?:/|^)\.\.(?:/|\z)}, it will fail
to notice when given the path "..\0" that the kernel will treat it
identically to "..". Potential security hole, depending on the context.
-- Jamie
^ permalink raw reply [flat|nested] 81+ messages in thread
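The "..\0" problem from the message above can be reproduced with a Python transliteration of the Perl pattern (Python's `\Z` stands in for Perl's `\z`). The helper names `looks_like_dotdot`, `as_kernel_sees_it` and `safe_check` are hypothetical, and the string truncation only models how a C string ends at the first NUL; it is a sketch, not a description of any particular kernel code path.

```python
import re

# Python spelling of the Perl pattern m{(?:/|^)\.\.(?:/|\z)}.
DOTDOT = re.compile(r"(?:/|^)\.\.(?:/|\Z)")


def looks_like_dotdot(path: str) -> bool:
    """The naive filter: does the path contain a '..' component?"""
    return DOTDOT.search(path) is not None


def as_kernel_sees_it(path: str) -> str:
    """Model a C string: everything from the first NUL onward is ignored."""
    return path.split("\0", 1)[0]


def safe_check(path: str) -> bool:
    """Accept a path only if it has no NUL and no '..' component."""
    if "\0" in path:
        return False
    return not looks_like_dotdot(path)


sneaky = "..\0"
print(looks_like_dotdot(sneaky))         # False: the filter is fooled
print(as_kernel_sees_it(sneaky) == "..")  # True: but this is what gets used
print(safe_check(sneaky))                # False: rejected once NUL is checked
```

Rejecting NUL before any pattern matching is one simple way to close the gap; checking the truncated string instead would be another.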
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 18:15 ` viro
2004-02-13 18:24 ` Valdis.Kletnieks
@ 2004-02-13 18:31 ` Richard B. Johnson
2004-02-13 22:39 ` Robin Rosenberg
1 sibling, 1 reply; 81+ messages in thread
From: Richard B. Johnson @ 2004-02-13 18:31 UTC (permalink / raw)
To: viro; +Cc: Nicolas Mailhot, Jamie Lokier, linux-kernel
On Fri, 13 Feb 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Fri, Feb 13, 2004 at 07:06:46PM +0100, Nicolas Mailhot wrote:
> > And as for the filename problems :
> > - just mangle existing invalid filenames when a default encoding is
> > agreed upon
> > - refuse to write new files with invalid filenames just like you would
> > with the few names forbidden in ascii - apps will learn to cope.
>
> What names forbidden in ASCII?

I think that all ASCII characters below 0x20 are forbidden in
Unix file-names and others shown in the reference cited and
"disapproved".

http://www.med.nyu.edu/rcr/rcr/nyu_vms/unixfileanddirectorynames.htm

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 18:31 ` Richard B. Johnson
@ 2004-02-13 22:39 ` Robin Rosenberg
0 siblings, 0 replies; 81+ messages in thread
From: Robin Rosenberg @ 2004-02-13 22:39 UTC (permalink / raw)
To: root; +Cc: viro, Nicolas Mailhot, Jamie Lokier, linux-kernel
On Friday 13 February 2004 19.31, Richard B. Johnson wrote:
> On Fri, 13 Feb 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:
>
> > On Fri, Feb 13, 2004 at 07:06:46PM +0100, Nicolas Mailhot wrote:
> > > And as for the filename problems :
> > > - just mangle existing invalid filenames when a default encoding is
> > > agreed upon
> > > - refuse to write new files with invalid filenames just like you would
> > > with the few names forbidden in ascii - apps will learn to cope.
> >
> > What names forbidden in ASCII?
>
> I think that all ASCII characters below 0x20 are forbidden in
> Unix file-names and others shown in the reference cited and
> "disapproved".
>
> http://www.med.nyu.edu/rcr/rcr/nyu_vms/unixfileanddirectorynames.htm

That's not really a formal definition of what's allowed. It's a
recommendation for users on how to avoid detecting applications that
cannot handle all file names, i.e. buggy applications.

Try
touch "$(/bin/ls -1|head)"
and you will find apps that can handle the nice filename and those that
cannot. I'm definitely not endorsing them and it would probably be wise
to implement a system policy that allows administrators to ban such
names as they represent security holes and all sorts of problems. Some
filesystems forbid these names, but unix doesn't.
-- robin
^ permalink raw reply [flat|nested] 81+ messages in thread
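The `touch "$(/bin/ls -1|head)"` experiment above produces a single file whose name contains newlines. A rough Python equivalent, confined to a temporary directory, shows why line-oriented tools miscount such names. It assumes a filesystem that accepts control characters in names, as the message says plain Unix filesystems do; the example name is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # One file whose name spans two "lines".
    odd_name = b"first line\nsecond line"
    path = os.path.join(os.fsencode(d), odd_name)
    with open(path, "wb"):
        pass

    # The directory really contains exactly one entry...
    listing = os.listdir(os.fsencode(d))

    # ...but a tool that splits a newline-separated listing sees two.
    naive_count = len(b"\n".join(listing).split(b"\n"))

print(len(listing))  # 1
print(naive_count)   # 2
```

Programs that pass names around as NUL-terminated or length-prefixed byte strings handle this case; programs that parse `ls` output line by line do not.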
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
[not found] <04Feb13.015940est.41760@gpu.utcc.utoronto.ca>
@ 2004-02-13 10:26 ` Robin Rosenberg
0 siblings, 0 replies; 81+ messages in thread
From: Robin Rosenberg @ 2004-02-13 10:26 UTC (permalink / raw)
To: Chris Siebenmann; +Cc: Linux kernel
On Friday 13 February 2004 07.59, Chris Siebenmann wrote:
> You write:
> | Oh, I wasn't thinking of fixing *every* application out there, but
> | making the kernel api's convert between the user locale and the file
> | system locale, thus restricting the problems to places that can be
> | fixed.
>
> Why should the kernel have any idea about locales, or care? (We've just
> had an illustration, in this very thread, about why bits of the kernel
> caring about locales is dangerous.)

We have also explained why it's a problem and why "something" should
care. The problem is clear, the solution is less clear. Some file
systems already try to handle the issue because the fs itself defines
the character set. That's the argument for solving the issue with other
file systems the same way. It's also only the fs media that can reliably
know this since media are movable these days.

> Making the kernel convert between character sets also requires as a
> corollary that the kernel know about all of the character sets, which is
> both dangerous and liable to expand one's kernel impressively.

That's NLS support, which is already there. Conceivably this could be a
compile-time option for the file systems that due to legacy do not state
what character set/encoding is to be used so the system could be tuned
for use in a homogeneous environment w.r.t locale.

> Declaring that the kernel operates in a fixed locale amounts to
> declaring that it will reject certain byte sequences for filenames
> because it doesn't like how they smell, without clear technical need
> for it. People generally object to their kernel restricting them for
> such reasons.

The needs are not "technical", they are "user" needs. (I hear them
laughing in Redmond).
-- robin
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
[not found] <04Feb13.024659est.41760@gpu.utcc.utoronto.ca>
@ 2004-02-13 17:57 ` Nicolas Mailhot
0 siblings, 0 replies; 81+ messages in thread
From: Nicolas Mailhot @ 2004-02-13 17:57 UTC (permalink / raw)
To: chris.siebenmann; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1653 bytes --]
On Fri, 13/02/2004 at 02:46 -0500, Chris Siebenmann wrote:
> You write:
> | Please put valid encoding info somewhere [...]
>
> There is no place for encoding information in the Unix API;

Big surprise;)

> you would
> have to implement a new one. Even if the kernel is informed of process
> locale when a process creates files, a new API that returns filename
> encoding alongside the file name itself is necessary. And relying on
> process locale on creation leads to undesirable results in some cases.
>
> | [...] or declare filenames are utf-8 or utf-16 only - changing user
> | locale should not corrupt old data.
>
> Since not all byte sequences are valid UTF-8, this immediately means
> that some old files are inaccessible since their filenames are now
> illegal[*]. This also screws everyone who has no desire to work in
> UTF-8, and it screws everyone completely if ever UTF-8 is decided to not
> be the solution to the world's problems.

So what? Do you think an app that expects utf-8 filenames won't crash
today when served a byte sequence that's invalid UTF-8? (or an app that
expects ascii when served utf-8 oddities)

The problem exists now - putting encoding info somewhere or agreeing on
a common convention won't solve the legacy mess. What it will do is keep
us from getting stuck the same way in a decade.

As long as an FS is shared by multiple apps/users, agreeing on what the
filenames mean exactly should not be revolutionary.

And btw I don't care if it's UTF-8, UCS or something else. I just want a
common ground so people and apps can communicate sanely.

Cheers,
--
Nicolas Mailhot
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
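The question of what an application that expects UTF-8 does with an invalid byte sequence can be made concrete in Python. Strict decoding fails outright; Python's `surrogateescape` error handler is one userspace coping strategy that keeps the original bytes recoverable. That handler is a Python feature, not something proposed in this thread, and the sample name is invented.

```python
raw = b"M\xfcller.txt"  # an ISO-8859-1 name, not well-formed UTF-8

# Strict decoding: the application "crashes" unless it catches the error.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # False

# Lenient decoding: each undecodable byte becomes a lone surrogate, so the
# name can be held as text and converted back without loss.
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))  # 'M\udcfcller.txt'

round_trip = text.encode("utf-8", errors="surrogateescape")
print(round_trip == raw)  # True
```

The escaped string cannot be printed or written as real UTF-8, so this only defers the problem, but it does let a program open, rename or delete the file it found.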
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
[not found] ` <1piXj-1d3-3@gated-at.bofh.it>
@ 2004-02-15 14:26 ` Pascal Schmidt
[not found] ` <1pRLy-21o-31@gated-at.bofh.it>
1 sibling, 0 replies; 81+ messages in thread
From: Pascal Schmidt @ 2004-02-15 14:26 UTC (permalink / raw)
To: Jamie Lokier; +Cc: linux-kernel
On Sun, 15 Feb 2004 02:10:05 +0100, you wrote in linux.kernel:
> Then I did unicode_stop. Guess what: it put the display back in
> iso-8859-1 for that virtual terminal, but the keyboard remained stuck
> in UTF-8 for _all_ virtual terminals.

kbd_mode -a to reset to ASCII mode.
--
Ciao,
Pascal
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
[not found] ` <1pSRf-31Z-5@gated-at.bofh.it>
@ 2004-02-16 15:44 ` Pascal Schmidt
2004-02-16 15:59 ` Valdis.Kletnieks
0 siblings, 1 reply; 81+ messages in thread
From: Pascal Schmidt @ 2004-02-16 15:44 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-kernel
On Mon, 16 Feb 2004 16:30:13 +0100, you wrote in linux.kernel:
> lurking corner cases too - what if you creat() a file, then do a
> readdir() and strcmp() each entry looking for your file (while
> comparing a filename smashed to UTF-8 to the original unsmashed string)?

That's broken on multitasking systems anyway. Even if you find the
same name, somebody (root process for example) might have unlinked your
file and created another with the same name between you calling creat()
and doing the readdir(). What would be the use of this, anyway?
--
Ciao,
Pascal
^ permalink raw reply [flat|nested] 81+ messages in thread
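The race described in the message above is why a program should learn about the outcome of a creation from the call itself rather than from a later directory scan. A minimal sketch in Python follows; it uses `O_CREAT | O_EXCL`, which is stricter than the plain creat() mentioned in the thread and is named here as a substitute, and the helper `create_exclusively` is hypothetical.

```python
import os
import tempfile


def create_exclusively(path: str) -> bool:
    """Create `path` and report whether this call was the one that made it.

    The answer comes from the open call's own result, so nothing another
    process does afterwards can make it wrong in hindsight.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "example")
    first = create_exclusively(target)   # nothing there yet
    second = create_exclusively(target)  # the name is already taken

print(first)   # True
print(second)  # False
```

Holding on to the returned file descriptor, instead of closing it as this sketch does, would additionally guarantee that later operations act on the file that was actually created.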
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-16 15:44 ` Pascal Schmidt
@ 2004-02-16 15:59 ` Valdis.Kletnieks
0 siblings, 0 replies; 81+ messages in thread
From: Valdis.Kletnieks @ 2004-02-16 15:59 UTC (permalink / raw)
To: Pascal Schmidt; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 239 bytes --]
On Mon, 16 Feb 2004 16:44:48 +0100, Pascal Schmidt said:
> file and created another with the same name between you calling creat()
> and doing the readdir(). What would be the use of this, anyway?

How does the shell do 'echo foo*'?
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
[not found] ` <1pTu7-3Ce-7@gated-at.bofh.it>
@ 2004-02-16 17:26 ` Pascal Schmidt
2004-02-16 17:58 ` Valdis.Kletnieks
0 siblings, 1 reply; 81+ messages in thread
From: Pascal Schmidt @ 2004-02-16 17:26 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-kernel
On Mon, 16 Feb 2004 17:10:23 +0100, you wrote in linux.kernel:
>> file and created another with the same name between you calling creat()
>> and doing the readdir(). What would be the use of this, anyway?
> How does the shell do 'echo foo*'?

I fail to see the connection with creat() followed by readdir(). The
shell is surely not expecting the names that follow from the glob
expansion to have any relationship with previous shell operations.
--
Ciao,
Pascal
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-16 17:26 ` Pascal Schmidt
@ 2004-02-16 17:58 ` Valdis.Kletnieks
2004-02-16 19:48 ` Pascal Schmidt
0 siblings, 1 reply; 81+ messages in thread
From: Valdis.Kletnieks @ 2004-02-16 17:58 UTC (permalink / raw)
To: Pascal Schmidt; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1637 bytes --]
On Mon, 16 Feb 2004 18:26:47 +0100, Pascal Schmidt said:
> On Mon, 16 Feb 2004 17:10:23 +0100, you wrote in linux.kernel:
>
> >> file and created another with the same name between you calling creat()
> >> and doing the readdir(). What would be the use of this, anyway?
> > How does the shell do 'echo foo*'?
>
> I fail to see the connection with creat() followed by readdir(). The shell
> is surely not expecting the names that follow from the glob expansion to
> have any relationship with previous shell operations

Oh?

% rm *
% touch foo1 bar1 # this calls creat() or open() or similar
% touch foo2 bar2 # as will this...
% echo foo* # and this will do a readdir(), presumably

Do you have any expectations what the echo will do? Obviously the glob
DOES have a relationship with previous shell operations.

The point is that *if* we assume that glibc is going to do some magic
conversion when creating a file, we are assuming that glibc will
*always* keep the conversion hidden. No matter what. Because the user
now has expectations of what that file was called when he created it -
the string he passed to open()/creat(). If what gets handed to the
kernel is something different, we have to make sure that the user never
finds out about it.

And if there's special iso8859-* chars in the filename, this means that
the magic handwave to convert to utf-8 inside glibc will either have to
do it in-place (mangling the user-supplied filename, and bad karma) or
it gets to call malloc() to get a work space (can't use a 'static
char[MAXPATHLEN]', that's not thread-safe). This gets *very* interesting
if the malloc() fails.. ;)
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
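The "keep the conversion hidden" requirement from the message above can be modelled in memory. The class below is purely hypothetical (it is not glibc and touches no real files); it only shows that once the creation path converts names, every path that returns names must convert them back, or the caller's comparison against its original string fails.

```python
class ConvertingLayer:
    """Hypothetical library layer that stores ISO-8859-1 names as UTF-8."""

    def __init__(self) -> None:
        self._directory = []  # stands in for the on-disk directory entries

    def creat(self, name: bytes) -> None:
        # Conversion on the way in: caller's Latin-1 bytes become UTF-8.
        self._directory.append(name.decode("iso-8859-1").encode("utf-8"))

    def readdir_raw(self) -> list:
        # Leaky variant: hands back whatever was stored.
        return list(self._directory)

    def readdir_converted(self) -> list:
        # Consistent variant: undoes the conversion on the way out.
        # (Would raise if an entry had no ISO-8859-1 representation.)
        return [e.decode("utf-8").encode("iso-8859-1") for e in self._directory]


layer = ConvertingLayer()
mine = b"M\xfcller"  # the exact string the caller passed to creat()
layer.creat(mine)

print(mine in layer.readdir_raw())        # False: the caller cannot find its file
print(mine in layer.readdir_converted())  # True: the conversion stays hidden
```

The consistent variant still fails for names that the user's charset cannot express, which is the kind of corner case the thread keeps returning to.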
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-16 17:58 ` Valdis.Kletnieks
@ 2004-02-16 19:48 ` Pascal Schmidt
0 siblings, 0 replies; 81+ messages in thread
From: Pascal Schmidt @ 2004-02-16 19:48 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-kernel
On Mon, 16 Feb 2004 Valdis.Kletnieks@vt.edu wrote:
> Oh?
>
> % rm *
> % touch foo1 bar1 # this calls creat() or open() or similar
> % touch foo2 bar2 # as will this...
> % echo foo* # and this will do a readdir(), presumably
>
> Do you have any expectations what the echo will do? Obviously the glob
> DOES have a relationship with previous shell operations.

Yes, and? One may expect the echo to give "foo1 foo2", but that depends
on a lot of side effects, such as no other process doing things in the
current directory.

The same is true in a program - if you need to know whether you could
create a file, the only sane way is to use creat() from an application
and look at the return value. No other method is meaningful - arbitrary
things can happen between creating a file and running readdir().

> The point is that *if* we assume that glibc is going to do some magic
> conversion when creating a file, we are assuming that glibc will
> *always* keep the conversion hidden. No matter what. Because the user
> now has expectations of what that file was called when he created it -
> the string he passed to open()/creat(). If what gets handed to the
> kernel is something different, we have to make sure that the user never
> finds out about it.

That way lies madness, I agree. The sane thing (but breaks existing
applications) would be to reject any filename that is not valid UTF-8,
returning -EINVAL. I don't think *that* is going to happen, though. ;)
--
Ciao,
Pascal
^ permalink raw reply [flat|nested] 81+ messages in thread
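The "reject anything that is not valid UTF-8 with -EINVAL" policy floated at the end of the message above would look roughly like this in userspace Python. It is a sketch of the idea only; the function names `check_filename` and `accepts` are hypothetical and nothing here reflects an actual kernel interface.

```python
import errno


def check_filename(name: bytes) -> None:
    """Raise OSError(EINVAL) unless `name` is well-formed UTF-8."""
    try:
        name.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        raise OSError(errno.EINVAL, "filename is not valid UTF-8") from None


def accepts(name: bytes) -> bool:
    """Convenience wrapper: True if the policy would let the name through."""
    try:
        check_filename(name)
    except OSError:
        return False
    return True


print(accepts("Générique.ogg".encode("utf-8")))       # True
print(accepts("Générique.ogg".encode("iso-8859-1")))  # False
print(accepts(b"plain-ascii.txt"))                    # True
```

Plain ASCII names pass unchanged, which is why such a policy would go unnoticed by users who never leave ASCII and break existing data for everyone else.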