* Re: JFS default behavior
From: Nicolas Mailhot @ 2004-02-15 23:03 UTC (permalink / raw)
To: linux-kernel
| Linus Torvalds pointed the way of Tux :
| In short: the kernel talks bytestreams, and that implies that if you
| want to talk to the kernel, you HAVE TO USE UTF-8.
In that case :
- should the kernel allow apps to write filenames that are invalid
UTF-8 and will crash UTF-8 apps ?
- should this UTF-8 rule be noted somewhere (in a FAQ/man page/LSB spec/
whatever) so app authors know they are supposed to read and write UTF-8
filenames and not apply locale rules to kernel objects ?
- what happens to already existing invalid UTF-8 filenames ? Should the
kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What
should happen if someone plugs an unconverted FS into such a system
afterwards ?
These are the questions people have been asking.
* Re: JFS default behavior
From: Jan Knutar @ 2004-02-16 3:45 UTC (permalink / raw)
To: Nicolas Mailhot, linux-kernel

> - what happens to already existing invalid UTF-8 filenames ? Should
> the kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess
> ? What should happen if someone plugs an unconverted FS into such a
> system afterwards ?

What I would like is a userspace tool that recurses and converts
filename encodings from a specified locale to UTF-8. Something like
"any2utf8 -from iso8859-1 -recurse /mnt/myoldmp3disk".

Does anyone know if such a tool exists already?
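A rough sketch of what such a tool might look like, in Python. This is not an existing utility: the names convert_name and convert_tree are made up for illustration, and the caller is assumed to know the source encoding. Names that already decode as UTF-8 are left alone, which means a legacy name that happens to be valid UTF-8 is silently skipped.

```python
import os


def convert_name(raw: bytes, source_encoding: str) -> bytes:
    """Return raw re-encoded as UTF-8; names that are already UTF-8 are kept."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (plain ASCII included)
    except UnicodeDecodeError:
        # Assumption: every non-UTF-8 name is in source_encoding.
        return raw.decode(source_encoding).encode("utf-8")


def convert_tree(root: bytes, source_encoding: str, dry_run: bool = True):
    """Rename non-UTF-8 entries under root; return the (old, new) pairs."""
    planned = []
    # Passing bytes to os.walk makes it yield raw byte names.
    # topdown=False renames children before the directory that holds them.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new_name = convert_name(name, source_encoding)
            if new_name == name:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, new_name)
            if os.path.lexists(dst):
                # Refuse to clobber an existing entry.
                raise FileExistsError(f"{dst!r} already exists")
            planned.append((src, dst))
            if not dry_run:
                os.rename(src, dst)
    return planned
```

Calling convert_tree(b"/mnt/myoldmp3disk", "iso8859-1") with the default dry_run=True only reports what would be renamed. It does nothing about the harder case of a tree that mixes several legacy encodings.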
* Re: JFS default behavior
From: Nicolas Mailhot @ 2004-02-16 8:30 UTC (permalink / raw)
To: Jan Knutar; +Cc: linux-kernel

On Mon, 16/02/2004 at 05:45 +0200, Jan Knutar wrote:
> > - what happens to already existing invalid UTF-8 filenames ? Should
> > the kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess
> > ? What should happen if someone plugs an unconverted FS into such a
> > system afterwards ?
>
> What I would like is a userspace tool that recurses and converts
> filename encodings from a specified locale to UTF-8. Something like
> "any2utf8 -from iso8859-1 -recurse /mnt/myoldmp3disk".
> Does anyone know if such a tool exists already?

One can do find + recode magic now.

The questions are:
- can this be automated ?
- how can one recognise an unconverted fs ?
- how can one guess the encoding(s) that have been used before on such
  an fs ?

You're assuming the situation is merely an iso8859-1 to utf-8 migration.
Far from it. The core problem is that everyone damn well wrote whatever
pleased them without considering future readers.

Cheers,

--
Nicolas Mailhot
* Re: JFS default behavior
From: Valdis.Kletnieks @ 2004-02-16 8:54 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: Jan Knutar, linux-kernel

On Mon, 16 Feb 2004 09:30:41 +0100, Nicolas Mailhot said:

> You're assuming the situation is merely an iso8859-1 to utf-8 migration.
> Far from it. The core problem is that everyone damn well wrote whatever
> pleased them without considering future readers.

Given the fact that there isn't in general any way for the kernel to
know what was intended, I don't see how any kernel policy other than
"NUL and / are special, but if you use anything other than UTF-8 it will
eventually come back to haunt you" can possibly be made to work.

For that matter, I have seen actual production code that intentionally
created fairly deep directory trees and terminal file names that were
basically hashes written in radix-254 and blatted out in binary. Lots
of them. The original problem report I got was along the lines of "We
installed XYZ, and the file system appears corrupted - ls -R weirds the
screen out, and 'find | wc -l' is 127,000 different than what 'df -i'
reports". I was ready to strangle the guilty party - radix-64 wouldn't
have been a big efficiency hit and at least the uuencode/base-64 charset
doesn't weird your terminal out. :)

So it's not even always possible to make the assumption that the
filename is supposed to make sense in *any* charset.

This one requires fixing in some combination of userspace and
meatspace....
* Re: JFS default behavior
From: jw schultz @ 2004-02-16 6:21 UTC (permalink / raw)
To: linux-kernel

On Mon, Feb 16, 2004 at 12:03:03AM +0100, Nicolas Mailhot wrote:
> | Linus Torvalds pointed the way of Tux :
>
> | In short: the kernel talks bytestreams, and that implies that if you
> | want to talk to the kernel, you HAVE TO USE UTF-8.
>
> In that case :
> - should the kernel allow apps to write filenames that are invalid
> UTF-8 and will crash UTF-8 apps ?

Yes. The kernel interface specifies it as a bytestream with 0x00 and
0x2f having special meaning. That is a constraint, not a policy. It
is user space that determines the policy of UTF-8.

> UTF-8 and will crash UTF-8 apps ?

Fix the broken apps. Crashing because of "invalid" UTF-8 is no more
excusable than crashing because of a string longer than expected
(buffer overrun). Filenames as read from the filesystem should be
treated just like any other untrusted input.

> - should this UTF-8 rule be noted somewhere (in a FAQ/man page/LSB spec/
> whatever) so app authors know they are supposed to read and write UTF-8
> filenames and not apply locale rules to kernel objects ?

Since the LSB spec describes user space it might be a suitable place.

> - what happens to already existing invalid UTF-8 filenames ? Should the
> kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What

If you have a filesystem with filenames that don't conform to your
policy, write userspace tools to detect and/or fix them. If you have
programs creating non-conforming filenames, fix or rm those programs.

> kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What
> should happen if someone plugs an unconverted FS into such a system
> afterwards ?

The kernel won't care. Any user space code that treats the filenames
as something other than bytestreams should be able to cope with any
sequence of bytes.

> These are the questions people have been asking.

OK. The questions have been asked and answered.
Asking again and again and again won't change the answer.

--
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt
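To make "treat filenames like any other untrusted input" concrete, here is a small Python sketch of the detect side. The helper names (is_valid_utf8, display_name, find_non_utf8) are invented for this example. The approach, assumed here rather than taken from the thread, is to read names as raw bytes and only decode them for display, with an error handler that cannot raise.

```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """True if the raw filename bytes form well-formed UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def display_name(name: bytes) -> str:
    """Decode a filename for display without ever raising.

    Bytes that are not valid UTF-8 are shown as \\xNN escapes.
    """
    return name.decode("utf-8", errors="backslashreplace")


def find_non_utf8(root: bytes):
    """Return the full paths (as bytes) of entries whose names are not UTF-8."""
    offenders = []
    # A bytes root makes os.walk hand back undecoded names.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                offenders.append(os.path.join(dirpath, name))
    return offenders
```

Note that display_name only guards against decoding failures. A name can be valid UTF-8 and still contain control characters that upset a terminal, so a real tool would probably escape those as well.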
* Re: JFS default behavior
From: Jamie Lokier @ 2004-02-16 15:55 UTC (permalink / raw)
To: jw schultz, linux-kernel

jw schultz wrote:
> If you have a filesystem with filenames that don't conform
> to your policy write userspace tools to detect and/or fix
> them. If you have programs creating non-conforming
> filenames, fix or rm those programs.

You do understand that GNU coreutils, bash etc. are among those
programs, right? As in "touch zöe.txt" creates a non-conforming
filename...

> OK. The questions have been asked and answered.
> Asking again and again and again won't change the answer.

The question of what a program like this should do has not been
answered:

    perl -e 'for (glob "*") { rename $_, "ņi-".$_ or die "rename: $!\n"; }'

(NB: The prefix string is N WITH CEDILLA followed by "i-").

Hint: it mangles perfectly fine non-ASCII file names, instead of just
prefixing the prefix string. If you change the program to correctly
prepend the prefix string, then it mangles non-UTF-8 names, which is
arguably correct, but can result in you losing some files.

This _is_ a userspace problem, but it is a genuine problem for which
no good answer is yet apparent.

-- Jamie
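One possible way to write the rename loop above, sketched in Python rather than Perl, is to stay at the byte level throughout: encode the prefix to UTF-8 once and concatenate it with the raw name. This is only an illustration of that one choice, not a claim that it is the right answer. The names PREFIX, prefixed and prefix_all are made up here, and a name in a legacy encoding ends up with a UTF-8 prefix in front of non-UTF-8 bytes.

```python
import os

# N WITH CEDILLA (U+0146) followed by "i-", encoded once as UTF-8 bytes.
PREFIX = "\u0146i-".encode("utf-8")


def prefixed(name: bytes, prefix: bytes = PREFIX) -> bytes:
    """Prepend prefix to a raw filename without decoding the name."""
    return prefix + name


def prefix_all(directory: bytes, prefix: bytes = PREFIX) -> None:
    """Rename every non-hidden entry in directory to prefix + old name."""
    # os.listdir on a bytes path returns raw byte names, so nothing is
    # decoded and no name can be mangled or fail to round-trip.
    for name in sorted(os.listdir(directory)):
        if name.startswith(b"."):
            continue  # mirror glob "*", which skips dotfiles
        src = os.path.join(directory, name)
        dst = os.path.join(directory, prefixed(name, prefix))
        if os.path.lexists(dst):
            # rename() would silently replace dst; refuse instead.
            raise FileExistsError(f"{dst!r} already exists")
        os.rename(src, dst)
```

Because the old name is never interpreted, no file can be lost to a failed conversion, but the result is only as well-formed as the name it started from.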
* Re: JFS default behavior
From: jw schultz @ 2004-02-17 6:47 UTC (permalink / raw)
To: linux-kernel

On Mon, Feb 16, 2004 at 03:55:34PM +0000, Jamie Lokier wrote:
> jw schultz wrote:
> > If you have a filesystem with filenames that don't conform
> > to your policy write userspace tools to detect and/or fix
> > them. If you have programs creating non-conforming
> > filenames, fix or rm those programs.
>
> You do understand that GNU coreutils, bash etc. are among those

Doesn't matter where they come from.

> programs, right? As in "touch zöe.txt" creates a non-conforming
> filename...

Your concrete example is a good one. Where did that filename come
from? It would seem to have come from the keyboard via a tty (or
simulator) which also had to display it. I'd say this is an argument
for the terminal to display UTF-8 and convert input into UTF-8. That
is something that seems to be not consistently done as yet.
Ultimately it seems to be a responsibility of the user interface,
whether tty or GUI. Until that happens the shells might be able to
fill the gap, however poorly.

Perhaps the utilities that don't attempt to interpret filenames
should treat filenames exactly like the kernel does.

> > OK. The questions have been asked and answered.
> > Asking again and again and again won't change the answer.
>
> The question of what a program like this should do has not been
> answered:
>
> perl -e 'for (glob "*") { rename $_, "??i-".$_ or die "rename: $!\n"; }'
>
> (NB: The prefix string is N WITH CEDILLA followed by "i-").
>
> Hint: it mangles perfectly fine non-ASCII file names, instead of just
> prefixing the prefix string. If you change the program to correctly
> prepend the prefix string, then it mangles non-UTF-8 names, which is
> arguably correct, but can result in you losing some files.

Then if there is incorrect behavior, is it the shell, the tty or perl
that is getting things wrong here?

> This _is_ a userspace problem, but it is a genuine problem for which
> no good answer is yet apparent.

I'll buy that. Then the first question to ask is "what is the correct
forum for resolving this".

--
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt
* Re: JFS default behavior
From: Jamie Lokier @ 2004-02-17 21:37 UTC (permalink / raw)
To: jw schultz, linux-kernel

jw schultz wrote:
> Your concrete example is a good one. Where did that
> filename come from? It would seem to have come from the
> keyboard via a tty (or simulator) which also had to display
> it. I'd say this is an argument for the terminal to display
> UTF-8 and convert input into UTF-8. That is something that
> seems to be not consistently done as yet. Ultimately it
> seems to be a responsibility of the user interface, whether
> tty or GUI. Until that happens the shells might be able to
> fill the gap, however poorly.

Many terminals will not ever display UTF-8. Think: all the serial
terminals.

This is why I think "stty utf8" or something along those lines would
be useful. The terminal itself doesn't have to talk UTF-8; however,
the applications talking with /dev/tty would always see UTF-8.

That seems to solve most of the practical user interface problems of
the command line, in one single clean place.

-- Jamie
* Re: JFS default behavior
From: Linus Torvalds @ 2004-02-17 22:12 UTC (permalink / raw)
To: Jamie Lokier; +Cc: jw schultz, linux-kernel

On Tue, 17 Feb 2004, Jamie Lokier wrote:
>
> Many terminals will not ever display UTF-8. Think: all the serial terminals.
>
> This is why I think "stty utf8" or something along those lines would
> be useful. The terminal itself doesn't have to talk UTF-8; however,
> the applications talking with /dev/tty would always see UTF-8.
>
> That seems to solve most of the practical user interface problems of
> the command line, in one single clean place.

Doesn't "screen" already do this? I don't think you want to have the
locale handling in the kernel, along with translation of multi-key
characters (and from things like CJK terminals? I don't know what format
they send). Sounds like you should use a user-mode thing that knows about
locales...

		Linus
* Re: JFS default behavior
From: Jamie Lokier @ 2004-02-18 9:59 UTC (permalink / raw)
To: Linus Torvalds; +Cc: jw schultz, linux-kernel

Linus Torvalds wrote:
> Doesn't "screen" already do this? I don't think you want to have the
> locale handling in the kernel, along with translation of multi-key
> characters (and from things like CJK terminals? I don't know what format
> they send). Sounds like you should use a user-mode thing that knows about
> locales...

Yes. I was thinking in a rather DEC VT100/Putty/xterm-centric way for
a moment; please excuse the slip.

It's irritating that logging in from the wrong kind of terminal
doesn't just provide the right "user experience" for the command line
automatically.

It's also a pain that ssh doesn't inform the remote end whether the
local terminal is UTF-8, so everything seems to be working fine until
one day you discover typing "£" in an editor just beeps. Grr..

Oh well. These are all solvable in userspace. Then again, so were
most of the other stty options; didn't stop them from being
implemented in the kernel :)

-- Jamie
* Re: JFS default behavior
From: Linus Torvalds @ 2004-02-18 15:54 UTC (permalink / raw)
To: Jamie Lokier; +Cc: jw schultz, linux-kernel

On Wed, 18 Feb 2004, Jamie Lokier wrote:
>
> It's irritating that logging in from the wrong kind of terminal
> doesn't just provide the right "user experience" for the command line
> automatically.

Well, you should be able to just start something "screen"-equivalent
directly by just making it your default shell or have a fix to "login".

The thing is, the kernel tty layer is happy to work with utf-8 (well,
modulo the issues of erase etc - and Andries posted that patch already,
and there are probably others like it) if your terminal supports it, but
if your terminal doesn't have CJK support internally, then you need
something to do the multi-character translations anyway in order to be
able to input them in the first place. And that is _not_ an stty option.

Btw, from the screen man-page it appears that screen is not able to do
that either. You can put screen into utf-8 mode, but it sounds like it
just means that it passes UTF-8 through, not that it does any translation
from "latin1 vt100 to utf-8".

I think there are a few editors that actually do ("mined" looks like it
should do it).

		Linus
* Re: JFS default behavior
From: Jamie Lokier @ 2004-02-18 23:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: jw schultz, linux-kernel

Linus Torvalds wrote:
> Btw, from the screen man-page it appears that screen is not able to do
> that either. You can put screen into utf-8 mode, but it sounds like it
> just means that it passes UTF-8 through, not that it does any translation
> from "latin1 vt100 to utf-8".

Screen works nicely. Do this:

    echo 'defutf8 on' >> ~/.screenrc

Then screen presents a UTF-8 interface to the shell and other
programs, regardless of what kind of terminal you connect from :)

(It's a bit overkill, no actually it's a lot overkill, and you have
the annoyance of screen intercepting at least one commonly used
editing key.)

(Just remember to set the LANG environment variable to include
".UTF-8" so that screen-oriented programs know to display properly.
I do it automatically using a script which queries the current
terminal, to work around ssh not forwarding LANG).

> I think there are a few editors that actually do ("mined" looks like it
> should do it).

Emacs does, of course.

-- Jamie
* Re: JFS default behavior / UTF-8 filenames
From: kernel @ 2004-02-19 10:59 UTC (permalink / raw)
To: linux-kernel

So then, just about everyone agrees that if you've got a filename with
non-ASCII characters, you should pass it to creat() as UTF-8. You have
to pass it as something, individual encodings like BIG5 and EUC-JP are
unacceptable, and UCS-4's benefits over UTF-8 (simplicity and in VERY
rare cases storage size reductions) aren't worth the stuff it breaks.
Correct?

As I see it, there's no way for the kernel to deal with all the legacy
filenames out there. There's no way the kernel can magically fix them.
So the only thing the kernel could do for those who want to see valid
unicode is have an option to make UTF-8 only filesystems. Best would be
if it was done at mkfs time and always enforced from then on, so that a
non-UTF8 filename can never be created. Because if you want the kernel
to not pass non-UTF8 filenames back to userspace, the ONLY clean way to
do that is to make sure they're not there in the first place. You could
maybe try it with a mount=utf8only flag, but the only thing that could
do then would be to make the files with invalid filenames "disappear".

For filesystems like JFS and NTFS, I think this is the best way in the
long run, have the kernel output as UTF-8 by default, assume UTF-8
inputs, and reject non-UTF8 filenames because they can't really store
the arbitrary string of bytes model anyway.

For others which can, maybe leave it up to the filesystem creator
whether to reject non-UTF8 filenames or to accept invalid ones as well?

Either way, a well-written userspace app shouldn't barf on receiving
invalid UTF-8 from the kernel, we'll have legacy filenames around for a
good long time yet, and it's the only way to be portable to older
linuxes and other UNIXes where you definitely would not be guaranteed
valid UTF-8 no matter what new linux kernels decide.

In any case, the important part is to make sure userspace stops writing
filenames in BIG5 as soon as possible. I don't know if this can be done
nicely in libc, with libc automagically transforming the BIG5 filename
in open() to UTF-8 and the UTF-8 in readdir() to BIG5 based on the
locale, or if we have to rely on every userspace app to store filenames
in UTF-8 by themselves. But that's a decision for the glibc guys. It
doesn't affect that filenames need to start being written to the
filesystem in UTF-8 rather than other encodings, and that the only
decision the kernel has to make is whether or not to reject attempts to
create filenames which are invalid UTF-8.
* Re: JFS default behavior / UTF-8 filenames
From: Dave Kleikamp @ 2004-02-19 14:05 UTC (permalink / raw)
To: kernel; +Cc: linux-kernel

On Thu, 2004-02-19 at 04:59, kernel@mikebell.org wrote:

> For filesystems like JFS and NTFS, I think this is the best way in the
> long run, have the kernel output as UTF-8 by default, assume UTF-8
> inputs, and reject non-UTF8 filenames because they can't really store
> the arbitrary string of bytes model anyway.

Actually, I just submitted a patch to fix the default behavior of JFS to
always treat the name as an arbitrary string. The previous default
depended on the value of CONFIG_NLS_DEFAULT. Setting the mount option
iocharset=utf8 will reject non-utf8 filenames as you propose.

The arbitrary string of bytes is treated as the latin1 charset in that
it is stored as 0x00nn (in UTF2), but JFS really doesn't care what the
character set is.

> For others which can, maybe leave it up to the filesystem creator
> whether to reject non-UTF8 filenames or to accept invalid ones as well?

It's been said before, but a posix-compliant file system should accept
any bytes other than NUL and '/'.

Shaggy
--
David Kleikamp
IBM Linux Technology Center
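The "stored as 0x00nn" mapping can be pictured with a toy Python model. This is not JFS code: the function names are hypothetical, 16-bit units are plain integers, and the '?' substitution for units above 0x00ff is an assumption about what a reader of such names could reasonably do.

```python
from typing import List


def bytes_to_ucs2(name: bytes) -> List[int]:
    """Store each byte nn as the 16-bit unit 0x00nn (latin1-style widening)."""
    return list(name)


def is_accessible(units: List[int]) -> bool:
    """A stored name maps back to bytes only if every unit fits in one byte."""
    return all(unit <= 0xFF for unit in units)


def ucs2_to_bytes(units: List[int]) -> bytes:
    """Map stored units back to bytes, substituting '?' for units > 0x00ff."""
    return bytes(unit if unit <= 0xFF else ord("?") for unit in units)
```

In this model every byte string survives a round trip unchanged, whatever encoding the application had in mind. The only names that cannot be represented are ones already on disk with a unit above 0x00ff, for example written under a different iocharset.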
* Re: JFS default behavior / UTF-8 filenames
From: kernel @ 2004-02-19 23:47 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: linux-kernel

On Thu, Feb 19, 2004 at 08:05:06AM -0600, Dave Kleikamp wrote:
> The arbitrary string of bytes is treated as the latin1 charset in that
> it is stored as 0x00nn (in UTF2), but JFS really doesn't care what the
> character set is.

While I don't really care one way or the other about the whole
"rejecting non-UTF8 filenames" thing, trying to store 8bit strings in
UTF2 (no such thing, is there? Is JFS UCS-2 or UTF-16?) seems really
ugly. In general at least, maybe it's not so bad in JFS's case
specifically because of there not being much sharing of JFS filesystems
between linux and non-linux systems.

But if JFS uses that "make the high byte zero and return the low byte
only" scheme, what does it do when it encounters a UCS-2 filename that
has a non-NUL high byte on an existing filesystem? I can't see any ways
of dealing with this that aren't much more horribly broken than merely
refusing to create filenames that aren't valid in the current encoding.
If it throws the high byte away then you've made it impossible to open
said files, and up to 256 files per character of the filename can now
appear to have the same filename.

So what does JFS do in its "throw away the high byte and store binary
character strings in the low byte" mode? How does it deal with an
existing filesystem that has filenames that don't conform to said rule?
* Re: JFS default behavior / UTF-8 filenames
From: Dave Kleikamp @ 2004-02-20 15:00 UTC (permalink / raw)
To: kernel; +Cc: linux-kernel

On Thu, 2004-02-19 at 17:47, kernel@mikebell.org wrote:

> While I don't really care one way or the other about the whole
> "rejecting non-UTF8 filenames" thing, trying to store 8bit strings in
> UTF2 (no such thing, is there? Is JFS UCS-2 or UTF-16?)

UCS-2 - I can't keep this stuff straight.

> seems really
> ugly. In general at least, maybe it's not so bad in JFS's case
> specifically because of there not being much sharing of JFS filesystems
> between linux and non-linux systems.
>
> But if JFS uses that "make the high byte zero and return the low byte
> only" scheme, what does it do when it encounters a UCS-2 filename that
> has a non-NUL high byte on an existing filesystem? I can't see any ways
> of dealing with this that aren't much more horribly broken than merely
> refusing to create filenames that aren't valid in the current encoding.
> If it throws the high byte away then you've made it impossible to open
> said files, and up to 256 files per character of the filename can now
> appear to have the same filename.
>
> So what does JFS do in its "throw away the high byte and store binary
> character strings in the low byte" mode? How does it deal with an
> existing filesystem that has filenames that don't conform to said rule?

With no iocharset specified, a filename with such a character will be
inaccessible. Probably the best thing for readdir to do is to
substitute a '?' and print a message to the syslog to mount the volume
with iocharset=utf8 to be able to access the file. Of course I would
limit the number of printk's to something small. I'll submit a patch to
do this.

--
David Kleikamp
IBM Linux Technology Center
* Re: JFS default behavior / UTF-8 filenames
From: kernel @ 2004-02-22 19:22 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: linux-kernel

On Fri, Feb 20, 2004 at 09:00:58AM -0600, Dave Kleikamp wrote:
> With no iocharset specified, a filename with such a character will be
> inaccessible. Probably the best thing for readdir to do is to
> substitute a '?' and print a message to the syslog to mount the volume
> with iocharset=utf8 to be able to access the file. Of course I would
> limit the number of printk's to something small. I'll submit a patch to
> do this.

And that's why I was saying I think UTF-8 mode is the "least broken" for
any filesystem that stores filenames in a specific encoding rather than
"as the client submitted it". And most especially for UCS-2/UTF-16
filesystems.

I think the default for a filesystem should be something that absolutely
will not disappear your files. So for NTFS/JFS, it should be UTF-8. And
if a traditional UNIX filesystem wants to do a UTF-8 only mode, I think
ideally it should be done at mkfs time rather than mount time.
* Re: JFS default behavior / UTF-8 filenames
From: Dave Kleikamp @ 2004-02-24 14:44 UTC (permalink / raw)
To: kernel; +Cc: linux-kernel

On Sun, 2004-02-22 at 13:22, kernel@mikebell.org wrote:
>
> And that's why I was saying I think UTF-8 mode is the "least broken" for
> any filesystem that stores filenames in a specific encoding rather than
> "as the client submitted it". And most especially for UCS-2/UTF-16
> filesystems.

I receive a lot of complaints when JFS does not accept names because
they contain an "invalid" character. Defaulting to UTF-8 will cause
some non-utf-8 filenames to be rejected. The change I made makes the
default behavior sane and posix-compliant. It won't make everybody
happy, but it will provide predictable, sane behavior.

> I think the default for a filesystem should be something that absolutely
> will not disappear your files. So for NTFS/JFS, it should be UTF-8. And
> if a traditional UNIX filesystem wants to do a UTF-8 only mode, I think
> ideally it should be done at mkfs time rather than mount time.

The biggest problem with changing the default now is that the behavior
was unpredictable before. Now, the default behavior will not allow
filenames to be stored with UCS-2 characters greater than 0x00ff, so
there won't be inaccessible files unless the iocharset option has been
used. This allows the average user to get sane behavior, but allows the
flexibility of accessing the file system in a specific character set for
those users who know what they are doing.

--
David Kleikamp
IBM Linux Technology Center