* Re: JFS default behavior
@ 2004-02-15 23:03 Nicolas Mailhot
2004-02-16 3:45 ` Jan Knutar
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: Nicolas Mailhot @ 2004-02-15 23:03 UTC (permalink / raw)
To: linux-kernel
| Linus Torvalds pointed the way of Tux :
| In short: the kernel talks bytestreams, and that implies that if you
| want to talk to the kernel, you HAVE TO USE UTF-8.
In that case :
- should the kernel allow apps to write filenames that are invalid
UTF-8 and will crash UTF-8 apps ?
- should this UTF-8 rule be noted somewhere (in a FAQ/man page/LSB spec/
whatever) so apps authors know they are supposed to read and write UTF-8
filenames and not apply locale rules to kernel objects ?
- what happens to already existing invalid UTF-8 filenames ? Should the
kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What
should happen if someone plugs an unconverted FS into such a system
afterwards ?
These are the questions people have been asking.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-15 23:03 JFS default behavior Nicolas Mailhot
@ 2004-02-16 3:45 ` Jan Knutar
2004-02-16 8:30 ` Nicolas Mailhot
2004-02-16 6:21 ` jw schultz
2004-02-19 10:59 ` JFS default behavior / UTF-8 filenames kernel
2 siblings, 1 reply; 35+ messages in thread
From: Jan Knutar @ 2004-02-16 3:45 UTC (permalink / raw)
To: Nicolas Mailhot, linux-kernel
> - what happens to already existing invalid UTF-8 filenames ? Should
> the kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess
> ? What should happen if someone plugs an unconverted FS into such a
> system afterwards ?
What I would like would be a userspace tool, that would recurse and
convert filename encodings from specified locale to UTF-8. Something
like "any2utf8 -from iso8859-1 -recurse /mnt/myoldmp3disk".
Does anyone know if such a tool exists already?
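A minimal sketch of the kind of userspace tool described above, written in Python for illustration only. The names to_utf8_name and convert_tree are hypothetical, not an existing program, and the sketch assumes the whole tree was written in one known source encoding. It works on raw bytes throughout, skips names that are already valid UTF-8, and defaults to a dry run that only reports what it would rename.

```python
import os

def to_utf8_name(name: bytes, source_encoding: str):
    """Return the UTF-8 form of name, or None if it is already valid UTF-8."""
    try:
        name.decode("utf-8")
        return None  # already valid UTF-8 (this includes plain ASCII)
    except UnicodeDecodeError:
        # This decode may itself raise if source_encoding is the wrong guess.
        return name.decode(source_encoding).encode("utf-8")

def convert_tree(root: bytes, source_encoding: str, dry_run: bool = True):
    """Collect (and optionally perform) renames under root; returns the plan."""
    planned = []
    # Walk bottom-up so entries are renamed before their parent directory is.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new_name = to_utf8_name(name, source_encoding)
            if new_name is None:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if os.path.lexists(new_path):
                continue  # never overwrite an existing entry
            planned.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return planned
```

Calling convert_tree(b"/mnt/myoldmp3disk", "iso-8859-1") would list the planned renames without touching anything. Note the weakness of the "already valid UTF-8" test: a legacy name whose bytes happen to form valid UTF-8 is left alone, so the sketch cannot tell which encoding was really intended.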
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-16 3:45 ` Jan Knutar
@ 2004-02-16 8:30 ` Nicolas Mailhot
2004-02-16 8:54 ` Valdis.Kletnieks
0 siblings, 1 reply; 35+ messages in thread
From: Nicolas Mailhot @ 2004-02-16 8:30 UTC (permalink / raw)
To: Jan Knutar; +Cc: linux-kernel
On Mon, 16/02/2004 at 05:45 +0200, Jan Knutar wrote:
> > - what happens to already existing invalid UTF-8 filenames ? Should
> > the kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess
> > ? What should happen if someone plugs an unconverted FS into such a
> > system afterwards ?
>
> What I would like would be a userspace tool, that would recurse and
> convert filename encodings from specified locale to UTF-8. Something
> like "any2utf8 -from iso8859-1 -recurse /mnt/myoldmp3disk".
> Does anyone know if such a tool exists already?
One can do find + recode magic now.
The question is :
- can this be automated ?
- how can one recognise an unconverted fs ?
- how can one guess the encoding(s) that have been used before on such
an fs ?
You're assuming the situation is merely an iso8859-1 to utf-8 migration.
Far from it. The core problem is that everyone wrote whatever they damn
well pleased without considering future readers.
Cheers,
--
Nicolas Mailhot
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-16 8:30 ` Nicolas Mailhot
@ 2004-02-16 8:54 ` Valdis.Kletnieks
0 siblings, 0 replies; 35+ messages in thread
From: Valdis.Kletnieks @ 2004-02-16 8:54 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: Jan Knutar, linux-kernel
On Mon, 16 Feb 2004 09:30:41 +0100, Nicolas Mailhot said:
> You're assuming the situation is merely an iso8859-1 to utf-8 migration.
> Far from it. The core problem is that everyone wrote whatever they damn
> well pleased without considering future readers.
Given the fact that there isn't in general any way for the kernel to know what
was intended, I don't see how any kernel policy other than "NUL and / are
special, but if you use anything other than UTF-8 it will eventually come back
to haunt you" can possibly be made to work.
For that matter, I have seen actual production code that intentionally created
fairly deep directory trees and terminal file names that were basically hashes
written in radix-254 and blatted out in binary. Lots of them. The original
problem report I got was along the lines of "We installed XYZ, and the file
system appears corrupted - ls -R weirds the screen out, and 'find | wc -l' is
127,000 different than what 'df -i' reports".
I was ready to strangle the guilty party - radix-64 wouldn't have been a big
efficiency hit and at least the uuencode/base-64 charset doesn't weird your
terminal out. :)
So it's not even always possible to make the assumption that the filename is
supposed to make sense in *any* charset. This one requires fixing in some
combination of userspace and meatspace....
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-15 23:03 JFS default behavior Nicolas Mailhot
2004-02-16 3:45 ` Jan Knutar
@ 2004-02-16 6:21 ` jw schultz
2004-02-16 15:55 ` Jamie Lokier
2004-02-19 10:59 ` JFS default behavior / UTF-8 filenames kernel
2 siblings, 1 reply; 35+ messages in thread
From: jw schultz @ 2004-02-16 6:21 UTC (permalink / raw)
To: linux-kernel
On Mon, Feb 16, 2004 at 12:03:03AM +0100, Nicolas Mailhot wrote:
> | Linus Torvalds pointed the way of Tux :
>
> | In short: the kernel talks bytestreams, and that implies that if you
> | want to talk to the kernel, you HAVE TO USE UTF-8.
>
> In that case :
> - should the kernel allow apps to write filenames that are invalid
> UTF-8 and will crash UTF-8 apps ?
Yes. The kernel interface specifies it as a bytestream with
0x00 and 0x2f having special meaning. That is a constraint,
not a policy. It is user space that determines the policy
of UTF-8.
> UTF-8 and will crash UTF-8 apps ?
Fix the broken apps. Crashing because of "invalid" UTF-8 is
no more excusable than crashing because of a string longer
than expected (buffer overrun). Filenames as read from the
filesystem should be treated just like any other untrusted
input.
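As an illustration of treating filenames like any other untrusted input, here is a minimal Python sketch. The function name display_name is hypothetical; the only assumption is that the application wants a printable string and must never fail on the raw bytes it reads from a directory.

```python
def display_name(raw: bytes) -> str:
    """Return a printable form of a filename without ever raising."""
    # Valid UTF-8 decodes normally; any invalid byte is shown as an
    # escape such as \xe9 instead of crashing the application.
    return raw.decode("utf-8", errors="backslashreplace")
```

The escaped form is for display only. To open or rename the file, the application should keep using the original bytes, since the escaped string is not the file's name.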
> - should this UTF-8 rule be noted somewhere (in a FAQ/man page/LSB spec/
> whatever) so apps authors know they are supposed to read and write UTF-8
> filenames and not apply locale rules to kernel objects ?
Since the LSB spec describes user space it might be a
suitable place.
> - what happens to already existing invalid UTF-8 filenames ? Should the
> kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What
If you have a filesystem with filenames that don't conform
to your policy write userspace tools to detect and/or fix
them. If you have programs creating non-conforming
filenames, fix or rm those programs.
> kernel forcibly rewrite them (in 2.7.0...) to remove legacy mess ? What
> should happen if someone plugs an unconverted FS into such a system
> afterwards ?
The kernel won't care. Any user space code that treats the
filenames as something other than bytestreams should be able
to cope with any sequence of bytes.
> These are the questions people have been asking.
OK. The questions have been asked and answered.
Asking again and again and again won't change the answer.
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw@pegasys.ws
Remember Cernan and Schmitt
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-16 6:21 ` jw schultz
@ 2004-02-16 15:55 ` Jamie Lokier
2004-02-17 6:47 ` jw schultz
0 siblings, 1 reply; 35+ messages in thread
From: Jamie Lokier @ 2004-02-16 15:55 UTC (permalink / raw)
To: jw schultz, linux-kernel
jw schultz wrote:
> If you have a filesystem with filenames that don't conform
> to your policy write userspace tools to detect and/or fix
> them. If you have programs creating non-conforming
> filenames, fix or rm those programs.
You do understand that GNU coreutils, bash etc. are among those
programs, right? As in "touch zöe.txt" creates a non-conforming
filename...
> OK. The questions have been asked and answered.
> Asking again and again and again won't change the answer.
The question of what a program like this should do has not been
answered:
perl -e 'for (glob "*") { rename $_, "ņi-".$_ or die "rename: $!\n"; }'
(NB: The prefix string is N WITH CEDILLA followed by "i-").
Hint: it mangles perfectly fine non-ASCII file names, instead of just
prefixing the prefix string. If you change the program to correctly
prepend the prefix string, then it mangles non-UTF-8 names, which is
arguably correct, but can result in you losing some files.
This _is_ a userspace problem, but it is a genuine problem for which
no good answer is yet apparent.
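One way to look at the trade-off, sketched in Python rather than perl (the names PREFIX, prefixed and prefix_all are hypothetical): if the program treats names purely as bytes and only the prefix is encoded as UTF-8, no existing name is ever decoded, so none can be mangled or lost. This is a sketch of the byte-level option only, not an answer to the question raised above.

```python
import os

# N WITH CEDILLA (U+0146) followed by "i-", encoded once as UTF-8 bytes.
PREFIX = "\u0146i-".encode("utf-8")

def prefixed(name: bytes, prefix: bytes = PREFIX) -> bytes:
    # Pure byte concatenation: the original name is never decoded,
    # whatever encoding it was written in.
    return prefix + name

def prefix_all(directory: bytes) -> None:
    # A bytes argument makes os.listdir return the raw names as bytes.
    for name in os.listdir(directory):
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, prefixed(name)))
```

The cost is visible in the result: a Latin-1 name such as "zöe.txt" ends up with a UTF-8 prefix and a Latin-1 tail, so the new name is valid in neither encoding. No file is lost, but the mixed-encoding name still has to be dealt with by whoever reads it next.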
-- Jamie
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-16 15:55 ` Jamie Lokier
@ 2004-02-17 6:47 ` jw schultz
2004-02-17 21:37 ` Jamie Lokier
0 siblings, 1 reply; 35+ messages in thread
From: jw schultz @ 2004-02-17 6:47 UTC (permalink / raw)
To: linux-kernel
On Mon, Feb 16, 2004 at 03:55:34PM +0000, Jamie Lokier wrote:
> jw schultz wrote:
> > If you have a filesystem with filenames that don't conform
> > to your policy write userspace tools to detect and/or fix
> > them. If you have programs creating non-conforming
> > filenames, fix or rm those programs.
>
> You do understand that GNU coreutils, bash etc. are among those
Doesn't matter where they come from.
> programs, right? As in "touch zöe.txt" creates a non-conforming
> filename...
Your concrete example is a good one. Where did that
filename come from? It would seem to have come from the
keyboard via a tty (or simulator) which also had to display
it. I'd say this is an argument for the terminal to display
UTF-8 and convert input into UTF-8. That is something that
seems to be not consistently done as yet. Ultimately it
seems to be a responsibility of the user interface, whether
tty or GUI. Until that happens the shells might be able to
fill the gap, however poorly.
Perhaps the utilities that don't attempt to interpret
filenames should treat filenames exactly like the kernel
does.
> > OK. The questions have been asked and answered.
> > Asking again and again and again won't change the answer.
>
> The question of what a program like this should do has not been
> answered:
>
> perl -e 'for (glob "*") { rename $_, "ņi-".$_ or die "rename: $!\n"; }'
>
> (NB: The prefix string is N WITH CEDILLA followed by "i-").
>
> Hint: it mangles perfectly fine non-ASCII file names, instead of just
> prefixing the prefix string. If you change the program to correctly
> prepend the prefix string, then it mangles non-UTF-8 names, which is
> arguably correct, but can result in you losing some files.
Then if there is incorrect behavior, is it the shell, the tty, or perl
that is getting things wrong here?
> This _is_ a userspace problem, but it is a genuine problem for which
> no good answer is yet apparent.
I'll buy that. Then the first question to ask is "what is
the correct forum for resolving this".
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw@pegasys.ws
Remember Cernan and Schmitt
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-17 6:47 ` jw schultz
@ 2004-02-17 21:37 ` Jamie Lokier
2004-02-17 22:12 ` Linus Torvalds
0 siblings, 1 reply; 35+ messages in thread
From: Jamie Lokier @ 2004-02-17 21:37 UTC (permalink / raw)
To: jw schultz, linux-kernel
jw schultz wrote:
> Your concrete example is a good one. Where did that
> filename come from? It would seem to have come from the
> keyboard via a tty (or simulator) which also had to display
> it. I'd say this is an argument for the terminal to display
> UTF-8 and convert input into UTF-8. That is something that
> seems to be not consistently done as yet. Ultimately it
> seems to be a responsibility of the user interface, whether
> tty or GUI. Until that happens the shells might be able to
> fill the gap, however poorly.
Many terminals will not ever display UTF-8. Think: all the serial terminals.
This is why I think "stty utf8" or something along those lines would
be useful. The terminal itself doesn't have to talk UTF-8; however,
the applications talking with /dev/tty would always see UTF-8.
That seems to solve most of the practical user interface problems of
the command line, in one single clean place.
-- Jamie
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-17 21:37 ` Jamie Lokier
@ 2004-02-17 22:12 ` Linus Torvalds
2004-02-18 9:59 ` Jamie Lokier
0 siblings, 1 reply; 35+ messages in thread
From: Linus Torvalds @ 2004-02-17 22:12 UTC (permalink / raw)
To: Jamie Lokier; +Cc: jw schultz, linux-kernel
On Tue, 17 Feb 2004, Jamie Lokier wrote:
>
> Many terminals will not ever display UTF-8. Think: all the serial terminals.
>
> This is why I think "stty utf8" or something along those lines would
> be useful. The terminal itself doesn't have to talk UTF-8; however,
> the applications talking with /dev/tty would always see UTF-8.
>
> That seems to solve most of the practical user interface problems of
> the command line, in one single clean place.
Doesn't "screen" already do this? I don't think you want to have the
locale handling in the kernel, along with translation of multi-key
characters (and from things like CJK terminals? I don't know what format
they send). Sounds like you should use a user-mode thing that knows about
locales...
Linus
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-17 22:12 ` Linus Torvalds
@ 2004-02-18 9:59 ` Jamie Lokier
2004-02-18 15:54 ` Linus Torvalds
0 siblings, 1 reply; 35+ messages in thread
From: Jamie Lokier @ 2004-02-18 9:59 UTC (permalink / raw)
To: Linus Torvalds; +Cc: jw schultz, linux-kernel
Linus Torvalds wrote:
> Doesn't "screen" already do this? I don't think you want to have the
> locale handling in the kernel, along with translation of multi-key
> characters (and from things like CJK terminals? I don't know what format
> they send). Sounds like you should use a user-mode thing that knows about
> locales...
Yes. I was thinking in a rather DEC VT100/Putty/xterm-centric way
for a moment; please excuse the slip.
It's irritating that logging in from the wrong kind of terminal
doesn't just provide the right "user experience" for the command line
automatically. It's also a pain that ssh doesn't inform the remote
end whether the local terminal is UTF-8, so everything seems to be
working fine until one day you discover typing "£" in an editor just
beeps. Grr.. Oh well.
These are all solvable in userspace. Then again, so were most of the
other stty options; didn't stop them from being implemented in the kernel :)
-- Jamie
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-18 9:59 ` Jamie Lokier
@ 2004-02-18 15:54 ` Linus Torvalds
2004-02-18 23:58 ` Jamie Lokier
0 siblings, 1 reply; 35+ messages in thread
From: Linus Torvalds @ 2004-02-18 15:54 UTC (permalink / raw)
To: Jamie Lokier; +Cc: jw schultz, linux-kernel
On Wed, 18 Feb 2004, Jamie Lokier wrote:
>
> It's irritating that logging in from the wrong kind of terminal
> doesn't just provide the right "user experience" for the command line
> automatically.
Well, you should be able to just start something "screen"-equivalent
directly by just making it your default shell or have a fix to "login".
The thing is, the kernel tty layer is happy to work with utf-8 (well,
modulo the issues of erase etc - and Andries posted that patch already,
and there are probably others like it) if your terminal supports it, but
if your terminal doesn't have CJK support internally, then you need
something to do the multi-character translations anyway in order to be
able to input them in the first place.
And that is _not_ an stty option.
Btw, from the screen man-page it appears that screen is not able to do
that either. You can put screen into utf-8 mode, but it sounds like it
just means that it passes UTF-8 through, not that it does any translation
from "latin1 vt100 to utf-8".
I think there are a few editors that actually do ("mined" looks like it
should do it).
Linus
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-18 15:54 ` Linus Torvalds
@ 2004-02-18 23:58 ` Jamie Lokier
0 siblings, 0 replies; 35+ messages in thread
From: Jamie Lokier @ 2004-02-18 23:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: jw schultz, linux-kernel
Linus Torvalds wrote:
> Btw, from the screen man-page it appears that screen is not able to do
> that either. You can put screen into utf-8 mode, but it sounds like it
> just means that it passes UTF-8 through, not that it does any translation
> from "latin1 vt100 to utf-8".
Screen works nicely. Do this:
echo 'defutf8 on' >> ~/.screenrc
Then screen presents a UTF-8 interface to the shell and other
programs, regardless of what kind of terminal you connect from :)
(It's a bit overkill, no actually it's a lot overkill, and you have the
annoyance of screen intercepting at least one commonly used editing key.)
(Just remember to set the LANG environment variable to include
".UTF-8" so that screen-oriented programs know to display properly. I
do it automatically using a script which queries the current terminal,
to work around ssh not forwarding LANG).
> I think there are a few editors that actually do ("mined" looks like it
> should do it).
Emacs does, of course.
-- Jamie
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior / UTF-8 filenames
2004-02-15 23:03 JFS default behavior Nicolas Mailhot
2004-02-16 3:45 ` Jan Knutar
2004-02-16 6:21 ` jw schultz
@ 2004-02-19 10:59 ` kernel
2004-02-19 14:05 ` Dave Kleikamp
2 siblings, 1 reply; 35+ messages in thread
From: kernel @ 2004-02-19 10:59 UTC (permalink / raw)
To: linux-kernel
So then, just about everyone agrees that if you've got a filename with
non-ASCII characters, you should pass it to creat() as UTF-8. You have
to pass it as something, individual encodings like BIG5 and EUC-JP
are unacceptable, and UCS-4's benefits over UTF-8 (simplicity and in
VERY rare cases storage size reductions) aren't worth the stuff it
breaks. Correct?
As I see it, there's no way for the kernel to deal with all the legacy
filenames out there. There's no way the kernel can magically fix them.
So the only thing the kernel could do for those who want to see valid
unicode is have an option to make UTF-8 only filesystems. Best would be
if it was done at mkfs time and always enforced from then on in so that
a non-UTF8 filename can never be created. Because if you want the kernel
to not pass non-UTF8 filenames back to userspace, the ONLY clean way to
do that is to make sure they're not there in the first place. You could
maybe try it with a mount=utf8only flag, but the only thing that could
do then would be to make the files with invalid filenames "disappear".
For filesystems like JFS and NTFS, I think this is the best way in the
long run, have the kernel output as UTF-8 by default, assume UTF-8
inputs, and reject non-UTF8 filenames because they can't really store
the arbitrary string of bytes model anyway.
For others which can, maybe leave it up to the filesystem creator
whether to reject non-UTF8 filenames or to accept invalid ones as well?
Either way, a well-written userspace app shouldn't barf on receiving
invalid UTF-8 from the kernel, we'll have legacy filenames around for a
good long time yet, and it's the only way to be portable to older
linuxes and other UNIXes where you definitely would not be guaranteed
valid UTF-8 no matter what new linux kernels decide.
In any case, the important part is to make sure userspace stops writing
filenames in BIG5 as soon as possible. I don't know if this can be done
nicely in libc, with libc automagically transforming the BIG5 filename
in open() to UTF-8 and the UTF-8 in readdir() to BIG5 based on the
locale, or if we have to rely on every userspace app to store filenames
in UTF-8 by themselves. But that's a decision for the glibc guys. It
doesn't affect that filenames need to start being written to the
filesystem in UTF-8 rather than other encodings, and that the only
decision the kernel has to make is whether or not to reject attempts to
create filenames which are invalid UTF-8.
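The decision described here can be sketched as a single check at name-creation time. The Python below is an illustration only: name_allowed and its utf8_only parameter are hypothetical names, not a kernel interface. With utf8_only off it models the traditional rule (any bytes except NUL and '/'); with it on, it models a filesystem that rejects invalid UTF-8 outright.

```python
def name_allowed(name: bytes, utf8_only: bool = False) -> bool:
    """Decide whether a single path component may be created."""
    # Traditional constraint: non-empty, no NUL byte, no '/' byte.
    if not name or b"\x00" in name or b"/" in name:
        return False
    if utf8_only:
        try:
            name.decode("utf-8")  # strict decode rejects malformed sequences
        except UnicodeDecodeError:
            return False
    return True
```

Under this sketch a Latin-1 name such as b"z\xf6e.txt" is accepted by default and refused in UTF-8-only mode, which is exactly the behavior difference being debated in the thread.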
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior / UTF-8 filenames
2004-02-19 10:59 ` JFS default behavior / UTF-8 filenames kernel
@ 2004-02-19 14:05 ` Dave Kleikamp
2004-02-19 23:47 ` kernel
0 siblings, 1 reply; 35+ messages in thread
From: Dave Kleikamp @ 2004-02-19 14:05 UTC (permalink / raw)
To: kernel; +Cc: linux-kernel
On Thu, 2004-02-19 at 04:59, kernel@mikebell.org wrote:
> For filesystems like JFS and NTFS, I think this is the best way in the
> long run, have the kernel output as UTF-8 by default, assume UTF-8
> inputs, and reject non-UTF8 filenames because they can't really store
> the arbitrary string of bytes model anyway.
Actually, I just submitted a patch to fix the default behavior of JFS to
always treat the name as an arbitrary string. The previous default
depended on the value of CONFIG_NLS_DEFAULT. Setting the mount option
iocharset=utf8 will reject non-utf8 filenames as you propose.
The arbitrary string of bytes is treated as the latin1 charset in that
it is stored as 0x00nn (in UTF2), but JFS really doesn't care what the
character set is.
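The 0x00nn storage scheme described above can be pictured with a short Python sketch. The function names are hypothetical and the stored name is modelled as a list of 16-bit integers, so nothing here reflects the real on-disk layout or byte order; it only shows the widening and narrowing of each byte.

```python
def to_stored_units(name: bytes) -> list:
    """Widen each byte 0xnn of the name to the 16-bit value 0x00nn."""
    return list(name)

def from_stored_units(units) -> bytes:
    """Narrow 16-bit values back to bytes.

    bytes() raises ValueError for any value above 0xff, i.e. for any
    stored character whose high byte is not zero.
    """
    return bytes(units)
```

Any byte string survives the round trip unchanged, which is why the filesystem need not care what character set the bytes were written in. The reverse is not true: a stored value above 0x00ff has no single-byte form in this scheme.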
> For others which can, maybe leave it up to the filesystem creator
> whether to reject non-UTF8 filenames or to accept invalid ones as well?
It's been said before, but a posix-compliant file system should accept
any bytes other than NUL and '/'.
Shaggy
--
David Kleikamp
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior / UTF-8 filenames
2004-02-19 14:05 ` Dave Kleikamp
@ 2004-02-19 23:47 ` kernel
2004-02-20 15:00 ` Dave Kleikamp
0 siblings, 1 reply; 35+ messages in thread
From: kernel @ 2004-02-19 23:47 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: linux-kernel
On Thu, Feb 19, 2004 at 08:05:06AM -0600, Dave Kleikamp wrote:
> The arbitrary string of bytes is treated as the latin1 charset in that
> it is stored as 0x00nn (in UTF2), but JFS really doesn't care what the
> character set is.
While I don't really care one way or the other about the whole
"rejecting non-UTF8 filenames" thing, trying to store 8bit strings in
UTF2 (no such thing, is there? Is JFS UCS-2 or UTF-16?) seems really
ugly. In general at least, maybe it's not so bad in JFS's case
specifically because of there not being much sharing of JFS filesystems
between linux and non-linux systems.
But if JFS uses that "make the high byte zero and return the low byte
only" scheme, what does it do when it encounters a UCS-2 filename that
has a non-NUL high byte on an existing filesystem? I can't see any ways
of dealing with this that aren't much more horribly broken than merely
refusing to create filenames that aren't valid in the current encoding.
If it throws the high byte away then you've made it impossible to open
said files, and up to 256 files per character of the filename can now
appear to have the same filename.
So what does JFS do in its "throw away the high byte and store binary
character strings in the low byte" mode? How does it deal with an
existing filesystem that has filenames that don't conform to said rule?
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior / UTF-8 filenames
2004-02-19 23:47 ` kernel
@ 2004-02-20 15:00 ` Dave Kleikamp
2004-02-22 19:22 ` kernel
0 siblings, 1 reply; 35+ messages in thread
From: Dave Kleikamp @ 2004-02-20 15:00 UTC (permalink / raw)
To: kernel; +Cc: linux-kernel
On Thu, 2004-02-19 at 17:47, kernel@mikebell.org wrote:
> While I don't really care one way or the other about the whole
> "rejecting non-UTF8 filenames" thing, trying to store 8bit strings in
> UTF2 (no such thing, is there? Is JFS UCS-2 or UTF-16?)
UCS-2 - I can't keep this stuff straight.
> seems really
> ugly. In general at least, maybe it's not so bad in JFS's case
> specifically because of there not being much sharing of JFS filesystems
> between linux and non-linux systems.
>
> But if JFS uses that "make the high byte zero and return the low byte
> only" scheme, what does it do when it encounters a UCS-2 filename that
> has a non-NUL high byte on an existing filesystem? I can't see any ways
> of dealing with this that aren't much more horribly broken than merely
> refusing to create filenames that aren't valid in the current encoding.
> If it throws the high byte away then you've made it impossible to open
> said files, and up to 256 files per character of the filename can now
> appear to have the same filename.
>
> So what does JFS do in its "throw away the high byte and store binary
> character strings in the low byte" mode? How does it deal with an
> existing filesystem that has filenames that don't conform to said rule?
With no iocharset specified, a filename with such a character will be
inaccessible. Probably the best thing for readdir to do is to
substitute a '?' and print a message to the syslog to mount the volume
with iocharset=utf8 to be able to access the file. Of course I would
limit the number of printk's to something small. I'll submit a patch to
do this.
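The substitution proposed above could look roughly like the following Python sketch. It is not the actual patch: readdir_name is a hypothetical name, the stored name is modelled as a list of 16-bit integers, and the returned flag merely stands in for the rate-limited syslog message.

```python
def readdir_name(units):
    """Return (name_bytes, lossy) for a stored name in the default mode."""
    out = bytearray()
    lossy = False
    for unit in units:
        if unit <= 0xFF:
            out.append(unit)      # high byte is zero: pass the low byte through
        else:
            out.append(ord("?"))  # no single-byte form: substitute '?'
            lossy = True          # caller could log a hint to use iocharset=utf8
    return bytes(out), lossy
```

A name returned with lossy set is for listing only. Looking it up again by the substituted bytes would not find the file, which matches the statement that such a file stays inaccessible until the volume is mounted with iocharset=utf8.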
--
David Kleikamp
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior / UTF-8 filenames
2004-02-20 15:00 ` Dave Kleikamp
@ 2004-02-22 19:22 ` kernel
2004-02-24 14:44 ` Dave Kleikamp
0 siblings, 1 reply; 35+ messages in thread
From: kernel @ 2004-02-22 19:22 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: linux-kernel
On Fri, Feb 20, 2004 at 09:00:58AM -0600, Dave Kleikamp wrote:
> With no iocharset specified, a filename with such a character will be
> inaccessible. Probably the best thing for readdir to do is to
> substitute a '?' and print a message to the syslog to mount the volume
> with iocharset=utf8 to be able to access the file. Of course I would
> limit the number of printk's to something small. I'll submit a patch to
> do this.
And that's why I was saying I think UTF-8 mode is the "least broken" for
any filesystem that stores filenames in a specific encoding rather than
"as the client submitted it". And most especially for UCS-2/UTF-16
filesystems.
I think the default for a filesystem should be something that absolutely
will not disappear your files. So for NTFS/JFS, it should be UTF-8. And
if a traditional UNIX filesystem wants to do a UTF-8 only mode, I think
ideally it should be done at mkfs time rather than mount time.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior / UTF-8 filenames
2004-02-22 19:22 ` kernel
@ 2004-02-24 14:44 ` Dave Kleikamp
0 siblings, 0 replies; 35+ messages in thread
From: Dave Kleikamp @ 2004-02-24 14:44 UTC (permalink / raw)
To: kernel; +Cc: linux-kernel
On Sun, 2004-02-22 at 13:22, kernel@mikebell.org wrote:
>
> And that's why I was saying I think UTF-8 mode is the "least broken" for
> any filesystem that stores filenames in a specific encoding rather than
> "as the client submitted it". And most especially for UCS-2/UTF-16
> filesystems.
I receive a lot of complaints when JFS does not accept names because
they contain an "invalid" character. Defaulting to UTF-8 will cause
some non-utf-8 filenames to be rejected. The change I made makes the
default behavior sane and posix-compliant. It won't make everybody
happy, but it will provide predictable, sane behavior.
> I think the default for a filesystem should be something that absolutely
> will not disappear your files. So for NTFS/JFS, it should be UTF-8. And
> if a traditional UNIX filesystem wants to do a UTF-8 only mode, I think
> ideally it should be done at mkfs time rather than mount time.
The biggest problem with changing the default now is that the behavior
was unpredictable before. Now, the default behavior will not allow
filenames to be stored with UCS-2 characters greater than 0x00ff, so
there won't be inaccessible files unless the iocharset option has been
used. This allows the average user to get sane behavior, but allows the
flexibility of accessing the file system in a specific character set for
those users who know what they are doing.
--
David Kleikamp
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 35+ messages in thread
[parent not found: <1pvUz-6j-1@gated-at.bofh.it>]
* Re: JFS default behavior
@ 2004-02-15 14:48 Pascal Schmidt
2004-02-16 14:24 ` Eduard Bloch
0 siblings, 1 reply; 35+ messages in thread
From: Pascal Schmidt @ 2004-02-15 14:48 UTC (permalink / raw)
To: Jamie Lokier; +Cc: linux-kernel
>> Then I did unicode_stop. Guess what: it put the display back in
>> iso-8859-1 for that virtual terminal, but the keyboard remained
>> stuck in UTF-8 for _all_ virtual terminals.
> kbd_mode -a to reset to ASCII mode.
And as I just figured out, loadkeys has to be invoked again, also.
I can go to utf-8 with:
setfont lat0-16
kbd_mode -u
loadkeys de-latin1-nodeadkeys
and return to latin-1 with:
setfont lat1-16
kbd_mode -a
loadkeys de-latin1-nodeadkeys
Without the loadkeys after returning to latin-1 mode, I can no longer
input umlauts and other special characters correctly.
--
Ciao,
Pascal
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-15 14:48 Pascal Schmidt
@ 2004-02-16 14:24 ` Eduard Bloch
0 siblings, 0 replies; 35+ messages in thread
From: Eduard Bloch @ 2004-02-16 14:24 UTC (permalink / raw)
To: Pascal Schmidt; +Cc: Jamie Lokier, linux-kernel
Hi Pascal!
Pascal Schmidt wrote on Sunday, 15 February 2004:
> >> iso-8859-1 for that virtual terminal, but the keyboard remained
> >> stuck in UTF-8 for _all_ virtual terminals.
> > kbd_mode -a to reset to ASCII mode.
>
> And as I just figured out, loadkeys has to be invoked again, also.
>
> I can go to utf-8 with:
>
> setfont lat0-16
> kbd_mode -u
> loadkeys de-latin1-nodeadkeys
When I do this, I still cannot enter unicode chars "as usual". I see
them, mutt (for example) displays everything correct with a UTF-8
locale. However, I cannot insert them correctly. When I use vim, I have
to press another key (eg. Space) 2..4 times after an umlaut was pressed,
only then the char appears.
Needless to say that the same applications work fine in X with the same
UTF-8 locale.
Regards,
Eduard.
--
Praise is a powerful driving force whose magic never fails to have its
effect.
-- Andor Foldes
^ permalink raw reply [flat|nested] 35+ messages in thread
[parent not found: <04Feb13.163954est.41760@gpu.utcc.utoronto.ca>]
* Re: JFS default behavior
[not found] <04Feb13.163954est.41760@gpu.utcc.utoronto.ca>
@ 2004-02-14 14:27 ` Nicolas Mailhot
2004-02-14 15:40 ` viro
0 siblings, 1 reply; 35+ messages in thread
From: Nicolas Mailhot @ 2004-02-14 14:27 UTC (permalink / raw)
To: chris.siebenmann; +Cc: linux-kernel
Chris Siebenmann wrote:
> You write:
> | So what ?
> | Do you think an app that expects utf-8 filenames won't crash today when
> | served a byte sequence that's invalid UTF-8 ? (or an app that expects
> | ascii when served utf-8 oddities)
>
> Such apps are buggy and need to be fixed.
Well, this means every single java app right now at least.
> This is not Unix's problem,
The Y2K problem was mostly at the app level.
It would not have been the OS's responsibility to fix it.
*However*, since the unix time conventions were a bit saner than those of
other OSes, the damage was less.
> any more than it is Unix's problem if an application frees memory twice,
> writes over unallocated memory, or destroys its stack.
The core OS responsibility is to share resources sanely between apps.
Filenames are a shared resource.
When encodings start to be incompatible, resulting in application
crashes, it's the OS's job to define and enforce sane conventions so apps
can coexist.
Past oversights do not mean the problem should stay unfixed
(especially if solutions exist, even if they are not totally painless).
There is no more justification to keep encoding undefined than there is to
keep time zone undefined. Last I've seen, we're all pretty happy that system
time actually means something on Unix (unlike other systems where it can
be anything depending on the location where the initial installation was
performed).
> If all you care about is the future, you need no kernel support.
> Declare that all filesystem names are written in UTF-8, and make your
> tools deal with it. (Most will not care. A few will have to be fixed a
> bit.)
Tools won't change unless they're forced to. That's a plain fact.
As you wrote there shouldn't be a lot of fixups to do, since apps that
can't deal with utf-8 now use ascii-only filenames anyway, but the few
fixups that are needed won't happen without a little OS prodding.
(and without OS enforcement illegal utf-8 filename injection will remain
a security risk)
And I write utf8 here, but any unicode form is fine with me as long as
it's clearly defined and enforced by the FSs.
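What "refusing invalid UTF-8 filenames" would amount to can be sketched in user space. This is only an illustration, assuming Python 3; the helper name is_valid_utf8_name is hypothetical and nothing here reflects how a filesystem would implement such a check.

```python
def is_valid_utf8_name(name: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The same visible name, written by two apps that disagree on the encoding.
utf8_name = "caf\u00e9".encode("utf-8")         # b'caf\xc3\xa9'
latin1_name = "caf\u00e9".encode("iso-8859-1")  # b'caf\xe9'

print(is_valid_utf8_name(utf8_name))    # the UTF-8 form passes
print(is_valid_utf8_name(latin1_name))  # the Latin-1 form is rejected
print(is_valid_utf8_name(b"README"))    # plain ASCII is always valid UTF-8
```

The Latin-1 form fails because the byte 0xe9 announces a multi-byte sequence that never arrives, which is exactly the kind of name a strict UTF-8 application would choke on.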
Cheers,
--
Nicolas Mailhot
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-14 14:27 ` Nicolas Mailhot
@ 2004-02-14 15:40 ` viro
2004-02-14 17:47 ` Nicolas Mailhot
2004-02-14 23:06 ` Robin Rosenberg
0 siblings, 2 replies; 35+ messages in thread
From: viro @ 2004-02-14 15:40 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: chris.siebenmann, linux-kernel
On Sat, Feb 14, 2004 at 03:27:50PM +0100, Nicolas Mailhot wrote:
> There is no more justification to keep encoding undefined than there is to
> keep time zone undefined. Last I've seen we're all pretty happy system
> time actually means something on unix (unlike other systems where it can
> be anything depending on the location where the initial installation was
> performed).
"System time" is amount of time elapsed since the epoch. Period. What does
it have to do with any timezone?
The only place where timezone enters the picture is conversion of time to
year:month:day:hours:minutes:seconds and that's
a) process-dependent and
b) done outside of kernel
The same goes for file names. Filename is a sequence of bytes, no more and
no less. Anything beyond that belongs to applications.
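The split described above (one epoch counter, with the timezone applied per process in user space) can be sketched as follows. This is an illustration only, assuming Python 3; ZONE_A and ZONE_B are hypothetical fixed offsets standing in for two users' timezone settings.

```python
from datetime import datetime, timedelta, timezone

# One instant, stored as a bare count of seconds since the epoch.
instant = datetime(2004, 2, 14, 15, 40, tzinfo=timezone.utc)
stamp = int(instant.timestamp())

# Hypothetical per-process settings: two users with different offsets.
ZONE_A = timezone(timedelta(hours=1))
ZONE_B = timezone(timedelta(hours=-5))

# The conversion to year/month/day/hours happens here, outside the kernel.
view_a = datetime.fromtimestamp(stamp, tz=ZONE_A)
view_b = datetime.fromtimestamp(stamp, tz=ZONE_B)

print(stamp)
print(view_a.isoformat())
print(view_b.isoformat())

# Different wall-clock renderings, same underlying instant.
assert view_a == view_b
```

The stored value never changes; only the rendering does, which is the analogy being drawn with filenames as bytes.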
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-14 15:40 ` viro
@ 2004-02-14 17:47 ` Nicolas Mailhot
2004-02-14 17:59 ` Nicolas Mailhot
2004-02-14 23:06 ` Robin Rosenberg
1 sibling, 1 reply; 35+ messages in thread
From: Nicolas Mailhot @ 2004-02-14 17:47 UTC (permalink / raw)
To: viro; +Cc: chris.siebenmann, linux-kernel
viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Sat, Feb 14, 2004 at 03:27:50PM +0100, Nicolas Mailhot wrote:
>
>>There is no more justification to keep encoding undefined than there is to
>>keep time zone undefined. Last I've seen we're all pretty happy system
>>time actually means something on unix (unlike other systems where it can
>>be anything depending on the location where the initial installation was
>>performed).
>
>
> "System time" is amount of time elapsed since the epoch. Period. What does
> it have to do with any timezone?
And everyone agrees on the epoch, and that's why it works.
(just like sensor output is not just any numerical value but has a
well-defined unit)
With filenames we have a value, but what it means exactly is a matter of
conjecture. That's the problem.
(it wouldn't be if filenames were just magic cookies that never needed
to be interpreted, but there are a lot of actors, be they apps or humans,
that need to agree on what the byte string means)
Cheers,
--
Nicolas Mailhot
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-14 15:40 ` viro
2004-02-14 17:47 ` Nicolas Mailhot
@ 2004-02-14 23:06 ` Robin Rosenberg
2004-02-14 23:29 ` viro
1 sibling, 1 reply; 35+ messages in thread
From: Robin Rosenberg @ 2004-02-14 23:06 UTC (permalink / raw)
To: viro; +Cc: Linux kernel
On Saturday 14 February 2004 16.40, you wrote:
> The same goes for file names. Filename is a sequence of bytes, no more and
> no less. Anything beyond that belongs to applications.
Should be a sequence of characters, since humans are supposed to use them, and
it should be the same characters whenever possible regardless of the user's locale.
The "sequence of bytes" idea is a legacy from prehistoric times when byte == character
was true. That is no longer the case and actually hasn't been for quite a while in
some parts of the world. Interchange is important. The application cannot handle
this since it cannot know what characters a byte string represents. Fixing it in the
kernel is the simple solution since it knows the locale. It's also a small change, I
believe. Having an iocharset option for all file systems makes it backward compatible
and creates a migration path to UTF-8 as the system default locale.
-- robin
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-14 23:06 ` Robin Rosenberg
@ 2004-02-14 23:29 ` viro
2004-02-15 0:07 ` Robin Rosenberg
0 siblings, 1 reply; 35+ messages in thread
From: viro @ 2004-02-14 23:29 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: Linux kernel
On Sun, Feb 15, 2004 at 12:06:23AM +0100, Robin Rosenberg wrote:
> On Saturday 14 February 2004 16.40, you wrote:
> > The same goes for file names. Filename is a sequence of bytes, no more and
> > no less. Anything beyond that belongs to applications.
>
> Should be a sequence of characters since humans are supposed to use them and
> it should be the same characters whenever possible regardless of user's locale.
> The "sequence of bytes" idea is a legacy from prehistoric times when byte == character
> was true.
Bullshit. It has _nothing_ to do with characters, wide or not. For the system,
filenames are opaque. The only things that have special meanings are:
octet 0x2f ('/') splits the pathname into components
"." as a component has a special meaning
".." as a component has a special meaning.
That's it. The rest is never interpreted by the kernel.
> Having an iocharset options for all file systems make it backward compatible
> and creates a migration path to UTF-8 as system default locale.
Try to realize that different users CAN HAVE DIFFERENT LOCALES. On the same
system. And have files on the same fs. Moreover, homedirs that used to be
on different filesystems can end up on the same fs. What iocharset would
you use, then? Sigh...
Again, there is no such thing as iocharset of filesystem - it varies between
users and users can and do share filesystems. Think of /home; think of /tmp.
It isn't feasible. At all. Just as timezone doesn't belong in kernel, locales
have no place there.
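The rule given earlier in this message (only octet 0x2f, "." and ".." carry meaning) can be modelled in a few lines. This is a simplified user-space sketch, assuming Python 3; classify_components is a hypothetical helper and not kernel code, and it ignores details such as trailing slashes and symlinks.

```python
def classify_components(path: bytes):
    """Split a pathname on octet 0x2f and tag each component.

    Everything other than "." and ".." is treated as opaque bytes.
    """
    result = []
    for comp in path.split(b"/"):
        if comp == b"":
            continue  # empty pieces come from leading or repeated slashes
        if comp == b".":
            result.append((comp, "current directory"))
        elif comp == b"..":
            result.append((comp, "parent directory"))
        else:
            result.append((comp, "opaque name"))
    return result


# One component is Latin-1 bytes, another is UTF-8 bytes; the split does
# not care, because it never decodes anything.
path = b"/home/\xe9t\xe9/../r\xc3\xa9sum\xc3\xa9/./notes"
for comp, kind in classify_components(path):
    print(comp, kind)
```

Two users with different locales can therefore create names side by side in the same directory, and nothing at this level records which charset either of them had in mind.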
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-14 23:29 ` viro
@ 2004-02-15 0:07 ` Robin Rosenberg
2004-02-15 2:41 ` Linus Torvalds
0 siblings, 1 reply; 35+ messages in thread
From: Robin Rosenberg @ 2004-02-15 0:07 UTC (permalink / raw)
To: viro; +Cc: Linux kernel
On Sunday 15 February 2004 00.29, you wrote:
> On Sun, Feb 15, 2004 at 12:06:23AM +0100, Robin Rosenberg wrote:
> > The "sequence of bytes" idea is a legacy from prehistoric times when byte == character
> > was true.
>
> Bullshit. It has _nothing_ to do with characters, wide or not. For the system,
> filenames are opaque. The only things that have special meanings are:
> octet 0x2f ('/') splits the pathname into components
> "." as a component has a special meaning
> ".." as a component has a special meaning.
> That's it. The rest is never interpreted by the kernel.
I know how it is (to some degree), and it's wrong. The user sees inside the filename
and sees a string of characters, not a byte sequence.
> Try to realize that different users CAN HAVE DIFFERENT LOCALES. On the same
> system. And have files on the same fs. Moreover, homedirs that used to be
> on different filesystems can end up on the same fs. What iocharset would
> you use, then? Sigh...
OK, I got the iocharset option wrong, god knows why. The problem
however remains.
It seems you simply don't want to understand the problem, which is that users
CAN HAVE DIFFERENT LOCALES on the same system and on different systems.
Sigh...
I am less concerned with which solution than that a solution should be found. So it
seems no file system has a solution today. Still, an iocharset option would relieve
the problem for removable media and multi-boot systems. Most Linux machines
are essentially single-user and have either the same locale for all users, or all
users are using UTF-8 with their locale. It's not the locale that matters, but the
charset used for encoding the locale. The rest cannot be helped.
-- robin
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-15 0:07 ` Robin Rosenberg
@ 2004-02-15 2:41 ` Linus Torvalds
2004-02-15 3:33 ` Matthias Urlichs
0 siblings, 1 reply; 35+ messages in thread
From: Linus Torvalds @ 2004-02-15 2:41 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: viro, Linux kernel
On Sun, 15 Feb 2004, Robin Rosenberg wrote:
> >
> > Bullshit. It has _nothing_ to do with characters, wide or not. For the system,
> > filenames are opaque. The only things that have special meanings are:
> > octet 0x2f ('/') splits the pathname into components
> > "." as a component has a special meaning
> > ".." as a component has a special meaning.
> > That's it. The rest is never interpreted by the kernel.
>
> I know how it is (to some degree), and it's wrong. The user sees inside the filename
> and sees a string of characters, not a byte sequence.
Yes, the user sees a string of characters, but the octet 0x2f ('/') and
the terminating NUL character '\0' are still perfectly normal characters
and there is no confusion.
The reason: UTF-8. It's the only sane encoding (apart from a pure extended
ASCII setup, which is also sane, but is obviously unacceptable for a large
portion of the world).
If some misguided person has told you about UCS-2 and horrors like UTF-16,
just ignore them. They are crazy and deluded, and - perhaps more
importantly - stupid.
In short: the kernel talks bytestreams, and that implies that if you want
to talk to the kernel, you HAVE TO USE UTF-8.
At which point there are no locale issues any more. The only locale issue
you can have is user space mistaking a stream of bytes as extended ASCII,
which will cause all your pretty UTF-8 characters to be shown as strange
latin1 (or other) squiggles.
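The "squiggles" effect is easy to reproduce. A minimal sketch, assuming Python 3 and a made-up file name: the bytes are correct UTF-8, and only the interpretation applied by user space is wrong.

```python
# Bytes as they would sit on disk for a UTF-8 file name (example name).
name_bytes = "r\u00e9sum\u00e9.txt".encode("utf-8")

# A program that assumes extended ASCII (here ISO-8859-1) shows this:
shown = name_bytes.decode("iso-8859-1")
print(shown)

# Nothing was lost: reversing the wrong interpretation restores the name.
restored = shown.encode("iso-8859-1").decode("utf-8")
print(restored)
```

Each accented letter turns into two Latin-1 characters because its UTF-8 form is two bytes long.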
> It seems you simply don't want to understand the problem, which is that users
> CAN HAVE DIFFERENT LOCALES on the same system and on different systems.
> Sigh...
People understand the problem. And UTF-8 is the solution.
It's getting there. I think even Microsoft has seen the light, and is
phasing out their crapola (UCS-2LE? Whatever).
> I am less concerned with which solution than that a solution should be found. So it
> seems no file system has a solution today. Still, an iocharset option would relieve
> the problem for removable media and multi-boot systems.
No. Things like "iocharset" are not the solution. They are literally the
_problem_. The solution is to use something that not only acts as ASCII,
but also has a wide enough range to cover the whole required space (UCS-2
fails _both_ of these fundamental tests). At which point "iocharset" makes
no sense any more, and only exists as a way to translate legacy crap into
the one true format.
And that one true format is UTF-8. End of story. If you try to talk to the
kernel in UCS-2 or anything else, you _will_ fail.
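Why a 16-bit encoding cannot be handed to a byte-oriented interface can be checked directly. A minimal sketch, assuming Python 3; Python has no separate UCS-2 codec, so "utf-16-le" is used as a stand-in (for the characters below the two are byte-for-byte identical), and reserved_octets is a hypothetical helper.

```python
SLASH, NUL = 0x2F, 0x00


def reserved_octets(encoded: bytes) -> list:
    """Return the octets in `encoded` that a pathname treats specially."""
    return [b for b in encoded if b in (SLASH, NUL)]


# 'a' followed by U+012F (LATIN SMALL LETTER I WITH OGONEK).
name = "a\u012f"

as_utf8 = name.encode("utf-8")
as_utf16 = name.encode("utf-16-le")

print(as_utf8.hex(), reserved_octets(as_utf8))
print(as_utf16.hex(), reserved_octets(as_utf16))
```

In UTF-8 every byte of a multi-byte sequence has its high bit set, so 0x2f and 0x00 can only ever mean '/' and NUL. In the 16-bit form, the plain letter 'a' already carries a zero byte and U+012F carries a 0x2f byte, so the name would be cut short or split at a bogus separator.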
Linus
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-15 2:41 ` Linus Torvalds
@ 2004-02-15 3:33 ` Matthias Urlichs
2004-02-15 4:04 ` viro
0 siblings, 1 reply; 35+ messages in thread
From: Matthias Urlichs @ 2004-02-15 3:33 UTC (permalink / raw)
To: linux-kernel
Hi, Linus Torvalds wrote:
> In short: the kernel talks bytestreams, and that implies that if you want
> to talk to the kernel, you HAVE TO USE UTF-8.
>
> At which point there are no locale issues any more.
Not locale, but normalization problems and identical-glyph problems.
Which is actually worse, because you don't have filenames which look
like crap -- instead you have filenames which look perfectly sane, but
they still do not work. Example: is an á one character, or is it an a
followed by a composing ´?
Mac OSX, just as an example, only uses decomposed filenames. I don't know
the current situation, but 10.2 has major problems when you try to access
files with composite characters in their name (across NFS for instance).
I wonder if Linux, i.e. Linus ;-) should decree one single standard
normalization. (I am NOT saying that enforcing this would be the kernel's
job!)
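The á example can be made concrete. A minimal sketch, assuming Python 3 and its standard unicodedata module; the two strings render identically but are different byte sequences, and therefore different filenames as far as a byte-comparing filesystem is concerned.

```python
import unicodedata

composed = "\u00e1"     # á as a single code point
decomposed = "a\u0301"  # 'a' followed by COMBINING ACUTE ACCENT

# Same glyph on screen, different bytes on disk.
print(composed.encode("utf-8").hex())
print(decomposed.encode("utf-8").hex())
print(composed == decomposed)

# Agreeing on one normalization form makes the two spellings compare equal.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed, nfd == decomposed)
```

NFC (composed) and NFD (decomposed) are the two canonical forms a convention could pick; the sketch takes no position on which one is preferable.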
--
Matthias Urlichs
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-15 3:33 ` Matthias Urlichs
@ 2004-02-15 4:04 ` viro
2004-02-15 9:48 ` Robin Rosenberg
2004-02-15 18:26 ` yodaiken
0 siblings, 2 replies; 35+ messages in thread
From: viro @ 2004-02-15 4:04 UTC (permalink / raw)
To: Matthias Urlichs; +Cc: linux-kernel
On Sun, Feb 15, 2004 at 04:33:48AM +0100, Matthias Urlichs wrote:
> Mac OSX, just as an example, only uses decomposed filenames.
So how long does it take for a filename to decompose? ;-)
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-15 4:04 ` viro
@ 2004-02-15 9:48 ` Robin Rosenberg
2004-02-15 18:26 ` yodaiken
1 sibling, 0 replies; 35+ messages in thread
From: Robin Rosenberg @ 2004-02-15 9:48 UTC (permalink / raw)
To: viro; +Cc: Linux kernel
On Sunday 15 February 2004 05.04, you wrote:
> On Sun, Feb 15, 2004 at 04:33:48AM +0100, Matthias Urlichs wrote:
>
> > Mac OSX, just as an example, only uses decomposed filenames.
>
> So how long does it take for a filename to decompose?
As long as it takes to switch locale to UTF-8 :) or vice versa.
-- robin
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-15 4:04 ` viro
2004-02-15 9:48 ` Robin Rosenberg
@ 2004-02-15 18:26 ` yodaiken
1 sibling, 0 replies; 35+ messages in thread
From: yodaiken @ 2004-02-15 18:26 UTC (permalink / raw)
To: viro; +Cc: Matthias Urlichs, linux-kernel
On Sun, Feb 15, 2004 at 04:04:58AM +0000, viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Sun, Feb 15, 2004 at 04:33:48AM +0100, Matthias Urlichs wrote:
>
> > Mac OSX, just as an example, only uses decomposed filenames.
>
> So how long does it take for a filename to decompose? ;-)
Depends on whether it is junk or not.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
@ 2004-02-12 16:50 Nicolas Mailhot
2004-02-13 3:03 ` Jamie Lokier
0 siblings, 1 reply; 35+ messages in thread
From: Nicolas Mailhot @ 2004-02-12 16:50 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]
Not specifying the file name encoding (either per fs type, per partition
or per filename) is plain dangerous. It is not a userspace problem -
flash/hotplug disks move, users on the same system can have different
locales and try to share files, a user can change his locale to another
one (hear the screams of RH users forcibly converted to utf8 who had
to fix years of stored files whose names were suddenly borked)
See also the Sun zip encoding bug - everyone uses zip files in Java, zip
authors thought a filename is "just a bunch of bytes" and didn't put
filename encoding info in the zip format, and now Java zip handling goes
boom since numerous encodings are Unicode-incompatible. It's slowly
working its way into the top 25 most-reported Java bugs.
(of course, as usual, US users/coders are not hit and do not feel
concerned)
The only reason we got by with it so far is that Linux localisation was poor,
and systems didn't scale high enough to permit a high number of users per
system (reducing locale collision risks)
The only reason we might get by in the future is that everyone will be using
utf8.
But that's not a reason not to fix the core problem - I don't want to
spend hours fixing filenames next time someone comes up with a new
encoding. Please put valid encoding info somewhere or declare filenames
are utf-8 or utf-16 only - changing user locale should not corrupt old
data.
Cheers,
--
Nicolas Mailhot
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-12 16:50 JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.) Nicolas Mailhot
@ 2004-02-13 3:03 ` Jamie Lokier
2004-02-13 18:06 ` Nicolas Mailhot
0 siblings, 1 reply; 35+ messages in thread
From: Jamie Lokier @ 2004-02-13 3:03 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: linux-kernel
Nicolas Mailhot wrote:
> But that's not a reason not to fix the core problem - I don't want to
> spend hours fixing filenames next time someone comes up with a new
> encoding. Please put valid encoding info somewhere or declare filenames
> are utf-8 or utf-16 only - changing user locale should not corrupt old
> data.
If you attach encoding to names for a whole filesystem, you will get
really unpleasant bugs including security holes because some names
won't be writable, so the fs will either return error codes when those
names are used, or silently alter the names.
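Both failure modes named above can be shown with an ordinary charset conversion. A minimal sketch, assuming Python 3; store_name is a hypothetical helper that stands in for a filesystem pinned to a single charset (ISO-8859-1 here), not for any real kernel code.

```python
def store_name(name: str, fs_encoding: str, errors: str = "strict") -> bytes:
    """Hypothetical helper: the bytes a fixed-charset filesystem would store."""
    return name.encode(fs_encoding, errors)


# Outcome 1: the name cannot be represented, so the operation fails.
try:
    store_name("\u65e5\u672c.txt", "iso-8859-1")
    rejected = False
except UnicodeEncodeError:
    rejected = True

# Outcome 2: the name is silently altered, and two distinct names collide.
first = store_name("\u65e5\u672c.txt", "iso-8859-1", errors="replace")
second = store_name("\u4e2d\u56fd.txt", "iso-8859-1", errors="replace")

print(rejected)
print(first, second, first == second)
```

The collision in the second case is where the security concern comes from: a program that believes it created a new file may in fact have opened an existing one.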
-- Jamie
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 3:03 ` Jamie Lokier
@ 2004-02-13 18:06 ` Nicolas Mailhot
2004-02-13 18:15 ` viro
0 siblings, 1 reply; 35+ messages in thread
From: Nicolas Mailhot @ 2004-02-13 18:06 UTC (permalink / raw)
To: Jamie Lokier; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1326 bytes --]
On Fri, 13/02/2004 at 03:03 +0000, Jamie Lokier wrote:
> Nicolas Mailhot wrote:
> > But that's not a reason not to fix the core problem - I don't want to
> > spend hours fixing filenames next time someone comes up with a new
> > encoding. Please put valid encoding info somewhere or declare filenames
> > are utf-8 or utf-16 only - changing user locale should not corrupt old
> > data.
>
> If you attach encoding to names for a whole filesystem, you will get
> really unpleasant bugs including security holes because some names
> won't be writable, so the fs will either return error codes when those
> names are used, or silently alter the names.
You can have security holes now just by tricking an app into reading
files written by another app which disagreed on the locale.
And as for the filename problems :
- just mangle existing invalid filenames when a default encoding is
agreed upon
- refuse to write new files with invalid filenames just like you would
with the few names forbidden in ascii - apps will learn to cope.
Some convention is needed. Expecting it to materialise without OS
enforcement is deluding oneself. Getting a change like this in place
will definitely be painful, but the current situation is far from
painless for a lot of people.
Regards,
--
Nicolas Mailhot
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 18:06 ` Nicolas Mailhot
@ 2004-02-13 18:15 ` viro
2004-02-13 18:31 ` Richard B. Johnson
0 siblings, 1 reply; 35+ messages in thread
From: viro @ 2004-02-13 18:15 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: Jamie Lokier, linux-kernel
On Fri, Feb 13, 2004 at 07:06:46PM +0100, Nicolas Mailhot wrote:
> And as for the filename problems :
> - just mangle existing invalid filenames when a default encoding is
> agreed upon
> - refuse to write new files with invalid filenames just like you would
> with the few names forbidden in ascii - apps will learn to cope.
What names forbidden in ASCII?
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
2004-02-13 18:15 ` viro
@ 2004-02-13 18:31 ` Richard B. Johnson
2004-02-13 18:50 ` JFS default behavior Ulrich Drepper
0 siblings, 1 reply; 35+ messages in thread
From: Richard B. Johnson @ 2004-02-13 18:31 UTC (permalink / raw)
To: viro; +Cc: Nicolas Mailhot, Jamie Lokier, linux-kernel
On Fri, 13 Feb 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Fri, Feb 13, 2004 at 07:06:46PM +0100, Nicolas Mailhot wrote:
> > And as for the filename problems :
> > - just mangle existing invalid filenames when a default encoding is
> > agreed upon
> > - refuse to write new files with invalid filenames just like you would
> > with the few names forbidden in ascii - apps will learn to cope.
>
> What names forbidden in ASCII?
I think that all ASCII characters below 0x20 are forbidden in
Unix file names, and others shown in the reference cited below are
"disapproved".
http://www.med.nyu.edu/rcr/rcr/nyu_vms/unixfileanddirectorynames.htm
Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: JFS default behavior
2004-02-13 18:31 ` Richard B. Johnson
@ 2004-02-13 18:50 ` Ulrich Drepper
0 siblings, 0 replies; 35+ messages in thread
From: Ulrich Drepper @ 2004-02-13 18:50 UTC (permalink / raw)
To: root; +Cc: viro, Nicolas Mailhot, Jamie Lokier, linux-kernel
Richard B. Johnson wrote:
> I think that all ASCII characters below 0x20 are forbidden in
> Unix file-names
Not true. Filenames in Unix are defined as
3.169 Filename
A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
characters composing the name may be selected from the set of all
character values excluding the slash character and the null byte. The
filenames dot and dot-dot have special meaning. A filename is
sometimes referred to as a pathname component.
Only NUL and / are special.
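The definition quoted above translates into a very short check. A minimal sketch, assuming Python 3; is_legal_filename is a hypothetical helper, and the NAME_MAX value of 255 is an assumption (the real per-filesystem limit can be queried with os.pathconf).

```python
NAME_MAX = 255  # assumed value; the actual limit depends on the filesystem


def is_legal_filename(name: bytes) -> bool:
    """Check a single pathname component against the quoted definition:
    1 to NAME_MAX bytes, excluding the slash and the null byte."""
    return (
        1 <= len(name) <= NAME_MAX
        and b"/" not in name
        and b"\x00" not in name
    )


print(is_legal_filename(b"tab\there\nnewline"))  # control characters are fine
print(is_legal_filename(b"\xff\xfe"))            # so are bytes that are not UTF-8
print(is_legal_filename(b"a/b"))                 # slash is excluded
print(is_legal_filename(b"a\x00b"))              # null byte is excluded
print(is_legal_filename(b""))                    # empty names are excluded
```

Note that "." and ".." pass this check: they are legal names that happen to have a special meaning.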
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
^ permalink raw reply [flat|nested] 35+ messages in thread