linux-fsdevel.vger.kernel.org archive mirror
From: Jan Hudec <bulb@ucw.cz>
To: "Joseph D. Wagner" <theman@josephdwagner.info>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: RFC: Illegal Characters in File Names
Date: Mon, 19 Jul 2004 23:01:19 +0200
Message-ID: <20040719210119.GF3227@vagabond>
In-Reply-To: <20040719192145.50750578E5@jabberwock.ucw.cz>

On Mon, Jul 19, 2004 at 14:21:33 -0500, Joseph D. Wagner wrote:
> > There are just two illegal characters. '\0' and '/'. All other
> > characters are *permited [sic]*.
> 
> By whose standard?  POSIX doesn't require non-printing control
> characters to be legal.  Linux PERMITS them, but there's no standard
> requiring them to be permitted.

I didn't say they are required to be permitted. I said they are
permitted, which is currently true.
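
A minimal sketch of that claim, assuming a Linux system with a typical
native filesystem (the helper name create_and_list is made up for this
illustration). It creates a file whose name contains a backspace, a
newline and an escape byte, then reads the directory back:

```python
import os
import tempfile


def create_and_list(name: bytes) -> list:
    """Create an empty file called `name` in a fresh temporary directory
    and return the directory listing as raw bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        # Use a bytes path so the name reaches the kernel unmodified.
        with open(os.path.join(tmp_bytes, name), "wb"):
            pass
        return os.listdir(tmp_bytes)


# Backspace (0x08), newline (0x0a) and escape (0x1b) in one name.
print(create_and_list(b"a\x08b\nc\x1b"))
```

A name containing '/' would be treated as a path, and Python itself
refuses an embedded '\0' before the call is ever made.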

> > They are not illegal. The shell has problems with them, and the problems
> > are not absolute. You CAN type them in if you try. The file managers
> > will operate on them without problems, too.
> 
> I would argue that the shell having problems with them is the exact
> reason they should be made illegal.

The fact that some program, even a shell, has a problem with something
is an argument for one and only one thing: that the program should be
fixed.

As I have already shown, control characters are inconvenient to work
with in the shell, but it is POSSIBLE.

> Allowing them begs the question "how should they be handled?"  For
> example, if a file name contained a backspace, displaying the raw
> backspace would backup the character's position and result in two
> characters being overwritten: the backspace and the character
> immediately prior to the backspace.  Printing a substitute character
> instead of the raw character simply leads to more questions.  Does
> '\b' mean backspace or backslash and b?  How do you tell the
> difference?

ls does handle this. You can choose how it will display them using
various options.

> Handling these characters has not been standardized on Linux.
> Different applications handle the same characters differently.  For
> example, ls displays a backslash followed by the octal character

In the good old UNIX tradition. All the "low level" tools use this same
style of escaping.
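
A small sketch in the spirit of that backslash-octal style. This is not
the code of ls or of any other tool, and escape_name is a hypothetical
helper. It also shows one way around the '\b' ambiguity raised above:
the backslash itself is escaped, so a literal backslash followed by 'b'
can never be mistaken for an escaped control character.

```python
def escape_name(name: bytes) -> str:
    """Render a raw file name as printable ASCII using backslash escapes."""
    out = []
    for byte in name:
        if byte == 0x5C:
            # A literal backslash is doubled so escapes stay unambiguous.
            out.append("\\\\")
        elif 0x20 <= byte < 0x7F:
            # Printable ASCII is shown as-is.
            out.append(chr(byte))
        else:
            # Everything else becomes a backslash plus three octal digits.
            out.append("\\%03o" % byte)
    return "".join(out)


print(escape_name(b"a\x08b"))   # name containing a real backspace
print(escape_name(b"a\\b"))     # name containing backslash and 'b'
```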

> number, while KDE's Konqueror automatically substitutes characters
> with their hexadecimal form prefixing them with '%' so that prefix and
> hexadecimal form is stored on disk and when displayed the hexadecimal
> character is automatically transformed into the actual character.

No. I just tried it, and it's NOT that way: when I create a file whose
name contains % 'URL' escapes, Konqueror, quite correctly, displays the
name verbatim, % characters included. When the filename contains a
non-displayable character, Konqueror draws a box in its place.

The % conversion you talk about happens when it handles URIs. The URI
specification mandates such translation, and its undoing on the
server.
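
To illustrate that the % form belongs to the URI and not to the name on
disk, here is a sketch using Python's standard urllib.parse (the two
wrapper names are invented for this example). The raw name is encoded
only when a URI is built and decoded again on the receiving side:

```python
from urllib.parse import quote, unquote_to_bytes


def name_to_uri_segment(name: bytes) -> str:
    """Percent-encode a raw file name for use as one URI path segment."""
    return quote(name, safe="")


def uri_segment_to_name(segment: str) -> bytes:
    """Undo the percent-encoding and recover the raw file name."""
    return unquote_to_bytes(segment)


raw = b"a\x08b%"                      # backspace and a literal '%'
segment = name_to_uri_segment(raw)
print(segment)
print(uri_segment_to_name(segment) == raw)
```

The name stored on disk stays the raw byte string throughout; a literal
'%' in a name is itself encoded, so the round trip is lossless.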

> This can lead to problems when different applications need to access
> the same file.  How do you know which method the other application
> used in handling these characters?

The application stored the characters on disk. How it got them has
nothing to do with the filesystem.

If you need to give the file to an application that can't handle
certain characters in a name, then don't do that.

Make can't handle ':', ',' and ' ' -- go ahead, forbid those too!

The fact that an application can't handle something is none of the
kernel's or the filesystem's business. You don't have to use such
characters, but one day you may have a good reason to.

> Additionally, security vulnerabilities (now patched) have resulted
> from the allowed use of control characters.

Filenames are input. They must be validated. The filesystem should not
fix broken applications.
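
What such validation might look like inside an application, as a sketch
only. The policy here (reject C0 control bytes and DEL before echoing a
name to a terminal) and the function name are assumptions made for the
example, not something any standard prescribes:

```python
def is_safe_for_terminal(name: bytes) -> bool:
    """Return True if `name` contains no C0 control bytes and no DEL.

    Bytes >= 0x80 are deliberately accepted, since they may belong to
    an 8-bit or multi-byte encoding.
    """
    for byte in name:
        if byte < 0x20 or byte == 0x7F:
            return False
    return True


for candidate in (b"report.txt", b"a\x08b", b"\xc4\x8c.txt"):
    print(candidate, is_safe_for_terminal(candidate))
```

An application that cannot cope with such names can refuse or escape
them itself; the file still exists and other programs can still use it.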

Could you mention where the vulnerabilities were?

> I want to close this potential can-of-worms.

Why is it a can-of-worms? The kernel has no problem with those
characters. Neither does the filesystem. Neither do any of the
standard system utilities, nor do most applications.

> > The system is 8-bit clean an [sic] is relied upon by many users.
> > I have LOTS of files with iso-8859-2 encoded names on my filesystem.
> 
> According to ISO 8859, the lower 128 characters are all the same.
> It's the upper 128 characters that differ with iso-8859-1, iso-8859-2,
> etc.  Hence, the proposed change should be OK regardless of the
> encoding mechanism.

I was commenting that the characters >= 128 are ok.

Oh...! The bytes 128-159 behave as control characters (in
iso-8859-x), but must be kept for utf-8, where they occur inside
multi-byte sequences. Only some of the file names might be utf-8
encoded...
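
A quick sketch of why that range cannot simply be banned (the helper
name is made up for the illustration). It lists the bytes of a string's
utf-8 encoding that fall into 128-159:

```python
def bytes_in_c1_range(text: str) -> list:
    """Return the utf-8 bytes of `text` that fall in 0x80..0x9F."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


# U+010C (LATIN CAPITAL LETTER C WITH CARON) encodes as C4 8C in utf-8.
print([hex(b) for b in bytes_in_c1_range("\u010c")])
print(bytes_in_c1_range("abc"))
```

A filter that rejected those byte values would therefore break
perfectly ordinary utf-8 names.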

-------------------------------------------------------------------------------
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

