From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Hudec
Subject: Re: RFC: Illegal Characters in File Names
Date: Mon, 19 Jul 2004 23:01:19 +0200
Sender: linux-fsdevel-owner@vger.kernel.org
Message-ID: <20040719210119.GF3227@vagabond>
References: <20040719084757.GC3227@vagabond> <20040719192145.50750578E5@jabberwock.ucw.cz>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="T6xhMxlHU34Bk0ad"
Cc: linux-fsdevel@vger.kernel.org
Return-path:
Received: from cimice4.lam.cz ([212.71.168.94]:30624 "EHLO vagabond.light.src") by vger.kernel.org with ESMTP id S263574AbUGSVBS (ORCPT ); Mon, 19 Jul 2004 17:01:18 -0400
To: "Joseph D. Wagner"
Content-Disposition: inline
In-Reply-To: <20040719192145.50750578E5@jabberwock.ucw.cz>
List-Id: linux-fsdevel.vger.kernel.org

--T6xhMxlHU34Bk0ad
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jul 19, 2004 at 14:21:33 -0500, Joseph D. Wagner wrote:
> > There are just two illegal characters. '\0' and '/'. All other
> > characters are *permited [sic]*.
>
> By whose standard? POSIX doesn't require non-printing control
> characters to be legal. Linux PERMITS them, but there's no standard
> requiring them to be permitted.

I didn't say they are required to be permitted. I said they are. Which is
currently true.

> > They are not illegal. The shell has problems with them, and the problems
> > are not absolute. You CAN type them in if you try. The file managers
> > will operate on them without problems, too.
>
> I would argue that the shell having problems with them is the exact
> reason they should be made illegal.

The fact that some program, even if it's a shell, has a problem with
something is an argument for one and only one thing -- that the program
should be fixed. As I have already shown, control chars are inconvenient
to work with in the shell, but POSSIBLE.
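The point above, that control characters in names are awkward but
possible, can also be sketched outside the shell. The following is an
illustration only and not part of the original thread: it assumes
Python 3 on a POSIX filesystem, and escape_name and roundtrip are
hypothetical helper names. It creates a file whose name contains a
backspace, reads the name back unchanged, and prints it with
backslash-octal escapes so the terminal never sees the raw byte.

```python
import os
import tempfile


def escape_name(name: bytes) -> str:
    """Render a raw filename for display, escaping non-printable bytes.

    Hypothetical helper: backslash-octal for anything outside printable
    ASCII, and a doubled backslash so the output stays unambiguous.
    """
    out = []
    for byte in name:
        if byte == 0x5C:                 # the backslash itself
            out.append("\\\\")
        elif 0x20 <= byte < 0x7F:        # printable ASCII
            out.append(chr(byte))
        else:                            # control or high-bit byte
            out.append("\\%03o" % byte)
    return "".join(out)


def roundtrip(name: bytes) -> list:
    """Create a file called `name` in a fresh temp directory and list it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(os.fsencode(tmp), name)
        with open(path, "wb"):
            pass                         # an empty file is enough
        return os.listdir(os.fsencode(tmp))


if __name__ == "__main__":
    raw = b"a\x08b"                      # 'a', backspace, 'b'
    for found in roundtrip(raw):
        print(escape_name(found))
```

Run as a script, this prints a\010b. A literal backslash is doubled in
the output, so an escaped control byte cannot be confused with a
backslash followed by digits.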

> Allowing them begs the question "how should they be handled?" For
> example, if a file name contained a backspace, displaying the raw
> backspace would backup the character's position and result in two
> characters being overwritten: the backspace and the character
> immediately prior to the backspace. Printing a substitute character
> instead of the raw character simply leads to more questions. Does
> '\b' mean backspace or backslash and b? How do you tell the
> difference?

ls does handle this. You can choose how it will display them using
various options.

> Handling these characters has not been standardized on Linux.
> Different applications handle the same characters differently. For
> example, ls displays a backslash followed by the octal character

In the good old UNIX tradition. All the "low level" tools use this same
style of escaping.

> number, while KDE's Konqueror automatically substitutes characters
> with their hexadecimal form prefixing them with '%' so that prefix and
> hexadecimal form is stored on disk and when displayed the hexadecimal
> character is automatically transformed into the actual character.

No. I just tried it, and it's NOT that way -- when I create a file with
'%' URL escapes in its name, Konqueror, quite correctly, displays the
name verbatim, '%' characters included. When the filename contains a
non-displayable character, Konqueror draws a box in its place.

The % conversion you talk about happens when it handles URIs. The URI
specification mandates such translation -- and its undoing on the
server.

> This can lead to problems when different applications need to access
> the same file. How do you know which method the other application
> used in handling these characters?

The application stored the characters on disk. How it got them has
nothing to do with the filesystem.

If you need to give the file to an application that can't handle some
characters in a name, then don't do that. Make can't handle ':', ','
and ' ' -- go ahead, forbid those too!
The fact that an application can't handle something is none of the
kernel's or the filesystem's business. You don't have to use such
characters. And one day you may have a good reason to do so.

> Additionally, security vulnerabilities (now patched) have resulted
> from the allowed use of control characters.

Filenames are input. They must be validated. The filesystem should not
fix broken applications. Could you mention where the vulnerabilities
were?

> I want to close this potential can-of-worms.

Why is it a can-of-worms? The kernel has no problem with those
characters. Neither does the filesystem. Neither do any of the standard
system utilities. Neither do most applications.

> > The system is 8-bit clean an [sic] is relied upon by many users.
> > I have LOTS of files with iso-8859-2 encoded names on my filesystem.
>
> According to ISO 8859, the lower 128 characters are all the same.
> It's the upper 128 characters that differ with iso-8859-1, iso-8859-2,
> etc. Hence, the proposed change should be OK regardless of the
> encoding mechanism.

I was commenting that the characters >= 128 are OK.

Oh...! The characters 128 -- 159 behave as control characters (in
iso-8859-x), but must be kept for utf-8. Only some of the files might be
utf-8 encoded...

-------------------------------------------------------------------------------
Jan 'Bulb' Hudec

--T6xhMxlHU34Bk0ad
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA/DafRel1vVwhjGURAkYvAJ4m4XqVkJHCvcDk5VXqTcHoQ2mZsACfTtGj
qauFCgwYNnq5DQAckIPHiuo=
=P/g6
-----END PGP SIGNATURE-----

--T6xhMxlHU34Bk0ad--