From: Jan Hudec
To: Joseph Wagner
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: RFC: Illegal Characters in File Names
Date: Mon, 19 Jul 2004 10:47:57 +0200
Message-ID: <20040719084757.GC3227@vagabond>
In-Reply-To: <200407181941.32163.theman@josephdwagner.info>

On Sun, Jul 18, 2004 at 19:41:32 -0500, Joseph Wagner wrote:
> IMHO, e2fsck is not sufficiently aggressive in examining a file name for
> illegal characters.  While it is possible to place non-printing control

There are just two illegal characters: '\0' and '/'. All other characters
are *permitted*.

> characters in a file name, few (if any) programs support opening a file
> with control characters in the file name.  In fact, 'rm' doesn't even
> support control characters.  To remove such a file, one must substitute a
> wildcard character and run rm interactively (i.e. 'rm -i').

rm supports control characters perfectly well! As do ALL the standard
utilities. What rm specifically does NOT support is the wildcard! It is the
shell that expands wildcards. It is also the shell that has problems with
control characters, but that is not a show-stopper:

    $ touch "$(echo foo\\nbar)"
    $ ls
    foo?bar
    $ ls -Q
    "foo\nbar"
    $ rm 'foo
    quote> bar'
    $ ls
    $
    $ touch foo^Bbar
    $ ls
    foo?bar
    $ ls -Q
    "foo\002bar"
    $ rm foo^Bbar
    $ ls
    $

(Note that the ^B was typed as Ctrl-V Ctrl-B.)

Perfectly legal.
Everything works as it should. You can type each and every control
character this way; some of them just have to be inside single quotes.

> Some of us in the e2fsprogs project are considering a change which would
> mark all non-printing control characters (i.e. ASCII <=31 and ASCII == 127)
> as illegal.

They are not illegal. The shell has problems with them, and those problems
are not absolute. You CAN type them in if you try. File managers will
operate on them without problems, too.

> I decided not to flag characters > ASCII 127 as illegal in case some day in
> the future the encoding changes to UTF-8, in which case valid printing
> non-control characters exist > 127.  On the flip side, ASCII and UTF-8 are
> 100% compatible when <= 127, so the changes I did make will be fine with
> both.

The system is 8-bit clean, and many users rely on that. I have LOTS of
files with iso-8859-2 encoded names on my filesystem.

> Can anyone suggest a good reason not to good forward with this?
>
> If it is OK to go forward, should the kernel be changed to disallow a file
> name from having these same non-printing control characters?

No, it's not OK. These characters are perfectly legal. Actually, more
programs will have problems with a filename containing a ':' than with a
filename containing, say, a '\b' (usually because they accept URI arguments
and use the presence of ':' to distinguish a plain filename from a URI).

-------------------------------------------------------------------------------
Jan 'Bulb' Hudec
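
The point made earlier in this message, that rm handles control characters
without wildcards or 'rm -i', can be sketched as a non-interactive script.
This is only an illustration, not part of the original transcript: the
variable names are made up, mktemp -d is assumed to be available (it is
common but not required by POSIX), and the filesystem is assumed to accept
these bytes in names.

```shell
#!/bin/sh
# Sketch: create and remove files whose names contain control characters,
# passing the names to touch and rm as ordinary quoted arguments.
set -eu

# Scratch directory so nothing outside it is touched.
# Assumption: mktemp -d exists on this system.
dir=$(mktemp -d)

# Build the names with printf. Command substitution strips only trailing
# newlines, so the control characters are placed mid-name.
nl_name=$(printf 'foo\nbar')      # embedded newline
stx_name=$(printf 'foo\002bar')   # embedded Ctrl-B (octal 002)

touch -- "$dir/$nl_name" "$dir/$stx_name"

# Check that both files exist under exactly those names.
if [ -e "$dir/$nl_name" ] && [ -e "$dir/$stx_name" ]; then
    created=yes
else
    created=no
fi

# No wildcard and no -i: the shell passes the quoted names through as-is.
rm -- "$dir/$nl_name" "$dir/$stx_name"

if [ -e "$dir/$nl_name" ] || [ -e "$dir/$stx_name" ]; then
    removed=no
else
    removed=yes
fi

# rmdir only succeeds if the directory is empty again.
rmdir "$dir"
echo "created=$created removed=$removed"
```

Quoting the variables is what matters here: the shell hands the bytes to
touch and rm unchanged, and no pattern expansion is involved at any step.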