From: "Joseph D. Wagner"
To: "'Jan Hudec'"
Subject: RE: RFC: Illegal Characters in File Names
Date: Mon, 19 Jul 2004 14:21:33 -0500
In-Reply-To: <20040719084757.GC3227@vagabond>
List-Id: linux-fsdevel.vger.kernel.org

> There are just two illegal characters. '\0' and '/'. All other
> characters are *permited [sic]*.

By whose standard? POSIX doesn't require non-printing control characters to be legal. Linux PERMITS them, but no standard requires them to be permitted.

> They are not illegal. The shell has problems with them, and the problems
> are not absolute. You CAN type them in if you try. The file managers
> will operate on them without problems, too.

I would argue that the shell's trouble with them is exactly why they should be made illegal. Allowing them raises the question "how should they be handled?"

For example, if a file name contained a backspace, printing the raw backspace would move the cursor back one position, so the character after it would overwrite the character immediately before it. Two characters would effectively vanish from the display: the backspace itself and the character it overwrote.

Printing a substitute instead of the raw character just raises more questions. Does '\b' mean a backspace, or a backslash followed by 'b'? How do you tell the difference?

Handling of these characters has not been standardized on Linux. Different applications handle the same characters differently.
For example, ls displays a backslash followed by the character's octal code, while KDE's Konqueror replaces such characters with '%' followed by their hexadecimal code, stores that escaped form on disk, and converts it back to the actual character when displaying the name. This causes problems when different applications need to access the same file: how does one application know which method the other used to handle these characters?

Additionally, security vulnerabilities (since patched) have resulted from permitting control characters. I want to close this potential can of worms.

> The system is 8-bit clean an [sic] is relied upon by many users.
> I have LOTS of files with iso-8859-2 encoded names on my filesystem.

According to ISO 8859, the lower 128 characters are the same in every part; only the upper 128 differ among iso-8859-1, iso-8859-2, etc. Since the proposed change only concerns characters in that shared lower range, it should be OK regardless of the encoding. For iso-8859-2 specifically, see:

http://nl.ijs.si/gnusl/cee/charset.html

Joseph D. Wagner
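A minimal Python sketch of the display and escaping problems described above, plus one possible reading of the proposed rule. All function names here are hypothetical; the two escaping styles only approximate the ls and Konqueror behavior described in this message, and the check assumes "control characters" means ASCII 0x00-0x1F plus DEL (0x7F).

```python
# Sketch only: hypothetical helpers illustrating the points in this message.
# They are NOT taken from ls, Konqueror, or the Linux kernel.

# Assumed C-style short escapes for a few common control characters.
SHORT_ESCAPES = {0x08: "\\b", 0x09: "\\t", 0x0A: "\\n"}


def is_control(b: int) -> bool:
    # Assumption: "control characters" = ASCII 0x00-0x1F plus DEL (0x7F).
    return b < 0x20 or b == 0x7F


def naive_escape(name: bytes) -> str:
    """Escape control bytes C-style (\\b, or backslash + 3-digit octal),
    but leave a literal backslash unescaped. Bytes >= 0x80 are shown as
    Latin-1 code points purely for simplicity."""
    out = []
    for b in name:
        if b in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[b])
        elif is_control(b):
            out.append("\\%03o" % b)
        else:
            out.append(chr(b))
    return "".join(out)


def percent_escape(name: bytes) -> str:
    """Escape control bytes (and '%' itself) as '%' + two hex digits."""
    out = []
    for b in name:
        if is_control(b) or b == 0x25:
            out.append("%%%02X" % b)
        else:
            out.append(chr(b))
    return "".join(out)


def is_allowed_name(name: bytes) -> bool:
    """One possible reading of the proposed rule: reject empty names,
    '/', NUL and the other control characters; accept every byte >= 0x80
    so ISO 8859-x names (e.g. iso-8859-2) remain valid."""
    return len(name) > 0 and all(b != 0x2F and not is_control(b) for b in name)


if __name__ == "__main__":
    backspace = b"a\bc"     # 'a', BACKSPACE, 'c'
    literal = b"a\\bc"      # 'a', '\\', 'b', 'c'
    # Both names render identically: prints "a\bc a\bc"
    print(naive_escape(backspace), naive_escape(literal))
    # The same byte rendered two ways: prints "a\001c a%01c"
    print(naive_escape(b"a\x01c"), percent_escape(b"a\x01c"))
    # Czech letters encoded in iso-8859-2 pass: prints "True"
    print(is_allowed_name("pří".encode("iso-8859-2")))
    # Backspace and '/' are rejected: prints "False False"
    print(is_allowed_name(backspace), is_allowed_name(b"a/b"))
```

Escaping the backslash itself (as `percent_escape` does for '%') would remove the '\b' ambiguity, but only if every program agrees on one convention, which is exactly what is missing today.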