From: Theodore Tso
Subject: Re: git on MacOSX and files with decomposed utf-8 file names
Date: Mon, 21 Jan 2008 18:00:53 -0500
Message-ID: <20080121230053.GA317@mit.edu>
References: <373E260A-6786-4932-956A-68706AA7C469@sb.org> <7EB98659-4036-45DA-BD50-42CB23ED517A@sb.org> <0CA4DF3F-1B64-4F62-8794-6F82C21BD068@sb.org>
In-Reply-To: <0CA4DF3F-1B64-4F62-8794-6F82C21BD068@sb.org>
To: Kevin Ballard
Cc: Linus Torvalds, Peter Karlsson, Mark Junker, Pedro Melo, "git@vger.kernel.org"

On Mon, Jan 21, 2008 at 05:46:27PM -0500, Kevin Ballard wrote:
> I find it amusing that you keep arguing against having git treat filenames
> as unicode when, if you had actually taken my advice and read my previous
> email talking about "ideal" vs "practical"...

If by "ideal" you mean a world where 100% of all computers were designed by Steve Jobs, you might have a point. But trying to argue for such a state of idealism seems to be stupid, and certainly a complete waste of everyone's time on the git mailing list.

It's simply not reality. It's like with the infamous resource forks, which would have worked fine if all the world were MacOS, but which had a tendency to get stripped off whenever you used a program that wasn't resource fork aware, like zip, or a protocol that wasn't resource fork aware, like FTP. And so people had to put in all sorts of kludges like BinHex to work around MacOS's "if only the entire world was like *me*, no one would get hurt" attitude. In some ways, the MacOS designers are even worse than Microsoft in terms of having the "world revolves around us" attitude.

> In other words, I was trying to illustrate that
> HFS+ isn't wrong, it's just different, and the difference is causing the
> problem.

And if you want to interoperate with the rest of the world, where at last count over 92% of computers are NOT running HFS+, then "Thinking Different" is indeed causing the problem, yes. And whose fault is that?

The whole point of interoperability is that when we communicate, we have to do so in a uniform and predictable way. If we can't, the next best thing is to have protocol translators; but in order to do that, we must avoid lossy transformations, such as HFS+'s pseudo-normalization.
(Which, by the way, will not result in a "normal" form for any glyph which can be encoded with and without a combining character if said glyph was introduced into Unicode after 1988. So you can't even call it a "normalization" algorithm; it is just a pseudo-normalization transformation which is lossy and which DESTROYS filename information in an irrecoverable way.)

- Ted
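
To make the "lossy" point concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `store` function is a hypothetical stand-in for a filesystem that decomposes names on write, and it uses the standard NFD form from Python's `unicodedata` module rather than HFS+'s own decomposition rules, which are not reproduced here. It shows that two filenames which differ byte for byte become the same stored name, so nothing reading the name back can tell which one was originally written.

```python
import unicodedata

# Two byte-wise different spellings of the same visible filename.
precomposed = "\u00e9.txt"   # U+00E9 (single precomposed code point)
decomposed = "e\u0301.txt"   # U+0065 followed by U+0301 (combining acute)


def store(name):
    """Hypothetical filesystem write that decomposes the name.

    Assumption: standard NFD stands in for HFS+'s own transformation.
    """
    return unicodedata.normalize("NFD", name)


stored_a = store(precomposed)
stored_b = store(decomposed)

# The inputs are distinct as byte strings...
distinct_before = precomposed.encode("utf-8") != decomposed.encode("utf-8")
# ...but after the transformation they are indistinguishable.
identical_after = stored_a == stored_b

print("distinct before storing:", distinct_before)
print("identical after storing:", identical_after)
```

A protocol translator sitting on the far side of such a transformation has only the stored name to work with, which is why the original bytes cannot be reconstructed from it.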