From: Robin Rosenberg
Subject: Re: git on MacOSX and files with decomposed utf-8 file names
Date: Fri, 18 Jan 2008 01:44:04 +0100
Message-ID: <200801180144.06253.robin.rosenberg.lists@dewire.com>
References: <478E1FED.5010801@web.de> <2010BC03-E5AE-4333-96CA-4A9B700AD720@sb.org>
In-Reply-To: <2010BC03-E5AE-4333-96CA-4A9B700AD720@sb.org>
To: Kevin Ballard
Cc: Johannes Schindelin, Wincent Colaiuta, Mitch Tishmack, git@vger.kernel.org

On Thursday 17 January 2008, Kevin Ballard wrote:
> On Jan 17, 2008, at 10:57 AM, Johannes Schindelin wrote:
>
> > On Thu, 17 Jan 2008, Kevin Ballard wrote:
> >
> >> On Jan 17, 2008, at 5:22 AM, Wincent Colaiuta wrote:
> >>
> >>> While it's a nice workaround, it really is just that (a workaround)
> >>> because performance will be suboptimal in a repository running on a
> >>> disk image (and many of switched to Git because of its speed).
> >>
> >> Not only is it suboptimal, it's also not acceptable, plain and
> >> simple.
> >
> > If it's not acceptable, do something about it (and I don't mean
> > writing 50 emails). If you don't want to do something about it, I
> > have to assume that you accept it as-is.
>
> I never said I don't want to do anything about it. However, I do
> believe that it will take a significant investment of time and energy
> to learn all the gooey details of how git handles filenames and how
> the index works and all that jazz, which is knowledge that other
> people already have. I believe that, for me to solve this problem
> independently, it may require so much time that it never gets done
> (after all, I am fairly busy). However, if other people who already
> have this knowledge are willing to help, that would make this task far
> easier, especially given that if nobody else even acknowledges that
> this is a problem I don't have much hope of getting a patch accepted.
>
> So again, I'm certainly going to try, but working by myself it simply
> may never get done.

(This is only for those who think the problem should be solved somehow.
The rest can move on; nothing to see here.)

You may look at
http://rosenberg.homelinux.net/cgi-bin/gitweb/gitweb.cgi?p=GIT.git;a=log;h=i18n
for inspiration. It's pretty obsolete by now and only a "proof of
concept", i.e. it shows that it can be done, not that it necessarily
should be done exactly this way.

Basically it intercepts the user's access to git, i.e.
certain commands and how files are named (since those names are part of
the user interface). It then assumes the internal encoding is UTF-8 (or
garbage) and converts to and from the user's local encoding. The
heuristic is based on the assumption that a string (even a random one)
that looks like UTF-8 is, with very high probability, actually UTF-8
encoded.

The test cases might be usable almost as is.

-- 
robin
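For readers following the thread, here is a minimal Python sketch of the conversion scheme described above. It is not code from the i18n branch. The function names looks_like_utf8, to_internal and to_local are hypothetical. It also assumes that names are handled as raw bytes, that the user's local encoding is passed in explicitly (for example "iso-8859-1"), and that "garbage" (bytes that are not valid UTF-8) is passed through unchanged rather than rejected.

```python
# Hypothetical sketch of the "intercept and convert" idea; not the
# original patch. Assumptions: names are raw bytes, the repository stores
# UTF-8 (or undecodable "garbage"), and the caller supplies the local
# encoding name explicitly.


def looks_like_utf8(raw: bytes) -> bool:
    """Heuristic: bytes that decode cleanly as UTF-8 are assumed to be
    UTF-8. Non-ASCII text in legacy 8-bit encodings rarely forms valid
    UTF-8 multi-byte sequences by accident. Pure ASCII also passes."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_internal(local_name: bytes, local_encoding: str) -> bytes:
    """User's encoding -> internal UTF-8 (e.g. when a path is added)."""
    try:
        return local_name.decode(local_encoding).encode("utf-8")
    except UnicodeDecodeError:
        # Not valid in the local encoding either: keep the raw bytes
        # ("garbage") instead of losing them.
        return local_name


def to_local(stored: bytes, local_encoding: str) -> bytes:
    """Internal bytes -> user's encoding (e.g. when listing or checking out)."""
    if not looks_like_utf8(stored):
        # Garbage: we cannot interpret it, so pass it through untouched.
        return stored
    try:
        return stored.decode("utf-8").encode(local_encoding)
    except UnicodeEncodeError:
        # The name contains characters the local encoding cannot represent.
        return stored


if __name__ == "__main__":
    latin1_name = "Räksmörgås".encode("iso-8859-1")
    internal = to_internal(latin1_name, "iso-8859-1")
    print(internal)  # b'R\xc3\xa4ksm\xc3\xb6rg\xc3\xa5s'
    print(to_local(internal, "iso-8859-1") == latin1_name)  # True
```

Passing undecodable names through unchanged keeps the scheme lossless, at the cost of showing such names raw to the user. Note that this sketch only covers encoding conversion; it does not address Unicode normalization (decomposed versus precomposed names), which is the separate problem raised in the subject line.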