git.vger.kernel.org archive mirror
From: Jan Hudec <bulb@ucw.cz>
To: Junio C Hamano <gitster@pobox.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	David Kastrup <dak@gnu.org>,
	git@vger.kernel.org
Subject: Re: Stupid quoting...
Date: Sun, 24 Jun 2007 08:50:08 +0200
Message-ID: <20070624065008.GA6979@efreet.light.src>
In-Reply-To: <7vd4zrw3k4.fsf@assigned-by-dhcp.pobox.com>

On Tue, Jun 19, 2007 at 23:19:39 -0700, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> >> I don't see our discourse leading anywhere: the points have been made.
> >
> > I would really, really, really like to see a solution. Alas, I cannot 
> > think of one, other than _forcing_ the developers to use ASCII-only 
> > filenames.
> >
> > Note that there is no convention yet in Git to state which encoding your 
> > filenames are supposed to use. And in fact, we already had a fine example 
> > in git.git why this is particularly difficult. MacOSX is too clever to be 
> > true, in that it gladly takes filenames in one encoding, but reads those 
> > filenames out in _another_ encoding. Thus, a "git add <filename>" can well 
> > end up in git-status saying that a file was deleted, and another file 
> > (actually the same, but in a different encoding) is untracked.

I saw the Bazaar folks discussing this MacOSX issue. Basically, in MacOSX
filenames are *Unicode* strings (just as they are in Windows, btw). Unicode,
for compatibility reasons, allows expressing many characters in multiple forms
-- composed and decomposed. For example 'á' can be expressed as '\u00e1'
('\xc3\xa1' in UTF-8) or as 'a\u0301' ('a\xcc\x81' in UTF-8).
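
To make the two forms concrete, here is a minimal sketch in Python (my choice
of language for illustration only; nothing in git works this way). It uses
only the standard unicodedata module and the two spellings of 'á' given above.

```python
import unicodedata

composed = "\u00e1"      # 'á' as a single precomposed code point
decomposed = "a\u0301"   # 'a' followed by COMBINING ACUTE ACCENT

# The UTF-8 byte sequences differ, as stated in the text above.
print(composed.encode("utf-8"))     # b'\xc3\xa1'
print(decomposed.encode("utf-8"))   # b'a\xcc\x81'

# Compared code point by code point, the strings are different...
print(composed == decomposed)       # False

# ...but they are canonically equivalent: normalizing to the composed
# form (NFC) or the decomposed form (NFD) makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```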

MacOSX opts, in accord with the Unicode standard, to treat such representations
as equal, and it does so by normalizing all filenames to one form. I don't
know whether it uses compatibility normalization, but I believe it uses the
decomposed form (which makes the issue immediately obvious, because most
programs work in composed form).
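
The "deleted plus untracked" symptom Johannes describes can be reproduced on
paper with a rough sketch like the one below. Everything in it is hypothetical:
status() is a made-up helper, not git code, and the assumption that the
filesystem hands names back in plain NFD only follows my belief above.

```python
import unicodedata

def status(tracked, on_disk, normalize=False):
    """Return (deleted, untracked) as sets of UTF-8 encoded names.

    With normalize=False names are compared byte for byte; with
    normalize=True both sides are first brought to NFC.
    """
    def key(name):
        if normalize:
            name = unicodedata.normalize("NFC", name)
        return name.encode("utf-8")

    tracked_keys = {key(n) for n in tracked}
    disk_keys = {key(n) for n in on_disk}
    return tracked_keys - disk_keys, disk_keys - tracked_keys

# Name as it was added (composed form).
tracked = ["\u00e1.txt"]
# Name as read back from the directory (assumed: decomposed form).
on_disk = [unicodedata.normalize("NFD", n) for n in tracked]

# Byte-wise comparison: one file "deleted", one "untracked".
print(status(tracked, on_disk))
# Comparing after normalizing both sides: nothing to report.
print(status(tracked, on_disk, normalize=True))
```

Normalizing both sides before comparing is only one possible approach; whether
a tool should do that at all is exactly what is under discussion here.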

> By the way, the pathname quoting done by "diff" does not even
> attempt to tackle that.  I already explained why in the thread
> so I would not repeat myself.
> 
> Having said that, the absolute minimum that needs to be quoted
> are double-quote (because it is used by quoting as agreed with
> GNU diff/patch maintainer), backslash (used to introduce C-like
> quoting), newline and horizontal tab (makes "patch" confused, as
> it would make it ambiguous where the pathname ends), so I am not
> opposed to a patch that introduces a new mode, probably on by
> default _unless_ we are generating --format=email, that does not
> quote high byte values.  That would solve "My UTF-8 filenames
> are unreadable on my terminal" problem.

IMHO it should be the default even for email format. Most projects that use
non-ASCII filenames probably have all members using the same locale, and for
such a group it will just work. Also, the file names, content and
commit messages will usually be in the same (though project-specific)
encoding, so if the charset in Content-Type is set to that, people with a
different locale that can represent the same characters will still see the
names correctly. For other people, the MUA will probably print some escape
anyway (it will not screw up the terminal -- it usually knows what it can
safely pass to it).
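
For illustration, the minimal quoting Junio lists above (double-quote,
backslash, newline and horizontal tab, with high bytes left alone) could look
roughly like this. It is a sketch only: quote_path_minimal() is a hypothetical
name, it works on Python strings rather than raw bytes, and it does not claim
to match what git's own quoting code does.

```python
# Characters that must be escaped, mapped to their C-like escapes.
ESCAPES = {
    '"': '\\"',    # double-quote delimits a quoted name
    '\\': '\\\\',  # backslash introduces an escape
    '\n': '\\n',   # newline would end the header line
    '\t': '\\t',   # tab makes the end of the pathname ambiguous
}

def quote_path_minimal(path):
    """Quote a path only if it contains one of the four characters above.

    Non-ASCII characters pass through unchanged, so a UTF-8 name stays
    readable on a UTF-8 terminal.
    """
    if not any(ch in ESCAPES for ch in path):
        return path
    return '"' + ''.join(ESCAPES.get(ch, ch) for ch in path) + '"'

print(quote_path_minimal("\u00e1.txt"))   # unchanged, no quoting needed
print(quote_path_minimal("a\tb.txt"))     # quoted, tab written as \t
```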

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

Thread overview: 35+ messages
2007-06-13 11:30 Stupid quoting David Kastrup
2007-06-13 12:06 ` Alex Riesen
2007-06-13 12:21 ` Johannes Schindelin
     [not found]   ` <86ejkgvxmb.fsf@lola.quinscape.zz>
2007-06-14  0:51     ` Johannes Schindelin
2007-06-14  6:12       ` David Kastrup
2007-06-14  7:06         ` Alex Riesen
     [not found]           ` <86hcpb6lr6.fsf@lola.quinscape.zz>
2007-06-14  8:51             ` Alex Riesen
2007-06-14  1:06   ` Steven Grimm
2007-06-14  1:12     ` Johannes Schindelin
2007-06-14  1:19       ` Steven Grimm
2007-06-14  1:34         ` Johannes Schindelin
2007-06-14  8:49   ` Junio C Hamano
2007-06-16 21:03 ` Jakub Narebski
2007-06-18  8:00   ` David Kastrup
2007-06-18 16:19     ` Jeff King
2007-06-19  1:00     ` Johannes Schindelin
2007-06-19  7:44       ` David Kastrup
2007-06-19  9:50         ` Johannes Schindelin
2007-06-19 20:53           ` Olivier Galibert
     [not found]           ` <86645kutow.fsf@lola.quinscape.zz>
2007-06-20  2:19             ` Johannes Schindelin
2007-06-20  6:19               ` Junio C Hamano
2007-06-20  7:49                 ` David Kastrup
2007-06-20  8:40                   ` Jakub Narebski
2007-06-20  8:59                     ` David Kastrup
2007-06-24  6:50                 ` Jan Hudec [this message]
2007-06-24 11:14                   ` Robin Rosenberg
2007-06-24 11:47                     ` Junio C Hamano
2007-06-24 11:58                       ` David Kastrup
2007-06-24 12:19                         ` Junio C Hamano
2007-06-24 12:41                           ` Jeff King
2007-06-24 16:25                     ` Jan Hudec
2007-06-24 19:39                       ` Robin Rosenberg
2007-06-24 19:47                         ` David Kastrup
2007-06-24 20:17                           ` Robin Rosenberg
2007-06-24 20:25                             ` David Kastrup
