From: B Smith-Mannschott <bsmith.occs@gmail.com>
To: Thomas Singer <thomas.singer@syntevo.com>
Cc: Daniel Barkalow <barkalow@iabervon.org>, git@vger.kernel.org
Subject: Re: OS X and umlauts in file names
Date: Wed, 25 Nov 2009 10:51:48 +0100
Message-ID: <28c656e20911250151v7b89e5f8pa15aa2bd7d5f96d@mail.gmail.com>
In-Reply-To: <4B0CEFCA.5020605@syntevo.com>
On Wed, Nov 25, 2009 at 09:50, Thomas Singer <thomas.singer@syntevo.com> wrote:
> I've did following:
>
> toms-mac-mini:git-umlauts tom$ ls
> Überlänge.txt
> toms-mac-mini:git-umlauts tom$ git status
> # On branch master
> #
> # Initial commit
> #
> # Changes to be committed:
> # (use "git rm --cached <file>..." to unstage)
> #
> # new file: "U\314\210berla\314\210nge.txt"
> #
> toms-mac-mini:git-umlauts tom$ git stage "U\314\210berla\314\210nge.txt"
> fatal: pathspec 'U\314\210berla\314\210nge.txt' did not match any files
>
> Note, that I copy-pasted the file name which 'git status' showed to the
> stage command. IMHO, this should work, especially, because different people
> said Git would treat the file name as byte-array without interpreting it in
> some kind.
>
> From the user with the German OS X (for which the staging is said to work),
> I've got the output of 'env' and hence also tried
>
> export LANG=de_DE.UTF-8
>
> before doing the above steps, but with the same results. :(
The problem you are having is not caused by the *encoding*; it is the
Unicode normalization form that is messing things up. Unicode has two
ways to represent many -- but not all -- accented characters:
- "composed": one code point for the accented character
- "decomposed": two or more code points: one for the base letter,
followed by one or more combining characters for the accents.
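To make the two forms concrete, here is a minimal sketch using Python's
standard unicodedata module (Python is my choice for illustration, it is
not something Git uses). The file name is written with \u escapes so that
the example does not depend on how the source file itself is normalized.

```python
import unicodedata

# "Überlänge.txt", spelled with escapes so the starting form is unambiguous.
name = "\u00dcberl\u00e4nge.txt"

composed = unicodedata.normalize("NFC", name)    # one code point per umlaut
decomposed = unicodedata.normalize("NFD", name)  # base letter + U+0308

# Both render identically, but they differ in length and in UTF-8 bytes.
print(len(composed), len(decomposed))
print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))
print(composed == decomposed)
```

The two strings look the same on screen yet compare unequal, which is
the whole problem in a nutshell.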
The composed code points exist mainly for backward compatibility with
legacy encodings (like Latin-1). If you want to actually support
(rather than just tolerate) Unicode, you have to know how to deal with
the decomposed form, and once you can do that there is little point,
beyond backward compatibility, in continuing to use the composed form
internally.
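The name in the quoted 'git status' output is in decomposed form: the
bytes shown as \314\210 are the UTF-8 encoding of U+0308 (COMBINING
DIAERESIS). A rough sketch to check this follows. The helper
unquote_octal is hypothetical and only handles the three-digit octal
escapes seen above, not the other escapes Git's path quoting may
produce.

```python
import re
import unicodedata

def unquote_octal(quoted: str) -> bytes:
    # Hypothetical helper: replace each 3-digit octal escape with one byte.
    # Everything else is assumed to be plain ASCII.
    out = bytearray()
    pos = 0
    for m in re.finditer(r"\\([0-3][0-7]{2})", quoted):
        out += quoted[pos:m.start()].encode("ascii")
        out.append(int(m.group(1), 8))
        pos = m.end()
    out += quoted[pos:].encode("ascii")
    return bytes(out)

# The path exactly as 'git status' printed it (raw string keeps backslashes).
shown = r"U\314\210berla\314\210nge.txt"

raw = unquote_octal(shown)
text = raw.decode("utf-8")

print(raw)
print(unicodedata.name(text[1]))  # the code point right after the "U"
print(unicodedata.normalize("NFC", text) == "\u00dcberl\u00e4nge.txt")
```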
The Subversion people have run into this same problem because they
made the same error of assuming that any given sequence of glyphs has
only one possible representation as Unicode code points and thus only
one representation as UTF-8 bytes. Dionisos has written up the
issues involved here:
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
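For illustration only, one way to avoid that assumption is to bring both
names to the same normalization form before comparing them. This is a
sketch of the idea, not what Git or Subversion actually does; the
function name same_name and the choice of NFC as the default form are
mine.

```python
import unicodedata

def same_name(a: str, b: str, form: str = "NFC") -> bool:
    # Compare two file names after normalizing both to the same form.
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

composed = "\u00dcberl\u00e4nge.txt"      # precomposed umlauts
decomposed = "U\u0308berla\u0308nge.txt"  # base letters + combining diaeresis

print(composed == decomposed)           # plain comparison
print(same_name(composed, decomposed))  # normalization-aware comparison
```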
// Ben