From: "Torsten Bögershausen" <tboegi@web.de>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: "Torsten Bögershausen" <tboegi@web.de>,
"Junio C Hamano" <gitster@pobox.com>,
git@vger.kernel.org
Subject: Re: [PATCH V4] git on Mac OS and precomposed unicode
Date: Sun, 24 Jun 2012 17:47:01 +0200
Message-ID: <4FE73675.2000901@web.de>
In-Reply-To: <CACsJy8AucS9ez=-zej=72dr+0AVvGFR_eZgQcabitXgmQTJOCA@mail.gmail.com>
On 22.01.12 11:03, Nguyen Thai Ngoc Duy wrote:
> 2012/1/22 Nguyen Thai Ngoc Duy <pclouds@gmail.com>:
>>>> In order to prevent that ever a file name in decomposed unicode is
>>>> entering the index, a "brute force" attempt is taken: all arguments into
>>>> git (argv[1]..argv[n]) are converted into precomposed unicode.
>
> Forgot one more thing. We have case-insensitive support in place
> already, we can hook precomposed form conversion there before
> comparing. In other words we just need to support
> {pre,de}composed-insensitive string compare.
>
>-- Duy
Yes, I like that idea.
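
A minimal sketch of what such a {pre,de}composed-insensitive compare could
look like, written in Python for brevity. The helper name same_name() is
hypothetical and is not part of git; it assumes that normalizing both
sides to NFC (precomposed) before comparing is acceptable.

```python
import unicodedata

def same_name(a: str, b: str) -> bool:
    """Compare two file names, ignoring pre-/decomposed differences."""
    # Hypothetical helper: bring both names to NFC (precomposed) first.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00fb.txt"   # "û" as a single code point
decomposed = "u\u0302.txt"   # "u" followed by COMBINING CIRCUMFLEX ACCENT

print(precomposed == decomposed)           # plain comparison
print(same_name(precomposed, decomposed))  # composition-insensitive comparison
```

The plain comparison treats the two spellings as different names, while
the normalizing comparison treats them as the same file.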
After doing some experiments with precomposed and decomposed file names,
the motivation for the fix, or rather the root cause, changed.
The Mac OS X file system on VFAT has a kind of schizophrenia:
- unicode file names are written to disk as precomposed,
and Linux/Windows see them as precomposed.
- readdir() always returns decomposed.
- open()/fopen(), stat() and lstat() work for both pre- and decomposed.
Therefore a repository on VFAT under Mac OS X looks as follows:
- file names on disk are precomposed
- file names in the index are decomposed
As long as only Mac OS uses that device, there is no problem.
When we now move the device, e.g. a USB stick using VFAT, to Linux,
the decomposed names in the index seem to be deleted and the
precomposed names on disk are untracked.
A complete disaster.
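
The mismatch can be reproduced without any file system at all. The
following Python sketch assumes that NFD is a good enough stand-in for
the decomposed form readdir() returns on Mac OS X; the status() function
and the file name are made up for illustration and only mimic a
byte-wise comparison between index and disk.

```python
import unicodedata

def status(index_names, disk_names):
    """Naive exact-match comparison, unaware of unicode normalization."""
    index_set, disk_set = set(index_names), set(disk_names)
    deleted = sorted(index_set - disk_set)     # in the index, not on disk
    untracked = sorted(disk_set - index_set)   # on disk, not in the index
    return deleted, untracked

# Same visible name, stored in two different forms.
on_disk = [unicodedata.normalize("NFC", "f\u00fbr.txt")]    # precomposed
in_index = [unicodedata.normalize("NFD", "f\u00fbr.txt")]   # decomposed

deleted, untracked = status(in_index, on_disk)
print([n.encode("utf-8") for n in deleted])
print([n.encode("utf-8") for n in untracked])
```

Both lists contain a name that looks identical on screen, but the UTF-8
bytes differ, so the one file shows up as deleted and untracked at once.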
To keep the file names in the index the same as they are stored
on disk, we can set "core.precomposedunicode" = true.
(Side note: should we rename it to "i18n.precomposedunicode"?)
Now the index and the file names on disk stay the same, and we can move the
USB stick between Linux, Mac OS X, or Windows (since msysGit-1.7.10,
now called Git for Windows, or git under cygwin 1.7).
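
The idea behind the setting can be sketched as converting every name
that comes back from readdir() to precomposed form before it is used.
The Python below is only an illustration under that assumption; the
function names are hypothetical and this is not the code from the patch.

```python
import os
import unicodedata

def precompose_names(names):
    """Return the given file names in precomposed (NFC) form."""
    return [unicodedata.normalize("NFC", name) for name in names]

def listdir_precomposed(path="."):
    """Hypothetical wrapper: list a directory, precomposing each name."""
    return precompose_names(os.listdir(path))

# What readdir() might hand back on Mac OS X: "û" as "u" + U+0302.
from_readdir = ["u\u0302.txt", "plain.txt"]
print(precompose_names(from_readdir))
```

Names that are already precomposed, or plain ASCII, pass through
unchanged, so the conversion is harmless on Linux and Windows.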
The same problem occurs when Mac OS mounts a network share from Linux
using SAMBA:
readdir() returns decomposed, creat() stores file names in precomposed
on the remote Linux machine.
If we put a file name with decomposed unicode on the Linux machine,
it will be listed as decomposed by readdir() on the Mac OS side.
Trying to access this file fails, because Mac OS tries to open
the precomposed version, and that does not exist.
Another thing:
There are some reasons to avoid decomposed unicode in Linux and Windows:
Many user space programs don't handle decomposed unicode very well.
When e.g. an "û" should be displayed, the output looks like "u^" in many
programs.
And if we need more motivations: decomposed unicode is hard to enter on
the keyboard.
Then I went back to my original problem
(versioning the "Documents" folder on my Linux $HOME under git
and access it from Windows and Mac OS using SAMBA, or cloning it
to a laptop...)
Knowing that
a) Mac OS X handles precomposed and decomposed the same in open()...
b) The user space programs on Mac OS handle precomposed just fine
c) Many user space programs don't display decomposed as it should be
d) It is hard to enter decomposed unicode at the keyboard
e) and therefore decomposed unicode is seldom used on Linux
f) Mac OS using SAMBA puts file names in precomposed unicode on the
remote side
do we have a motivation for pushing a solution that ignores
the unicode composition?
I'll send a V5 version with hopefully a better motivation.