From: "Torsten Bögershausen" <tboegi@web.de>
To: git@vger.kernel.org
Cc: tboegi@web.de
Subject: [PATCH][RFC] git on Mac OS and precomposed unicode
Date: Sat, 7 Jan 2012 20:59:18 +0100
Message-ID: <201201072059.19103.tboegi@web.de>
Purpose:
This patch is a suggestion to work around the unpleasantness
caused by Mac OS decomposing unicode filenames.
The suggested change:
a) is only used under Mac OS
b) can be switched off by a configuration variable
c) is optimized for ASCII-only filenames
d) will improve interoperability between Mac OS, Linux and Windows*
via git push/pull, USB sticks (technically speaking VFAT)
or network shares mounted using Samba.
* (Not all Windows versions support UTF-8 yet:
msysgit needs the unicode branch, Cygwin has supported UTF-8 since 1.7)
Runtime configuration:
A new configuration variable is added: "core.precomposedunicode".
This variable is only used on Mac OS.
If set to false, git behaves exactly like older versions of git.
When a new git version is installed and a repository does not have
"core.precomposedunicode" set, the new git stays backward compatible.
When core.precomposedunicode=true, all filenames are stored in precomposed
unicode in the index (technically speaking precomposed UTF-8),
even when readdir() under Mac OS returns them decomposed.
Implementation:
Two files are added to the "compat" directory, darwin.h and darwin.c.
They basically implement three new functions:
darwin_opendir(), darwin_readdir() and darwin_closedir().
Compile time configuration:
A new compiler option PRECOMPOSED_UNICODE is introduced in the Makefile,
so that the patch can be switched off completely at compile time.
No decomposed file names in a git repository:
To prevent a file name in decomposed unicode from ever entering the
index, a "brute force" approach is taken:
all arguments to git (technically argv[1]..argv[n]) are converted into
precomposed unicode.
This is done in git.c by calling argv_precompose() for all commands
except "git commit".
argv_precompose() is actually a #define, and it is only defined under
Mac OS. Nothing is converted on any other OS.
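The #define approach described above could look roughly like the
following sketch. It is not the code from the patch:
SKETCH_PRECOMPOSED_UNICODE, argv_precompose_sketch(),
sketch_precompose_argv() and sketch_main() are hypothetical names that
stand in for the PRECOMPOSED_UNICODE option and argv_precompose().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical build flag, standing in for the PRECOMPOSED_UNICODE
 * option that the patch adds to the Makefile.
 */
#ifdef SKETCH_PRECOMPOSED_UNICODE
/* A Mac OS build would provide this converter in a compat file. */
void sketch_precompose_argv(int argc, const char **argv);
#define argv_precompose_sketch(c, v) sketch_precompose_argv((c), (v))
#else
/* On every other OS the call expands to nothing. */
#define argv_precompose_sketch(c, v) ((void)0)
#endif

/* Hypothetical entry point: normalize the arguments once, up front. */
static int sketch_main(int argc, const char **argv)
{
	assert(argc >= 1 && argv != NULL);
	argv_precompose_sketch(argc, argv);
	/* ... command dispatch would follow here ... */
	return argc;
}
```

With the flag undefined, the call site compiles away and the arguments
are passed on untouched, which matches the "nothing is converted on any
other OS" behaviour.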
Implementation details:
The main work is done in darwin_readdir() and argv_precompose().
The conversion into precomposed unicode is done by using iconv,
where decomposed is denoted by "UTF-8-MAC" and precomposed is "UTF-8".
When the input is already precomposed, the string is returned
unchanged.
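A minimal sketch of such a conversion, including the ASCII-only fast
path, is shown below. The helper names (has_non_ascii(), copy_string(),
precompose_sketch()) are hypothetical and not taken from the patch. The
sketch assumes that the precomposed form is never longer than the input;
if that assumption does not hold, or if iconv does not know "UTF-8-MAC"
(as is the case outside Mac OS), it falls back to returning a plain copy.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if any byte has the high bit set, i.e. not pure ASCII. */
static int has_non_ascii(const char *s)
{
	assert(s != NULL);
	for (; *s; s++)
		if ((unsigned char)*s & 0x80)
			return 1;
	return 0;
}

/* Return a malloc'ed copy of s, or NULL when out of memory. */
static char *copy_string(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);
	if (p)
		memcpy(p, s, n);
	return p;
}

/*
 * Hypothetical helper: return a malloc'ed, precomposed copy of `in`.
 * The caller frees the result.
 */
static char *precompose_sketch(const char *in)
{
	iconv_t cd;
	size_t inleft, outleft;
	char *inp, *out, *outp;

	if (!has_non_ascii(in))
		return copy_string(in);	/* fast path, no iconv needed */

	cd = iconv_open("UTF-8", "UTF-8-MAC");
	if (cd == (iconv_t)-1)
		return copy_string(in);	/* encoding unknown on this OS */

	inleft = strlen(in);
	outleft = inleft;		/* assumption: output not longer */
	out = malloc(outleft + 1);
	if (!out) {
		iconv_close(cd);
		return NULL;
	}
	inp = (char *)in;		/* iconv() takes a non-const pointer */
	outp = out;
	if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
		free(out);
		out = copy_string(in);	/* conversion failed: keep input */
	} else {
		*outp = '\0';
	}
	iconv_close(cd);
	return out;
}
```

Opening a converter per call keeps the sketch short; the patch instead
keeps one iconv instance around.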
Thread safety:
Since there is no need for argv_precompose() to be thread-safe, one iconv
instance is created at the beginning and kept for all conversions.
readdir() is not thread-safe either, so darwin_opendir() calls
iconv_open() once and keeps the instance for all calls of darwin_readdir()
until darwin_closedir() is called.
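The lifetime of the iconv instance can be sketched as a small wrapper
around the directory handle. This is only an illustration under
assumptions, not the darwin.c code: struct sketch_dir and the sketch_*()
functions are hypothetical names, and the 1024-byte name buffer is an
arbitrary choice.

```c
#include <assert.h>
#include <dirent.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

struct sketch_dir {
	DIR *dirp;
	iconv_t cd;		/* (iconv_t)-1 if no converter exists */
	char name[1024];	/* holds the most recent converted name */
};

/* Open the directory and create the converter exactly once. */
static struct sketch_dir *sketch_opendir(const char *path)
{
	struct sketch_dir *d = malloc(sizeof(*d));
	if (!d)
		return NULL;
	d->dirp = opendir(path);
	if (!d->dirp) {
		free(d);
		return NULL;
	}
	d->cd = iconv_open("UTF-8", "UTF-8-MAC");
	return d;
}

/* Return the next entry name, precomposed when a converter exists. */
static const char *sketch_readdir(struct sketch_dir *d)
{
	struct dirent *e;
	char *in, *out;
	size_t inleft, outleft;

	assert(d != NULL);
	e = readdir(d->dirp);
	if (!e)
		return NULL;
	if (d->cd == (iconv_t)-1)
		return e->d_name;
	in = e->d_name;
	inleft = strlen(in);
	out = d->name;
	outleft = sizeof(d->name) - 1;
	iconv(d->cd, NULL, NULL, NULL, NULL);	/* reset converter state */
	if (iconv(d->cd, &in, &inleft, &out, &outleft) == (size_t)-1)
		return e->d_name;	/* on failure keep the raw name */
	*out = '\0';
	return d->name;
}

/* Release the converter together with the directory handle. */
static int sketch_closedir(struct sketch_dir *d)
{
	int ret;

	assert(d != NULL);
	ret = closedir(d->dirp);
	if (d->cd != (iconv_t)-1)
		iconv_close(d->cd);
	free(d);
	return ret;
}
```

Like readdir() itself, the returned pointer is only valid until the next
call on the same handle, so the wrapper is no less thread-safe than what
it wraps.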
Auto sensing:
When creating a new git repository with "git init" or "git clone", the
"core.precomposedunicode" will be set automatically to "true" or "false".
Typically core.precomposedunicode is "true" on HFS and VFAT.
It is even "true" for file systems on a Linux box mounted via Samba,
and "false" for drives on a Linux box mounted via NFS.
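One way such auto sensing could work is to create a file under a
precomposed name and check whether it can be reached under the
decomposed name. The sketch below assumes this probing strategy; it is
not confirmed to be what the patch does in builtin/init-db.c, and
probe_composition_sketch() and the probe file name are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if `dir` treats NFC and NFD names as the same file,
 * 0 if it keeps them apart, and -1 if the probe could not be run.
 */
static int probe_composition_sketch(const char *dir)
{
	static const char nfc[] = "\xc3\xa4";	/* U+00E4, precomposed */
	static const char nfd[] = "a\xcc\x88";	/* 'a' + U+0308, decomposed */
	char nfc_path[4096], nfd_path[4096];
	long pid = (long)getpid();
	int fd, n, same;

	assert(dir != NULL);
	n = snprintf(nfc_path, sizeof(nfc_path), "%s/.probe_%ld_%s",
		     dir, pid, nfc);
	if (n < 0 || (size_t)n >= sizeof(nfc_path))
		return -1;
	n = snprintf(nfd_path, sizeof(nfd_path), "%s/.probe_%ld_%s",
		     dir, pid, nfd);
	if (n < 0 || (size_t)n >= sizeof(nfd_path))
		return -1;

	/* Create the file under its precomposed name ... */
	fd = open(nfc_path, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	close(fd);

	/* ... and see whether the decomposed name reaches it too. */
	same = (access(nfd_path, F_OK) == 0);
	unlink(nfc_path);
	return same;
}
```

A result of 1 would correspond to setting core.precomposedunicode to
"true", and 0 to "false".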
New test case:
A new test, t3910-mac-os-precompose.sh, checks that a filename
can be reached in either precomposed or decomposed unicode (NFC or NFD).
 Documentation/config.txt     |   9 ++
 Makefile                     |   3 +
 builtin/init-db.c            |  22 +++++
 compat/darwin.c              | 200 ++++++++++++++++++++++++++++++++++++++++++
 compat/darwin.h              |  31 +++++++
 git-compat-util.h            |   8 ++
 git.c                        |   1 +
 t/t0050-filesystem.sh        |   1 +
 t/t3910-mac-os-precompose.sh | 104 ++++++++++++++++++++++
 9 files changed, 379 insertions(+), 0 deletions(-)
 create mode 100644 compat/darwin.c
 create mode 100644 compat/darwin.h
 create mode 100755 t/t3910-mac-os-precompose.sh
--
1.7.8.rc0.43.gb49a8