git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: tb <tboegi@web.de>
To: gitster@pobox.com, git@vger.kernel.org
Cc: tboegi@web.de
Subject: [RFC][PATCH v3] git on Mac OS and precomposed unicode
Date: Fri, 13 Jan 2012 22:53:00 +0100	[thread overview]
Message-ID: <201201132253.00799.tboegi@web.de> (raw)

Changes since V2:
- renamed compat/darwin.[ch] into compat/precomposed_utf8.[ch]
- builtin/init-db.c: use probe_utf8_pathname_composition()
- fix in reencode_string_iconv(): Don't call iconv_close()
- Allways convert all arguments, no special handling for "git commit"
Changes since last version:
- Improved testcase t/t3910-mac-os-precompose.sh:
  test "git commit -- pathspec" (Thanks Junio)
- Improved the converting of argv[] for "git commit"

===============
Purpose:
This patch is a suggestion to work around the unpleasenties
when Mac OS is decomposing unicode filenames.

The suggested change:
a) is only used under Mac OS
b) can be switched off by a configuration variable
c) is optimized to handle ASCII only filename
d) will improve the interwork between Mac OS, Linux and Windows*
   via git push/pull, using USB sticks (technically speaking VFAT)
   or mounted network shares using samba.

* (Not all Windows versions support UTF-8 yet:
   Msysgit needs the unicode branch, cygwin supports UTF-8 since 1.7)


Runtime configuration:
A new confguration variable is added: "core.precomposedunicode"
This variable is only used on Mac OS.
If set to false, git behaves exactly as older versions of git.
When a new git version is installed and there is a repository
where the configuration "core.precomposedunicode" is not present,
the new git is backward compatible.

When core.precomposedunicode=true, all filenames are stored in precomposed
unicode in the index (technically speaking precomposed UTF-8).
Even when readdir() under Mac OS returns filenames as decomposed.

Implementation:
Two files are added to the "compat" directory, darwin.h and darwin.c.
They implement basically 4 new functions:
precomposed_utf8_opendir(), precomposed_utf8_readdir(),
precomposed_utf8_closedir() argv_precompose()



Compile time configuration:
A new compiler option PRECOMPOSED_UNICODE is introduced in the Makefile,
so that the patch can be switched off completely at compile time.

No decomposed file names in a git repository:
In order to prevent that ever a file name in decomposed unicode is entering
the index, a "brute force" attempt is taken:
all arguments into git (technically argv[1]..argv[n]) are converted into
precomposed unicode.
This is done in git.c by calling argv_precompose().
This function is actually a #define, and it is only defined under Mac OS.
Nothing is converted on any other OS.

Implementation details:
The main work is done in precomposed_utf8_readdir() and argv_precompose().
The conversion into precomposed unicode is done by using iconv,
where decomposed is denoted by "UTF-8-MAC" and precomposed is "UTF-8".
When already precomposed unicode is precomposed, the string is returned
unchanged.

Thread save:
Since there is no need for argv_precompose()to be thread-save, one iconv
instance is created at the beginning and kept for all conversions.
Even readdir() is not thread-save, so that precomposed_utf8_opendir() will call
iconv_open() once and keep the instance for all calls of precomposed_utf8_readdir()
until precomposed_utf8_close() is called.

Auto sensing:
When creating a new git repository with "git init" or "git clone", the
"core.precomposedunicode" will be set automatically to "true" or "false".

Typically core.precomposedunicode is "true" on HFS and VFAT.
It is even true for file systems mounted via SAMBA onto a Linux box,
and "false" for drives mounted via NFS onto a Linux box.


New test case:
The new t3910-mac-os-precompose.sh is added to check if a filename
can be reached either in precomposed or decomposed unicode (NFC or NFD).



Torsten Bögershausen (1):
  git on Mac OS and precomposed unicode

 Documentation/config.txt     |    9 ++
 Makefile                     |    3 +
 builtin/init-db.c            |    2 +
 compat/precomposed_utf8.c    |  213 ++++++++++++++++++++++++++++++++++++++++++
 compat/precomposed_utf8.h    |   39 ++++++++
 git-compat-util.h            |    9 ++
 git.c                        |    1 +
 t/t0050-filesystem.sh        |    1 +
 t/t3910-mac-os-precompose.sh |  117 +++++++++++++++++++++++
 9 files changed, 394 insertions(+), 0 deletions(-)
 create mode 100644 compat/precomposed_utf8.c
 create mode 100644 compat/precomposed_utf8.h
 create mode 100755 t/t3910-mac-os-precompose.sh

-- 
1.7.8.rc0.43.gb49a8

             reply	other threads:[~2012-01-13 21:53 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-13 21:53 tb [this message]
2012-01-13 22:56 ` [RFC][PATCH v3] git on Mac OS and precomposed unicode Jonathan Nieder
  -- strict thread matches above, loose matches on Subject: below --
2012-01-13 21:53 tb

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201201132253.00799.tboegi@web.de \
    --to=tboegi@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).