From: Torsten Bögershausen
Subject: [PATCH][RFC] git on Mac OS and precomposed unicode
Date: Sat, 7 Jan 2012 20:59:22 +0100
Message-ID: <201201072059.23074.tboegi@web.de>
To: git@vger.kernel.org
Cc: tboegi@web.de

Allow git on Mac OS to store file names in the index in precomposed
unicode, while the file system uses decomposed unicode.

When a file called "LATIN CAPITAL LETTER A WITH DIAERESIS"
(in utf-8 encoded as 0xc3 0x84) is created,
the filesystem converts "precomposed unicode" into "decomposed unicode",
which means that readdir() will return 0x41 0xcc 0x88.

When core.precomposedunicode is true, git reverts the unicode
decomposition of filenames.

This is useful when pulling/pushing from repositories containing utf-8
encoded filenames using precomposed utf-8 (like Linux).

This feature is automatically switched on when "git init" is run
and the file system is doing UTF-8 decomposition.
(Which has been observed on HFS+, SMBFS and VFAT, but not on NFS)

It can be switched off by setting core.precomposedunicode=false

It is implemented by re-defining the opendir(), readdir() and closedir()
functions. File names are converted into precomposed UTF-8.

Signed-off-by: Torsten Bögershausen
---
 Documentation/config.txt     |    9 ++
 Makefile                     |    3 +
 builtin/init-db.c            |   22 +++++
 compat/darwin.c              |  200 ++++++++++++++++++++++++++++++++++++++++++
 compat/darwin.h              |   31 +++++++
 git-compat-util.h            |    8 ++
 git.c                        |    1 +
 t/t0050-filesystem.sh        |    1 +
 t/t3910-mac-os-precompose.sh |  104 ++++++++++++++++++++++
 9 files changed, 379 insertions(+), 0 deletions(-)
 create mode 100644 compat/darwin.c
 create mode 100644 compat/darwin.h
 create mode 100755 t/t3910-mac-os-precompose.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2959390..01b9465 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -175,6 +175,15 @@ The default is false, except linkgit:git-clone[1] or linkgit:git-init[1]
 will probe and set core.ignorecase true if appropriate when the repository
 is created.
 
+core.precomposedunicode::
+	This option is only used by the Mac OS implementation of git.
+	When core.precomposedunicode=true,
+	git reverts the unicode decomposition of filenames done by Mac OS.
+	This is useful when pulling/pushing from repositories containing utf-8
+	encoded filenames using precomposed unicode (like Linux).
+	When false, file names are handled fully transparently by git.
+	If in doubt, set core.precomposedunicode=false.
+
 core.trustctime::
 	If false, the ctime differences between the index and the
 	working tree are ignored; useful when the inode change time
diff --git a/Makefile b/Makefile
index b21d2f1..596900e 100644
--- a/Makefile
+++ b/Makefile
@@ -519,6 +519,7 @@ LIB_H += compat/bswap.h
 LIB_H += compat/cygwin.h
 LIB_H += compat/mingw.h
 LIB_H += compat/obstack.h
+LIB_H += compat/darwin.h
 LIB_H += compat/win32/pthread.h
 LIB_H += compat/win32/syslog.h
 LIB_H += compat/win32/poll.h
@@ -884,6 +885,8 @@ ifeq ($(uname_S),Darwin)
 	endif
 	NO_MEMMEM = YesPlease
 	USE_ST_TIMESPEC = YesPlease
+	COMPAT_OBJS += compat/darwin.o
+	BASIC_CFLAGS += -DPRECOMPOSED_UNICODE
 endif
 ifeq ($(uname_S),SunOS)
 	NEEDS_SOCKET = YesPlease
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0dacb8b..88c9de1 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -290,6 +290,28 @@ static int create_default_files(const char *template_path)
 		strcpy(path + len, "CoNfIg");
 		if (!access(path, F_OK))
 			git_config_set("core.ignorecase", "true");
+#if defined (PRECOMPOSED_UNICODE)
+		{
+			const static char *auml_nfc = "\xc3\xa4";
+			const static char *auml_nfd = "\x61\xcc\x88";
+			int output_fd;
+			path[len] = 0;
+			strcpy(path + len, auml_nfc);
+			output_fd = open(path, O_CREAT|O_EXCL|O_RDWR, 0600);
+			if (output_fd >= 0) {
+				close(output_fd);
+				path[len] = 0;
+				strcpy(path + len, auml_nfd);
+				if (0 == access(path, R_OK))
+					git_config_set("core.precomposedunicode", "true");
+				else
+					git_config_set("core.precomposedunicode", "false");
+				path[len] = 0;
+				strcpy(path + len, auml_nfc);
+				unlink(path);
+			}
+		}
+#endif
 	}
 
 	return reinit;
diff --git a/compat/darwin.c b/compat/darwin.c
new file mode 100644
index 0000000..15de7c2
--- /dev/null
+++ b/compat/darwin.c
@@ -0,0 +1,200 @@
+#define __DARWIN_C__
+
+#include
+#include
+#include
+#include
+
+#include "../cache.h"
+#include "../utf8.h"
+
+#include "darwin.h"
+
+static int mac_os_precomposed_unicode;
+const static char *repo_encoding = "UTF-8";
+const static char *path_encoding = "UTF-8-MAC";
+
+
+/* Code borrowed from utf8.c */
+#if defined(OLD_ICONV) || (defined(__sun__) && !defined(_XPG6))
+	typedef const char * iconv_ibp;
+#else
+	typedef char * iconv_ibp;
+#endif
+static char *reencode_string_iconv(const char *in, size_t insz, iconv_t conv)
+{
+	size_t outsz, outalloc;
+	char *out, *outpos;
+	iconv_ibp cp;
+
+	outsz = insz;
+	outalloc = outsz + 1; /* for terminating NUL */
+	out = xmalloc(outalloc);
+	outpos = out;
+	cp = (iconv_ibp)in;
+
+	while (1) {
+		size_t cnt = iconv(conv, &cp, &insz, &outpos, &outsz);
+
+		if (cnt == -1) {
+			size_t sofar;
+			if (errno != E2BIG) {
+				free(out);
+				iconv_close(conv);
+				return NULL;
+			}
+			/* insz has remaining number of bytes.
+			 * since we started outsz the same as insz,
+			 * it is likely that insz is not enough for
+			 * converting the rest.
+			 */
+			sofar = outpos - out;
+			outalloc = sofar + insz * 2 + 32;
+			out = xrealloc(out, outalloc);
+			outpos = out + sofar;
+			outsz = outalloc - sofar - 1;
+		}
+		else {
+			*outpos = '\0';
+			break;
+		}
+	}
+	return out;
+}
+
+static size_t
+has_utf8(const char *s, size_t maxlen, size_t *strlen_c)
+{
+	const uint8_t *utf8p = (const uint8_t*) s;
+	size_t strlen_chars = 0;
+	size_t ret = 0;
+
+	if ((!utf8p) || (!*utf8p))
+		return 0;
+
+	while((*utf8p) && maxlen) {
+		if (*utf8p & 0x80)
+			ret++;
+		strlen_chars++;
+		utf8p++;
+		maxlen--;
+	}
+	if (strlen_c)
+		*strlen_c = strlen_chars;
+
+	return ret;
+}
+
+static int
+precomposed_unicode_config(const char *var, const char *value, void *cb)
+{
+	if (!strcasecmp(var, "core.precomposedunicode")) {
+		mac_os_precomposed_unicode = git_config_bool(var, value);
+		return 0;
+	}
+	return 1;
+}
+
+void
+argv_precompose(int argc, const char **argv)
+{
+	int i = 0;
+	const char *oldarg;
+	char *newarg;
+	iconv_t ic_precompose;
+
+	if (!strcmp("commit", argv[0]))
+		return;
+
+	git_config(precomposed_unicode_config, NULL);
+	if (!mac_os_precomposed_unicode)
+		return;
+
+	ic_precompose = iconv_open(repo_encoding, path_encoding);
+	if (ic_precompose == (iconv_t) -1)
+		return;
+
+	while (i < argc) {
+		size_t namelen;
+		oldarg = argv[i];
+
+		if (has_utf8(oldarg, (size_t)-1, &namelen)) {
+			newarg = reencode_string_iconv(oldarg, namelen, ic_precompose);
+			if (newarg)
+				argv[i] = newarg;
+		}
+		i++;
+	}
+	iconv_close(ic_precompose);
+}
+
+
+DARWIN_DIR *
+darwin_opendir(const char *dirname)
+{
+	DARWIN_DIR *darwin_dir;
+	darwin_dir = malloc(sizeof(DARWIN_DIR));
+	if (!darwin_dir)
+		return NULL;
+
+	darwin_dir->dirp = opendir(dirname);
+	if (!darwin_dir->dirp) {
+		free(darwin_dir);
+		return NULL;
+	}
+	darwin_dir->ic_precompose = iconv_open(repo_encoding, path_encoding);
+	if (darwin_dir->ic_precompose == (iconv_t) -1) {
+		closedir(darwin_dir->dirp);
+		free(darwin_dir);
+		return NULL;
+	}
+
+	return darwin_dir;
+}
+
+struct dirent *
+darwin_readdir(DARWIN_DIR *darwin_dirp)
+{
+	struct dirent *res;
+	size_t namelen = 0;
+
+	res = readdir(darwin_dirp->dirp);
+	if (!res || !mac_os_precomposed_unicode || !has_utf8(res->d_name, (size_t)-1, &namelen))
+		return res;
+	else {
+		int olderrno = errno;
+		size_t outsz = sizeof(darwin_dirp->dirent_nfc.d_name) - 1; /* one for \0 */
+		char *outpos = darwin_dirp->dirent_nfc.d_name;
+		iconv_ibp cp;
+		size_t cnt;
+		size_t insz = namelen;
+		cp = (iconv_ibp)res->d_name;
+
+		/* Copy all data except the name */
+		memcpy(&darwin_dirp->dirent_nfc,
+		       res,
+		       sizeof(darwin_dirp->dirent_nfc)-sizeof(darwin_dirp->dirent_nfc.d_name));
+		errno = 0;
+
+		cnt = iconv(darwin_dirp->ic_precompose, &cp, &insz, &outpos, &outsz);
+		if (cnt < sizeof(darwin_dirp->dirent_nfc.d_name) -1) {
+			*outpos = 0;
+			errno = olderrno;
+			return &darwin_dirp->dirent_nfc;
+		}
+		errno = olderrno;
+		return res;
+	}
+}
+
+
+int
+darwin_closedir(DARWIN_DIR *darwin_dirp)
+{
+	int ret_value;
+	ret_value = closedir(darwin_dirp->dirp);
+	if (darwin_dirp->ic_precompose != (iconv_t)-1)
+		iconv_close(darwin_dirp->ic_precompose);
+	free(darwin_dirp);
+	return ret_value;
+}
diff --git a/compat/darwin.h b/compat/darwin.h
new file mode 100644
index 0000000..094f930
--- /dev/null
+++ b/compat/darwin.h
@@ -0,0 +1,31 @@
+#ifndef __DARWIN_H__
+#include
+#include
+#include
+#include
+
+
+typedef struct {
+	iconv_t ic_precompose;
+	DIR *dirp;
+	struct dirent dirent_nfc;
+} DARWIN_DIR;
+
+char *str_precompose(const char *in, iconv_t ic_precompose);
+
+void argv_precompose(int argc, const char **argv);
+
+DARWIN_DIR *darwin_opendir(const char *dirname);
+struct dirent *darwin_readdir(DARWIN_DIR *dirp);
+int darwin_closedir(DARWIN_DIR *dirp);
+
+#ifndef __DARWIN_C__
+#define opendir(n) darwin_opendir(n)
+#define readdir(d) darwin_readdir(d)
+#define closedir(d) darwin_closedir(d)
+#define DIR DARWIN_DIR
+
+#endif /* __DARWIN_C__ */
+
+#define __DARWIN_H__
+#endif /* __DARWIN_H__ */
diff --git a/git-compat-util.h b/git-compat-util.h
index 230e198..859dfcf 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -90,6 +90,14 @@
 #include
 #endif
 
+#if defined (PRECOMPOSED_UNICODE)
+#include "compat/darwin.h"
+#else
+#define str_precompose(in,i_nfd2nfc) (NULL)
+#define argv_precompose(c,v)
+
+#endif
+
 #include
 #include
 #include
diff --git a/git.c b/git.c
index 8e34903..6b2ffb7 100644
--- a/git.c
+++ b/git.c
@@ -298,6 +298,7 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 		    startup_info->have_repository) /* get_git_dir() may set up repo, avoid that */
 			trace_repo_setup(prefix);
 	}
+	argv_precompose(argc, argv);
 	commit_pager_choice();
 
 	if (!help && p->option & NEED_WORK_TREE)
diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 1542cf6..befe39e 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -126,6 +126,7 @@ test_expect_success "setup unicode normalization tests" '
 
   test_create_repo unicode &&
   cd unicode &&
+  git config core.precomposedunicode false &&
   touch "$aumlcdiar" &&
   git add "$aumlcdiar" &&
   git commit -m initial &&
diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh
new file mode 100755
index 0000000..d4763c5
--- /dev/null
+++ b/t/t3910-mac-os-precompose.sh
@@ -0,0 +1,104 @@
+#!/bin/sh
+#
+# Copyright (c) 2012 Torsten Bögershausen
+#
+
+test_description='utf-8 decomposed (nfd) converted to precomposed (nfc)'
+
+. ./test-lib.sh
+
+Adiarnfc=`printf '\303\204'`
+Odiarnfc=`printf '\303\226'`
+Adiarnfd=`printf 'A\314\210'`
+Odiarnfd=`printf 'O\314\210'`
+
+mkdir junk &&
+>junk/"$Adiarnfc" &&
+case "$(cd junk && echo *)" in
+	"$Adiarnfd")
+	test_nfd=1
+	;;
+	*)	;;
+esac
+rm -rf junk
+
+if test "$test_nfd"
+then
+	test_expect_success "detect if nfd needed" '
+		precomposedunicode=`git config --bool core.precomposedunicode` &&
+		test "$precomposedunicode" = true
+	'
+	test_expect_success "setup" '
+		>x &&
+		git add x &&
+		git commit -m "1st commit" &&
+		git rm x &&
+		git commit -m "rm x"
+	'
+	test_expect_success "setup case mac" '
+		git checkout -b mac_os
+	'
+	# This will test nfd2nfc in readdir()
+	test_expect_success "add file Adiarnfc" '
+		echo f.Adiarnfc >f.$Adiarnfc &&
+		git add f.$Adiarnfc &&
+		git commit -m "add f.$Adiarnfc"
+	'
+	# This will test nfd2nfc in git add()
+	test_expect_success "stage file d.Adiarnfd/f.Adiarnfd" '
+		mkdir d.$Adiarnfd &&
+		echo d.$Adiarnfd/f.$Adiarnfd >d.$Adiarnfd/f.$Adiarnfd &&
+		git stage d.$Adiarnfd/f.$Adiarnfd &&
+		git commit -m "add d.$Adiarnfd/f.$Adiarnfd"
+	'
+	test_expect_success "add link Adiarnfc" '
+		ln -s d.$Adiarnfd/f.$Adiarnfd l.$Adiarnfc &&
+		git add l.$Adiarnfc &&
+		git commit -m "add l.Adiarnfc"
+	'
+	# This will test git log
+	test_expect_success "git log f.Adiar" '
+		git log f.$Adiarnfc > f.Adiarnfc.log &&
+		git log f.$Adiarnfd > f.Adiarnfd.log &&
+		test -s f.Adiarnfc.log &&
+		test -s f.Adiarnfd.log &&
+		test_cmp f.Adiarnfc.log f.Adiarnfd.log &&
+		rm f.Adiarnfc.log f.Adiarnfd.log
+	'
+	# This will test git ls-files
+	test_expect_success "git lsfiles f.Adiar" '
+		git ls-files f.$Adiarnfc > f.Adiarnfc.log &&
+		git ls-files f.$Adiarnfd > f.Adiarnfd.log &&
+		test -s f.Adiarnfc.log &&
+		test -s f.Adiarnfd.log &&
+		test_cmp f.Adiarnfc.log f.Adiarnfd.log &&
+		rm f.Adiarnfc.log f.Adiarnfd.log
+	'
+	# This will test git mv
+	test_expect_success "git mv" '
+		git mv f.$Adiarnfd f.$Odiarnfc &&
+		git mv d.$Adiarnfd d.$Odiarnfc &&
+		git mv l.$Adiarnfd l.$Odiarnfc &&
+		git commit -m "mv Adiarnfd Odiarnfc"
+	'
+	# Files can be checked out as nfc
+	# And the link has been corrected from nfd to nfc
+	test_expect_success "git checkout nfc" '
+		rm f.$Odiarnfc &&
+		git checkout f.$Odiarnfc
+	'
+	# Make it possible to checkout files with their NFD names
+	test_expect_success "git checkout file nfd" '
+		rm -f f.* &&
+		git checkout f.$Odiarnfd
+	'
+	# Make it possible to checkout links with their NFD names
+	test_expect_success "git checkout link nfd" '
+		rm l.* &&
+		git checkout l.$Odiarnfd
+	'
+else
+	say "Skipping nfc/nfd tests"
+fi
+
+test_done
-- 
1.7.8.rc0.43.gb49a8
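
Note for readers who want to check the byte sequences quoted in the
commit message: the following is a minimal sketch and not part of the
patch. The helper name utf8_to_codepoints() is made up for illustration,
and it assumes well-formed UTF-8 input (no validation is done). It
decodes a string into code points, so that 0xc3 0x84 can be seen to be
the single code point U+00C4, while 0x41 0xcc 0x88 is U+0041 followed by
U+0308 (COMBINING DIAERESIS).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative helper, NOT part of the patch: decode a NUL-terminated,
 * well-formed UTF-8 string into code points. Writes at most max code
 * points to out and returns how many were written. Malformed input is
 * not detected.
 */
size_t utf8_to_codepoints(const char *s, uint32_t *out, size_t max)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t n = 0;

	while (*p && n < max) {
		uint32_t cp;
		int extra;

		if (*p < 0x80) {			/* 0xxxxxxx: ASCII */
			cp = *p;
			extra = 0;
		} else if ((*p & 0xe0) == 0xc0) {	/* 110xxxxx: 2 bytes */
			cp = *p & 0x1f;
			extra = 1;
		} else if ((*p & 0xf0) == 0xe0) {	/* 1110xxxx: 3 bytes */
			cp = *p & 0x0f;
			extra = 2;
		} else {				/* 11110xxx: 4 bytes */
			cp = *p & 0x07;
			extra = 3;
		}
		p++;
		/* Each continuation byte (10xxxxxx) adds 6 payload bits. */
		while (extra-- > 0 && (*p & 0xc0) == 0x80)
			cp = (cp << 6) | (*p++ & 0x3f);
		out[n++] = cp;
	}
	return n;
}
```

Calling it on "\xc3\x84" yields one code point and on "A\xcc\x88" yields
two, which is the difference a byte-wise comparison of the index entry
and the readdir() result trips over.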