From: Junio C Hamano <gitster@pobox.com>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH V4] git on Mac OS and precomposed unicode
Date: Sat, 21 Jan 2012 14:56:22 -0800
Message-ID: <7vsjj8acmh.fsf@alter.siamese.dyndns.org>
In-Reply-To: <201201212036.57632.tboegi@web.de> ("Torsten Bögershausen"'s message of "Sat, 21 Jan 2012 20:36:56 +0100")
[Pinging Nguyen who has worked rather extensively on the start-up sequence
for ideas.]

Torsten Bögershausen <tboegi@web.de> writes:

I'll try to reword the log message a bit below.

> When a file called "LATIN CAPITAL LETTER A WITH DIAERESIS" (in utf-8
> encoded as 0xc3 0x84) is created, the Mac OS filesystem converts
> "precomposed unicode" into "decomposed unicode". readdir() will return
> 0x41 0xcc 0x88 for such a file, which does not match what the caller
> thought it created.
>
> To work around this braindamage, allow git on Mac OS to optionally use a
> wrapper for readdir() that converts decomposed unicode back into the
> precomposed form, which most other platforms use natively. This makes it
> easier for Mac OS users to work together on the same project with people
> on other platforms (Note that not all Windows versions support UTF-8
> yet. Msysgit needs the unicode branch, cygwin supports UTF-8 since
> 1.7). This allows sharing git repositories stored on a VFAT file system
> (e.g. a USB stick), and mounted network share using samba.
>
> This new feature is controlled by setting a new configuration variable
> "core.precomposedunicode" to "true". Unless the variable is set to "true",
> Git on Mac OS behaves exactly as before, for backward compatibility.
>
> The code in compat/precomposed_utf8.c implements basically 4 new
> functions: precomposed_utf8_opendir(), precomposed_utf8_readdir(),
> precomposed_utf8_closedir() and precompose_argv().
>
> To prevent a file name in decomposed unicode from ever entering the
> index, a "brute force" approach is taken: all arguments into
> git (argv[1]..argv[n]) are converted into precomposed unicode. This is
> done in git.c by calling precompose_argv(). This function is actually a
> #define, and it is only defined under Mac OS. Nothing is converted on
> any other platform.

It may be just me, but the above looks more in line with the usual style
of writing in our existing log messages.

Is this UTF-8 decomposition only an issue on HFS+, or does it happen on
any filesystem mounted on a Mac OS box? If the former, then the second line
of the first paragraph needs further rephrasing, e.g. "... is created,
HFS+, the primary filesystem on the Mac OS, converts ...".
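
For reference, the mismatch described in the first paragraph of the log
message can be seen at the byte level without a Mac. What follows is only
a minimal sketch and not code from the patch; the names precomposed,
decomposed and same_bytes() are made up for illustration.

```c
#include <string.h>

/* U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS in precomposed form */
static const char precomposed[] = "\xc3\x84";

/* 'A' (U+0041) followed by U+0308 COMBINING DIAERESIS, decomposed form */
static const char decomposed[] = "\x41\xcc\x88";

/*
 * Hypothetical helper: returns 1 when two names are byte-for-byte
 * identical, which is all a strcmp()-style path comparison can see.
 */
static int same_bytes(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

Calling same_bytes(precomposed, decomposed) returns 0 even though both
strings render as the same character, which is why a name that comes back
from readdir() in decomposed form no longer matches the name the caller
used.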
> Auto sensing:
> When creating a new git repository with "git init" or "git clone",
> "core.precomposedunicode" will be set to "false".
>
> The user needs to activate this feature manually.
> She typically sets core.precomposedunicode to "true" on HFS and VFAT,
> or file systems mounted via SAMBA onto a Linux box.

I am not sure about this design decision.

I agree that it is prudent to introduce a new feature disabled by default,
and I can understand that you tried to make the feature more discoverable
by setting it explicitly to "false".

But I do not think it is a good idea. If a user is on Mac OS and has only
HFS+, then it would be more convenient to have the configuration set to
true in $HOME/.gitconfig once and for all, to affect all repositories on
the box. "git init" writing an explicit "false" into every new repository
defeats that, because the repository-level setting overrides the one in
$HOME/.gitconfig.

Wouldn't it make more sense if your "git init" did it this way?

 * Do not do anything, if you know core.precomposedunicode is already
   set (in /etc/gitconfig or $HOME/.gitconfig);

 * Otherwise, if the "probe" says "yes, we are on HFS+", issue an
   advice message suggesting that the user set it either in the
   repository specific .git/config or in the $HOME/.gitconfig file.
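
The two rules above could be sketched roughly as follows. This is an
illustration under stated assumptions, not code from the patch: the names
probe_decomposes(), decide_init_action() and the enum are hypothetical,
and the probe simply creates a file under a precomposed name and checks
whether the decomposed spelling reaches the same file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical outcome of the check done at "git init" time. */
enum init_action {
	INIT_DO_NOTHING,	/* stay silent, leave the configuration alone */
	INIT_ADVISE		/* suggest setting core.precomposedunicode */
};

/*
 * Hypothetical probe: returns 1 if the filesystem holding "dir" treats
 * the precomposed and decomposed spellings as the same file, 0 if it
 * does not, and -1 if the probe file could not be created.
 */
static int probe_decomposes(const char *dir)
{
	char pre[4096], dec[4096];
	int fd, n, found;

	n = snprintf(pre, sizeof(pre), "%s/probe_\xc3\x84", dir);
	if (n < 0 || n >= (int)sizeof(pre))
		return -1;
	n = snprintf(dec, sizeof(dec), "%s/probe_\x41\xcc\x88", dir);
	if (n < 0 || n >= (int)sizeof(dec))
		return -1;

	fd = open(pre, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	close(fd);

	/* Does the decomposed name find the file we just created? */
	found = (access(dec, F_OK) == 0);
	unlink(pre);
	return found;
}

/* The two rules: an existing setting wins; otherwise advise if probed. */
static enum init_action decide_init_action(int already_configured,
					   int probe_result)
{
	if (already_configured)
		return INIT_DO_NOTHING;
	if (probe_result == 1)
		return INIT_ADVISE;
	return INIT_DO_NOTHING;
}
```

The point of keeping the decision separate from the probe is that "git
init" never writes the variable itself; it only tells the user where the
setting could go.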
> +core.precomposedunicode::
> + This option is only used by Mac OS implementation of git.
> + When core.precomposedunicode=true,
> + git reverts the unicode decomposition of filenames done by Mac OS.
> + This is useful when pulling/pushing from repositories containing utf-8
> + encoded filenames using precomposed unicode (like Linux).

I would imagine that if the caller of creat(2) named the path in the
decomposed form, Mac OS would store it unaltered; strictly speaking, we
shouldn't say "reverts". How about:

    When set to true, pathnames in decomposed UTF-8 read from the
    filesystem are converted to precomposed UTF-8 before they are used by
    Git, to improve interoperability with other platforms.

> +void precompose_argv(int argc, const char **argv)
> +{
> + int i = 0;
> + const char *oldarg;
> + char *newarg;
> + iconv_t ic_precompose;
> +
> + git_config(precomposed_unicode_config, NULL);

I still doubt that it is safe to call git_config() this early, as the
first thing done after main() starts (pinging Nguyen, who has worked
rather extensively on the start-up sequence, for ideas). Because this is
ifdefed away, it will not break things on other platforms, but that also
means any breakage will show up only on Mac OS, which may make it even
harder to diagnose.