From: Jeff King <peff@peff.net>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] t3910: show failure of core.precomposeunicode with decomposed filenames
Date: Mon, 5 May 2014 17:46:58 -0400
Message-ID: <20140505214658.GA16971@sigill.intra.peff.net>
In-Reply-To: <5365DA7B.6050000@web.de>
On Sun, May 04, 2014 at 08:13:15AM +0200, Torsten Bögershausen wrote:
> > 1. Tell everyone that NFD in the git repo is wrong, and
> > they should make a new commit to normalize all their
> > in-repo files to be precomposed.
> > This is probably not the right thing to do, because it
> > still doesn't fix checkouts of old history. And it
> > spreads the problem to people on byte-preserving
> > filesystems (like ext4), because now they have to start
> > precomposing their filenames as they are added to git.
> I'm not sure if I follow. People running ext4 (or Linux in general,
> or Windows, or Unix) do not suffer from the file system
> "feature" of Mac OS, which accepts precomposed/decomposed Unicode
> but returns decomposed.
What I mean by "spreads the problem" is that git on Linux does not need
to care about utf8 at all. It treats filenames as a byte sequence. But
if we were to start enforcing "filenames should be precomposed utf8",
then people adding files on Linux would want to enforce that, too.
People on Linux could ignore the issue as they do now, but they would
then create problems for OS X users if they add decomposed filenames.
IOW, if the OS X code assumes "all repo filenames are precomposed", then
other systems become a possible vector for violating that assumption.
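To make the byte-sequence point concrete, here is a minimal sketch in Python (not git code). The filename "café.txt" is just an illustrative assumption, and standard NFC/NFD stand in for whatever exact forms a given filesystem produces:

```python
import unicodedata

# Precomposed (NFC): "é" is the single code point U+00E9.
# Decomposed (NFD): "e" followed by the combining acute accent U+0301.
nfc_name = unicodedata.normalize("NFC", "caf\u00e9.txt")
nfd_name = unicodedata.normalize("NFD", "caf\u00e9.txt")

nfc_bytes = nfc_name.encode("utf-8")
nfd_bytes = nfd_name.encode("utf-8")

# A byte-preserving comparison sees two different filenames.
same_bytes = nfc_bytes == nfd_bytes

# Only after normalizing both sides do they compare equal.
same_after_nfc = unicodedata.normalize("NFC", nfd_name) == nfc_name

print(nfc_bytes, nfd_bytes, same_bytes, same_after_nfc)
```

A system that compares raw bytes can hold both names at once without noticing, which is how it could introduce a decomposed name into a repository that OS X assumes is fully precomposed.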
> > 3. Convert index filenames to their precomposed form when
> > we read the index from disk. This would be efficient,
> > but we would have to be careful not to write the
> > precomposed forms back out to disk.
> Question:
> How could we be careful?
> Mac OS always writes decomposed Unicode to disk.
> (And all other OSes tend to use precomposed forms, mainly because the "keyboard
> driver" generates it.)
Sorry, I should have been more clear here. I meant "do not write index
entries using the precomposed forms out to the on-disk index". Because
that would mean that git silently converts your filenames, and it would
look like you have changes to commit whenever you read in a tree with a
decomposed name.
Looking over the patch you sent earlier, I suspect that is part of its
problem (it stores the converted name in the index entry's name field).
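A rough sketch of what I mean by keeping the two forms apart, in Python rather than C. The `IndexEntry` class and its `match_key` field are hypothetical stand-ins, not git's actual index structures:

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    # Hypothetical stand-in for an index entry; not git's real struct.
    name: bytes                           # bytes exactly as read from disk
    match_key: bytes = field(init=False)  # precomposed form, in memory only

    def __post_init__(self):
        text = self.name.decode("utf-8")
        self.match_key = unicodedata.normalize("NFC", text).encode("utf-8")


def read_index(raw_names):
    """Build in-memory entries; each one gets a precomposed match key."""
    return [IndexEntry(n) for n in raw_names]


def write_index(entries):
    """Serialize only the original names; the match key is never written."""
    return [e.name for e in entries]


on_disk = ["cafe\u0301.txt".encode("utf-8")]  # decomposed name in the index
entries = read_index(on_disk)
round_trip = write_index(entries)
```

Comparisons against readdir results would use `match_key`, while `name` round-trips unchanged, so reading a tree with a decomposed name would not by itself look like a change to commit.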
> This is my understanding:
> Some possible fixes are:
>
> 1. Accept that NFD in a Git repo which is shared between Mac OS
> and Linux or Windows is problematic.
> Whenever core.precomposeunicode = true, do the following:
> Let Git under Mac OS change all file names in the index
> into the precomposed form when a new commit is done.
> This is probably not a wrong thing to do.
>
> When the index file is read into memory, precompose the file names and compare
> them with the precomposed form coming from precompose_utf8_readdir().
> This avoids decomposed file names being reported as untracked by "git status".
This is the case I was specifically thinking of above (and I think what
your patch is doing).
> 2. Do all index filename comparisons under Mac OS X using a UTF-8 aware
> comparison function regardless of whether core.precomposeunicode is set.
> This would probably have bad performance, and somewhat
> defeats the point of converting the filenames at the
> readdir level in the first place.
Right, I'm concerned about performance here, but I wonder if we can
reuse the name-hash solutions from ignorecase.
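Roughly the shape I have in mind, sketched in Python. This is only loosely modeled on the ignorecase idea: the `NameHash` class and `nfc_key` helper are hypothetical, and I have not checked how well this maps onto the real name-hash code:

```python
import unicodedata


def nfc_key(name: bytes) -> bytes:
    """Return the precomposed (NFC) form of a UTF-8 filename."""
    return unicodedata.normalize("NFC", name.decode("utf-8")).encode("utf-8")


class NameHash:
    """Hypothetical lookup table keyed on the normalized name."""

    def __init__(self, index_names):
        self._by_key = {}
        for name in index_names:
            # Normalize each index name once, when the table is built.
            self._by_key.setdefault(nfc_key(name), name)

    def find(self, fs_name: bytes):
        """Return the index name matching fs_name, or None if untracked."""
        return self._by_key.get(nfc_key(fs_name))


index_names = ["cafe\u0301.txt".encode("utf-8"), b"README"]
table = NameHash(index_names)

hit = table.find("caf\u00e9.txt".encode("utf-8"))  # precomposed, from readdir
miss = table.find(b"untracked.txt")
```

The cost of normalizing is paid once per index entry when the table is built and once per lookup, instead of on every pairwise comparison, and the names stored in the index are left untouched.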
-Peff