From: "Torsten Bögershausen" <tboegi@web.de>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH] t3910: show failure of core.precomposeunicode with decomposed filenames
Date: Tue, 29 Apr 2014 09:39:53 +0200
Message-ID: <535F5749.1060500@web.de>
In-Reply-To: <20140429032347.GB11979@sigill.intra.peff.net>

On 04/29/2014 05:23 AM, Jeff King wrote:
> On Mon, Apr 28, 2014 at 10:49:30PM +0200, Torsten Bögershausen wrote:
>
>> OK, thanks for the description.
>> In theory we can make Git "composition ignoring" by changing
>> index_file_exists() in name-hash.c.
>> (Both names must first be precomposed and then compared.)
> Yeah, we could perhaps get away without storing the extra precomposed
> form if we just stored the entries under their precomposed hash, and
> then taught same_name to use a slower precompose-aware comparison. But I
> also see that we do binary searches in index_name_pos (called by
> index_name_is_other). I don't know if we'd have to deal with this
> problem there, too.
Just thinking out loud:
We precompose whenever we read file names from disk; that's done in readdir().
We precompose whenever we get an argv into Git; that's done in precompose_argv().
We would also need to precompose every time we read file names from the
index file on disk (?) into memory.
That we don't do today, and my suggestion to hack name-hash.c isn't a
good one.

We probably need to change read-cache.c, and maybe more places; I'm not
an expert on all of Git's internal details.
Basically, every place where strings containing file names are put into
memory is affected, and I wouldn't be too concerned about the CPU cycles.

>> I don't know how many people are using a Git older than 1.7.12 (the
>> first version supporting precomposed unicode).
>>
>> Could we simply ask them to upgrade?
> I'm not sure what you mean here. Upgrading won't help, because the
> values are baked into existing history created with the old versions
> forever. Any time I "git checkout v1.0" on the broken project, a modern
> git will complain about the ghost untracked file.
It depends on whether all file names in a given repo are stored decomposed
(in which case everybody can set core.precomposeunicode to false),
or whether there is a mixture of precomposed and decomposed names
in different commits/directories.
In the latter case we can normalize the master branch.
For older commits, users need to wait for an updated Git version;
until then, they need to live with the ghosts as well as they can.

>
>> The next problem is that people need to agree on whether the repo should store
>> names in pre- or decomposed form.
>> (My vote is for precomposed.)
>> Unfortunately, core.precomposeunicode is repo-local, so everybody
>> needs to "agree globally" and "configure locally".
> Yeah, that was sort of my "point 1" from the patch. I'm a bit worried
> that it creates problems for people on other systems, though. Linux
> people do not need to care about precomposed/decomposed at all, and any
> attempt we make to automatically handle it is going to run afoul of
> non-utf8 encodings. Not to mention that it does not solve the "git
> checkout v1.0" problem above.
I'm not sure what you mean by non-UTF-8 encodings.
Mac OS X can only handle Unicode file names,
and a single ISO-8859-1 byte will be returned as "%XY" from readdir().
So if you want to share a repo with Mac OS X (and/or Windows),
Unicode should be used.
Are you saying that there is a Linux machine feeding in file names in,
e.g., ISO-8859-1 or EUC?
My experience is that you cannot do that (in a useful way).


>> Side note:
>> I wish we had this config variable travelling with the repo, like .gitattributes does
>> for the CRLF/LF handling of text files.
> Yeah, I guess if we wanted to enforce it everywhere, you would have to
> use .gitattributes since we cannot safely turn on such a feature by
> default.
>
>> I don't know how many reports you have; reading all this, it feels as if the affected users
>> could "normalize" their repos and run "git config core.precomposeunicode true", followed
>> by "git config --global core.precomposeunicode true".
>> Does that sound like a possible way forward?
> I have just a handful of reports. Maybe 3-4? I didn't dig them all up
> today, as it would be a non-trivial effort. But I have no idea how good
> a sampling that is.
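Maybe the following could help to spot decomposed names (it needs an
iconv that knows UTF-8-MAC, such as the one on Mac OS X):
# list the names unquoted and convert them to precomposed form
git -c core.quotepath=false ls-files | iconv -f UTF-8-MAC -t UTF-8 >expected
# list the names exactly as they are stored in the index
git -c core.quotepath=false ls-files >actual
# lines that differ are names stored in decomposed form
diff expected actual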
>
> -Peff

Thread overview: 19+ messages
2014-04-28 16:16 [PATCH] t3910: show failure of core.precomposeunicode with decomposed filenames Jeff King
2014-04-28 19:17 ` Junio C Hamano
2014-04-28 19:35   ` Jeff King
2014-04-28 19:52     ` Torsten Bögershausen
2014-04-28 20:03       ` Jeff King
2014-04-28 20:49         ` Torsten Bögershausen
2014-04-29  3:23           ` Jeff King
2014-04-29  7:39             ` Torsten Bögershausen [this message]
2014-04-29  3:15     ` Jeff King
2014-04-29 17:12 ` Junio C Hamano
2014-04-29 18:02   ` Jeff King
2014-04-29 18:49     ` Junio C Hamano
2014-04-29 19:46       ` Jeff King
2014-04-30 14:57     ` Torsten Bögershausen
2014-05-04 12:04       ` Torsten Bögershausen
2014-05-04  6:13 ` Torsten Bögershausen
2014-05-05 21:46   ` Jeff King
2014-05-06 10:11     ` Erik Faye-Lund
2014-05-07 19:16     ` Torsten Bögershausen
