From: "Torsten Bögershausen" <tboegi@web.de>
To: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] t3910: show failure of core.precomposeunicode with decomposed filenames
Date: Mon, 28 Apr 2014 21:52:07 +0200
Message-ID: <535EB167.4030804@web.de>
In-Reply-To: <20140428193502.GD25993@sigill.intra.peff.net>


On 28.04.14 21:35, Jeff King wrote:
> On Mon, Apr 28, 2014 at 12:17:28PM -0700, Junio C Hamano wrote:
>
>>>   3. Convert index filenames to their precomposed form when
>>>      we read the index from disk. This would be efficient,
>>>      but we would have to be careful not to write the
>>>      precomposed forms back out to disk.
>> I think this may be the right approach, especially if you are going
>> to do this only when core.precomposeunicode is set.
>>
>> The reasoning behind the "we would have to be careful not to write"
>> part is unclear to me, though.  Don't decomposing filesystems
>> perform the mangling from the precomposed form without even being
>> asked to do so, just like a case insensitive filesystem will
>> overwrite an existing "makefile" on a request to write to
>> "Makefile"?
> Sorry, I meant "do not write the precomposed forms back out to the
> on-disk index". And by extension, do not update cache-tree and write
> them out to git trees.
>
> IOW, it is not enough to just set cache_entry->name to the normalized
> form. You'd need to store both.
>
> Since such entries are in the minority, and because cache_entry is
> already a variable-length struct, I think you could get away with
> sticking it after the "name" field, and then comparing like:
>
>   const char *ce_normalized_name(struct cache_entry *ce, size_t *len)
>   {
> 	const char *ret;
>
> 	/* Normal, fast path */
> 	if (!(ce->ce_flags & CE_NORMALIZED_NAME)) {
> 		*len = ce_namelen(ce);
> 		return ce->name;
> 	}
>
> 	/* Slow path for normalized names */
> 	ret = ce->name + ce_namelen(ce) + 1;
> 	*len = strlen(ret);
> 	return ret;
>   }
>
> The strlen is probably OK since such paths are presumably in the
> minority (even for UTF-8 paths, we can avoid storing the extra copy if
> they do not need any normalization). Or we could get fancy and encode
> the length in front, but I am not sure it is worth the complexity.
>
> Anyway, the tricky part is then making sure that all cache_entry name
> comparisons use ce_normalized_name instead of ce->name.
>
> -Peff
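
The "store both names" layout described above can be sketched in a self-contained way. This is only an illustration under assumptions: `struct entry`, `ENTRY_NORMALIZED_NAME`, `entry_new()` and `entry_normalized_name()` are hypothetical names, not Git's actual `cache_entry` definitions. The normalized copy sits after the terminating NUL of the on-disk name, and is present only when the flag is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag and struct for illustration; not Git's real cache_entry. */
#define ENTRY_NORMALIZED_NAME 0x1u

struct entry {
	unsigned int flags;
	size_t namelen;	/* length of the on-disk name */
	char name[];	/* on-disk name, NUL, then optional normalized name, NUL */
};

/* Allocate an entry; pass normalized == NULL when no normalization is needed. */
static struct entry *entry_new(const char *name, const char *normalized)
{
	size_t namelen = strlen(name);
	size_t normsize = normalized ? strlen(normalized) + 1 : 0;
	struct entry *e = malloc(sizeof(*e) + namelen + 1 + normsize);

	if (!e)
		return NULL;
	e->flags = normalized ? ENTRY_NORMALIZED_NAME : 0;
	e->namelen = namelen;
	memcpy(e->name, name, namelen + 1);
	if (normalized)
		memcpy(e->name + namelen + 1, normalized, normsize);
	return e;
}

/* Name to use for comparisons; e->name stays the form that is written out. */
static const char *entry_normalized_name(const struct entry *e, size_t *len)
{
	const char *ret;

	assert(e && len);

	/* Fast path: the stored name needs no normalization. */
	if (!(e->flags & ENTRY_NORMALIZED_NAME)) {
		*len = e->namelen;
		return e->name;
	}

	/* Slow path: the normalized copy follows the name's terminating NUL. */
	ret = e->name + e->namelen + 1;
	*len = strlen(ret);
	return ret;
}
```

With this layout, an entry created with `entry_new("a\xcc\x88.txt", "\xc3\xa4.txt")` keeps the decomposed bytes in `name` for writing, while comparisons go through `entry_normalized_name()`.
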
To my knowledge, repos with decomposed Unicode should be rare in practice.
I can only speak for European (or Latin-based) and Cyrillic languages myself:

- It is difficult (but not impossible) to enter decomposed unicode on the keyboard.
- Some programs under Mac OS X do not handle decomposed code points well,
  an "ä" may be displayed as "¨a" for example.
- Pushing and pulling to Windows or Linux is possible, but the same problems apply there:
  the keyboard is not set up to enter the decomposed form, and the display may be wrong.
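
To make the two forms concrete, here is a small sketch of "ä" in both encodings. The byte values are the standard UTF-8 encodings of U+00E4 and of "a" followed by U+0308; the identifiers `a_umlaut_nfc`, `a_umlaut_nfd` and `same_bytes()` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "ä" precomposed (NFC): U+00E4, UTF-8 bytes C3 A4. */
static const char a_umlaut_nfc[] = "\xc3\xa4";

/* "ä" decomposed (NFD): 'a' + U+0308 COMBINING DIAERESIS, UTF-8 bytes 61 CC 88. */
static const char a_umlaut_nfd[] = "a\xcc\x88";

/* Byte-wise equality, which is what a plain strcmp()-style name comparison sees. */
static int same_bytes(const char *a, const char *b)
{
	assert(a && b);
	return strcmp(a, b) == 0;
}
```

The two strings render as the same character, but they differ in length (2 bytes versus 3) and `same_bytes(a_umlaut_nfc, a_umlaut_nfd)` is false, so a byte-wise comparison treats them as different filenames.
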

The only possible use case for decomposed Unicode I am aware of is when you use git-bzr,
because bzr does not do the precomposition (and neither does hg, to my knowledge).

So for me the test case could make sense, even if I think that nobody (TM) uses an old Git version
under Mac OS X which is not able to handle precomposed Unicode.

Unless I have missed something.

