Git development
From: "H. Peter Anvin" <hpa@zytor.com>
To: Mark Junker <mjscod@web.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] Use FIX_UTF8_MAC to enable conversion from UTF8-MAC to UTF8
Date: Mon, 21 Jan 2008 20:08:46 -0800
Message-ID: <47956C4E.1080903@zytor.com>
In-Reply-To: <fn1sk4$uh4$1@ger.gmane.org>

Mark Junker wrote:
> Junio C Hamano schrieb:
> 
>> I do not know how Macintosh libc implements "struct dirent", but
>> this approach does not work in general.
> 
> IMHO there is no need that this approach works in general because this 
> is a fix for MacOSX systems only. I also use d_namlen which might not be 
> available on other systems. But on MacOSX this works as expected.
> 
>> yet you can obtain a path component longer than 256 bytes.
>> Apparently the library allocates longer d_name[] field than what
>> is shown to the user.
> 
> This is not a problem either because on MacOSX we get decomposed UTF8 
> and we always convert to composed UTF8. This means that the string 
> returned from reencode_string will always be smaller than the original 
> filename that had to be reencoded.
> 

That's not true!  There are strings which get longer when a composing 
normalization is applied.  Please see section 3.3 of Unicode Technical 
Report 36:

	http://www.unicode.org/reports/tr36/

 > People assume that NFC always composes, and thus is the same or
 > shorter length than the original source. However, some characters
 > decompose in NFC.

(NFC = Normalization Form C, i.e. canonical composition.)

U+1D160 MUSICAL SYMBOL EIGHTH NOTE is given as an example with a 3x 
expansion factor when encoded in UTF-8 (I don't know what it expands to; 
seems odd to me.)
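
The expansion is easy to check. Below is a minimal sketch in Python, 
using only the standard unicodedata module; the helper name utf8_growth 
is made up for this illustration and has nothing to do with the patch. 
It compares the UTF-8 byte length of a string before and after NFC, 
once for a decomposed "e" + combining acute (the case the patch has in 
mind) and once for U+1D160:

```python
import unicodedata


def utf8_growth(text):
    """Return (bytes_before, bytes_after) for text under NFC, measured in UTF-8.

    Hypothetical helper for illustration only.
    """
    composed = unicodedata.normalize("NFC", text)
    return len(text.encode("utf-8")), len(composed.encode("utf-8"))


if __name__ == "__main__":
    samples = {
        "e + combining acute": "e\u0301",  # decomposed form, composes to U+00E9
        "U+1D160": "\U0001D160",           # excluded from composition in NFC
    }
    for label, text in samples.items():
        before, after = utf8_growth(text)
        nfc = unicodedata.normalize("NFC", text)
        points = " ".join(f"U+{ord(ch):04X}" for ch in nfc)
        print(f"{label}: {before} -> {after} bytes, NFC = {points}")
```

The first sample shrinks from 3 bytes to 2, while U+1D160 grows from 4 
bytes to 12, so a buffer sized from the original name length is not 
guaranteed to hold the normalized name.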

	-hpa

