From: "H. Peter Anvin" <hpa@zytor.com>
To: Mark Junker <mjscod@web.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] Use FIX_UTF8_MAC to enable conversion from UTF8-MAC to UTF8
Date: Mon, 21 Jan 2008 20:08:46 -0800
Message-ID: <47956C4E.1080903@zytor.com>
In-Reply-To: <fn1sk4$uh4$1@ger.gmane.org>
Mark Junker wrote:
> Junio C Hamano schrieb:
>
>> I do not know how Macintosh libc implements "struct dirent", but
>> this approach does not work in general.
>
> IMHO this approach does not need to work in general, because this
> is a fix for MacOSX systems only. I also use d_namlen, which might not be
> available on other systems. But on MacOSX this works as expected.
>
>> yet you can obtain a path component longer than 256 bytes.
>> Apparently the library allocates longer d_name[] field than what
>> is shown to the user.
>
> This is not a problem either because on MacOSX we get decomposed UTF8
> and we always convert to composed UTF8. This means that the string
> returned from reencode_string will always be smaller than the original
> filename that had to be reencoded.
>
That's not true! There are strings which get longer when a composing
normalization is applied. Please see section 3.3 of Unicode Technical
Report 36:
http://www.unicode.org/reports/tr36/
> People assume that NFC always composes, and thus is the same or
> shorter length than the original source. However, some characters
> decompose in NFC.
(NFC = Normalization Form C, i.e. canonical composition.)
U+1D160 MUSICAL SYMBOL EIGHTH NOTE is given as an example with a 3x
expansion factor when encoded in UTF-8 (I don't know what it expands to;
seems odd to me.)
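
For anyone who wants to check the expansion, here is a minimal sketch in
Python using the standard unicodedata module. It is not part of the patch
and says nothing about how reencode_string behaves; it only normalizes the
one code point named in TR36 and prints the UTF-8 length before and after.
The variable names are illustrative only.

```python
import unicodedata

# U+1D160 MUSICAL SYMBOL EIGHTH NOTE, the example cited from TR36.
original = "\U0001D160"

# Apply the composing normalization (NFC) to the single code point.
composed = unicodedata.normalize("NFC", original)

# Show the code points and the UTF-8 byte length of each form.
for label, text in (("original", original), ("NFC", composed)):
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    size = len(text.encode("utf-8"))
    print(f"{label}: {code_points} ({size} bytes in UTF-8)")
```

With an up-to-date Unicode database, the NFC form should come out as three
code points rather than one, because the precomposed musical symbols are
excluded from composition. That is why a buffer sized to the decomposed
name is not guaranteed to hold the composed result.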
-hpa