From: Olaf Weber <olaf@sgi.com>
To: Andi Kleen <andi@firstfloor.org>, Ben Myers <bpm@sgi.com>
Cc: linux-fsdevel@vger.kernel.org, tinguely@sgi.com, xfs@oss.sgi.com
Subject: Re: [RFC v2] Unicode/UTF-8 support for XFS
Date: Tue, 23 Sep 2014 15:01:20 +0200 [thread overview]
Message-ID: <54216F20.1090302@sgi.com> (raw)
In-Reply-To: <87lhpbhfgg.fsf@tassilo.jf.intel.com>
On 22-09-14 16:55, Andi Kleen wrote:
> Ben Myers <bpm@sgi.com> writes:
>>
>> Strings are normalized using a trie that stores the relevant
>> information. The trie itself is about 250kB in size, and lives in a
>> separate module.
>
> So 250kB bloat -- and what does this fix exactly?
>
> Someone putting random ligatures into their file names and expecting
> the file to be the same as before. Can't they just not do that?
I like the 'office' example because it is applicable to English and easy to
explain. Once you move away from English examples are much easier to come
by. Take a Dutch name like 'Renée Soutendijk'.
These two forms both spell Renée in UTF-8:
0x52 0x65 0x6E 0xC3 0xA9 0x65
0x52 0x65 0x6E 0x65 0xCC 0x81 0x65
The difference is
LATIN SMALL LETTER E WITH ACUTE (U+00E9)
LATIN SMALL LETTER E (U+0065) COMBINING ACUTE ACCENT (U+0301)
and corresponds to the difference between NFC and NFD.
These two forms both spell Soutendijk in UTF-8:
0x53 0x6F 0x75 0x74 0x65 0x6E 0x64 0x69 0x6A 0x6B
0x53 0x6F 0x75 0x74 0x65 0x6E 0x64 0xC4 0xB3 0x6B
The difference is
LATIN SMALL LETTER I (U+0069) LATIN SMALL LETTER J (U+006A)
LATIN SMALL LIGATURE IJ (U+0133)
and the former is the compatibility decomposition of the latter, the 'K' in
NFKC/NFKD.
Do accented letters count as random ligatures that people should just not use?
The bulk of the table deals with Korean.
Olaf
--
Olaf Weber SGI Phone: +31(0)30-6696796
Veldzigt 2b Fax: +31(0)30-6696799
Technical Lead 3454 PW de Meern Vnet: 955-6796
Storage Software The Netherlands Email: olaf@sgi.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-09-23 13:01 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-18 19:56 [RFC v2] Unicode/UTF-8 support for XFS Ben Myers
2014-09-18 20:08 ` [PATCH 01/10] xfs: return the first match during case-insensitive lookup Ben Myers
2014-09-18 20:09 ` [PATCH 02/10] xfs: rename XFS_CMP_CASE to XFS_CMP_MATCH Ben Myers
2014-09-18 20:09 ` [PATCH 03/13] libxfs: add xfs_nameops.normhash Ben Myers
2014-09-18 20:10 ` [PATCH 04/10] xfs: change interface of xfs_nameops.normhash Ben Myers
2014-09-18 20:10 ` Ben Myers
2014-09-18 20:11 ` [PATCH 05/10] xfs: add a superblock feature bit to indicate UTF-8 support Ben Myers
2014-09-18 20:13 ` [PATCH 03/10] xfs: add xfs_nameops.normhash Ben Myers
2014-09-18 20:14 ` [PATCH 06/10] xfs: add unicode character database files Ben Myers
2014-09-18 20:14 ` Ben Myers
2014-09-22 20:54 ` Dave Chinner
2014-09-22 20:54 ` Dave Chinner
2014-09-26 17:09 ` Christoph Hellwig
2014-09-18 20:15 ` [PATCH 07/10] xfs: add trie generator and supporting code for UTF-8 Ben Myers
2014-09-22 20:57 ` Dave Chinner
2014-09-22 20:57 ` Dave Chinner
2014-09-23 18:57 ` Ben Myers
2014-09-26 17:10 ` Christoph Hellwig
2014-09-18 20:16 ` [PATCH 08/10] xfs: add xfs_nameops for utf8 and utf8+casefold Ben Myers
2014-09-18 20:16 ` Ben Myers
2014-09-18 20:17 ` [PATCH 09/10] xfs: apply utf-8 normalization rules to user extended attribute names Ben Myers
2014-09-18 20:18 ` [PATCH 10/10] xfs: implement demand load of utf8norm.ko Ben Myers
2014-09-18 20:31 ` [PATCH 00/13] xfsprogs: Unicode/UTF-8 support for XFS Ben Myers
2014-09-18 20:33 ` [PATCH 01/13] libxfs: return the first match during case-insensitive lookup Ben Myers
2014-09-18 20:33 ` [PATCH 02/13] libxfs: rename XFS_CMP_CASE to XFS_CMP_MATCH Ben Myers
2014-09-18 20:34 ` [PATCH 03/13] libxfs: add xfs_nameops.normhash Ben Myers
2014-09-18 20:35 ` [PATCH 04/13] libxfs: change interface of xfs_nameops.normhash Ben Myers
2014-09-18 20:35 ` Ben Myers
2014-09-18 20:36 ` [PATCH 05/13] libxfs: add a superblock feature bit to indicate UTF-8 support Ben Myers
2014-09-18 20:37 ` [PATCH 06/13] xfsprogs: add unicode character database files Ben Myers
2014-09-18 20:38 ` [PATCH 07/13] libxfs: add trie generator and supporting code for UTF-8 Ben Myers
2014-09-18 20:38 ` [PATCH 08/13] libxfs: add xfs_nameops for utf8 and utf8+casefold Ben Myers
2014-09-18 20:39 ` [PATCH 09/13] libxfs: apply utf-8 normalization rules to user extended attribute names Ben Myers
2014-09-18 20:40 ` [PATCH 10/13] xfsprogs: add utf8 support to growfs Ben Myers
2014-09-18 20:41 ` [PATCH 11/13] xfsprogs: add utf8 support to mkfs.xfs Ben Myers
2014-09-18 20:42 ` [PATCH 12/13] xfsprogs: add utf8 support to xfs_repair Ben Myers
2014-09-18 20:42 ` Ben Myers
2014-09-18 20:43 ` [PATCH 13/13] xfsprogs: add a preliminary test for utf8 support Ben Myers
2014-09-19 16:06 ` [PATCH 07a/13] xfsprogs: add trie generator for UTF-8 Ben Myers
2014-09-23 18:34 ` Roger Willcocks
2014-09-24 23:11 ` Ben Myers
2014-09-19 16:07 ` [PATCH 07b/13] libxfs: add supporting code " Ben Myers
2014-09-18 21:10 ` [RFC v2] Unicode/UTF-8 support for XFS Ben Myers
2014-09-18 21:24 ` Zach Brown
2014-09-18 21:24 ` Zach Brown
2014-09-18 22:23 ` Ben Myers
2014-09-19 16:03 ` [PATCH 07a/10] xfs: add trie generator for UTF-8 Ben Myers
2014-09-19 16:04 ` [PATCH 07b/10] xfs: add supporting code " Ben Myers
2014-09-22 14:55 ` [RFC v2] Unicode/UTF-8 support for XFS Andi Kleen
2014-09-22 14:55 ` Andi Kleen
2014-09-22 18:41 ` Ben Myers
2014-09-22 19:29 ` Andi Kleen
2014-09-22 19:29 ` Andi Kleen
2014-09-23 16:13 ` Olaf Weber
2014-09-23 20:15 ` Andi Kleen
2014-09-23 20:45 ` Ben Myers
2014-09-23 20:45 ` Ben Myers
2014-09-24 11:07 ` Olaf Weber
2014-09-26 14:06 ` Olaf Weber
2014-09-23 13:01 ` Olaf Weber [this message]
2014-09-23 20:02 ` Andi Kleen
2014-09-22 22:26 ` Dave Chinner
2014-09-22 22:26 ` Dave Chinner
2014-09-24 13:21 ` Olaf Weber
2014-09-24 13:21 ` Olaf Weber
2014-09-24 23:10 ` Dave Chinner
2014-09-24 23:10 ` Dave Chinner
2014-09-25 13:33 ` Zuckerman, Boris
2014-09-26 14:50 ` Olaf Weber
2014-09-26 16:56 ` Christoph Hellwig
2014-09-26 16:56 ` Christoph Hellwig
2014-09-26 17:04 ` Jeremy Allison
2014-09-26 17:06 ` Christoph Hellwig
2014-09-26 17:13 ` Jeremy Allison
2014-09-26 17:13 ` Jeremy Allison
2014-09-26 19:37 ` Olaf Weber
2014-09-26 19:46 ` Jeremy Allison
2014-09-26 20:03 ` Olaf Weber
2014-09-29 20:16 ` J. Bruce Fields
2014-09-29 20:16 ` J. Bruce Fields
2014-09-29 11:06 ` Christoph Hellwig
2014-09-29 11:06 ` Christoph Hellwig
2014-09-26 17:30 ` Ben Myers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54216F20.1090302@sgi.com \
--to=olaf@sgi.com \
--cc=andi@firstfloor.org \
--cc=bpm@sgi.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=tinguely@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.