linux-ext4.vger.kernel.org archive mirror
From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
To: tytso@mit.edu, david@fromorbit.com, olaf@sgi.com,
	viro@zeniv.linux.org.uk
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	alvaro.soliverez@collabora.co.uk, kernel@lists.collabora.co.uk,
	Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Subject: [PATCH RFC v2 03/13] charsets: utf8: Add unicode character database files
Date: Thu, 25 Jan 2018 00:53:39 -0200
Message-ID: <20180125025349.31494-4-krisman@collabora.co.uk>
In-Reply-To: <20180125025349.31494-1-krisman@collabora.co.uk>

From: Olaf Weber <olaf@sgi.com>

Add files from the Unicode Character Database, version 10.0.0, to the source.
A helper program that generates a trie used for normalization from these
files is part of a separate commit.

- Notes on the update from 8.0.0 to 10.0.0:

The structure of the UCD files and their special cases did not change
between versions 8.0.0 and 10.0.0.  Version 8.0.0 added the Cherokee
lowercase letters, which are an interesting case for case folding.  The
update is accompanied by new tests in the test_ucd module that catch
these specific cases.  No changes to the mkutf8data script were required
for the update.
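To make the Cherokee special case concrete: in CaseFolding.txt, Cherokee lowercase letters fold to their uppercase counterparts (for example U+AB70 CHEROKEE SMALL LETTER A folds to U+13A0 CHEROKEE LETTER A), the opposite direction from most cased scripts. The Python sketch below is illustrative only and is not part of mkutf8data or test_ucd. The helper names parse_casefolding() and casefold() are hypothetical, the sample lines are written in the CaseFolding.txt format and should be checked against CaseFolding-10.0.0.txt, and keeping only the C and F statuses (full folding) is an assumption.

```python
# Illustrative sketch only: not the mkutf8data or test_ucd implementation.
# parse_casefolding() and casefold() are hypothetical helper names.

# Sample lines in the CaseFolding.txt format ("code; status; mapping; # name").
# Verify them against CaseFolding-10.0.0.txt before relying on them.
SAMPLE = [
    "# comment lines and blank lines are ignored",
    "",
    "0041; C; 0061; # LATIN CAPITAL LETTER A",
    "13F8; C; 13F0; # CHEROKEE SMALL LETTER YE",
    "AB70; C; 13A0; # CHEROKEE SMALL LETTER A",
]


def parse_casefolding(lines):
    """Map each code point to its folded sequence, keeping C and F entries."""
    folds = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or pure comment
        code, status, mapping = (f.strip() for f in data.split(";")[:3])
        # Assumption: full folding (C + F); S and T entries are skipped.
        if status not in ("C", "F"):
            continue
        folds[int(code, 16)] = [int(cp, 16) for cp in mapping.split()]
    return folds


def casefold(text, folds):
    """Apply the folding table; unmapped code points pass through unchanged."""
    out = []
    for ch in text:
        out.extend(folds.get(ord(ch), [ord(ch)]))
    return "".join(chr(cp) for cp in out)
```

With these samples, casefold("\uab70", folds) yields "\u13a0" while casefold("A", folds) yields "a", so a test that assumes folding always produces lowercase would fail on Cherokee.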

Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
  [Move ucd directory to lib/charsets]
  [Update to ucd-10.0.0]
---
 lib/charsets/ucd/README | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 lib/charsets/ucd/README

diff --git a/lib/charsets/ucd/README b/lib/charsets/ucd/README
new file mode 100644
index 000000000000..67f2075d1fca
--- /dev/null
+++ b/lib/charsets/ucd/README
@@ -0,0 +1,33 @@
+The files in this directory are part of the Unicode Character Database
+for version 10.0.0 of the Unicode standard.
+
+The full set of files can be found here:
+
+  http://www.unicode.org/Public/10.0.0/ucd/
+
+The latest released version of the UCD can be found here:
+
+  http://www.unicode.org/Public/UCD/latest/
+
+The files in this directory are identical, except that they have been
+renamed with a suffix indicating the unicode version.
+
+Individual source links:
+
+  http://www.unicode.org/Public/10.0.0/ucd/CaseFolding.txt
+  http://www.unicode.org/Public/10.0.0/ucd/DerivedAge.txt
+  http://www.unicode.org/Public/10.0.0/ucd/extracted/DerivedCombiningClass.txt
+  http://www.unicode.org/Public/10.0.0/ucd/DerivedCoreProperties.txt
+  http://www.unicode.org/Public/10.0.0/ucd/NormalizationCorrections.txt
+  http://www.unicode.org/Public/10.0.0/ucd/NormalizationTest.txt
+  http://www.unicode.org/Public/10.0.0/ucd/UnicodeData.txt
+
+md5sums
+
+  7893b6e005c5a521319a0d12062ae122  CaseFolding-10.0.0.txt
+  a602e4b44de3350087e40f2eb2184898  DerivedAge-10.0.0.txt
+  5abdeb21af4edcc5d1e4c0b5802fc7a7  DerivedCombiningClass-10.0.0.txt
+  eda11c2c2e3c308d9d3b90e2b3282024  DerivedCoreProperties-10.0.0.txt
+  425ece5ffbecd0140d98c13ce05724aa  NormalizationCorrections-10.0.0.txt
+  7296fe7aa07d7d288e65d559af2ad49b  NormalizationTest-10.0.0.txt
+  2a52f30695dcc821f0f224650552beaf  UnicodeData-10.0.0.txt
-- 
2.15.1
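The README lists each upstream URL, the renamed local file, and its md5sum. A minimal Python sketch of checking a local copy against that list follows. The helpers local_name(), parse_md5sums() and verify() are hypothetical, and the sketch assumes the files use the <name>-<version>.txt naming the README describes. The md5sums lines also use the "<digest>  <filename>" layout that md5sum -c accepts, which is the simpler route from a shell.

```python
import hashlib
import posixpath
from urllib.parse import urlparse

# Hypothetical helpers for checking a local UCD copy against the README.
UCD_VERSION = "10.0.0"


def local_name(url, version=UCD_VERSION):
    """Map an upstream UCD URL to its renamed file, e.g. CaseFolding-10.0.0.txt."""
    base = posixpath.basename(urlparse(url).path)
    stem, ext = posixpath.splitext(base)
    return f"{stem}-{version}{ext}"


def parse_md5sums(text):
    """Parse '<32-hex-digit digest>  <filename>' lines; skip everything else."""
    sums = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and len(fields[0]) == 32:
            sums[fields[1]] = fields[0].lower()
    return sums


def verify(path, expected_md5):
    """Return True if the file at path hashes to expected_md5."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files such as NormalizationTest.txt stay cheap.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == expected_md5
```

For example, local_name("http://www.unicode.org/Public/10.0.0/ucd/extracted/DerivedCombiningClass.txt") returns "DerivedCombiningClass-10.0.0.txt", and each entry returned by parse_md5sums() on the README text can be passed to verify() together with the path of the corresponding file.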


Thread overview: 23+ messages
2018-01-25  2:53 [PATCH RFC v2 00/13] NLS/UTF-8 Case-Insensitive lookups for ext4 and VFS proposal Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 01/13] charsets: Introduce middle-layer for character encoding Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 02/13] charsets: ascii: Wrap ascii functions to charsets library Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 03/13] charsets: utf8: Add unicode character database files Gabriel Krisman Bertazi [this message]
2018-01-25  2:53 ` [PATCH RFC v2 04/13] scripts: add trie generator for UTF-8 Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 05/13] charsets: utf8: Introduce code for UTF-8 normalization Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 06/13] charsets: utf8: reduce the size of utf8data[] Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 07/13] charsets: utf8: Hook-up utf-8 code to charsets library Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 08/13] charsets: utf8: Introduce test module for kernel UTF-8 implementation Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 09/13] ext4: Add ignorecase mount option Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 10/13] ext4: Include encoding information on the superblock Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 11/13] fscrypt: Introduce charset-based matching functions Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 12/13] ext4: Support charset name matching Gabriel Krisman Bertazi
2018-01-25  2:53 ` [PATCH RFC v2 13/13] ext4: Implement ext4 dcache hooks for custom charsets Gabriel Krisman Bertazi
2018-01-25  3:16 ` [PATCH RFC v2 00/13] NLS/UTF-8 Case-Insensitive lookups for ext4 and VFS proposal Al Viro
2018-01-25 19:32   ` Theodore Ts'o
2018-01-26  2:52     ` Gaoxiang (OS)
2018-02-06  2:24   ` Gabriel Krisman Bertazi
2018-02-06  3:21     ` Gao Xiang
2018-02-12 19:56       ` Gabriel Krisman Bertazi
2018-02-12 22:43         ` Gao Xiang
2018-02-13 22:20           ` Gabriel Krisman Bertazi
2018-02-14 12:27             ` Gao Xiang
