From: Gabriel Krisman Bertazi <krisman@collabora.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH e2fsprogs 3/9] libe2p: Helpers for configuring the encoding superblock fields
Date: Wed, 21 Nov 2018 14:33:30 -0500
Message-ID: <87pnuydy45.fsf@collabora.com>
In-Reply-To: <20181121043216.GA14968@thunk.org> (Theodore Y. Ts'o's message of "Tue, 20 Nov 2018 23:32:16 -0500")
"Theodore Y. Ts'o" <tytso@mit.edu> writes:
> On Mon, Nov 19, 2018 at 10:28:48AM -0500, Gabriel Krisman Bertazi wrote:
>>
>> >> +#define UTF8_NORMALIZATION_TYPE_NFKD (1 << 1)
>> >> +#define UTF8_CASEFOLD_TYPE_NFKDCF (1 << 4)
>
> Where do these values come from? And why are they (1 << 1) and (1 << 4),
> respectively?
>
> I just noticed that these are used in utf8's default flags, which then
> end up getting set in the superblock. So if these are official ext4
> code points, they should have an EXT4_ prefix, not a UTF8_ prefix. It
> also seems that it's not possible to set them in mke2fs (only the
> "strict" flag can be set or unset in e2p_str2encoding_flags).
Hi,

They come from the kernel's nls.h header. These flags are passed to the
NLS system to describe the behavior of the normalization and casefold
functions. To maintain compatibility with existing kernel users, the
utf8 module (and, eventually, others) still supports a "no
normalization/casefold" policy, which I call 'plain' in the kernel.
When I merged utf8n into utf8, the choice of which normalization, if
any, to perform became a flag set when loading the NLS table.
> So are we going to support something other than NFKD, or not? If it's
> in the superblock, then we need to make sure the kernel does something
> sane if they are something other than the default. And if we are just
> going to make it be a rule that all ext4 file systems with encoding
> type utf8 v10 will be NFKD, then we should let it be configurable in
> the superblock.
The NLS code in the kernel supports PLAIN and NFKD, but there is no real
reason for ext4 users to request PLAIN at all: it exists only for
backward compatibility with filesystems that used the utf8 module
beforehand, so it can't be configured in e2fsprogs. It still makes
sense to store the normalization type in the superblock, though, in
case we support other normalization forms in the future and need to do
some conversion.

That said, I am not planning to support other normalization forms in
ext4.

If the kernel (nls_load_version) finds any value other than TYPE_PLAIN
(0x0) or TYPE_NFKD in the superblock when loading the NLS table, it
fails the table creation, which in turn fails the mount.
If you agree with the design above, I will just fix the EXT4_ prefix.
>
>> >> +
>> >> +static const struct ext4_sb_encoding_map {
>> >> + char *name;
>> >> + __u16 default_flags;
>> >> +} ext4_encoding_map[] = {
>> >> + /* 0x0 */ { "ascii", 0x0},
>> >> + /* 0x1 */ {"utf8-10.0.0", UTF8_NORMALIZATION_TYPE_NFKD|UTF8_CASEFOLD_TYPE_NFKDCF},
>
> It might be enough to just use "utf8-10.0". Internally in the Unicode
> standard, they only use the X.Y notation, and given that we're already
> using the utf8 short-name, as opposed to something like "UTF-8
> encoding of Unicode 10.0.0", it might be better to shorten it to utf-8.
>
> I also noticed that Unicode 11.0 has been released in June 2018. For
> people interested in scripts like Georgian Mtavruli (which has new
> case folding rules, so it's not just academic on our part), Hanifi
> Rohingya, Mayan Numerals, Historic Sanskrit etc., in their ext4 file
> names, I'm sure they'll appreciate it. :-)
>
> Oh, and I think the FSF will be happier if we use Unicode 11.0, since
> it also features (in addition to a number of new emoji), the
> Copyleft Symbol. :-)
I can do the update!
--
Gabriel Krisman Bertazi

Thread overview: 24+ messages
2018-10-15 21:12 [PATCH e2fsprogs 0/9] Support encoding awareness and casefold Gabriel Krisman Bertazi
2018-10-15 21:12 ` [PATCH e2fsprogs 1/9] e2fsprogs: Add timestamp extension bits to superblock Gabriel Krisman Bertazi
2018-11-19 3:35 ` Theodore Y. Ts'o
2018-10-15 21:12 ` [PATCH e2fsprogs 2/9] e2fsprogs: Reserve feature bit and SB field bit for filename encoding Gabriel Krisman Bertazi
2018-11-19 4:15 ` Theodore Y. Ts'o
2018-10-15 21:12 ` [PATCH e2fsprogs 3/9] libe2p: Helpers for configuring the encoding superblock fields Gabriel Krisman Bertazi
2018-11-19 4:27 ` Theodore Y. Ts'o
2018-11-19 15:28 ` Gabriel Krisman Bertazi
2018-11-21 4:32 ` Theodore Y. Ts'o
2018-11-21 19:33 ` Gabriel Krisman Bertazi [this message]
2018-10-15 21:12 ` [PATCH e2fsprogs 4/9] mke2fs: Configure encoding during superblock initialization Gabriel Krisman Bertazi
2018-11-21 4:55 ` Theodore Y. Ts'o
2018-11-21 19:43 ` Gabriel Krisman Bertazi
2018-10-15 21:12 ` [PATCH e2fsprogs 5/9] chattr/lsattr: Support casefold attribute Gabriel Krisman Bertazi
2018-11-21 5:00 ` Theodore Y. Ts'o
2018-10-15 21:12 ` [PATCH e2fsprogs 6/9] lib/ext2fs: Implement NLS support Gabriel Krisman Bertazi
2018-11-21 5:01 ` Theodore Y. Ts'o
2018-11-21 19:44 ` Gabriel Krisman Bertazi
2018-10-15 21:12 ` [PATCH e2fsprogs 7/9] lib/ext2fs: Support encoding when calculating dx hashes Gabriel Krisman Bertazi
2018-11-21 5:10 ` Theodore Y. Ts'o
2018-10-15 21:12 ` [PATCH e2fsprogs 8/9] debugfs/htree: Support encoding when printing the file hash Gabriel Krisman Bertazi
2018-10-15 21:12 ` [PATCH e2fsprogs 9/9] tune2fs: Prevent enabling encryption flag on encoding-aware fs Gabriel Krisman Bertazi
2018-11-21 5:03 ` Theodore Y. Ts'o
2018-11-21 19:46 ` Gabriel Krisman Bertazi