From: Gabriel Krisman Bertazi <krisman@collabora.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org, kernel@collabora.com
Subject: Re: [PATCH] e2p: Print encoding information in superblock dump
Date: Mon, 10 Dec 2018 16:05:48 -0500
Message-ID: <87h8flf5xv.fsf@collabora.com>
In-Reply-To: <20181208183637.GD20708@thunk.org> (Theodore Y. Ts'o's message of "Sat, 8 Dec 2018 13:36:37 -0500")
"Theodore Y. Ts'o" <tytso@mit.edu> writes:
> On Tue, Dec 04, 2018 at 04:16:09PM -0500, Gabriel Krisman Bertazi wrote:
>> diff --git a/lib/e2p/ls.c b/lib/e2p/ls.c
>> index a7586e094da1..bb1fc8aa94da 100644
>> --- a/lib/e2p/ls.c
>> +++ b/lib/e2p/ls.c
> ....
>> + if (encoding == EXT4_ENC_UTF8_11_0) {
>> + if (flags & EXT4_UTF8_NORMALIZATION_TYPE_NFKD)
>> + fputs(" NFKD", f);
>> + else
>> + fputs(" Unnormalized", f);
>> + flags_found++;
>> +
>> + if (flags & EXT4_UTF8_CASEFOLD_TYPE_NFKDCF)
>> + fputs(" NFKDCF", f);
>> + else
>> + fputs(" toUpper", f);
>> + flags_found++;
>> + }
>
> I don't understand this. Why is "toUpper" the opposite of
> "CASEFOLD_TYPE_NFKDCF"? From what I can tell looking at the kernel
> patches, it appears that if CASEFOLD_TYPE_NFKDCF is not specified, no
> case folding is done at all. And it appears the opposite of "toupper"
> is "tolower" --- for ASCII case folding.

To let any NLS charset benefit from the nls_strcmp/strncasecmp API, I
specified default normalization/casefold operations that could be
implemented using the hooks we already have. The default casefold was
toUpper; that was my thinking. utf8 was originally split between utf8
and utf8n, the former being the original unnormalized behavior. Without
CASEFOLD_TYPE_NFKDCF, it used the toUpper method.

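To make the toUpper fallback concrete, here is a minimal sketch of a
comparison that folds each byte to upper case before comparing. It is
only an illustration under stated assumptions: the names sketch_toupper
and sketch_strncasecmp are hypothetical, it handles the ASCII range only,
and it is not the code from the kernel patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fold one byte to upper case, ASCII range only. */
unsigned char sketch_toupper(unsigned char c)
{
	return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}

/*
 * Hypothetical fallback comparison (an assumption, not the patch code):
 * returns 0 when both names are equal after per-byte upper-casing,
 * and 1 otherwise. No normalization is performed.
 */
int sketch_strncasecmp(const unsigned char *a, size_t alen,
		       const unsigned char *b, size_t blen)
{
	size_t i;

	assert(a != NULL && b != NULL);

	if (alen != blen)
		return 1;

	for (i = 0; i < alen; i++)
		if (sketch_toupper(a[i]) != sketch_toupper(b[i]))
			return 1;

	return 0;
}
```

Because the fold is per byte, multi-byte sequences pass through
unchanged, which is why this path only gives ASCII-style case
insensitivity.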
> More generally, we don't have a way of setting these flags, and I'm
> wondering if we should just make a decision and be done with it.
> After all, without any way of changing the flags, there's only one
> code path that is going to be well tested, and realistically user
> programs will come to *expect* only one way file systems will do
> things. The MacOS world has discovered the hard way what happens if
> they try to change normalization conventions from one to another,
> leading to all sorts of confusion for application programmers.
>
> So perhaps we should just remove these flags from the superblock, and
> only support one way of doing things. We ask the opinion of various
> stake-holders --- the Samba folks, fsdevel, Steam, etc. But whether
> we decide NFC, NFD, NFKC or NFKD, I suspect we'll be better off just
> picking one and only one way of doing things. WDYT?

My approach is overly complex, just to support the existing NLS tables.
Since Linus seems OK with moving the code into a separate module and not
supporting other encodings, I agree we can make things much simpler:
define a single normalization/casefold and be done with it.

So, I will revive the first versions of this charset/unicode module. We drop
these flags from the superblock, but we still store the encoding and the
encoding version in it, since that is useful to maintain stability of name
sequences. We also support ASCII alongside utf8, because that is
safer and pretty trivial. Finally, do we revisit the decision to
provide a strict mode to reject invalid sequences? I still think that
flag is useful. Do we also want a flag to specify whether +F is the
default for new directories?

Do you agree?
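For reference on the strict-mode question, the idea can be sketched as
follows. This is a rough illustration, not a proposed implementation:
sketch_utf8_valid and sketch_check_name are hypothetical names, and the
non-strict behavior shown here (accepting an invalid name as an opaque
byte string) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical validator: returns true if s[0..len) is well-formed UTF-8.
 * Rejects stray or missing continuation bytes, overlong encodings,
 * surrogates, and code points above U+10FFFF.
 */
bool sketch_utf8_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	assert(s != NULL);

	while (i < len) {
		unsigned char c = s[i];
		unsigned int cp, min;
		size_t need, k;

		if (c < 0x80) {			/* single-byte (ASCII) */
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {	/* 2-byte lead */
			need = 1; cp = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {	/* 3-byte lead */
			need = 2; cp = c & 0x0F; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {	/* 4-byte lead */
			need = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return false;		/* stray continuation or bad lead */
		}

		if (len - i - 1 < need)
			return false;		/* truncated sequence */

		for (k = 1; k <= need; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}

		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return false;

		i += need + 1;
	}
	return true;
}

/*
 * Hypothetical policy check: returns 0 to accept the name, -1 to reject.
 * Assumption: in non-strict mode an invalid name is still accepted and
 * would simply be treated as opaque bytes.
 */
int sketch_check_name(const unsigned char *name, size_t len, bool strict)
{
	if (sketch_utf8_valid(name, len))
		return 0;
	return strict ? -1 : 0;
}
```

The only difference between the two modes in this sketch is what
happens to names that fail validation; valid names take the same path
either way.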
--
Gabriel Krisman Bertazi