From: Aleksa Sarai <cyphar@cyphar.com>
To: lampahome <pahome.chen@mirlab.org>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: why do we need utf8 normalization when compare name?
Date: Mon, 2 Mar 2020 21:37:54 +1100
Message-ID: <20200302103754.nsvtne2vvduug77e@yavin>
In-Reply-To: <CAB3eZfv4VSj6_XBBdHK12iX_RakhvXnTCFAmQfwogR34uySo3Q@mail.gmail.com>

On 2020-03-02, lampahome <pahome.chen@mirlab.org> wrote:
> According to the case-insensitive lookup support added in kernel 5.2,
> d_compare will transform the string into a normalized form and then
> compare.
>
> But why do we need this normalization function? Could we just compare
> the UTF-8 strings directly?

The problem is that there are multiple ways to represent the same
character in Unicode -- for instance, Å (the symbol for angstrom) can be
encoded as U+212B (ANGSTROM SIGN), as U+00C5 (LATIN CAPITAL LETTER A
WITH RING ABOVE), or as U+0041 U+030A (the Latin letter "A" followed by
U+030A COMBINING RING ABOVE). Different software may choose different
forms for the same text, so two names that look identical can have
different UTF-8 bytes, and a plain byte comparison would treat them as
different names. Hence the need for normalisation.
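As a minimal sketch of the effect, the following uses Python's standard
unicodedata module (not the kernel's own normalisation code, which is a
separate implementation) to compare the three spellings of Å above. The
helper names raw_equal, normalized_equal and caseless_equal are
hypothetical, chosen only for illustration; the caseless variant uses the
Unicode "canonical caseless match" recipe NFD(casefold(NFD(x))).

```python
import unicodedata
from itertools import combinations

# Three encodings of the same visible character "Å" (hypothetical test data).
SPELLINGS = {
    "U+212B (ANGSTROM SIGN)": "\u212B",
    "U+00C5 (A WITH RING ABOVE)": "\u00C5",
    "U+0041 U+030A (A + COMBINING RING ABOVE)": "\u0041\u030A",
}


def raw_equal(a: str, b: str) -> bool:
    """Byte-for-byte comparison of the UTF-8 encodings (no normalisation)."""
    return a.encode("utf-8") == b.encode("utf-8")


def normalized_equal(a: str, b: str) -> bool:
    """Canonical equivalence: compare the NFD (fully decomposed) forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def caseless_equal(a: str, b: str) -> bool:
    """Canonical caseless match: compare NFD(casefold(NFD(x)))."""

    def key(s: str) -> str:
        decomposed = unicodedata.normalize("NFD", s)
        return unicodedata.normalize("NFD", decomposed.casefold())

    return key(a) == key(b)


if __name__ == "__main__":
    # Every pair differs byte-wise but is equal after normalisation.
    for (name_a, a), (name_b, b) in combinations(SPELLINGS.items(), 2):
        print(f"{name_a} vs {name_b}: "
              f"raw={raw_equal(a, b)} normalized={normalized_equal(a, b)}")
    # A case-insensitive comparison needs normalisation as well:
    # ANGSTROM SIGN vs lowercase U+00E5 (a with ring above).
    print("caseless:", caseless_equal("\u212B", "\u00E5"))
```

Run as a script, every pair reports raw=False and normalized=True, which
is exactly why a memcmp()-style comparison of names is not enough once
names can arrive in different Unicode forms.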

[1] is the Wikipedia article that describes this problem and what the
different kinds of Unicode normalisation are.

[1]: https://en.wikipedia.org/wiki/Unicode_equivalence

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Thread overview: 10+ messages
2020-03-02  9:00 why do we need utf8 normalization when compare name? lampahome
2020-03-02 10:37 ` Aleksa Sarai [this message]
2020-03-02 10:47   ` Aleksa Sarai
2020-03-03  1:48     ` lampahome
     [not found]       ` <20200303070928.aawxoyeq77wnc3ts@yavin>
2020-03-03 10:13         ` lampahome
2020-03-03 17:22           ` Theodore Y. Ts'o
2020-03-02 12:54 ` Matthew Wilcox
2020-03-02 15:28   ` Al Viro
2020-03-02 17:14     ` Matthew Wilcox
2020-03-02 18:12     ` Theodore Y. Ts'o