Date: Mon, 2 Mar 2020 21:37:54 +1100
From: Aleksa Sarai
To: lampahome
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: why do we need utf8 normalization when compare name?
Message-ID: <20200302103754.nsvtne2vvduug77e@yavin>

On 2020-03-02, lampahome wrote:
> According to case insensitive since kernel 5.2, d_compare will
> transform string into normalized form and then compare.
>
> But why do we need this normalization function? Could we just compare
> by utf8 string?

The problem is that there are multiple ways to represent the same glyph
in Unicode -- for instance, you can represent Å (the symbol for
angstrom) as either U+212B or U+0041 U+030A (the Latin letter "A"
followed by the combining ring above, which looks like "°"). A plain
byte-wise comparison of the UTF-8 strings would treat those as two
different names. Different software may choose to represent the same
glyphs in different Unicode forms, hence the need for normalisation.

[1] is the Wikipedia article that describes this problem and what the
different kinds of Unicode normalisation are.

[1]: https://en.wikipedia.org/wiki/Unicode_equivalence

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
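
The equivalence described in the reply above can be checked with a small
sketch. It assumes Python's standard unicodedata module and uses NFD as
the normal form; it only illustrates canonical equivalence and is not the
kernel's d_compare code. The helper name same_name is hypothetical.

```python
import unicodedata

# Two encodings of the glyph from the message above.
angstrom_sign = "\u212b"   # U+212B ANGSTROM SIGN
decomposed = "A\u030a"     # U+0041 followed by U+030A COMBINING RING ABOVE


def same_name(a: str, b: str, form: str = "NFD") -> bool:
    """Compare two names after Unicode normalisation (hypothetical helper)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Byte-wise comparison of the UTF-8 encodings: the names differ.
raw_equal = angstrom_sign.encode("utf-8") == decomposed.encode("utf-8")

# Comparison after normalisation: the names match.
normalized_equal = same_name(angstrom_sign, decomposed)

print(raw_equal)         # False
print(normalized_equal)  # True
```

Without the normalisation step, a lookup for one spelling of a file name
would miss a file created with the other spelling, even though both look
identical to the user.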