linux-fsdevel.vger.kernel.org archive mirror
From: "Pali Rohár" <pali@kernel.org>
To: "Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp" 
	<Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp>
Cc: "'linux-fsdevel@vger.kernel.org'" <linux-fsdevel@vger.kernel.org>,
	"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
	"'namjae.jeon@samsung.com'" <namjae.jeon@samsung.com>,
	"'sj1557.seo@samsung.com'" <sj1557.seo@samsung.com>,
	"'viro@zeniv.linux.org.uk'" <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 1/4] exfat: Simplify exfat_utf8_d_hash() for code points above U+FFFF
Date: Tue, 7 Apr 2020 12:06:48 +0200
Message-ID: <20200407100648.phkvxbmv2kootyt7@pali>
In-Reply-To: <TY1PR01MB1578D63C6F303DE805D75DAA90C20@TY1PR01MB1578.jpnprd01.prod.outlook.com>

On Monday 06 April 2020 09:37:38 Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp wrote:
> > > If you want to get an unbiased hash value by specifying an 8 or 16-bit
> > > value,
> > 
> > Hello! In exfat we have sequence of 21-bit values (not 8, not 16).
> 
> hash_32() generates a less-biased hash, even for 21-bit characters.
> 
> For filenames made up of the following character classes, the hash from
> partial_name_hash() behaves as follows:
>  - 21-bit (surrogate pairs): the upper 3 bits of the hash tend to be 0.
>  - 16-bit (mostly CJKV): the upper 8 bits of the hash tend to be 0.
>  - 8-bit (mostly Latin): the upper 16 bits of the hash tend to be 0.
> 
> I think the more frequently used Latin/CJKV characters matter more for
> hash efficiency than surrogate pair characters do.
> 
> The partial_name_hash() result for 8/16-bit characters is also biased.
> However, it works well.
>
> Surrogate pair characters are used less frequently, and their
> partial_name_hash() result is less biased than that of 8/16-bit characters.
> 
> So I think there is no problem with your patch.

So partial_name_hash(), as I used it in this patch series, is enough?
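
To make the quoted bias figures above concrete, here is a small userspace
sketch. It assumes the generic partial_name_hash() arithmetic from
include/linux/stringhash.h and narrows the accumulator to uint32_t to model
a 32-bit unsigned long. The names partial_name_hash32(), hash_32_sketch()
and bit_width() are made up for this illustration and are not kernel API.

```c
#include <stdint.h>

/*
 * Assumption: same arithmetic as the kernel's generic partial_name_hash(),
 * but with a uint32_t accumulator standing in for a 32-bit unsigned long.
 */
static uint32_t partial_name_hash32(uint32_t c, uint32_t prevhash)
{
	return (prevhash + (c << 4) + (c >> 4)) * 11;
}

/*
 * Multiplicative hash in the style of hash_32(): multiply by the 32-bit
 * golden ratio constant and keep the top "bits" bits (1 <= bits <= 32).
 */
#define GOLDEN_RATIO_32 0x61C88647u

static uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Number of significant bits in v (0 when v == 0). */
static unsigned int bit_width(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}
```

For a single-character name hashed from an initial value of 0,
bit_width(partial_name_hash32(c, 0)) stays within 16 bits for c <= 0xFF,
within 24 bits for c <= 0xFFFF, and within 29 bits for any 21-bit value,
which matches the "upper bits tend to be 0" observation. Each further
character multiplies the accumulator by 11, so the upper bits fill in
only after several characters.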
