linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Miklos Szeredi <miklos@szeredi.hu>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH] <linux/stringhash.h>: fix end_name_hash() for 64bit long
Date: Fri, 9 Feb 2018 10:33:15 -0800	[thread overview]
Message-ID: <CA+55aFz34O6dA6fYy36yYA0w==dwkLX6UOCnTrioCO6__+rajQ@mail.gmail.com> (raw)
In-Reply-To: <1517851938-5892-1-git-send-email-amir73il@gmail.com>

On Mon, Feb 5, 2018 at 9:32 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> The comment claims that this helper will try not to loose bits, but
> for 64bit long it looses the high bits before hashing 64bit long into
> 32bit int. Use the helper hash_long() to do the right thing for 64bit
> long. For 32bit long, there is no change.

Ok, sorry for the delay, I only got back to this now because the merge
window is calming down (famous last words - sometimes Friday ends up
being busy, but I might decide that it's too late if somebody sends me
an annoying late pull request now).

And after having looked more at it, I take back all my complaints
about the patch, you were right and I was mis-reading things or just
being stupid.

I also don't worry too much about the possible performance impact of
this on 64-bit, since most architectures that actually care about
performance end up not using this very much (the dcache code is the
most performance-critical, but the word-at-a-time case uses its own
hashing anyway).

So this ends up being mostly used for filesystems that do their own
degraded hashing (usually because they want a case-insensitive
comparison function).

A _tiny_ worry remains, in that not everybody uses DCACHE_WORD_ACCESS,
and then this potentially makes things more expensive on 64-bit
architectures with slow or lacking multipliers even for the normal
case.

That said, realistically the only such architecture I can think of is
PA-RISC. Nobody really cares about performance on that, it's more of a
"look ma, I've got warts^W an odd machine" platform.

So I think your patch is fine, and all my initial worries were just
misplaced from not looking at this properly.

Sorry.

I *would* love to get some kind of estimate on how this changes the
hash distribution. Do you have anything like that? The way this
function is designed to be used, the upper bits should be used for the
hash bucket information, so doing some bucket size distribution on
(say) the upper ~12 bits or something with some real string input
would be really good.

                Linus

  parent reply	other threads:[~2018-02-09 18:33 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-05 17:32 [RFC][PATCH] <linux/stringhash.h>: fix end_name_hash() for 64bit long Amir Goldstein
2018-02-05 17:36 ` Linus Torvalds
2018-02-05 17:51   ` Amir Goldstein
2018-02-05 17:57     ` Linus Torvalds
2018-02-05 18:35       ` Amir Goldstein
2018-02-05 18:37         ` Linus Torvalds
2018-02-09 18:33 ` Linus Torvalds [this message]
2018-02-11 14:51   ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+55aFz34O6dA6fYy36yYA0w==dwkLX6UOCnTrioCO6__+rajQ@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@sciencehorizons.net \
    --cc=miklos@szeredi.hu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).