From: Ingo Molnar <mingo@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>,
Thomas Gleixner <tglx@linutronix.de>,
LKML <linux-kernel@vger.kernel.org>,
Jason Low <jason.low2@hp.com>,
Darren Hart <dvhart@linux.intel.com>,
Mike Galbraith <efault@gmx.de>, Jeff Mahoney <jeffm@suse.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Scott Norton <scott.norton@hp.com>, Tom Vaden <tom.vaden@hp.com>,
Aswin Chandramouleeswaran <aswin@hp.com>,
Waiman Long <Waiman.Long@hp.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC patch 0/5] futex: Allow lockless empty check of hashbucket plist in futex_wake()
Date: Sun, 1 Dec 2013 13:10:22 +0100
Message-ID: <20131201121022.GB12115@gmail.com>
In-Reply-To: <20131128115946.GD13532@twins.programming.kicks-ass.net>

* Peter Zijlstra <peterz@infradead.org> wrote:
> Wouldn't something like the below also work?
> -#define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
> -
> /*
> * Futex flags used to encode options to functions and preserve them across
> * restarts.
> @@ -149,9 +147,11 @@ static const struct futex_q futex_q_init = {
> struct futex_hash_bucket {
> spinlock_t lock;
> struct plist_head chain;
> -};
> +} ____cacheline_aligned_in_smp;
>
> -static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
> +static unsigned long __read_mostly futex_hashsize;
> +
> +static struct futex_hash_bucket *futex_queues;
>
> /*
> * We hash on the keys returned from get_futex_key (see below).
> @@ -161,7 +161,7 @@ static struct futex_hash_bucket *hash_futex(union futex_key *key)
> u32 hash = jhash2((u32*)&key->both.word,
> (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
> key->both.offset);
> - return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
> + return &futex_queues[hash & (futex_hashsize - 1)];
> }
>
> /*
> @@ -2735,6 +2735,16 @@ static int __init futex_init(void)
> u32 curval;
> int i;
>
> +#if CONFIG_BASE_SMALL
> + futex_hashsize = 16;
> +#else
> + futex_hashsize = roundup_pow_of_two(256 * num_possible_cpus());
> +#endif
> +
> + futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues),
> + futex_hashsize, 0, futex_hashsize < 256 ? HASH_SMALL : 0,
> + NULL, NULL, futex_hashsize, futex_hashsize);
> +
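For scale, here is a quick userspace sketch of what that sizing rule
works out to (roundup_pow_of_two_ul() is a local stand-in for the
kernel's roundup_pow_of_two(); nothing below is kernel code):

#include <stdio.h>

/* Round v up to the next power of two (v > 0). */
static unsigned long roundup_pow_of_two_ul(unsigned long v)
{
        unsigned long p = 1;

        while (p < v)
                p <<= 1;
        return p;
}

int main(void)
{
        /* 256 buckets per possible CPU, as in the hunk above: */
        unsigned long cpus[] = { 1, 4, 16, 64, 256 };
        unsigned int i;

        for (i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++)
                printf("%4lu CPUs -> %6lu buckets\n",
                       cpus[i], roundup_pow_of_two_ul(256 * cpus[i]));
        return 0;
}

So a 64-CPU box would get 16384 buckets, versus the old fixed 256.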
Two observations:
1)
Once we make the hash dynamic here we might as well do a proper job: I
think we should scale it up by RAM as well.
A 64 GB computing node with 64 cores should probably not scale the
same way as a 64 TB system with 64 cores, right?
Something like += log2(memory_per_node / 1GB)? I.e. every doubling of
the per-node gigabytes adds one more bit to the hash, or so.
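As a hypothetical sketch of that rule (futex_hash_bits() and
ceil_log2_ul() are made-up names, not existing kernel code):

/* Smallest number of bits b such that (1UL << b) >= v. */
static unsigned int ceil_log2_ul(unsigned long v)
{
        unsigned int bits = 0;

        while ((1UL << bits) < v)
                bits++;
        return bits;
}

/*
 * Sizing sketch: start from the per-CPU estimate above (256 buckets
 * per CPU) and add one hash bit for every doubling of per-node
 * memory beyond 1 GB.
 */
static unsigned int futex_hash_bits(unsigned long ncpus,
                                    unsigned long node_mem_gb)
{
        unsigned int bits = ceil_log2_ul(256 * ncpus);

        if (node_mem_gb > 1)
                bits += ceil_log2_ul(node_mem_gb);
        return bits;
}

E.g. 64 CPUs at 1 GB per node would give 14 bits (16K buckets), while
64 GB per node would give 20 bits (1M buckets).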
2)
But more importantly, since these are all NUMA systems, would it make
sense to create per-node hashes? Each futex would be enqueued into the
hash belonging to its own page's node.
That kind of separation would reduce both the collision count and the
cost of a collision. (It would slightly increase the hash calculation
cost.)
(It would also make the hash size calculation nicely independent of
the node count: we'd only consider per-node CPU count and per-node
memory.)
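Shape-wise I'd imagine something like this (all names are hypothetical,
none of this is existing kernel code; the node id would come from the
futex's resolved backing page):

/* Hypothetical: one bucket array per NUMA node, each sized from
 * that node's CPU count and memory as per observation 1). */
static struct futex_hash_bucket *futex_node_queues[MAX_NUMNODES];
static unsigned long futex_node_hashsize;	/* per-node table size */

static struct futex_hash_bucket *hash_futex_nid(union futex_key *key,
                                                int nid)
{
        u32 hash = jhash2((u32 *)&key->both.word,
                          (sizeof(key->both.word) + sizeof(key->both.ptr)) / 4,
                          key->both.offset);

        /* Collisions are confined to one node's table, and the bucket
         * cacheline tends to be local to that node's waiters. */
        return &futex_node_queues[nid][hash & (futex_node_hashsize - 1)];
}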
Thanks,
Ingo