From: Ravikiran G Thirumalai <kiran@scalex86.org>
To: linux-kernel@vger.kernel.org
Cc: "Shai Fultheim (Shai@scalex86.org)" <shai@scalex86.org>,
pravin b shelar <pravin.shelar@calsoftinc.com>
Subject: [RFC] NUMA futex hashing
Date: Tue, 8 Aug 2006 00:07:08 -0700
Message-ID: <20060808070708.GA3931@localhost.localdomain>

The current futex hash scheme is not the best for NUMA. The futex hash table is
an array of struct futex_hash_bucket, which is just a spinlock and a
list_head. This means multiple spinlocks share the same cacheline and, on NUMA
machines, the same internode cacheline. If futexes of two unrelated
threads running on two different nodes happen to hash onto adjacent hash
buckets, or onto buckets on the same internode cacheline, then the
internode cacheline bounces between nodes.

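To make the cacheline sharing concrete, here is a minimal userspace sketch,
not kernel code. The names bucket_model, buckets_per_line() and same_line()
are hypothetical, the 4-byte lock is an assumption (the real spinlock_t size
depends on the kernel configuration), and the array is assumed to start on a
line boundary. The line size is a parameter, so the same check can be run
for a CPU cacheline or for a larger internode line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct list_head. */
struct list_head_model {
	struct list_head_model *next, *prev;
};

/* Stand-in for struct futex_hash_bucket: a lock plus a list head.
 * Assumption: the lock is 4 bytes; the kernel's spinlock_t may differ. */
struct bucket_model {
	uint32_t lock;
	struct list_head_model chain;
};

/* Number of whole buckets that fit in one line of line_size bytes. */
size_t buckets_per_line(size_t line_size)
{
	return line_size / sizeof(struct bucket_model);
}

/* Nonzero if the first bytes (where the lock lives) of buckets idx_a and
 * idx_b fall on the same line, assuming a line-aligned array. */
int same_line(size_t idx_a, size_t idx_b, size_t line_size)
{
	assert(line_size > 0);
	size_t off_a = idx_a * sizeof(struct bucket_model);
	size_t off_b = idx_b * sizeof(struct bucket_model);
	return off_a / line_size == off_b / line_size;
}
```

With a 64-byte line, same_line(0, 1, 64) is true, so two unrelated futexes
hashing to buckets 0 and 1 contend for one line. The larger the line size
passed in, the more buckets share it.
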
Here is a simple scheme which maintains per-node hash tables for futexes.
In this scheme, a private futex is assigned to the node id of the futex's KVA.
The reasoning is that the futex KVA is allocated from the node indicated
by the memory policy set by the process, and that node should be a good
'home node' for that futex. Of course, this helps workloads where all the
threads of a process are bound to the same node, but it seems reasonable to
run all threads of a process on the same node.

A shared futex is assigned a home node based on the jhash2 hash itself. Since
the inode and offset are used as the key, the same inode and offset are hashed
to arrive at the home node of a shared futex. This distributes shared futexes
across all nodes.

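The two rules above can be sketched in userspace as follows. This is an
illustration under stated assumptions, not the patch itself: key_model,
mix_model(), pick_slot(), NR_NODES_MODEL and HASHBITS_MODEL are hypothetical
names, the mixer is a stand-in for jhash2(), the node of a private futex is
passed in by the caller instead of being looked up with kvaddr_to_nid(), and
every node id below NR_NODES_MODEL is assumed to be valid.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES_MODEL 4	/* assumed node count, must be a power of two */
#define HASHBITS_MODEL 8	/* assumed table size of 1 << 8 buckets per node */

/* Simplified key: low bit of offset set means "shared", as in the patch. */
struct key_model {
	uint64_t word;
	uint32_t offset;
};

/* Result: which per-node table, and which bucket inside it. */
struct slot_model {
	unsigned int node;
	unsigned int bucket;
};

/* Stand-in mixer with arbitrary odd constants; NOT jhash2(). */
uint32_t mix_model(const struct key_model *k)
{
	uint64_t h = k->word * 0x9E3779B97F4A7C15ull;

	h ^= (uint64_t)k->offset * 0xC2B2AE3D27D4EB4Full;
	h ^= h >> 32;
	return (uint32_t)h;
}

/* addr_node models the node that holds a private futex's memory. */
struct slot_model pick_slot(const struct key_model *k, unsigned int addr_node)
{
	uint32_t hash = mix_model(k);
	struct slot_model s;

	assert(addr_node < NR_NODES_MODEL);
	if (k->offset & 0x1)
		s.node = hash & (NR_NODES_MODEL - 1);	/* shared: node from hash */
	else
		s.node = addr_node;			/* private: node of the memory */
	s.bucket = hash & ((1u << HASHBITS_MODEL) - 1);
	return s;
}
```

In this model a shared key yields the same node and bucket whatever
addr_node is passed, so every process using that futex meets at one bucket,
while a private key stays on the node that holds its memory.
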
Comments? Suggestions? Particularly regarding shared futexes. Any policy
suggestions?

Thanks,
Kiran

Note: This patch needs to have kvaddr_to_nid() reintroduced. This was taken
out in git commit 9f3fd602aef96c2a490e3bfd669d06475aeba8d8

Index: linux-2.6.18-rc3/kernel/futex.c
===================================================================
--- linux-2.6.18-rc3.orig/kernel/futex.c 2006-08-02 12:11:34.000000000 -0700
+++ linux-2.6.18-rc3/kernel/futex.c 2006-08-02 16:48:47.000000000 -0700
@@ -137,20 +137,35 @@ struct futex_hash_bucket {
struct list_head chain;
};
-static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
+static struct futex_hash_bucket *futex_queues[MAX_NUMNODES] __read_mostly;
/* Futex-fs vfsmount entry: */
static struct vfsmount *futex_mnt;
/*
* We hash on the keys returned from get_futex_key (see below).
+ * With NUMA aware futex hashing, we have per-node hash tables.
+ * We determine the home node of a futex based on the KVA -- if the futex
+ * is a private futex. For shared futexes, we use jhash2 itself on the
+ * futex_key to arrive at a home node.
*/
static struct futex_hash_bucket *hash_futex(union futex_key *key)
{
+ int nodeid;
u32 hash = jhash2((u32*)&key->both.word,
(sizeof(key->both.word)+sizeof(key->both.ptr))/4,
key->both.offset);
- return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
+ if (key->both.offset & 0x1) {
+ /*
+ * Shared futex: Use any of the 'possible' nodes as home node.
+ */
+ nodeid = hash & (MAX_NUMNODES -1);
+ BUG_ON(!node_possible(nodeid));
+ } else
+ /* Private futex */
+ nodeid = kvaddr_to_nid(key->both.ptr);
+
+ return &futex_queues[nodeid][hash & ((1 << FUTEX_HASHBITS)-1)];
}
/*
@@ -1909,13 +1924,25 @@ static int __init init(void)
{
unsigned int i;
+ int nid;
+
+ for_each_node(nid)
+ {
+ futex_queues[nid] = kmalloc_node(
+ (sizeof(struct futex_hash_bucket) *
+ (1 << FUTEX_HASHBITS)),
+ GFP_KERNEL, nid);
+ if (!futex_queues[nid])
+ panic("futex_init: Allocation of multi-node futex_queues failed");
+ for (i = 0; i < (1 << FUTEX_HASHBITS); i++) {
+ INIT_LIST_HEAD(&futex_queues[nid][i].chain);
+ spin_lock_init(&futex_queues[nid][i].lock);
+ }
+ }
+
register_filesystem(&futex_fs_type);
futex_mnt = kern_mount(&futex_fs_type);
- for (i = 0; i < ARRAY_SIZE(futex_queues); i++) {
- INIT_LIST_HEAD(&futex_queues[i].chain);
- spin_lock_init(&futex_queues[i].lock);
- }
return 0;
}
__initcall(init);