* [PATCH] Fix rt preempt slab NUMA freeing
@ 2007-10-23 17:13 Andi Kleen
From: Andi Kleen @ 2007-10-23 17:13 UTC
To: linux-rt-users
When this_cpu changes in the free path, node needs to change too.
Otherwise the slab can end up in the wrong node's list, which
eventually leads to WARN_ONs and, of course, worse NUMA performance.

This patch is likely not complete (the NUMA slab code is *very* hairy),
but it seems to let the make -j128 test survive for at least two hours.
At least it fixes one case that triggered regularly during testing,
putting slabs on the wrong node lists and triggering the
WARN_ONs in slab_put_obj()/slab_get_obj().

I tried a complete audit of keeping this_cpu/node/slabp in sync where
needed, but it is very hairy code and I likely missed some cases. So far
this fixes only the simple free path, but that seems to be good enough
that the problem no longer triggers easily on a NUMA system under memory
pressure.

Longer term, the only good fix is probably to migrate to SLUB, or to
disable NUMA slab for PREEMPT_RT (its value has been disputed in some
benchmarks anyway).

Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux-2.6.23-rt1/mm/slab.c
===================================================================
--- linux-2.6.23-rt1.orig/mm/slab.c
+++ linux-2.6.23-rt1/mm/slab.c
@@ -1193,7 +1193,7 @@ cache_free_alien(struct kmem_cache *cach
struct array_cache *alien = NULL;
int node;
- node = numa_node_id();
+ node = cpu_to_node(*this_cpu);
/*
* Make sure we are not freeing a object from another node to the array
@@ -4194,6 +4194,8 @@ static void cache_reap(struct work_struc
work_done += reap_alien(searchp, l3, &this_cpu);
+ node = cpu_to_node(this_cpu);
+
work_done += drain_array(searchp, l3,
cpu_cache_get(searchp, this_cpu), 0, node);

From: Andi Kleen @ 2007-10-23 17:13 UTC
To: linux-rt-users
---
mm/slab.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6.21-rt-hack/mm/slab.c
===================================================================
--- linux-2.6.21-rt-hack.orig/mm/slab.c
+++ linux-2.6.21-rt-hack/mm/slab.c
@@ -1205,7 +1205,7 @@ cache_free_alien(struct kmem_cache *cach
struct array_cache *alien = NULL;
int node;
- node = numa_node_id();
+ node = cpu_to_node(*this_cpu);
/*
* Make sure we are not freeing a object from another node to the array
@@ -4199,6 +4199,8 @@ static void cache_reap(struct work_struc
work_done += reap_alien(searchp, l3, &this_cpu);
+ node = cpu_to_node(this_cpu);
+
work_done += drain_array(searchp, l3,
cpu_cache_get(searchp, this_cpu), 0, node);