public inbox for linux-kernel@vger.kernel.org
* [PATCH 2/2]x86: spread tlb flush vector between nodes
@ 2010-10-20  3:07 Shaohua Li
  2010-10-20  5:16 ` Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Shaohua Li @ 2010-10-20  3:07 UTC (permalink / raw)
  To: lkml; +Cc: Ingo Molnar, hpa@zytor.com, Andi Kleen, Chen, Tim C

Currently, flush TLB vector allocation is based on the following equation:
	sender = smp_processor_id() % 8
This isn't optimal: CPUs from different nodes can end up with the same
vector, which causes a lot of lock contention. Instead, we can assign the
same vectors to CPUs from the same node, while different nodes get
different vectors. This has the following advantages:
a. if there is lock contention, it is between CPUs from one node. This
should be much cheaper than contention between nodes.
b. lock contention between nodes is avoided entirely. This especially
benefits kswapd, which is the biggest user of TLB flushes, since kswapd
sets its affinity to a specific node.

In my test, this reduces CPU overhead by more than 20% in the extreme
case. The test machine has 4 nodes and each node has 16 CPUs. I bind each
node's kswapd to the first CPU of the node and run a workload with 4
sequential mmap file read threads. The files are empty sparse files. This
workload triggers a lot of page reclaim and TLB flushing. The kswapd
binding makes it easy to trigger extreme TLB flush lock contention,
because otherwise kswapd keeps migrating between the CPUs of a node and I
can't get a stable result. Of course, real workloads won't always see
such heavy TLB flush lock contention, but it is possible.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 arch/x86/mm/tlb.c |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

Index: linux/arch/x86/mm/tlb.c
===================================================================
--- linux.orig/arch/x86/mm/tlb.c	2010-10-20 10:07:53.000000000 +0800
+++ linux/arch/x86/mm/tlb.c	2010-10-20 10:09:26.000000000 +0800
@@ -5,6 +5,7 @@
 #include <linux/smp.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
+#include <linux/cpu.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
@@ -52,6 +53,8 @@ union smp_flush_state {
    want false sharing in the per cpu data segment. */
 static union smp_flush_state flush_state[NUM_INVALIDATE_TLB_VECTORS];
 
+static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
+
 /*
  * We cannot call mmdrop() because we are in interrupt context,
  * instead update mm->cpu_vm_mask.
@@ -173,7 +176,7 @@ static void flush_tlb_others_ipi(const s
 	union smp_flush_state *f;
 
 	/* Caller has disabled preemption */
-	sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
+	sender = per_cpu(tlb_vector_offset, smp_processor_id());
 	f = &flush_state[sender];
 
 	/*
@@ -218,6 +221,47 @@ void native_flush_tlb_others(const struc
 	flush_tlb_others_ipi(cpumask, mm, va);
 }
 
+static void __cpuinit calculate_tlb_offset(void)
+{
+	int cpu, node, nr_node_vecs;
+	/*
+	 * We change tlb_vector_offset for each CPU at runtime, but this
+	 * will not cause inconsistency, as the write is atomic on x86. We
+	 * might see more lock contention for a short time, but once every
+	 * CPU's tlb_vector_offset has been updated, things return to normal.
+	 *
+	 * Note: if NUM_INVALIDATE_TLB_VECTORS % nr_online_nodes != 0, we
+	 * might waste some vectors.
+	 */
+	if (nr_online_nodes > NUM_INVALIDATE_TLB_VECTORS)
+		nr_node_vecs = 1;
+	else
+		nr_node_vecs = NUM_INVALIDATE_TLB_VECTORS/nr_online_nodes;
+
+	for_each_online_node(node) {
+		int node_offset = (node % NUM_INVALIDATE_TLB_VECTORS) *
+			nr_node_vecs;
+		int cpu_offset = 0;
+		for_each_cpu(cpu, cpumask_of_node(node)) {
+			per_cpu(tlb_vector_offset, cpu) = node_offset +
+				cpu_offset;
+			cpu_offset++;
+			cpu_offset = cpu_offset % nr_node_vecs;
+		}
+	}
+}
+
+static int tlb_cpuhp_notify(struct notifier_block *n,
+		unsigned long action, void *hcpu)
+{
+	switch (action & 0xf) {
+	case CPU_ONLINE:
+	case CPU_DEAD:
+		calculate_tlb_offset();
+	}
+	return NOTIFY_OK;
+}
+
 static int __cpuinit init_smp_flush(void)
 {
 	int i;
@@ -225,6 +269,8 @@ static int __cpuinit init_smp_flush(void
 	for (i = 0; i < ARRAY_SIZE(flush_state); i++)
 		raw_spin_lock_init(&flush_state[i].tlbstate_lock);
 
+	calculate_tlb_offset();
+	hotcpu_notifier(tlb_cpuhp_notify, 0);
 	return 0;
 }
 core_initcall(init_smp_flush);




* Re: [PATCH 2/2]x86: spread tlb flush vector between nodes
  2010-10-20  3:07 [PATCH 2/2]x86: spread tlb flush vector between nodes Shaohua Li
@ 2010-10-20  5:16 ` Eric Dumazet
  2010-10-20  7:31   ` Andi Kleen
  2010-10-20  7:30 ` Andi Kleen
  2010-10-20 23:07 ` [tip:x86/mm] x86: Spread " tip-bot for Shaohua Li
  2 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2010-10-20  5:16 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, Ingo Molnar, hpa@zytor.com, Andi Kleen, Chen, Tim C

On Wednesday 20 October 2010 at 11:07 +0800, Shaohua Li wrote:
> Currently, flush TLB vector allocation is based on the following equation:
> 	sender = smp_processor_id() % 8
> This isn't optimal: CPUs from different nodes can end up with the same
> vector, which causes a lot of lock contention. Instead, we can assign the
> same vectors to CPUs from the same node, while different nodes get
> different vectors. This has the following advantages:
> a. if there is lock contention, it is between CPUs from one node. This
> should be much cheaper than contention between nodes.
> b. lock contention between nodes is avoided entirely. This especially
> benefits kswapd, which is the biggest user of TLB flushes, since kswapd
> sets its affinity to a specific node.
> 
> In my test, this reduces CPU overhead by more than 20% in the extreme
> case. The test machine has 4 nodes and each node has 16 CPUs. I bind each
> node's kswapd to the first CPU of the node and run a workload with 4
> sequential mmap file read threads. The files are empty sparse files. This
> workload triggers a lot of page reclaim and TLB flushing. The kswapd
> binding makes it easy to trigger extreme TLB flush lock contention,
> because otherwise kswapd keeps migrating between the CPUs of a node and I
> can't get a stable result. Of course, real workloads won't always see
> such heavy TLB flush lock contention, but it is possible.
> 
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> ---
>  arch/x86/mm/tlb.c |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 47 insertions(+), 1 deletion(-)
> 
> Index: linux/arch/x86/mm/tlb.c
> ===================================================================
> --- linux.orig/arch/x86/mm/tlb.c	2010-10-20 10:07:53.000000000 +0800
> +++ linux/arch/x86/mm/tlb.c	2010-10-20 10:09:26.000000000 +0800
> @@ -5,6 +5,7 @@
>  #include <linux/smp.h>
>  #include <linux/interrupt.h>
>  #include <linux/module.h>
> +#include <linux/cpu.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/mmu_context.h>
> @@ -52,6 +53,8 @@ union smp_flush_state {
>     want false sharing in the per cpu data segment. */
>  static union smp_flush_state flush_state[NUM_INVALIDATE_TLB_VECTORS];
>  
> +static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
> +
>  /*
>   * We cannot call mmdrop() because we are in interrupt context,
>   * instead update mm->cpu_vm_mask.
> @@ -173,7 +176,7 @@ static void flush_tlb_others_ipi(const s
>  	union smp_flush_state *f;
>  
>  	/* Caller has disabled preemption */
> -	sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
> +	sender = per_cpu(tlb_vector_offset, smp_processor_id());

	sender = this_cpu_read(tlb_vector_offset);

>  	f = &flush_state[sender];
>  
>  	/*
> @@ -218,6 +221,47 @@ void native_flush_tlb_others(const struc
>  	flush_tlb_others_ipi(cpumask, mm, va);
>  }
>  


That's a pretty good patch, thanks!

Maybe we should have a per_node memory infrastructure, so that we can
lower memory needs of currently per_cpu objects.





* Re: [PATCH 2/2]x86: spread tlb flush vector between nodes
  2010-10-20  3:07 [PATCH 2/2]x86: spread tlb flush vector between nodes Shaohua Li
  2010-10-20  5:16 ` Eric Dumazet
@ 2010-10-20  7:30 ` Andi Kleen
  2010-10-20  8:44   ` Shaohua Li
  2010-10-20 23:07 ` [tip:x86/mm] x86: Spread " tip-bot for Shaohua Li
  2 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2010-10-20  7:30 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, Ingo Molnar, hpa@zytor.com, Andi Kleen, Chen, Tim C

Hi Shaohua,

> +	if (nr_online_nodes > NUM_INVALIDATE_TLB_VECTORS)
> +		nr_node_vecs = 1;
> +	else
> +		nr_node_vecs = NUM_INVALIDATE_TLB_VECTORS/nr_online_nodes;

Does this build without CONFIG_NUMA? AFAIK nr_online_nodes is only
defined for a NUMA kernel.

> +
> +static int tlb_cpuhp_notify(struct notifier_block *n,
> +		unsigned long action, void *hcpu)
> +{
> +	switch (action & 0xf) {
> +	case CPU_ONLINE:
> +	case CPU_DEAD:
> +		calculate_tlb_offset();

I still think the notifier is overkill and a static mapping at boot time
would be fine.

The rest looks ok to me.

-andi


* Re: [PATCH 2/2]x86: spread tlb flush vector between nodes
  2010-10-20  5:16 ` Eric Dumazet
@ 2010-10-20  7:31   ` Andi Kleen
  2010-10-20 11:20     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2010-10-20  7:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Shaohua Li, lkml, Ingo Molnar, hpa@zytor.com, Andi Kleen,
	Chen, Tim C

> Maybe we should have a per_node memory infrastructure, so that we can
> lower memory needs of currently per_cpu objects.

I have been looking at that; for a lot of things, per-core
data makes sense too.

Really a lot of the per CPU scaling we have today should be per core
or per node to avoid explosion.

But for this particular case it doesn't help because you still
need a mapping for each CPU.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH 2/2]x86: spread tlb flush vector between nodes
  2010-10-20  7:30 ` Andi Kleen
@ 2010-10-20  8:44   ` Shaohua Li
  0 siblings, 0 replies; 10+ messages in thread
From: Shaohua Li @ 2010-10-20  8:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml, Ingo Molnar, hpa@zytor.com, Chen, Tim C

On Wed, 2010-10-20 at 15:30 +0800, Andi Kleen wrote:
> Hi Shaohua,
> 
> > +	if (nr_online_nodes > NUM_INVALIDATE_TLB_VECTORS)
> > +		nr_node_vecs = 1;
> > +	else
> > +		nr_node_vecs = NUM_INVALIDATE_TLB_VECTORS/nr_online_nodes;
> 
> Does this build without CONFIG_NUMA? AFAIK nr_online_nodes is only
> defined for a numa kernel.
Yes, it builds fine without CONFIG_NUMA.

> > +
> > +static int tlb_cpuhp_notify(struct notifier_block *n,
> > +		unsigned long action, void *hcpu)
> > +{
> > +	switch (action & 0xf) {
> > +	case CPU_ONLINE:
> > +	case CPU_DEAD:
> > +		calculate_tlb_offset();
> 
> I still think the notifier is overkill and a static mapping at boot time
> would be fine.
The notifier hasn't much overhead, so please keep it. I'm afraid a static
mapping calculation that handles CPU hotplug would introduce more complexity.



* Re: [PATCH 2/2]x86: spread tlb flush vector between nodes
  2010-10-20  7:31   ` Andi Kleen
@ 2010-10-20 11:20     ` Peter Zijlstra
  2010-10-20 12:06       ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2010-10-20 11:20 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Eric Dumazet, Shaohua Li, lkml, Ingo Molnar, hpa@zytor.com,
	Chen, Tim C

On Wed, 2010-10-20 at 09:31 +0200, Andi Kleen wrote:
> Really a lot of the per CPU scaling we have today should be per core
> or per node to avoid explosion. 

Shouldn't that be per-cache instead of per-core?


* Re: [PATCH 2/2]x86: spread tlb flush vector between nodes
  2010-10-20 11:20     ` Peter Zijlstra
@ 2010-10-20 12:06       ` Andi Kleen
  2010-10-20 12:08         ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2010-10-20 12:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Eric Dumazet, Shaohua Li, lkml, Ingo Molnar,
	hpa@zytor.com, Chen, Tim C

On Wed, Oct 20, 2010 at 01:20:52PM +0200, Peter Zijlstra wrote:
> On Wed, 2010-10-20 at 09:31 +0200, Andi Kleen wrote:
> > Really a lot of the per CPU scaling we have today should be per core
> > or per node to avoid explosion. 
> 
> Shouldn't that be per-cache instead of per-core?

That's the same on modern x86:

per core = per L1/L2 cache
per node = per L3 cache

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH 2/2]x86: spread tlb flush vector between nodes
  2010-10-20 12:06       ` Andi Kleen
@ 2010-10-20 12:08         ` Peter Zijlstra
  2010-10-20 12:18           ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2010-10-20 12:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Eric Dumazet, Shaohua Li, lkml, Ingo Molnar, hpa@zytor.com,
	Chen, Tim C

On Wed, 2010-10-20 at 14:06 +0200, Andi Kleen wrote:
> On Wed, Oct 20, 2010 at 01:20:52PM +0200, Peter Zijlstra wrote:
> > On Wed, 2010-10-20 at 09:31 +0200, Andi Kleen wrote:
> > > Really a lot of the per CPU scaling we have today should be per core
> > > or per node to avoid explosion. 
> > 
> > Shouldn't that be per-cache instead of per-core?
> 
> That's the same on modern x86:

Last time I checked there's more than 1 directory in arch/


* Re: [PATCH 2/2]x86: spread tlb flush vector between nodes
  2010-10-20 12:08         ` Peter Zijlstra
@ 2010-10-20 12:18           ` Andi Kleen
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2010-10-20 12:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Eric Dumazet, Shaohua Li, lkml, Ingo Molnar,
	hpa@zytor.com, Chen, Tim C

On Wed, Oct 20, 2010 at 02:08:32PM +0200, Peter Zijlstra wrote:
> On Wed, 2010-10-20 at 14:06 +0200, Andi Kleen wrote:
> > On Wed, Oct 20, 2010 at 01:20:52PM +0200, Peter Zijlstra wrote:
> > > On Wed, 2010-10-20 at 09:31 +0200, Andi Kleen wrote:
> > > > Really a lot of the per CPU scaling we have today should be per core
> > > > or per node to avoid explosion. 
> > > 
> > > Shouldn't that be per-cache instead of per-core?
> > 
> > That's the same on modern x86:
> 
> Last time I checked there's more than 1 directory in arch/

Not sure what your point is? 

I believe non-x86 server processors have cache layouts similar to
the one I described, occasionally with another cache level,
and should do well with a similar setup.

For non-server parts it typically doesn't matter too much
because there are not enough cores.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* [tip:x86/mm] x86: Spread tlb flush vector between nodes
  2010-10-20  3:07 [PATCH 2/2]x86: spread tlb flush vector between nodes Shaohua Li
  2010-10-20  5:16 ` Eric Dumazet
  2010-10-20  7:30 ` Andi Kleen
@ 2010-10-20 23:07 ` tip-bot for Shaohua Li
  2 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Shaohua Li @ 2010-10-20 23:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, eric.dumazet, shaohua.li, tglx, hpa

Commit-ID:  932967202182743c01a2eee4bdfa2c42697bc586
Gitweb:     http://git.kernel.org/tip/932967202182743c01a2eee4bdfa2c42697bc586
Author:     Shaohua Li <shaohua.li@intel.com>
AuthorDate: Wed, 20 Oct 2010 11:07:03 +0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 20 Oct 2010 14:44:42 -0700

x86: Spread tlb flush vector between nodes

Currently, flush TLB vector allocation is based on the following equation:
	sender = smp_processor_id() % 8
This isn't optimal: CPUs from different nodes can end up with the same
vector, which causes a lot of lock contention. Instead, we can assign the
same vectors to CPUs from the same node, while different nodes get
different vectors. This has the following advantages:
a. if there is lock contention, it is between CPUs from one node. This
should be much cheaper than contention between nodes.
b. lock contention between nodes is avoided entirely. This especially
benefits kswapd, which is the biggest user of TLB flushes, since kswapd
sets its affinity to a specific node.

In my test, this reduces CPU overhead by more than 20% in the extreme
case. The test machine has 4 nodes and each node has 16 CPUs. I bind each
node's kswapd to the first CPU of the node and run a workload with 4
sequential mmap file read threads. The files are empty sparse files. This
workload triggers a lot of page reclaim and TLB flushing. The kswapd
binding makes it easy to trigger extreme TLB flush lock contention,
because otherwise kswapd keeps migrating between the CPUs of a node and I
can't get a stable result. Of course, real workloads won't always see
such heavy TLB flush lock contention, but it is possible.

[ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ]

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/tlb.c |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 47 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index c03f14a..4935848 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -5,6 +5,7 @@
 #include <linux/smp.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
+#include <linux/cpu.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
@@ -52,6 +53,8 @@ union smp_flush_state {
    want false sharing in the per cpu data segment. */
 static union smp_flush_state flush_state[NUM_INVALIDATE_TLB_VECTORS];
 
+static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
+
 /*
  * We cannot call mmdrop() because we are in interrupt context,
  * instead update mm->cpu_vm_mask.
@@ -173,7 +176,7 @@ static void flush_tlb_others_ipi(const struct cpumask *cpumask,
 	union smp_flush_state *f;
 
 	/* Caller has disabled preemption */
-	sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
+	sender = this_cpu_read(tlb_vector_offset);
 	f = &flush_state[sender];
 
 	/*
@@ -218,6 +221,47 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 	flush_tlb_others_ipi(cpumask, mm, va);
 }
 
+static void __cpuinit calculate_tlb_offset(void)
+{
+	int cpu, node, nr_node_vecs;
+	/*
+	 * We change tlb_vector_offset for each CPU at runtime, but this
+	 * will not cause inconsistency, as the write is atomic on x86. We
+	 * might see more lock contention for a short time, but once every
+	 * CPU's tlb_vector_offset has been updated, things return to normal.
+	 *
+	 * Note: if NUM_INVALIDATE_TLB_VECTORS % nr_online_nodes != 0, we
+	 * might waste some vectors.
+	 */
+	if (nr_online_nodes > NUM_INVALIDATE_TLB_VECTORS)
+		nr_node_vecs = 1;
+	else
+		nr_node_vecs = NUM_INVALIDATE_TLB_VECTORS/nr_online_nodes;
+
+	for_each_online_node(node) {
+		int node_offset = (node % NUM_INVALIDATE_TLB_VECTORS) *
+			nr_node_vecs;
+		int cpu_offset = 0;
+		for_each_cpu(cpu, cpumask_of_node(node)) {
+			per_cpu(tlb_vector_offset, cpu) = node_offset +
+				cpu_offset;
+			cpu_offset++;
+			cpu_offset = cpu_offset % nr_node_vecs;
+		}
+	}
+}
+
+static int tlb_cpuhp_notify(struct notifier_block *n,
+		unsigned long action, void *hcpu)
+{
+	switch (action & 0xf) {
+	case CPU_ONLINE:
+	case CPU_DEAD:
+		calculate_tlb_offset();
+	}
+	return NOTIFY_OK;
+}
+
 static int __cpuinit init_smp_flush(void)
 {
 	int i;
@@ -225,6 +269,8 @@ static int __cpuinit init_smp_flush(void)
 	for (i = 0; i < ARRAY_SIZE(flush_state); i++)
 		raw_spin_lock_init(&flush_state[i].tlbstate_lock);
 
+	calculate_tlb_offset();
+	hotcpu_notifier(tlb_cpuhp_notify, 0);
 	return 0;
 }
 core_initcall(init_smp_flush);

