* [v7 1/8] smp: introduce a generic on_each_cpu_mask function
[not found] <1327572121-13673-1-git-send-email-gilad@benyossef.com>
@ 2012-01-26 10:01 ` Gilad Ben-Yossef
2012-01-29 12:24 ` Gilad Ben-Yossef
2012-01-26 10:01 ` [v7 2/8] arm: move arm over to generic on_each_cpu_mask Gilad Ben-Yossef
` (5 subsequent siblings)
6 siblings, 1 reply; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-26 10:01 UTC (permalink / raw)
To: linux-kernel
Cc: Gilad Ben-Yossef, Chris Metcalf, Peter Zijlstra,
Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg,
Matt Mackall, Rik van Riel, Andi Kleen, Sasha Levin, Mel Gorman,
Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity,
Michal Nazarewicz, Kosaki Motohiro, Milton Miller
on_each_cpu_mask calls a function on processors specified by
cpumask, which may or may not include the local processor.
You must not call this function with disabled interrupts or
from a hardware interrupt handler or from a bottom half handler.
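To illustrate the intended use, here is a minimal, hypothetical
caller (not part of this patch; the per-cpu counter and both
helpers are made up for the example):

	#include <linux/smp.h>
	#include <linux/cpumask.h>
	#include <linux/percpu.h>

	static DEFINE_PER_CPU(unsigned long, local_events);

	/* Runs on every CPU in the mask, with interrupts disabled. */
	static void count_local(void *info)
	{
		__this_cpu_inc(local_events);
	}

	static void count_on_cpus(const struct cpumask *mask)
	{
		/* Process context, interrupts enabled, as documented above. */
		on_each_cpu_mask(mask, count_local, NULL, true);
	}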
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
CC: Michal Nazarewicz <mina86@mina86.org>
CC: Kosaki Motohiro <kosaki.motohiro@gmail.com>
CC: Milton Miller <miltonm@bga.com>
---
include/linux/smp.h | 22 ++++++++++++++++++++++
kernel/smp.c | 29 +++++++++++++++++++++++++++++
2 files changed, 51 insertions(+), 0 deletions(-)
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 8cc38d3..d0adb78 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -102,6 +102,13 @@ static inline void call_function_init(void) { }
int on_each_cpu(smp_call_func_t func, void *info, int wait);
/*
+ * Call a function on processors specified by mask, which might include
+ * the local one.
+ */
+void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
+ void *info, bool wait);
+
+/*
* Mark the boot cpu "online" so that it can call console drivers in
* printk() and can access its per-cpu storage.
*/
@@ -132,6 +139,21 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
local_irq_enable(); \
0; \
})
+/*
+ * Note we still need to test the mask even for UP
+ * because we actually can get an empty mask from
+ * code that on SMP might call us without the local
+ * CPU in the mask.
+ */
+#define on_each_cpu_mask(mask, func, info, wait) \
+ do { \
+ if (cpumask_test_cpu(0, (mask))) { \
+ local_irq_disable(); \
+ (func)(info); \
+ local_irq_enable(); \
+ } \
+ } while (0)
+
static inline void smp_send_reschedule(int cpu) { }
#define num_booting_cpus() 1
#define smp_prepare_boot_cpu() do {} while (0)
diff --git a/kernel/smp.c b/kernel/smp.c
index db197d6..a081e6c 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -701,3 +701,32 @@ int on_each_cpu(void (*func) (void *info), void *info, int wait)
return ret;
}
EXPORT_SYMBOL(on_each_cpu);
+
+/**
+ * on_each_cpu_mask(): Run a function on processors specified by
+ * cpumask, which may include the local processor.
+ * @mask: The set of cpus to run on (only runs on online subset).
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @wait: If true, wait (atomically) until function has completed
+ * on other CPUs.
+ *
+ * If @wait is true, then returns once @func has returned.
+ *
+ * You must not call this function with disabled interrupts or
+ * from a hardware interrupt handler or from a bottom half handler.
+ */
+void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
+ void *info, bool wait)
+{
+ int cpu = get_cpu();
+
+ smp_call_function_many(mask, func, info, wait);
+ if (cpumask_test_cpu(cpu, mask)) {
+ local_irq_disable();
+ func(info);
+ local_irq_enable();
+ }
+ put_cpu();
+}
+EXPORT_SYMBOL(on_each_cpu_mask);
--
1.7.0.4
* [v7 2/8] arm: move arm over to generic on_each_cpu_mask
[not found] <1327572121-13673-1-git-send-email-gilad@benyossef.com>
2012-01-26 10:01 ` [v7 1/8] smp: introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
@ 2012-01-26 10:01 ` Gilad Ben-Yossef
2012-01-26 10:01 ` [v7 3/8] tile: move tile to use " Gilad Ben-Yossef
` (4 subsequent siblings)
6 siblings, 0 replies; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-26 10:01 UTC (permalink / raw)
To: linux-kernel
Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
Russell King, Christoph Lameter, Chris Metcalf, linux-mm,
Pekka Enberg, Matt Mackall, Rik van Riel, Andi Kleen, Sasha Levin,
Mel Gorman, Andrew Morton, Alexander Viro, linux-fsdevel,
Avi Kivity, Michal Nazarewicz, Kosaki Motohiro, Milton Miller
Note that the generic version is a little different from the ARM one:
1. It has the mask as first parameter
2. It calls the function on the calling CPU with interrupts disabled,
but this should be OK since the function is called on the other CPUs
with interrupts disabled anyway.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
CC: Michal Nazarewicz <mina86@mina86.org>
CC: Kosaki Motohiro <kosaki.motohiro@gmail.com>
CC: Milton Miller <miltonm@bga.com>
---
arch/arm/kernel/smp_tlb.c | 20 +++++---------------
1 files changed, 5 insertions(+), 15 deletions(-)
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index 7dcb352..02c5d2c 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -13,18 +13,6 @@
#include <asm/smp_plat.h>
#include <asm/tlbflush.h>
-static void on_each_cpu_mask(void (*func)(void *), void *info, int wait,
- const struct cpumask *mask)
-{
- preempt_disable();
-
- smp_call_function_many(mask, func, info, wait);
- if (cpumask_test_cpu(smp_processor_id(), mask))
- func(info);
-
- preempt_enable();
-}
-
/**********************************************************************/
/*
@@ -87,7 +75,7 @@ void flush_tlb_all(void)
void flush_tlb_mm(struct mm_struct *mm)
{
if (tlb_ops_need_broadcast())
- on_each_cpu_mask(ipi_flush_tlb_mm, mm, 1, mm_cpumask(mm));
+ on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
else
local_flush_tlb_mm(mm);
}
@@ -98,7 +86,8 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
struct tlb_args ta;
ta.ta_vma = vma;
ta.ta_start = uaddr;
- on_each_cpu_mask(ipi_flush_tlb_page, &ta, 1, mm_cpumask(vma->vm_mm));
+ on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
+ &ta, 1);
} else
local_flush_tlb_page(vma, uaddr);
}
@@ -121,7 +110,8 @@ void flush_tlb_range(struct vm_area_struct *vma,
ta.ta_vma = vma;
ta.ta_start = start;
ta.ta_end = end;
- on_each_cpu_mask(ipi_flush_tlb_range, &ta, 1, mm_cpumask(vma->vm_mm));
+ on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range,
+ &ta, 1);
} else
local_flush_tlb_range(vma, start, end);
}
--
1.7.0.4
* [v7 3/8] tile: move tile to use generic on_each_cpu_mask
[not found] <1327572121-13673-1-git-send-email-gilad@benyossef.com>
2012-01-26 10:01 ` [v7 1/8] smp: introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
2012-01-26 10:01 ` [v7 2/8] arm: move arm over to generic on_each_cpu_mask Gilad Ben-Yossef
@ 2012-01-26 10:01 ` Gilad Ben-Yossef
2012-01-26 10:01 ` [v7 4/8] smp: add func to IPI cpus based on parameter func Gilad Ben-Yossef
` (3 subsequent siblings)
6 siblings, 0 replies; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-26 10:01 UTC (permalink / raw)
To: linux-kernel
Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
Matt Mackall, Rik van Riel, Andi Kleen, Sasha Levin, Mel Gorman,
Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity,
Michal Nazarewicz, Kosaki Motohiro
The API is the same as the tile private one, but the generic version
also calls the function on the local CPU with interrupts disabled in
the UP case. This is OK since the function is called on the other
CPUs with interrupts disabled anyway.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
CC: Michal Nazarewicz <mina86@mina86.org>
CC: Kosaki Motohiro <kosaki.motohiro@gmail.com>
---
arch/tile/include/asm/smp.h | 7 -------
arch/tile/kernel/smp.c | 19 -------------------
2 files changed, 0 insertions(+), 26 deletions(-)
diff --git a/arch/tile/include/asm/smp.h b/arch/tile/include/asm/smp.h
index 532124a..1aa759a 100644
--- a/arch/tile/include/asm/smp.h
+++ b/arch/tile/include/asm/smp.h
@@ -43,10 +43,6 @@ void evaluate_message(int tag);
/* Boot a secondary cpu */
void online_secondary(void);
-/* Call a function on a specified set of CPUs (may include this one). */
-extern void on_each_cpu_mask(const struct cpumask *mask,
- void (*func)(void *), void *info, bool wait);
-
/* Topology of the supervisor tile grid, and coordinates of boot processor */
extern HV_Topology smp_topology;
@@ -91,9 +87,6 @@ void print_disabled_cpus(void);
#else /* !CONFIG_SMP */
-#define on_each_cpu_mask(mask, func, info, wait) \
- do { if (cpumask_test_cpu(0, (mask))) func(info); } while (0)
-
#define smp_master_cpu 0
#define smp_height 1
#define smp_width 1
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index c52224d..a44e103 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -87,25 +87,6 @@ void send_IPI_allbutself(int tag)
send_IPI_many(&mask, tag);
}
-
-/*
- * Provide smp_call_function_mask, but also run function locally
- * if specified in the mask.
- */
-void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
- void *info, bool wait)
-{
- int cpu = get_cpu();
- smp_call_function_many(mask, func, info, wait);
- if (cpumask_test_cpu(cpu, mask)) {
- local_irq_disable();
- func(info);
- local_irq_enable();
- }
- put_cpu();
-}
-
-
/*
* Functions related to starting/stopping cpus.
*/
--
1.7.0.4
* [v7 4/8] smp: add func to IPI cpus based on parameter func
[not found] <1327572121-13673-1-git-send-email-gilad@benyossef.com>
` (2 preceding siblings ...)
2012-01-26 10:01 ` [v7 3/8] tile: move tile to use " Gilad Ben-Yossef
@ 2012-01-26 10:01 ` Gilad Ben-Yossef
2012-01-27 23:57 ` Andrew Morton
2012-01-26 10:01 ` [v7 5/8] slub: only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
` (2 subsequent siblings)
6 siblings, 1 reply; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-26 10:01 UTC (permalink / raw)
To: linux-kernel
Cc: Gilad Ben-Yossef, Chris Metcalf, Christoph Lameter,
Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm,
Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
Alexander Viro, linux-fsdevel, Avi Kivity, Michal Nazarewicz,
Kosaki Motohiro, Andrew Morton, Milton Miller
Add the on_each_cpu_cond() function that wraps on_each_cpu_mask()
and calculates the cpumask of cpus to IPI by calling a function supplied
as a parameter in order to determine whether to IPI each specific cpu.
The function works around allocation failure of the cpumask variable
in the CONFIG_CPUMASK_OFFSTACK=y case by iterating over the cpus and
sending an IPI one at a time via smp_call_function_single().
The function is useful since it allows one to separate the specific
code that decides in each case whether to IPI a specific cpu for
a specific request from the common boilerplate code of creating
the mask, handling failures etc.
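To illustrate, a hypothetical caller might look like this (not part
of this patch; the per-cpu pending_work flag and both callbacks are
made up for the example):

	#include <linux/smp.h>
	#include <linux/gfp.h>
	#include <linux/percpu.h>

	static DEFINE_PER_CPU(int, pending_work);

	/* Called with preemption disabled; decides whether @cpu gets an IPI. */
	static bool cpu_has_work(int cpu, void *info)
	{
		return per_cpu(pending_work, cpu) != 0;
	}

	/* Runs on each selected CPU with interrupts disabled. */
	static void do_work(void *info)
	{
		__this_cpu_write(pending_work, 0);
	}

	static void flush_pending_work(void)
	{
		/* Only the CPUs for which cpu_has_work() returned true are IPIed. */
		on_each_cpu_cond(cpu_has_work, do_work, NULL, true, GFP_ATOMIC);
	}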
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
CC: Michal Nazarewicz <mina86@mina86.com>
CC: Kosaki Motohiro <kosaki.motohiro@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Milton Miller <miltonm@bga.com>
---
include/linux/smp.h | 19 ++++++++++++++++
kernel/smp.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 77 insertions(+), 0 deletions(-)
diff --git a/include/linux/smp.h b/include/linux/smp.h
index d0adb78..e1ea702 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -109,6 +109,15 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
void *info, bool wait);
/*
+ * Call a function on each processor for which the supplied function
+ * cond_func returns a positive value. This may include the local
+ * processor.
+ */
+void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
+ smp_call_func_t func, void *info, bool wait,
+ gfp_t gfpflags);
+
+/*
* Mark the boot cpu "online" so that it can call console drivers in
* printk() and can access its per-cpu storage.
*/
@@ -153,6 +162,16 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
local_irq_enable(); \
} \
} while (0)
+#define on_each_cpu_cond(cond_func, func, info, wait, gfpflags) \
+ do { \
+ preempt_disable(); \
+ if (cond_func(0, info)) { \
+ local_irq_disable(); \
+ (func)(info); \
+ local_irq_enable(); \
+ } \
+ preempt_enable(); \
+ } while (0)
static inline void smp_send_reschedule(int cpu) { }
#define num_booting_cpus() 1
diff --git a/kernel/smp.c b/kernel/smp.c
index a081e6c..fa0912a 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -730,3 +730,61 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
put_cpu();
}
EXPORT_SYMBOL(on_each_cpu_mask);
+
+/*
+ * on_each_cpu_cond(): Call a function on each processor for which
+ * the supplied function cond_func returns true, optionally waiting
+ * for all the required CPUs to finish. This may include the local
+ * processor.
+ * @cond_func: A callback function that is passed a cpu id and
+ * the info parameter. The function is called
+ * with preemption disabled. The function should
+ * return a boolean value indicating whether to IPI
+ * the specified CPU.
+ * @func: The function to run on all applicable CPUs.
+ * This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to both functions.
+ * @wait: If true, wait (atomically) until function has
+ * completed on other CPUs.
+ * @gfpflags: GFP flags to use when allocating the cpumask
+ * used internally by the function.
+ *
+ * The function might sleep if the GFP flags indicate a non-
+ * atomic allocation is allowed.
+ *
+ * You must not call this function with disabled interrupts or
+ * from a hardware interrupt handler or from a bottom half handler.
+ */
+void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
+ smp_call_func_t func, void *info, bool wait,
+ gfp_t gfpflags)
+{
+ cpumask_var_t cpus;
+ int cpu, ret;
+
+ might_sleep_if(gfpflags & __GFP_WAIT);
+
+ if (likely(zalloc_cpumask_var(&cpus, (gfpflags|__GFP_NOWARN)))) {
+ preempt_disable();
+ for_each_online_cpu(cpu)
+ if (cond_func(cpu, info))
+ cpumask_set_cpu(cpu, cpus);
+ on_each_cpu_mask(cpus, func, info, wait);
+ preempt_enable();
+ free_cpumask_var(cpus);
+ } else {
+ /*
+ * No free cpumask, bother. No matter, we'll
+ * just have to IPI them one by one.
+ */
+ preempt_disable();
+ for_each_online_cpu(cpu)
+ if (cond_func(cpu, info)) {
+ ret = smp_call_function_single(cpu, func,
+ info, wait);
+ WARN_ON_ONCE(ret);
+ }
+ preempt_enable();
+ }
+}
+EXPORT_SYMBOL(on_each_cpu_cond);
--
1.7.0.4
* [v7 5/8] slub: only IPI CPUs that have per cpu obj to flush
[not found] <1327572121-13673-1-git-send-email-gilad@benyossef.com>
` (3 preceding siblings ...)
2012-01-26 10:01 ` [v7 4/8] smp: add func to IPI cpus based on parameter func Gilad Ben-Yossef
@ 2012-01-26 10:01 ` Gilad Ben-Yossef
2012-01-26 15:09 ` Christoph Lameter
2012-01-26 10:01 ` [v7 6/8] fs: only send IPI to invalidate LRU BH when needed Gilad Ben-Yossef
2012-01-26 10:02 ` [v7 7/8] mm: only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
6 siblings, 1 reply; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-26 10:01 UTC (permalink / raw)
To: linux-kernel
Cc: Gilad Ben-Yossef, Christoph Lameter, Chris Metcalf,
Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm,
Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
Mel Gorman, Andrew Morton, Alexander Viro, linux-fsdevel,
Avi Kivity, Michal Nazarewicz, Kosaki Motohiro, Milton Miller
flush_all() is called for each kmem_cache_destroy(). So every cache
being destroyed dynamically ends up sending an IPI to each CPU in the
system, regardless of whether the cache has ever been used there.
For example, if you close the InfiniBand ipath driver char device
file, the close file ops calls kmem_cache_destroy(). So running some
InfiniBand config tool on a single CPU dedicated to system tasks
might interrupt the rest of the 127 CPUs dedicated to some CPU
intensive or latency sensitive task.
I suspect there is a good chance that every line in the output of "git
grep kmem_cache_destroy linux/ | grep '\->'" has a similar scenario.
This patch attempts to rectify this issue by sending an IPI to flush
the per cpu objects back to the free lists only to CPUs that seem to
have such objects.
The check of which CPUs to IPI is racy, but we don't care, since
asking a CPU without per cpu objects to flush does no damage and,
as far as I can tell, flush_all by itself is racy against allocs
on remote CPUs anyway, so if you required flush_all to be
deterministic, you had to arrange for locking regardless.
Without this patch the following artificial test case:
$ cd /sys/kernel/slab
$ for DIR in *; do cat $DIR/alloc_calls > /dev/null; done
produces 166 IPIs on a cpuset-isolated CPU. With it, it produces none.
The memory allocation failure code path for the
CONFIG_CPUMASK_OFFSTACK=y config was tested using the fault
injection framework.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
CC: Michal Nazarewicz <mina86@mina86.org>
CC: Kosaki Motohiro <kosaki.motohiro@gmail.com>
CC: Milton Miller <miltonm@bga.com>
---
mm/slub.c | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 4907563..3d75f89 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2018,9 +2018,17 @@ static void flush_cpu_slab(void *d)
__flush_cpu_slab(s, smp_processor_id());
}
+static bool has_cpu_slab(int cpu, void *info)
+{
+ struct kmem_cache *s = info;
+ struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
+
+ return !!(c->page);
+}
+
static void flush_all(struct kmem_cache *s)
{
- on_each_cpu(flush_cpu_slab, s, 1);
+ on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1, GFP_ATOMIC);
}
/*
--
1.7.0.4
* [v7 6/8] fs: only send IPI to invalidate LRU BH when needed
[not found] <1327572121-13673-1-git-send-email-gilad@benyossef.com>
` (4 preceding siblings ...)
2012-01-26 10:01 ` [v7 5/8] slub: only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
@ 2012-01-26 10:01 ` Gilad Ben-Yossef
2012-01-26 10:02 ` [v7 7/8] mm: only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
6 siblings, 0 replies; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-26 10:01 UTC (permalink / raw)
To: linux-kernel
Cc: Gilad Ben-Yossef, Christoph Lameter, Chris Metcalf,
Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm,
Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
Mel Gorman, Andrew Morton, Alexander Viro, linux-fsdevel,
Avi Kivity, Michal Nazarewicz, Kosaki Motohiro, Milton Miller
In several code paths, such as when unmounting a file system (but
not only), we send an IPI to ask each cpu to invalidate its local
LRU BHs.
Multi-core systems can have many cpus with no LRU BHs, because
they are idle or have not performed any file system accesses since
the last invalidation (e.g. CPUs crunching numbers on high
performance computing nodes that write results to shared memory,
or CPUs using only filesystems that do not use the bh layer).
This can lead to a loss of performance each time someone switches
the KVM (the virtual keyboard and screen type, not the hypervisor)
if it has USB storage plugged in.
This patch attempts to only send an IPI to cpus that have LRU BHs.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
CC: Michal Nazarewicz <mina86@mina86.com>
CC: Kosaki Motohiro <kosaki.motohiro@gmail.com>
CC: Milton Miller <miltonm@bga.com>
---
fs/buffer.c | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 1a30db7..baa075e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1384,10 +1384,23 @@ static void invalidate_bh_lru(void *arg)
}
put_cpu_var(bh_lrus);
}
+
+static bool has_bh_in_lru(int cpu, void *dummy)
+{
+ struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu);
+ int i;
+ for (i = 0; i < BH_LRU_SIZE; i++) {
+ if (b->bhs[i])
+ return 1;
+ }
+
+ return 0;
+}
+
void invalidate_bh_lrus(void)
{
- on_each_cpu(invalidate_bh_lru, NULL, 1);
+ on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1, GFP_KERNEL);
}
EXPORT_SYMBOL_GPL(invalidate_bh_lrus);
--
1.7.0.4
* [v7 7/8] mm: only IPI CPUs to drain local pages if they exist
[not found] <1327572121-13673-1-git-send-email-gilad@benyossef.com>
` (5 preceding siblings ...)
2012-01-26 10:01 ` [v7 6/8] fs: only send IPI to invalidate LRU BH when needed Gilad Ben-Yossef
@ 2012-01-26 10:02 ` Gilad Ben-Yossef
2012-01-26 15:13 ` Christoph Lameter
` (2 more replies)
6 siblings, 3 replies; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-26 10:02 UTC (permalink / raw)
To: linux-kernel
Cc: Gilad Ben-Yossef, Mel Gorman, KOSAKI Motohiro, Christoph Lameter,
Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, Russell King,
linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
Andi Kleen, Andrew Morton, Alexander Viro, linux-fsdevel,
Avi Kivity, Michal Nazarewicz, Milton Miller
Calculate a cpumask of CPUs with per-cpu pages in any zone
and only send an IPI requesting CPUs to drain these pages
to the buddy allocator if they actually have pages when
asked to flush.
This patch saves 85%+ of the IPIs asking to drain per-cpu
pages in cases of severe memory pressure that lead to OOM,
since in these cases multiple, possibly concurrent,
allocation requests end up in the direct reclaim code path.
The per-cpu pages are reclaimed on the first allocation
failure, so for most of the subsequent allocation attempts,
until the memory pressure is off (possibly via the OOM
killer), there are no per-cpu pages on most CPUs (and there
can easily be hundreds of them).
This also has the side effect of shortening the average
latency of direct reclaim by one or more orders of magnitude,
since waiting for all the CPUs to ACK the IPI takes a
long time.
Tested by running "hackbench 400" on an 8 CPU x86 VM and
observing the difference between the number of direct
reclaim attempts that end up in drain_all_pages() and those
where more than 1/2 of the online CPUs had any per-cpu
pages in them, using the vmstat counters introduced in the
next patch in the series and using /proc/interrupts.
In the test scenario, this was seen to save around 3600 global
IPIs after triggering an OOM on a concurrent workload:
$ cat /proc/vmstat | tail -n 2
pcp_global_drain 0
pcp_global_ipi_saved 0
$ cat /proc/interrupts | grep CAL
CAL: 1 2 1 2
2 2 2 2 Function call interrupts
$ hackbench 400
[OOM messages snipped]
$ cat /proc/vmstat | tail -n 2
pcp_global_drain 3647
pcp_global_ipi_saved 3642
$ cat /proc/interrupts | grep CAL
CAL: 6 13 6 3
3 3 1 2 7 Function call interrupts
Please note that if the global drain is removed from the
direct reclaim path, as a patch from Mel Gorman currently
suggests, this should be replaced with an on_each_cpu_cond
invocation.
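For illustration only, such a replacement might look roughly like
this (a sketch; has_pcps() is a hypothetical callback and the final
form would depend on Mel's patch):

	/* True if @cpu has per-cpu pages in any populated zone. */
	static bool has_pcps(int cpu, void *info)
	{
		struct zone *zone;

		for_each_populated_zone(zone)
			if (per_cpu_ptr(zone->pageset, cpu)->pcp.count)
				return true;
		return false;
	}

	void drain_all_pages(void)
	{
		on_each_cpu_cond(has_pcps, drain_local_pages, NULL, true,
				 GFP_ATOMIC);
	}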
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
CC: Michal Nazarewicz <mina86@mina86.com>
CC: Milton Miller <miltonm@bga.com>
---
mm/page_alloc.c | 31 ++++++++++++++++++++++++++++++-
1 files changed, 30 insertions(+), 1 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d2186ec..4135983 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1165,7 +1165,36 @@ void drain_local_pages(void *arg)
*/
void drain_all_pages(void)
{
- on_each_cpu(drain_local_pages, NULL, 1);
+ int cpu;
+ struct per_cpu_pageset *pcp;
+ struct zone *zone;
+
+ /* Allocate in the BSS so we won't require allocation in
+ * the direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
+ */
+ static cpumask_t cpus_with_pcps;
+
+ /*
+ * We don't care about racing with CPU hotplug event
+ * as offline notification will cause the notified
+ * cpu to drain that CPU pcps and on_each_cpu_mask
+ * disables preemption as part of its processing
+ */
+ for_each_online_cpu(cpu) {
+ bool has_pcps = false;
+ for_each_populated_zone(zone) {
+ pcp = per_cpu_ptr(zone->pageset, cpu);
+ if (pcp->pcp.count) {
+ has_pcps = true;
+ break;
+ }
+ }
+ if (has_pcps)
+ cpumask_set_cpu(cpu, &cpus_with_pcps);
+ else
+ cpumask_clear_cpu(cpu, &cpus_with_pcps);
+ }
+ on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
}
#ifdef CONFIG_HIBERNATION
--
1.7.0.4
* Re: [v7 5/8] slub: only IPI CPUs that have per cpu obj to flush
2012-01-26 10:01 ` [v7 5/8] slub: only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
@ 2012-01-26 15:09 ` Christoph Lameter
0 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2012-01-26 15:09 UTC (permalink / raw)
To: Gilad Ben-Yossef
Cc: linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
Russell King, linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin,
Rik van Riel, Andi Kleen, Mel Gorman, Andrew Morton,
Alexander Viro, linux-fsdevel, Avi Kivity, Michal Nazarewicz,
Kosaki Motohiro, Milton Miller
On Thu, 26 Jan 2012, Gilad Ben-Yossef wrote:
> produces 166 IPIs on a cpuset-isolated CPU. With it, it produces none.
Acked-by: Christoph Lameter <cl@linux.com>
* Re: [v7 7/8] mm: only IPI CPUs to drain local pages if they exist
2012-01-26 10:02 ` [v7 7/8] mm: only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
@ 2012-01-26 15:13 ` Christoph Lameter
2012-01-28 0:12 ` Andrew Morton
2012-01-30 14:59 ` Mel Gorman
2 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2012-01-26 15:13 UTC (permalink / raw)
To: Gilad Ben-Yossef
Cc: linux-kernel, Mel Gorman, KOSAKI Motohiro, Chris Metcalf,
Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm,
Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity,
Michal Nazarewicz, Milton Miller
On Thu, 26 Jan 2012, Gilad Ben-Yossef wrote:
> Calculate a cpumask of CPUs with per-cpu pages in any zone
> and only send an IPI requesting CPUs to drain these pages
> to the buddy allocator if they actually have pages when
> asked to flush.
Acked-by: Christoph Lameter <cl@linux.com>
* Re: [v7 4/8] smp: add func to IPI cpus based on parameter func
2012-01-26 10:01 ` [v7 4/8] smp: add func to IPI cpus based on parameter func Gilad Ben-Yossef
@ 2012-01-27 23:57 ` Andrew Morton
2012-01-29 12:04 ` Gilad Ben-Yossef
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2012-01-27 23:57 UTC (permalink / raw)
To: Gilad Ben-Yossef
Cc: linux-kernel, Chris Metcalf, Christoph Lameter, Peter Zijlstra,
Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg,
Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
Alexander Viro, linux-fsdevel, Avi Kivity, Michal Nazarewicz,
Kosaki Motohiro, Milton Miller
On Thu, 26 Jan 2012 12:01:57 +0200
Gilad Ben-Yossef <gilad@benyossef.com> wrote:
> Add the on_each_cpu_cond() function that wraps on_each_cpu_mask()
> and calculates the cpumask of cpus to IPI by calling a function supplied
> as a parameter in order to determine whether to IPI each specific cpu.
>
> The function works around allocation failure of the cpumask variable
> in the CONFIG_CPUMASK_OFFSTACK=y case by iterating over the cpus and
> sending an IPI one at a time via smp_call_function_single().
>
> The function is useful since it allows one to separate the specific
> code that decides in each case whether to IPI a specific cpu for
> a specific request from the common boilerplate code of creating
> the mask, handling failures etc.
>
> ...
>
> @@ -153,6 +162,16 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
> local_irq_enable(); \
> } \
> } while (0)
> +#define on_each_cpu_cond(cond_func, func, info, wait, gfpflags) \
> + do { \
> + preempt_disable(); \
> + if (cond_func(0, info)) { \
> + local_irq_disable(); \
> + (func)(info); \
> + local_irq_enable(); \
Ordinarily, local_irq_enable() in such a low-level thing is dangerous,
because it can cause horrid bugs when called from local_irq_disable()d
code.
However I think we're OK here because it is a bug to call on_each_cpu()
and friends with local irqs disabled, yes?
Do we have any warnings printks if someone calls the ipi-sending
functions with local interrupts disabled? I didn't see any, but didn't
look very hard.
If my above claims are correct then why does on_each_cpu() use
local_irq_save()? hrm.
> + } \
> + preempt_enable(); \
> + } while (0)
>
> static inline void smp_send_reschedule(int cpu) { }
> #define num_booting_cpus() 1
> diff --git a/kernel/smp.c b/kernel/smp.c
> index a081e6c..fa0912a 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -730,3 +730,61 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
> put_cpu();
> }
> EXPORT_SYMBOL(on_each_cpu_mask);
> +
> +/*
> + * on_each_cpu_cond(): Call a function on each processor for which
> + * the supplied function cond_func returns true, optionally waiting
> + * for all the required CPUs to finish. This may include the local
> + * processor.
> + * @cond_func: A callback function that is passed a cpu id and
> + * the info parameter. The function is called
> + * with preemption disabled. The function should
> + * return a boolean value indicating whether to IPI
> + * the specified CPU.
> + * @func: The function to run on all applicable CPUs.
> + * This must be fast and non-blocking.
> + * @info: An arbitrary pointer to pass to both functions.
> + * @wait: If true, wait (atomically) until function has
> + * completed on other CPUs.
> + * @gfpflags: GFP flags to use when allocating the cpumask
> + * used internally by the function.
> + *
> + * The function might sleep if the GFP flags indicate a non-
> + * atomic allocation is allowed.
> + *
> + * You must not call this function with disabled interrupts or
> + * from a hardware interrupt handler or from a bottom half handler.
> + */
> +void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
> + smp_call_func_t func, void *info, bool wait,
> + gfp_t gfpflags)
bah.
z:/usr/src/linux-3.3-rc1> grep -r gfpflags . | wc -l
78
z:/usr/src/linux-3.3-rc1> grep -r gfp_flags . | wc -l
548
> +{
> + cpumask_var_t cpus;
> + int cpu, ret;
> +
> + might_sleep_if(gfpflags & __GFP_WAIT);
For the zalloc_cpumask_var(), it seems. I expect there are
might_sleep() elsewhere in the memory allocation paths, but putting one
here will detect bugs even if CONFIG_CPUMASK_OFFSTACK=n.
> + if (likely(zalloc_cpumask_var(&cpus, (gfpflags|__GFP_NOWARN)))) {
> + preempt_disable();
> + for_each_online_cpu(cpu)
> + if (cond_func(cpu, info))
> + cpumask_set_cpu(cpu, cpus);
> + on_each_cpu_mask(cpus, func, info, wait);
> + preempt_enable();
> + free_cpumask_var(cpus);
> + } else {
> + /*
> + * No free cpumask, bother. No matter, we'll
> + * just have to IPI them one by one.
> + */
> + preempt_disable();
> + for_each_online_cpu(cpu)
> + if (cond_func(cpu, info)) {
> + ret = smp_call_function_single(cpu, func,
> + info, wait);
> + WARN_ON_ONCE(ret);
> + }
> + preempt_enable();
> + }
> +}
> +EXPORT_SYMBOL(on_each_cpu_cond);
I assume the preempt_disable()s here are to suspend CPU hotplug?
* Re: [v7 7/8] mm: only IPI CPUs to drain local pages if they exist
2012-01-26 10:02 ` [v7 7/8] mm: only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
2012-01-26 15:13 ` Christoph Lameter
@ 2012-01-28 0:12 ` Andrew Morton
2012-01-29 12:18 ` Gilad Ben-Yossef
2012-01-30 14:59 ` Mel Gorman
2 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2012-01-28 0:12 UTC (permalink / raw)
To: Gilad Ben-Yossef
Cc: linux-kernel, Mel Gorman, KOSAKI Motohiro, Christoph Lameter,
Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, Russell King,
linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
Andi Kleen, Alexander Viro, linux-fsdevel, Avi Kivity,
Michal Nazarewicz, Milton Miller
On Thu, 26 Jan 2012 12:02:00 +0200
Gilad Ben-Yossef <gilad@benyossef.com> wrote:
> Calculate a cpumask of CPUs with per-cpu pages in any zone
> and only send an IPI requesting CPUs to drain these pages
> to the buddy allocator if they actually have pages when
> asked to flush.
>
> This patch saves 85%+ of the IPIs asking to drain per-cpu
> pages in cases of severe memory pressure that lead to OOM,
> since in these cases multiple, possibly concurrent,
> allocation requests end up in the direct reclaim code path.
> The per-cpu pages are reclaimed on the first allocation
> failure, so for most of the subsequent allocation attempts,
> until the memory pressure is off (possibly via the OOM
> killer), there are no per-cpu pages on most CPUs (and there
> can easily be hundreds of them).
>
> This also has the side effect of shortening the average
> latency of direct reclaim by one or more orders of magnitude,
> since waiting for all the CPUs to ACK the IPI takes a
> long time.
>
> Tested by running "hackbench 400" on an 8 CPU x86 VM and
> observing the difference between the number of direct
> reclaim attempts that end up in drain_all_pages() and those
> where more than 1/2 of the online CPUs had any per-cpu
> pages in them, using the vmstat counters introduced in the
> next patch in the series and using /proc/interrupts.
>
> In the test scenario, this was seen to save around 3600 global
> IPIs after triggering an OOM on a concurrent workload:
>
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1165,7 +1165,36 @@ void drain_local_pages(void *arg)
> */
> void drain_all_pages(void)
> {
> - on_each_cpu(drain_local_pages, NULL, 1);
> + int cpu;
> + struct per_cpu_pageset *pcp;
> + struct zone *zone;
> +
> + /* Allocate in the BSS so we won't require allocation in
> + * the direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
> + */
> + static cpumask_t cpus_with_pcps;
> +
> + /*
> + * We don't care about racing with CPU hotplug event
> + * as offline notification will cause the notified
> + * cpu to drain that CPU pcps and on_each_cpu_mask
> + * disables preemption as part of its processing
> + */
hmmm.
> + for_each_online_cpu(cpu) {
> + bool has_pcps = false;
> + for_each_populated_zone(zone) {
> + pcp = per_cpu_ptr(zone->pageset, cpu);
> + if (pcp->pcp.count) {
> + has_pcps = true;
> + break;
> + }
> + }
> + if (has_pcps)
> + cpumask_set_cpu(cpu, &cpus_with_pcps);
> + else
> + cpumask_clear_cpu(cpu, &cpus_with_pcps);
> + }
> + on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
> }
Can we end up sending an IPI to a now-unplugged CPU? That won't work
very well if that CPU is now sitting on its sysadmin's desk.
There's also the case of CPU online. We could end up failing to IPI a
CPU which now has some percpu pages. That's not at all serious - 90%
is good enough in page reclaim. But this thinking merits a mention in
the comment. Or we simply make this code hotplug-safe.
* Re: [v7 4/8] smp: add func to IPI cpus based on parameter func
2012-01-27 23:57 ` Andrew Morton
@ 2012-01-29 12:04 ` Gilad Ben-Yossef
0 siblings, 0 replies; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-29 12:04 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, Chris Metcalf, Christoph Lameter, Peter Zijlstra,
Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg,
Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
Alexander Viro, linux-fsdevel, Avi Kivity, Michal Nazarewicz,
Kosaki Motohiro, Milton Miller
On Sat, Jan 28, 2012 at 1:57 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 26 Jan 2012 12:01:57 +0200
> Gilad Ben-Yossef <gilad@benyossef.com> wrote:
...
>>
>> @@ -153,6 +162,16 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
>> local_irq_enable(); \
>> } \
>> } while (0)
>> +#define on_each_cpu_cond(cond_func, func, info, wait, gfpflags) \
>> + do { \
>> + preempt_disable(); \
>> + if (cond_func(0, info)) { \
>> + local_irq_disable(); \
>> + (func)(info); \
>> + local_irq_enable(); \
>
> Ordinarily, local_irq_enable() in such a low-level thing is dangerous,
> because it can cause horrid bugs when called from local_irq_disable()d
> code.
>
> However I think we're OK here because it is a bug to call on_each_cpu()
> and friends with local irqs disabled, yes?
Yes, that is my understanding and this way the function gets called in
the same conditions in UP and SMP.
> Do we have any warnings printks if someone calls the ipi-sending
> functions with local interrupts disabled? I didn't see any, but didn't
> look very hard.
There is this check in smp_call_function_many():
WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
&& !oops_in_progress && !early_boot_irqs_disabled);
Only catches SMP offenders though.
> If my above claims are correct then why does on_each_cpu() use
> local_irq_save()? hrm.
The comment in on_each_cpu() in kernel/smp.c says: "May be
used during early boot while early_boot_irqs_disabled is set. Use
local_irq_save/restore() instead of local_irq_disable/enable()."
>
>> + } \
>> + preempt_enable(); \
>> + } while (0)
>>
>> static inline void smp_send_reschedule(int cpu) { }
>> #define num_booting_cpus() 1
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index a081e6c..fa0912a 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -730,3 +730,61 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
>> put_cpu();
>> }
>> EXPORT_SYMBOL(on_each_cpu_mask);
>> +
>> +/*
>> + * on_each_cpu_cond(): Call a function on each processor for which
>> + * the supplied function cond_func returns true, optionally waiting
>> + * for all the required CPUs to finish. This may include the local
>> + * processor.
>> + * @cond_func: A callback function that is passed a cpu id and
>> + * the info parameter. The function is called
>> + * with preemption disabled. The function should
>> + * return a boolean value indicating whether to IPI
>> + * the specified CPU.
>> + * @func: The function to run on all applicable CPUs.
>> + * This must be fast and non-blocking.
>> + * @info: An arbitrary pointer to pass to both functions.
>> + * @wait: If true, wait (atomically) until function has
>> + * completed on other CPUs.
>> + * @gfpflags: GFP flags to use when allocating the cpumask
>> + * used internally by the function.
>> + *
>> + * The function might sleep if the GFP flags indicate a non-
>> + * atomic allocation is allowed.
>> + *
>> + * You must not call this function with disabled interrupts or
>> + * from a hardware interrupt handler or from a bottom half handler.
>> + */
>> +void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
>> + smp_call_func_t func, void *info, bool wait,
>> + gfp_t gfpflags)
>
> bah.
>
> z:/usr/src/linux-3.3-rc1> grep -r gfpflags . | wc -l
> 78
> z:/usr/src/linux-3.3-rc1> grep -r gfp_flags . | wc -l
> 548
>
I have no specific preference. Should I switch?
>> +{
>> + cpumask_var_t cpus;
>> + int cpu, ret;
>> +
>> + might_sleep_if(gfpflags & __GFP_WAIT);
>
> For the zalloc_cpumask_var(), it seems. I expect there are
> might_sleep() elsewhere in the memory allocation paths, but putting one
> here will detect bugs even if CONFIG_CPUMASK_OFFSTACK=n.
Well, yes, although I didn't think about that :-)
>
>> + if (likely(zalloc_cpumask_var(&cpus, (gfpflags|__GFP_NOWARN)))) {
>> + preempt_disable();
>> + for_each_online_cpu(cpu)
>> + if (cond_func(cpu, info))
>> + cpumask_set_cpu(cpu, cpus);
>> + on_each_cpu_mask(cpus, func, info, wait);
>> + preempt_enable();
>> + free_cpumask_var(cpus);
>> + } else {
>> + /*
>> + * No free cpumask, bother. No matter, we'll
>> + * just have to IPI them one by one.
>> + */
>> + preempt_disable();
>> + for_each_online_cpu(cpu)
>> + if (cond_func(cpu, info)) {
>> + ret = smp_call_function_single(cpu, func,
>> + info, wait);
>> + WARN_ON_ONCE(ret);
>> + }
>> + preempt_enable();
>> + }
>> +}
>> +EXPORT_SYMBOL(on_each_cpu_cond);
>
> I assume the preempt_disable()s here are to suspend CPU hotplug?
Yes. Also, I figured that since the original code disabled
preemption for the entire on_each_cpu run time, including waiting for all
the CPUs to ack the IPI, and since we (hopefully) wait for fewer CPUs, the
overall runtime with preemption disabled will (usually) be lower than with
the original code, and we'll get a more robust interface.
Thanks,
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com
"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML
* Re: [v7 7/8] mm: only IPI CPUs to drain local pages if they exist
2012-01-28 0:12 ` Andrew Morton
@ 2012-01-29 12:18 ` Gilad Ben-Yossef
2012-01-30 21:49 ` Andrew Morton
0 siblings, 1 reply; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-29 12:18 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, Mel Gorman, KOSAKI Motohiro, Christoph Lameter,
Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, Russell King,
linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
Andi Kleen, Alexander Viro, linux-fsdevel, Avi Kivity,
Michal Nazarewicz, Milton Miller
On Sat, Jan 28, 2012 at 2:12 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 26 Jan 2012 12:02:00 +0200
> Gilad Ben-Yossef <gilad@benyossef.com> wrote:
>
>> Calculate a cpumask of CPUs with per-cpu pages in any zone
>> and only send an IPI requesting CPUs to drain these pages
>> to the buddy allocator if they actually have pages when
>> asked to flush.
>>
...
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1165,7 +1165,36 @@ void drain_local_pages(void *arg)
>> */
>> void drain_all_pages(void)
>> {
>> - on_each_cpu(drain_local_pages, NULL, 1);
>> + int cpu;
>> + struct per_cpu_pageset *pcp;
>> + struct zone *zone;
>> +
>> + /* Allocate in the BSS so we won't require allocation in
>> + * the direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
>> + */
>> + static cpumask_t cpus_with_pcps;
>> +
>> + /*
>> + * We don't care about racing with CPU hotplug event
>> + * as offline notification will cause the notified
>> + * cpu to drain that CPU pcps and on_each_cpu_mask
>> + * disables preemption as part of its processing
>> + */
>
> hmmm.
>
>> + for_each_online_cpu(cpu) {
>> + bool has_pcps = false;
>> + for_each_populated_zone(zone) {
>> + pcp = per_cpu_ptr(zone->pageset, cpu);
>> + if (pcp->pcp.count) {
>> + has_pcps = true;
>> + break;
>> + }
>> + }
>> + if (has_pcps)
>> + cpumask_set_cpu(cpu, &cpus_with_pcps);
>> + else
>> + cpumask_clear_cpu(cpu, &cpus_with_pcps);
>> + }
>> + on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
>> }
>
> Can we end up sending an IPI to a now-unplugged CPU? That won't work
> very well if that CPU is now sitting on its sysadmin's desk.
Nope. on_each_cpu_mask() disables preemption and calls smp_call_function_many(),
which then checks the mask against the cpu_online_mask.
> There's also the case of CPU online. We could end up failing to IPI a
> CPU which now has some percpu pages. That's not at all serious - 90%
> is good enough in page reclaim. But this thinking merits a mention in
> the comment. Or we simply make this code hotplug-safe.
hmm.. I'm probably daft but I don't see how to make the code hotplug safe for
the CPU online case. I mean, let's say we disable preemption throughout the
entire ordeal and then the CPU goes online and gets itself some percpu pages
*after* we've calculated the masks, sent the IPIs and waited for the whole
thing to finish but before we've returned...
I might be missing something here, but I think that unless you have some other
means to stop newly hotplugged CPUs from grabbing per cpu pages there is
nothing you can do in this code to stop it. Maybe make the race window
shorter, that's all.
Would adding a comment such as the following be OK?
"This code is protected against sending an IPI to an offline CPU but does not
guarantee sending an IPI to newly hotplugged CPUs"
Thanks,
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com
"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML
* Re: [v7 1/8] smp: introduce a generic on_each_cpu_mask function
2012-01-26 10:01 ` [v7 1/8] smp: introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
@ 2012-01-29 12:24 ` Gilad Ben-Yossef
2012-01-30 21:52 ` Andrew Morton
0 siblings, 1 reply; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-29 12:24 UTC (permalink / raw)
To: Andrew Morton
Cc: Gilad Ben-Yossef, Chris Metcalf, Peter Zijlstra,
Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg,
Matt Mackall, Rik van Riel, Andi Kleen, Sasha Levin, Mel Gorman,
Alexander Viro, linux-fsdevel, Avi Kivity, Michal Nazarewicz,
Kosaki Motohiro, Milton Miller, linux-kernel
On Thu, Jan 26, 2012 at 12:01 PM, Gilad Ben-Yossef <gilad@benyossef.com> wrote:
> on_each_cpu_mask calls a function on processors specified by
> cpumask, which may or may not include the local processor.
>
> You must not call this function with disabled interrupts or
> from a hardware interrupt handler or from a bottom half handler.
>
> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
> Reviewed-by: Christoph Lameter <cl@linux.com>
> CC: Chris Metcalf <cmetcalf@tilera.com>
...
> ---
> include/linux/smp.h | 22 ++++++++++++++++++++++
> kernel/smp.c | 29 +++++++++++++++++++++++++++++
> 2 files changed, 51 insertions(+), 0 deletions(-)
>
Milton made the very sensible comment that while adding on_each_cpu_mask()
in the arch generic code and removing the two arch specific instances from
tile and arm in separate patches is good for review, it will break bisect.
He suggested I squash them into a single commit when it goes in.
Since you picked the patch set into linux-mm, would now be a good time for that?
Thanks,
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com
"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML
* Re: [v7 7/8] mm: only IPI CPUs to drain local pages if they exist
2012-01-26 10:02 ` [v7 7/8] mm: only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
2012-01-26 15:13 ` Christoph Lameter
2012-01-28 0:12 ` Andrew Morton
@ 2012-01-30 14:59 ` Mel Gorman
2012-01-30 15:14 ` Gilad Ben-Yossef
2 siblings, 1 reply; 21+ messages in thread
From: Mel Gorman @ 2012-01-30 14:59 UTC (permalink / raw)
To: Gilad Ben-Yossef
Cc: linux-kernel, KOSAKI Motohiro, Christoph Lameter, Chris Metcalf,
Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm,
Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity,
Michal Nazarewicz, Milton Miller
On Thu, Jan 26, 2012 at 12:02:00PM +0200, Gilad Ben-Yossef wrote:
> Calculate a cpumask of CPUs with per-cpu pages in any zone
> and only send an IPI requesting CPUs to drain these pages
> to the buddy allocator if they actually have pages when
> asked to flush.
>
> This patch saves 85%+ of the IPIs asking to drain per-cpu
> pages in cases of severe memory pressure that lead to OOM,
> since in these cases multiple, possibly concurrent,
> allocation requests end up in the direct reclaim code path.
> The per-cpu pages are reclaimed on the first allocation
> failure, so for most of the subsequent allocation attempts,
> until the memory pressure is off (possibly via the OOM
> killer), there are no per-cpu pages on most CPUs (and there
> can easily be hundreds of them).
>
> This also has the side effect of shortening the average
> latency of direct reclaim by one or more orders of magnitude,
> since waiting for all the CPUs to ACK the IPI takes a
> long time.
>
> Tested by running "hackbench 400" on an 8 CPU x86 VM and
> observing the difference between the number of direct
> reclaim attempts that end up in drain_all_pages() and those
> where more than 1/2 of the online CPUs had any per-cpu
> pages in them, using the vmstat counters introduced in the
> next patch in the series and using /proc/interrupts.
>
> In the test scenario, this was seen to save around 3600 global
> IPIs after triggering an OOM on a concurrent workload:
>
> $ cat /proc/vmstat | tail -n 2
> pcp_global_drain 0
> pcp_global_ipi_saved 0
>
> $ cat /proc/interrupts | grep CAL
> CAL: 1 2 1 2
> 2 2 2 2 Function call interrupts
>
> $ hackbench 400
> [OOM messages snipped]
>
> $ cat /proc/vmstat | tail -n 2
> pcp_global_drain 3647
> pcp_global_ipi_saved 3642
>
> $ cat /proc/interrupts | grep CAL
> CAL: 6 13 6 3
> 3 3 1 2 7 Function call interrupts
>
> Please note that if the global drain is removed from the
> direct reclaim path, as a patch from Mel Gorman currently
> suggests, this should be replaced with an on_each_cpu_cond
> invocation.
>
> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
> CC: Mel Gorman <mel@csn.ul.ie>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Christoph Lameter <cl@linux.com>
> CC: Chris Metcalf <cmetcalf@tilera.com>
> CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: linux-mm@kvack.org
> CC: Pekka Enberg <penberg@kernel.org>
> CC: Matt Mackall <mpm@selenic.com>
> CC: Sasha Levin <levinsasha928@gmail.com>
> CC: Rik van Riel <riel@redhat.com>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Alexander Viro <viro@zeniv.linux.org.uk>
> CC: linux-fsdevel@vger.kernel.org
> CC: Avi Kivity <avi@redhat.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> CC: Milton Miller <miltonm@bga.com>
> ---
> mm/page_alloc.c | 31 ++++++++++++++++++++++++++++++-
> 1 files changed, 30 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d2186ec..4135983 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1165,7 +1165,36 @@ void drain_local_pages(void *arg)
> */
> void drain_all_pages(void)
> {
> - on_each_cpu(drain_local_pages, NULL, 1);
> + int cpu;
> + struct per_cpu_pageset *pcp;
> + struct zone *zone;
> +
> + /* Allocate in the BSS so we won't require allocation in
> + * direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
> + */
> + static cpumask_t cpus_with_pcps;
> +
> + /*
> + * We don't care about racing with CPU hotplug event
> + * as offline notification will cause the notified
> + * cpu to drain that CPU pcps and on_each_cpu_mask
> + * disables preemption as part of its processing
> + */
> + for_each_online_cpu(cpu) {
> + bool has_pcps = false;
> + for_each_populated_zone(zone) {
> + pcp = per_cpu_ptr(zone->pageset, cpu);
> + if (pcp->pcp.count) {
> + has_pcps = true;
> + break;
> + }
> + }
> + if (has_pcps)
> + cpumask_set_cpu(cpu, &cpus_with_pcps);
> + else
> + cpumask_clear_cpu(cpu, &cpus_with_pcps);
> + }
Let's take two CPUs running this code at the same time. CPU 1 has per-cpu
pages in all zones. CPU 2 has no per-cpu pages in any zone. If both run
at the same time, CPU 2 can be clearing the mask for CPU 1 before it has
had a chance to send the IPI. This means we'll miss sending IPIs to CPUs
that we intended to. As I was willing to send no IPI at all;
Acked-by: Mel Gorman <mel@csn.ul.ie>
But if this gets another revision, add a comment saying that two CPUs
can interfere with each other running at the same time but we don't
care.
> + on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
> }
>
> #ifdef CONFIG_HIBERNATION
> --
> 1.7.0.4
>
--
Mel Gorman
SUSE Labs
* Re: [v7 7/8] mm: only IPI CPUs to drain local pages if they exist
2012-01-30 14:59 ` Mel Gorman
@ 2012-01-30 15:14 ` Gilad Ben-Yossef
2012-01-30 15:44 ` Mel Gorman
0 siblings, 1 reply; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-30 15:14 UTC (permalink / raw)
To: Mel Gorman
Cc: linux-kernel, KOSAKI Motohiro, Christoph Lameter, Chris Metcalf,
Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm,
Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity,
Michal Nazarewicz, Milton Miller
On Mon, Jan 30, 2012 at 4:59 PM, Mel Gorman <mel@csn.ul.ie> wrote:
> On Thu, Jan 26, 2012 at 12:02:00PM +0200, Gilad Ben-Yossef wrote:
>> Calculate a cpumask of CPUs with per-cpu pages in any zone
>> and only send an IPI requesting CPUs to drain these pages
>> to the buddy allocator if they actually have pages when
>> asked to flush.
>>
...
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index d2186ec..4135983 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1165,7 +1165,36 @@ void drain_local_pages(void *arg)
>> */
>> void drain_all_pages(void)
>> {
>> - on_each_cpu(drain_local_pages, NULL, 1);
>> + int cpu;
>> + struct per_cpu_pageset *pcp;
>> + struct zone *zone;
>> +
>> + /* Allocate in the BSS so we won't require allocation in
>> + * direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
>> + */
>> + static cpumask_t cpus_with_pcps;
>> +
>> + /*
>> + * We don't care about racing with CPU hotplug event
>> + * as offline notification will cause the notified
>> + * cpu to drain that CPU pcps and on_each_cpu_mask
>> + * disables preemption as part of its processing
>> + */
>> + for_each_online_cpu(cpu) {
>> + bool has_pcps = false;
>> + for_each_populated_zone(zone) {
>> + pcp = per_cpu_ptr(zone->pageset, cpu);
>> + if (pcp->pcp.count) {
>> + has_pcps = true;
>> + break;
>> + }
>> + }
>> + if (has_pcps)
>> + cpumask_set_cpu(cpu, &cpus_with_pcps);
>> + else
>> + cpumask_clear_cpu(cpu, &cpus_with_pcps);
>> + }
>
> Let's take two CPUs running this code at the same time. CPU 1 has per-cpu
> pages in all zones. CPU 2 has no per-cpu pages in any zone. If both run
> at the same time, CPU 2 can be clearing the mask for CPU 1 before it has
> had a chance to send the IPI. This means we'll miss sending IPIs to CPUs
> that we intended to.
I'm confused. You seem to be assuming that each CPU is looking at its own pcps
only (per zone). Assuming no change in the state of the pcps when both CPUs
run this code at the same time, both of them should mark the bits for the same
set of CPUs, unless one of them raced and managed to send the IPI to clear
pcps from the other, at which point you might see one of them send a spurious
IPI to drain pcps that have already been drained - but that isn't bad.
At least, that is what I meant the code to do and what I believe it does.
What have I missed?
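To make that concrete, here is the worst case I can think of (a hypothetical
two-CPU example; CPU 1 starts with pcps in some zone, CPU 2 with none):

  CPU 1                                CPU 2
  writes cpus_with_pcps = {1}          writes cpus_with_pcps = {1}
  drains its own pcps locally
                                       IPIs CPU 1, which has nothing left to drain

Since every caller walks *all* online CPUs, concurrent callers write the same
value for each bit of the shared static mask; the mask only goes stale when a
drain completes in between, and then the worst outcome is a spurious IPI.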
> As I was willing to send no IPI at all;
>
> Acked-by: Mel Gorman <mel@csn.ul.ie>
Thank you for the review and the ACK :-)
>
> But if this gets another revision, add a comment saying that two CPUs
> can interfere with each other running at the same time but we don't
> care.
>
>> + on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
>> }
>>
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com
"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML
* Re: [v7 7/8] mm: only IPI CPUs to drain local pages if they exist
2012-01-30 15:14 ` Gilad Ben-Yossef
@ 2012-01-30 15:44 ` Mel Gorman
0 siblings, 0 replies; 21+ messages in thread
From: Mel Gorman @ 2012-01-30 15:44 UTC (permalink / raw)
To: Gilad Ben-Yossef
Cc: linux-kernel, KOSAKI Motohiro, Christoph Lameter, Chris Metcalf,
Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm,
Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity,
Michal Nazarewicz, Milton Miller
On Mon, Jan 30, 2012 at 05:14:37PM +0200, Gilad Ben-Yossef wrote:
> >> + for_each_online_cpu(cpu) {
> >> + bool has_pcps = false;
> >> + for_each_populated_zone(zone) {
> >> + pcp = per_cpu_ptr(zone->pageset, cpu);
> >> + if (pcp->pcp.count) {
> >> + has_pcps = true;
> >> + break;
> >> + }
> >> + }
> >> + if (has_pcps)
> >> + cpumask_set_cpu(cpu, &cpus_with_pcps);
> >> + else
> >> + cpumask_clear_cpu(cpu, &cpus_with_pcps);
> >> + }
> >
> > Let's take two CPUs running this code at the same time. CPU 1 has per-cpu
> > pages in all zones. CPU 2 has no per-cpu pages in any zone. If both run
> > at the same time, CPU 2 can be clearing the mask for CPU 1 before it has
> > had a chance to send the IPI. This means we'll miss sending IPIs to CPUs
> > that we intended to.
>
> I'm confused. You seem to be assuming that each CPU is looking at its own pcps
> only (per zone).
/me slaps self
I was assuming exactly this.
> Assuming no change in the state of the pcps when both CPUs
> run this code at the same time, both of them should mark the bits for the
> same set of CPUs, unless one of them raced and managed to send the IPI to
> clear pcps from the other, at which point you might see one of them send a
> spurious IPI to drain pcps that have already been drained - but that isn't
> bad.
>
Indeed, the race is tiny and the consequences are not important.
> At least, that is what I meant the code to do and what I believe it does.
> What have I missed?
>
Nothing, the problem was on my side. Sorry for the noise.
--
Mel Gorman
SUSE Labs
* Re: [v7 7/8] mm: only IPI CPUs to drain local pages if they exist
2012-01-29 12:18 ` Gilad Ben-Yossef
@ 2012-01-30 21:49 ` Andrew Morton
2012-01-31 6:32 ` Gilad Ben-Yossef
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2012-01-30 21:49 UTC (permalink / raw)
To: Gilad Ben-Yossef
Cc: linux-kernel, Mel Gorman, KOSAKI Motohiro, Christoph Lameter,
Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, Russell King,
linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
Andi Kleen, Alexander Viro, linux-fsdevel, Avi Kivity,
Michal Nazarewicz, Milton Miller
On Sun, 29 Jan 2012 14:18:32 +0200
Gilad Ben-Yossef <gilad@benyossef.com> wrote:
> On Sat, Jan 28, 2012 at 2:12 AM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Thu, 26 Jan 2012 12:02:00 +0200
> > Gilad Ben-Yossef <gilad@benyossef.com> wrote:
> >
> >> Calculate a cpumask of CPUs with per-cpu pages in any zone
> >> and only send an IPI requesting CPUs to drain these pages
> >> to the buddy allocator if they actually have pages when
> >> asked to flush.
> >>
>
> ...
>
> > Can we end up sending an IPI to a now-unplugged CPU?  That won't work
> > very well if that CPU is now sitting on its sysadmin's desk.
>
> Nope. on_each_cpu_mask() disables preemption and calls smp_call_function_many()
> which then checks the mask against the cpu_online_mask.
OK.
General rule of thumb: if a reviewer asked something then it is likely
that others will wonder the same thing when reading the code later on.
So consider reviewer questions as a sign that the code needs additional
comments!
> > There's also the case of CPU online.  We could end up failing to IPI a
> > CPU which now has some percpu pages.  That's not at all serious - 90%
> > is good enough in page reclaim.  But this thinking merits a mention in
> > the comment.  Or we simply make this code hotplug-safe.
>
> hmm.. I'm probably daft, but I don't see how to make the code hotplug-safe
> for the CPU online case. I mean, let's say we disable preemption throughout
> the entire ordeal, and then a CPU goes online and gets itself some percpu
> pages *after* we've calculated the masks, sent the IPIs and waited for the
> whole thing to finish, but before we've returned...
This is inherent to the whole drain-pages design - it's only a
best-effort thing and there's nothing to prevent other CPUs from
undoing your work 2 nanoseconds later.
The exception to this is the case of suspend, which drains the queues
when all tasks (and, hopefully, IRQs) have been frozen. This is the
only way to make draining 100% "reliable".
> I might be missing something here, but I think that unless you have some
> other means to stop newly hotplugged CPUs from grabbing per-cpu pages,
> there is nothing you can do in this code to stop it. Maybe make the race
> window shorter, that's all.
>
> Would adding a comment such as the following be OK?
>
> "This code is protected against sending an IPI to an offline CPU but does not
> guarantee sending an IPI to newly hotplugged CPUs"
Looks OK. I'd mention *how* this protection comes about:
on_each_cpu_mask() blocks hotplug and won't talk to offlined CPUs.
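For reference, here is a sketch of the kernel/smp.c half of the patch under
discussion (reconstructed for illustration; patch 1/8 is authoritative) -
get_cpu() disables preemption, which is what keeps hotplug at bay, and
smp_call_function_many() only ever talks to online CPUs:

void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
			void *info, bool wait)
{
	int cpu = get_cpu();	/* disable preemption, blocking hotplug */

	/* IPI every online CPU in mask, except ourselves */
	smp_call_function_many(mask, func, info, wait);
	if (cpumask_test_cpu(cpu, mask)) {
		/* run locally with IRQs off, mirroring the IPI context */
		local_irq_disable();
		func(info);
		local_irq_enable();
	}
	put_cpu();
}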
* Re: [v7 1/8] smp: introduce a generic on_each_cpu_mask function
2012-01-29 12:24 ` Gilad Ben-Yossef
@ 2012-01-30 21:52 ` Andrew Morton
2012-01-31 6:33 ` Gilad Ben-Yossef
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2012-01-30 21:52 UTC (permalink / raw)
To: Gilad Ben-Yossef
Cc: Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, Russell King,
linux-mm, Pekka Enberg, Matt Mackall, Rik van Riel, Andi Kleen,
Sasha Levin, Mel Gorman, Alexander Viro, linux-fsdevel,
Avi Kivity, Michal Nazarewicz, Kosaki Motohiro, Milton Miller,
linux-kernel
On Sun, 29 Jan 2012 14:24:16 +0200
Gilad Ben-Yossef <gilad@benyossef.com> wrote:
> On Thu, Jan 26, 2012 at 12:01 PM, Gilad Ben-Yossef <gilad@benyossef.com> wrote:
> > on_each_cpu_mask calls a function on processors specified by
> > cpumask, which may or may not include the local processor.
> >
> > You must not call this function with disabled interrupts or
> > from a hardware interrupt handler or from a bottom half handler.
> >
> > Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
> > Reviewed-by: Christoph Lameter <cl@linux.com>
> > CC: Chris Metcalf <cmetcalf@tilera.com>
> ...
> > ---
> >  include/linux/smp.h |   22 ++++++++++++++++++++++
> >  kernel/smp.c        |   29 +++++++++++++++++++++++++++++
> >  2 files changed, 51 insertions(+), 0 deletions(-)
> >
>
>
> Milton made the very sensible comment that while adding on_each_cpu_mask() in the
> arch-generic code and removing the two arch-specific instances from tile and arm
> in separate patches is good for review, it will break bisect.
>
> He suggested I squash them into a single commit when it goes in.
>
> Since you picked the patch set into linux-mm, would now be a good time for that?
I can fold the patches together - I do that all the time.
Please identify exactly which patches you're referring to here.
arm-move-arm-over-to-generic-on_each_cpu_mask and
tile-move-tile-to-use-generic-on_each_cpu_mask should be folded into
smp-introduce-a-generic-on_each_cpu_mask-function, yes?
* Re: [v7 7/8] mm: only IPI CPUs to drain local pages if they exist
2012-01-30 21:49 ` Andrew Morton
@ 2012-01-31 6:32 ` Gilad Ben-Yossef
0 siblings, 0 replies; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-31 6:32 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, Mel Gorman, KOSAKI Motohiro, Christoph Lameter,
Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, Russell King,
linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
Andi Kleen, Alexander Viro, linux-fsdevel, Avi Kivity,
Michal Nazarewicz, Milton Miller
On Mon, Jan 30, 2012 at 11:49 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Sun, 29 Jan 2012 14:18:32 +0200
> Gilad Ben-Yossef <gilad@benyossef.com> wrote:
>
>> On Sat, Jan 28, 2012 at 2:12 AM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>> > On Thu, 26 Jan 2012 12:02:00 +0200
>> > Gilad Ben-Yossef <gilad@benyossef.com> wrote:
>> >
>> >> Calculate a cpumask of CPUs with per-cpu pages in any zone
>> >> and only send an IPI requesting CPUs to drain these pages
>> >> to the buddy allocator if they actually have pages when
>> >> asked to flush.
>> >>
>>
>> ...
>>
>> > Can we end up sending an IPI to a now-unplugged CPU?  That won't work
>> > very well if that CPU is now sitting on its sysadmin's desk.
>>
>> Nope. on_each_cpu_mask() disables preemption and calls smp_call_function_many()
>> which then checks the mask against the cpu_online_mask.
>
> OK.
>
> General rule of thumb: if a reviewer asked something then it is likely
> that others will wonder the same thing when reading the code later on.
> So consider reviewer questions as a sign that the code needs additional
> comments!
Right, point taken.
>
>> > There's also the case of CPU online.  We could end up failing to IPI a
>> > CPU which now has some percpu pages.  That's not at all serious - 90%
>> > is good enough in page reclaim.  But this thinking merits a mention in
>> > the comment.  Or we simply make this code hotplug-safe.
>>
>> hmm.. I'm probably daft, but I don't see how to make the code hotplug-safe
>> for the CPU online case. I mean, let's say we disable preemption throughout
>> the entire ordeal, and then a CPU goes online and gets itself some percpu
>> pages *after* we've calculated the masks, sent the IPIs and waited for the
>> whole thing to finish, but before we've returned...
>
> This is inherent to the whole drain-pages design - it's only a
> best-effort thing and there's nothing to prevent other CPUs from
> undoing your work 2 nanoseconds later.
>
> The exception to this is the case of suspend, which drains the queues
> when all tasks (and, hopefully, IRQs) have been frozen. This is the
> only way to make draining 100% "reliable".
>
>> I might be missing something here, but I think that unless you have some
>> other means to stop newly hotplugged CPUs from grabbing per-cpu pages,
>> there is nothing you can do in this code to stop it. Maybe make the race
>> window shorter, that's all.
>>
>> Would adding a comment such as the following be OK?
>>
>> "This code is protected against sending an IPI to an offline CPU but does not
>> guarantee sending an IPI to newly hotplugged CPUs"
>
> Looks OK. I'd mention *how* this protection comes about:
> on_each_cpu_mask() blocks hotplug and won't talk to offlined CPUs.
Good. I'll send an updated patch set.
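For reference, the comment I plan to fold in would read roughly like this
(illustrative wording only; the final version may differ):

	/*
	 * This code is protected against sending an IPI to an offline
	 * CPU: on_each_cpu_mask() disables preemption, which keeps CPU
	 * hotplug at bay, and smp_call_function_many() skips offlined
	 * CPUs. It does not, however, guarantee an IPI to a CPU that
	 * comes online (and gains pcps) while the mask is being built.
	 */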
Thanks :-)
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com
"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML
* Re: [v7 1/8] smp: introduce a generic on_each_cpu_mask function
2012-01-30 21:52 ` Andrew Morton
@ 2012-01-31 6:33 ` Gilad Ben-Yossef
0 siblings, 0 replies; 21+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-31 6:33 UTC (permalink / raw)
To: Andrew Morton
Cc: Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, Russell King,
linux-mm, Pekka Enberg, Matt Mackall, Rik van Riel, Andi Kleen,
Sasha Levin, Mel Gorman, Alexander Viro, linux-fsdevel,
Avi Kivity, Michal Nazarewicz, Kosaki Motohiro, Milton Miller,
linux-kernel
On Mon, Jan 30, 2012 at 11:52 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Sun, 29 Jan 2012 14:24:16 +0200
> Gilad Ben-Yossef <gilad@benyossef.com> wrote:
>
>> On Thu, Jan 26, 2012 at 12:01 PM, Gilad Ben-Yossef <gilad@benyossef.com> wrote:
>> > on_each_cpu_mask calls a function on processors specified by
>> > cpumask, which may or may not include the local processor.
>> >
>> > You must not call this function with disabled interrupts or
>> > from a hardware interrupt handler or from a bottom half handler.
>> >
>> > Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
>> > Reviewed-by: Christoph Lameter <cl@linux.com>
>> > CC: Chris Metcalf <cmetcalf@tilera.com>
>> ...
>> > ---
>> >  include/linux/smp.h |   22 ++++++++++++++++++++++
>> >  kernel/smp.c        |   29 +++++++++++++++++++++++++++++
>> >  2 files changed, 51 insertions(+), 0 deletions(-)
>> >
>>
>>
>> Milton made the very sensible comment that while adding on_each_cpu_mask() in the
>> arch-generic code and removing the two arch-specific instances from tile and arm
>> in separate patches is good for review, it will break bisect.
>>
>> He suggested I squash them into a single commit when it goes in.
>>
>> Since you picked the patch set into linux-mm, would now be a good time for that?
>
> I can fold the patches together - I do that all the time.
>
> Please identify exactly whcih patches you're referring to here.
>
> arm-move-arm-over-to-generic-on_each_cpu_mask and
> tile-move-tile-to-use-generic-on_each_cpu_mask should be folded into
> smp-introduce-a-generic-on_each_cpu_mask-function, yes?
Yes. Thank you.
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com
"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML
Thread overview: 21+ messages
[not found] <1327572121-13673-1-git-send-email-gilad@benyossef.com>
2012-01-26 10:01 ` [v7 1/8] smp: introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
2012-01-29 12:24 ` Gilad Ben-Yossef
2012-01-30 21:52 ` Andrew Morton
2012-01-31 6:33 ` Gilad Ben-Yossef
2012-01-26 10:01 ` [v7 2/8] arm: move arm over to generic on_each_cpu_mask Gilad Ben-Yossef
2012-01-26 10:01 ` [v7 3/8] tile: move tile to use " Gilad Ben-Yossef
2012-01-26 10:01 ` [v7 4/8] smp: add func to IPI cpus based on parameter func Gilad Ben-Yossef
2012-01-27 23:57 ` Andrew Morton
2012-01-29 12:04 ` Gilad Ben-Yossef
2012-01-26 10:01 ` [v7 5/8] slub: only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
2012-01-26 15:09 ` Christoph Lameter
2012-01-26 10:01 ` [v7 6/8] fs: only send IPI to invalidate LRU BH when needed Gilad Ben-Yossef
2012-01-26 10:02 ` [v7 7/8] mm: only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
2012-01-26 15:13 ` Christoph Lameter
2012-01-28 0:12 ` Andrew Morton
2012-01-29 12:18 ` Gilad Ben-Yossef
2012-01-30 21:49 ` Andrew Morton
2012-01-31 6:32 ` Gilad Ben-Yossef
2012-01-30 14:59 ` Mel Gorman
2012-01-30 15:14 ` Gilad Ben-Yossef
2012-01-30 15:44 ` Mel Gorman