* [PATCH] generic-ipi: make struct call_function_data lockless
@ 2009-07-24 9:50 Xiao Guangrong
2009-07-27 22:00 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Xiao Guangrong @ 2009-07-24 9:50 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Jens Axboe, Nick Piggin, Peter Zijlstra, LKML
This patch can remove spinlock from struct call_function_data, the
reasons are below:
1: add a new interface for cpumask named cpumask_test_and_clear_cpu(),
it can atomically test and clear specific cpu, we can use it instead
of cpumask_test_cpu() and cpumask_clear_cpu() and no need data->lock
to protect those in generic_smp_call_function_interrupt().
2: in smp_call_function_many(), after csd_lock() return, the current's
cfd_data is deleted from call_function list, so it not have race
between other cpus, then cfs_data is only used in
smp_call_function_many() that must disable preemption and not from
a hardware interrupthandler or from a bottom half handler to call,
only the correspond cpu can use it, so it not have race in current
cpu, no need cfs_data->lock to protect it.
3: after 1 and 2, cfs_data->lock is only use to protect cfs_data->refs in
generic_smp_call_function_interrupt(), so we can define cfs_data->refs
to atomic_t, and no need cfs_data->lock any more.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
include/linux/cpumask.h | 12 ++++++++++++
kernel/smp.c | 29 ++++++++---------------------
2 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index c5ac87c..a370bd5 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -715,6 +715,18 @@ static inline int cpumask_test_and_set_cpu(int cpu, struct cpumask *cpumask)
}
/**
+ * cpumask_test_and_clear_cpu - atomically test and clear a cpu in a cpumask
+ * @cpu: cpu number (< nr_cpu_ids)
+ * @cpumask: the cpumask pointer
+ *
+ * test_and_clear_bit wrapper for cpumasks.
+ */
+static inline int cpumask_test_and_clear_cpu(int cpu, struct cpumask *cpumask)
+{
+ return test_and_clear_bit(cpumask_check(cpu), cpumask_bits(cpumask));
+}
+
+/**
* cpumask_setall - set all cpus (< nr_cpu_ids) in a cpumask
* @dstp: the cpumask pointer
*/
diff --git a/kernel/smp.c b/kernel/smp.c
index ad63d85..7016996 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -29,8 +29,7 @@ enum {
struct call_function_data {
struct call_single_data csd;
- spinlock_t lock;
- unsigned int refs;
+ atomic_t refs;
cpumask_var_t cpumask;
};
@@ -39,9 +38,7 @@ struct call_single_queue {
spinlock_t lock;
};
-static DEFINE_PER_CPU(struct call_function_data, cfd_data) = {
- .lock = __SPIN_LOCK_UNLOCKED(cfd_data.lock),
-};
+static DEFINE_PER_CPU(struct call_function_data, cfd_data);
static int
hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
@@ -191,25 +188,18 @@ void generic_smp_call_function_interrupt(void)
list_for_each_entry_rcu(data, &call_function.queue, csd.list) {
int refs;
- spin_lock(&data->lock);
- if (!cpumask_test_cpu(cpu, data->cpumask)) {
- spin_unlock(&data->lock);
+ if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
continue;
- }
- cpumask_clear_cpu(cpu, data->cpumask);
- spin_unlock(&data->lock);
data->csd.func(data->csd.info);
- spin_lock(&data->lock);
- WARN_ON(data->refs == 0);
- refs = --data->refs;
+ refs = atomic_sub_return(1, &data->refs);
+ WARN_ON(refs < 0);
if (!refs) {
spin_lock(&call_function.lock);
list_del_rcu(&data->csd.list);
spin_unlock(&call_function.lock);
}
- spin_unlock(&data->lock);
if (refs)
continue;
@@ -391,23 +381,20 @@ void smp_call_function_many(const struct cpumask *mask,
data = &__get_cpu_var(cfd_data);
csd_lock(&data->csd);
- spin_lock_irqsave(&data->lock, flags);
data->csd.func = func;
data->csd.info = info;
cpumask_and(data->cpumask, mask, cpu_online_mask);
cpumask_clear_cpu(this_cpu, data->cpumask);
- data->refs = cpumask_weight(data->cpumask);
+ atomic_set(&data->refs, cpumask_weight(data->cpumask));
- spin_lock(&call_function.lock);
+ spin_lock_irqsave(&call_function.lock, flags);
/*
* Place entry at the _HEAD_ of the list, so that any cpu still
* observing the entry in generic_smp_call_function_interrupt()
* will not miss any other list entries:
*/
list_add_rcu(&data->csd.list, &call_function.queue);
- spin_unlock(&call_function.lock);
-
- spin_unlock_irqrestore(&data->lock, flags);
+ spin_unlock_irqrestore(&call_function.lock, flags);
/*
* Make the list addition visible before sending the ipi.
--
1.6.1.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] generic-ipi: make struct call_function_data lockless
2009-07-24 9:50 [PATCH] generic-ipi: make struct call_function_data lockless Xiao Guangrong
@ 2009-07-27 22:00 ` Andrew Morton
2009-07-29 7:08 ` Jens Axboe
2009-07-29 7:53 ` [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd() Xiao Guangrong
2 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2009-07-27 22:00 UTC (permalink / raw)
To: Xiao Guangrong
Cc: mingo, jens.axboe, nickpiggin, peterz, linux-kernel,
Rusty Russell
On Fri, 24 Jul 2009 17:50:16 +0800
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> wrote:
> This patch can remove spinlock from struct call_function_data, the
> reasons are below:
>
> 1: add a new interface for cpumask named cpumask_test_and_clear_cpu(),
> it can atomically test and clear specific cpu, we can use it instead
> of cpumask_test_cpu() and cpumask_clear_cpu() and no need data->lock
> to protect those in generic_smp_call_function_interrupt().
>
> 2: in smp_call_function_many(), after csd_lock() return, the current's
> cfd_data is deleted from call_function list, so it not have race
> between other cpus, then cfs_data is only used in
> smp_call_function_many() that must disable preemption and not from
> a hardware interrupthandler or from a bottom half handler to call,
> only the correspond cpu can use it, so it not have race in current
> cpu, no need cfs_data->lock to protect it.
>
> 3: after 1 and 2, cfs_data->lock is only use to protect cfs_data->refs in
> generic_smp_call_function_interrupt(), so we can define cfs_data->refs
> to atomic_t, and no need cfs_data->lock any more.
Looks good to me. One tiny cleanup:
--- a/kernel/smp.c~generic-ipi-make-struct-call_function_data-lockless-cleanup
+++ a/kernel/smp.c
@@ -193,7 +193,7 @@ void generic_smp_call_function_interrupt
data->csd.func(data->csd.info);
- refs = atomic_sub_return(1, &data->refs);
+ refs = atomic_dec_return(&data->refs);
WARN_ON(refs < 0);
if (!refs) {
spin_lock(&call_function.lock);
_
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] generic-ipi: make struct call_function_data lockless
2009-07-24 9:50 [PATCH] generic-ipi: make struct call_function_data lockless Xiao Guangrong
2009-07-27 22:00 ` Andrew Morton
@ 2009-07-29 7:08 ` Jens Axboe
2009-07-29 7:53 ` [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd() Xiao Guangrong
2 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2009-07-29 7:08 UTC (permalink / raw)
To: Xiao Guangrong; +Cc: Ingo Molnar, Nick Piggin, Peter Zijlstra, LKML
On Fri, Jul 24 2009, Xiao Guangrong wrote:
> This patch can remove spinlock from struct call_function_data, the
> reasons are below:
>
> 1: add a new interface for cpumask named cpumask_test_and_clear_cpu(),
> it can atomically test and clear specific cpu, we can use it instead
> of cpumask_test_cpu() and cpumask_clear_cpu() and no need data->lock
> to protect those in generic_smp_call_function_interrupt().
>
> 2: in smp_call_function_many(), after csd_lock() return, the current's
> cfd_data is deleted from call_function list, so it not have race
> between other cpus, then cfs_data is only used in
> smp_call_function_many() that must disable preemption and not from
> a hardware interrupthandler or from a bottom half handler to call,
> only the correspond cpu can use it, so it not have race in current
> cpu, no need cfs_data->lock to protect it.
>
> 3: after 1 and 2, cfs_data->lock is only use to protect cfs_data->refs in
> generic_smp_call_function_interrupt(), so we can define cfs_data->refs
> to atomic_t, and no need cfs_data->lock any more.
Very nice, looks good to me! Tested on x86-64 and sparc64 here.
Andrew, shall I include it, or shall we just let it funnel through
your patch stream?
Acked-by: Jens Axboe <jens.axboe@oracle.com>
--
Jens Axboe
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd()
2009-07-24 9:50 [PATCH] generic-ipi: make struct call_function_data lockless Xiao Guangrong
2009-07-27 22:00 ` Andrew Morton
2009-07-29 7:08 ` Jens Axboe
@ 2009-07-29 7:53 ` Xiao Guangrong
2009-07-29 7:55 ` [PATCH 2/3 -mm] generic-ipi: cleanup for generic_smp_call_function_interrupt() Xiao Guangrong
` (2 more replies)
2 siblings, 3 replies; 13+ messages in thread
From: Xiao Guangrong @ 2009-07-29 7:53 UTC (permalink / raw)
To: Andrew Morton
Cc: Ingo Molnar, Jens Axboe, Nick Piggin, Peter Zijlstra,
Rusty Russell, LKML
Use CONFIG_HOTPLUG_CPU, not CONFIG_CPU_HOTPLUG
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
kernel/smp.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index bf9f18b..1b5fd2e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -54,7 +54,7 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
return NOTIFY_BAD;
break;
-#ifdef CONFIG_CPU_HOTPLUG
+#ifdef CONFIG_HOTPLUG_CPU
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
--
1.6.1.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/3 -mm] generic-ipi: cleanup for generic_smp_call_function_interrupt()
2009-07-29 7:53 ` [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd() Xiao Guangrong
@ 2009-07-29 7:55 ` Xiao Guangrong
2009-07-29 7:57 ` [PATCH 3/3 -mm] generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd() Xiao Guangrong
2009-07-29 23:27 ` [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd() Andrew Morton
2 siblings, 0 replies; 13+ messages in thread
From: Xiao Guangrong @ 2009-07-29 7:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Ingo Molnar, Jens Axboe, Nick Piggin, Peter Zijlstra,
Rusty Russell, LKML
Use smp_processor_id() instead of get_cpu() and put_cpu() in
generic_smp_call_function_interrupt(), It's no need to disable preempt,
beacuse we must call generic_smp_call_function_interrupt() with interrupts
disabled
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
kernel/smp.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index 1b5fd2e..3035ab8 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -171,7 +171,7 @@ void generic_exec_single(int cpu, struct call_single_data *data, int wait)
void generic_smp_call_function_interrupt(void)
{
struct call_function_data *data;
- int cpu = get_cpu();
+ int cpu = smp_processor_id();
/*
* Ensure entry is visible on call_function_queue after we have
@@ -207,7 +207,6 @@ void generic_smp_call_function_interrupt(void)
csd_unlock(&data->csd);
}
- put_cpu();
}
/*
--
1.6.1.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/3 -mm] generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd()
2009-07-29 7:53 ` [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd() Xiao Guangrong
2009-07-29 7:55 ` [PATCH 2/3 -mm] generic-ipi: cleanup for generic_smp_call_function_interrupt() Xiao Guangrong
@ 2009-07-29 7:57 ` Xiao Guangrong
2009-07-29 23:31 ` Andrew Morton
2009-07-29 23:27 ` [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd() Andrew Morton
2 siblings, 1 reply; 13+ messages in thread
From: Xiao Guangrong @ 2009-07-29 7:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Ingo Molnar, Jens Axboe, Nick Piggin, Peter Zijlstra,
Rusty Russell, LKML
It have race between generic_smp_call_function_*() and hotplug_cfd()
in many cases, see below examples:
1: hotplug_cfd() can free cfd->cpumask, the system will crash if the
cpu's cfd still in the call_function list:
CPU A: CPU B
smp_call_function_many() ......
cpu_down() ......
hotplug_cfd() -> ......
free_cpumask_var(cfd->cpumask) (receive function IPI interrupte)
/* read cfd->cpumask */
generic_smp_call_function_interrupt() ->
cpumask_test_and_clear_cpu(cpu, data->cpumask)
CRASH!!!
2: It's not handle call_function list when cpu down, It's will lead to
dead-wait if other path is waiting this cpu to execute function
CPU A: CPU B
smp_call_function_many(wait=0)
...... CPU B down
smp_call_function_many() --> (cpu down before recevie function
csd_lock(&data->csd); IPI interrupte)
DEAD-WAIT!!!!
So, CPU A will dead-wait in csd_lock(), the same as
smp_call_function_single()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
kernel/smp.c | 140 ++++++++++++++++++++++++++++++++-------------------------
1 files changed, 79 insertions(+), 61 deletions(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index 3035ab8..e52e30c 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -40,57 +40,6 @@ struct call_single_queue {
static DEFINE_PER_CPU(struct call_function_data, cfd_data);
-static int
-hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
- long cpu = (long)hcpu;
- struct call_function_data *cfd = &per_cpu(cfd_data, cpu);
-
- switch (action) {
- case CPU_UP_PREPARE:
- case CPU_UP_PREPARE_FROZEN:
- if (!zalloc_cpumask_var_node(&cfd->cpumask, GFP_KERNEL,
- cpu_to_node(cpu)))
- return NOTIFY_BAD;
- break;
-
-#ifdef CONFIG_HOTPLUG_CPU
- case CPU_UP_CANCELED:
- case CPU_UP_CANCELED_FROZEN:
-
- case CPU_DEAD:
- case CPU_DEAD_FROZEN:
- free_cpumask_var(cfd->cpumask);
- break;
-#endif
- };
-
- return NOTIFY_OK;
-}
-
-static struct notifier_block __cpuinitdata hotplug_cfd_notifier = {
- .notifier_call = hotplug_cfd,
-};
-
-static int __cpuinit init_call_single_data(void)
-{
- void *cpu = (void *)(long)smp_processor_id();
- int i;
-
- for_each_possible_cpu(i) {
- struct call_single_queue *q = &per_cpu(call_single_queue, i);
-
- spin_lock_init(&q->lock);
- INIT_LIST_HEAD(&q->list);
- }
-
- hotplug_cfd(&hotplug_cfd_notifier, CPU_UP_PREPARE, cpu);
- register_cpu_notifier(&hotplug_cfd_notifier);
-
- return 0;
-}
-early_initcall(init_call_single_data);
-
/*
* csd_lock/csd_unlock used to serialize access to per-cpu csd resources
*
@@ -164,14 +113,10 @@ void generic_exec_single(int cpu, struct call_single_data *data, int wait)
csd_lock_wait(data);
}
-/*
- * Invoked by arch to handle an IPI for call function. Must be called with
- * interrupts disabled.
- */
-void generic_smp_call_function_interrupt(void)
+static void
+__generic_smp_call_function_interrupt(int cpu, int run_callbacks)
{
struct call_function_data *data;
- int cpu = smp_processor_id();
/*
* Ensure entry is visible on call_function_queue after we have
@@ -210,12 +155,18 @@ void generic_smp_call_function_interrupt(void)
}
/*
- * Invoked by arch to handle an IPI for call function single. Must be
- * called from the arch with interrupts disabled.
+ * Invoked by arch to handle an IPI for call function. Must be called with
+ * interrupts disabled.
*/
-void generic_smp_call_function_single_interrupt(void)
+void generic_smp_call_function_interrupt(void)
+{
+ __generic_smp_call_function_interrupt(smp_processor_id(), 1);
+}
+
+static void
+__generic_smp_call_function_single_interrupt(int cpu, int run_callbacks)
{
- struct call_single_queue *q = &__get_cpu_var(call_single_queue);
+ struct call_single_queue *q = &per_cpu(call_single_queue, cpu);
unsigned int data_flags;
LIST_HEAD(list);
@@ -246,6 +197,15 @@ void generic_smp_call_function_single_interrupt(void)
}
}
+/*
+ * Invoked by arch to handle an IPI for call function single. Must be
+ * called from the arch with interrupts disabled.
+ */
+void generic_smp_call_function_single_interrupt(void)
+{
+ __generic_smp_call_function_single_interrupt(smp_processor_id(), 1);
+}
+
static DEFINE_PER_CPU(struct call_single_data, csd_data);
/*
@@ -456,3 +416,61 @@ void ipi_call_unlock_irq(void)
{
spin_unlock_irq(&call_function.lock);
}
+
+static int
+hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+ long cpu = (long)hcpu;
+ unsigned long flags;
+ struct call_function_data *cfd = &per_cpu(cfd_data, cpu);
+
+ switch (action) {
+ case CPU_UP_PREPARE:
+ case CPU_UP_PREPARE_FROZEN:
+ if (!zalloc_cpumask_var_node(&cfd->cpumask, GFP_KERNEL,
+ cpu_to_node(cpu)))
+ return NOTIFY_BAD;
+ break;
+
+#ifdef CONFIG_HOTPLUG_CPU
+ case CPU_UP_CANCELED:
+ case CPU_UP_CANCELED_FROZEN:
+
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ local_irq_save(flags);
+ __generic_smp_call_function_interrupt(cpu, 0);
+ __generic_smp_call_function_single_interrupt(cpu, 0);
+ local_irq_restore(flags);
+
+ csd_lock_wait(&cfd->csd);
+ free_cpumask_var(cfd->cpumask);
+ break;
+#endif
+ };
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata hotplug_cfd_notifier = {
+ .notifier_call = hotplug_cfd,
+};
+
+static int __cpuinit init_call_single_data(void)
+{
+ void *cpu = (void *)(long)smp_processor_id();
+ int i;
+
+ for_each_possible_cpu(i) {
+ struct call_single_queue *q = &per_cpu(call_single_queue, i);
+
+ spin_lock_init(&q->lock);
+ INIT_LIST_HEAD(&q->list);
+ }
+
+ hotplug_cfd(&hotplug_cfd_notifier, CPU_UP_PREPARE, cpu);
+ register_cpu_notifier(&hotplug_cfd_notifier);
+
+ return 0;
+}
+early_initcall(init_call_single_data);
--
1.6.1.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd()
2009-07-29 7:53 ` [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd() Xiao Guangrong
2009-07-29 7:55 ` [PATCH 2/3 -mm] generic-ipi: cleanup for generic_smp_call_function_interrupt() Xiao Guangrong
2009-07-29 7:57 ` [PATCH 3/3 -mm] generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd() Xiao Guangrong
@ 2009-07-29 23:27 ` Andrew Morton
2009-07-30 1:18 ` Li Zefan
2 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2009-07-29 23:27 UTC (permalink / raw)
To: Xiao Guangrong; +Cc: mingo, jens.axboe, nickpiggin, peterz, rusty, linux-kernel
On Wed, 29 Jul 2009 15:53:13 +0800
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> wrote:
> Use CONFIG_HOTPLUG_CPU, not CONFIG_CPU_HOTPLUG
>
> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
> ---
> kernel/smp.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index bf9f18b..1b5fd2e 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -54,7 +54,7 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
> return NOTIFY_BAD;
> break;
>
> -#ifdef CONFIG_CPU_HOTPLUG
> +#ifdef CONFIG_HOTPLUG_CPU
> case CPU_UP_CANCELED:
> case CPU_UP_CANCELED_FROZEN:
>
Dammit, that mistake is easy to make. We should have used #if from day
one, not #ifdef. Oh well.
What's the impact of this bug? Do we think the fix should be present
in 2.6.31? 2.6.30.x?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3 -mm] generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd()
2009-07-29 7:57 ` [PATCH 3/3 -mm] generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd() Xiao Guangrong
@ 2009-07-29 23:31 ` Andrew Morton
2009-07-30 3:31 ` Xiao Guangrong
2009-07-30 6:50 ` Peter Zijlstra
0 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2009-07-29 23:31 UTC (permalink / raw)
To: Xiao Guangrong; +Cc: mingo, jens.axboe, nickpiggin, peterz, rusty, linux-kernel
On Wed, 29 Jul 2009 15:57:51 +0800
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> wrote:
> It have race between generic_smp_call_function_*() and hotplug_cfd()
> in many cases, see below examples:
>
> 1: hotplug_cfd() can free cfd->cpumask, the system will crash if the
> cpu's cfd still in the call_function list:
>
>
> CPU A: CPU B
>
> smp_call_function_many() ......
> cpu_down() ......
> hotplug_cfd() -> ......
> free_cpumask_var(cfd->cpumask) (receive function IPI interrupte)
> /* read cfd->cpumask */
> generic_smp_call_function_interrupt() ->
> cpumask_test_and_clear_cpu(cpu, data->cpumask)
>
> CRASH!!!
>
> 2: It's not handle call_function list when cpu down, It's will lead to
> dead-wait if other path is waiting this cpu to execute function
>
> CPU A: CPU B
>
> smp_call_function_many(wait=0)
> ...... CPU B down
> smp_call_function_many() --> (cpu down before recevie function
> csd_lock(&data->csd); IPI interrupte)
>
> DEAD-WAIT!!!!
>
> So, CPU A will dead-wait in csd_lock(), the same as
> smp_call_function_single()
>
> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
> ---
> kernel/smp.c | 140 ++++++++++++++++++++++++++++++++-------------------------
> 1 files changed, 79 insertions(+), 61 deletions(-)
>
It was unfortunate that this patch moved a screenful of code around and
changed that code at the same time - it makes it hard to see what the
functional change was.
So I split this patch into two. The first patch simply moves
hotplug_cfd() to the end of the file and the second makes the
functional changes. The second patch is below, for easier review.
Do we think that this patch should be merged into 2.6.31? 2.6.30.x?
From: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
There is a race between generic_smp_call_function_*() and hotplug_cfd() in
many cases, see below examples:
1: hotplug_cfd() can free cfd->cpumask, the system will crash if the
cpu's cfd still in the call_function list:
CPU A: CPU B
smp_call_function_many() ......
cpu_down() ......
hotplug_cfd() -> ......
free_cpumask_var(cfd->cpumask) (receive function IPI interrupte)
/* read cfd->cpumask */
generic_smp_call_function_interrupt() ->
cpumask_test_and_clear_cpu(cpu, data->cpumask)
CRASH!!!
2: It's not handle call_function list when cpu down, It's will lead to
dead-wait if other path is waiting this cpu to execute function
CPU A: CPU B
smp_call_function_many(wait=0)
...... CPU B down
smp_call_function_many() --> (cpu down before recevie function
csd_lock(&data->csd); IPI interrupte)
DEAD-WAIT!!!!
So, CPU A will dead-wait in csd_lock(), the same as
smp_call_function_single()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
kernel/smp.c | 38 ++++++++++++++++++++++++++++----------
1 file changed, 28 insertions(+), 10 deletions(-)
diff -puN kernel/smp.c~generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd kernel/smp.c
--- a/kernel/smp.c~generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd
+++ a/kernel/smp.c
@@ -116,14 +116,10 @@ void generic_exec_single(int cpu, struct
csd_lock_wait(data);
}
-/*
- * Invoked by arch to handle an IPI for call function. Must be called with
- * interrupts disabled.
- */
-void generic_smp_call_function_interrupt(void)
+static void
+__generic_smp_call_function_interrupt(int cpu, int run_callbacks)
{
struct call_function_data *data;
- int cpu = smp_processor_id();
/*
* Ensure entry is visible on call_function_queue after we have
@@ -169,12 +165,18 @@ void generic_smp_call_function_interrupt
}
/*
- * Invoked by arch to handle an IPI for call function single. Must be
- * called from the arch with interrupts disabled.
+ * Invoked by arch to handle an IPI for call function. Must be called with
+ * interrupts disabled.
*/
-void generic_smp_call_function_single_interrupt(void)
+void generic_smp_call_function_interrupt(void)
+{
+ __generic_smp_call_function_interrupt(smp_processor_id(), 1);
+}
+
+static void
+__generic_smp_call_function_single_interrupt(int cpu, int run_callbacks)
{
- struct call_single_queue *q = &__get_cpu_var(call_single_queue);
+ struct call_single_queue *q = &per_cpu(call_single_queue, cpu);
unsigned int data_flags;
LIST_HEAD(list);
@@ -205,6 +207,15 @@ void generic_smp_call_function_single_in
}
}
+/*
+ * Invoked by arch to handle an IPI for call function single. Must be
+ * called from the arch with interrupts disabled.
+ */
+void generic_smp_call_function_single_interrupt(void)
+{
+ __generic_smp_call_function_single_interrupt(smp_processor_id(), 1);
+}
+
static DEFINE_PER_CPU(struct call_single_data, csd_data);
/*
@@ -456,6 +467,7 @@ static int
hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
{
long cpu = (long)hcpu;
+ unsigned long flags;
struct call_function_data *cfd = &per_cpu(cfd_data, cpu);
switch (action) {
@@ -472,6 +484,12 @@ hotplug_cfd(struct notifier_block *nfb,
case CPU_DEAD:
case CPU_DEAD_FROZEN:
+ local_irq_save(flags);
+ __generic_smp_call_function_interrupt(cpu, 0);
+ __generic_smp_call_function_single_interrupt(cpu, 0);
+ local_irq_restore(flags);
+
+ csd_lock_wait(&cfd->csd);
free_cpumask_var(cfd->cpumask);
break;
#endif
_
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd()
2009-07-29 23:27 ` [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd() Andrew Morton
@ 2009-07-30 1:18 ` Li Zefan
0 siblings, 0 replies; 13+ messages in thread
From: Li Zefan @ 2009-07-30 1:18 UTC (permalink / raw)
To: Andrew Morton
Cc: Xiao Guangrong, mingo, jens.axboe, nickpiggin, peterz, rusty,
linux-kernel
07:27, Andrew Morton wrote:
> On Wed, 29 Jul 2009 15:53:13 +0800
> Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> wrote:
>
>> Use CONFIG_HOTPLUG_CPU, not CONFIG_CPU_HOTPLUG
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
>> ---
>> kernel/smp.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index bf9f18b..1b5fd2e 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -54,7 +54,7 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
>> return NOTIFY_BAD;
>> break;
>>
>> -#ifdef CONFIG_CPU_HOTPLUG
>> +#ifdef CONFIG_HOTPLUG_CPU
>> case CPU_UP_CANCELED:
>> case CPU_UP_CANCELED_FROZEN:
>>
>
> Dammit, that mistake is easy to make. We should have used #if from day
> one, not #ifdef. Oh well.
>
> What's the impact of this bug? Do we think the fix should be present
> in 2.6.31? 2.6.30.x?
>
When hot-unpluging a cpu, it will leak memory allocated at cpu hotplug,
but only if CPUMASK_OFFSTACK=y, which is default to n, and I guess no
distro turns it on? All that said, I agree to add this patch to
-stable.
The bug is introduced by:
| commit 8969a5ede0f9e17da4b943712429aef2c9bcd82b
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date: Wed Feb 25 13:59:47 2009 +0100
|
| generic-ipi: remove kmalloc()
So this patch can be applied to 2.6.29..2.6.31
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3 -mm] generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd()
2009-07-29 23:31 ` Andrew Morton
@ 2009-07-30 3:31 ` Xiao Guangrong
2009-07-30 6:50 ` Peter Zijlstra
1 sibling, 0 replies; 13+ messages in thread
From: Xiao Guangrong @ 2009-07-30 3:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: mingo, jens.axboe, nickpiggin, peterz, rusty, linux-kernel
Andrew Morton wrote:
> On Wed, 29 Jul 2009 15:57:51 +0800
> Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> wrote:
>
>> It have race between generic_smp_call_function_*() and hotplug_cfd()
>> in many cases, see below examples:
>>
>> 1: hotplug_cfd() can free cfd->cpumask, the system will crash if the
>> cpu's cfd still in the call_function list:
>>
>>
>> CPU A: CPU B
>>
>> smp_call_function_many() ......
>> cpu_down() ......
>> hotplug_cfd() -> ......
>> free_cpumask_var(cfd->cpumask) (receive function IPI interrupte)
>> /* read cfd->cpumask */
>> generic_smp_call_function_interrupt() ->
>> cpumask_test_and_clear_cpu(cpu, data->cpumask)
>>
>> CRASH!!!
>>
>> 2: It's not handle call_function list when cpu down, It's will lead to
>> dead-wait if other path is waiting this cpu to execute function
>>
>> CPU A: CPU B
>>
>> smp_call_function_many(wait=0)
>> ...... CPU B down
>> smp_call_function_many() --> (cpu down before recevie function
>> csd_lock(&data->csd); IPI interrupte)
>>
>> DEAD-WAIT!!!!
>>
>> So, CPU A will dead-wait in csd_lock(), the same as
>> smp_call_function_single()
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
>> ---
>> kernel/smp.c | 140 ++++++++++++++++++++++++++++++++-------------------------
>> 1 files changed, 79 insertions(+), 61 deletions(-)
>>
>
> It was unfortunate that this patch moved a screenful of code around and
> changed that code at the same time - it makes it hard to see what the
> functional change was.
>
> So I split this patch into two. The first patch simply moves
> hotplug_cfd() to the end of the file and the second makes the
> functional changes. The second patch is below, for easier review.
>
> Do we think that this patch should be merged into 2.6.31? 2.6.30.x?
>
This bug is conceal from v2.6.26 when kernel/smp.c created and be
found by my review, no one is bothered by it and sends us a bug
report, besides, this patch can't be applied to <= 2.6.30 cleanly,
so I think we can just fix it for .31
Thanks,
Xiao
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3 -mm] generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd()
2009-07-29 23:31 ` Andrew Morton
2009-07-30 3:31 ` Xiao Guangrong
@ 2009-07-30 6:50 ` Peter Zijlstra
2009-07-30 8:11 ` Xiao Guangrong
1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2009-07-30 6:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Xiao Guangrong, mingo, jens.axboe, nickpiggin, rusty,
linux-kernel
On Wed, 2009-07-29 at 16:31 -0700, Andrew Morton wrote:
> -void generic_smp_call_function_interrupt(void)
> +static void
> +__generic_smp_call_function_interrupt(int cpu, int run_callbacks)
> {
> struct call_function_data *data;
> - int cpu = smp_processor_id();
>
> /*
> * Ensure entry is visible on call_function_queue after we have
> @@ -169,12 +165,18 @@ void generic_smp_call_function_interrupt
> +static void
> +__generic_smp_call_function_single_interrupt(int cpu, int run_callbacks)
> {
> - struct call_single_queue *q = &__get_cpu_var(call_single_queue);
> + struct call_single_queue *q = &per_cpu(call_single_queue, cpu);
> unsigned int data_flags;
> LIST_HEAD(list);
>
It introduces this run_callbacks thing to two functions, but nothing
actually uses that... makes me suspicious there's something missing.
> case CPU_DEAD_FROZEN:
> + local_irq_save(flags);
> + __generic_smp_call_function_interrupt(cpu, 0);
> + __generic_smp_call_function_single_interrupt(cpu, 0);
> + local_irq_restore(flags);
Doing the callbacks from a different cpu than they were queued on seems
like a fine way to mess things up.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3 -mm] generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd()
2009-07-30 6:50 ` Peter Zijlstra
@ 2009-07-30 8:11 ` Xiao Guangrong
2009-07-30 8:23 ` Li Zefan
0 siblings, 1 reply; 13+ messages in thread
From: Xiao Guangrong @ 2009-07-30 8:11 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, mingo, jens.axboe, nickpiggin, rusty, linux-kernel
Peter Zijlstra wrote:
> It introduces this run_callbacks thing to two functions, but nothing
> actually uses that... makes me suspicious there's something missing.
>
Hi Peter,
Thanks for your point out.
I'm sorry for making a mistake when I make this patch, I'll send the new
patch soon
Hi Andrew,
I can't apply "kernel/smp.c: relocate some code" because this patch are
base on my other patch: "generic-ipi: make struct call_function_data lockless",
which can be found at:
http://lkml.org/lkml/2009/7/24/63
I'll rewrite your patch base on it
Thanks,
Xiao
>
>> case CPU_DEAD_FROZEN:
>> + local_irq_save(flags);
>> + __generic_smp_call_function_interrupt(cpu, 0);
>> + __generic_smp_call_function_single_interrupt(cpu, 0);
>> + local_irq_restore(flags);
>
> Doing the callbacks from a different cpu than they were queued on seems
> like a fine way to mess things up.
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3 -mm] generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd()
2009-07-30 8:11 ` Xiao Guangrong
@ 2009-07-30 8:23 ` Li Zefan
0 siblings, 0 replies; 13+ messages in thread
From: Li Zefan @ 2009-07-30 8:23 UTC (permalink / raw)
To: Xiao Guangrong
Cc: Peter Zijlstra, Andrew Morton, mingo, jens.axboe, nickpiggin,
rusty, linux-kernel
Xiao Guangrong wrote:
>
> Peter Zijlstra wrote:
>
>> It introduces this run_callbacks thing to two functions, but nothing
>> actually uses that... makes me suspicious there's something missing.
>>
>
> Hi Peter,
>
> Thanks for your point out.
> I'm sorry for making a mistake when I make this patch, I'll send the new
> patch soon
>
> Hi Andrew,
>
> I can't apply "kernel/smp.c: relocate some code" because this patch are
> base on my other patch: "generic-ipi: make struct call_function_data lockless",
> which can be found at:
> http://lkml.org/lkml/2009/7/24/63
>
I'm a bit confused what's the problem here.
> I'll rewrite your patch base on it
>
No, instead you should cook *your* patch against lastest -mm tree.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-07-30 8:25 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-24 9:50 [PATCH] generic-ipi: make struct call_function_data lockless Xiao Guangrong
2009-07-27 22:00 ` Andrew Morton
2009-07-29 7:08 ` Jens Axboe
2009-07-29 7:53 ` [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd() Xiao Guangrong
2009-07-29 7:55 ` [PATCH 2/3 -mm] generic-ipi: cleanup for generic_smp_call_function_interrupt() Xiao Guangrong
2009-07-29 7:57 ` [PATCH 3/3 -mm] generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd() Xiao Guangrong
2009-07-29 23:31 ` Andrew Morton
2009-07-30 3:31 ` Xiao Guangrong
2009-07-30 6:50 ` Peter Zijlstra
2009-07-30 8:11 ` Xiao Guangrong
2009-07-30 8:23 ` Li Zefan
2009-07-29 23:27 ` [PATCH 1/3 -mm] generic-ipi: fix hotplug_cfd() Andrew Morton
2009-07-30 1:18 ` Li Zefan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox