From: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org, mm-commits@vger.kernel.org,
jens.axboe@oracle.com, mingo@elte.hu, nickpiggin@yahoo.com.au,
rusty@rustcorp.com.au, suresh.b.siddha@intel.com,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: + generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch added to -mm tree
Date: Fri, 11 Sep 2009 15:45:05 +0800 [thread overview]
Message-ID: <4AAA0001.2060703@cn.fujitsu.com> (raw)
In-Reply-To: <1252616988.7205.102.camel@laptop>
Peter Zijlstra wrote:
> On Thu, 2009-07-30 at 17:30 -0700, akpm@linux-foundation.org wrote:
>
>> ------------------------------------------------------
>> Subject: generic-ipi: fix the race between generic_smp_call_function_*() and hotplug_cfd()
>> From: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
>>
>> There is a race between generic_smp_call_function_*() and hotplug_cfd() in
>> many cases, see below examples:
>>
>> 1: hotplug_cfd() can free cfd->cpumask, the system will crash if the
>> cpu's cfd still in the call_function list:
>>
>>
>> CPU A: CPU B
>>
>> smp_call_function_many() ......
>> cpu_down() ......
>> hotplug_cfd() -> ......
>> free_cpumask_var(cfd->cpumask) (receive function IPI interrupte)
>> /* read cfd->cpumask */
>> generic_smp_call_function_interrupt() ->
>> cpumask_test_and_clear_cpu(cpu, data->cpumask)
>>
>> CRASH!!!
>>
>> 2: It's not handle call_function list when cpu down, It's will lead to
>> dead-wait if other path is waiting this cpu to execute function
>>
>> CPU A: CPU B
>>
>> smp_call_function_many(wait=0)
>> ...... CPU B down
>> smp_call_function_many() --> (cpu down before recevie function
>> csd_lock(&data->csd); IPI interrupte)
>>
>> DEAD-WAIT!!!!
>>
>> So, CPU A will dead-wait in csd_lock(), the same as
>> smp_call_function_single()
>
> On re-reading this, I'm wondering if 2 is a real case.
>
> I'm thinking it should never happen since you're supposed to do things
> like get_online_cpus() around stuff like this, but then again, I doubt
> we actually do.
>
I think that get_online_cpus() and stop_machine() can't avoid this
race,
1: For the first example in my patch's changlog:
CPU A: CPU B
smp_call_function_many(wait=0) ......
cpu_down() ......
hotplug_cfd() -> ......
free_cpumask_var(cfd->cpumask) (receive function IPI interrupte)
/* read cfd->cpumask */
generic_smp_call_function_interrupt() ->
cpumask_test_and_clear_cpu(cpu, data->cpumask)
CRASH!!!
CPU A call smp_call_function_many(wait=0) that want CPU B to call
a specific function, after smp_call_function_many() return, we let
CPU A offline immediately. Unfortunately, if CPU B receives this
IPI interrupt after CPU A down, it will crash like above description.
2: For the second example in my patch's changlog:
If CPU B is dying, like below:
_cpu_down()
{
......
/* We suppose that have below sequences:
* before call __stop_machine(), CPU B is online (in cpu_online_mask),
* in this time, CPU A call smp_call_function_many(wait=0) and want
* CPU B to call a specific function, after CPU A finish it, CPU B
* go to __stop_machine() and disable it's interrupt
* (suppose CPU B not receive IPI interrupt in this time now)
*/
err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
......
}
Now, CPU B is down, but it's not handle CPU A's request, it cause that
can't clean the CSD_FLAG_LOCK flag of CPU A's cfd_data, if CPU A
call smp_call_function_many() next time. it will block in
csd_lock() -> csd_lock_wait(data) forever.
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
>> Cc: Ingo Molnar <mingo@elte.hu>
>> Cc: Jens Axboe <jens.axboe@oracle.com>
>> Cc: Nick Piggin <nickpiggin@yahoo.com.au>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>
>> kernel/smp.c | 38 ++++++++++++++++++++++++++++----------
>> 1 file changed, 28 insertions(+), 10 deletions(-)
>>
>> diff -puN kernel/smp.c~generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd kernel/smp.c
>> --- a/kernel/smp.c~generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd
>> +++ a/kernel/smp.c
>> @@ -113,14 +113,10 @@ void generic_exec_single(int cpu, struct
>> csd_lock_wait(data);
>> }
>>
>> -/*
>> - * Invoked by arch to handle an IPI for call function. Must be called with
>> - * interrupts disabled.
>> - */
>> -void generic_smp_call_function_interrupt(void)
>> +static void
>> +__generic_smp_call_function_interrupt(int cpu, int run_callbacks)
>> {
>> struct call_function_data *data;
>> - int cpu = smp_processor_id();
>>
>> /*
>> * Ensure entry is visible on call_function_queue after we have
>
> Also, if this is the last version, we're still not using run_callbacks
> for anything..
>
It's not the last version and fixed in another patch, see below URL please:
http://marc.info/?l=linux-mm-commits&m=124900028228350&w=2
Thanks,
Xiao
next prev parent reply other threads:[~2009-09-11 7:45 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-31 0:30 + generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch added to -mm tree akpm
[not found] ` <1252616988.7205.102.camel@laptop>
2009-09-11 7:45 ` Xiao Guangrong [this message]
2009-09-11 7:50 ` Peter Zijlstra
2009-09-11 19:08 ` Suresh Siddha
2009-09-14 7:22 ` Xiao Guangrong
2009-09-15 0:16 ` Suresh Siddha
2009-09-15 2:03 ` Xiao Guangrong
2009-09-16 2:20 ` Suresh Siddha
2009-09-17 3:00 ` Xiao Guangrong
2009-09-17 7:45 ` Peter Zijlstra
2009-09-19 2:16 ` Suresh Siddha
2009-09-21 2:55 ` Xiao Guangrong
2009-09-21 3:11 ` Suresh Siddha
2009-09-21 4:04 ` Xiao Guangrong
2009-09-22 5:32 ` Suresh Siddha
2009-09-22 6:52 ` Xiao Guangrong
-- strict thread matches above, loose matches on Subject: below --
2009-07-29 23:31 akpm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4AAA0001.2060703@cn.fujitsu.com \
--to=xiaoguangrong@cn.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mm-commits@vger.kernel.org \
--cc=nickpiggin@yahoo.com.au \
--cc=peterz@infradead.org \
--cc=rusty@rustcorp.com.au \
--cc=suresh.b.siddha@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.