From: Peter Zijlstra <peterz@infradead.org>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>,
Venkatesh Pallipadi <venki@google.com>,
Suresh Siddha <suresh.b.siddha@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH] generic-ipi: fix deadlock in __smp_call_function_single
Date: Fri, 10 Sep 2010 13:06:57 +0200
Message-ID: <1284116817.402.33.camel@laptop>
In-Reply-To: <20100909135050.GB2228@osiris.boeblingen.de.ibm.com>
On Thu, 2010-09-09 at 15:50 +0200, Heiko Carstens wrote:
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> Just got my 6 way machine to a state where cpu 0 is in an endless loop
> within __smp_call_function_single.
> All other cpus are idle.
>
> The call trace on cpu 0 looks like this:
>
> __smp_call_function_single
> scheduler_tick
> update_process_times
> tick_sched_timer
> __run_hrtimer
> hrtimer_interrupt
> clock_comparator_work
> do_extint
> ext_int_handler
> ----> timer irq
> cpu_idle
>
> __smp_call_function_single got called from nohz_balancer_kick (inlined)
> with the remote cpu being 1, wait being 0 and the per cpu variable
> remote_sched_softirq_cb (call_single_data) of the current cpu (0).
>
> Then it loops forever when it tries to grab the lock of the
> call_single_data, since it is already locked and enqueued on cpu 0.
>
> My theory of how this could have happened: for some reason the scheduler
> decided to call __smp_call_function_single on its own cpu, and sent
> an IPI to itself. The interrupt stays pending since IRQs are disabled.
> If the hypervisor then schedules the cpu away, it might happen that upon
> rescheduling both the IPI and the timer IRQ are pending.
> When interrupts are enabled again, it depends on which one gets delivered
> first.
> If the timer interrupt gets delivered first we end up with the local
> deadlock as seen in the call trace above.
>
> Let's make __smp_call_function_single check if the target cpu is the
> current cpu and execute the function immediately just like
> smp_call_function_single does. That should prevent at least the
> scenario described here.
>
> It might also be that the scheduler is not supposed to call
> __smp_call_function_single with the remote cpu being the current cpu,
> but that is a different issue.
>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

Right, so it looks like all other users of __smp_call_function_single()
do indeed make sure not to call it on the local cpu, but your patch does
make sense.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

> ---
> kernel/smp.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 75c970c..f1427d8 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -376,8 +376,10 @@ EXPORT_SYMBOL_GPL(smp_call_function_any);
> void __smp_call_function_single(int cpu, struct call_single_data *data,
> int wait)
> {
> - csd_lock(data);
> + unsigned int this_cpu;
> + unsigned long flags;
>
> + this_cpu = get_cpu();
> /*
> * Can deadlock when called with interrupts disabled.
> * We allow cpu's that are not yet online though, as no one else can
> @@ -387,7 +389,15 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
> WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
> && !oops_in_progress);
>
> - generic_exec_single(cpu, data, wait);
> + if (cpu == this_cpu) {
> + local_irq_save(flags);
> + data->func(data->info);
> + local_irq_restore(flags);
> + } else {
> + csd_lock(data);
> + generic_exec_single(cpu, data, wait);
> + }
> + put_cpu();
> }
>
> /**
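
For anyone following along, here is a minimal userspace sketch of the
sequence described in the changelog. It is only a model under stated
assumptions, not kernel code: struct csd_model, csd_try_lock(),
call_single_unpatched(), call_single_patched() and second_call_proceeds()
are hypothetical names, and csd_try_lock() returns false where the real
csd_lock() would spin. It replays a self-call on cpu 0 followed by a
timer tick that reuses the same csd for cpu 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct call_single_data; not the kernel type. */
struct csd_model {
	void (*func)(void *info);
	void *info;
	bool locked;		/* models the csd lock flag */
};

static int callback_runs;

static void count_callback(void *info)
{
	(void)info;
	callback_runs++;
}

/*
 * Models csd_lock(). The kernel spins until the flag clears; here we
 * return false instead, meaning "this call would spin forever".
 */
static bool csd_try_lock(struct csd_model *csd)
{
	if (csd->locked)
		return false;
	csd->locked = true;
	return true;
}

/* Unpatched flow: always lock and enqueue, even when cpu == this_cpu. */
static bool call_single_unpatched(int this_cpu, int cpu, struct csd_model *csd)
{
	(void)this_cpu;
	(void)cpu;
	return csd_try_lock(csd);	/* stays locked until the IPI runs */
}

/* Patched flow: run the callback directly when targeting ourselves. */
static bool call_single_patched(int this_cpu, int cpu, struct csd_model *csd)
{
	assert(csd->func != NULL);
	if (cpu == this_cpu) {
		csd->func(csd->info);	/* no lock taken, nothing enqueued */
		return true;
	}
	return csd_try_lock(csd);
}

/*
 * Replays the reported sequence on cpu 0: a self-call whose IPI is still
 * pending, followed by a timer tick that reuses the same csd for cpu 1.
 * Returns true if the second call can proceed.
 */
bool second_call_proceeds(bool patched)
{
	struct csd_model csd = { count_callback, NULL, false };
	bool (*call)(int, int, struct csd_model *) =
		patched ? call_single_patched : call_single_unpatched;

	callback_runs = 0;
	call(0, 0, &csd);		/* self-call; IPI not yet delivered */
	return call(0, 1, &csd);	/* timer tick kicks cpu 1 */
}
```

In this model second_call_proceeds(false) reports the stuck case, because
the csd is still locked by the undelivered self-IPI, while
second_call_proceeds(true) goes through, because the self-call never took
the lock.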