public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH] hrtimer: remove deadlock due to waiting on IPI in softirq context
From: Rik van Riel @ 2014-03-05 21:25 UTC
  To: linux-kernel
  Cc: Mateusz Guzik, Benjamin Herrenschmidt, Ingo Molnar,
	Thomas Gleixner, Prarit Bhargava, Frederic Weisbecker,
	Clark Williams

There appears to be a deadlock in the hrtimer code. Specifically,
clock_was_set() calls an IPI with wait=1, from softirq context.

Waiting for IPIs to complete in irq context can lead to a deadlock,
because the current code (that was interrupted) might be holding some
kind of lock, that another CPU is waiting for with spin_lock_irq or
similar.

In other words, the current CPU may need to release a resource, before
the IPI can be handled by one of the destination CPUs.
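
[Editor's illustration, not part of the original mail.] The circular wait
described above can be modelled in a few lines of user-space C. This is only
a sketch under stated assumptions: it is not kernel code, and the names
`struct model`, `deadlocked()` and `scenario_deadlocks()` are hypothetical.
CPU 0 is assumed to have been interrupted while holding a lock, and CPU 1 is
assumed to spin on that lock with interrupts disabled.

```c
#include <stdbool.h>

#define NCPUS  2
#define NOBODY (-1)

/* waits_on[i] is the CPU that CPU i is blocked on, or NOBODY. */
struct model {
	int waits_on[NCPUS];
};

/* Follow the wait-for chain from `start`; coming back to `start` is a deadlock. */
static bool deadlocked(const struct model *m, int start)
{
	int cpu = m->waits_on[start];

	for (int steps = 0; steps < NCPUS && cpu != NOBODY; steps++) {
		if (cpu == start)
			return true;
		cpu = m->waits_on[cpu];
	}
	return false;
}

/*
 * CPU 0 sends an IPI from interrupt context. If the code it interrupted
 * holds a lock, CPU 1 spins on that lock with interrupts off and cannot
 * run the IPI handler until CPU 0 releases it.
 */
static bool scenario_deadlocks(bool ipi_wait, bool cpu0_holds_lock)
{
	struct model m = { .waits_on = { NOBODY, NOBODY } };

	if (cpu0_holds_lock)
		m.waits_on[1] = 0;	/* CPU 1 needs the lock held by CPU 0 */
	if (ipi_wait)
		m.waits_on[0] = 1;	/* CPU 0 waits for CPU 1's IPI handler */

	return deadlocked(&m, 0);
}
```

In this model only the combination of wait=1 and a lock held by the
interrupted code forms a cycle; dropping either edge breaks it.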

To my untrained eye, it does not look like this patch introduces a
new bug to the timer code, but that is hard to ascertain with the
timer code, so I am posting this as an RFC for the timer gods to hurt
their brains on :)

This bug was introduced by 54cdfdb4 in early 2007 (the original
hrtimer code patch).

Not-yet-signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Clark Williams <williams@redhat.com>
---
 kernel/hrtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 0909436..19145ec 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -771,7 +771,7 @@ void clock_was_set(void)
 {
 #ifdef CONFIG_HIGH_RES_TIMERS
 	/* Retrigger the CPU local events everywhere */
-	on_each_cpu(retrigger_next_event, NULL, 1);
+	on_each_cpu(retrigger_next_event, NULL, 0);
 #endif
 	timerfd_clock_was_set();
 }


* Re: [RFC PATCH] hrtimer: remove deadlock due to waiting on IPI in softirq context
From: Thomas Gleixner @ 2014-03-05 21:51 UTC
  To: Rik van Riel
  Cc: linux-kernel, Mateusz Guzik, Benjamin Herrenschmidt, Ingo Molnar,
	Prarit Bhargava, Frederic Weisbecker, Clark Williams

On Wed, 5 Mar 2014, Rik van Riel wrote:
> There appears to be a deadlock in the hrtimer code. Specifically,
> clock_was_set() calls an IPI with wait=1, from softirq context.

This should not be called from softirq context.
 
> Waiting for IPIs to complete in irq context can lead to a deadlock,
> because the current code (that was interrupted) might be holding some
> kind of lock, that another CPU is waiting for with spin_lock_irq or
> similar.
> 
> In other words, the current CPU may need to release a resource, before
> the IPI can be handled by one of the destination CPUs.
> 
> To my untrained eye, it does not look like this patch introduces a
> new bug to the timer code, but that is hard to ascertain with the
> timer code, so I am posting this as an RFC for the timer gods to hurt
> their brains on :)
> 
> This bug was introduced by 54cdfdb4 in early 2007 (the original
> hrtimer code patch).

Right, and we had some issues with that until we moved the calls to
clock_was_set() out of lock-held regions.
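
[Editor's illustration, not part of the original mail.] Moving a call out of
a lock-held region usually means recording the event under the lock and
acting on it only after the lock is dropped. The following is a minimal
user-space sketch of that pattern, assuming a plain flag in place of a real
lock; it is not the actual timekeeping code, and `lock_held`,
`notify_clock_change()` and `set_time_deferred()` are hypothetical names.

```c
#include <stdbool.h>

static bool lock_held;		/* stand-in for a timekeeping lock */
static int notify_calls;	/* how often the notifier ran */
static int notify_under_lock;	/* how often it ran with the lock held */

/* Hypothetical stand-in for clock_was_set(); must not run under the lock. */
static void notify_clock_change(void)
{
	notify_calls++;
	if (lock_held)
		notify_under_lock++;
}

/* Remember under the lock that the clock changed; notify after unlocking. */
static void set_time_deferred(bool clock_changed)
{
	bool need_notify = false;

	lock_held = true;		/* models taking the lock */
	if (clock_changed)
		need_notify = true;	/* only record the event here */
	lock_held = false;		/* models releasing the lock */

	if (need_notify)
		notify_clock_change();	/* no lock held while this runs */
}
```

With this shape, the notifier (and any IPI it waits for) never runs while
the caller still owns the lock another CPU may be spinning on.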

The only call which happens from interrupt context is in
update_wall_time(). And that one definitely holds no locks which are
relevant.

On which kernel are you observing the issue?

Can you provide the debug info which made you look into this?

Thanks,

	tglx
 
> Not-yet-signed-off-by: Rik van Riel <riel@redhat.com>
> Reported-by: Mateusz Guzik <mguzik@redhat.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Clark Williams <williams@redhat.com>
> ---
>  kernel/hrtimer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 0909436..19145ec 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -771,7 +771,7 @@ void clock_was_set(void)
>  {
>  #ifdef CONFIG_HIGH_RES_TIMERS
>  	/* Retrigger the CPU local events everywhere */
> -	on_each_cpu(retrigger_next_event, NULL, 1);
> +	on_each_cpu(retrigger_next_event, NULL, 0);
>  #endif
>  	timerfd_clock_was_set();
>  }
> 


* Re: [RFC PATCH] hrtimer: remove deadlock due to waiting on IPI in softirq context
From: Rik van Riel @ 2014-03-05 21:54 UTC
  To: Thomas Gleixner
  Cc: linux-kernel, Mateusz Guzik, Benjamin Herrenschmidt, Ingo Molnar,
	Prarit Bhargava, Frederic Weisbecker, Clark Williams

On 03/05/2014 04:51 PM, Thomas Gleixner wrote:
> On Wed, 5 Mar 2014, Rik van Riel wrote:
>> There appears to be a deadlock in the hrtimer code. Specifically,
>> clock_was_set() calls an IPI with wait=1, from softirq context.
>
> This should not be called from softirq context.
>
>> Waiting for IPIs to complete in irq context can lead to a deadlock,
>> because the current code (that was interrupted) might be holding some
>> kind of lock, that another CPU is waiting for with spin_lock_irq or
>> similar.
>>
>> In other words, the current CPU may need to release a resource, before
>> the IPI can be handled by one of the destination CPUs.
>>
>> To my untrained eye, it does not look like this patch introduces a
>> new bug to the timer code, but that is hard to ascertain with the
>> timer code, so I am posting this as an RFC for the timer gods to hurt
>> their brains on :)
>>
>> This bug was introduced by 54cdfdb4 in early 2007 (the original
>> hrtimer code patch).
>
> Right and we had some issues with that until we moved the calls to
> clock_was_set() out of lock held regions.

Ahh indeed, the bug got fixed already :)

> The only call which happens from interrupt context is in
> update_wall_time(). And that one definitely holds no locks which are
> relevant.
>
> On which kernel are you observing the issue?

This was RHEL6, and I saw that the immediate function
was still the same upstream.

I forgot to check that clock_was_set() is now called
in a different way. My bad.

