* [PATCH RT] rt: Make cpu_chill() use hrtimer instead of msleep()
@ 2014-02-05 16:51 Steven Rostedt
2014-02-05 16:57 ` Steven Rostedt
2014-02-07 11:30 ` Sebastian Andrzej Siewior
0 siblings, 2 replies; 9+ messages in thread
From: Steven Rostedt @ 2014-02-05 16:51 UTC (permalink / raw)
To: LKML, linux-rt-users
Cc: Thomas Gleixner, Sebastian Andrzej Siewior, Clark Williams,
Luis Claudio R. Goncalves, John Kacur, Ulrich Obergfell
Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
called from softirq context, it may block the ksoftirqd() from running, in
which case, it may never wake up the msleep() causing the deadlock.
I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call,
running on CPU 8. The one ksoftirqd that is stuck, happens to be the one that
runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that
ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be
blocked on a lock that irq/74-qla2xxx holds, we have our deadlock.
The solution is not to convert the cpu_chill() back to a cpu_relax() as that
will re-create a possible live lock that the cpu_chill() fixed earlier, and may
also leave this bug open on other softirqs. The fix is to remove the
dependency on ksoftirqd from cpu_chill(). That is, instead of calling
msleep() that requires ksoftirqd to wake it up, use the
hrtimer_nanosleep() code that does the wakeup from hard irq context.
Found-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
diff --git a/include/linux/delay.h b/include/linux/delay.h
index e23a7c0..37caab3 100644
--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -53,7 +53,7 @@ static inline void ssleep(unsigned int seconds)
}
#ifdef CONFIG_PREEMPT_RT_FULL
-# define cpu_chill() msleep(1)
+extern void cpu_chill(void);
#else
# define cpu_chill() cpu_relax()
#endif
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index aa5eb4f..2f023aa 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1852,6 +1852,21 @@ SYSCALL_DEFINE2(nanosleep, struct timespec __user *, rqtp,
return hrtimer_nanosleep(&tu, rmtp, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
}
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * Sleep for 1 ms in hope whoever holds what we want will let it go.
+ */
+void cpu_chill(void)
+{
+ struct timespec tu = {
+ .tv_nsec = NSEC_PER_MSEC,
+ };
+
+ hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
+}
+EXPORT_SYMBOL(cpu_chill);
+#endif
+
/*
* Functions related to boot-time initialization:
*/
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH RT] rt: Make cpu_chill() use hrtimer instead of msleep()
2014-02-05 16:51 [PATCH RT] rt: Make cpu_chill() use hrtimer instead of msleep() Steven Rostedt
@ 2014-02-05 16:57 ` Steven Rostedt
2014-02-07 11:30 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2014-02-05 16:57 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-rt-users, Thomas Gleixner, Sebastian Andrzej Siewior,
Clark Williams, Luis Claudio R. Goncalves, John Kacur,
Ulrich Obergfell, stable-rt
On Wed, 5 Feb 2014 11:51:25 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
> up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
> called from softirq context, it may block the ksoftirqd() from running, in
> which case, it may never wake up the msleep() causing the deadlock.
>
> I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call,
> running on CPU 8. The one ksoftirqd that is stuck, happens to be the one that
> runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that
> ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be
> blocked on a lock that irq/74-qla2xxx holds, we have our deadlock.
>
> The solution is not to convert the cpu_chill() back to a cpu_relax() as that
> will re-create a possible live lock that the cpu_chill() fixed earlier, and may
> also leave this bug open on other softirqs. The fix is to remove the
> dependency on ksoftirqd from cpu_chill(). That is, instead of calling
> msleep() that requires ksoftirqd to wake it up, use the
> hrtimer_nanosleep() code that does the wakeup from hard irq context.
>
> Found-by: Ulrich Obergfell <uobergfe@redhat.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>
I should have added:
Cc: stable-rt@vger.kernel.org
-- Steve
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RT] rt: Make cpu_chill() use hrtimer instead of msleep()
2014-02-05 16:51 [PATCH RT] rt: Make cpu_chill() use hrtimer instead of msleep() Steven Rostedt
2014-02-05 16:57 ` Steven Rostedt
@ 2014-02-07 11:30 ` Sebastian Andrzej Siewior
2014-02-07 14:08 ` Steven Rostedt
1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-07 11:30 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-rt-users, Thomas Gleixner, Clark Williams,
Luis Claudio R. Goncalves, John Kacur, Ulrich Obergfell
* Steven Rostedt | 2014-02-05 11:51:25 [-0500]:
>Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
>up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
>called from softirq context, it may block the ksoftirqd() from running, in
>which case, it may never wake up the msleep() causing the deadlock.
>
>I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call,
>running on CPU 8. The one ksoftirqd that is stuck, happens to be the one that
>runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that
>ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be
>blocked on a lock that irq/74-qla2xxx holds, we have our deadlock.
could you please tell me which two locks are invovled here?
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RT] rt: Make cpu_chill() use hrtimer instead of msleep()
2014-02-07 11:30 ` Sebastian Andrzej Siewior
@ 2014-02-07 14:08 ` Steven Rostedt
2014-02-07 14:13 ` Steven Rostedt
0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2014-02-07 14:08 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: LKML, linux-rt-users, Thomas Gleixner, Clark Williams,
Luis Claudio R. Goncalves, John Kacur, Ulrich Obergfell
On Fri, 7 Feb 2014 12:30:06 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> * Steven Rostedt | 2014-02-05 11:51:25 [-0500]:
>
> >Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
> >up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
> >called from softirq context, it may block the ksoftirqd() from running, in
> >which case, it may never wake up the msleep() causing the deadlock.
> >
> >I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call,
> >running on CPU 8. The one ksoftirqd that is stuck, happens to be the one that
> >runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that
> >ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be
> >blocked on a lock that irq/74-qla2xxx holds, we have our deadlock.
>
> could you please tell me which two locks are invovled here?
>
Looks to be the lock of the block softirq. I don't have the core dump
anymore, but from what I could tell the ksoftirqd was blocked on the
block softirq lock, where the block softirq handler did a msleep
(called by the qla2xxx interrupt handler).
Looking at trigger_softirq() in block/blk-softirq.c, it can do a
smp_callfunction() to another cpu to run the block softirq. If that
happens to be the cpu where the qla2xx irq handler is doing the block
softirq and is in a middle of a msleep(), I believe the ksoftirqd will
try to run the softirq. If it does that, then BOOM, it's deadlocked
because the ksoftirqd will never run the timer softirq either.
-- Steve
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RT] rt: Make cpu_chill() use hrtimer instead of msleep()
2014-02-07 14:08 ` Steven Rostedt
@ 2014-02-07 14:13 ` Steven Rostedt
2014-02-07 14:21 ` Sebastian Andrzej Siewior
2014-02-19 11:53 ` [PATCH RT] kernel/hrtimer: be non-freezeable in cpu_chill() Sebastian Andrzej Siewior
0 siblings, 2 replies; 9+ messages in thread
From: Steven Rostedt @ 2014-02-07 14:13 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sebastian Andrzej Siewior, LKML, linux-rt-users, Thomas Gleixner,
Clark Williams, Luis Claudio R. Goncalves, John Kacur,
Ulrich Obergfell
On Fri, 7 Feb 2014 09:08:34 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> > could you please tell me which two locks are invovled here?
I should have also stated that it was only one lock that was involved.
But the lock owner was doing a msleep() that requires a wakeup by
ksoftirqd to continue. If ksoftirqd happens to be blocked on a lock
held by the msleep() caller, then you have your deadlock.
It's best not to have any softirqs going to sleep requiring another
softirq to wake it up. Note, if we ever require a timer softirq to do a
cpu_chill() it will most definitely hit this deadlock.
-- Steve
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RT] rt: Make cpu_chill() use hrtimer instead of msleep()
2014-02-07 14:13 ` Steven Rostedt
@ 2014-02-07 14:21 ` Sebastian Andrzej Siewior
2014-02-19 11:53 ` [PATCH RT] kernel/hrtimer: be non-freezeable in cpu_chill() Sebastian Andrzej Siewior
1 sibling, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-07 14:21 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-rt-users, Thomas Gleixner, Clark Williams,
Luis Claudio R. Goncalves, John Kacur, Ulrich Obergfell
On 02/07/2014 03:13 PM, Steven Rostedt wrote:
> On Fri, 7 Feb 2014 09:08:34 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
>>> could you please tell me which two locks are invovled here?
>
> I should have also stated that it was only one lock that was involved.
> But the lock owner was doing a msleep() that requires a wakeup by
> ksoftirqd to continue. If ksoftirqd happens to be blocked on a lock
> held by the msleep() caller, then you have your deadlock.
That makes sense.
> It's best not to have any softirqs going to sleep requiring another
> softirq to wake it up. Note, if we ever require a timer softirq to do a
> cpu_chill() it will most definitely hit this deadlock.
Yes. And that sleep in softirq is also not really nice but this isn't
new. Thanks for the patch & explanation.
>
> -- Steve
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH RT] kernel/hrtimer: be non-freezeable in cpu_chill()
2014-02-07 14:13 ` Steven Rostedt
2014-02-07 14:21 ` Sebastian Andrzej Siewior
@ 2014-02-19 11:53 ` Sebastian Andrzej Siewior
2014-02-19 15:27 ` Steven Rostedt
1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-19 11:53 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-rt-users, Thomas Gleixner, Clark Williams,
Luis Claudio R. Goncalves, John Kacur, Ulrich Obergfell
Since we replaced msleep() by hrtimer I see now and then (rarely) this:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
For now I see no better way but to disable the freezer for the sleep period.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/hrtimer.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 5c26d2c..083815d 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1899,8 +1899,12 @@ void cpu_chill(void)
struct timespec tu = {
.tv_nsec = NSEC_PER_MSEC,
};
+ unsigned int freeze_flag = current->flags & PF_NOFREEZE;
+ current->flags |= PF_NOFREEZE;
hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
+ if (!freeze_flag)
+ current->flags &= ~PF_NOFREEZE;
}
EXPORT_SYMBOL(cpu_chill);
#endif
--
1.9.0.rc3
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH RT] kernel/hrtimer: be non-freezeable in cpu_chill()
2014-02-19 11:53 ` [PATCH RT] kernel/hrtimer: be non-freezeable in cpu_chill() Sebastian Andrzej Siewior
@ 2014-02-19 15:27 ` Steven Rostedt
2014-02-19 15:42 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2014-02-19 15:27 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: LKML, linux-rt-users, Thomas Gleixner, Clark Williams,
Luis Claudio R. Goncalves, John Kacur, Ulrich Obergfell
On Wed, 19 Feb 2014 12:53:53 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> Since we replaced msleep() by hrtimer I see now and then (rarely) this:
>
> | [....] Waiting for /dev to be fully populated...
> | =====================================
> | [ BUG: udevd/229 still has locks held! ]
> | 3.12.11-rt17 #23 Not tainted
> | -------------------------------------
> | 1 lock held by udevd/229:
> | #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
> |
> | stack backtrace:
> | CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
> | (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
> | (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
> | (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
> | (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
> | (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
> | (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
> | (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
> | (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
> | (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
> | (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
> | (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
> | (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
> | (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
> | (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
> | (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
> | (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
> | (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
>
> For now I see no better way but to disable the freezer for the sleep period.
Hmm, I wonder if this is because msleep() sets the tasks state to
TASK_UNINTERRUPTIBLE, and hrtimer_nanosleep sets it to
TASK_INTERRUPTIBLE.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> kernel/hrtimer.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 5c26d2c..083815d 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -1899,8 +1899,12 @@ void cpu_chill(void)
> struct timespec tu = {
> .tv_nsec = NSEC_PER_MSEC,
> };
> + unsigned int freeze_flag = current->flags & PF_NOFREEZE;
>
> + current->flags |= PF_NOFREEZE;
> hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
> + if (!freeze_flag)
> + current->flags &= ~PF_NOFREEZE;
We could remove the branch with:
current->flags &= ~PF_NOFREEZE | freeze_flag;
But as we just slept for a millisecond, that optimization may not be
worth anything ;-)
Acked-by: Steven Rostedt <rostedt@goodmis.org>
-- Steve
> }
> EXPORT_SYMBOL(cpu_chill);
> #endif
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH RT] kernel/hrtimer: be non-freezeable in cpu_chill()
2014-02-19 15:27 ` Steven Rostedt
@ 2014-02-19 15:42 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-02-19 15:42 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-rt-users, Thomas Gleixner, Clark Williams,
Luis Claudio R. Goncalves, John Kacur, Ulrich Obergfell
On 02/19/2014 04:27 PM, Steven Rostedt wrote:
> Hmm, I wonder if this is because msleep() sets the tasks state to
> TASK_UNINTERRUPTIBLE, and hrtimer_nanosleep sets it to
> TASK_INTERRUPTIBLE.
No, it is because freezable_schedule() is called via do_nanosleep() in
hrtimer call path and msleep() calls schedule() without it.
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-02-19 15:42 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-05 16:51 [PATCH RT] rt: Make cpu_chill() use hrtimer instead of msleep() Steven Rostedt
2014-02-05 16:57 ` Steven Rostedt
2014-02-07 11:30 ` Sebastian Andrzej Siewior
2014-02-07 14:08 ` Steven Rostedt
2014-02-07 14:13 ` Steven Rostedt
2014-02-07 14:21 ` Sebastian Andrzej Siewior
2014-02-19 11:53 ` [PATCH RT] kernel/hrtimer: be non-freezeable in cpu_chill() Sebastian Andrzej Siewior
2014-02-19 15:27 ` Steven Rostedt
2014-02-19 15:42 ` Sebastian Andrzej Siewior
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).