From: Peter Zijlstra <peterz@infradead.org>
To: mingo@kernel.org, tglx@linutronix.de, juri.lelli@arm.com,
rostedt@goodmis.org, xlpang@redhat.com
Cc: linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
jdesfossez@efficios.com, bristot@redhat.com,
peterz@infradead.org, Ingo Molnar <mingo@redhat.com>
Subject: [RFC][PATCH 2/8] sched/rtmutex/deadline: Fix a PI crash for deadline tasks
Date: Tue, 07 Jun 2016 21:56:37 +0200 [thread overview]
Message-ID: <20160607200215.719626477@infradead.org> (raw)
In-Reply-To: 20160607195635.710022345@infradead.org
[-- Attachment #1: xunlei_pang-sched_rtmutex_deadline-fix_a_pi_crash_for_deadline_tasks.patch --]
[-- Type: text/plain, Size: 8031 bytes --]
From: Xunlei Pang <xlpang@redhat.com>
A crash happened while I was playing with deadline PI rtmutex.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff810eeb8f>] rt_mutex_get_top_task+0x1f/0x30
PGD 232a75067 PUD 230947067 PMD 0
Oops: 0000 [#1] SMP
CPU: 1 PID: 10994 Comm: a.out Not tainted
Call Trace:
[<ffffffff810b658c>] enqueue_task+0x2c/0x80
[<ffffffff810ba763>] activate_task+0x23/0x30
[<ffffffff810d0ab5>] pull_dl_task+0x1d5/0x260
[<ffffffff810d0be6>] pre_schedule_dl+0x16/0x20
[<ffffffff8164e783>] __schedule+0xd3/0x900
[<ffffffff8164efd9>] schedule+0x29/0x70
[<ffffffff8165035b>] __rt_mutex_slowlock+0x4b/0xc0
[<ffffffff81650501>] rt_mutex_slowlock+0xd1/0x190
[<ffffffff810eeb33>] rt_mutex_timed_lock+0x53/0x60
[<ffffffff810ecbfc>] futex_lock_pi.isra.18+0x28c/0x390
[<ffffffff810ed8b0>] do_futex+0x190/0x5b0
[<ffffffff810edd50>] SyS_futex+0x80/0x180
This is because rt_mutex_enqueue_pi() and rt_mutex_dequeue_pi()
are only protected by pi_lock when operating pi waiters, while
rt_mutex_get_top_task(), will access them with rq lock held but
not holding pi_lock.
In order to tackle it, we introduce new "pi_top_task" pointer
cached in task_struct, and add new rt_mutex_update_top_task()
to update its value, it can be called by rt_mutex_setprio()
which held both owner's pi_lock and rq lock. Thus "pi_top_task"
can be safely accessed by enqueue_task_dl() under rq lock.
[XXX this next section is unparsable]
One problem is when rt_mutex_adjust_prio()->...->rt_mutex_setprio(),
at that time rtmutex lock was released and owner was marked off,
this can cause "pi_top_task" dereferenced to be a running one(as it
can be falsely woken up by others before rt_mutex_setprio() is
made to update "pi_top_task"). We solve this by directly calling
__rt_mutex_adjust_prio() in mark_wakeup_next_waiter() which held
pi_lock and rtmutex lock, and remove rt_mutex_adjust_prio(). Since
now we moved the deboost point, in order to avoid current to be
preempted due to deboost earlier before wake_up_q(), we also moved
preempt_disable() before unlocking rtmutex.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Originally-From: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1461659449-19497-2-git-send-email-xlpang@redhat.com
---
include/linux/init_task.h | 1
include/linux/sched.h | 2 +
include/linux/sched/rt.h | 1
kernel/fork.c | 1
kernel/locking/rtmutex.c | 65 +++++++++++++++++++---------------------------
kernel/sched/core.c | 2 +
6 files changed, 34 insertions(+), 38 deletions(-)
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -162,6 +162,7 @@ extern struct task_group root_task_group
#ifdef CONFIG_RT_MUTEXES
# define INIT_RT_MUTEXES(tsk) \
.pi_waiters = RB_ROOT, \
+ .pi_top_task = NULL, \
.pi_waiters_leftmost = NULL,
#else
# define INIT_RT_MUTEXES(tsk)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1681,6 +1681,8 @@ struct task_struct {
/* PI waiters blocked on a rt_mutex held by this task */
struct rb_root pi_waiters;
struct rb_node *pi_waiters_leftmost;
+ /* Updated under owner's pi_lock and rq lock */
+ struct task_struct *pi_top_task;
/* Deadlock detection and priority inheritance handling */
struct rt_mutex_waiter *pi_blocked_on;
#endif
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -19,6 +19,7 @@ static inline int rt_task(struct task_st
extern int rt_mutex_getprio(struct task_struct *p);
extern void rt_mutex_setprio(struct task_struct *p, int prio);
extern int rt_mutex_get_effective_prio(struct task_struct *task, int newprio);
+extern void rt_mutex_update_top_task(struct task_struct *p);
extern struct task_struct *rt_mutex_get_top_task(struct task_struct *task);
extern void rt_mutex_adjust_pi(struct task_struct *p);
static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1219,6 +1219,7 @@ static void rt_mutex_init_task(struct ta
#ifdef CONFIG_RT_MUTEXES
p->pi_waiters = RB_ROOT;
p->pi_waiters_leftmost = NULL;
+ p->pi_top_task = NULL;
p->pi_blocked_on = NULL;
#endif
}
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -256,6 +256,16 @@ rt_mutex_dequeue_pi(struct task_struct *
RB_CLEAR_NODE(&waiter->pi_tree_entry);
}
+void rt_mutex_update_top_task(struct task_struct *p)
+{
+ if (!task_has_pi_waiters(p)) {
+ p->pi_top_task = NULL;
+ return;
+ }
+
+ p->pi_top_task = task_top_pi_waiter(p)->task;
+}
+
/*
* Calculate task priority from the waiter tree priority
*
@@ -273,10 +283,7 @@ int rt_mutex_getprio(struct task_struct
struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
{
- if (likely(!task_has_pi_waiters(task)))
- return NULL;
-
- return task_top_pi_waiter(task)->task;
+ return task->pi_top_task;
}
/*
@@ -285,12 +292,12 @@ struct task_struct *rt_mutex_get_top_tas
*/
int rt_mutex_get_effective_prio(struct task_struct *task, int newprio)
{
- if (!task_has_pi_waiters(task))
+ struct task_struct *top_task = rt_mutex_get_top_task(task);
+
+ if (!top_task)
return newprio;
- if (task_top_pi_waiter(task)->task->prio <= newprio)
- return task_top_pi_waiter(task)->task->prio;
- return newprio;
+ return min(top_task->prio, newprio);
}
/*
@@ -307,24 +314,6 @@ static void __rt_mutex_adjust_prio(struc
}
/*
- * Adjust task priority (undo boosting). Called from the exit path of
- * rt_mutex_slowunlock() and rt_mutex_slowlock().
- *
- * (Note: We do this outside of the protection of lock->wait_lock to
- * allow the lock to be taken while or before we readjust the priority
- * of task. We do not use the spin_xx_mutex() variants here as we are
- * outside of the debug path.)
- */
-void rt_mutex_adjust_prio(struct task_struct *task)
-{
- unsigned long flags;
-
- raw_spin_lock_irqsave(&task->pi_lock, flags);
- __rt_mutex_adjust_prio(task);
- raw_spin_unlock_irqrestore(&task->pi_lock, flags);
-}
-
-/*
* Deadlock detection is conditional:
*
* If CONFIG_DEBUG_RT_MUTEXES=n, deadlock detection is only conducted
@@ -987,6 +976,7 @@ static void mark_wakeup_next_waiter(stru
* lock->wait_lock.
*/
rt_mutex_dequeue_pi(current, waiter);
+ __rt_mutex_adjust_prio(current);
/*
* As we are waking up the top waiter, and the waiter stays
@@ -1325,6 +1315,16 @@ static bool __sched rt_mutex_slowunlock(
*/
mark_wakeup_next_waiter(wake_q, lock);
+ /*
+ * We should deboost before waking the top waiter task such that
+ * we don't run two tasks with the 'same' priority. This however
+ * can lead to prio-inversion if we would get preempted after
+ * the deboost but before waking our high-prio task, hence the
+ * preempt_disable before unlock. Pairs with preempt_enable() in
+ * rt_mutex_postunlock();
+ */
+ preempt_disable();
+
raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
/* check PI boosting */
@@ -1400,20 +1400,9 @@ rt_mutex_fastunlock(struct rt_mutex *loc
*/
void rt_mutex_postunlock(struct wake_q_head *wake_q, bool deboost)
{
- /*
- * We should deboost before waking the top waiter task such that
- * we don't run two tasks with the 'same' priority. This however
- * can lead to prio-inversion if we would get preempted after
- * the deboost but before waking our high-prio task, hence the
- * preempt_disable.
- */
- if (deboost) {
- preempt_disable();
- rt_mutex_adjust_prio(current);
- }
-
wake_up_q(wake_q);
+ /* Pairs with preempt_disable() in rt_mutex_slowunlock() */
if (deboost)
preempt_enable();
}
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3568,6 +3568,8 @@ void rt_mutex_setprio(struct task_struct
goto out_unlock;
}
+ rt_mutex_update_top_task(p);
+
trace_sched_pi_setprio(p, prio);
oldprio = p->prio;
next prev parent reply other threads:[~2016-06-07 20:20 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-07 19:56 [RFC][PATCH 0/8] PI and assorted failings Peter Zijlstra
2016-06-07 19:56 ` [RFC][PATCH 1/8] rtmutex: Deboost before waking up the top waiter Peter Zijlstra
2016-06-14 9:09 ` Juri Lelli
2016-06-14 12:54 ` Peter Zijlstra
2016-06-14 13:20 ` Juri Lelli
2016-06-14 13:59 ` Peter Zijlstra
2016-06-14 16:36 ` Davidlohr Bueso
2016-06-14 17:01 ` Juri Lelli
2016-06-14 18:22 ` Steven Rostedt
2016-06-07 19:56 ` Peter Zijlstra [this message]
2016-06-14 10:21 ` [RFC][PATCH 2/8] sched/rtmutex/deadline: Fix a PI crash for deadline tasks Juri Lelli
2016-06-14 12:30 ` Peter Zijlstra
2016-06-14 12:53 ` Xunlei Pang
2016-06-14 13:07 ` Juri Lelli
2016-06-14 16:39 ` Juri Lelli
2016-06-14 18:42 ` Steven Rostedt
2016-06-14 20:28 ` Peter Zijlstra
2016-06-15 16:14 ` Steven Rostedt
2016-06-07 19:56 ` [RFC][PATCH 3/8] sched/deadline/rtmutex: Dont miss the dl_runtime/dl_period update Peter Zijlstra
2016-06-14 10:43 ` Juri Lelli
2016-06-14 12:09 ` Peter Zijlstra
2016-06-15 16:30 ` Steven Rostedt
2016-06-15 17:55 ` Peter Zijlstra
2016-06-07 19:56 ` [RFC][PATCH 4/8] rtmutex: Remove rt_mutex_fastunlock() Peter Zijlstra
2016-06-15 16:43 ` Steven Rostedt
2016-06-07 19:56 ` [RFC][PATCH 5/8] rtmutex: Clean up Peter Zijlstra
2016-06-14 12:08 ` Juri Lelli
2016-06-14 12:32 ` Peter Zijlstra
2016-06-14 12:41 ` Juri Lelli
2016-06-07 19:56 ` [RFC][PATCH 6/8] sched/rtmutex: Refactor rt_mutex_setprio() Peter Zijlstra
2016-06-14 13:14 ` Juri Lelli
2016-06-14 14:08 ` Peter Zijlstra
2016-06-07 19:56 ` [RFC][PATCH 7/8] sched,tracing: Update trace_sched_pi_setprio() Peter Zijlstra
2016-06-07 19:56 ` [RFC][PATCH 8/8] rtmutex: Fix PI chain order integrity Peter Zijlstra
2016-06-14 17:39 ` Juri Lelli
2016-06-14 19:44 ` Peter Zijlstra
2016-06-15 7:25 ` Juri Lelli
2016-06-27 12:23 ` Peter Zijlstra
2016-06-27 12:40 ` Thomas Gleixner
2016-06-28 9:05 ` Juri Lelli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160607200215.719626477@infradead.org \
--to=peterz@infradead.org \
--cc=bristot@redhat.com \
--cc=jdesfossez@efficios.com \
--cc=juri.lelli@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=xlpang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox