From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org,
linux-rt-users <linux-rt-users@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Carsten Emde <C.Emde@osadl.org>, John Kacur <jkacur@redhat.com>,
<stable@vger.kernel.org>, <stable-rt@vger.kernel.org>
Subject: [PATCH RT 1/2] sched: Queue RT tasks to head when prio drops
Date: Tue, 11 Dec 2012 19:45:58 -0500
Message-ID: <20121212004820.598190405@goodmis.org>
In-Reply-To: 20121212004557.110050982@goodmis.org
[-- Attachment #1: 0001-sched-Queue-RT-tasks-to-head-when-prio-drops.patch --]
[-- Type: text/plain, Size: 2838 bytes --]
From: Thomas Gleixner <tglx@linutronix.de>
The following scenario does not work correctly:
Runqueue of CPU1 contains two runnable and pinned tasks:
T1: SCHED_FIFO, prio 80
T2: SCHED_FIFO, prio 80
T1 is on the CPU and executes the following syscalls (classic priority
ceiling scenario):
sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
...
sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
...
Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
to sleep the scheduler picks T2. Surprise!
The same happens without actual preemption when T1 is forced into the
scheduler by a sporadic NEED_RESCHED event: the scheduler invokes
pick_next_task(), which returns T2, so T1 gets preempted and scheduled
out.
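The shorthand syscalls above correspond roughly to the following user-space pattern. This is a minimal sketch, not code from the patch: the helper names set_fifo_prio() and run_at_ceiling() are hypothetical, and only sched_setscheduler(), sched_get_priority_min() and sched_get_priority_max() are real POSIX interfaces. Switching to SCHED_FIFO normally needs CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO, so the helper reports failures as negative errno values instead of assuming success.

```c
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: switch `pid` (0 means the calling thread) to
 * SCHED_FIFO at user-space priority `prio`. Returns 0 on success or a
 * negative errno value on failure.
 */
static int set_fifo_prio(pid_t pid, int prio)
{
	struct sched_param sp = { .sched_priority = prio };

	/* Reject out-of-range values before making the syscall. */
	if (prio < sched_get_priority_min(SCHED_FIFO) ||
	    prio > sched_get_priority_max(SCHED_FIFO))
		return -EINVAL;

	/* Needs CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO to succeed. */
	if (sched_setscheduler(pid, SCHED_FIFO, &sp) == -1)
		return -errno;
	return 0;
}

/*
 * Hypothetical priority-ceiling wrapper: raise the caller to `ceiling`,
 * run `critical`, then drop back to `base`, mirroring the
 * 80 -> 90 -> 80 sequence in the scenario.
 */
static int run_at_ceiling(int base, int ceiling, void (*critical)(void))
{
	int ret = set_fifo_prio(0, ceiling);

	if (ret)
		return ret;
	if (critical)
		critical();
	/*
	 * On kernels without this patch, this drop requeues the caller at
	 * the tail of the `base` list, behind any task already waiting there.
	 */
	return set_fifo_prio(0, base);
}
```

A thread already running at SCHED_FIFO 80 would call run_at_ceiling(80, 90, fn); the final drop back to 80 is the step that, before this patch, queued it behind T2.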
This happens because sched_setscheduler() dequeues T1 from the prio 90
list and then enqueues it on the tail of the prio 80 list behind T2.
This violates the POSIX spec and surprises user space, which relies on
the guarantee that SCHED_FIFO tasks are not scheduled out unless they
voluntarily give up the CPU or are preempted by a higher-priority
task. In the latter case, the preempted task must get back on the CPU
after the preempting task schedules out again.
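To make the queueing argument concrete, here is a small user-space model of one CPU's RT runqueue. It is an illustration under stated assumptions, not kernel code: the struct and function names are hypothetical, priorities use the user-space numbering (a higher number wins, unlike the kernel's internal p->prio), and the running task is modeled as the head of its priority list. The `patched` flag switches between the old behavior (always dequeue and requeue at the tail) and the behavior this patch describes (skip equal-priority changes, requeue at the head when the priority drops).

```c
#include <stdbool.h>
#include <string.h>

#define NR_USER_PRIO 100 /* user-space SCHED_FIFO priorities 1..99 */
#define MAX_TASKS    8   /* the model never holds more than this per list */

/* Hypothetical model of one CPU's RT runqueue: one FIFO list per priority. */
struct rq_model {
	int list[NR_USER_PRIO][MAX_TASKS];
	int len[NR_USER_PRIO];
};

static void model_enqueue(struct rq_model *rq, int prio, int task, bool head)
{
	int *l = rq->list[prio];

	if (head) {
		/* Shift existing entries right and put the task first. */
		memmove(&l[1], &l[0], (size_t)rq->len[prio] * sizeof(l[0]));
		l[0] = task;
	} else {
		l[rq->len[prio]] = task;
	}
	rq->len[prio]++;
}

static void model_dequeue(struct rq_model *rq, int prio, int task)
{
	int *l = rq->list[prio];

	for (int i = 0; i < rq->len[prio]; i++) {
		if (l[i] == task) {
			memmove(&l[i], &l[i + 1],
				(size_t)(rq->len[prio] - i - 1) * sizeof(l[0]));
			rq->len[prio]--;
			return;
		}
	}
}

/* Highest non-empty priority wins; within it, the list head runs. */
static int model_pick_next(const struct rq_model *rq)
{
	for (int prio = NR_USER_PRIO - 1; prio > 0; prio--)
		if (rq->len[prio])
			return rq->list[prio][0];
	return -1;
}

/*
 * Priority change as seen by the runqueue. `patched` skips equal-priority
 * changes and queues at the head when the priority drops.
 */
static void model_setprio(struct rq_model *rq, int task, int oldprio,
			  int newprio, bool patched)
{
	if (patched && oldprio == newprio)
		return;
	model_dequeue(rq, oldprio, task);
	model_enqueue(rq, newprio, task, patched && newprio < oldprio);
}

/* Replays T1/T2 at prio 80 with T1 running; returns the next task picked. */
static int replay_scenario(bool patched)
{
	struct rq_model rq;

	memset(&rq, 0, sizeof(rq));
	model_enqueue(&rq, 80, 1, false); /* T1, currently on the CPU */
	model_enqueue(&rq, 80, 2, false); /* T2, waiting behind T1 */
	model_setprio(&rq, 1, 80, 90, patched);
	model_setprio(&rq, 1, 90, 80, patched);
	/* In this model, T3 (prio 95) waking and sleeping only touches list 95. */
	return model_pick_next(&rq);
}
```

replay_scenario(false) returns 2, reproducing the surprise where T2 runs; replay_scenario(true) returns 1, so T1 gets the CPU back as POSIX requires.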
We already fixed a similar issue in commit 60db48c ("sched: Queue a
deboosted task to the head of the RT prio queue"). The same treatment
is necessary for sched_setscheduler().
While analyzing the problem, I noticed that the fix in
rt_mutex_setprio() is off by one. Head queueing happens only when the
old priority is greater than the new priority (user space view), but
an equal priority needs the same treatment. Instead of blindly changing
the condition to <=, it's better to avoid the whole dequeue/requeue
business for the equal-priority case completely.
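The off-by-one can be stated as a small decision function. This sketch paraphrases the text rather than reproducing kernel source: the enum and function names are hypothetical, and priorities here use the kernel's internal numbering, where a numerically lower prio is a higher priority (for SCHED_FIFO, prio is 99 minus the user-space priority).

```c
/*
 * Hypothetical names for illustration; the kernel passes ENQUEUE_HEAD or 0
 * to enqueue_task() rather than returning an enum like this.
 */
enum requeue_action { REQUEUE_NONE, REQUEUE_TAIL, REQUEUE_HEAD };

/*
 * Kernel view: oldprio < newprio means the task is being deboosted.
 * Before the fix, only a strict deboost went to the head, so an
 * equal-priority call was dequeued and requeued at the tail.
 */
static enum requeue_action requeue_before_fix(int oldprio, int newprio)
{
	return oldprio < newprio ? REQUEUE_HEAD : REQUEUE_TAIL;
}

/* After the fix: an equal priority leaves the task where it is. */
static enum requeue_action requeue_after_fix(int oldprio, int newprio)
{
	if (oldprio == newprio)
		return REQUEUE_NONE;
	return oldprio < newprio ? REQUEUE_HEAD : REQUEUE_TAIL;
}
```

With kernel prio 19 (user priority 80), requeue_before_fix(19, 19) yields REQUEUE_TAIL, so a no-op priority change still moves the task behind its peers, while requeue_after_fix(19, 19) yields REQUEUE_NONE, matching the early goto out_unlock added to rt_mutex_setprio().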
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/sched/core.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1f9d6f5..054e669 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4117,6 +4117,8 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
trace_sched_pi_setprio(p, prio);
oldprio = p->prio;
+ if (oldprio == prio)
+ goto out_unlock;
prev_class = p->sched_class;
on_rq = p->on_rq;
running = task_current(rq, p);
@@ -4472,6 +4474,13 @@ recheck:
task_rq_unlock(rq, p, &flags);
goto recheck;
}
+
+ p->sched_reset_on_fork = reset_on_fork;
+
+ oldprio = p->prio;
+ if (oldprio == param->sched_priority)
+ goto out;
+
on_rq = p->on_rq;
running = task_current(rq, p);
if (on_rq)
@@ -4479,18 +4488,17 @@ recheck:
if (running)
p->sched_class->put_prev_task(rq, p);
- p->sched_reset_on_fork = reset_on_fork;
-
- oldprio = p->prio;
prev_class = p->sched_class;
__setscheduler(rq, p, policy, param->sched_priority);
if (running)
p->sched_class->set_curr_task(rq);
if (on_rq)
- enqueue_task(rq, p, 0);
+ enqueue_task(rq, p, oldprio < param->sched_priority ?
+ ENQUEUE_HEAD : 0);
check_class_changed(rq, p, prev_class, oldprio);
+out:
task_rq_unlock(rq, p, &flags);
rt_mutex_adjust_pi(p);
--
1.7.10.4