From: Juri Lelli <juri.lelli@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>,
LKML <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] sched/deadline: Fix bad accounting of nr_running
Date: Mon, 17 Feb 2014 16:47:57 +0100 [thread overview]
Message-ID: <53022F2D.8040301@gmail.com> (raw)
In-Reply-To: <20140214235946.60a89b65@gandalf.local.home>
Hi,
On 02/15/2014 05:59 AM, Steven Rostedt wrote:
> My test suite was locking up hard when enabling mmiotracer. This was due
> to the mmiotracer placing all but one CPU offline. I found this out
> when I was able to reproduce the bug with just my stress-cpu-hotplug
> test. This bug baffled me because it would not always trigger, and
> would only trigger on the first run after boot up. The
> stress-cpu-hotplug test would crash hard the first run, or never crash
> at all. But a new reboot may cause it to crash on the first run again.
>
> I spent all week bisecting this, as I couldn't find a consistent
> reproducer. I finally narrowed it down to the sched deadline patches,
> and even more peculiar, to the commit that added the sched
> deadline boot up self test to the latency tracer. Then it dawned on me
> to what the bug was.
>
> All it took was to run a task under sched deadline to screw up the CPU
> hot plugging. This explained why it would lock up only on the first run
> of the stress-cpu-hotplug test. The bug happened when the boot up self
> test of the schedule latency tracer would test a deadline task. The
> deadline task would corrupt something that would cause CPU hotplug to
> fail. If it didn't corrupt it, the stress test would always work
> (there's no other sched deadline tasks that would run to cause
> problems). If it did corrupt on boot up, the first test would lockup
> hard.
>
> I proved this theory by running my deadline test program on another box,
> and then run the stress-cpu-hotplug test, and it would now consistently
> lock up. I could run stress-cpu-hotplug over and over with no problem,
> but once I ran the deadline test, the next run of the
> stress-cpu-hotplug would lock hard.
>
> After adding lots of tracing to the code, I found the cause. The
> function tracer showed that migrate_tasks() was stuck in an infinite
> loop, where rq->nr_running never equaled 1 to break out of it. When I
> added a trace_printk() to see what that number was, it was 335 and
> never decrementing!
>
> Looking at the deadline code I found:
>
> static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int
> flags) {
> dequeue_dl_entity(&p->dl);
> dequeue_pushable_dl_task(rq, p);
> }
>
> static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int
> flags) {
> update_curr_dl(rq);
> __dequeue_task_dl(rq, p, flags);
>
> dec_nr_running(rq);
> }
>
> And this:
>
> if (dl_runtime_exceeded(rq, dl_se)) {
> __dequeue_task_dl(rq, curr, 0);
> if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
> dl_se->dl_throttled = 1;
> else
> enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
>
> if (!is_leftmost(curr, &rq->dl))
> resched_task(curr);
> }
>
> Notice how we call __dequeue_task_dl() and in the else case we
> call enqueue_task_dl()? Also notice that dequeue_task_dl() has
> underscores where enqueue_task_dl() does not. The enqueue_task_dl()
> calls inc_nr_running(rq), but __dequeue_task_dl() does not. This is
> where we get nr_running out of sync.
>
Right. I'd add another place that could cause this misalignment:
static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
{
[snip]
dl_se->dl_throttled = 0;
if (p->on_rq) {
enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
if (task_has_dl_policy(rq->curr))
check_preempt_curr_dl(rq, p, 0);
else
resched_task(rq->curr);
[snip]
}
This is called when the replenishment timer for a throttled task fires,
and we have to queue it back on the dl_rq with recharged parameters.
Best test for this bug is to run a while(1) task with SCHED_DEADLINE
(using for example https://github.com/jlelli/schedtool-dl). This causes
a lot of throttle/replenish events and causes nr_running to explode.
All is ok with this fix.
> By moving the dec_nr_running() from dequeue_task_dl() to
> __dequeue_task_dl(), everything works again. That is, I can run the
> deadline test program and then run the stress-cpu-hotplug() and all
> would be fine.
>
Rationale for this odd behavior is that, when a task is throttled, it
is removed only from the dl_rq, but we keep it on_rq (as this is not
a "full dequeue", that is the task is not actually sleeping). But, it
is also true that, while throttled a task behaves like it is sleeping
(e.g., its timer will fire on a new CPU if the old one is dead). So,
Steven's fix sounds also semantically correct.
Thanks!
Best,
- Juri
> For reference on my test programs:
>
> http://rostedt.homelinux.com/private/stress-cpu-hotplug
> http://rostedt.homelinux.com/private/deadline.c
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0dd5e09..84c2454 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -844,14 +844,14 @@ static void __dequeue_task_dl(struct rq *rq,
> struct task_struct *p, int flags) {
> dequeue_dl_entity(&p->dl);
> dequeue_pushable_dl_task(rq, p);
> +
> + dec_nr_running(rq);
> }
>
> static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int
> flags) {
> update_curr_dl(rq);
> __dequeue_task_dl(rq, p, flags);
> -
> - dec_nr_running(rq);
> }
>
> /*
>
next prev parent reply other threads:[~2014-02-17 15:47 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-15 4:59 [PATCH] sched/deadline: Fix bad accounting of nr_running Steven Rostedt
2014-02-15 9:52 ` Peter Zijlstra
2014-02-15 13:03 ` Steven Rostedt
2014-02-15 13:08 ` [PATCH v2] " Steven Rostedt
2014-02-17 15:47 ` Juri Lelli [this message]
2014-02-19 2:50 ` [PATCH v3] " Steven Rostedt
2014-02-19 8:46 ` Peter Zijlstra
2014-02-19 10:32 ` Juri Lelli
2014-02-19 13:14 ` Juri Lelli
2014-02-19 17:45 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53022F2D.8040301@gmail.com \
--to=juri.lelli@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox