From: Juri Lelli <juri.lelli@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>,
LKML <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] sched/deadline: Fix bad accounting of nr_running
Date: Mon, 17 Feb 2014 16:47:57 +0100 [thread overview]
Message-ID: <53022F2D.8040301@gmail.com> (raw)
In-Reply-To: <20140214235946.60a89b65@gandalf.local.home>
Hi,
On 02/15/2014 05:59 AM, Steven Rostedt wrote:
> My test suite was locking up hard when enabling mmiotracer. This was due
> to the mmiotracer placing all but one CPU offline. I found this out
> when I was able to reproduce the bug with just my stress-cpu-hotplug
> test. This bug baffled me because it would not always trigger, and
> would only trigger on the first run after boot up. The
> stress-cpu-hotplug test would crash hard the first run, or never crash
> at all. But a new reboot may cause it to crash on the first run again.
>
> I spent all week bisecting this, as I couldn't find a consistent
> reproducer. I finally narrowed it down to the sched deadline patches,
> and even more peculiar, to the commit that added the sched
> deadline boot up self test to the latency tracer. Then it dawned on me
> to what the bug was.
>
> All it took was to run a task under sched deadline to screw up the CPU
> hot plugging. This explained why it would lock up only on the first run
> of the stress-cpu-hotplug test. The bug happened when the boot up self
> test of the schedule latency tracer would test a deadline task. The
> deadline task would corrupt something that would cause CPU hotplug to
> fail. If it didn't corrupt it, the stress test would always work
> (there's no other sched deadline tasks that would run to cause
> problems). If it did corrupt on boot up, the first test would lockup
> hard.
>
> I proved this theory by running my deadline test program on another box,
> and then run the stress-cpu-hotplug test, and it would now consistently
> lock up. I could run stress-cpu-hotplug over and over with no problem,
> but once I ran the deadline test, the next run of the
> stress-cpu-hotplug would lock hard.
>
> After adding lots of tracing to the code, I found the cause. The
> function tracer showed that migrate_tasks() was stuck in an infinite
> loop, where rq->nr_running never equaled 1 to break out of it. When I
> added a trace_printk() to see what that number was, it was 335 and
> never decrementing!
>
> Looking at the deadline code I found:
>
> static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int
> flags) {
> dequeue_dl_entity(&p->dl);
> dequeue_pushable_dl_task(rq, p);
> }
>
> static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int
> flags) {
> update_curr_dl(rq);
> __dequeue_task_dl(rq, p, flags);
>
> dec_nr_running(rq);
> }
>
> And this:
>
> if (dl_runtime_exceeded(rq, dl_se)) {
> __dequeue_task_dl(rq, curr, 0);
> if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
> dl_se->dl_throttled = 1;
> else
> enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
>
> if (!is_leftmost(curr, &rq->dl))
> resched_task(curr);
> }
>
> Notice how we call __dequeue_task_dl() and in the else case we
> call enqueue_task_dl()? Also notice that dequeue_task_dl() has
> underscores where enqueue_task_dl() does not. The enqueue_task_dl()
> calls inc_nr_running(rq), but __dequeue_task_dl() does not. This is
> where we get nr_running out of sync.
>
Right. I'd add another place that could cause this misalignment:
static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
{
[snip]
dl_se->dl_throttled = 0;
if (p->on_rq) {
enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
if (task_has_dl_policy(rq->curr))
check_preempt_curr_dl(rq, p, 0);
else
resched_task(rq->curr);
[snip]
}
This is called when the replenishment timer for a throttled task fires,
and we have to queue it back on the dl_rq with recharged parameters.
Best test for this bug is to run a while(1) task with SCHED_DEADLINE
(using for example https://github.com/jlelli/schedtool-dl). This causes
a lot of throttle/replenish events and causes nr_running to explode.
All is ok with this fix.
> By moving the dec_nr_running() from dequeue_task_dl() to
> __dequeue_task_dl(), everything works again. That is, I can run the
> deadline test program and then run the stress-cpu-hotplug() and all
> would be fine.
>
Rationale for this odd behavior is that, when a task is throttled, it
is removed only from the dl_rq, but we keep it on_rq (as this is not
a "full dequeue", that is the task is not actually sleeping). But, it
is also true that, while throttled a task behaves like it is sleeping
(e.g., its timer will fire on a new CPU if the old one is dead). So,
Steven's fix sounds also semantically correct.
Thanks!
Best,
- Juri
> For reference on my test programs:
>
> http://rostedt.homelinux.com/private/stress-cpu-hotplug
> http://rostedt.homelinux.com/private/deadline.c
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0dd5e09..84c2454 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -844,14 +844,14 @@ static void __dequeue_task_dl(struct rq *rq,
> struct task_struct *p, int flags) {
> dequeue_dl_entity(&p->dl);
> dequeue_pushable_dl_task(rq, p);
> +
> + dec_nr_running(rq);
> }
>
> static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int
> flags) {
> update_curr_dl(rq);
> __dequeue_task_dl(rq, p, flags);
> -
> - dec_nr_running(rq);
> }
>
> /*
>
next prev parent reply other threads:[~2014-02-17 15:47 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-15 4:59 [PATCH] sched/deadline: Fix bad accounting of nr_running Steven Rostedt
2014-02-15 9:52 ` Peter Zijlstra
2014-02-15 13:03 ` Steven Rostedt
2014-02-15 13:08 ` [PATCH v2] " Steven Rostedt
2014-02-17 15:47 ` Juri Lelli [this message]
2014-02-19 2:50 ` [PATCH v3] " Steven Rostedt
2014-02-19 8:46 ` Peter Zijlstra
2014-02-19 10:32 ` Juri Lelli
2014-02-19 13:14 ` Juri Lelli
2014-02-19 17:45 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53022F2D.8040301@gmail.com \
--to=juri.lelli@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.