Re: [PATCH] sched/deadline: Fix bad accounting of nr_running

public inbox for linux-kernel@vger.kernel.org

From: Juri Lelli <juri.lelli@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>,
	LKML <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] sched/deadline: Fix bad accounting of nr_running
Date: Mon, 17 Feb 2014 16:47:57 +0100
Message-ID: <53022F2D.8040301@gmail.com>
In-Reply-To: <20140214235946.60a89b65@gandalf.local.home>

Hi,

On 02/15/2014 05:59 AM, Steven Rostedt wrote:
> My test suite was locking up hard when enabling mmiotracer. This was due
> to the mmiotracer placing all but one CPU offline. I found this out
> when I was able to reproduce the bug with just my stress-cpu-hotplug
> test. This bug baffled me because it would not always trigger, and
> would only trigger on the first run after boot up. The
> stress-cpu-hotplug test would crash hard the first run, or never crash
> at all. But a new reboot may cause it to crash on the first run again.
> 
> I spent all week bisecting this, as I couldn't find a consistent
> reproducer. I finally narrowed it down to the sched deadline patches,
> and even more peculiar, to the commit that added the sched
> deadline boot up self test to the latency tracer. Then it dawned on me
> what the bug was.
> 
> All it took was to run a task under sched deadline to screw up the CPU
> hot plugging. This explained why it would lock up only on the first run
> of the stress-cpu-hotplug test. The bug happened when the boot up self
> test of the schedule latency tracer would test a deadline task. The
> deadline task would corrupt something that would cause CPU hotplug to
> fail. If it didn't corrupt it, the stress test would always work
> (there are no other sched deadline tasks that would run to cause
> problems). If it did corrupt on boot up, the first test would lock up
> hard.
> 
> I proved this theory by running my deadline test program on another box,
> and then running the stress-cpu-hotplug test, and it would now consistently
> lock up. I could run stress-cpu-hotplug over and over with no problem,
> but once I ran the deadline test, the next run of the
> stress-cpu-hotplug would lock hard.
> 
> After adding lots of tracing to the code, I found the cause. The
> function tracer showed that migrate_tasks() was stuck in an infinite
> loop, where rq->nr_running never equaled 1 to break out of it. When I
> added a trace_printk() to see what that number was, it was 335 and
> never decrementing!
> 
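The hang follows directly from the termination test in migrate_tasks(): the loop only exits when rq->nr_running drops to 1. The sketch below is a hypothetical toy model, not the kernel function; toy_rq, its queued field and the max_iters bound are invented here so the non-termination can be observed without actually hanging.

```c
#include <assert.h>

/*
 * Toy model (hypothetical, not kernel code) of the loop in
 * migrate_tasks(): keep migrating tasks off the dying CPU until
 * rq->nr_running == 1, i.e. only the stopper thread is left.
 */
struct toy_rq {
	unsigned int nr_running; /* what the scheduler *thinks* is queued */
	unsigned int queued;     /* tasks that are really on the runqueue */
};

/*
 * Returns the number of iterations needed to drain the runqueue, or -1
 * if the loop was still spinning after max_iters (the real loop has no
 * such bound and simply hangs).
 */
static int toy_migrate_tasks(struct toy_rq *rq, int max_iters)
{
	int iters = 0;

	for (;;) {
		if (rq->nr_running == 1)
			return iters;   /* only the stopper is left: done */
		if (iters == max_iters)
			return -1;      /* the kernel would spin forever here */
		iters++;

		/* Migrate one real task away, if any besides the stopper. */
		if (rq->queued > 1) {
			rq->queued--;
			rq->nr_running--;
		}
	}
}
```

With a consistent count such as { .nr_running = 4, .queued = 4 } the loop finishes after 3 migrations; with a leaked count like { .nr_running = 335, .queued = 1 } nothing is left to migrate, nr_running never reaches 1, and the function returns -1.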
> Looking at the deadline code I found:
> 
> static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
> {
> 	dequeue_dl_entity(&p->dl);
> 	dequeue_pushable_dl_task(rq, p);
> }
> 
> static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
> {
> 	update_curr_dl(rq);
> 	__dequeue_task_dl(rq, p, flags);
> 
> 	dec_nr_running(rq);
> }
> 
> And this:
> 
> 	if (dl_runtime_exceeded(rq, dl_se)) {
> 		__dequeue_task_dl(rq, curr, 0);
> 		if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
> 			dl_se->dl_throttled = 1;
> 		else
> 			enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
> 
> 		if (!is_leftmost(curr, &rq->dl))
> 			resched_task(curr);
> 	}
> 
> Notice how we call __dequeue_task_dl() and in the else case we
> call enqueue_task_dl()? Also notice that __dequeue_task_dl() has
> underscores where enqueue_task_dl() does not. The enqueue_task_dl()
> calls inc_nr_running(rq), but __dequeue_task_dl() does not. This is
> where we get nr_running out of sync.
> 

Right. I'd add another place that could cause this misalignment:

static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
{
[snip]
        dl_se->dl_throttled = 0;
        if (p->on_rq) {
                enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
                if (task_has_dl_policy(rq->curr))
                        check_preempt_curr_dl(rq, p, 0);
                else
                        resched_task(rq->curr);
[snip]
}

This is called when the replenishment timer for a throttled task fires,
and we have to queue it back on the dl_rq with recharged parameters.
The best test for this bug is to run a while(1) task with SCHED_DEADLINE
(using, for example, https://github.com/jlelli/schedtool-dl). This
causes a lot of throttle/replenish events and makes nr_running explode.
With this fix, everything is OK.
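For readers without schedtool-dl, a similar while(1) workload can be set up directly with the sched_setattr(2) system call. This is a sketch under stated assumptions: toy_sched_attr mirrors the 48-byte layout of the kernel's struct sched_attr (renamed to avoid clashing with libc headers that may already define it), the 10 ms runtime / 30 ms deadline and period are arbitrary example values, the loop is bounded rather than infinite, and switching to SCHED_DEADLINE requires CAP_SYS_NICE (typically root).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* SCHED_DEADLINE policy number from the kernel UAPI. */
#define TOY_SCHED_DEADLINE 6

/* Same layout as the kernel's struct sched_attr (version 0, 48 bytes). */
struct toy_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;   /* ns */
	uint64_t sched_deadline;  /* ns */
	uint64_t sched_period;    /* ns */
};

/* Fill attr; returns -EINVAL unless 0 < runtime <= deadline <= period. */
static int toy_fill_dl_attr(struct toy_sched_attr *attr, uint64_t runtime,
			    uint64_t deadline, uint64_t period)
{
	if (runtime == 0 || runtime > deadline || deadline > period)
		return -EINVAL;
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_policy = TOY_SCHED_DEADLINE;
	attr->sched_runtime = runtime;
	attr->sched_deadline = deadline;
	attr->sched_period = period;
	return 0;
}

/*
 * Switch the calling thread to SCHED_DEADLINE and busy-loop for
 * `seconds`. A task that never sleeps exhausts its runtime every
 * period, so it is throttled and replenished over and over.
 */
int toy_dl_hog(unsigned int seconds)
{
	struct toy_sched_attr attr;
	struct timespec start, now;

	if (toy_fill_dl_attr(&attr, 10000000, 30000000, 30000000))
		return -EINVAL;
#ifdef SYS_sched_setattr
	if (syscall(SYS_sched_setattr, 0, &attr, 0) == -1)
		return -errno;          /* e.g. -EPERM without CAP_SYS_NICE */
#else
	return -ENOSYS;                 /* headers predate sched_setattr */
#endif
	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);   /* spin, never sleep */
	} while (now.tv_sec - start.tv_sec < (time_t)seconds);
	return 0;
}
```

Calling toy_dl_hog(10) from a small main() run as root should produce the same stream of throttle/replenish events; on a kernel without this fix, that is the condition under which the thread reports nr_running growing without bound.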

> By moving the dec_nr_running() from dequeue_task_dl() to
> __dequeue_task_dl(), everything works again. That is, I can run the
> deadline test program and then run the stress-cpu-hotplug test and all
> would be fine.
> 
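The imbalance and the effect of moving dec_nr_running() can be checked with a small counter model. This is a hypothetical sketch, not kernel code: toy_rq and the toy_* functions are invented here, and only the placement of the inc/dec_nr_running() calls described in the thread is modelled. Each "overrun" is the throttle in update_curr_dl() followed by a replenishing enqueue, whether that comes from the else branch or later from dl_task_timer().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical): names mirror kernel/sched/deadline.c. */
struct toy_rq {
	unsigned int nr_running; /* rq->nr_running */
	unsigned int dl_queued;  /* entities actually queued on the dl_rq */
	bool patched;            /* true: dec_nr_running() moved as in the patch */
};

static void toy_enqueue_task_dl(struct toy_rq *rq)
{
	rq->dl_queued++;
	rq->nr_running++;            /* enqueue_task_dl() calls inc_nr_running() */
}

static void toy___dequeue_task_dl(struct toy_rq *rq)
{
	rq->dl_queued--;
	if (rq->patched)
		rq->nr_running--;    /* after the patch: decrement here */
}

static void toy_dequeue_task_dl(struct toy_rq *rq)
{
	toy___dequeue_task_dl(rq);
	if (!rq->patched)
		rq->nr_running--;    /* before the patch: only the full dequeue */
}

/* One budget overrun: throttle, then enqueue with ENQUEUE_REPLENISH. */
static void toy_overrun(struct toy_rq *rq)
{
	toy___dequeue_task_dl(rq);
	toy_enqueue_task_dl(rq);
}

/* Wake one task, let it overrun n times, then let it sleep. */
static unsigned int toy_nr_running_after(bool patched, int n)
{
	struct toy_rq rq = { 0, 0, patched };

	toy_enqueue_task_dl(&rq);
	for (int i = 0; i < n; i++)
		toy_overrun(&rq);
	toy_dequeue_task_dl(&rq);
	return rq.nr_running;    /* should be 0: the runqueue is empty again */
}
```

Without the patch every overrun leaks one count: toy_nr_running_after(false, 5) returns 5, while toy_nr_running_after(true, 5) returns 0.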

The rationale for this odd behavior is that, when a task is throttled,
it is removed only from the dl_rq but kept on_rq (this is not a "full
dequeue": the task is not actually sleeping). However, while throttled,
a task also behaves as if it were sleeping (e.g., its timer will fire on
a new CPU if the old one is dead). So Steven's fix also sounds
semantically correct.
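The point above can be stated as an invariant: with the patch, nr_running follows membership of the dl_rq, while on_rq only records whether the task is runnable at all. The toy model below is hypothetical (toy_task, toy_rq and the toy_* helpers are invented for illustration, and the real kernel paths do considerably more); it walks one task through wakeup, throttle, replenish and sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy task state; field names echo the kernel's. */
struct toy_task {
	bool on_rq;      /* p->on_rq: runnable, not sleeping */
	bool on_dl_rq;   /* entity is queued on the dl_rq */
};

struct toy_rq {
	unsigned int nr_running;
};

/* With the patch, nr_running moves together with dl_rq membership. */
static void toy_dl_rq_add(struct toy_rq *rq, struct toy_task *p)
{
	p->on_dl_rq = true;
	rq->nr_running++;
}

static void toy_dl_rq_del(struct toy_rq *rq, struct toy_task *p)
{
	p->on_dl_rq = false;
	rq->nr_running--;
}

static void toy_wakeup(struct toy_rq *rq, struct toy_task *p)
{
	p->on_rq = true;
	toy_dl_rq_add(rq, p);
}

/* Throttle: leave the dl_rq but stay on_rq (not a "full dequeue"). */
static void toy_throttle(struct toy_rq *rq, struct toy_task *p)
{
	toy_dl_rq_del(rq, p);
}

/* Replenishment timer fires: back on the dl_rq if still runnable. */
static void toy_replenish(struct toy_rq *rq, struct toy_task *p)
{
	if (p->on_rq)
		toy_dl_rq_add(rq, p);
}

/* Full dequeue; a throttled task is already off the dl_rq. */
static void toy_sleep(struct toy_rq *rq, struct toy_task *p)
{
	if (p->on_dl_rq)
		toy_dl_rq_del(rq, p);
	p->on_rq = false;
}
```

While throttled the task keeps on_rq set but contributes nothing to nr_running, so a CPU being torn down does not wait for it in migrate_tasks().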

Thanks!

Best,

- Juri

> For reference on my test programs:
> 
>   http://rostedt.homelinux.com/private/stress-cpu-hotplug
>   http://rostedt.homelinux.com/private/deadline.c
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0dd5e09..84c2454 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -844,14 +844,14 @@ static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>  {
>  	dequeue_dl_entity(&p->dl);
>  	dequeue_pushable_dl_task(rq, p);
> +
> +	dec_nr_running(rq);
>  }
>  
>  static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>  {
>  	update_curr_dl(rq);
>  	__dequeue_task_dl(rq, p, flags);
> -
> -	dec_nr_running(rq);
>  }
>  
>  /*
>


Thread overview: 10+ messages
2014-02-15  4:59 [PATCH] sched/deadline: Fix bad accounting of nr_running Steven Rostedt
2014-02-15  9:52 ` Peter Zijlstra
2014-02-15 13:03   ` Steven Rostedt
2014-02-15 13:08   ` [PATCH v2] " Steven Rostedt
2014-02-17 15:47 ` Juri Lelli [this message]
2014-02-19  2:50   ` [PATCH v3] " Steven Rostedt
2014-02-19  8:46     ` Peter Zijlstra
2014-02-19 10:32       ` Juri Lelli
2014-02-19 13:14         ` Juri Lelli
2014-02-19 17:45           ` Steven Rostedt
