All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: lkp@lists.01.org
Subject: Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection
Date: Tue, 16 Dec 2014 15:56:53 +0100	[thread overview]
Message-ID: <20141216145653.GY3337@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <alpine.DEB.2.11.1412161510000.17382@nanos>

[-- Attachment #1: Type: text/plain, Size: 2733 bytes --]

On Tue, Dec 16, 2014 at 03:32:28PM +0100, Thomas Gleixner wrote:
> @@ -4997,6 +5025,8 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev)
>  	struct task_struct *p;
>  	int new_tasks;
>  
> +	if (class_fair_disabled())
> +		goto idle;

We don't want to do new idle balancing here I think, just return NULL.

>  again:
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  	if (!cfs_rq->nr_running)
> 
> The static key is enabled once the powerclamp mess starts. So nobody
> else than powerclamp users are affected by this and rightfully so.
> 
> Not pretty, but better than a gazillion workarounds all over the place
> to make "pretending I'm idle" actually work. This is basically the
> same mechanism as we have with the RT throttler, where a RT hog will
> be put onto hold for some time. We just put all sched other tasks on
> hold while still allowing RT tasks and everything else to work.
> 
> Thoughts?

Other than hating it on sight right? ;-)

So let me try and understand the problem with the emulated idle thing
better (running idle from FIFO threads).

Suppose we are in nohz_full:

 ts->inidle     ts->infullnohz  ts->tick_stopped

  0              1               1               valid

Then the powerclamp fake idle thread comes in, this increase nr_running
and will result in leaving infullnohz and will re-start the
tick_stopped.

 0              0               0               valid

Then we 'start' the idle loop, and end up in:

 1              0               1               valid

No problem there, right? And it looks to be the same in reverse.


I suppose the tricky bit is what happens when the cpu was idle; in that
case we'll end up with 1 running thread in state:

 1              0               1               valid

But need to avoid ending up in:

 1              1               1               BUG

Which should be relatively simple by never entering nohzfull when 'idle'.



However with your proposed thingy, I think we'll end up in:

 1              1               1               BUG

Because we don't start another thread, so infullnohz will stay valid,
however we'll also be 'forced' into idle (with nr_running > 0) and stop
the tick.

A remote wakeup might result in nr_running going from 1->2 and seeing
infullnohz == 1, try and restart the tick, while we're idle!

Of course, we can fix that too, by clearing nohzfull when going 'idle',
after all, nohzfull will re-establish itself automagically when the tick
detects but the one task afterwards.


So both cases need work, neither works out of the box afaict. But I
can't see one really being better than the other either -- am I missing
obvious things again?

WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Fengguang Wu <fengguang.wu@intel.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	LKML <linux-kernel@vger.kernel.org>, LKP <lkp@01.org>
Subject: Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection
Date: Tue, 16 Dec 2014 15:56:53 +0100	[thread overview]
Message-ID: <20141216145653.GY3337@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <alpine.DEB.2.11.1412161510000.17382@nanos>

On Tue, Dec 16, 2014 at 03:32:28PM +0100, Thomas Gleixner wrote:
> @@ -4997,6 +5025,8 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev)
>  	struct task_struct *p;
>  	int new_tasks;
>  
> +	if (class_fair_disabled())
> +		goto idle;

We don't want to do new idle balancing here I think, just return NULL.

>  again:
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  	if (!cfs_rq->nr_running)
> 
> The static key is enabled once the powerclamp mess starts. So nobody
> else than powerclamp users are affected by this and rightfully so.
> 
> Not pretty, but better than a gazillion workarounds all over the place
> to make "pretending I'm idle" actually work. This is basically the
> same mechanism as we have with the RT throttler, where a RT hog will
> be put onto hold for some time. We just put all sched other tasks on
> hold while still allowing RT tasks and everything else to work.
> 
> Thoughts?

Other than hating it on sight right? ;-)

So let me try and understand the problem with the emulated idle thing
better (running idle from FIFO threads).

Suppose we are in nohz_full:

 ts->inidle     ts->infullnohz  ts->tick_stopped

  0              1               1               valid

Then the powerclamp fake idle thread comes in, this increase nr_running
and will result in leaving infullnohz and will re-start the
tick_stopped.

 0              0               0               valid

Then we 'start' the idle loop, and end up in:

 1              0               1               valid

No problem there, right? And it looks to be the same in reverse.


I suppose the tricky bit is what happens when the cpu was idle; in that
case we'll end up with 1 running thread in state:

 1              0               1               valid

But need to avoid ending up in:

 1              1               1               BUG

Which should be relatively simple by never entering nohzfull when 'idle'.



However with your proposed thingy, I think we'll end up in:

 1              1               1               BUG

Because we don't start another thread, so infullnohz will stay valid,
however we'll also be 'forced' into idle (with nr_running > 0) and stop
the tick.

A remote wakeup might result in nr_running going from 1->2 and seeing
infullnohz == 1, try and restart the tick, while we're idle!

Of course, we can fix that too, by clearing nohzfull when going 'idle',
after all, nohzfull will re-establish itself automagically when the tick
detects but the one task afterwards.


So both cases need work, neither works out of the box afaict. But I
can't see one really being better than the other either -- am I missing
obvious things again?

  reply	other threads:[~2014-12-16 14:56 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-11 19:42 [nohz] 2a16fc93d2c: kernel lockup on idle injection Fengguang Wu
2014-12-11 19:42 ` Fengguang Wu
2014-12-12 11:57 ` Viresh Kumar
2014-12-12 11:57   ` Viresh Kumar
2014-12-15  7:25   ` Preeti U Murthy
2014-12-15  7:25     ` Preeti U Murthy
2014-12-15  9:32     ` Viresh Kumar
2014-12-15  9:32       ` Viresh Kumar
2014-12-15  9:43       ` Preeti U Murthy
2014-12-15  9:43         ` Preeti U Murthy
2014-12-15 21:24         ` Pan, Jacob jun
2014-12-15 21:24           ` Pan, Jacob jun
2014-12-16  4:18           ` Viresh Kumar
2014-12-16  4:18             ` Viresh Kumar
2014-12-16 17:15             ` Jacob Pan
2014-12-16 17:15               ` Jacob Pan
2014-12-16 21:15               ` Thomas Gleixner
2014-12-16 21:15                 ` Thomas Gleixner
2014-12-15 23:44       ` Frederic Weisbecker
2014-12-15 23:44         ` Frederic Weisbecker
2014-12-16  4:53         ` Viresh Kumar
2014-12-16  4:53           ` Viresh Kumar
2014-12-16  9:36           ` Preeti U Murthy
2014-12-16  9:36             ` Preeti U Murthy
2014-12-16 12:49             ` Thomas Gleixner
2014-12-16 12:49               ` Thomas Gleixner
2014-12-16 14:20               ` Frederic Weisbecker
2014-12-16 14:20                 ` Frederic Weisbecker
2014-12-16 14:50                 ` Thomas Gleixner
2014-12-16 14:50                   ` Thomas Gleixner
2014-12-16 21:21                   ` Thomas Gleixner
2014-12-16 21:21                     ` Thomas Gleixner
2014-12-16 22:49                     ` Peter Zijlstra
2014-12-16 22:49                       ` Peter Zijlstra
2014-12-16 22:54                       ` Thomas Gleixner
2014-12-16 22:54                         ` Thomas Gleixner
2014-12-17  0:26                         ` Frederic Weisbecker
2014-12-17  0:26                           ` Frederic Weisbecker
2014-12-17  0:12                     ` Frederic Weisbecker
2014-12-17  0:12                       ` Frederic Weisbecker
2014-12-17  9:11                       ` Thomas Gleixner
2014-12-17  9:11                         ` Thomas Gleixner
2014-12-17 12:47                         ` Frederic Weisbecker
2014-12-17 12:47                           ` Frederic Weisbecker
2014-12-16 14:32               ` Thomas Gleixner
2014-12-16 14:32                 ` Thomas Gleixner
2014-12-16 14:56                 ` Peter Zijlstra [this message]
2014-12-16 14:56                   ` Peter Zijlstra
2014-12-16 16:54                   ` Thomas Gleixner
2014-12-16 16:54                     ` Thomas Gleixner
2014-12-17 12:31               ` Preeti Murthy
2014-12-17 12:31                 ` Preeti Murthy
2014-12-17 15:42                 ` Thomas Gleixner
2014-12-17 15:42                   ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141216145653.GY3337@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=lkp@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.