linux-rt-users.vger.kernel.org archive mirror
From: Wanpeng Li <kernellwp@gmail.com>
To: Paul Gortmaker <paul.gortmaker@windriver.com>,
	linux-kernel@vger.kernel.org
Cc: linux-rt-users@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
Date: Thu, 27 Nov 2014 19:21:09 +0800
Message-ID: <54770925.1000608@gmail.com>
In-Reply-To: <1400080115-12339-1-git-send-email-paul.gortmaker@windriver.com>

Hi Paul,
On 5/14/14, 11:08 PM, Paul Gortmaker wrote:
> As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
> ("sched: rt-group: smp balancing") the concept of borrowing per
> cpu rt_runtime from one core to another was introduced.
>
> However, this prevents the RT throttling message from ever being
> emitted when someone does a common (but mistaken) attempt at
> using too much CPU in RT context.  Consider the following test:
>
>    echo "main() {for(;;);}" > full_load.c
>    gcc full_load.c -o full_load
>    taskset -c 1 ./full_load &
>    chrt -r -p 80 `pidof full_load`

I tried this on 3.18-rc6 with CONFIG_RCU_CPU_STALL_TIMEOUT=60 and
SCHED_FEAT(RT_RUNTIME_SHARE, true), but I don't see an RCU stall
warning. What am I missing?

Regards,
Wanpeng Li

> When run on x86_64 defconfig, what happens is as follows:
>
> -task runs on core1 for 95% of an rt_period as documented in
>   the file Documentation/scheduler/sched-rt-group.txt
>
> -at 95%, the code in balance_runtime sees this threshold and
>   calls do_balance_runtime()
>
> -do_balance_runtime sees that core 1 is in need, and does this:
> 	---------------
>          if (rt_rq->rt_runtime + diff > rt_period)
>                  diff = rt_period - rt_rq->rt_runtime;
>          iter->rt_runtime -= diff;
>          rt_rq->rt_runtime += diff;
> 	---------------
>   which extends core1's rt_runtime by 5%, making it 100% of rt_period
>   by stealing 5% from core0 (or possibly some other core).
>
> However, the next time core1's rt_rq enters sched_rt_runtime_exceeded(),
> we hit this near the top of that function:
> 	---------------
>          if (runtime >= sched_rt_period(rt_rq))
>                  return 0;
> 	---------------
> and hence we'll _never_ look at/set any of the throttling checks and
> messages in sched_rt_runtime_exceeded().  Instead, we will happily
> plod along for CONFIG_RCU_CPU_STALL_TIMEOUT seconds, at which point
> the RCU subsystem will get angry and trigger an NMI in response to
> what it rightly sees as a WTF situation.
>
> Granted, there are lots of ways you can do bad things to yourself with
> RT, but in the current zeitgeist of multicore systems with people
> dedicating individual cores to individual tasks, I'd say the above is
> common enough that we should react to it sensibly, and an RCU stall
> really doesn't translate well to an end user vs a simple message that
> says "throttling activated".
>
> One way to get the throttle message instead of the ambiguous and lengthy
> NMI triggered all core backtrace of the RCU stall is to change the
> SCHED_FEAT(RT_RUNTIME_SHARE, true) to false.  One could make a good
> case for this being the default for the out-of-tree preempt-rt series,
> since folks using that are more apt to be manually tuning the system
> and won't want an invisible hand coming in and making changes.
>
> However, in mainline, where it is more likely that there will be
> n+x (x>0) RT tasks on an n core system, we can leave the sharing on,
> and still avoid the RCU stalls by noting that there is no point in
> trying to balance when there are no tasks to migrate, or only a
> single RT task is present.  Inflating the rt_runtime does nothing
> in this case other than defeat sched_rt_runtime_exceeded().
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>
> [I'd mentioned a similar use case here: https://lkml.org/lkml/2013/3/6/338
>   and tglx asked why they wouldn't see the throttle message; it is only
>   now that I had a chance to dig in and figure out why.  Oh, and the patch
>   is against linux-next, in case that matters...]
>
>   kernel/sched/rt.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index ea4d500..698aac9 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -774,6 +774,15 @@ static int balance_runtime(struct rt_rq *rt_rq)
>   	if (!sched_feat(RT_RUNTIME_SHARE))
>   		return more;
>   
> +	/*
> +	 * Stealing from another core won't help us at all if
> +	 * we have nothing to migrate over there, or only one
> +	 * task that is running up all the rt_time.  In fact it
> +	 * will just inhibit the throttling message in that case.
> +	 */
> +	if (!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1)
> +		return more;
> +
>   	if (rt_rq->rt_time > rt_rq->rt_runtime) {
>   		raw_spin_unlock(&rt_rq->rt_runtime_lock);
>   		more = do_balance_runtime(rt_rq);
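
To make the arithmetic in the quoted description concrete, here is a
minimal user-space sketch in C. It is not kernel code: the toy_ names,
the struct layout, the 1 s period with a 950 ms budget (the 95% figure
mentioned above), and the simplification that the donor core hands over
all of its spare budget are assumptions made for illustration. Only the
clamp, the early return, and the guard condition follow the quoted
snippets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the kernel's struct rt_rq (hypothetical layout);
 * only the fields discussed in the patch are modelled. */
struct toy_rt_rq {
	uint64_t rt_runtime;		/* RT budget per period, in ns */
	uint64_t rt_time;		/* RT time consumed this period, in ns */
	unsigned int rt_nr_migratory;	/* RT tasks allowed on more than one CPU */
	unsigned int rt_nr_total;	/* all RT tasks on this runqueue */
};

#define TOY_PERIOD_NS	1000000000ULL	/* assumed 1 s rt_period */
#define TOY_RUNTIME_NS	 950000000ULL	/* assumed 95% budget */

/* Borrow budget from a donor core, clamped as in the quoted
 * do_balance_runtime() snippet.  Simplification: the donor gives up
 * all of its spare budget instead of a weighted share. */
static void toy_borrow(struct toy_rt_rq *rt_rq, struct toy_rt_rq *iter)
{
	uint64_t diff = iter->rt_runtime - iter->rt_time;

	if (rt_rq->rt_runtime + diff > TOY_PERIOD_NS)
		diff = TOY_PERIOD_NS - rt_rq->rt_runtime;
	iter->rt_runtime -= diff;
	rt_rq->rt_runtime += diff;
}

/* The condition added by the patch: balancing cannot help when nothing
 * can migrate, or when a single task is using up all the rt_time. */
static bool toy_balance_is_futile(const struct toy_rt_rq *rt_rq)
{
	return !rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1;
}

/* Returns true when the runqueue would be throttled.  The early return
 * mirrors the quoted check in sched_rt_runtime_exceeded(). */
static bool toy_runtime_exceeded(const struct toy_rt_rq *rt_rq)
{
	if (rt_rq->rt_runtime >= TOY_PERIOD_NS)
		return false;
	return rt_rq->rt_time > rt_rq->rt_runtime;
}

/* One pinned RT hog on core1 (so not migratory), an idle core0. */
static bool toy_simulate(bool with_guard)
{
	struct toy_rt_rq core0 = { TOY_RUNTIME_NS, 0, 0, 0 };
	struct toy_rt_rq core1 = { TOY_RUNTIME_NS, TOY_RUNTIME_NS + 1, 0, 1 };

	if (core1.rt_time > core1.rt_runtime &&
	    !(with_guard && toy_balance_is_futile(&core1)))
		toy_borrow(&core1, &core0);

	return toy_runtime_exceeded(&core1);
}
```

In this model toy_simulate(false) returns false: core1's budget is
inflated to the full period, so the early return fires and the throttle
check is never reached. toy_simulate(true) returns true, because the
guard skips the borrow and the budget stays at 95%.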


