From: Benjamin Segall <bsegall@google.com>
To: Phil Auld <pauld@redhat.com>
Cc: linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH v2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use
Date: Thu, 29 Jun 2023 10:55:44 -0700
Message-ID: <xm26fs6a8867.fsf@google.com>
In-Reply-To: <20230629005342.GB8069@lorien.usersys.redhat.com> (Phil Auld's message of "Wed, 28 Jun 2023 20:53:42 -0400")

Phil Auld <pauld@redhat.com> writes:

> On Wed, Jun 28, 2023 at 02:42:16PM -0700 Benjamin Segall wrote:
>> Phil Auld <pauld@redhat.com> writes:
>> 
>> > CFS bandwidth limits and NOHZ full don't play well together.  Tasks
>> > can easily run well past their quotas before a remote tick does
>> > accounting.  This leads to long, multi-period stalls before such
>> > tasks can run again. Currently, when presented with these conflicting
>> > requirements the scheduler is favoring nohz_full and letting the tick
>> > be stopped. However, nohz tick stopping is already best-effort; there
>> > are a number of conditions that can prevent it, whereas cfs runtime
>> > bandwidth is expected to be enforced.
>> >
>> > Make the scheduler favor bandwidth over stopping the tick by setting
>> > TICK_DEP_BIT_SCHED when the only running task is a cfs task with
>> > runtime limit enabled.
>> >
>> > Add sched_feat HZ_BW (off by default) to control this behavior.
>> >
>> > Signed-off-by: Phil Auld <pauld@redhat.com>
>> > Cc: Ingo Molnar <mingo@redhat.com>
>> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
>> > Cc: Juri Lelli <juri.lelli@redhat.com>
>> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
>> > Cc: Valentin Schneider <vschneid@redhat.com>
>> > Cc: Ben Segall <bsegall@google.com>
>> > ---
>> >
>> > v2:  Ben pointed out that the bit could get cleared in the dequeue path
>> > if we migrate a newly enqueued task without preempting curr. Added a 
>> > check for that edge case to sched_can_stop_tick. Removed the call to 
>> > sched_can_stop_tick from sched_fair_update_stop_tick since it was 
>> > redundant.
>> >
>> >  kernel/sched/core.c     | 12 +++++++++++
>> >  kernel/sched/fair.c     | 45 +++++++++++++++++++++++++++++++++++++++++
>> >  kernel/sched/features.h |  2 ++
>> >  3 files changed, 59 insertions(+)
>> >
>> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> > index a68d1276bab0..646f60bfc7e7 100644
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> > @@ -1194,6 +1194,8 @@ static void nohz_csd_func(void *info)
>> >  #endif /* CONFIG_NO_HZ_COMMON */
>> >  
>> >  #ifdef CONFIG_NO_HZ_FULL
>> > +extern bool sched_cfs_bandwidth_active(struct cfs_rq *cfs_rq);
>> > +
>> >  bool sched_can_stop_tick(struct rq *rq)
>> >  {
>> >  	int fifo_nr_running;
>> > @@ -1229,6 +1231,16 @@ bool sched_can_stop_tick(struct rq *rq)
>> >  	if (rq->nr_running > 1)
>> >  		return false;
>> >  
>> > +	/*
>> > +	 * If there is one task and it has CFS runtime bandwidth constraints
>> > +	 * and it's on the cpu now we don't want to stop the tick.
>> > +	 */
>> > +	if (sched_feat(HZ_BW) && rq->nr_running == 1 && rq->curr
>> > +	    && rq->curr->sched_class == &fair_sched_class && task_on_rq_queued(rq->curr)) {
>> > +		if (sched_cfs_bandwidth_active(task_cfs_rq(rq->curr)))
>> 
>> Actually, something I should have noticed earlier is that this should
>> probably be hierarchical, right? You need to check every ancestor
>> cfs_rq, not just the immediate parent. And at that point it probably
>> makes sense to have sched_cfs_bandwidth_active take a task_struct.
>> 
>
> Are you saying a child cfs_rq with a parent that has runtime_enabled could
> itself not have runtime_enabled?   I may be missing something but I don't
> see how that works.

Correct.

>
> account_cfs_rq_runtime() for example just looks at the immediate cfs_rq of
> curr and bails if it does not have runtime_enabled. How could that task get
> throttled if it exceeds some parent's limit?

account_cfs_rq_runtime() is called (primarily) from update_curr(), which
is called by enqueue_entity/dequeue_entity/entity_tick/etc, which are
called at each level of the hierarchy.
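
To illustrate the point above, here is a minimal userspace sketch of
per-level accounting. It is a toy model, not kernel code: the acct_*
names and the struct layout are made up for illustration, and the real
throttling path involves more than setting a flag. It only shows how a
task whose immediate cfs_rq has no quota can still be stopped by an
ancestor's quota, because accounting runs once per level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; NOT the kernel's struct cfs_rq. */
struct acct_cfs_rq {
	struct acct_cfs_rq *parent;	/* NULL at the root */
	bool runtime_enabled;		/* a quota is configured at this level */
	long runtime_remaining;		/* quota left in the current period */
	bool throttled;
};

/*
 * Models the shape of account_cfs_rq_runtime(): it looks only at the
 * cfs_rq it is handed and bails if that level has no quota.
 */
static void acct_account_runtime(struct acct_cfs_rq *cfs_rq, long delta_exec)
{
	if (!cfs_rq->runtime_enabled)
		return;

	cfs_rq->runtime_remaining -= delta_exec;
	if (cfs_rq->runtime_remaining <= 0)
		cfs_rq->throttled = true;
}

/*
 * Models the callers: the tick/enqueue/dequeue paths run the accounting
 * once per level while walking from the leaf up to the root.
 */
static void acct_tick(struct acct_cfs_rq *leaf, long delta_exec)
{
	struct acct_cfs_rq *cfs_rq;

	for (cfs_rq = leaf; cfs_rq; cfs_rq = cfs_rq->parent)
		acct_account_runtime(cfs_rq, delta_exec);
}

/* True if the leaf or any ancestor has been throttled. */
static bool acct_throttled_hierarchy(const struct acct_cfs_rq *leaf)
{
	const struct acct_cfs_rq *cfs_rq;

	for (cfs_rq = leaf; cfs_rq; cfs_rq = cfs_rq->parent)
		if (cfs_rq->throttled)
			return true;
	return false;
}
```

With a parent that has a quota and a child that does not, calling
acct_tick() on the child never throttles the child itself, but the
parent runs out of runtime and the whole subtree counts as throttled.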

The worse cache behavior of doing a separate walk in sched_can_stop_tick
(i.e. in the add/sub_nr_running path) could, I guess, be avoided by having
some runtime_enabled flag on the task struct or rq that is up to date for
rq->curr only. That would only be a little annoying to keep accurate,
and there are the dual arguments of "task_struct/rq is already too
cluttered" and "well, they're already so cluttered a little more won't hurt".
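
A rough userspace sketch of the two options, for comparison. Everything
here is hypothetical: the bw_* types, the curr_bw_limited field, and the
bw_set_curr() hook are invented for illustration and do not exist in the
kernel. Option 1 walks every ancestor each time; option 2 does the walk
once when curr changes and caches the result on the rq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct bw_cfs_rq {
	struct bw_cfs_rq *parent;	/* NULL at the root */
	bool runtime_enabled;		/* a quota is configured at this level */
};

struct bw_task {
	struct bw_cfs_rq *cfs_rq;	/* the task's immediate cfs_rq */
};

struct bw_rq {
	struct bw_task *curr;
	bool curr_bw_limited;		/* hypothetical cached flag, valid for curr only */
};

/* Option 1: take a task and check every ancestor cfs_rq on each call. */
static bool bw_task_active(const struct bw_task *p)
{
	const struct bw_cfs_rq *cfs_rq;

	for (cfs_rq = p->cfs_rq; cfs_rq; cfs_rq = cfs_rq->parent)
		if (cfs_rq->runtime_enabled)
			return true;
	return false;
}

/* Option 2: pay for the walk once, when curr changes, and cache it. */
static void bw_set_curr(struct bw_rq *rq, struct bw_task *p)
{
	rq->curr = p;
	rq->curr_bw_limited = p ? bw_task_active(p) : false;
}

/* The tick-stop check then reads a single flag instead of walking. */
static bool bw_can_stop_tick(const struct bw_rq *rq)
{
	return !rq->curr_bw_limited;
}
```

The sketch leaves out the "annoying to keep accurate" part: the cached
flag would also have to be refreshed whenever a quota is set or cleared
on any ancestor while the task is running.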

Thread overview: 6+ messages
2023-06-27 19:12 [PATCH v2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use Phil Auld
2023-06-27 21:19 ` kernel test robot
2023-06-28 21:42 ` Benjamin Segall
2023-06-29  0:53   ` Phil Auld
2023-06-29 17:55     ` Benjamin Segall [this message]
2023-06-29 19:06       ` Phil Auld
