From: Benjamin Segall <bsegall@google.com>
To: Phil Auld <pauld@redhat.com>
Cc: linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
Ingo Molnar <mingo@redhat.com>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH] Sched/fair: Block nohz tick_stop when cfs bandwidth in use
Date: Fri, 23 Jun 2023 11:59:09 -0700
Message-ID: <xm26r0q280oy.fsf@google.com>
In-Reply-To: <20230623130859.GA766130@lorien.usersys.redhat.com> (Phil Auld's message of "Fri, 23 Jun 2023 09:08:59 -0400")

Phil Auld <pauld@redhat.com> writes:

> On Thu, Jun 22, 2023 at 05:37:30PM -0400 Phil Auld wrote:
>> On Thu, Jun 22, 2023 at 01:49:52PM -0700 Benjamin Segall wrote:
>> > Phil Auld <pauld@redhat.com> writes:
>> >
>> > > CFS bandwidth limits and NOHZ full don't play well together. Tasks
>> > > can easily run well past their quotas before a remote tick does
>> > > accounting. This leads to long, multi-period stalls before such
>> > > tasks can run again. Currently, when presented with these conflicting
>> > > requirements the scheduler is favoring nohz_full and letting the tick
>> > > be stopped. However, nohz tick stopping is already best-effort; there
>> > > are a number of conditions that can prevent it, whereas cfs runtime
>> > > bandwidth is expected to be enforced.
>> > >
>> > > Make the scheduler favor bandwidth over stopping the tick by setting
>> > > TICK_DEP_BIT_SCHED when the only running task is a cfs task with
>> > > runtime limit enabled.
>> > >
>> > > Add sched_feat HZ_BW (off by default) to control this behavior.
>> > >
>> > > Signed-off-by: Phil Auld <pauld@redhat.com>
>> > > Cc: Ingo Molnar <mingo@redhat.com>
>> > > Cc: Peter Zijlstra <peterz@infradead.org>
>> > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
>> > > Cc: Juri Lelli <juri.lelli@redhat.com>
>> > > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
>> > > Cc: Valentin Schneider <vschneid@redhat.com>
>> > > Cc: Ben Segall <bsegall@google.com>
>> > > ---
>> > > kernel/sched/fair.c | 33 ++++++++++++++++++++++++++++++++-
>> > > kernel/sched/features.h | 2 ++
>> > > 2 files changed, 34 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > > index 373ff5f55884..880eadfac330 100644
>> > > --- a/kernel/sched/fair.c
>> > > +++ b/kernel/sched/fair.c
>> > > @@ -6139,6 +6139,33 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
>> > >  	rcu_read_unlock();
>> > >  }
>> > >
>> > > +#ifdef CONFIG_NO_HZ_FULL
>> > > +/* called from pick_next_task_fair() */
>> > > +static void sched_fair_update_stop_tick(struct rq *rq, struct task_struct *p)
>> > > +{
>> > > +	struct cfs_rq *cfs_rq = task_cfs_rq(p);
>> > > +	int cpu = cpu_of(rq);
>> > > +
>> > > +	if (!sched_feat(HZ_BW) || !cfs_bandwidth_used())
>> > > +		return;
>> > > +
>> > > +	if (!tick_nohz_full_cpu(cpu))
>> > > +		return;
>> > > +
>> > > +	if (rq->nr_running != 1 || !sched_can_stop_tick(rq))
>> > > +		return;
>> > > +
>> > > +	/*
>> > > +	 * We know there is only one task runnable and we've just picked it. The
>> > > +	 * normal enqueue path will have cleared TICK_DEP_BIT_SCHED if we will
>> > > +	 * be otherwise able to stop the tick. Just need to check if we are using
>> > > +	 * bandwidth control.
>> > > +	 */
>> > > +	if (cfs_rq->runtime_enabled)
>> > > +		tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
>> > > +}
>> > > +#endif
>> >
>> > So from a CFS_BANDWIDTH pov runtime_enabled && nr_running == 1 seems
>> > fine. But working around sched_can_stop_tick instead of with it seems
>> > sketchy in general. In an edge case like "migrate a task onto the
>> > cpu and then off again", sched_update_tick_dependency would reset
>> > TICK_DEP_BIT_SCHED without pick_next_task (PNT) being called again
>> > (i.e. a task wakes up onto this cpu without preempting, and then
>> > another cpu goes idle and pulls it, causing this cpu to go into
>> > nohz_full).
>> >
>>
>> The information to make these tests is not available in sched_can_stop_tick.
>> I did start there. When that is called, and we are likely to go nohz_full,
>> curr is null so it's hard to find the right cfs_rq to make that
>> runtime_enabled test against. We could, maybe, plumb the task being enqueued
>> in but it would not be valid for the dequeue path and would be a bit messy.
>>
>
> Sorry, misspoke... rq->curr == rq->idle, not null. But we still don't have
> access to the task and its cfs_rq which will have runtime_enabled set.
>
That is unfortunate. I suppose then you'd wind up needing both this
extra bit in PNT to handle the switch into nr_running == 1 territory,
and a "HZ_BW && nr_running == 1 && curr is fair && curr->on_rq &&
curr->cfs_rq->runtime_enabled" check in sched_can_stop_tick to catch
edge cases. (I think that would be sufficient, if an annoyingly long set
of conditionals.)
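
To make that set of conditionals concrete, here is a minimal user-space
sketch of the check. It is not kernel code and has not been built against
any tree: struct rq_model, struct task_model, struct cfs_rq_model,
hz_bw_enabled, bw_needs_tick() and can_stop_tick_model() are all
hypothetical stand-ins invented for illustration. The real scheduler
would presumably reach the cfs_rq through task_cfs_rq() rather than a
direct pointer, and sched_can_stop_tick has other conditions that are not
modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-group CFS runqueue. */
struct cfs_rq_model {
	bool runtime_enabled;	/* a bandwidth limit applies to this group */
};

/* Hypothetical stand-in for a task. */
struct task_model {
	bool is_fair;		/* task belongs to the fair class */
	bool on_rq;		/* task is currently queued */
	struct cfs_rq_model *cfs_rq;
};

/* Hypothetical stand-in for the per-cpu runqueue. */
struct rq_model {
	unsigned int nr_running;
	struct task_model *curr;
};

/* Assumption: models sched_feat(HZ_BW) as a plain flag. */
static bool hz_bw_enabled = true;

/*
 * The conditional chain from the mail: true when the only runnable task
 * is a fair task under a bandwidth limit, so the tick should keep
 * running to enforce the quota.
 */
static bool bw_needs_tick(const struct rq_model *rq)
{
	const struct task_model *curr;

	assert(rq != NULL);
	curr = rq->curr;

	return hz_bw_enabled &&
	       rq->nr_running == 1 &&
	       curr != NULL &&
	       curr->is_fair &&
	       curr->on_rq &&
	       curr->cfs_rq != NULL &&
	       curr->cfs_rq->runtime_enabled;
}

/*
 * Models only the bandwidth part of the decision; every other reason
 * for keeping the tick is deliberately left out of this sketch.
 */
static bool can_stop_tick_model(const struct rq_model *rq)
{
	return !bw_needs_tick(rq);
}
```

In this model the "migrate on and then off again" case looks like
nr_running going 1 -> 2 -> 1 while curr stays the same limited task.
Because the check is re-evaluated against curr each time, it still
refuses to stop the tick after the second task has been pulled away,
even though PNT never ran again.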