From: Patrick Bellasi <patrick.bellasi@matbug.net>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@redhat.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>, Paul Turner <pjt@google.com>,
Ben Segall <bsegall@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
Jonathan Corbet <corbet@lwn.net>,
Dhaval Giani <dhaval.giani@oracle.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Josef Bacik <jbacik@fb.com>, Chris Hyser <chris.hyser@oracle.com>,
Parth Shah <parth@linux.ibm.com>
Subject: Re: [SchedulerWakeupLatency] Per-task vruntime wakeup bonus
Date: Fri, 10 Jul 2020 21:59:31 +0200
Message-ID: <878sfrywd8.derkling@matbug.net>
In-Reply-To: <CAKfTPtBHmP6BOrx6XGqZ7UpCFxWCZz23KWf4DXtAhRGUPfjebA@mail.gmail.com>
On Fri, Jul 10, 2020 at 15:21:48 +0200, Vincent Guittot <vincent.guittot@linaro.org> wrote...
> Hi Patrick,
Hi Vincent,
[...]
>> > C) Existing control paths
>>
>> Assuming:
>>
>> C: CFS task currently running on CPUx
>> W: CFS task waking up on the same CPUx
>>
>> And considering the overall simplified workflow:
>>
>>   core::try_to_wake_up()
>>
>>     // 1) Select on which CPU W will run
>>     core::select_task_rq()
>>       fair::select_task_rq_fair()
>>
>>     // 2) Enqueue W on the selected CPU
>>     core::ttwu_queue()
>>       core::ttwu_do_activate()
>>         core::activate_task()
>>           core::enqueue_task()
>>             fair::enqueue_task_fair()
>>               fair::enqueue_entity()
>>
>>                 // 3) Set W's vruntime bonus
>>                 fair::place_entity()
>>                   se->vruntime = ...
>>
>>         // 4) Check if C can be preempted by W
>>         core::ttwu_do_wakeup()
>>           core::check_preempt_curr()
>>             fair::check_preempt_curr()
>>               fair::check_preempt_wakeup(curr, se)
>>                 fair::wakeup_preempt_entity(curr, se)
>>                   vdiff = curr.vruntime - se.vruntime
>>                   return vdiff > wakeup_gran(se)
>>
>> We see that W preempts C iff:
>>
>> vdiff > wakeup_gran(se)
>>
>> Since:
>>
>>   enqueue_entity(cfs_rq, se, flags)
>>     place_entity(cfs_rq, se, initial=0)
>>       thresh = sysctl_sched_latency / (GENTLE_FAIR_SLEEPERS ? 2 : 1)
>>       vruntime = cfs_rq->min_vruntime - thresh
>>       se->vruntime = max_vruntime(se->vruntime, vruntime)
>>
>> a waking task's W.vruntime can get a "vruntime bonus" up to:
>>  - 1/2 scheduler latency (w/ GENTLE_FAIR_SLEEPERS)
>>  - 1 scheduler latency (w/o GENTLE_FAIR_SLEEPERS)
>>
>>
>> > D) Desired behavior
>>
>> The "vruntime bonus" (thresh) computed in place_entity() should have a
>> per-task definition, which defaults to the current implementation.
>>
>> A bigger vruntime bonus can be configured for latency sensitive tasks.
>> A smaller vruntime bonus can be configured for latency tolerant tasks.
>
> I'm not sure that adjusting what you called "vruntime bonus" is the
> right way to provide some latency because it doesn't only provide a
> wakeup latency bonus but also provides a runtime bonus.
True, however that's what we already do, _just_ in a hard-coded way.
A task waking up from sleep gets a bonus of up to 1/2 sched_latency with
GENTLE_FAIR_SLEEPERS, or 1 sched_latency without it. The point is that
not all tasks are the same: for some this bonus is not really required,
for others it is too small.

Regarding the 'runtime bonus', I think it's kind of unavoidable
if we really want a latency sensitive task to be scheduled
before the others.
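
For reference, the hard-coded wakeup placement can be modelled in user
space. This is only a sketch of the place_entity() arithmetic for the
!initial case, not kernel code: the helper names wakeup_thresh_ns() and
place_waking_entity() are made up for this example, and the 6 ms latency
used in the note below is an assumed value, not necessarily what a given
system has configured.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "max" on vruntime values, compared via a signed delta. */
static inline uint64_t max_vruntime(uint64_t max_v, uint64_t v)
{
	int64_t delta = (int64_t)(v - max_v);

	return delta > 0 ? v : max_v;
}

/* thresh = sysctl_sched_latency / (GENTLE_FAIR_SLEEPERS ? 2 : 1) */
static inline uint64_t wakeup_thresh_ns(uint64_t sched_latency_ns, bool gentle)
{
	return gentle ? sched_latency_ns / 2 : sched_latency_ns;
}

/*
 * Hypothetical helper: vruntime given to a waking entity, from the
 * runqueue's min_vruntime and the entity's vruntime before it slept.
 */
static inline uint64_t place_waking_entity(uint64_t min_vruntime,
					   uint64_t se_vruntime,
					   uint64_t sched_latency_ns,
					   bool gentle)
{
	uint64_t vruntime = min_vruntime -
			    wakeup_thresh_ns(sched_latency_ns, gentle);

	/* The bonus is capped: an entity never moves backward in vruntime. */
	return max_vruntime(se_vruntime, vruntime);
}
```

With min_vruntime at 100 ms and an assumed 6 ms latency, a long sleeper
is placed at 97 ms with GENTLE_FAIR_SLEEPERS and at 94 ms without it,
while a task whose vruntime was already 99 ms keeps that value.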
> It means that one can impact the running time by playing with
> latency_nice whereas the goal is only to impact the wakeup latency.
Well, I'm not sure how much you can really gain, considering that
this bonus is given only at wakeup time: the task would have to keep
suspending itself. It would get a better result by just asking for a
lower nice value.
Now, asking for a reduced nice value is RLIMIT_NICE and CAP_SYS_NICE
protected. The same will hold for latency_nice.
Moreover, considering that by default tasks will get what we already
have as hard-coded, or less of a bonus, I don't see how it would be easy
to abuse.
On the contrary, we can introduce a very useful knob to allow certain
tasks to voluntarily demote themselves and avoid annoying a currently
running task.
> Instead, it should weight the decision in wakeup_preempt_entity() and
> wakeup_gran()
In those functions we already take the task prio into consideration
(see the details at the end of this message).
Tasks with a lower nice value have a better chance of preempting current
since they have a smaller wakeup_gran. Indeed:

    we preempt IFF   vdiff(se, current) > wakeup_gran(se)
                     \----------------/   \-------------/
                             A                   B

While the task's prio affects B, in this proposal latency_nice works on the
A side of the inequality above, making it a bit more task specific.
That said, it's true that both latency_nice and prio will ultimately
play a role in how much CPU bandwidth a task gets.
Question is: do we deem it useful to have an additional knob working on
the A side of the equation above?
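
To make the A/B split concrete, here is a small user-space sketch of the
preemption test. It is an illustration under assumptions, not the kernel
implementation: wakeup_preempts() and vruntime_with_bonus() are
hypothetical names, the per-task bonus is passed in as a plain number
because no interface for it is defined here, and the clamping done by
max_vruntime() in place_entity() is left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * W preempts C iff A > B, where
 *   A = curr.vruntime - se.vruntime   (vdiff)
 *   B = wakeup_gran(se)
 */
static inline bool wakeup_preempts(uint64_t curr_vruntime,
				   uint64_t se_vruntime,
				   uint64_t gran_ns)
{
	int64_t vdiff = (int64_t)(curr_vruntime - se_vruntime);

	return vdiff > (int64_t)gran_ns;
}

/*
 * Hypothetical: W's vruntime right after wakeup when it is granted a
 * per-task bonus (the "A side" knob). Clamping against the task's old
 * vruntime is intentionally omitted.
 */
static inline uint64_t vruntime_with_bonus(uint64_t min_vruntime,
					   uint64_t bonus_ns)
{
	return min_vruntime - bonus_ns;
}
```

With C at a vruntime of 100.5 ms, min_vruntime at 100 ms and a 1 ms
wakeup granularity, W does not preempt with a zero bonus (A = 0.5 ms),
but it does with a 3 ms bonus (A = 3.5 ms), with B left untouched.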
Best,
Patrick
---8<------8<------8<------8<------8<------8<------8<------8<------8<------8<---
TL;DR: The nice value already affects the wakeup latency

As reported above:

  check_preempt_wakeup(rq, p, wake_flags)
    wakeup_preempt_entity(curr, se)
(d)   vdiff = curr.vruntime - se.vruntime
(e)   return vdiff > wakeup_gran(se)

we see that W preempts C iff:

  vdiff > wakeup_gran(se)

But:

  wakeup_gran(se)
    calc_delta_fair(delta=sysctl_sched_wakeup_granularity, se)
      __calc_delta(delta_exec=delta, weight=NICE_0_LOAD, lw=&se->load)
(c)     wakeup_gran = sched_wakeup_granularity * (NICE_0_LOAD / W.load.weight)

Thus, the wakeup granularity of W depends on:
 - the system-wide configured wakeup granularity
     sysctl_sched_wakeup_granularity := [0..1e9]ns
 - W.load.weight [88761, ..., 15]
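
Step (c) can be checked numerically with a short sketch. It uses plain
integer division, whereas __calc_delta() works with a fixed-point inverse
weight, so the kernel's results may differ slightly. The helper name
wakeup_gran_ns() is made up for this example, NICE_0_LOAD is taken as the
unscaled value 1024, and the 1 ms granularity in the note below is an
assumed setting.

```c
#include <stdint.h>

#define NICE_0_LOAD 1024ULL	/* weight of a nice-0 task, unscaled */

/*
 * (c) wakeup_gran = sched_wakeup_granularity * NICE_0_LOAD / W.load.weight
 *
 * Simplified model: weight must be non-zero; the real code goes through
 * calc_delta_fair() / __calc_delta().
 */
static inline uint64_t wakeup_gran_ns(uint64_t sysctl_gran_ns, uint64_t weight)
{
	return sysctl_gran_ns * NICE_0_LOAD / weight;
}
```

Assuming a 1 ms system-wide granularity, a nice-0 task (weight 1024) gets
the full 1 ms, a weight of 88761 shrinks it to about 11.5 us, and a
weight of 15 stretches it to about 68 ms.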

But since:

  set_user_nice()
    p->static_prio = NICE_TO_PRIO(nice) = 120 + nice
    set_load_weight(p, update_load=true)
      reweight_task(p, prio)
(a)     prio = p->static_prio - MAX_RT_PRIO = 120 + nice - 100 = nice + 20
(b)     weight = scale_load(sched_prio_to_weight[prio]) = [88761, ..., 15]
        reweight_entity(cfs_rq, se, weight=weight, runnable=weight)
          update_load_set(lw=&se->load, w=weight)
            lw->weight = w
    p->prio = effective_prio(p)
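
Steps (a) and (b) amount to a table lookup, sketched here in user space.
The table mirrors the kernel's sched_prio_to_weight[]; nice_to_weight()
is a hypothetical helper for this example, and scale_load() is ignored,
so the values are the unscaled weights.

```c
#define MAX_RT_PRIO		100
#define NICE_TO_PRIO(nice)	((nice) + 120)

/* Unscaled CFS weights, indexed by nice + 20 (nice -20 .. 19). */
static const int sched_prio_to_weight[40] = {
 /* -20 */ 88761, 71755, 56483, 46273, 36291,
 /* -15 */ 29154, 23254, 18705, 14949, 11916,
 /* -10 */  9548,  7620,  6100,  4904,  3906,
 /*  -5 */  3121,  2501,  1991,  1586,  1277,
 /*   0 */  1024,   820,   655,   526,   423,
 /*   5 */   335,   272,   215,   172,   137,
 /*  10 */   110,    87,    70,    56,    45,
 /*  15 */    36,    29,    23,    18,    15,
};

/* Returns the weight for a nice value, or 0 if nice is out of range. */
static inline int nice_to_weight(int nice)
{
	int prio = NICE_TO_PRIO(nice) - MAX_RT_PRIO;	/* (a): nice + 20 */

	if (prio < 0 || prio >= 40)
		return 0;

	return sched_prio_to_weight[prio];		/* (b) */
}
```

For example, nice -20 maps to 88761, nice 0 to 1024 and nice 19 to 15,
which are the bounds quoted for W.load.weight.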

We see that by tuning a task's nice value we affect its wakeup granularity:

  lower the nice
    =(a)=> lower the prio
      =(b)=> higher the weight
        =(c)=> smaller the wakeup_gran

This means that for a given system-wide knob (sched_wakeup_granularity),
we still get different behaviours depending on a task-specific knob.
A smaller nice value makes W more likely to preempt C.