From: Joel Fernandes <joel@joelfernandes.org>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
vschneid@redhat.com, linux-kernel@vger.kernel.org,
parth@linux.ibm.com, qyousef@layalina.io, chris.hyser@oracle.com,
patrick.bellasi@matbug.net, David.Laight@aculab.com,
pjt@google.com, pavel@ucw.cz, tj@kernel.org, qperret@google.com,
tim.c.chen@linux.intel.com, joshdon@google.com, timj@gnu.org,
kprateek.nayak@amd.com, yu.c.chen@intel.com,
youssefesmat@chromium.org, riel@redhat.com
Subject: Re: [PATCH v8 1/9] sched/fair: fix unfairness at wakeup
Date: Mon, 14 Nov 2022 03:06:23 +0000 [thread overview]
Message-ID: <Y3Gwr2p5BcofuZ8e@google.com> (raw)
In-Reply-To: <20221110175009.18458-2-vincent.guittot@linaro.org>
Hi Vincent,
On Thu, Nov 10, 2022 at 06:50:01PM +0100, Vincent Guittot wrote:
> At wake up, the vruntime of a task is updated to not be more older than
> a sched_latency period behind the min_vruntime. This prevents long sleeping
> task to get unlimited credit at wakeup.
> Such waking task should preempt current one to use its CPU bandwidth but
> wakeup_gran() can be larger than sched_latency, filter out the
> wakeup preemption and as a results steals some CPU bandwidth to
> the waking task.
Just a thought: one can argue that this also hurts the running task because
wakeup_gran() is expected to not preempt the running task for a certain
minimum amount of time right?
So for example, if I set sysctl_sched_wakeup_granularity to a high value, I
expect the current task to not be preempted for that long, even if the
sched_latency cap in place_entity() makes the delta smaller than
wakeup_gran(). The place_entity() in current code is used to cap the sleep
credit, it does not really talk about preemption.
I don't mind this change, but it does change the meaning a bit of
sysctl_sched_wakeup_granularity I think.
> Make sure that a task, which vruntime has been capped, will preempt current
> task and use its CPU bandwidth even if wakeup_gran() is in the same range
> as sched_latency.
nit: I would prefer we say, instead of "is in the same range", "is greater
than". Because it got confusing a bit for me.
> If the waking task failed to preempt current it could to wait up to
> sysctl_sched_min_granularity before preempting it during next tick.
>
> Strictly speaking, we should use cfs->min_vruntime instead of
> curr->vruntime but it doesn't worth the additional overhead and complexity
> as the vruntime of current should be close to min_vruntime if not equal.
Could we add here,
Reported-by: Youssef Esmat <youssefesmat@chromium.org>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Just a few more comments below:
> ---
> kernel/sched/fair.c | 46 ++++++++++++++++++++------------------------
> kernel/sched/sched.h | 30 ++++++++++++++++++++++++++++-
> 2 files changed, 50 insertions(+), 26 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5ffec4370602..eb04c83112a0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4345,33 +4345,17 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> {
> u64 vruntime = cfs_rq->min_vruntime;
>
> - /*
> - * The 'current' period is already promised to the current tasks,
> - * however the extra weight of the new task will slow them down a
> - * little, place the new task so that it fits in the slot that
> - * stays open at the end.
> - */
> - if (initial && sched_feat(START_DEBIT))
> - vruntime += sched_vslice(cfs_rq, se);
> -
> - /* sleeps up to a single latency don't count. */
> - if (!initial) {
> - unsigned long thresh;
> -
> - if (se_is_idle(se))
> - thresh = sysctl_sched_min_granularity;
> - else
> - thresh = sysctl_sched_latency;
> -
> + if (!initial)
> + /* sleeps up to a single latency don't count. */
> + vruntime -= get_sched_latency(se_is_idle(se));
> + else if (sched_feat(START_DEBIT))
> /*
> - * Halve their sleep time's effect, to allow
> - * for a gentler effect of sleepers:
> + * The 'current' period is already promised to the current tasks,
> + * however the extra weight of the new task will slow them down a
> + * little, place the new task so that it fits in the slot that
> + * stays open at the end.
> */
> - if (sched_feat(GENTLE_FAIR_SLEEPERS))
> - thresh >>= 1;
> -
> - vruntime -= thresh;
> - }
> + vruntime += sched_vslice(cfs_rq, se);
>
> /* ensure we never gain time by being placed backwards. */
> se->vruntime = max_vruntime(se->vruntime, vruntime);
> @@ -7187,6 +7171,18 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
> return -1;
>
> gran = wakeup_gran(se);
> +
> + /*
> + * At wake up, the vruntime of a task is capped to not be older than
> + * a sched_latency period compared to min_vruntime. This prevents long
> + * sleeping task to get unlimited credit at wakeup. Such waking up task
> + * has to preempt current in order to not lose its share of CPU
> + * bandwidth but wakeup_gran() can become higher than scheduling period
> + * for low priority task. Make sure that long sleeping task will get a
> + * chance to preempt current.
> + */
> + gran = min_t(s64, gran, get_latency_max());
> +
Can we move this to wakeup_gran(se)? IMO, it belongs there because you are
adjusting the wakeup_gran().
> if (vdiff > gran)
> return 1;
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 1fc198be1ffd..14879d429919 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2432,9 +2432,9 @@ extern void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags);
> extern const_debug unsigned int sysctl_sched_nr_migrate;
> extern const_debug unsigned int sysctl_sched_migration_cost;
>
> -#ifdef CONFIG_SCHED_DEBUG
> extern unsigned int sysctl_sched_latency;
> extern unsigned int sysctl_sched_min_granularity;
> +#ifdef CONFIG_SCHED_DEBUG
> extern unsigned int sysctl_sched_idle_min_granularity;
> extern unsigned int sysctl_sched_wakeup_granularity;
> extern int sysctl_resched_latency_warn_ms;
> @@ -2448,6 +2448,34 @@ extern unsigned int sysctl_numa_balancing_scan_period_max;
> extern unsigned int sysctl_numa_balancing_scan_size;
> #endif
>
> +static inline unsigned long get_sched_latency(bool idle)
> +{
IMO, since there are other users of sysctl_sched_latency, it would be better
to call this get_max_sleep_credit() or something.
> + unsigned long thresh;
> +
> + if (idle)
> + thresh = sysctl_sched_min_granularity;
> + else
> + thresh = sysctl_sched_latency;
> +
> + /*
> + * Halve their sleep time's effect, to allow
> + * for a gentler effect of sleepers:
> + */
> + if (sched_feat(GENTLE_FAIR_SLEEPERS))
> + thresh >>= 1;
> +
> + return thresh;
> +}
> +
> +static inline unsigned long get_latency_max(void)
> +{
> + unsigned long thresh = get_sched_latency(false);
> +
> + thresh -= sysctl_sched_min_granularity;
Could you clarify, why are you subtracting sched_min_granularity here? Could
you add some comments here to make it clear?
thanks,
- Joel
> +
> + return thresh;
> +}
> +
> #ifdef CONFIG_SCHED_HRTICK
>
> /*
> --
> 2.17.1
>
next prev parent reply other threads:[~2022-11-14 3:06 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-10 17:50 [PATCH v8 0/9] Add latency priority for CFS class Vincent Guittot
2022-11-10 17:50 ` [PATCH v8 1/9] sched/fair: fix unfairness at wakeup Vincent Guittot
2022-11-14 3:06 ` Joel Fernandes [this message]
2022-11-14 11:05 ` Vincent Guittot
2022-11-16 2:10 ` Joel Fernandes
2022-11-16 8:25 ` Aaron Lu
2022-11-17 9:18 ` Vincent Guittot
2022-11-14 16:19 ` Patrick Bellasi
2022-11-14 16:46 ` Vincent Guittot
2022-11-14 19:13 ` Dietmar Eggemann
2022-11-15 7:26 ` Vincent Guittot
2022-11-10 17:50 ` [PATCH v8 2/9] sched: Introduce latency-nice as a per-task attribute Vincent Guittot
2022-11-10 17:50 ` [PATCH v8 3/9] sched/core: Propagate parent task's latency requirements to the child task Vincent Guittot
2022-11-10 17:50 ` [PATCH v8 4/9] sched: Allow sched_{get,set}attr to change latency_nice of the task Vincent Guittot
2022-11-10 17:50 ` [PATCH v8 5/9] sched/fair: Take into account latency priority at wakeup Vincent Guittot
2022-11-14 16:20 ` Patrick Bellasi
2022-11-15 15:40 ` Vincent Guittot
2022-11-10 17:50 ` [PATCH v8 6/9] sched/fair: Add sched group latency support Vincent Guittot
2022-11-14 16:20 ` Patrick Bellasi
2022-11-14 16:57 ` Vincent Guittot
2022-11-10 17:50 ` [PATCH v8 7/9] sched/core: Support latency priority with sched core Vincent Guittot
2022-11-10 17:50 ` [PATCH v8 8/9] sched/fair: Add latency list Vincent Guittot
2022-11-10 17:50 ` [PATCH v8 9/9] sched/fair: remove check_preempt_from_others Vincent Guittot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y3Gwr2p5BcofuZ8e@google.com \
--to=joel@joelfernandes.org \
--cc=David.Laight@aculab.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=chris.hyser@oracle.com \
--cc=dietmar.eggemann@arm.com \
--cc=joshdon@google.com \
--cc=juri.lelli@redhat.com \
--cc=kprateek.nayak@amd.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=parth@linux.ibm.com \
--cc=patrick.bellasi@matbug.net \
--cc=pavel@ucw.cz \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=qperret@google.com \
--cc=qyousef@layalina.io \
--cc=riel@redhat.com \
--cc=rostedt@goodmis.org \
--cc=tim.c.chen@linux.intel.com \
--cc=timj@gnu.org \
--cc=tj@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=youssefesmat@chromium.org \
--cc=yu.c.chen@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.