From: Chen Yu <yu.c.chen@intel.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Tim Chen <tim.c.chen@intel.com>, Aaron Lu <aaron.lu@intel.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
"Valentin Schneider" <vschneid@redhat.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
"Gautham R . Shenoy" <gautham.shenoy@amd.com>,
<linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 2/2] sched/fair: skip the cache hot CPU in select_idle_cpu()
Date: Tue, 12 Sep 2023 19:53:05 +0800 [thread overview]
Message-ID: <ZQBRIUYT8PE/UbZe@chenyu5-mobl2.ccr.corp.intel.com> (raw)
In-Reply-To: <db8fa2ae-317b-1c5a-e23f-9d3396165c45@efficios.com>
Hi Mathieu,
thanks for the review,
On 2023-09-11 at 11:43:27 -0400, Mathieu Desnoyers wrote:
> On 9/11/23 11:26, Mathieu Desnoyers wrote:
> > On 9/10/23 22:50, Chen Yu wrote:
> [...]
> > > ---
> > > kernel/sched/fair.c | 30 +++++++++++++++++++++++++++---
> > > kernel/sched/features.h | 1 +
> > > kernel/sched/sched.h | 1 +
> > > 3 files changed, 29 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index e20f50726ab8..fe3b760c9654 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6629,6 +6629,21 @@ static void dequeue_task_fair(struct rq *rq,
> > > struct task_struct *p, int flags)
> > > hrtick_update(rq);
> > > now = sched_clock_cpu(cpu_of(rq));
> > > p->se.prev_sleep_time = task_sleep ? now : 0;
> > > +#ifdef CONFIG_SMP
> > > + /*
> > > + * If this rq will become idle, and dequeued task is
> > > + * a short sleeping one, check if we can reserve
> > > + * this idle CPU for that task for a short while.
> > > + * During this reservation period, other wakees will
> > > + * skip this 'idle' CPU in select_idle_cpu(), and this
> > > + * short sleeping task can pick its previous CPU in
> > > + * select_idle_sibling(), which brings better cache
> > > + * locality.
> > > + */
> > > + if (sched_feat(SIS_CACHE) && task_sleep && !rq->nr_running &&
> > > + p->se.sleep_avg && p->se.sleep_avg <
> > > sysctl_sched_migration_cost)
> > > + rq->cache_hot_timeout = now + p->se.sleep_avg;
> >
> > This is really cool!
> >
> > There is one scenario that worries me with this approach: workloads
> > that sleep for a long time and then have short blocked periods.
> > Those bursts will likely bring the average to values too high
> > to stay below sysctl_sched_migration_cost.
> >
> > I wonder if changing the code above for the following would help ?
> >
> > if (sched_feat(SIS_CACHE) && task_sleep && !rq->nr_running &&
> > p->se.sleep_avg)
> > rq->cache_hot_timeout = now + min(sysctl_sched_migration_cost,
> > p->se.sleep_avg);
> >
> > For tasks that have a large sleep_avg, it would activate this rq
> > "appear as not idle for rq selection" scheme for a window of
> > sysctl_sched_migration_cost. If the sleep ends up being a long one,
> > preventing other tasks from being migrated to this rq for a tiny
> > window should not matter performance-wise. I would expect that it
> > could help workloads that come in bursts.
>
> There is perhaps a better way to handle bursts:
>
> When calculating the sleep_avg, we actually only really care about
> the sleep time for short bursts, so we could use the sysctl_sched_migration_cost
> to select which of the sleeps should be taken into account in the avg.
>
> We could rename the field "sleep_avg" to "burst_sleep_avg", and have:
>
> u64 now = sched_clock_cpu(task_cpu(p));
>
> if ((flags & ENQUEUE_WAKEUP) && last_sleep && cpu_online(task_cpu(p)) &&
> now > last_sleep && now - last_sleep < sysctl_sched_migration_cost)
> update_avg(&p->se.burst_sleep_avg, now - last_sleep);
>
> Then we can have this code is dequeue_task_fair:
>
> if (sched_feat(SIS_CACHE) && task_sleep && !rq->nr_running && p->se.busrt_sleep_avg)
> rq->cache_hot_timeout = now + p->se.burst_sleep_avg;
>
> Thoughts ?
>
This looks reasonable, it would be fine grained to only monitor the short sleep duration
of that task. Let me try this approach to see if there is any difference.
thanks,
Chenyu
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
next prev parent reply other threads:[~2023-09-12 11:53 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-11 2:49 [RFC PATCH 0/2] Makes it easier for the wakee to choose previous CPU Chen Yu
2023-09-11 2:49 ` [RFC PATCH 1/2] sched/fair: Record the average sleep time of a task Chen Yu
2023-09-11 2:50 ` [RFC PATCH 2/2] sched/fair: skip the cache hot CPU in select_idle_cpu() Chen Yu
2023-09-11 7:26 ` Aaron Lu
2023-09-11 8:40 ` Chen Yu
2023-09-13 6:22 ` Gautham R. Shenoy
2023-09-13 7:25 ` Chen Yu
2023-09-14 7:06 ` Gautham R. Shenoy
2023-09-14 12:09 ` Chen Yu
2023-09-15 15:18 ` Gautham R. Shenoy
2023-09-19 9:01 ` Chen Yu
2023-09-11 8:29 ` K Prateek Nayak
2023-09-11 10:19 ` Chen Yu
2023-09-12 3:05 ` K Prateek Nayak
2023-09-12 12:32 ` Chen Yu
2023-09-12 14:26 ` K Prateek Nayak
2023-09-13 2:57 ` Chen Yu
2023-09-14 4:13 ` K Prateek Nayak
2023-09-14 11:01 ` Chen Yu
2023-09-15 3:21 ` K Prateek Nayak
2023-09-12 9:39 ` Mike Galbraith
2023-09-12 14:51 ` Chen Yu
2023-09-12 6:32 ` Mike Galbraith
2023-09-11 15:26 ` Mathieu Desnoyers
2023-09-11 15:43 ` Mathieu Desnoyers
2023-09-12 11:53 ` Chen Yu [this message]
2023-09-12 14:06 ` Mathieu Desnoyers
2023-09-12 14:14 ` Chen Yu
2023-09-12 15:18 ` Mathieu Desnoyers
2023-09-13 3:02 ` Chen Yu
2023-09-20 12:34 ` Chen Yu
2023-09-14 5:30 ` K Prateek Nayak
2023-09-14 10:43 ` Chen Yu
2023-09-15 3:37 ` K Prateek Nayak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZQBRIUYT8PE/UbZe@chenyu5-mobl2.ccr.corp.intel.com \
--to=yu.c.chen@intel.com \
--cc=aaron.lu@intel.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=gautham.shenoy@amd.com \
--cc=juri.lelli@redhat.com \
--cc=kprateek.nayak@amd.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tim.c.chen@intel.com \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox