From: "Chen, Yu C" <yu.c.chen@intel.com>
To: Hillf Danton <hdanton@sina.com>
Cc: <vincent.guittot@linaro.org>, <linux-kernel@vger.kernel.org>,
<kprateek.nayak@amd.com>, <yu.chen.surf@foxmail.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC][PATCH] sched: Cache aware load-balancing
Date: Mon, 31 Mar 2025 14:25:32 +0800
Message-ID: <09ba5932-a256-4cdd-94dc-4f2b6569c855@intel.com>
In-Reply-To: <20250327112059.3661-1-hdanton@sina.com>

On 3/27/2025 7:20 PM, Hillf Danton wrote:
> On Wed, Mar 26, 2025 at 11:25:53AM +0100, Peter Zijlstra wrote:
>> On Wed, Mar 26, 2025 at 10:38:41AM +0100, Peter Zijlstra wrote:
>>
>>> Nah, the saner thing to do is to preserve the topology averages and look
>>> at those instead of the per-cpu values.
>>>
>>> Eg. have task_cache_work() compute and store averages in the
>>> sched_domain structure and then use those.
>>
>> A little something like so perhaps ?
>>
> My $.02 followup with the assumption that l2 cache temperature can not
> make sense without comparing. Just for idea show.
>
> Hillf
>
> --- m/include/linux/sched.h
> +++ n/include/linux/sched.h
> @@ -1355,6 +1355,11 @@ struct task_struct {
> unsigned long numa_pages_migrated;
> #endif /* CONFIG_NUMA_BALANCING */
>
> +#ifdef CONFIG_SCHED_CACHE
> +#define LXC_SIZE 64 /* should be setup by parsing topology */
> + unsigned long lxc_temp[LXC_SIZE]; /* x > 1, l2 cache temperature for instance */
> +#endif
> +
> #ifdef CONFIG_RSEQ
> struct rseq __user *rseq;
> u32 rseq_len;
> --- m/kernel/sched/fair.c
> +++ n/kernel/sched/fair.c
> @@ -7953,6 +7953,22 @@ static int select_idle_sibling(struct ta
> if ((unsigned)i < nr_cpumask_bits)
> return i;
>
> +#ifdef CONFIG_SCHED_CACHE
> + /*
> + * 2, lxc temp can not make sense without comparing
> + *
> + * target can be any cpu if lxc is cold
> + */
> + if ((unsigned int)prev_aff < nr_cpumask_bits)
> + if (p->lxc_temp[per_cpu(sd_share_id, (unsigned int)prev_aff)] >
> + p->lxc_temp[per_cpu(sd_share_id, target)])
> + target = prev_aff;
> + if ((unsigned int)recent_used_cpu < nr_cpumask_bits)
> + if (p->lxc_temp[per_cpu(sd_share_id, (unsigned int)recent_used_cpu)] >
> + p->lxc_temp[per_cpu(sd_share_id, target)])
> + target = recent_used_cpu;
> + p->lxc_temp[per_cpu(sd_share_id, target)] += 1;
> +#else
> /*
> * For cluster machines which have lower sharing cache like L2 or
> * LLC Tag, we tend to find an idle CPU in the target's cluster
> @@ -7963,6 +7979,7 @@ static int select_idle_sibling(struct ta
> return prev_aff;
> if ((unsigned int)recent_used_cpu < nr_cpumask_bits)
> return recent_used_cpu;
> +#endif
>
> return target;
> }
> @@ -13059,6 +13076,13 @@ static void task_tick_fair(struct rq *rq
> if (static_branch_unlikely(&sched_numa_balancing))
> task_tick_numa(rq, curr);
>
> +#ifdef CONFIG_SCHED_CACHE
> + /*
> + * 0, lxc is defined cold after 2-second nap
> + * 1, task migrate across NUMA node makes lxc cold
> + */
> + curr->lxc_temp[per_cpu(sd_share_id, rq->cpu)] += 5;

If lxc_temp is per task, this goes in a different direction: it tracks
each task's activity rather than the activity of the whole process.
I think the idea of overwriting target with another CPU, when that CPU
is in a hot LLC, is applicable, so that select_idle_cpu() can then
search for an idle CPU in the cache-hot LLC.
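
To make the target-overwrite idea concrete, here is a minimal userspace
sketch, not kernel code. It mirrors the comparison in the quoted hunk:
a candidate CPU replaces target only if its cache domain is strictly
hotter for this task. NR_CPUS, the two-CPUs-per-domain share_id()
mapping, struct task and pick_hot_target() are all assumptions made up
for illustration; in the kernel the domain id would come from
per_cpu(sd_share_id, cpu).

```c
#define NR_CPUS  8  /* hypothetical machine size */
#define LXC_SIZE 4  /* hypothetical number of cache domains */

/* Hypothetical stand-in for per_cpu(sd_share_id, cpu): 2 CPUs per domain. */
static int share_id(int cpu)
{
	return cpu / 2;
}

/* Hypothetical per-task state: one temperature counter per cache domain. */
struct task {
	unsigned long lxc_temp[LXC_SIZE];
};

/*
 * Return target, unless prev or recent sits in a cache domain that is
 * strictly hotter for this task. Invalid candidates (e.g. -1) are
 * skipped by the unsigned range check, as in the quoted patch.
 */
static int pick_hot_target(const struct task *p, int target,
			   int prev, int recent)
{
	int cand[2] = { prev, recent };

	for (int i = 0; i < 2; i++) {
		int c = cand[i];

		if ((unsigned int)c >= NR_CPUS)
			continue;
		if (p->lxc_temp[share_id(c)] > p->lxc_temp[share_id(target)])
			target = c;
	}
	return target;
}
```

With temperatures {1, 9, 3, 0} for domains 0..3, a task whose target is
CPU 0 and whose previous CPU is CPU 2 would be steered to CPU 2, and
the idle-CPU search would then start from that hotter domain.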
thanks,
Chenyu