Re: [RFC][PATCH] sched: Cache aware load-balancing

public inbox for linux-kernel@vger.kernel.org

From: "Chen, Yu C" <yu.c.chen@intel.com>
To: Hillf Danton <hdanton@sina.com>
Cc: <vincent.guittot@linaro.org>, <linux-kernel@vger.kernel.org>,
	<kprateek.nayak@amd.com>, <yu.chen.surf@foxmail.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC][PATCH] sched: Cache aware load-balancing
Date: Mon, 31 Mar 2025 14:25:32 +0800
Message-ID: <09ba5932-a256-4cdd-94dc-4f2b6569c855@intel.com>
In-Reply-To: <20250327112059.3661-1-hdanton@sina.com>

On 3/27/2025 7:20 PM, Hillf Danton wrote:
> On Wed, Mar 26, 2025 at 11:25:53AM +0100, Peter Zijlstra wrote:
>> On Wed, Mar 26, 2025 at 10:38:41AM +0100, Peter Zijlstra wrote:
>>
>>> Nah, the saner thing to do is to preserve the topology averages and look
>>> at those instead of the per-cpu values.
>>>
>>> Eg. have task_cache_work() compute and store averages in the
>>> sched_domain structure and then use those.
>>
>> A little something like so perhaps ?
>>
> My $.02 followup with the assumption that l2 cache temperature can not
> make sense without comparing. Just for idea show.
> 
> 	Hillf
> 
> --- m/include/linux/sched.h
> +++ n/include/linux/sched.h
> @@ -1355,6 +1355,11 @@ struct task_struct {
>   	unsigned long			numa_pages_migrated;
>   #endif /* CONFIG_NUMA_BALANCING */
>   
> +#ifdef CONFIG_SCHED_CACHE
> +#define LXC_SIZE 64 /* should be setup by parsing topology */
> +	unsigned long lxc_temp[LXC_SIZE]; /* x > 1, l2 cache temperature for instance */
> +#endif
> +
>   #ifdef CONFIG_RSEQ
>   	struct rseq __user *rseq;
>   	u32 rseq_len;
> --- m/kernel/sched/fair.c
> +++ n/kernel/sched/fair.c
> @@ -7953,6 +7953,22 @@ static int select_idle_sibling(struct ta
>   	if ((unsigned)i < nr_cpumask_bits)
>   		return i;
>   
> +#ifdef CONFIG_SCHED_CACHE
> +	/*
> +	 * 2, lxc temp can not make sense without comparing
> +	 *
> +	 * target can be any cpu if lxc is cold
> +	 */
> +	if ((unsigned int)prev_aff < nr_cpumask_bits)
> +		if (p->lxc_temp[per_cpu(sd_share_id, (unsigned int)prev_aff)] >
> +		    p->lxc_temp[per_cpu(sd_share_id, target)])
> +			target = prev_aff;
> +	if ((unsigned int)recent_used_cpu < nr_cpumask_bits)
> +		if (p->lxc_temp[per_cpu(sd_share_id, (unsigned int)recent_used_cpu)] >
> +		    p->lxc_temp[per_cpu(sd_share_id, target)])
> +			target = recent_used_cpu;
> +	p->lxc_temp[per_cpu(sd_share_id, target)] += 1;
> +#else
>   	/*
>   	 * For cluster machines which have lower sharing cache like L2 or
>   	 * LLC Tag, we tend to find an idle CPU in the target's cluster
> @@ -7963,6 +7979,7 @@ static int select_idle_sibling(struct ta
>   		return prev_aff;
>   	if ((unsigned int)recent_used_cpu < nr_cpumask_bits)
>   		return recent_used_cpu;
> +#endif
>   
>   	return target;
>   }
> @@ -13059,6 +13076,13 @@ static void task_tick_fair(struct rq *rq
>   	if (static_branch_unlikely(&sched_numa_balancing))
>   		task_tick_numa(rq, curr);
>   
> +#ifdef CONFIG_SCHED_CACHE
> +	/*
> +	 * 0, lxc is defined cold after 2-second nap
> +	 * 1, task migrate across NUMA node makes lxc cold
> +	 */
> +	curr->lxc_temp[per_cpu(sd_share_id, rq->cpu)] += 5;

If lxc_temp is per task, this goes in a different direction: it tracks
each task's activity rather than the activity of the whole process.
I think the idea is still applicable, though: overwrite target with
another CPU if that CPU is in a hot LLC, so that select_idle_cpu()
searches for an idle CPU in the cache-hot LLC.
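
To make the "overwrite target" step concrete, here is a rough user-space
sketch, not kernel code. It only models the comparison from the quoted
hunk. NR_CPUS, NR_LLC, cpu_to_llc(), struct task_model, prefer_hotter()
and pick_target() are all made up for illustration; cpu_to_llc() stands
in for per_cpu(sd_share_id, cpu), and the array stands in for the
per-task lxc_temp[] proposed in the patch.

```c
#include <assert.h>

#define NR_CPUS 8	/* hypothetical machine: 8 CPUs */
#define NR_LLC  2	/* hypothetical: 2 LLC domains, 4 CPUs each */

/* Stand-in for per_cpu(sd_share_id, cpu) in the quoted patch. */
static int cpu_to_llc(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpu / (NR_CPUS / NR_LLC);
}

/* Stand-in for the per-task lxc_temp[] array from the quoted patch. */
struct task_model {
	unsigned long lxc_temp[NR_LLC];
};

/*
 * Return candidate if its LLC is strictly hotter for this task than
 * target's LLC, otherwise keep target. Out-of-range candidates
 * (e.g. -1 for "no such CPU") are ignored.
 */
static int prefer_hotter(const struct task_model *p, int target, int candidate)
{
	if (candidate < 0 || candidate >= NR_CPUS)
		return target;
	if (p->lxc_temp[cpu_to_llc(candidate)] > p->lxc_temp[cpu_to_llc(target)])
		return candidate;
	return target;
}

/*
 * Model of the CONFIG_SCHED_CACHE branch: consider prev, then recent,
 * and warm up the LLC of whichever CPU ends up as target.
 */
static int pick_target(struct task_model *p, int target, int prev, int recent)
{
	target = prefer_hotter(p, target, prev);
	target = prefer_hotter(p, target, recent);
	p->lxc_temp[cpu_to_llc(target)] += 1;
	return target;
}
```

In the real wakeup path the CPU returned here would then be handed on so
that select_idle_cpu() scans that LLC for an idle CPU; the sketch stops
before that step and says nothing about decay of the temperature.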

thanks,
Chenyu
