Re: [RFC patch v3 07/20] sched: Add helper function to decide whether to allow cache aware scheduling

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Libo Chen <libo.chen@oracle.com>
To: "Chen, Yu C" <yu.c.chen@intel.com>
Cc: Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Abel Wu <wuyun.abel@bytedance.com>,
	Madadi Vineeth Reddy <vineethr@linux.ibm.com>,
	Hillf Danton <hdanton@sina.com>, Len Brown <len.brown@intel.com>,
	linux-kernel@vger.kernel.org,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	"Gautham R . Shenoy" <gautham.shenoy@amd.com>
Subject: Re: [RFC patch v3 07/20] sched: Add helper function to decide whether to allow cache aware scheduling
Date: Tue, 8 Jul 2025 10:22:36 -0700	[thread overview]
Message-ID: <9e78f54c-f993-4505-814d-b8815f5c6bf0@oracle.com> (raw)
In-Reply-To: <6a36eae1-7d1c-4c23-aec3-056d641e3edd@intel.com>



On 7/8/25 01:29, Chen, Yu C wrote:
> On 7/8/2025 8:41 AM, Libo Chen wrote:
>> Hi Tim and Chenyu,
>>
>>
>> On 6/18/25 11:27, Tim Chen wrote:
>>> Cache-aware scheduling is designed to aggregate threads into their
>>> preferred LLC, either via the task wake up path or the load balancing
>>> path. One side effect is that when the preferred LLC is saturated,
>>> more threads will continue to be stacked on it, degrading the workload's
>>> latency. A strategy is needed to prevent this aggregation from going too
>>> far such that the preferred LLC is too overloaded.
>>>
>>> Introduce helper function _get_migrate_hint() to implement the LLC
>>> migration policy:
>>>
>>> 1) A task is aggregated to its preferred LLC if both source/dest LLC
>>>     are not too busy (<50% utilization, tunable), or the preferred
>>>     LLC will not be too out of balanced from the non preferred LLC
>>>     (>20% utilization, tunable, close to imbalance_pct of the LLC
>>>     domain).
>>> 2) Allow a task to be moved from the preferred LLC to the
>>>     non-preferred one if the non-preferred LLC will not be too out
>>>     of balanced from the preferred prompting an aggregation task
>>>     migration later.  We are still experimenting with the aggregation
>>>     and migration policy. Some other possibilities are policy based
>>>     on LLC's load or average number of tasks running.  Those could
>>>     be tried out by tweaking _get_migrate_hint().
>>>
>>> The function _get_migrate_hint() returns migration suggestions for the upper-le
>>> +__read_mostly unsigned int sysctl_llc_aggr_cap       = 50;
>>> +__read_mostly unsigned int sysctl_llc_aggr_imb       = 20;
>>> +
>>
>>
>> I think this patch has a great potential.
>>
>> Since _get_migrate_hint() is tied to an individual task anyway, why not add a
>> per-task llc_aggr_imb which defaults to the sysctl one? Tasks have different
>> preferences for llc stacking, they can all be running in the same system at the
>> same time. This way you can offer a greater deal of optimization without much
>> burden to others.
> 
> Yes, this doable. It can be evaluated after the global generic strategy
> has been verified to work, like NUMA balancing :)
> 

I will run some real-world workloads and get back to you (may take some time)

>>
>> Also with sysctl_llc_aggr_imb, do we really need SCHED_CACHE_WAKE? 
> 
> Do you mean the SCHED_CACHE_WAKE or SCHED_CACHE_LB?
> 

Ah I was thinking sysctl_llc_aggr_imb alone can help reduce overstacking on
target LLC from a few hyperactive wakees (may consider to ratelimit those
wakees as a solution), but just realize this can affect lb as well and doesn't
really reduce overheads from frequent wakeups (no good idea on top of my head
but we should find a better solution than sched_feat to address the overhead issue).



>> Does setting sysctl_llc_aggr_imb to 0 basically say no preference for either LLC, no?
>>
> 
> My understanding is that, if sysctl_llc_aggr_imb is 0, the task aggregation
> might still consider other aspects, like if that target LLC's utilization has
> exceeded 50% or not.
> 

which can be controlled by sysctl_llc_aggr_cap, right? Okay so if both LLCs have
<$(sysctl_llc_aggr_cap)% utilization, should sysctl_llc_aggr_cap be the only
determining factor here barring NUMA balancing?

Libo

> thanks,
> Chenyu> Thanks,
>> Libo
>>
>>> +static enum llc_mig_hint _get_migrate_hint(int src_cpu, int dst_cpu,
>>> +                       unsigned long tsk_util,
>>> +                       bool to_pref)
>>> +{
>>> +    unsigned long src_util, dst_util, src_cap, dst_cap;
>>> +
>>> +    if (cpus_share_cache(src_cpu, dst_cpu))
>>> +        return mig_allow;
>>> +
>>> +    if (!get_llc_stats(src_cpu, &src_util, &src_cap) ||
>>> +        !get_llc_stats(dst_cpu, &dst_util, &dst_cap))
>>> +        return mig_allow;
>>> +
>>> +    if (!fits_llc_capacity(dst_util, dst_cap) &&
>>> +        !fits_llc_capacity(src_util, src_cap))
>>> +        return mig_ignore;
>>> +
>>> +    src_util = src_util < tsk_util ? 0 : src_util - tsk_util;
>>> +    dst_util = dst_util + tsk_util;
>>> +    if (to_pref) {
>>> +        /*
>>> +         * sysctl_llc_aggr_imb is the imbalance allowed between
>>> +         * preferred LLC and non-preferred LLC.
>>> +         * Don't migrate if we will get preferred LLC too
>>> +         * heavily loaded and if the dest is much busier
>>> +         * than the src, in which case migration will
>>> +         * increase the imbalance too much.
>>> +         */
>>> +        if (!fits_llc_capacity(dst_util, dst_cap) &&
>>> +            util_greater(dst_util, src_util))
>>> +            return mig_forbid;
>>> +    } else {
>>> +        /*
>>> +         * Don't migrate if we will leave preferred LLC
>>> +         * too idle, or if this migration leads to the
>>> +         * non-preferred LLC falls within sysctl_aggr_imb percent
>>> +         * of preferred LLC, leading to migration again
>>> +         * back to preferred LLC.
>>> +         */
>>> +        if (fits_llc_capacity(src_util, src_cap) ||
>>> +            !util_greater(src_util, dst_util))
>>> +            return mig_forbid;
>>> +    }
>>> +    return mig_allow;
>>> +}
>>
>>

next prev parent reply	other threads:[~2025-07-08 17:23 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-18 18:27 [RFC patch v3 00/20] Cache aware scheduling Tim Chen
2025-06-18 18:27 ` [RFC patch v3 01/20] sched: Cache aware load-balancing Tim Chen
2025-06-26 12:23   ` Jianyong Wu
2025-06-26 13:32     ` Chen, Yu C
2025-06-27  0:10       ` Tim Chen
2025-06-27  2:13         ` Jianyong Wu
2025-07-03 19:29   ` Shrikanth Hegde
2025-07-04  8:40     ` Chen, Yu C
2025-07-04  8:45       ` Peter Zijlstra
2025-07-04  8:54         ` Shrikanth Hegde
2025-07-07 19:57     ` Tim Chen
2025-06-18 18:27 ` [RFC patch v3 02/20] sched: Several fixes for cache aware scheduling Tim Chen
2025-07-03 19:33   ` Shrikanth Hegde
2025-07-07 21:02     ` Tim Chen
2025-07-08  1:15   ` Libo Chen
2025-07-08  7:54     ` Chen, Yu C
2025-07-08 15:47       ` Libo Chen
2025-06-18 18:27 ` [RFC patch v3 03/20] sched: Avoid task migration within its preferred LLC Tim Chen
2025-06-18 18:27 ` [RFC patch v3 04/20] sched: Avoid calculating the cpumask if the system is overloaded Tim Chen
2025-07-03 19:39   ` Shrikanth Hegde
2025-07-07 14:57     ` Tim Chen
2025-06-18 18:27 ` [RFC patch v3 05/20] sched: Add hysteresis to switch a task's preferred LLC Tim Chen
2025-07-02  6:47   ` Madadi Vineeth Reddy
2025-07-02 21:47     ` Tim Chen
2025-06-18 18:27 ` [RFC patch v3 06/20] sched: Save the per LLC utilization for better cache aware scheduling Tim Chen
2025-06-18 18:27 ` [RFC patch v3 07/20] sched: Add helper function to decide whether to allow " Tim Chen
2025-07-08  0:41   ` Libo Chen
2025-07-08  8:29     ` Chen, Yu C
2025-07-08 17:22       ` Libo Chen [this message]
2025-07-09 14:41         ` Chen, Yu C
2025-07-09 21:31           ` Libo Chen
2025-07-08 21:59     ` Tim Chen
2025-07-09 21:22       ` Libo Chen
2025-06-18 18:27 ` [RFC patch v3 08/20] sched: Set up LLC indexing Tim Chen
2025-07-03 19:44   ` Shrikanth Hegde
2025-07-04  9:36     ` Chen, Yu C
2025-06-18 18:27 ` [RFC patch v3 09/20] sched: Introduce task preferred LLC field Tim Chen
2025-06-18 18:27 ` [RFC patch v3 10/20] sched: Calculate the number of tasks that have LLC preference on a runqueue Tim Chen
2025-07-03 19:45   ` Shrikanth Hegde
2025-07-04 15:00     ` Chen, Yu C
2025-06-18 18:27 ` [RFC patch v3 11/20] sched: Introduce per runqueue task LLC preference counter Tim Chen
2025-06-18 18:28 ` [RFC patch v3 12/20] sched: Calculate the total number of preferred LLC tasks during load balance Tim Chen
2025-06-18 18:28 ` [RFC patch v3 13/20] sched: Tag the sched group as llc_balance if it has tasks prefer other LLC Tim Chen
2025-06-18 18:28 ` [RFC patch v3 14/20] sched: Introduce update_llc_busiest() to deal with groups having preferred LLC tasks Tim Chen
2025-07-03 19:52   ` Shrikanth Hegde
2025-07-05  2:26     ` Chen, Yu C
2025-06-18 18:28 ` [RFC patch v3 15/20] sched: Introduce a new migration_type to track the preferred LLC load balance Tim Chen
2025-06-18 18:28 ` [RFC patch v3 16/20] sched: Consider LLC locality for active balance Tim Chen
2025-06-18 18:28 ` [RFC patch v3 17/20] sched: Consider LLC preference when picking tasks from busiest queue Tim Chen
2025-06-18 18:28 ` [RFC patch v3 18/20] sched: Do not migrate task if it is moving out of its preferred LLC Tim Chen
2025-06-18 18:28 ` [RFC patch v3 19/20] sched: Introduce SCHED_CACHE_LB to control cache aware load balance Tim Chen
2025-06-18 18:28 ` [RFC patch v3 20/20] sched: Introduce SCHED_CACHE_WAKE to control LLC aggregation on wake up Tim Chen
2025-06-19  6:39 ` [RFC patch v3 00/20] Cache aware scheduling Yangyu Chen
2025-06-19 13:21   ` Chen, Yu C
2025-06-19 14:12     ` Yangyu Chen
2025-06-20 19:25 ` Madadi Vineeth Reddy
2025-06-22  0:39   ` Chen, Yu C
2025-06-24 17:47     ` Madadi Vineeth Reddy
2025-06-23 16:45   ` Tim Chen
2025-06-24  5:00 ` K Prateek Nayak
2025-06-24 12:16   ` Chen, Yu C
2025-06-25  4:19     ` K Prateek Nayak
2025-06-25  0:30   ` Tim Chen
2025-06-25  4:30     ` K Prateek Nayak
2025-07-03 20:00   ` Shrikanth Hegde
2025-07-04 10:09     ` Chen, Yu C
2025-07-09 19:39 ` Madadi Vineeth Reddy
2025-07-10  3:33   ` Chen, Yu C

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9e78f54c-f993-4505-814d-b8815f5c6bf0@oracle.com \
    --to=libo.chen@oracle.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=gautham.shenoy@amd.com \
    --cc=hdanton@sina.com \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tim.c.chen@intel.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vineethr@linux.ibm.com \
    --cc=vschneid@redhat.com \
    --cc=wuyun.abel@bytedance.com \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).