Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: Chen Yu <yu.c.chen@intel.com>, kprateek.nayak@amd.com
Cc: srikar@linux.ibm.com, venkat88@linux.ibm.com,
	maddy@linux.ibm.com, riteshh@linux.ibm.com, chleroy@kernel.org,
	tim.c.chen@linux.intel.com, peterz@infradead.org,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-sched@vger.kernel.org
Subject: Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
Date: Wed, 27 May 2026 12:31:14 +0530	[thread overview]
Message-ID: <d45877d5-86f1-4581-b3e0-59ceb722de3a@linux.ibm.com> (raw)
In-Reply-To: <20260526140856.139657-1-yu.c.chen@intel.com>

Hi Chen, Prateek.

I got back to work today, sorry for delay.
I am trying to go through the mails.
Apologies in case i have missed any bits.

On 5/26/26 7:38 PM, Chen Yu wrote:
> Hi Prateek,
> 
> On Tue, 26 May 2026 11:23:59 +0530, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>> Hello Srikar,
>>
>> On 5/26/2026 10:28 AM, Srikar Dronamraju wrote:
>>> L2 Cache reported here is for SMT8 Core aka CACHE domain.
>>
>> Apart for the scheduler, nothing in tree currently cares about
>> cpu_coregroup_mask() except for drivers/base/arch_topology.c but
>> Power doesn't select GENERIC_ARCH_TOPOLOGY.
>>
>> Why can't Power have an internal mask for MC domain (tl_mc_mask) and
>> the scheduler can use cpu_coregroup_mask() for the actual LLc? (The L2
>> mask in this case.)

This seems wrong. there is no notion that coregroup_mask
(MC domain) has to point at LLC domain.

For example, on Shared LPAR, there is no MC domain and LLC is at SMT core level.
In that case coregroup_mask has point at SMT mask is wrong.

If we need a mask to point to the LLC mask which arch has to return, then we would
need a new api say cpu_llc_mask ? that can point accordingly.

I don't like mixing MC domain and LLC into one bit.

>>
>> Power anyways adds its own topology via set_sched_topology() so the
>> default_topology from kernel/sched/topology.c remains unused.
>>
>> ...
>>
>>> Shouldnt cache-aware scheduling be worried about cpuset partitions too.
>>> If a cpuset has subset of LLC cores, then we should Scheduler assume it can
>>> control complete LLC?
>>
>> Well, the scheduling takes care of partitions and the cache aware
>> scheduling bits take care of looking at the full system perspective
>> for stats aggregation and pointing to a particular LLc.
>>
>> We don't compare llc_id across cpusets so we keeping one unique llc_id
>> per H/W LLC instance is feasible and it enables us to keep llc_id space
>> limited for optimizing cache-aware scheduling.
>>
>> Now if we have threads of same process across partitions, we'll
>> still aggregate the utilization numbers across the full LLC but
>> the load balancers at individual partitions will make a call on
>> the aggregation.
>>
>> -- 
>> Thanks and Regards,
>> Prateek
>>
>>
> 
> I suppose what you suggested looks like below:
> 
> powerpc/smp: make cpu_coregroup_mask() return the LLC
> 
> On pSeries shared LPARs(or coregroup_enabled is false on
> Power9 and earlier) the hemisphere map is not allocated, so
> build_sched_domains() dereferences a NULL cpumask and crashes.
> 
> The generic scheduler expects cpu_coregroup_mask() to span the LLC.
> On powerpc the LLC is the L2. Return cpu_l2_cache_mask() instead of
> the hemisphere map. Use a coregroup_map() helper for the in-file
> hemisphere users, and a powerpc_tl_mc_mask() wrapper for the MC
> sched-domain level.
> 
> Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>   arch/powerpc/kernel/smp.c | 35 +++++++++++++++++++++++------------
>   1 file changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1040,11 +1040,22 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
>   }
>   #endif
>   
> +static inline struct cpumask *coregroup_map(int cpu)
> +{
> +	return per_cpu(cpu_coregroup_map, cpu);
> +}
> +
>   struct cpumask *cpu_coregroup_mask(int cpu)
>   {
> -	return per_cpu(cpu_coregroup_map, cpu);
> +	return cpu_l2_cache_mask(cpu);
> +}

This looks wrong to me too. In different hardware topologies
there maybe distinction between coregroup and l2 mask.

Let me go through the code and see if there is better way.

> +
> +static const struct cpumask *
> +powerpc_tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
> +{
> +	return coregroup_map(cpu);
>   }
>   
>   static bool has_coregroup_support(void)
>   {
>   	if (is_shared_processor())
> @@ -1155,7 +1166,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>   	cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
>   
>   	if (has_coregroup_support())
> -		cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
> +		cpumask_set_cpu(boot_cpuid, coregroup_map(boot_cpuid));
>   
>   	init_big_cores();
>   	if (has_big_cores) {
> @@ -1520,8 +1531,8 @@ static void remove_cpu_from_masks(int cpu)
>   		set_cpus_unrelated(cpu, i, cpu_core_mask);
>   
>   	if (has_coregroup_support()) {
> -		for_each_cpu(i, cpu_coregroup_mask(cpu))
> -			set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
> +		for_each_cpu(i, coregroup_map(cpu))
> +			set_cpus_unrelated(cpu, i, coregroup_map);
>   	}
>   }
>   #endif
> @@ -1553,7 +1564,7 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
>   	if (!*mask) {
>   		/* Assume only siblings are part of this CPU's coregroup */
>   		for_each_cpu(i, submask_fn(cpu))
> -			set_cpus_related(cpu, i, cpu_coregroup_mask);
> +			set_cpus_related(cpu, i, coregroup_map);
>   
>   		return;
>   	}
> @@ -1561,18 +1572,18 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
>   	cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
>   
>   	/* Update coregroup mask with all the CPUs that are part of submask */
> -	or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
> +	or_cpumasks_related(cpu, cpu, submask_fn, coregroup_map);
>   
>   	/* Skip all CPUs already part of coregroup mask */
> -	cpumask_andnot(*mask, *mask, cpu_coregroup_mask(cpu));
> +	cpumask_andnot(*mask, *mask, coregroup_map(cpu));
>   
>   	for_each_cpu(i, *mask) {
>   		/* Skip all CPUs not part of this coregroup */
>   		if (coregroup_id == cpu_to_coregroup_id(i)) {
> -			or_cpumasks_related(cpu, i, submask_fn, cpu_coregroup_mask);
> +			or_cpumasks_related(cpu, i, submask_fn, coregroup_map);
>   			cpumask_andnot(*mask, *mask, submask_fn(i));
>   		} else {
> -			cpumask_andnot(*mask, *mask, cpu_coregroup_mask(i));
> +			cpumask_andnot(*mask, *mask, coregroup_map(i));
>   		}
>   	}
>   }
> @@ -1733,7 +1744,7 @@ static void __init build_sched_topology(void)
>   
>   	if (has_coregroup_support()) {
>   		powerpc_topology[i++] =
> -			SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);

I would prefer not do this rename. having tl_mc_mask helps to find the usage across
the codebase.

> +			SDTL_INIT(powerpc_tl_mc_mask, powerpc_shared_proc_flags, MC);
>   	}
>   
>   	powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, powerpc_shared_proc_flags, PKG);

next prev parent reply	other threads:[~2026-05-27  7:01 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-25 14:07 [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
2026-05-25 15:35 ` Chen, Yu C
2026-05-25 16:16   ` K Prateek Nayak
2026-05-26  3:14     ` Chen, Yu C
2026-05-26  3:14   ` Srikar Dronamraju
2026-05-26  4:08     ` Chen, Yu C
2026-05-26  4:58       ` Srikar Dronamraju
2026-05-26  5:53         ` K Prateek Nayak
2026-05-26 14:08           ` [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask Chen Yu
2026-05-27  7:01             ` Shrikanth Hegde [this message]
2026-05-27 16:05               ` Chen, Yu C
2026-05-27 18:07                 ` Shrikanth Hegde
2026-05-28  4:58                   ` Shrikanth Hegde
2026-05-28  9:12                     ` Chen, Yu C
2026-05-28 10:26                       ` Shrikanth Hegde
2026-05-28 15:54                       ` Srikar Dronamraju
2026-05-28 15:58                   ` Srikar Dronamraju
2026-05-27 16:30               ` K Prateek Nayak
2026-05-26  5:24       ` [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
2026-05-27  7:05         ` Shrikanth Hegde
2026-05-28 16:01           ` Srikar Dronamraju
2026-05-28  6:54 ` Ritesh Harjani
2026-05-28 16:06   ` Srikar Dronamraju
2026-05-28 11:27 ` Shrikanth Hegde
2026-05-28 13:21   ` Chen, Yu C
2026-05-28 15:06   ` Ritesh Harjani
2026-05-28 15:56   ` Srikar Dronamraju
2026-05-28 16:31     ` Shrikanth Hegde
2026-05-28 16:44       ` Srikar Dronamraju
2026-05-29  3:58 ` Shrikanth Hegde
2026-05-29  6:59   ` Venkat Rao Bagalkote

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d45877d5-86f1-4581-b3e0-59ceb722de3a@linux.ibm.com \
    --to=sshegde@linux.ibm.com \
    --cc=chleroy@kernel.org \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sched@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=maddy@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riteshh@linux.ibm.com \
    --cc=srikar@linux.ibm.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=venkat88@linux.ibm.com \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox