The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* Re: [PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature
       [not found] <20260413085405.1166412-1-zengheng4@huawei.com>
@ 2026-05-14 17:06 ` James Morse
       [not found] ` <20260413085405.1166412-2-zengheng4@huawei.com>
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: James Morse @ 2026-05-14 17:06 UTC (permalink / raw)
  To: Zeng Heng, ben.horgan, Dave.Martin, tan.shaopeng, reinette.chatre,
	fenghuay, tglx, will, hpa, bp, babu.moger, dave.hansen, mingo,
	tony.luck, gshan, catalin.marinas
  Cc: linux-arm-kernel, x86, linux-kernel, wangkefeng.wang

Hi Zeng,

(beware this is the first version I've seen - arm have been silently deleting your mail,
 it looks like a problem with DKIM signatures)

On 13/04/2026 09:53, Zeng Heng wrote:
> Background
> ==========
> 
> On x86, the resctrl allows creating up to num_rmids monitoring groups
> under parent control group. However, ARM64 MPAM is currently limited by
> the PMG (Performance Monitoring Group) count, which is typically much
> smaller than the theoretical RMID limit.

The MPAM PMG limit is 255. Is that not enough?

I think the real problem is the CHI interconnect protocol is forcing people
to only have 1 bit of PMG - regardless of what the architecture says. This
isn't an MPAM problem as such - its an implementation issue.

(but we can try and work around it)


> This creates a significant
> scalability gap: users expecting fine-grained per-process or per-thread
> monitoring quickly exhaust the PMG space, even when plenty of reqPARTIDs
> remain available.

This is more about MPAM's philosophical stance that PMG extents PARTID, whereas
on x86 RMID is an independent number.

Please don't muddle these - it results in muddled patches!
If we want to try and attack both with narrowing, we should do them separately.


> The Narrow-PARTID feature, defined in the ARM MPAM architecture,
> addresses this by associating reqPARTIDs with intPARTIDs through a
> programmable many-to-one mapping. This allows the kernel to present more
> logical monitoring contexts.

I'd put this as "can be abused to avoid this problem"! We still have a problem with
controls that don't alias and need to be removed from MSC that don't support narrowing.
This isn't what the feature was designed for - but it is a really cool trick, it works
for some real platforms, and solves a problem seen in user-space.

However - throughout this series you seem to be discarding all the control-group support
for a monitoring-only setup that allocates intPARTID for everything. This might work for
your use-case on your platform, but it doesn't generalise to platforms without narrowing
or where multiple control-groups are needed.


> Design Overview
> ===============
> 
> The implementation extends the RMID encoding to carry reqPARTID
> information:
> 
>   RMID = reqPARTID * NUM_PMG + PMG
> 
> In this patchset, a monitoring group is uniquely identified by the
> combination of reqPARTID and PMG. The closid is represented by intPARTID,
> which is exactly the original PARTID.

The way I think of this is 'RMID' bits being spilled into PARTID. This
means each control group has a set of PARTID. For MSC using narrowing,
CLOSID would be the intPARTID value. But as you note, we need to support
mismatches:


> For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
> driver exposes the full reqPARTID range directly. For heterogeneous
> systems where some MSCs lack Narrow-PARTID support, the driver utilizes
> PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring
> capability. The sole exception is when any type of MSCs lack Narrow-PARTID
> support, their percentage-based control mechanism prevents the use of
> PARTIDs as reqPARTIDs.

It'd be good to have some discussion about what the interface between the
mpam_devices code and any other user (like resctrl) should be.

As a hypothetical system to think about:
  64 PARTID at the L3, which support CPOR and CCAP
  64 PARTID and narrowing to 16 at the SLC, which supoprts CPOR
  64 PARTID and narrowing to 32 at the memory-controller, which support MBWU_MAX

I think whether using intPARTID is a benefit needs to be user-space policy.
You've likely got a platform where that choice is obvious - but it is a
trade-off as you lose the non-aliasing controls. In the example above, using
narrowing on this system means losing the CCAP controls on L3 as they don't alias [*].
Where its a policy, its likely to be one policy for resctrl, and another for any other
user.
We can get the resctrl glue code to turn it on unconditionally if there is no trade off,
I think that means: no non-aliasing controls in any class that doesn't support narrowing
- including 'unknown'. (we couldn't add them to resctrl in the future if you already chose
to enable this).

As for the interface with mpam_devices:
I think this means the resctrl glue code needs to be able to discover which
classes support intPARTID, and how many controls they actually have. From there
it can apply to policy to determine whether its better to support fewer features
in resctrl to get more RMID. (the alternative is always to ignore the MSC with
narrowing - narrowing lets hardware lie about the features it supports).

Currently the resctrl glue code has to program a configuration for two PARTID
when CDP is being hidden on the MB resource. This is ugly and fragile. I'd like
to explore generalising it as this narrowing stuff will also need to apply a
configuration to a set of PARTID when that MSC doesn't support narrowing.
In the example above, we'd need to discard the CCAP controls and write the same
CPOR bitmap to each PARTID that is mapped together by narrowing.


I think this means the resctrl glue code will need to be able to write a configuration
to controls using the full partid_max range as it does today. But also be able to set
the narrowing mapping on classes that support it.
For the monitors, the resctrl glue code will need to allocate and configure a set of
monitors, and read and sum them. This will be regardless of whether narrowing is
supported.

I think this means allocating a table of CLOSID to PARTID(s). the intPARTID would
always match the CLOSID. Monitors and non-narrowing MSC would need to walk the list.
I'm hoping we can make CDP a subset of this problem.
Some clever arithmetic may save allocating memory for a table - but if we change resctrl
to do this dynamically, the numbers become arbitrary forcing it to be a table.
It might also be possible to support moving monitor-groups between control groups with
the table driven approach. (see what you think on how complex it ends up ...)

I'd like to keep that grouping static for now, the table needs creating at setup time,
(+/- CDP), to avoid problems like you've found with CDP. This means the intpartid mappings
can be written once at setup time.

I'd like to avoid exposing user ABI to control this until we get it working, then we can
talk about whether to try making the grouping dynamically managed by resctrl. (there were
some proposals in that area - but I can't find them on lore).
If there are platforms were its certainly not a trade-off, we can enable it
unconditionally - but I'm wary of this being "what we care about now", requiring user-abi
to enable features that were detectable.
e.g. we ignore an unknown MSC, and add a resctrl schema for it later - only we can't
expose it if we were using narrowing. Now its a trade-off.


> Capability Improvements
> =======================
> 
> --------------------------------------------------------------------------
> The maximum        |  Sub-monitoring groups            | System-wide
> number of          |  under a control group            | monitoring groups
> --------------------------------------------------------------------------
> Without reqPARTID  |  PMG                              | intPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID          |                                   |
> static allocation  | (reqPARTID // intPARTID) * PMG    | reqPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID          |                                   |
> dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
> --------------------------------------------------------------------------
> 
> Note: The number of intPARTIDs can be capped via the boot parameter
> mpam.intpartid_max. Under MPAM, reqPARTID count is always greater than
> or equal to intPARTID count.
> 
> Series Structure
> ================
> 
> Patch 1: Fix pre-existing out-of-range PARTID issue between mount sessions.
> Patches 2-6: Implement static reqPARTID allocation.
> Patches 7-10: Implement dynamic reqPARTID allocation.

I've had  a hard time following this series. You dive in with invasive changes, then
unbreak things in later patches.

Please added the needed infrastructure in mpam_devices.c first. This should be free of
resctrl-isms, and 'only' needs reviewing against the architecture.

Then add the resctrl glue code stuff. That needs to comply with what resctrl expects.

I think the cleanest way to think about this is to break the mapping between CLOSID and
PARTID. We're effectively moving bits of RMID out of PMG into PARTID. Adding helpers
to explicitly do this early in those patches will make your changes clearer.
Please avoid spraying the narrowing terms for things everywhere.


Thanks,

James


[*] It's terminology from discussing this with Dave, just in case a summary is needed:
  aliasing controls are like CPOR where two different PARTID with the same bitmap
  compete for the same resource. If you give them each the same 50% of the portions,
  they can't exceed that together.
  non-aliasing controls are like CCAP where to different PARTID with the same fraction
  compete for different resources. If you give them each 50% of the capacity, it adds
  up to 100%. You can't represent 'the same' 50% using these controls.

  Narrowing papers over this problem with its remapping table, which gives you a 'same'
  property. For MSC that have controls of that shape - and where more monitors are
  desired - we'd have to drop the controls.

  I think "more monitors are desired" is going to need to be user-space policy. But
  we can come back to how to do that later.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v8 next 01/10] fs/resctrl: Fix MPAM Partid parsing errors by preserving CDP state during umount
       [not found] ` <20260413085405.1166412-2-zengheng4@huawei.com>
@ 2026-05-14 17:06   ` James Morse
  0 siblings, 0 replies; 5+ messages in thread
From: James Morse @ 2026-05-14 17:06 UTC (permalink / raw)
  To: Zeng Heng, ben.horgan, Dave.Martin, tan.shaopeng, reinette.chatre,
	fenghuay, tglx, will, hpa, bp, babu.moger, dave.hansen, mingo,
	tony.luck, gshan, catalin.marinas
  Cc: linux-arm-kernel, x86, linux-kernel, wangkefeng.wang

Hi Zeng,

I think this should be a separate patch as its fixing a problem not adding a feature. It's
not actually relevant to the rest of the series.

On 13/04/2026 09:53, Zeng Heng wrote:
> This patch fixes a pre-existing issue in the resctrl filesystem teardown
> sequence where premature clearing of cdp_enabled could lead to MPAM Partid
> parsing errors.

resctrl changes need to go via tip, which has a bunch of rules about commit messages,
see Documentation/process/maintainer-tip.rst

You end up with a structure describing the current state, e.g:
| When resctrl is umounted it disables CDP,

what the problem is, e.g:
| CLOSID remain in the limbo list, and the mbm monitors continue to be read
| after umount. MPAM changes the meaning of CLOSID when CDP is enabled/disabled,
| resulting in out of bounds accesses.

Then, what you do about it, here you are:
| Throwing away the limbo list on umount.

(I don't suggest you take this wording - its just an example)

"this patch" is a phrase to avoid, acronyms like CLOSID need capitalising, etc.


> The closid to partid conversion logic inherently depends on the global
> cdp_enabled state. However, rdt_disable_ctx() clears this flag early in
> the umount path, while free_rmid() operations will reference after that.
> This creates a window where partid parsing operates with inconsistent CDP
> state, potentially makes monitor reads with wrong partid mapping.
> 
> Additionally, rmid_entry remaining in limbo between mount sessions may
> trigger potential partid out-of-range errors, leading to MPAM fault
> interrupts and subsequent MPAM disablement.

Can you give more details on this. I assume its going from CDP-disable to
enabled, means MPAM doubles the CLOSID from the stale limbo list, making it
out of range.


> Reorder rdt_kill_sb() to delay rdt_disable_ctx() until after
> rmdir_all_sub() and resctrl_fs_teardown() complete. This ensures
> all rmid-related operations finish with correct CDP state.


> Introduce rdt_flush_limbo() to flush and cancel limbo work before the
> filesystem teardown completes.

So, discard the state in the hope we don't need it again.
What happens if the filesystem is mounted again quickly afterwards?
Surely we get noisy bandwidth results for ~minutes afterwards?


> An alternative approach would be to cancel limbo work on umount

Sounds like a move in the right direction - having bits of resctrl still
taking CPU time when its not in use is surprising.

I'd love to eventually remove the limbo worker and have the RMID alloc code
search the limbo list for a clean RMID when a control/monitor group is created.
By deferring the work as late as possible, we do less work overall.


> and restart it on remount with remaked bitmap.
> However, this would require substantial changes in the resctrl layer to
> handle CDP state transitions across mount sessions,

This would be necessary if the limbo timer was stopped on umount too.
It also covers cases where you kexec and re-mount resctrl.

I think this is a good idea. I agree its more work.


> which is beyond the
> scope of the reqpartid feature work this patchset focuses on.

Was it a mistake to include it in this series then?


> The current
> fix addresses the immediate correctness issue with minimal churn.

I'm not a fan of papering over problems in resctrl. Could we do it properly
by rebuilding the limbo list at mount time as you suggested above?



Thanks,

James

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v8 next 02/10] arm_mpam: Add intPARTID and reqPARTID support for Narrow-PARTID feature
       [not found] ` <20260413085405.1166412-3-zengheng4@huawei.com>
@ 2026-05-14 17:06   ` James Morse
  0 siblings, 0 replies; 5+ messages in thread
From: James Morse @ 2026-05-14 17:06 UTC (permalink / raw)
  To: Zeng Heng, ben.horgan, Dave.Martin, tan.shaopeng, reinette.chatre,
	fenghuay, tglx, will, hpa, bp, babu.moger, dave.hansen, mingo,
	tony.luck, gshan, catalin.marinas
  Cc: linux-arm-kernel, x86, linux-kernel, wangkefeng.wang

Hi Zeng,

On 13/04/2026 09:53, Zeng Heng wrote:
> Introduce Narrow-PARTID (partid_nrw) feature support, which enables
> many-to-one mapping of request PARTIDs (reqPARTID) to internal PARTIDs
> (intPARTID). This expands monitoring capability by allowing a single
> control group to track more task types through multiple reqPARTIDs
> per intPARTID, bypassing the PMG limit in some extent.
> 
> intPARTID: Internal PARTID used for control group configuration.
> Configurations are synchronized to all reqPARTIDs mapped to the same
> intPARTID. Count is indicated by MPAMF_PARTID_NRW_IDR.INTPARTID_MAX, or
> defaults to PARTID count if Narrow-PARTID is unsupported.
> 
> reqPARTID: Request PARTID used to expand monitoring groups. Enables
> a single control group to monitor more task types by multiple reqPARTIDs
> within one intPARTID, overcoming the PMG count limitation.
> 
> For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
> driver exposes the full reqPARTID range directly. For heterogeneous
> systems where some MSCs lack Narrow-PARTID support, the driver utilizes
> PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring
> capacity.
> 
> So, the numbers of control group and monitoring group are calculated as:
> 
>   n = min(intPARTID, PARTID)  /* the number of control groups */
>   l = min(reqPARTID, PARTID)  /* the number of monitoring groups */
>   m = l // n                  /* monitoring groups per control group */
> 
> Where:
> 
>   intPARTID: intPARTIDs on Narrow-PARTID-capable MSCs
>   reqPARTID: reqPARTIDs on Narrow-PARTID-capable MSCs
>   PARTID:    PARTIDs on non-Narrow-PARTID-capable MSCs
> 
> Example: L3 cache (256 PARTIDs, without Narrow-PARTID feature) +
>          MATA (32 intPARTIDs, 256 reqPARTIDs):
> 
>   n = min( 32, 256) =  32 intPARTIDs
>   l = min(256, 256) = 256 reqPARTIDs
>   m = 256 / 32 = 8 reqPARTIDs per intPARTID
> 
> Implementation notes:
>   * Handle mixed MSC systems (some support Narrow-PARTID, some don't) by
>     taking minimum number of intPARTIDs across all MSCs.
>   * resctrl_arch_get_num_closid() now returns the number of intPARTIDs
>     (was PARTID).

What you're doing here is making intPARTID the fundamental unit in MPAM. I
don't think we should do this as its not true in the architecture: narrowing
is a single, optional feature. We have platforms that don't support narrowing
at all - so having to think in terms of "what is the intPARTID limit on this
platform that doesn't have the feature" is confusing.
Narrowing doesn't affect the monitoring, so you can't just string-replace the
driver.

The resctrl glue code is going to have to know about narrowing as it must
either duplicate the control values for aliasing PARTID, or remap them using
narrowing.

I'd prefer it if the mpam_devices code exposed an API to make use of narrowing
that makes sense given the MPAM architecture. (e.g. narrowing is optional!)
Whetever resctrl needs should then be built on top of that.


Currently the MAX PARTID/PMG are dealt with separately as we need a global value
at the end. It was done separately as it could be a patch on its own, to try and
keep each patch reviewable.
But it should probably have been dealt with the same way we deal with all MSC
features - stash them in struct mpam_msc_ris at hw_probe time, and combine them
up to the class level handling different values with __props_mismatch().
The global state that includes the requestors can be created after that point.

I think we should do this now to add intpartid_max to struct mpam_props so that the
resctrl glue code can find the intpartid_max per class. I don't think it makes sense as
a global property. (we should move partid_max and pmg_max at the same time)

... I think moving partid_max would just be cleanup. The resctrl glue code needs to
know the maximum PARTID for monitoring, but I think this would always be the global
PARTID max value.


> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 41b14344b16f..cf067bf5092e 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -63,6 +63,7 @@ static DEFINE_MUTEX(mpam_cpuhp_state_lock);
>   * Generating traffic outside this range will result in screaming interrupts.
>   */
>  u16 mpam_partid_max;
> +u16 mpam_intpartid_max;
>  u8 mpam_pmg_max;
>  static bool partid_max_init, partid_max_published;
>  static DEFINE_SPINLOCK(partid_max_lock);

The global properties are supposed to mean "if you generate any traffic outside
this range, you'll get an out of range error causing mpam_disable()".

This doesn't hold for intPARTID.

In my hypothetical system from the cover letter: {
  64 PARTID at the L3
  64 PARTID and narrowing to 16 at the SLC
  64 PARTID and narrowing to 32 at the memory-controller

  The resctrl glue code could ignore the SLC if it wanted to use 32 PARTID, and just
  duplicate the aliasing controls at L3. (and remove the non-aliasing controls)
}


> @@ -2743,9 +2749,13 @@ static void mpam_enable_once(void)
>  	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
>  				      "mpam:online");
>  
> -	/* Use printk() to avoid the pr_fmt adding the function name. */
> -	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
> -	       mpam_partid_max + 1, mpam_pmg_max + 1);
> +	if (mpam_partid_max == mpam_intpartid_max)
> +		/* Use printk() to avoid the pr_fmt adding the function name. */
> +		printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
> +		       mpam_partid_max + 1, mpam_pmg_max + 1);
> +	else
> +		printk(KERN_INFO "MPAM enabled with %u reqPARTIDs, %u intPARTIDs and %u PMGs\n",
> +		       mpam_partid_max + 1, mpam_intpartid_max + 1, mpam_pmg_max + 1);

intPARTID is not a global property. It's also an optional feature, so its problematic
to print this on platforms that don't have the feature.


>  }
>  
>  static void mpam_reset_component_locked(struct mpam_component *comp)
>  
>  u32 resctrl_arch_system_num_rmid_idx(void)


Thanks,

James

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v8 next 03/10] arm_mpam: Disable reqPARTID expansion when Narrow-PARTID is unavailable
       [not found] ` <20260413085405.1166412-4-zengheng4@huawei.com>
@ 2026-05-14 17:06   ` James Morse
  0 siblings, 0 replies; 5+ messages in thread
From: James Morse @ 2026-05-14 17:06 UTC (permalink / raw)
  To: Zeng Heng, ben.horgan, Dave.Martin, tan.shaopeng, reinette.chatre,
	fenghuay, tglx, will, hpa, bp, babu.moger, dave.hansen, mingo,
	tony.luck, gshan, catalin.marinas
  Cc: linux-arm-kernel, x86, linux-kernel, wangkefeng.wang

Hi Zeng,

On 13/04/2026 09:53, Zeng Heng wrote:
> MPAM supports heterogeneous systems where some type of MSCs may implement
> Narrow-PARTID while others do not. However, when an MSC uses
> percentage-based throttling (non-bitmap partition control) and lacks
> Narrow-PARTID support, resctrl cannot correctly apply control group
> configurations across multiple PARTIDs.
> 
> To enable free assignment of multiple reqPARTIDs to resource control
> groups, all MSCs used by resctrl must either: Implement Narrow-PARTID,
> allowing explicit PARTID remapping, or only have stateless resource
> controls (non-percentage-based), such that splitting a control group
> across multiple PARTIDs does not affect behavior.

I prefer Dave's terminology for this: aliasing and non-aliasing. It implies
there are two controls, which stateless does not.


> The detection occurs at initialization time on the first call to
> get_num_reqpartid() from update_rmid_limits(). This call is guaranteed
> to occur after mpam_resctrl_pick_{mba,caches}() have set up the
> resource classes, ensuring the necessary properties are available
> for the Narrow-PARTID capability check.
> 
> When an MSC with percentage-based control lacks Narrow-PARTID support,
> get_num_reqpartid() falls back to returning the number of intPARTIDs,
> effectively disabling the reqPARTID expansion for monitoring groups.

No MSC has percentage based controls - that's an x86ism. The MSCs have
fixed point fractions, bitmaps or a cost/weight.


I think you're thinking about this the wrong way up - we should only enable
this on a small number of platforms that don't have any controls we'd have to discard.
(hopefully yours is such a platform!)

I don't think this should be added to resctrl_arch_system_num_rmid_idx(). Please make
this decision for resctrl at mpam_resctrl_setup() time.


> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 5f4364c8101a..56859f354efa 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -257,9 +257,50 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *ignored)
>  	return mpam_intpartid_max + 1;
>  }
>  
> +/*
> + * Determine the effective number of PARTIDs available for resctrl.
> + *
> + * This function performs a one-time check to determine if Narrow-PARTID
> + * can be used. It must be called after mpam_resctrl_pick_{mba,caches}()
> + * have initialized the resource classes, as class properties are used
> + * to detect Narrow-PARTID support.

> + * The first call occurs in update_rmid_limits(), ensuring the
> + * prerequisite initialization is complete.

This is fragile to changes in the order resctrl makes these calls. We need these
properties to be fixed before we call resctrl_init().

(yes - I think CDP is fragile too!)


> + */
> +static u32 get_num_reqpartid(void)
> +{
> +	struct mpam_resctrl_res *res;
> +	struct mpam_props *cprops;
> +	static bool first = true;
> +	int rid;
> +
> +	if (first) {
> +		for_each_mpam_resctrl_control(res, rid) {
> +			if (!res->class)
> +				continue;
> +
> +			cprops = &res->class->props;
> +			if (mpam_has_feature(mpam_feat_partid_nrw, cprops))
> +				continue;


> +			if (mpam_has_feature(mpam_feat_mbw_max, cprops) ||
> +			    mpam_has_feature(mpam_feat_mbw_min, cprops) ||
> +			    mpam_has_feature(mpam_feat_cmax_cmax, cprops) ||
> +			    mpam_has_feature(mpam_feat_cmax_cmin, cprops)) {

Please make this a helper in mpam_internal.h with 'controls' and 'aliasing' in its name.
(maybe has_aliasing_controls()).

What about the priority for PRI and the proportional-stride?

I don't think proportional-stride aliases properly: if I have groups with stride 1 and 2,
I can't add a second '2' without decreasing the first groups stride from 1/3 to 1/5. If I
halve the second groups, they each get half the bandwidth instead of sharing it.

Can you check whether the priority for PRI aliases?


> +				mpam_partid_max = mpam_intpartid_max;
> +				break;
> +			}
> +		}
> +	}
> +
> +	first = false;
> +	return mpam_partid_max + 1;
> +}
> +
>  u32 resctrl_arch_system_num_rmid_idx(void)
>  {
> -	return (mpam_pmg_max + 1) * (mpam_partid_max + 1);
> +	return (mpam_pmg_max + 1) * get_num_reqpartid();
>  }


Thanks,

James

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v8 next 04/10] arm_mpam: Refactor rmid to reqPARTID/PMG mapping
       [not found] ` <20260413085405.1166412-5-zengheng4@huawei.com>
@ 2026-05-14 17:07   ` James Morse
  0 siblings, 0 replies; 5+ messages in thread
From: James Morse @ 2026-05-14 17:07 UTC (permalink / raw)
  To: Zeng Heng, ben.horgan, Dave.Martin, tan.shaopeng, reinette.chatre,
	fenghuay, tglx, will, hpa, bp, babu.moger, dave.hansen, mingo,
	tony.luck, gshan, catalin.marinas
  Cc: linux-arm-kernel, x86, linux-kernel, wangkefeng.wang

Hi Zeng,

On 13/04/2026 09:53, Zeng Heng wrote:
> The Narrow-PARTID feature allows the MPAM driver to statically or
> dynamically allocate request PARTIDs (reqPARTIDs) to internal

(I think we should leave anything dynamic here for the further future,
 lets get static partitioning working first)


> PARTIDs (intPARTIDs). This enables expanding the number of monitoring
> groups beyond the hardware PMG limit.

> For systems with mixed MSCs (Memory System Components), MSCs that do
> not support Narrow-PARTID use PARTIDs exceeding the minimum number of
> intPARTIDs as reqPARTIDs to expand monitoring groups.
> 
> Expand RMID to include reqPARTID information:
> 
>   RMID = reqPARTID * NUM_PMG + PMG
> 
> To maintain compatibility with the existing resctrl layer, reqPARTIDs
> are allocated statically with a linear mapping to intPARTIDs via
> req2intpartid().
> 
> Mapping relationships (n = intPARTID count, m = reqPARTIDs per intPARTID):
> 
> P - Partition group
> M - Monitoring group
> 
> Group  closid rmid.reqPARTID  MSCs w/ Narrow-PARTID  MSCs w/o Narrow-PARTID
> P1     0                      intPARTID_1            PARTID_1
> M1_1   0      0               ├── reqPARTID_1_1      ├── PARTID_1
> M1_2   0      0+n             ├── reqPARTID_1_2      ├── PARTID_1_2
> M1_3   0      0+n*2           ├── reqPARTID_1_3      ├── PARTID_1_3
> ...                           ├── ...                ├── ...
> M1_m   0      0+n*(m-1)       └── reqPARTID_1_m      └── PARTID_1_m
> 
> P2     1                      intPARTID_2            PARTID_2
> M2_1   1      1               ├── reqPARTID_2_1      ├── PARTID_2
> M2_2   1      1+n             ├── reqPARTID_2_2      ├── PARTID_2_2
> M2_3   1      1+n*2           ├── reqPARTID_2_3      ├── PARTID_2_3
> ...                           ├── ...                ├── ...
> M2_m   1      1+n*(m-1)       └── reqPARTID_2_m      └── PARTID_2_m
> 
> Pn     n-1                    intPARTID_n            PARTID_n
> Mn_1   n-1    n-1             ├── reqPARTID_n_1      ├── PARTID_n
> Mn_2   n-1    n-1+n           ├── reqPARTID_n_2      ├── PARTID_n_2
> Mn_3   n-1    n-1+n*2         ├── reqPARTID_n_3      ├── PARTID_n_3
> ...                           ├── ...                ├── ...
> Mn_m   n-1    n*m-1           └── reqPARTID_n_m      └── PARTID_n_m

This table doesn't appear to say anything - you've got 'req' in the column
with narrowing .. that's about it.

Having an example is good, but I think it needs actual example values.
e.g.

my_control_group has CLOSID 0x8 using RMID 0.
It has two monitor groups with RMID 1 and 2.

mpam_partid_closid_shift is set to 1.

my_control_group uses two PARTID 0x10 and 0x11, both with PMG 0.
One monitor group uses PARTID 0x10 and PMG 1, the other uses PARTID 0x11
and PMG 1.


> Refactor the glue layer between resctrl abstractions (rmid) and MPAM
> hardware registers (reqPARTID/PMG) to support Narrow-PARTID. The resctrl
> layer uses rmid2reqpartid() and rmid2pmg() to extract components from
> rmid. The closid-to-intPARTID translation remains unchanged via
> resctrl_get_config_index().
> 
> Since Narrow-PARTID is a monitoring enhancement,

Please don't imply this is what the narrowing feature is for! We're abusing it.
The evidence its abuse is this breaks all the non-aliasing controls, meaning we
have to get rid of them.


> reqPARTID is only
> used in monitoring paths while configuration paths maintain the original
> semantics of closid.

I can't work out what this means. 'reqPARTID' is a term the architecture only uses
in the context of narrowing, which doesn't apply to monitors. Controls need to either
use the intPARTID or the set of PARTID. Monitors always need to use the set of PARTID.

I don't think we should use the terms intPARTID or reqPARTID everywhere - the spec
doesn't, and this thing is an optional feature.


> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> index 70d396e7b6da..40d60b0ab0fd 100644
> --- a/arch/arm64/include/asm/mpam.h
> +++ b/arch/arm64/include/asm/mpam.h

It shouldn't be necesary to touch the arch code's header file!


> @@ -63,6 +63,15 @@ static inline void mpam_set_task_partid_pmg(struct task_struct *tsk,
>  	WRITE_ONCE(task_thread_info(tsk)->mpam_partid_pmg, regval);
>  }
>  
> +u16 req2intpartid(u16 reqpartid);

Please don't add a dependency on something in a driver in the context-switch path.
This stuff needs to be standalone.


> +static inline u16 mpam_get_regval_partid(u64 regval)
> +{
> +	u16 reqpartid = (regval & MPAMSM_EL1_PARTID_D_MASK) >> MPAMSM_EL1_PARTID_D_SHIFT;
> +
> +	return req2intpartid(reqpartid);
> +}
> +
>  static inline void mpam_thread_switch(struct task_struct *tsk)
>  {
>  	u64 oldregval;
> @@ -72,7 +81,8 @@ static inline void mpam_thread_switch(struct task_struct *tsk)
>  	if (!static_branch_likely(&mpam_enabled))
>  		return;
>  
> -	if (regval == READ_ONCE(arm64_mpam_global_default))
> +	if (regval == READ_ONCE(arm64_mpam_global_default) ||
> +	   !mpam_get_regval_partid(regval))
>  		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
>  
>  	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));

Why can't the resctrl glue code mess with the PARTID value for task using the existing
arch<->MPAM API? resctrl_arch_set_closid_rmid() already provides arbitrary values.
Please fix them up before they get stuffed in struct thread_info.


> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 56859f354efa..9d0a7a4dffd1 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -303,15 +303,60 @@ u32 resctrl_arch_system_num_rmid_idx(void)
>  	return (mpam_pmg_max + 1) * get_num_reqpartid();
>  }
>  
> +static u16 rmid2reqpartid(u32 rmid)

Surely you need the CLOSID as well to generate something you are going to use as a PARTID.
How can this work if you have multiple control groups?


> +{
> +	rmid /= (mpam_pmg_max + 1);
> +
> +	if (cdp_enabled)
> +		return resctrl_get_config_index(rmid, CDP_DATA);
> +
> +	return resctrl_get_config_index(rmid, CDP_NONE);
> +}
> +
> +static u8 rmid2pmg(u32 rmid)
> +{
> +	return rmid % (mpam_pmg_max + 1);
> +}
> +
> +u16 req2intpartid(u16 reqpartid)
> +{
> +	return reqpartid % (mpam_intpartid_max + 1);
> +}

As before, I'd prefer we don't call everything intPARTID or reqPARTID - not every
platform has narrowing. This makes it harder to think about the simpler platforms.

'mpam_intpartid_max' shouldn't exist as a global property from the mpam_devices
code - but it should be a mask chosen by the resctrl glue code at setup time.

Something like this psuedocode:
--------%<--------
u64 mpam_partid_closid_shift;

static partid_set_idx_t convert_closid_any_rmid_cdp_safe(struct mpam_resctrl_res *res,
							 enum resctrl_conf_type cdp_type,
							 u32 closid)
{
	int i;
	u16 partid_base;
	partid_set_idx_t partid_set = { 0 };
	u16 num_groups = 1 << mpam_partid_closid_shift;

	if (mpam_resctrl_hide_cdp(r->rid))
		cdp_type = CDP_CODE;

	partid_base = resctrl_get_config_index(closid, cdp_type);
	partid_base <<= mpam_partid_closid_shift;

	if (res->using_narrowing) {
		u16 intpartid = closid;
		__set_bit(intpartid, partid_set);
		return partid_set;
	}

	for (i = 0; i < num_groups; i++)
		__set_bit(partid_base + i, partid_set);

	return partid_set;
}


static partid_set_idx_t mpam_resctrl_convert_closid_any_rmid(struct mpam_resctrl_res *res,
							 enum resctrl_conf_type cdp_type,
							 u32 closid)
{
	int i;
	u16 partid_base;
	partid_set_idx_t partid_set = { 0 };
	u16 num_groups = 1 << mpam_partid_closid_shift;

	if (mpam_resctrl_hide_cdp(r->rid))
		cdp_type = CDP_CODE;
	partid_set = mpam_resctrl_convert_closid_any_rmid_cdp_safe(res, cdp_type, closid);

	if (mpam_resctrl_hide_cdp(r->rid)) {
		cdp_type = CDP_DATA;
		partid_set |= mpam_resctrl_convert_closid_any_rmid_cdp_safe(res, cdp_type, closid);
	}

	return partid_set;
}
--------%<--------

This is pretending we can deal with a set of PARTID as a bitmap - which might not work
well if we have to allocate memory all over the place. The table description I put in the
cover-letter is an alternative - pointing at a pre-computed set: but that probably won't
work well for CDP. A hybrid of the two might be the something to try.

The reason I prefer this is CDP and narrowing become a subset of the same problem: dealing
with a set of PARTID. This function ends up complicated, but its callers all use it in the
same way - and can ignore CDP and the narrowing support is simple. e.g:

--------%<--------
	if (res->using_narrowing) {
		mpam_set_feature(mpam_feat_narrowing, &cfg);
--------%<--------


> +/*
> + * To avoid the reuse of rmid across multiple control groups, check
> + * the incoming closid to prevent rmid from being reallocated by
> + * resctrl_find_free_rmid().
> + *
> + * If the closid and rmid do not match upon inspection, immediately
> + * returns an invalid rmid. A valid rmid must not exceed 24 bits.
> + */

This will cause an out of range set_bit() in add_rmid_to_limbo().
This function must accept the CLOSID/RMID range it advertised to resctrl.

Can you expand on the problem here?


I suspect its because in systems doing this you've accidentally decoupled the
RMID from CLOSID, meaning CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID should be false.

We might want to do this decoupling as part of making this something that resctrl
can trigger dynamically. (e.g. rescrl_arch_grow_closid() that allocates another PARTID
behind the scenes). But this would need resctrl changes - at least to make the above
KConfig something that can be changed at runtime.

I'd prefer we get the static grouping working first, before we work out what needs
changing in resctrl to do it dynamically.


>  u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
>  {
> -	return closid * (mpam_pmg_max + 1) + rmid;
> +	u32 reqpartid = rmid2reqpartid(rmid);
> +	u32 intpartid = req2intpartid(reqpartid);
> +
> +	if (cdp_enabled)
> +		intpartid >>= 1;
> +
> +	if (closid != intpartid)
> +		return U32_MAX;

This will cause out of range accesses. This function must accept the full range
of CLOSID/RMID that were advertised to resctrl. It can't return an error.


> +	return rmid;
>  }
>  
>  void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
>  {
> -	*closid = idx / (mpam_pmg_max + 1);
> -	*rmid = idx % (mpam_pmg_max + 1);
> +	u32 reqpartid = rmid2reqpartid(idx);
> +	u32 intpartid = req2intpartid(reqpartid);
> +
> +	if (rmid)
> +		*rmid = idx;
> +	if (closid) {
> +		if (cdp_enabled)
> +			intpartid >>= 1;
> +		*closid = intpartid;
> +	}
>  }

These idx encode/decode would remain simple if they worked on a mask/shift
that was calculated at setup time. Currently that is all hidden behind
your helpers.

We shouldn't have to deal with CDP in here because the 'idx' is for resctrl's
monitoring code - which can't monitor the two halfs of CDP separately. That's an mpam-ism.


>  void resctrl_arch_sched_in(struct task_struct *tsk)
> @@ -323,21 +368,17 @@ void resctrl_arch_sched_in(struct task_struct *tsk)
>  
>  void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid)
>  {
> -	WARN_ON_ONCE(closid > U16_MAX);
> -	WARN_ON_ONCE(rmid > U8_MAX);
> +	u32 reqpartid = rmid2reqpartid(rmid);
> +	u8 pmg = rmid2pmg(rmid);
>  
> -	if (!cdp_enabled) {
> -		mpam_set_cpu_defaults(cpu, closid, closid, rmid, rmid);
> -	} else {
> +	if (!cdp_enabled)
> +		mpam_set_cpu_defaults(cpu, reqpartid, reqpartid, pmg, pmg);
> +	else
>  		/*
>  		 * When CDP is enabled, resctrl halves the closid range and we
>  		 * use odd/even partid for one closid.
>  		 */
> -		u32 partid_d = resctrl_get_config_index(closid, CDP_DATA);
> -		u32 partid_i = resctrl_get_config_index(closid, CDP_CODE);
> -
> -		mpam_set_cpu_defaults(cpu, partid_d, partid_i, rmid, rmid);
> -	}
> +		mpam_set_cpu_defaults(cpu, reqpartid, reqpartid + 1, pmg, pmg);
>  }

Please don't '+1' the parameters like this. How do we know these are the right way
round? What if resctrl_get_config_index() chagnes which way round they are?

We should try and make the code easy to review and maintain.
Optimising things like this is for the compiler.


I'd like it to be clear from this function that one of the set of PARTID is being
picked. I think this falls out of the (hidden) shifting here, but its not at all
obivious.

You completely discard the PARTID here. I don't think that works if you have multiple
control groups.

[snip]

> @@ -478,6 +518,7 @@ static int __read_mon(struct mpam_resctrl_mon *mon, struct mpam_component *mon_c
>  		      enum resctrl_conf_type cdp_type, u32 closid, u32 rmid, u64 *val)
>  {
>  	struct mon_cfg cfg;
> +	u32 reqpartid = rmid2reqpartid(rmid);
>  
>  	if (!mpam_is_enabled())
>  		return -EINVAL;
> @@ -493,8 +534,8 @@ static int __read_mon(struct mpam_resctrl_mon *mon, struct mpam_component *mon_c
>  	cfg = (struct mon_cfg) {
>  		.mon = mon_idx,
>  		.match_pmg = true,
> -		.partid = closid,
> -		.pmg = rmid,
> +		.partid = (cdp_type == CDP_CODE) ? reqpartid + 1 : reqpartid,
> +		.pmg = rmid2pmg(rmid),

Not using the CLOSID here breaks multiple control groups.

>  	};
>  
>  	return mpam_msmon_read(mon_comp, &cfg, mon_type, val);


James

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-05-14 17:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260413085405.1166412-1-zengheng4@huawei.com>
2026-05-14 17:06 ` [PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature James Morse
     [not found] ` <20260413085405.1166412-2-zengheng4@huawei.com>
2026-05-14 17:06   ` [PATCH v8 next 01/10] fs/resctrl: Fix MPAM Partid parsing errors by preserving CDP state during umount James Morse
     [not found] ` <20260413085405.1166412-3-zengheng4@huawei.com>
2026-05-14 17:06   ` [PATCH v8 next 02/10] arm_mpam: Add intPARTID and reqPARTID support for Narrow-PARTID feature James Morse
     [not found] ` <20260413085405.1166412-4-zengheng4@huawei.com>
2026-05-14 17:06   ` [PATCH v8 next 03/10] arm_mpam: Disable reqPARTID expansion when Narrow-PARTID is unavailable James Morse
     [not found] ` <20260413085405.1166412-5-zengheng4@huawei.com>
2026-05-14 17:07   ` [PATCH v8 next 04/10] arm_mpam: Refactor rmid to reqPARTID/PMG mapping James Morse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox