From: James Morse <james.morse@arm.com>
To: Zeng Heng <zengheng4@huawei.com>,
ben.horgan@arm.com, Dave.Martin@arm.com,
tan.shaopeng@jp.fujitsu.com, reinette.chatre@intel.com,
fenghuay@nvidia.com, tglx@kernel.org, will@kernel.org,
hpa@zytor.com, bp@alien8.de, babu.moger@amd.com,
dave.hansen@linux.intel.com, mingo@redhat.com,
tony.luck@intel.com, gshan@redhat.com, catalin.marinas@arm.com
Cc: linux-arm-kernel@lists.infradead.org, x86@kernel.org,
linux-kernel@vger.kernel.org, wangkefeng.wang@huawei.com
Subject: Re: [PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature
Date: Thu, 14 May 2026 18:06:04 +0100
Message-ID: <ec8bc617-9e74-4749-ab33-39d1079415cc@arm.com>
In-Reply-To: <20260413085405.1166412-1-zengheng4@huawei.com>
Hi Zeng,
(beware this is the first version I've seen - arm have been silently deleting your mail,
it looks like a problem with DKIM signatures)
On 13/04/2026 09:53, Zeng Heng wrote:
> Background
> ==========
>
> On x86, the resctrl allows creating up to num_rmids monitoring groups
> under parent control group. However, ARM64 MPAM is currently limited by
> the PMG (Performance Monitoring Group) count, which is typically much
> smaller than the theoretical RMID limit.
The MPAM PMG limit is 255. Is that not enough?
I think the real problem is the CHI interconnect protocol is forcing people
to only have 1 bit of PMG - regardless of what the architecture says. This
isn't an MPAM problem as such - it's an implementation issue.
(but we can try and work around it)
> This creates a significant
> scalability gap: users expecting fine-grained per-process or per-thread
> monitoring quickly exhaust the PMG space, even when plenty of reqPARTIDs
> remain available.
This is more about MPAM's philosophical stance that PMG extends PARTID, whereas
on x86 RMID is an independent number.
Please don't muddle these - it results in muddled patches!
If we want to try and attack both with narrowing, we should do them separately.
> The Narrow-PARTID feature, defined in the ARM MPAM architecture,
> addresses this by associating reqPARTIDs with intPARTIDs through a
> programmable many-to-one mapping. This allows the kernel to present more
> logical monitoring contexts.
I'd put this as "can be abused to avoid this problem"! We still have a problem with
controls that don't alias: they need to be removed from MSC that don't support narrowing.
This isn't what the feature was designed for - but it is a really cool trick, it works
for some real platforms, and solves a problem seen in user-space.
However - throughout this series you seem to be discarding all the control-group support
for a monitoring-only setup that allocates intPARTID for everything. This might work for
your use-case on your platform, but it doesn't generalise to platforms without narrowing
or where multiple control-groups are needed.
> Design Overview
> ===============
>
> The implementation extends the RMID encoding to carry reqPARTID
> information:
>
> RMID = reqPARTID * NUM_PMG + PMG
>
> In this patchset, a monitoring group is uniquely identified by the
> combination of reqPARTID and PMG. The closid is represented by intPARTID,
> which is exactly the original PARTID.
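For illustration, the quoted encoding can be sketched as a pair of helpers. Only
the formula RMID = reqPARTID * NUM_PMG + PMG comes from the cover letter; the
names rmid_encode(), rmid_decode() and struct rmid_parts are hypothetical and
are not taken from the series.

```c
#include <stdint.h>

/* Hypothetical container for the two halves of an RMID. */
struct rmid_parts {
	uint32_t reqpartid;
	uint32_t pmg;
};

/* RMID = reqPARTID * NUM_PMG + PMG, as stated in the cover letter. */
static inline uint32_t rmid_encode(uint32_t reqpartid, uint32_t pmg,
				   uint32_t num_pmg)
{
	return reqpartid * num_pmg + pmg;
}

/* Inverse of rmid_encode(); assumes num_pmg is non-zero. */
static inline struct rmid_parts rmid_decode(uint32_t rmid, uint32_t num_pmg)
{
	struct rmid_parts p = {
		.reqpartid = rmid / num_pmg,
		.pmg = rmid % num_pmg,
	};

	return p;
}
```

With one bit of PMG (num_pmg = 2), reqPARTID 5 and PMG 1 encode to RMID 11, and
decoding 11 gives back the same pair.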
The way I think of this is 'RMID' bits being spilled into PARTID. This
means each control group has a set of PARTID. For MSC using narrowing,
CLOSID would be the intPARTID value. But as you note, we need to support
mismatches:
> For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
> driver exposes the full reqPARTID range directly. For heterogeneous
> systems where some MSCs lack Narrow-PARTID support, the driver utilizes
> PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring
> capability. The sole exception is when any type of MSCs lack Narrow-PARTID
> support, their percentage-based control mechanism prevents the use of
> PARTIDs as reqPARTIDs.
It'd be good to have some discussion about what the interface between the
mpam_devices code and any other user (like resctrl) should be.
As a hypothetical system to think about:
64 PARTID at the L3, which supports CPOR and CCAP
64 PARTID and narrowing to 16 at the SLC, which supports CPOR
64 PARTID and narrowing to 32 at the memory-controller, which supports MBWU_MAX
I think whether using intPARTID is a benefit needs to be user-space policy.
You've likely got a platform where that choice is obvious - but it is a
trade-off as you lose the non-aliasing controls. In the example above, using
narrowing on this system means losing the CCAP controls on L3 as they don't alias [*].
Where it's a policy, it's likely to be one policy for resctrl, and another for any other
user.
We can get the resctrl glue code to turn it on unconditionally if there is no trade off,
I think that means: no non-aliasing controls in any class that doesn't support narrowing
- including 'unknown'. (we couldn't add them to resctrl in the future if you already chose
to enable this).
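A minimal sketch of that "no trade-off" test, under stated assumptions: struct
class_desc and narrowing_is_free() are hypothetical names, and treating an
unknown class without narrowing as if it had non-aliasing controls is one
reading of the rule above, not something the driver does today.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-class summary the resctrl glue code might discover. */
struct class_desc {
	bool known;            /* class type is understood by the glue code */
	bool has_narrowing;    /* class implements PARTID narrowing */
	bool has_non_aliasing; /* class has controls that don't alias, e.g. CCAP */
};

/*
 * Narrowing costs nothing only if every class without narrowing has
 * exclusively aliasing controls. Unknown classes are assumed to have
 * non-aliasing controls, as they could not be exposed later otherwise.
 */
static bool narrowing_is_free(const struct class_desc *classes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (classes[i].has_narrowing)
			continue;
		if (!classes[i].known || classes[i].has_non_aliasing)
			return false;
	}

	return true;
}
```

For the hypothetical system above, the L3 has CCAP and no narrowing, so this
returns false and the choice would be left to policy.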
As for the interface with mpam_devices:
I think this means the resctrl glue code needs to be able to discover which
classes support intPARTID, and how many controls they actually have. From there
it can apply policy to determine whether it's better to support fewer features
in resctrl to get more RMID. (the alternative is always to ignore the MSC with
narrowing - narrowing lets hardware lie about the features it supports).
Currently the resctrl glue code has to program a configuration for two PARTID
when CDP is being hidden on the MB resource. This is ugly and fragile. I'd like
to explore generalising it as this narrowing stuff will also need to apply a
configuration to a set of PARTID when that MSC doesn't support narrowing.
In the example above, we'd need to discard the CCAP controls and write the same
CPOR bitmap to each PARTID that is mapped together by narrowing.
I think this means the resctrl glue code will need to be able to write a configuration
to controls using the full partid_max range as it does today. But also be able to set
the narrowing mapping on classes that support it.
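As a rough sketch of the two operations described above: plain arrays stand in
for the per-PARTID hardware configuration, and struct msc_class,
class_set_mapping() and class_apply_cpor() are hypothetical names rather than
anything in mpam_devices.c. PARTID values are assumed to be below MAX_PARTID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARTID 64

/* Hypothetical model of one class; arrays stand in for MSC registers. */
struct msc_class {
	bool has_narrowing;
	uint32_t cpor[MAX_PARTID];         /* portion bitmap per (int)PARTID */
	uint16_t intpartid_of[MAX_PARTID]; /* reqPARTID to intPARTID mapping */
};

/* Map every PARTID of a control group onto intPARTID == CLOSID. */
static void class_set_mapping(struct msc_class *cls, uint16_t closid,
			      const uint16_t *partids, size_t n)
{
	if (!cls->has_narrowing)
		return;
	for (size_t i = 0; i < n; i++)
		cls->intpartid_of[partids[i]] = closid;
}

/*
 * With narrowing, one write to the intPARTID covers the whole set.
 * Without it, the same bitmap is written to each PARTID in the set.
 */
static void class_apply_cpor(struct msc_class *cls, uint16_t closid,
			     const uint16_t *partids, size_t n,
			     uint32_t bitmap)
{
	if (cls->has_narrowing) {
		cls->cpor[closid] = bitmap;
		return;
	}
	for (size_t i = 0; i < n; i++)
		cls->cpor[partids[i]] = bitmap;
}
```

The CDP case would then be the same walk over a set of two PARTID.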
For the monitors, the resctrl glue code will need to allocate and configure a set of
monitors, and read and sum them. This will be regardless of whether narrowing is
supported.
I think this means allocating a table of CLOSID to PARTID(s). The intPARTID would
always match the CLOSID. Monitors and non-narrowing MSC would need to walk the list.
I'm hoping we can make CDP a subset of this problem.
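One possible shape for such a table, as a sketch only. The sizes match the SLC
in the hypothetical system (64 PARTID narrowed to 16), the static layout
partid = closid + i * NUM_CLOSID is an assumption chosen so the first entry
equals the CLOSID, and every name here, including the fake monitor reader, is
made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CLOSID         16
#define PARTIDS_PER_CLOSID 4

/* Hypothetical table entry: the set of PARTID backing one control group. */
struct closid_entry {
	uint16_t partid[PARTIDS_PER_CLOSID];
	uint8_t nr;
};

static struct closid_entry closid_table[NUM_CLOSID];

/* Built once at setup time; entry 0 of each set is the intPARTID/CLOSID. */
static void closid_table_init(void)
{
	for (size_t c = 0; c < NUM_CLOSID; c++) {
		closid_table[c].nr = PARTIDS_PER_CLOSID;
		for (size_t i = 0; i < PARTIDS_PER_CLOSID; i++)
			closid_table[c].partid[i] =
				(uint16_t)(c + i * NUM_CLOSID);
	}
}

/* Stand-in for reading one monitor configured for one PARTID. */
typedef uint64_t (*read_mon_fn)(uint16_t partid);

/* Fake reader used only to exercise the walk: returns the PARTID itself. */
static uint64_t fake_read_mon(uint16_t partid)
{
	return partid;
}

/* Monitors always walk the list and sum, with or without narrowing. */
static uint64_t closid_read_sum(uint16_t closid, read_mon_fn read_mon)
{
	const struct closid_entry *e = &closid_table[closid];
	uint64_t sum = 0;

	for (size_t i = 0; i < e->nr; i++)
		sum += read_mon(e->partid[i]);

	return sum;
}
```

With this layout CLOSID 3 owns PARTID 3, 19, 35 and 51, so summing the fake
reader over that group gives 108.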
Some clever arithmetic may save allocating memory for a table - but if we change resctrl
to do this dynamically, the numbers become arbitrary forcing it to be a table.
It might also be possible to support moving monitor-groups between control groups with
the table driven approach. (see what you think on how complex it ends up ...)
I'd like to keep that grouping static for now, the table needs creating at setup time,
(+/- CDP), to avoid problems like you've found with CDP. This means the intpartid mappings
can be written once at setup time.
I'd like to avoid exposing user ABI to control this until we get it working, then we can
talk about whether to try making the grouping dynamically managed by resctrl. (there were
some proposals in that area - but I can't find them on lore).
If there are platforms where it's certainly not a trade-off, we can enable it
unconditionally - but I'm wary of this being "what we care about now", requiring user-abi
to enable features that were detectable.
e.g. we ignore an unknown MSC, and add a resctrl schema for it later - only we can't
expose it if we were using narrowing. Now it's a trade-off.
> Capability Improvements
> =======================
>
> --------------------------------------------------------------------------
> The maximum | Sub-monitoring groups | System-wide
> number of | under a control group | monitoring groups
> --------------------------------------------------------------------------
> Without reqPARTID | PMG | intPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID | |
> static allocation | (reqPARTID // intPARTID) * PMG | reqPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID | |
> dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
> --------------------------------------------------------------------------
>
> Note: The number of intPARTIDs can be capped via the boot parameter
> mpam.intpartid_max. Under MPAM, reqPARTID count is always greater than
> or equal to intPARTID count.
>
> Series Structure
> ================
>
> Patch 1: Fix pre-existing out-of-range PARTID issue between mount sessions.
> Patches 2-6: Implement static reqPARTID allocation.
> Patches 7-10: Implement dynamic reqPARTID allocation.
I've had a hard time following this series. You dive in with invasive changes, then
unbreak things in later patches.
Please add the needed infrastructure in mpam_devices.c first. This should be free of
resctrl-isms, and 'only' needs reviewing against the architecture.
Then add the resctrl glue code stuff. That needs to comply with what resctrl expects.
I think the cleanest way to think about this is to break the mapping between CLOSID and
PARTID. We're effectively moving bits of RMID out of PMG into PARTID. Adding helpers
to explicitly do this early in those patches will make your changes clearer.
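A helper of that kind might look like the sketch below. It assumes a static
grouping where the RMID bits that no longer fit in PMG select which PARTID of
the control group is used, with partid = closid + index * num_closid. That
layout and the names struct partid_pmg and closid_rmid_to_partid_pmg() are
illustrative assumptions, not the series' code or existing driver helpers.

```c
#include <stdint.h>

/* Hypothetical result type: what actually gets programmed for a task. */
struct partid_pmg {
	uint16_t partid;
	uint8_t pmg;
};

/*
 * Split an RMID into the part that still fits in PMG and the part that
 * spills into PARTID. CLOSID no longer equals PARTID: it only selects
 * the set that the spilled bits index into. Assumes num_pmg is non-zero.
 */
static inline struct partid_pmg
closid_rmid_to_partid_pmg(uint16_t closid, uint32_t rmid,
			  uint16_t num_closid, uint8_t num_pmg)
{
	uint32_t spilled = rmid / num_pmg;
	struct partid_pmg r = {
		.partid = (uint16_t)(closid + spilled * num_closid),
		.pmg = (uint8_t)(rmid % num_pmg),
	};

	return r;
}
```

With 16 CLOSID and 2 PMG values, CLOSID 3 and RMID 5 give PARTID 35 and PMG 1.
RMID 0 and 1 leave PARTID equal to CLOSID, which is today's behaviour.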
Please avoid spraying the narrowing terms for things everywhere.
Thanks,
James
[*] It's terminology from discussing this with Dave, just in case a summary is needed:
aliasing controls are like CPOR where two different PARTID with the same bitmap
compete for the same resource. If you give them each the same 50% of the portions,
they can't exceed that together.
non-aliasing controls are like CCAP where two different PARTID with the same fraction
compete for different resources. If you give them each 50% of the capacity, it adds
up to 100%. You can't represent 'the same' 50% using these controls.
Narrowing papers over this problem with its remapping table, which gives you a 'same'
property. For MSC that have controls of that shape - and where more monitors are
desired - we'd have to drop the controls.
I think "more monitors are desired" is going to need to be user-space policy. But
we can come back to how to do that later.
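The difference can be shown with a little arithmetic. This is a toy model, not
hardware behaviour: a CPOR-like control is modelled as a bitmap of portions,
a CCAP-like control as a percentage, and both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Aliasing (CPOR-like): two PARTID together can use the union of their
 * portion bitmaps. Identical bitmaps therefore share the same portions.
 */
static unsigned int cpor_combined_portions(uint32_t a, uint32_t b)
{
	uint32_t v = a | b;
	unsigned int n = 0;

	while (v) {
		n += v & 1;
		v >>= 1;
	}

	return n;
}

/*
 * Non-aliasing (CCAP-like): each PARTID gets its own fraction, so the
 * combined limit is the sum, capped at the whole resource.
 */
static unsigned int ccap_combined_percent(unsigned int a, unsigned int b)
{
	unsigned int sum = a + b;

	return sum > 100 ? 100 : sum;
}
```

With 8 portions, two PARTID that both use bitmap 0x0f are limited to 4 portions
between them (50%). Two PARTID that each have a 50% capacity limit can use 100%
between them.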