* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
From: Luck, Tony @ 2026-06-12 18:04 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, Box, David E,
x86@kernel.org, Christoph Hellwig, linux-kernel@vger.kernel.org,
patches@lists.linux.dev
In-Reply-To: <ais2ROlQe9JqsN3w@agluck-desk3>
On Thu, Jun 11, 2026 at 03:27:16PM -0700, Luck, Tony wrote:
> On Thu, Jun 11, 2026 at 02:22:23PM -0700, Reinette Chatre wrote:
> > Hi Tony,
> >
> > On 6/11/26 11:01 AM, Luck, Tony wrote:
> > > With renamed variable, and comment. Is this clearer?
> > >
> > >
> > > /*
> > > * intel_aet_register_enumeration() provides the value of "THIS_MODULE"
> > > * from the the pmt_telemetry module. Note that this is NULL when
> > > * CONFIG_INTEL_PMT_TELEMETRY=y so it serves as a flag to decide whether
> > > * try_module_get() and module_put() need to be called.
> > > */
> > > static struct module *pmt_is_a_module;
> > >
> >
> > Using a single variable to contain both data and state has caused problem in
> > the past. Could it be helpful to create two utilities, for example get_pmt()
> > and put_pmt() although naming is not at issue at the moment, that contains
> > and makes obvious these "is INTEL_PMT_TELEMETRY a module or not" checks? These would
> > then be no-op when INTEL_PMT_TELEMETRY=y. This would simplify the code in the
> > callers by containing this logic and not rely on callers doing this check so
> > many times.
>
> I can code this up to see how it looks.
Eye-strain "if (...)" are gone. Code is now:
--- internal.h ---
#ifdef CONFIG_INTEL_PMT_TELEMETRY_MODULE
static inline bool intel_aet_try_module_get(struct module *mod)
{
return try_module_get(mod);
}
static inline void intel_aet_module_put(struct module *mod)
{
module_put(mod);
}
#else
static inline bool intel_aet_try_module_get(struct module *mod) { return true; }
static inline void intel_aet_module_put(struct module *mod) { }
#endif
--- intel_aet.c ---
bool intel_aet_pre_mount(void)
{
bool ret;
guard(mutex)(&aet_register_lock);
if (!get_feature || !put_feature)
return false;
if (!intel_aet_try_module_get(pmt_module))
return false;
ret = aet_get_events();
if (!ret)
intel_aet_module_put(pmt_module);
else
pmt_in_use = true;
return ret;
}
void intel_aet_unmount(void)
{
struct event_group **peg;
guard(mutex)(&aet_register_lock);
if (!pmt_in_use)
return;
for_each_event_group(peg) {
struct event_group *e = *peg;
if (e->pfg) {
for (int j = 0; j < e->num_events; j++)
resctrl_disable_mon_event(e->evts[j].id);
put_feature(e->pfg);
e->pfg = NULL;
}
}
intel_aet_module_put(pmt_module);
pmt_in_use = false;
}
-Tony
^ permalink raw reply
* Re: [PATCH 0/5] Improve the invalidation path in AMD
From: Jason Gunthorpe @ 2026-06-12 12:27 UTC (permalink / raw)
To: Joerg Roedel
Cc: Srivastava, Dheeraj Kumar, iommu, Robin Murphy,
Suravee Suthikulpanit, Will Deacon, patches
In-Reply-To: <aiu0kMeaSNIUKwJH@8bytes.org>
On Fri, Jun 12, 2026 at 09:48:38AM +0200, Joerg Roedel wrote:
> On Thu, Jun 11, 2026 at 01:33:17PM -0300, Jason Gunthorpe wrote:
> > I understood he delegated this to Dheerja and this email from him is
> > confirming the regression tests?
> >
> > https://lore.kernel.org/all/83b8a0f3-2409-464d-a9e4-59b9eea926cd@amd.com/
> >
> > Vasant reviewed it here:
> >
> > https://lore.kernel.org/all/4cc8368d-3972-47c6-bd60-7fc8b9c7be3e@amd.com/
>
> Alright, I applied it, but there were some conflicts due to
>
> 69fe699afe1a iommu/amd: Don't split flush for amd_iommu_domain_flush_all()
>
> in the amd/amd-vi branch. I think I resolved them, but can you please
> double-check once the tree is pushed?
It looks right to me thanks
Jason
^ permalink raw reply
* Re: [PATCH 0/5] Improve the invalidation path in AMD
From: Joerg Roedel @ 2026-06-12 7:48 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Srivastava, Dheeraj Kumar, iommu, Robin Murphy,
Suravee Suthikulpanit, Will Deacon, patches
In-Reply-To: <20260611163317.GA1453429@ziepe.ca>
On Thu, Jun 11, 2026 at 01:33:17PM -0300, Jason Gunthorpe wrote:
> I understood he delegated this to Dheerja and this email from him is
> confirming the regression tests?
>
> https://lore.kernel.org/all/83b8a0f3-2409-464d-a9e4-59b9eea926cd@amd.com/
>
> Vasant reviewed it here:
>
> https://lore.kernel.org/all/4cc8368d-3972-47c6-bd60-7fc8b9c7be3e@amd.com/
Alright, I applied it, but there were some conflicts due to
69fe699afe1a iommu/amd: Don't split flush for amd_iommu_domain_flush_all()
in the amd/amd-vi branch. I think I resolved them, but can you please
double-check once the tree is pushed?
Thanks,
Joerg
^ permalink raw reply
* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
From: Luck, Tony @ 2026-06-11 22:27 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, Box, David E,
x86@kernel.org, Christoph Hellwig, linux-kernel@vger.kernel.org,
patches@lists.linux.dev
In-Reply-To: <84004bd5-db4d-4abc-a676-6565abf6320d@intel.com>
On Thu, Jun 11, 2026 at 02:22:23PM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 6/11/26 11:01 AM, Luck, Tony wrote:
> > With renamed variable, and comment. Is this clearer?
> >
> >
> > /*
> > * intel_aet_register_enumeration() provides the value of "THIS_MODULE"
> > * from the the pmt_telemetry module. Note that this is NULL when
> > * CONFIG_INTEL_PMT_TELEMETRY=y so it serves as a flag to decide whether
> > * try_module_get() and module_put() need to be called.
> > */
> > static struct module *pmt_is_a_module;
> >
>
> Using a single variable to contain both data and state has caused problem in
> the past. Could it be helpful to create two utilities, for example get_pmt()
> and put_pmt() although naming is not at issue at the moment, that contains
> and makes obvious these "is INTEL_PMT_TELEMETRY a module or not" checks? These would
> then be no-op when INTEL_PMT_TELEMETRY=y. This would simplify the code in the
> callers by containing this logic and not rely on callers doing this check so
> many times.
I can code this up to see how it looks.
> Even so, it is not clear to me how the AET lifetime will look like since
> INTEL_PMT_TELEMETRY's .remove() may be called at other times so resctrl
> may need other hooks that could replace this module handling.
Case 1: No races
Resctrl mount: intel_aet_pre_mount() gets a success return from
try_module_get() - so the hold on the pmt_module will block the
kernel from removing the module (-EBUSY).
Case 2: Race between resctrl mount and rmmod pmt_telemetry
Mount code gets into intel_aet_pre_mount() and rmmod gets to intel_aet_unregister_enumeration()
where both try to acquire aet_register_lock.
If intel_aet_pre_mount() gets the mutex it calls try_module_get(), which
fails because if the pmt_telemetry code is in the .remove() functions,
then the module state is already in MODULE_STATE_GOING.
If rmmod process wins the mutex it sets get_feature and put_feature to
NULL and releases the mutex. intel_aet_pre_mount() runs and returns
false because get_feature and put_feature are NULL.
Is there some other race condition to consider?
> Reinette
-Tony
P.S. David Box is OK with setting suppress_bind_attrs for the telemetry
driver. So that will make the unbind issue moot.
>
^ permalink raw reply
* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
From: Reinette Chatre @ 2026-06-11 21:22 UTC (permalink / raw)
To: Luck, Tony
Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, Box, David E,
x86@kernel.org, Christoph Hellwig, linux-kernel@vger.kernel.org,
patches@lists.linux.dev
In-Reply-To: <air33e4js_onwUik@agluck-desk3>
Hi Tony,
On 6/11/26 11:01 AM, Luck, Tony wrote:
> With renamed variable, and comment. Is this clearer?
>
>
> /*
> * intel_aet_register_enumeration() provides the value of "THIS_MODULE"
> * from the the pmt_telemetry module. Note that this is NULL when
> * CONFIG_INTEL_PMT_TELEMETRY=y so it serves as a flag to decide whether
> * try_module_get() and module_put() need to be called.
> */
> static struct module *pmt_is_a_module;
>
Using a single variable to contain both data and state has caused problem in
the past. Could it be helpful to create two utilities, for example get_pmt()
and put_pmt() although naming is not at issue at the moment, that contains
and makes obvious these "is INTEL_PMT_TELEMETRY a module or not" checks? These would
then be no-op when INTEL_PMT_TELEMETRY=y. This would simplify the code in the
callers by containing this logic and not rely on callers doing this check so
many times.
Even so, it is not clear to me how the AET lifetime will look like since
INTEL_PMT_TELEMETRY's .remove() may be called at other times so resctrl
may need other hooks that could replace this module handling.
Reinette
^ permalink raw reply
* Re: [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event
From: Reinette Chatre @ 2026-06-11 21:22 UTC (permalink / raw)
To: Luck, Tony
Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, Box, David E,
x86@kernel.org, Christoph Hellwig, linux-kernel@vger.kernel.org,
patches@lists.linux.dev
In-Reply-To: <SJ1PR11MB6083DF803C214160BA106555FC1A2@SJ1PR11MB6083.namprd11.prod.outlook.com>
Hi Tony,
On 6/10/26 4:19 PM, Luck, Tony wrote:
> I took your suggested commit comment. Did some edits of my own and then asked AI to offer improvements.
>
> It came up with this
>
> ---
>
> resctrl currently assumes all monitor events are enabled before any domain
> is created, because per-domain state is allocated by the architecture's CPU
> hotplug callbacks. There is no way to disable an event once registered.
>
> AET events are enumerated by the INTEL_PMT_TELEMETRY driver. To allow that
> driver to be a loadable module, resctrl must tolerate AET events appearing
> and disappearing, which requires the ability to disable an event when the
> driver is unloaded.
>
> Add resctrl_disable_mon_event(). The architecture owns domain lifetime and mount
> state, so it is responsible for calling this only while resctrl is unmounted
nit: "owns domain lifetime and mount state" -> "owns domain lifetime and knows mount
state"
(architecture does not own mount state)
> and for cleaning up any per-domain state. Document those requirements in
> the kerneldoc since they are not enforced in code.
>
Looks good to me. Thank you.
Reinette
^ permalink raw reply
* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
From: Luck, Tony @ 2026-06-11 18:01 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, Box, David E,
x86@kernel.org, Christoph Hellwig, linux-kernel@vger.kernel.org,
patches@lists.linux.dev
In-Reply-To: <aingld8Hoz-x5MTj@agluck-desk3>
With renamed variable, and comment. Is this clearer?
/*
* intel_aet_register_enumeration() provides the value of "THIS_MODULE"
* from the the pmt_telemetry module. Note that this is NULL when
* CONFIG_INTEL_PMT_TELEMETRY=y so it serves as a flag to decide whether
* try_module_get() and module_put() need to be called.
*/
static struct module *pmt_is_a_module;
...
/*
* Track whether pmt_telemetry enumeration succeeded during mount for use
* during unmount.
*/
static bool pmt_in_use;
bool intel_aet_pre_mount(void)
{
bool ret;
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return false;
guard(mutex)(&aet_register_lock);
if (!get_feature || !put_feature)
return false;
if (pmt_is_a_module) {
if (!try_module_get(pmt_is_a_module))
return false;
}
ret = aet_get_events();
if (!ret) {
if (pmt_is_a_module)
module_put(pmt_is_a_module);
} else {
pmt_in_use = true;
}
return ret;
}
void intel_aet_unmount(void)
{
struct event_group **peg;
guard(mutex)(&aet_register_lock);
if (!pmt_in_use)
return;
for_each_event_group(peg) {
if ((*peg)->pfg) {
struct event_group *e = *peg;
for (int j = 0; j < e->num_events; j++)
resctrl_disable_mon_event(e->evts[j].id);
put_feature((*peg)->pfg);
(*peg)->pfg = NULL;
}
}
if (pmt_is_a_module)
module_put(pmt_is_a_module);
pmt_in_use = false;
}
-Tony
^ permalink raw reply
* Re: [PATCH v7 08/14] fs,x86,mpam/resctrl: Handle change in number of RMIDs on each mount
From: Luck, Tony @ 2026-06-11 17:40 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
Christoph Hellwig, linux-kernel, patches
In-Reply-To: <c46d7e2f-e247-4838-b265-e22b3791d1c9@intel.com>
On Tue, Jun 09, 2026 at 04:35:42PM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 6/9/26 2:58 PM, Luck, Tony wrote:
> > On Mon, Jun 08, 2026 at 04:21:10PM -0700, Reinette Chatre wrote:
> >> Hi Tony,
> >>
> > Changed the Subject: tag to "mpam,x86,fs/resctrl" to match what you used
> > for the RFC discussion. I think the implied rule is "architectures first
> > in alphabetical order, filesystem last".
>
> After seeing the RFC discussions I do not think what I did in PoC is ideal.
> MPAM is the name of the Arm feature and soon there will also be RISC-V with its
> "Ssqosid" and "QBQRI" features. Since we currently have x86 as established
> prefix for the PQoS and RDT features it may be more appropriate and simpler
> to instead use something like "arm,riscv,x86,fs/resctrl" if such global
> change is ever needed? It still follows your implied rule of "architectures
> first in alphabetical order" ... but it instead actually uses the architecture
> names and not a mix of feature and architecture names. I am not dictating here
> and open to suggestions.
This sounds good. I'll switch to "arm,x86,fs/resctrl" for the patches in
this series that touch three places.
>
> >
> >> On 6/1/26 12:56 PM, Tony Luck wrote:
> ...
> >>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> >>> index 327e7a863614..b374e2f84a75 100644
> >>> --- a/fs/resctrl/monitor.c
> >>> +++ b/fs/resctrl/monitor.c
> >>> @@ -115,10 +115,17 @@ static inline struct rmid_entry *__rmid_entry(u32 idx)
> >>>
> >>> static void limbo_release_entry(struct rmid_entry *entry)
> >>> {
> >>> + u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> >>
> >> Having the code switch idx_limit to sometimes be resctrl_arch_system_num_rmid_idx()
> >> and other times resctrl_arch_system_max_rmid_idx() makes this change difficult to
> >> follow. I think it will help to use different variable names to differentiate
> >> the context in which it is being used and not leave reader trying to understand
> >> why there are *two* limits.
> >
> > I'll change the local variable name to max_idx_limit when dealing with
> > the max value rather than the current. setup_rmid_lru_list() will have
> > both variables since max_idx_limit is used for allocation, and idx_limit
> > to add the right number to rmid_free_lru list. Added a comment to
> > __check_limbo() explaining why max RMID is used there. I think the other
> > spots are easy to see why.
> What do you think of "min_idx_limit" instead of "idx_limit" to complement
> the "max_idx_limit" while reflecting that it is the limit that is common
> (hence "minimum") among all monitoring resources?
min_idx_limit nicely matches max_idx_limit. current_idx_limit could also
be a contender (though it is longer, and "current" might not obviously
refer to the mount/unmount cycle).
I think I'll switch to min_idx_limit.
>
> Reinette
^ permalink raw reply
* Re: [PATCH v7 07/14] x86/resctrl: Maintain a count of enabled monitor features
From: Luck, Tony @ 2026-06-11 17:27 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
Christoph Hellwig, linux-kernel, patches
In-Reply-To: <40404fc1-c355-4848-af2c-f39f8b3b03ba@intel.com>
On Tue, Jun 09, 2026 at 04:03:53PM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 6/9/26 11:46 AM, Luck, Tony wrote:
>
> ...
>
> > Conceptually this is much simpler. But I have questions about the
> > implementation. The new function is trivial:
> >
> > bool resctrl_arch_mon_capable(void)
> > {
> > struct rdt_resource *r;
> >
> > for_each_mon_capable_rdt_resource(r)
> > return true;
> >
> > return false;
> > }
> >
> > But that led me to #include hell when I tried to keep it as an inline
> > function in <asm/resctrl.h> because for_each_mon_capable_rdt_resource()
> > is defined in <linux/resctrl.h> after the #include <asm/resctrl.h>
> >
> > So I've moved it out-of-line into arch/x86/kernel/cpu/resctrl/core.c.
> >
> > The MPAM implementation is also out-of-line.
> >
> > But then I wondered about performance. This change on x86 goes from an
> > inline function that simply returns the value of a global variable to an
> > out-of-line function that scans the array of rdt resources. The common
> > case will be a hit on the first element, so not awful. But still worse
> > that before I touched it.
> >
> > So I looked for places where resctrl_arch_mon_capable() is called in
> > "hot" code paths. There's a bunch in mount and mkdir, but those aren't
> > very hot.
> >
> > My list (check to see if I missed any others):
> >
> > 1) Recurring call once per second in mbm_handle_overflow()
> >
> > Seems redundant. There is a check to only start the overflow handler
> > on mon_capable systems (only with enabled MBM events!)
> >
> > 2) Call for potentially every task when reading tasks files in is_rmid_match()
> >
> > Also seems redundant. Next part of that "if" looks at "r->type == RDTMON_GROUP"
> > which can only be true on mon_capable systems.
> >
> > Should I clean these up in this series? As part of this patch which
> > exacerbates the performance impact, or as a separate cleanup patch?
>
> Thanks for catching this. I think it is reasonable to include it in this series
> as a preparatory patch with this patch helping to motivate its inclusion.
OK. I'll add a new patch to the series for this.
>
> Even so, I am now a bit confused and concerned about the PMT dependencies. I am
> not very familiar with module_get()/module_put() capabilities - can it be
> guaranteed that PMT remains accessible between those two calls? I peeked at the
> sashiko review and it mentioned the usage of "unbind" triggered from user space.
> It sounds to me as though PMT's .probe() and .remove() can be triggered at
> various times from user space irrespective of resctrl being mounted or not and not
> prevented by a module_get()?
>
> So, consider scenario where PMT is loaded and then resctrl is mounted and user space
> creates a couple of monitor groups that contains the AET event files. What will
> happen if user space then unbinds PMT? From what I can tell this will not
> trigger AET unmount like resctrl unmount would and thus leave a lot of dangling state?
>
> Back to the above ... if AET needs handle a PMT unbind, do you think that, for
> example like in is_rmid_match(), that resctrl_arch_mon_capable() could return
> different values during a single resctrl mount?
I'm pursing options to prevent the unbind (until someone tells me there
is a critical use case that needs this to work at any arbitrary moment).
If preventing unbind works out, then resctrl_arch_mon_capable() won't change during
a single mount.
>
> Reinette
-Tony
^ permalink raw reply
* Re: [PATCH 0/5] Improve the invalidation path in AMD
From: Jason Gunthorpe @ 2026-06-11 16:33 UTC (permalink / raw)
To: Joerg Roedel
Cc: Srivastava, Dheeraj Kumar, iommu, Robin Murphy,
Suravee Suthikulpanit, Will Deacon, patches
In-Reply-To: <airU7xpucwkZMAo7@8bytes.org>
On Thu, Jun 11, 2026 at 05:32:27PM +0200, Joerg Roedel wrote:
> On Thu, Jun 11, 2026 at 08:56:11AM -0300, Jason Gunthorpe wrote:
> > Joerg?
>
> This waits for regression test results from Vasant.
I understood he delegated this to Dheerja and this email from him is
confirming the regression tests?
https://lore.kernel.org/all/83b8a0f3-2409-464d-a9e4-59b9eea926cd@amd.com/
Vasant reviewed it here:
https://lore.kernel.org/all/4cc8368d-3972-47c6-bd60-7fc8b9c7be3e@amd.com/
Jason
^ permalink raw reply
* Re: [PATCH 0/5] Improve the invalidation path in AMD
From: Joerg Roedel @ 2026-06-11 15:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Srivastava, Dheeraj Kumar, iommu, Robin Murphy,
Suravee Suthikulpanit, Will Deacon, patches
In-Reply-To: <20260611115611.GE1066031@ziepe.ca>
On Thu, Jun 11, 2026 at 08:56:11AM -0300, Jason Gunthorpe wrote:
> Joerg?
This waits for regression test results from Vasant.
^ permalink raw reply
* Re: [PATCH 0/5] Improve the invalidation path in AMD
From: Jason Gunthorpe @ 2026-06-11 11:56 UTC (permalink / raw)
To: Srivastava, Dheeraj Kumar
Cc: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
Will Deacon, patches
In-Reply-To: <83b8a0f3-2409-464d-a9e4-59b9eea926cd@amd.com>
On Tue, Jun 02, 2026 at 12:36:18PM +0530, Srivastava, Dheeraj Kumar wrote:
> Hi Jason,
>
> On 3/27/2026 8:53 PM, Jason Gunthorpe wrote:
> > Auditing this path reveals some unnecessary complexity and some
> > confusion around end/size/last and types that may cause issues at the
> > ULONG_MAX limit.
> >
> > First, fix up the call chains so that the gather start/last is
> > retained faithfully throughout. This avoids converting to a
> > start/size, and avoids various type conversions along this path. Fix
> > everything so the theoretical cases like start=0 last=U64_MAX work
> > correctly.
> >
> > Simplify all the calculations for alignment/etc to use trivial bit
> > maths instead of the complicated versions. These are all intended to
> > be no functional change.
> >
> > Finally, add support for PDE=0, deriving it from the gather. PDE=0
> > allows the HW to avoid scanning the page directory cache and
> > supporting it should improve performance.
> >
> > Jason Gunthorpe (5):
> > iommu/amd: Simplify build_inv_address()
> > iommu/amd: Pass last in through to build_inv_address()
> > iommu/amd: Have amd_iommu_domain_flush_pages() use last
> > iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec
> > iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather
> >
> > drivers/iommu/amd/amd_iommu.h | 4 +-
> > drivers/iommu/amd/amd_iommu_types.h | 2 +-
> > drivers/iommu/amd/iommu.c | 175 +++++++++++++---------------
> > drivers/iommu/amd/pasid.c | 2 +-
> > 4 files changed, 87 insertions(+), 96 deletions(-)
> >
>
> Tested across multiple IOMMU modes; no regressions were detected for this
> series.
>
> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Joerg?
Jason
^ permalink raw reply
* RE: [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event
From: Luck, Tony @ 2026-06-10 23:19 UTC (permalink / raw)
To: Chatre, Reinette
Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, Box, David E,
x86@kernel.org, Christoph Hellwig, linux-kernel@vger.kernel.org,
patches@lists.linux.dev
In-Reply-To: <f9b0c233-04a7-4ddc-b792-f2def000febd@intel.com>
I took your suggested commit comment. Did some edits of my own and then asked AI to offer improvements.
It came up with this
---
resctrl currently assumes all monitor events are enabled before any domain
is created, because per-domain state is allocated by the architecture's CPU
hotplug callbacks. There is no way to disable an event once registered.
AET events are enumerated by the INTEL_PMT_TELEMETRY driver. To allow that
driver to be a loadable module, resctrl must tolerate AET events appearing
and disappearing, which requires the ability to disable an event when the
driver is unloaded.
Add resctrl_disable_mon_event(). The architecture owns domain lifetime and mount
state, so it is responsible for calling this only while resctrl is unmounted
and for cleaning up any per-domain state. Document those requirements in
the kerneldoc since they are not enforced in code.
---
-Tony
^ permalink raw reply
* Re: [PATCH v5 00/11] x86,fs/resctrl: Fix long-standing issues
From: Reinette Chatre @ 2026-06-10 23:17 UTC (permalink / raw)
To: Borislav Petkov
Cc: tony.luck, james.morse, Dave.Martin, babu.moger, tglx,
dave.hansen, x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
yu.c.chen, linux-kernel, patches
In-Reply-To: <20260610231028.GFainu5IR4RLvZRg6X@fat_crate.local>
On 6/10/26 4:10 PM, Borislav Petkov wrote:
> All good. I think in a couple of weeks, when we have -rc1, I can queue yours
> and then they'll get the longest exposure in linux-next so that should be
> good testing.
That would be great. Thank you very much Boris.
Reinette
^ permalink raw reply
* Re: [PATCH v5 00/11] x86,fs/resctrl: Fix long-standing issues
From: Borislav Petkov @ 2026-06-10 23:10 UTC (permalink / raw)
To: Reinette Chatre
Cc: tony.luck, james.morse, Dave.Martin, babu.moger, tglx,
dave.hansen, x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
yu.c.chen, linux-kernel, patches
In-Reply-To: <5295a4a6-6652-4f9b-82a9-8310523636b9@intel.com>
On Wed, Jun 10, 2026 at 01:34:43PM -0700, Reinette Chatre wrote:
> Thank you very much Boris. Apologies if I implied so but I really did not
> expect nor want this to make it into the upcoming merge window. Especially
> after learning about Linus's comment [1] about fixes for long-standing
> issues not being appropriate late in the cycle (with v5 already considered
> late).
Right. Nothing to worry about, we do pay attention to how critical the fixes
are we send during the stabilization phase.
> Considering the changes in locking I do instead hope that this work could
> percolate on x86/cache for a while before moving further. To support this
> I hope to get few more acks by the time x86/cache can accept new changes.
> I should have made this clear.
All good. I think in a couple of weeks, when we have -rc1, I can queue yours
and then they'll get the longest exposure in linux-next so that should be
good testing.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply
* Re: [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event
From: Reinette Chatre @ 2026-06-10 22:26 UTC (permalink / raw)
To: Luck, Tony
Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
Christoph Hellwig, linux-kernel, patches
In-Reply-To: <ainPljLE0yc0o1mv@agluck-desk3>
Hi Tony,
On 6/10/26 1:56 PM, Luck, Tony wrote:
> On Tue, Jun 09, 2026 at 04:02:55PM -0700, Reinette Chatre wrote:
>> On 6/9/26 10:21 AM, Luck, Tony wrote:
>>> On Mon, Jun 08, 2026 at 04:18:23PM -0700, Reinette Chatre wrote:
>>>> On 6/1/26 12:56 PM, Tony Luck wrote:
>>>>> In preparation for re-running AET enumeration on every mount, AET code must be
>>>>> able to disable events on unmount so the next mount starts from a clean slate.
>>>>>
>>>>> Add a file system interface for architecture to clear the enabled flag for
>>>>> a given event.
>>>>
>>>> This just verbatim describes what the patch does. It would be helpful to describe
>>>> how the flag is used by resctrl to support the first paragraph's implicit claim
>>>> that the enabled flag's value is not relevant when resctrl is unmounted.
>>>
>>> Revised commit:
>>> ---
>>> Subject: fs/resctrl: Add interface to disable a monitor event
>>>
>>> In preparation for re-running AET enumeration on every mount, AET code must be
>>> able to disable events on unmount so the next mount starts from a clean slate.
>>>
>>> Add a file system interface for architecture to reset architecture
>>> controlled fields of the given event. mon_event::enabled is only used
>>> during mount and at run time to check which events to include in file
>>> system objects. It is not used during unmount, so it is safe to clear
>>> it as part of the unmount flow.
>>
>> This introduces a general resctrl fs interface and makes some powerful
>> generalized statements in support of the interface but these statements
>> are only true for the AET events. Surely mon_event::enabled is used
>> during unmount since domains can come and go while resctrl is not mounted
>> and as the new comments explain there is significant state coordination that
>> needs to be done between these event callbacks and hotplug handlers.
>>
>> Similarly, the resctrl LLC occupancy worker keeps running while resctrl
>> is unmounted and depends on LLC occupancy event being enabled.
>>
>> Creating a resctrl fs generalized interface but motivating it with a
>> highly customized lens of usage without making that clear in the changelog
>> but instead just making grand claims of how safe this is seems underhanded.
>
> I can rewrite this commit comment to call out the limitations on event
> removal. Those are listed in the new kerneldoc comments that I added to
> the resctrl_disable_mon_event() declaration in <linux/resctrl.h> based on
> your feedback on previous version of this patch.
It is not necessary to duplicate the documentation added by the patch but
really should not make false statements like "It is not used during unmount"
>
> Is that what you are looking for here? Or are you suggesting that the
> new interface be less general?
I do not think it is necessary to make the interface less general but if you have
ideas then please share. My concern is just that the changelog should accurately
describe what the patch does.
Consider, for example, something like below. Please do not copy&paste what I write since
I already acknowledge it is not ideal. Please consider it just as a draft of how I think
this change can be described:
Without enforcement resctrl assumes that all events are enabled before any
domain is created. This assumption supports events that requires per-domain
state that is created with the domain via the architecture's CPU hotplug handlers.
resctrl does not support disabling of events since, in addition to coordination with
the domain management done by the architecture's CPU hotplug handlers, disabling
of events needs to be coordinated with the resctrl filesystem that may be mounted
and exposing enabled events.
The AET telemetry resources are enumerated and managed by the INTEL_PMT_TELEMETRY
driver. In preparation for INTEL_PMT_TELEMETRY to be loaded as module resctrl should
handle the scenario where INTEL_PMT_TELEMETRY is unloaded after resctrl
discovered the telemetry resources and enabled the associated events. This means
that resctrl needs to support disabling of events.
The architecture manages the domain and knows the resctrl mount state via
the resctrl_arch_pre_mount() callback. The architecture is thus in the best position
to know when it is safe to disable an event.
Allow the architecture to disable an event and trust it to only do so when resctrl
fs is not mounted and that it will ensure the event's state is cleaned up. Without being
able to do enforcement of safe event disabling, document what an architecture needs to
consider before using this new capability.
Reinette
^ permalink raw reply
* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
From: Luck, Tony @ 2026-06-10 22:09 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, Box, David E,
x86@kernel.org, Christoph Hellwig, linux-kernel@vger.kernel.org,
patches@lists.linux.dev
In-Reply-To: <a1dd1858-a6d1-4760-9181-ca30de3a1510@intel.com>
On Wed, Jun 10, 2026 at 10:58:26AM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 6/10/26 10:24 AM, Luck, Tony wrote:
> >> I see many "if (pmt_module)" checks ... intel_aet_pre_mount() even calls it twice. Are they all necessary?
> >> It creates the impression that the PMT module can be yanked from AET at any time, something which
> >> intel_aet_unregister_enumeration() seems to allow. I was hoping that there can be some guarantee
> >> that if PMT is available during pre_mount() it will continue to be available at least until
> >> unmount() completes.
> >
> > pmt_module is NULL if CONFIG_INTEL_PMT_TELEMETRY=y ... i.e. built-in to the kernel.
> >
> > In that case it obviously can't go away, and doesn't need module_get()/module_put().
> > There's no special case for this. try_module_get() takes a fault on NULL dereference.
> >
> > When CONFIG_INTEL_PMT_TELEMETRY=m then the get/put should stop it going away
> > (I tried rmmod while resctrl mounted and it fails to remove as expected).
>
> Have you tried "unbind" via sysfs?
I need to resolve the unbind issue. Looks like I may have to extend the
registration interface to pass a pointer to the device so I can call
get_device() to prevent such an unbind operation while the file system
is mounted in addition to try_get_module() to stop the module from being
unloaded.
Or talk David into marking the driver with "device_driver::suppress_bind_attrs"
so that unbind isn't available for the user.
But that's separate from the issue of whether all those "if (pmt_module)" tests are needed.
> Even so, my comment was in response to the code you shared. Let me paste it
> back and highlight what I meant with the "many pmt_module checks":
>
> > /*
> > * Track whether pmt_telemetry enumeration succeeded during mount for use
> > * during unmount.
> > */
> > static bool pmt_in_use;
> >
> > bool intel_aet_pre_mount(void)
> > {
> > bool ret;
> >
> > guard(mutex)(&aet_register_lock);
> > if (!get_feature || !put_feature)
> > return false;
> >
> > if (pmt_module) {
>
> Here is an "if (pmt_module)" check ... can it ever be false? If so
> then the rest of this function becomes very confusing (more below ...)
Yes. pmt_module is NULL when the kernel is built with CONFIG_INTEL_PMT_TELEMETRY=y
I.e. the value of THIS_MODULE in the pmt_telemetry code is NULL when it
isn't a module, but is built-in to the kernel.
>
> > if (!try_module_get(pmt_module))
> > return false;
> > }
> >
> > ret = aet_get_events();
>
> aet_get_events() can thus seemingly be called when pmt_module is unset?
>
> >
> > if (!ret) {
> > if (pmt_module)
>
> Can pmt_module be unset here?
If it was NULL earlier (because CONFIG_INTEL_PMT_TELEMETRY=y) then it
will be NULL forever. If it was non-NULL above, then it must still be
non-NULL here because changes are protected but the aet_register_lock
mutex.
>
> > module_put(pmt_module);
> > } else {
> > pmt_in_use = true;
>
> So pmt_in_use could be true if pmt_module is unset? Confusing, no?
I'm using "pmt", it just isn't a module. I'm using the built-in kernel
copy of the code.
Do you have a better suggestion for the name of "pmt_in_use" that makes
it more obvious that it doesn't refer to the loaded state of the module?
Or maybe change the "pmt_module" variable name? Suggestions welcome.
> > }
> >
> > return ret;
> > }
> >
> > void intel_aet_unmount(void)
> > {
> > struct event_group **peg;
> >
> > guard(mutex)(&aet_register_lock);
> > if (!pmt_in_use)
> > return;
> >
> > for_each_event_group(peg) {
> > if ((*peg)->pfg) {
> > struct event_group *e = *peg;
> >
> > for (int j = 0; j < e->num_events; j++)
> > resctrl_disable_mon_event(e->evts[j].id);
> > put_feature((*peg)->pfg);
> > (*peg)->pfg = NULL;
> > }
> > }
> > if (pmt_module)
>
> So this implies that pmt_module could be unset here ... if that is the
> case then that means that the PMT module disappeared while resctrl was
> mounted which is exactly what this work aims to prevent, no?
As above. If CONFIG_INTEL_PMT_TELEMETRY=y then pmt_module is always NULL.
The module didn't disappear, there never was a module. It was built-in
the whole time.
>
> > module_put(pmt_module);
> > pmt_in_use = false;
>
> ... if pmt_module was unset then how could pmt_in_use ever be true?
It can be set in the case where the kernel was built with CONFIG_INTEL_PMT_TELEMETRY=y
>
> > }
>
> Reinette
-Tony
^ permalink raw reply
* Re: [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event
From: Luck, Tony @ 2026-06-10 20:56 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
Christoph Hellwig, linux-kernel, patches
In-Reply-To: <7bbc2796-0f33-433d-a949-d95fa1db6401@intel.com>
On Tue, Jun 09, 2026 at 04:02:55PM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 6/9/26 10:21 AM, Luck, Tony wrote:
> > On Mon, Jun 08, 2026 at 04:18:23PM -0700, Reinette Chatre wrote:
> >> Hi Tony,
> >>
> >> On 6/1/26 12:56 PM, Tony Luck wrote:
> >>> In preparation for re-running AET enumeration on every mount, AET code must be
> >>> able to disable events on unmount so the next mount starts from a clean slate.
> >>>
> >>> Add a file system interface for architecture to clear the enabled flag for
> >>> a given event.
> >>
> >> This just verbatim describes what the patch does. It would be helpful to describe
> >> how the flag is used by resctrl to support the first paragraph's implicit claim
> >> that the enabled flag's value is not relevant when resctrl is unmounted.
> >
> > Revised commit:
> > ---
> > Subject: fs/resctrl: Add interface to disable a monitor event
> >
> > In preparation for re-running AET enumeration on every mount, AET code must be
> > able to disable events on unmount so the next mount starts from a clean slate.
> >
> > Add a file system interface for architecture to reset architecture
> > controlled fields of the given event. mon_event::enabled is only used
> > during mount and at run time to check which events to include in file
> > system objects. It is not used during unmount, so it is safe to clear
> > it as part of the unmount flow.
>
> This introduces a general resctrl fs interface and makes some powerful
> generalized statements in support of the interface but these statements
> are only true for the AET events. Surely mon_event::enabled is used
> during unmount since domains can come and go while resctrl is not mounted
> and as the new comments explain there is significant state coordination that
> needs to be done between these event callbacks and hotplug handlers.
>
> Similarly, the resctrl LLC occupancy worker keeps running while resctrl
> is unmounted and depends on LLC occupancy event being enabled.
>
> Creating a resctrl fs generalized interface but motivating it with a
> highly customized lens of usage without making that clear in the changelog
> but instead just making grand claims of how safe this is seems underhanded.
I can rewrite this commit comment to call out the limitations on event
removal. Those are listed in the new kerneldoc comments that I added to
the resctrl_disable_mon_event() declaration in <linux/resctrl.h> based on
your feedback on previous version of this patch.
Is that what you are looking for here? Or are you suggesting that the
new interface be less general?
>
> >
> > Add kerneldoc comments to describe limitations on when events may be enabled
> > or disabled.
> > ---
> >
> >>
> >>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> >>> index 9fd901c78dc6..327e7a863614 100644
> >>> --- a/fs/resctrl/monitor.c
> >>> +++ b/fs/resctrl/monitor.c
> >>> @@ -1012,6 +1012,18 @@ bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
> >>> return true;
> >>> }
> >>>
> >>> +void resctrl_disable_mon_event(enum resctrl_event_id eventid)
> >>> +{
> >>> + if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
> >>> + return;
> >>> + if (!mon_event_all[eventid].enabled) {
> >>> + pr_warn("Repeat disable for event %d\n", eventid);
> >>> + return;
> >>> + }
> >>> +
> >>> + mon_event_all[eventid].enabled = false;
> >>> +}
> >>
> >> It seems reasonable for architecture to expect that "disable" of event undoes all
> >> settings from the earlier "enable" of event. The new function only sets the "enabled"
> >> flag to false though. This is potentially confusing since it leaves the event with
> >> some lingering state, some of which is a pointer to state that an architecture may
> >> be reasonable to remove after disabling this event leaving a dangling pointer.
> >
> > Ok. I'll add:
> >
> > + mon_event_all[eventid].any_cpu = false;
> > + mon_event_all[eventid].binary_bits = 0;
> > + mon_event_all[eventid].arch_priv = NULL;
>
> Thank you.
>
> Reinette
>
-Tony
^ permalink raw reply
* Re: [PATCH v5 00/11] x86,fs/resctrl: Fix long-standing issues
From: Reinette Chatre @ 2026-06-10 20:34 UTC (permalink / raw)
To: Borislav Petkov
Cc: tony.luck, james.morse, Dave.Martin, babu.moger, tglx,
dave.hansen, x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
yu.c.chen, linux-kernel, patches
In-Reply-To: <20260610192137.GCaim5QSFx-YKgAzSV@fat_crate.local>
On 6/10/26 12:21 PM, Borislav Petkov wrote:
> On Wed, Jun 10, 2026 at 10:48:06AM -0700, Reinette Chatre wrote:
>> After considering the latest Sashiko feedback and deciding to stop adding
>> new fixes to this series, I do think this series can now be considered as
>> "settled". If there are no concerns with this series self then I would like
>> to proceed for it to be considered for inclusion.
>
> Ping me after the merge window pls. They'll have to catch the "next train" as
> it is too late now.
Thank you very much Boris. Apologies if I implied so but I really did not expect
nor want this to make it into the upcoming merge window. Especially after learning
about Linus's comment [1] about fixes for long-standing issues not being appropriate
late in the cycle (with v5 already considered late).
Considering the changes in locking I do instead hope that this work could percolate
on x86/cache for a while before moving further. To support this I hope to get few
more acks by the time x86/cache can accept new changes. I should have made this clear.
Reinette
[1] https://lore.kernel.org/lkml/CAHk-=wjt1NiKOdyAMz_DT7NmZ++SizPOhRSi492ukdTnpDzHQw@mail.gmail.com/
^ permalink raw reply
* Re: [PATCH v7 05/14] x86/resctrl: Stop setting event_group::force_off on RMID shortage
From: Luck, Tony @ 2026-06-10 20:01 UTC (permalink / raw)
To: Reinette Chatre
Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
Christoph Hellwig, linux-kernel, patches
In-Reply-To: <933eb065-17ea-413e-875b-3783ad2456aa@intel.com>
On Tue, Jun 09, 2026 at 04:02:11PM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 6/9/26 9:51 AM, Luck, Tony wrote:
> > On Mon, Jun 08, 2026 at 04:16:58PM -0700, Reinette Chatre wrote:
> >> Hi Tony,
> >>
> >> On 6/1/26 12:56 PM, Tony Luck wrote:
> >>> Drop the force_off assignment from all_regions_have_sufficient_rmid().
> >>
> >> Apart from this being obvious from the patch, this is not how an x86 changelog
> >> should be structured. Please see "Changelog" in
> >> Documentation/process/maintainer-tip.rst
> >
> > Revised version:
> > ---
> > Subject: x86/resctrl: Fix usage of event_group::force_off in AET
> >
> > The kernel command line option "rdt=" is used to force enable or disable
> > individual resctrl features.
> >
> > There are two problems with usage in the AET (Application Energy
> > Telemetry) feature:
>
> Above seems incomplete. Could it be "two problems with usage of the rdt= kernel
> command line ..." or "two problems with the "rdt=" usage ..." or ...?
>
> > 1) If the user specifies both enabling and disabling of the same feature
> > (e.g. "rdt=energy,!energy") the request to enable should override the
> > request to disable (for consistency with other features).
> > 2) event_group::force_off is set true in all_regions_have_sufficient_rmid()
> > which will cause problems when AET features are enumerated on each mount
> > of the resctrl file system
>
> Two comments:
> - In general we should aim to make the changelog as specific as possible to
> prevent reader needing to do any deciphering. This makes the review easier
> and faster. So, please avoid general things like "will cause problems" that
> requires reader to figure out the problems (plural!) on their own. Instead
> just be specific about what the problems are.
> - Is (2) still a problem when (1) is fixed? An underlying question is: if an
> event group is successfully enumerated during resctrl mount, will it always
> enumerate with identical properties on every subsequent mount?
>
> >
> > Fix the first issue by checking event_group::force_on in enable_events().
>
> "the first issue" seems vague to me while the rest describes the code that can be
> seen from the patch. How about something like below to help describe what the code
> change accomplishes:
> Enumerate the event group if user space overrides any disable of the event group.
>
> >
> > Fix the other issue by dropping the assignment to event_group::force_off.
>
> "the other issue" is very vague. Here it will also help to be specific. Although
> per earlier comment the need for this fix is no longer clear to me?
>
> >
> > ---
> >
> >>
> >> Please note that, after the preparatory fixes that are expected to land separately,
> >> this will be the first patch of this series ... having this first patch just
> >> kick off with "what changes" without any context is a difficult way for a
> >> reviewer to start considering this piece of work.
> >>
> >>> This preserves current single-enumeration behaviour while preparing for
> >>> the upcoming per-mount enumeration, where latching force_off would
> >>> incorrectly suppress re-enumeration on subsequent mounts - even when the
> >>> user explicitly requested the feature via "rdt={feature}".
> >>
> >> I believe that user space should still expect that rdt= options behave
> >> consistently. So if user space for some reason provides: rdt=!mbm,mbm,!energy,energy
> >> on a system that supports both then it should not be the case that one is enabled and the
> >> other not.
> >>
> >> I this think that, for example, enable_events() should start with:
> >>
> >> if (e->force_off && !e->force_on)
> >> return false;
> >
> > OK.
> >
> >>
> >>> Signed-off-by: Tony Luck <tony.luck@intel.com>
> >>> ---
> >>> arch/x86/kernel/cpu/resctrl/intel_aet.c | 8 +++-----
> >>> 1 file changed, 3 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> >>> index 89b8b619d5d5..e2af700bca04 100644
> >>> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> >>> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> >>> @@ -60,8 +60,8 @@ struct pmt_event {
> >>> * data for all telemetry regions of type @pfname.
> >>> * Valid if the system supports the event group,
> >>> * NULL otherwise.
> >>> - * @force_off: True when "rdt" command line or architecture code disables
> >>> - * this event group due to insufficient RMIDs.
> >>> + * @force_off: True when "rdt" command line disables this event group
> >>> + * to avoid system limitations due to insufficient RMIDs.
> >>
> >> >From what I understand it is only architecture that can disable an event group
> >> "to avoid system limitations due to insufficient RMIDs" and this capability is removed
> >> in this patch. Above implies that this capability now needs to be implemented
> >> by userspace, which I do not think is accurate?
> >
> > Should I just drop the second line? The user may have various other reasons
> > why they want to disable this event group.
>
> Now that enable_events() starts with a check of event_group::force_on it seems that
> the existing behavior of event_group::force_off to also reflect architecture
> disable can be maintained?
OK. I'll drop this kerneldoc change and keep the setting of force_off = true
when there are insufficient RMIDs.
>
> > Combined with the code change you suggest above, this would describe the
> > use of these two fields. "force_off" disables, "force_on" enables (and
> > overrides any "force_off").
>
> I believe the "force_on" description below already accurately describes how it
> overrides an earlier disable.
>
> >
> >>
> >>> * @force_on: True when "rdt" command line overrides disable of this
> >>> * event group.
> >>> * @guid: Unique number per XML description file.
> >>> @@ -214,10 +214,8 @@ static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_f
> >>> if (!p->regions[i].addr)
> >>> continue;
> >>> tr = &p->regions[i];
> >>> - if (tr->num_rmids < e->num_rmid) {
> >>> - e->force_off = true;
> >>> + if (tr->num_rmids < e->num_rmid)
> >>> return false;
> >>> - }
> >>> }
> >>>
> >>> return true;
>
> Reinette
^ permalink raw reply
* Re: [PATCH v5 00/11] x86,fs/resctrl: Fix long-standing issues
From: Borislav Petkov @ 2026-06-10 19:21 UTC (permalink / raw)
To: Reinette Chatre
Cc: tony.luck, james.morse, Dave.Martin, babu.moger, tglx,
dave.hansen, x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
yu.c.chen, linux-kernel, patches
In-Reply-To: <28449c4f-49e6-4342-938c-4a3699e3d682@intel.com>
On Wed, Jun 10, 2026 at 10:48:06AM -0700, Reinette Chatre wrote:
> After considering the latest Sashiko feedback and deciding to stop adding
> new fixes to this series, I do think this series can now be considered as
> "settled". If there are no concerns with this series self then I would like
> to proceed for it to be considered for inclusion.
Ping me after the merge window pls. They'll have to catch the "next train" as
it is too late now.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply
* Re: [PATCH 6.6.y] usb: gadget: f_ncm: Fix net_device lifecycle with device_move
From: Carlos Llamas @ 2026-06-10 17:58 UTC (permalink / raw)
To: Sasha Levin
Cc: gregkh, stable, khtsai, patches, linux-kernel, raubcameo,
andrzej.p, balbi, kyungmin.park, linux-usb, Jianqiang kang
In-Reply-To: <20260608-stable-reply-0013@kernel.org>
On Mon, Jun 08, 2026 at 08:51:59PM -0400, Sasha Levin wrote:
> > [PATCH 6.6.y] usb: gadget: f_ncm: Fix net_device lifecycle with device_move
>
> I can't take this one (6.6 or 6.1) on its own. ec35c1969650 alone opens a
> userspace-reachable NULL deref in eth_get_drvinfo() that is later closed
> upstream by e002e92e88e1 ("usb: gadget: u_ether: Fix NULL pointer deref in
> eth_get_drvinfo"), so applying this commit by itself trades a UAF for a DoS.
>
> Please send a complete backport that also includes e002e92e88e124 (as the
> follow-up patch in the same series) for both 6.6.y and 6.1.y, and I'll queue
> them together.
I just ran this patch plus a cherry-pick of e002e92e88e1 that you
mentioned through Android's CI with no issues. Same goes with 6.1.y, so
I can send Jianqiang's backports with the fix added to them.
Cheers,
--
Carlos Llamas
^ permalink raw reply
* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
From: Reinette Chatre @ 2026-06-10 17:58 UTC (permalink / raw)
To: Luck, Tony
Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, Box, David E,
x86@kernel.org, Christoph Hellwig, linux-kernel@vger.kernel.org,
patches@lists.linux.dev
In-Reply-To: <SJ1PR11MB60838C247727AE7CC324BB25FC1A2@SJ1PR11MB6083.namprd11.prod.outlook.com>
Hi Tony,
On 6/10/26 10:24 AM, Luck, Tony wrote:
>> I see many "if (pmt_module)" checks ... intel_aet_pre_mount() even calls it twice. Are they all necessary?
>> It creates the impression that the PMT module can be yanked from AET at any time, something which
>> intel_aet_unregister_enumeration() seems to allow. I was hoping that there can be some guarantee
>> that if PMT is available during pre_mount() it will continue to be available at least until
>> unmount() completes.
>
> pmt_module is NULL if CONFIG_INTEL_PMT_TELEMETRY=y ... i.e. built-in to the kernel.
>
> In that case it obviously can't go away, and doesn't need module_get()/module_put().
> There's no special case for this. try_module_get() takes a fault on NULL dereference.
>
> When CONFIG_INTEL_PMT_TELEMETRY=m then the get/put should stop it going away
> (I tried rmmod while resctrl mounted and it fails to remove as expected).
Have you tried "unbind" via sysfs?
Even so, my comment was in response to the code you shared. Let me paste it
back and highlight what I meant with the "many pmt_module checks":
> /*
> * Track whether pmt_telemetry enumeration succeeded during mount for use
> * during unmount.
> */
> static bool pmt_in_use;
>
> bool intel_aet_pre_mount(void)
> {
> bool ret;
>
> guard(mutex)(&aet_register_lock);
> if (!get_feature || !put_feature)
> return false;
>
> if (pmt_module) {
Here is an "if (pmt_module)" check ... can it ever be false? If so
then the rest of this function becomes very confusing (more below ...)
> if (!try_module_get(pmt_module))
> return false;
> }
>
> ret = aet_get_events();
aet_get_events() can thus seemingly be called when pmt_module is unset?
>
> if (!ret) {
> if (pmt_module)
Can pmt_module be unset here?
> module_put(pmt_module);
> } else {
> pmt_in_use = true;
So pmt_in_use could be true if pmt_module is unset? Confusing, no?
> }
>
> return ret;
> }
>
> void intel_aet_unmount(void)
> {
> struct event_group **peg;
>
> guard(mutex)(&aet_register_lock);
> if (!pmt_in_use)
> return;
>
> for_each_event_group(peg) {
> if ((*peg)->pfg) {
> struct event_group *e = *peg;
>
> for (int j = 0; j < e->num_events; j++)
> resctrl_disable_mon_event(e->evts[j].id);
> put_feature((*peg)->pfg);
> (*peg)->pfg = NULL;
> }
> }
> if (pmt_module)
So this implies that pmt_module could be unset here ... if that is the
case then that means that the PMT module disappeared while resctrl was
mounted which is exactly what this work aims to prevent, no?
> module_put(pmt_module);
> pmt_in_use = false;
... if pmt_module was unset then how could pmt_in_use ever be true?
> }
Reinette
^ permalink raw reply
* Re: [PATCH v5 00/11] x86,fs/resctrl: Fix long-standing issues
From: Reinette Chatre @ 2026-06-10 17:48 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman, yu.c.chen,
linux-kernel, patches
In-Reply-To: <cover.1781029125.git.reinette.chatre@intel.com>
Hi Everybody,
On 6/9/26 2:02 PM, Reinette Chatre wrote:
> v4: https://lore.kernel.org/lkml/cover.1780456704.git.reinette.chatre@intel.com/
> v3: https://lore.kernel.org/lkml/cover.1779476724.git.reinette.chatre@intel.com/
> v2: https://lore.kernel.org/lkml/20260515193944.15114-1-tony.luck@intel.com/
> v1: https://lore.kernel.org/all/20260508182143.14592-1-tony.luck@intel.com/
>
> While reviewing the AET series [1] Sashiko reported a deadlock during mount,
> and a use-after-free when an L3 domain is removed during CPU offline. More issues
> were uncovered as fixes were developed and reviewed. While the goal is to
> fix all issues the races surrounding pseudo-locked regions are not yet
> solved and have been removed from this series (last appearance was in V3 of
> this series).
>
> Applies against tip/master to ensure it considers pending x86/cache changes
> as well as the lockdep_is_cpus_held() stubs available in smp/core.
>
> Changes since V4:
> - Add new fix to prevent out-of-bouds read when SNC is enabled and domain
> with busy RMID goes offline.
> - Add substitute for "is domain going offline" check to workers to avoid
> reading any event counters on soon-to-be-offline domain since its
> cpu_mask is empty and reading an event counter on an SNC enabled system
> depends on knowing a CPU associated with the domain.
>
> Changes since V3:
> - Drop majority of pseudo-locking fixes, only keep the double free/double
> list add fix.
> - Add patch to help document safe RCU list traversal.
> - See individual patches for detailed changes.
>
> [1] https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com
>
> Reinette Chatre (8):
> x86,fs/resctrl: Prevent out-of-bounds access while offlining CPU when
> SNC enabled
> x86,fs/resctrl: Document safe RCU list traversal
> fs/resctrl: Fix deadlock on errors during mount
> fs/resctrl: Prevent use-after-free in rdtgroup_kn_put()
> fs/resctrl: Fix double-add of pseudo-locked region's RMID to free list
> fs/resctrl: Prevent deadlock and use-after-free in info file handlers
> x86/resctrl: Ensure domain fully initialized before placed on RCU list
> fs/resctrl: Fix UAF from worker threads when domains are removed
>
> Tony Luck (3):
> fs/resctrl: Move functions to avoid forward references in subsequent
> fixes
> fs/resctrl: Free mon_data structures on rdt_get_tree() failure
> fs/resctrl: Fix use-after-free during unmount
Addressing Sashiko [2] review feedback here:
[PATCH v5 05/11] fs/resctrl: Fix use-after-free during unmount
Sashiko encountered a "Tool error" during review of this patch. It was able to
complete review of identical patch submitted as part of V4 [3].
Even though Sashiko encountered the "Tool error" it did report a new issue. To help
make progress with the existing fixes I would like to propose that adding new fixes
to this series be stopped. I will start a new series of fixes to address any new
reports.
[PATCH v5 07/11] fs/resctrl: Prevent use-after-free in rdtgroup_kn_put()
[PATCH v5 08/11] fs/resctrl: Fix double-add of pseudo-locked region's RMID to free list
The existing pseudo-locking related issues are known and have been dropped from this series
(last appearance was in V3) while trying to determine how to fix them.
[PATCH v5 11/11] fs/resctrl: Fix UAF from worker threads when domains are removed
Sashiko again (previously reported in V3) reported that this would trigger a lockdep splat
on MPAM. As described in response to previous report [4] this is a false positive since
MPAM does not support the software controller.
Sashiko also reports that "Does this exact ID matching and premature break miss multiple
L3 monitor domains on Sub-NUMA Clustering (SNC) systems?"
This is a false positive since the software controller cannot be used on SNC systems. For
confirmation, see:
commit ac20aa423052 ("x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster (SNC) systems")
After considering the latest Sashiko feedback and deciding to stop adding new fixes
to this series, I do think this series can now be considered as "settled".
If there are no concerns with this series self then I would like to proceed for it
to be considered for inclusion.
Thank you very much.
Reinette
[2] https://sashiko.dev/#/patchset/cover.1781029125.git.reinette.chatre%40intel.com
[3] https://sashiko.dev/#/patchset/cover.1780456704.git.reinette.chatre%40intel.com?part=4
[4] https://lore.kernel.org/lkml/9ea1986a-a88f-4224-b530-252e0f5cbfd0@intel.com/
^ permalink raw reply
* RE: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
From: Luck, Tony @ 2026-06-10 17:24 UTC (permalink / raw)
To: Chatre, Reinette
Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, Box, David E,
x86@kernel.org, Christoph Hellwig, linux-kernel@vger.kernel.org,
patches@lists.linux.dev
In-Reply-To: <8ec6b46e-dd18-491e-97f9-04cbba827370@intel.com>
> I see many "if (pmt_module)" checks ... intel_aet_pre_mount() even calls it twice. Are they all necessary?
> It creates the impression that the PMT module can be yanked from AET at any time, something which
> intel_aet_unregister_enumeration() seems to allow. I was hoping that there can be some guarantee
> that if PMT is available during pre_mount() it will continue to be available at least until
> unmount() completes.
pmt_module is NULL if CONFIG_INTEL_PMT_TELEMETRY=y ... i.e. built-in to the kernel.
In that case it obviously can't go away, and doesn't need module_get()/module_put().
There's no special case for this. try_module_get() takes a fault on NULL dereference.
When CONFIG_INTEL_PMT_TELEMETRY=m then the get/put should stop it going away
(I tried rmmod while resctrl mounted and it fails to remove as expected).
-Tony
^ permalink raw reply
page: next (older)
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox