public inbox for patches@lists.linux.dev
 help / color / mirror / Atom feed
From: "Luck, Tony" <tony.luck@intel.com>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: Fenghua Yu <fenghuay@nvidia.com>,
	Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
	Peter Newman <peternewman@google.com>,
	James Morse <james.morse@arm.com>,
	Babu Moger <babu.moger@amd.com>,
	"Drew Fustini" <dfustini@baylibre.com>,
	Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
	David E Box <david.e.box@intel.com>, <x86@kernel.org>,
	<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [PATCH v4 5/7] x86/resctrl: Resolve PMT and TPMI symbols at runtime
Date: Tue, 7 Apr 2026 11:13:37 -0700	[thread overview]
Message-ID: <adVJUSFFypXYp4qT@agluck-desk3> (raw)
In-Reply-To: <435073c9-02c7-4816-ad07-fdec09610114@intel.com>

Hi Reinette,

On Fri, Apr 03, 2026 at 05:56:35PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 3/30/26 2:43 PM, Tony Luck wrote:
> > The resctrl subsystem is always built-in. Currently, it has a direct
> > dependency on INTEL_PMT_TELEMETRY and INTEL_TPMI for Application Event
> > Telemetry (AET) features. This forces these drivers to be built-in as well,
> > even though they are logically separate.
> > 
> > Switch to using symbol_request() to resolve AET-related symbols at
> > runtime. This allows PMT and TPMI to be built as loadable modules.
> > 
> > Update the mount/unmount logic to manage these references:
> > - Acquire symbol references and pin the modules during resctrl mount.
> > - Ensure AET structures are cleaned up and events disabled on unmount.
> > - Release symbol references and unpin modules on unmount or if
> >   mounting fails.
> 
> If I remember correctly resctrl depends on the mount being done much later
> than driver initialization to ensure all PMT load and enumeration is
> complete by the time resctrl fs is mounted.
> Now it looks like these modules can be loaded dynamically on resctrl mount
> with the functions learning about PMT feature groups called right after.
> Considering the change in timing, is it guaranteed that if
> intel_pmt_get_regions_by_feature() is called right after module probe that
> the telemetry driver would be done with its enumeration?

You do remmeber correctly. Your concerns are valid. I'd seen that the
pmt_telemetry module was auto-loaded, but hadn't tracked how/when that
happens.

The sequence is the intel_vsec driver is auto-loaded based on the OOBMSM PCIe
device ID. That sets up an auxillary bus which pulls in the pmt_telemetry
module. Adding a debug pr_info() I saw that the probe routine for the
pmt discover module was called at kernel timestamp 42.720156 for socket 0
and 42.725646 for socket 1.

Adding a "resctrl" line to /etc/fstab attempts the mount at 39.667. Three
seconds too early. No PMT events are found, and code in this V4 version
of the patch series marks the system as AET_NOT_PRESENT and will never
look again :-(

I can drop the AET_NOT_PRESENT state so that a retry will succeed. I don't
see another fix other than to document this limitation.

Workarounds are:
1) Change the CONFIG to build pmt_telemetry into the kernel (where we
   are today, but haven't heard from Linux distros like Red Hat, SUSE etc.
   on whether this is acceptable.)
2) Delay mounting the resctrl file system.

> 
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/intel_aet.c | 93 ++++++++++++++++++++++---
> >  1 file changed, 83 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > index 743a1894fe9a..14b472106f52 100644
> > --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > @@ -23,6 +23,7 @@
> >  #include <linux/intel_vsec.h>
> >  #include <linux/io.h>
> >  #include <linux/minmax.h>
> > +#include <linux/module.h>
> >  #include <linux/printk.h>
> >  #include <linux/rculist.h>
> >  #include <linux/rcupdate.h>
> > @@ -289,6 +290,9 @@ static enum pmt_feature_id lookup_pfid(const char *pfname)
> >  	return FEATURE_INVALID;
> >  }
> >  
> > +static struct pmt_feature_group *(*get_feature)(enum pmt_feature_id id);
> > +static void (*put_feature)(struct pmt_feature_group *p);
> > +
> >  /*
> >   * Request a copy of struct pmt_feature_group for each event group. If there is
> >   * one, the returned structure has an array of telemetry_region structures,
> > @@ -300,7 +304,7 @@ static enum pmt_feature_id lookup_pfid(const char *pfname)
> >   * struct pmt_feature_group to indicate that its events are successfully
> >   * enabled.
> >   */
> > -bool intel_aet_get_events(void)
> > +static bool aet_get_events(void)
> >  {
> >  	struct pmt_feature_group *p;
> >  	enum pmt_feature_id pfid;
> > @@ -309,14 +313,14 @@ bool intel_aet_get_events(void)
> >  
> >  	for_each_event_group(peg) {
> >  		pfid = lookup_pfid((*peg)->pfname);
> > -		p = intel_pmt_get_regions_by_feature(pfid);
> > +		p = get_feature(pfid);
> >  		if (IS_ERR_OR_NULL(p))
> >  			continue;
> >  		if (enable_events(*peg, p)) {
> >  			(*peg)->pfg = p;
> >  			ret = true;
> >  		} else {
> > -			intel_pmt_put_feature_group(p);
> > +			put_feature(p);
> >  		}
> >  	}
> >  
> > @@ -325,27 +329,96 @@ bool intel_aet_get_events(void)
> >  
> >  void __exit intel_aet_exit(void)
> >  {
> > -	struct event_group **peg;
> > +}
> >  
> > -	for_each_event_group(peg) {
> > -		if ((*peg)->pfg) {
> > -			intel_pmt_put_feature_group((*peg)->pfg);
> > -			(*peg)->pfg = NULL;
> > -		}
> > +static bool get_pmt_references(void)
> > +{
> > +	get_feature = symbol_request(intel_pmt_get_regions_by_feature);
> > +	if (!get_feature)
> > +		return false;
> > +	put_feature = symbol_request(intel_pmt_put_feature_group);
> > +	if (!put_feature) {
> > +		symbol_put(intel_pmt_get_regions_by_feature);
> > +		get_feature = NULL;
> > +		return false;
> > +	}
> > +
> > +	return true;
> > +}
> > +
> > +static void put_pmt_references(void)
> > +{
> > +	if (get_feature) {
> > +		symbol_put(intel_pmt_get_regions_by_feature);
> > +		get_feature = NULL;
> > +	}
> > +	if (put_feature) {
> > +		symbol_put(intel_pmt_put_feature_group);
> > +		put_feature = NULL;
> >  	}
> >  }
> >  
> > +static enum {
> > +	AET_UNINITIALIZED,
> > +	AET_PRESENT,
> > +	AET_NOT_PRESENT
> > +} aet_state;
> > +
> >  bool intel_aet_pre_mount(void)
> >  {
> > -	return false; // Temporary stub
> > +	bool ret;
> > +
> > +	if (aet_state == AET_PRESENT)
> > +		return true;
> > +
> > +	if (aet_state == AET_NOT_PRESENT || !get_pmt_references())
> > +		return false;
> > +
> > +	ret = aet_get_events();
> > +
> > +	if (ret) {
> > +		aet_state = AET_PRESENT;
> > +	} else {
> > +		aet_state = AET_NOT_PRESENT;
> > +		put_pmt_references();
> > +	}
> > +
> > +	return ret;
> > +}
> 
> I think I am missing something here. This seems to build a new state
> machine on top of existing state tracking. 
> What if, instead, just do:

The state machine was built to cope with the spurious call to intel_aet_pre_mount()
in the case that user attempts to mount resctrl while it is already mounted.
I removed that issue with the refactor in the file system layer, but didn't
clean up here.

The only other part of this state machine was to avoid repeated calls to
enumerate telemetry events on platforms that don't support them. But given
the race to complete enumeration is lost on early mount, that seems to be
a bad idea too and can be dropped.

> 	aet_get_events() 
> 	{
> 
> 		for_each_event_group(peg) {
> 			if ((*peg)->pfg) /* This means "AET_PRESENT" ? */
> 				continue;
> 			...
> 		}
> 
> 	}
> 
> Similarly, get_feature and put_feature maintains state about
> the symbols.
Yes.

> > +
> > +static void aet_cleanup(void)
> > +{
> > +	struct event_group **peg;
> > +
> > +	if (aet_state == AET_PRESENT) {
> 
> I do not see why this check/state machine is needed, if enumeration was not
> done the loop should just do nothing because (*peg)->pfg will always be NULL. 
Yes.

> > +		for_each_event_group(peg) {
> > +			if ((*peg)->pfg) {
> > +				struct event_group *e = *peg;
> > +
> > +				for (int j = 0; j < e->num_events; j++)
> > +					resctrl_disable_mon_event(e->evts[j].id);
> 
> Can this be expected to clean up/reset all changes from enable_events()? For example,
> how about event_group::num_rmid and rdt_resource::resctrl_mon::num_rmid?

I don't clean these up. But I don't see that they would change from one mount to
the next.

> > +				put_feature((*peg)->pfg);
> > +				(*peg)->pfg = NULL;
> > +			}
> > +		}
> > +		put_pmt_references();
> 
> Similarly here, put_pmt_references() could just always be called and it will not
> do anything if there aren't references?
Yes.

> > +		aet_state = AET_UNINITIALIZED;
> > +	}
> >  }
> >  
> >  void intel_aet_mount_result(int ret)
> >  {
> > +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> > +
> > +	if (ret) {
> > +		r->mon_capable = false;
> > +		aet_cleanup();
> > +	}
> >  }
> >  
> >  void intel_aet_unmount(void)
> >  {
> > +	aet_cleanup();
> >  }
> 
> I am expecting intel_aet_mount_result() and intel_aet_unmount() to look the same and that
> intel_aet_mount_result() thus becomes unnecessary.

Agreed. Per your comments on the previous patch in the series. File system can just call
the architecture unmount function in the case that the mount failed. Action for
architecture code is same for an aborted mount and for an unmount that follows a
successful mount.

> Just setting r->mon_capable to false does not seem enough though ... how about all those
> monitoring domains that were created? This flow looks unexpected to me, what am I missing?

I'm missing the code to clean up the domains. Will add to next version.

> Reinette
> 

-Tony

  reply	other threads:[~2026-04-07 18:13 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-30 21:43 [PATCH v4 0/7] Allow AET to use PMT/TPMI as loadable modules Tony Luck
2026-03-30 21:43 ` [PATCH v4 1/7] platform/x86/intel/pmt: Export PMT enumeration functions as GPL Tony Luck
2026-04-04  0:00   ` Reinette Chatre
2026-04-06 18:07     ` David Box
2026-04-08  5:07   ` Christoph Hellwig
2026-04-08 17:01     ` Luck, Tony
2026-04-09  5:41       ` Christoph Hellwig
2026-03-30 21:43 ` [PATCH v4 2/7] x86/resctrl: Drop setting of event_group::force_off when insufficient RMIDs Tony Luck
2026-04-04  0:01   ` Reinette Chatre
2026-03-30 21:43 ` [PATCH v4 3/7] fs/resctrl: Add interface to disable a monitor event Tony Luck
2026-04-04  0:03   ` Reinette Chatre
2026-04-06 18:35     ` Luck, Tony
2026-04-06 21:13       ` Reinette Chatre
2026-04-07 18:40         ` Luck, Tony
2026-04-07 23:10           ` Reinette Chatre
2026-03-30 21:43 ` [PATCH v4 4/7] fs,x86/resctrl: Add architecture hooks for every mount/unmount Tony Luck
2026-04-04  0:52   ` Reinette Chatre
2026-04-06 20:35     ` Luck, Tony
2026-04-06 21:16       ` Reinette Chatre
2026-04-09 20:35         ` Luck, Tony
2026-04-10 15:16           ` Reinette Chatre
2026-04-10 18:59             ` Luck, Tony
2026-04-10 21:21               ` Reinette Chatre
2026-04-10 23:03                 ` Luck, Tony
2026-03-30 21:43 ` [PATCH v4 5/7] x86/resctrl: Resolve PMT and TPMI symbols at runtime Tony Luck
2026-04-04  0:56   ` Reinette Chatre
2026-04-07 18:13     ` Luck, Tony [this message]
2026-04-07 18:40       ` Reinette Chatre
2026-04-07 20:33         ` Luck, Tony
2026-03-30 21:43 ` [PATCH v4 6/7] x86/resctrl: Delete intel_aet_exit() Tony Luck
2026-03-30 21:43 ` [PATCH v4 7/7] x86/resctrl: Downgrade dependency of AET on INTEL_PMT Tony Luck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=adVJUSFFypXYp4qT@agluck-desk3 \
    --to=tony.luck@intel.com \
    --cc=Dave.Martin@arm.com \
    --cc=babu.moger@amd.com \
    --cc=david.e.box@intel.com \
    --cc=dfustini@baylibre.com \
    --cc=fenghuay@nvidia.com \
    --cc=james.morse@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maciej.wieczor-retman@intel.com \
    --cc=patches@lists.linux.dev \
    --cc=peternewman@google.com \
    --cc=reinette.chatre@intel.com \
    --cc=x86@kernel.org \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox