* [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
@ 2025-05-15 22:51 Babu Moger
From: Babu Moger @ 2025-05-15 22:51 UTC
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy


This series adds support for Assignable Bandwidth Monitoring Counters
(ABMC), also called the QoS RMID Pinning feature.

The series is written so that it is easier to support other assignable
features from different vendors.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on top of commit
92a09c47464d0 (tag: v6.15-rc5, tip/irq/merge) Linux 6.15-rc5
plus 
https://lore.kernel.org/lkml/20250515165855.31452-1-james.morse@arm.com/

These patches will clearly go in after James's resctrl FS/ARCH restructure,
so they are based on it in the hope of avoiding one review cycle due to the merge.

# Introduction

Users can create as many monitor groups as RMIDs supported by the hardware.
However, the bandwidth monitoring feature on AMD systems only guarantees that
RMIDs currently assigned to a processor will be tracked by hardware.
The counters of any other RMIDs which are no longer being tracked will be
reset to zero. The MBM event counters return "Unavailable" for the RMIDs
that are not tracked by hardware. So, only a limited number of groups can
give guaranteed monitoring numbers. With ever-changing configurations there
is no way to know definitively which of these groups are being tracked at a
given point in time. Users do not have the option to monitor a group or set
of groups for a certain period of time without worrying about counters being
reset in between.
    
The ABMC feature lets the user assign a hardware counter to an (RMID, event)
pair and monitor the bandwidth for as long as the counter is assigned. The
assigned RMID will be tracked by the hardware until the user unassigns it
manually. There is no need to worry about counters being reset during this
period. Additionally, the user can specify a bitmask identifying the specific
bandwidth types from the given source to track with the counter.

Without ABMC enabled, monitoring works in the current 'default' mode, without
the assignment option.

# History

An earlier implementation of ABMC had a dependency on BMEC (Bandwidth
Monitoring Event Configuration). Peter had concerns with that implementation
because it may not be compatible with ARM's MPAM.

Here are the threads discussing the concerns and the new interface that addresses them.
https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/

Here are the finalized requirements based on the discussion:

*   Remove the ABMC feature's dependency on BMEC.

*   Eliminate global assignment listing. The interface
    /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.

*   Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
    The configuration file names should be free-form, allowing users to create them as needed.

*   Perform assignment listing at the group level by introducing mbm_L3_assignments
    in each monitoring group. The listing should provide the following details:

    Event Configuration: Specifies the event configuration applied. This will be crucial
    when "mkdir" on event configuration is added in the future, leading to the creation
    of mon_data/mon_l3_*/<event configuration>.

    Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.

    Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).

*   Provide an option to enable or disable auto assignment when a new group is created.

This series tries to address all the requirements listed above.

# Implementation details

Create a generic interface to support user space assignment of the scarce
counters used for monitoring. The first user of the interface is ABMC, with the
option to expand usage to "soft-ABMC" and MPAM counters in the future.

The feature adds the following interface files:

/sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
monitoring features supported. The enclosed brackets indicate which
feature is enabled.

/sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
counters available for assignment.

/sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
counters free in each domain.

/sys/fs/resctrl/info/L3_MON/counter_configs : Directory to hold the counter configuration.

/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter : Default configuration
for MBM total events.

/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter : Default configuration
for MBM local events.

/sys/fs/resctrl/mbm_L3_assignments: Interface to list or modify assignment states on each group.
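
As a rough sketch of how a user space tool might consume two of these files,
the Python below parses text in the formats printed in examples a and c of
this cover letter. The helper names (enabled_assign_mode,
parse_available_cntrs) are hypothetical and not part of the series. The code
works on plain strings, so it can be tried without a resctrl mount.

```python
def enabled_assign_mode(text):
    """Return the mode name enclosed in brackets, or None if none is marked."""
    for line in text.splitlines():
        name = line.strip()
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def parse_available_cntrs(text):
    """Turn '0=30;1=30' into {0: 30, 1: 30} (domain id -> free counters)."""
    result = {}
    for item in text.strip().split(";"):
        if not item:
            continue  # tolerate empty input or a trailing separator
        dom, count = item.split("=")
        result[int(dom)] = int(count)
    return result


if __name__ == "__main__":
    # Sample strings copied from the example output in this cover letter.
    print(enabled_assign_mode("[mbm_cntr_assign]\ndefault\n"))
    print(parse_available_cntrs("0=30;1=30\n"))
```

On a real system the input strings would come from reading
info/L3_MON/mbm_assign_mode and info/L3_MON/available_mbm_cntrs.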

# Examples

a. Check if ABMC support is available.
	# mount -t resctrl resctrl /sys/fs/resctrl/

	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	[mbm_cntr_assign]
	default

	ABMC feature is detected and it is enabled.

b. Check how many ABMC counters are available. 

	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
	32

c. Check how many ABMC counters are available in each domain.

	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
	0=30;1=30

d. Check default counter configuration.

	# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter 
	local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
	local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all

	# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter 
	local_reads, local_non_temporal_writes, local_reads_slow_memory

e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
   to list and modify any group's monitoring states.

	The list is displayed in the following format:

        <Event configuration>:<Domain id>=<Assignment type>

        Event configuration: A valid event configuration listed in the
        /sys/fs/resctrl/info/L3_MON/counter_configs directory.

        Domain ID: A valid domain ID number.

        Assignment types:

        _ : No event configuration assigned

        e : Event configuration assigned in exclusive mode

	To list the default group states:
	# cat /sys/fs/resctrl/mbm_L3_assignments
	mbm_total_bytes:0=e;1=e
	mbm_local_bytes:0=e;1=e

	To unassign the configuration of mbm_total_bytes on domain 0:
	# echo "mbm_total_bytes:0=_" > mbm_L3_assignments
	# cat mbm_L3_assignments
	mbm_total_bytes:0=_;1=e
	mbm_local_bytes:0=e;1=e

	To unassign the mbm_total_bytes configuration on all domains:
	# echo "mbm_total_bytes:*=_" > mbm_L3_assignments
	# cat mbm_L3_assignments
	mbm_total_bytes:0=_;1=_
	mbm_local_bytes:0=e;1=e

	To assign the mbm_total_bytes configuration on all domains in exclusive mode:
	# echo "mbm_total_bytes:*=e" > mbm_L3_assignments
	# cat mbm_L3_assignments
	mbm_total_bytes:0=e;1=e
	mbm_local_bytes:0=e;1=e

g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
   There is no change in reading the events with ABMC. If the event is unassigned
   when reading, then the read will come back as "Unassigned".
	
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	779247936
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
	765207488
	
h. Check the default event configurations.

	#cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
	local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
	local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all

	#cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
	local_reads, local_non_temporal_writes, local_reads_slow_memory

i. Change the event configuration for mbm_local_bytes.

	# echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" > \
	  /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter

	#cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
	local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads
	
        This will update the assignments where mbm_local_bytes are configured.
	
j. Now read the total event again. The first read may come back with "Unavailable"
   status. The subsequent read of mbm_total_bytes will display only the read events.
	
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	Unavailable
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	314101

k. Users will have the option to go back to 'default' mbm_assign_mode if required.
   This can be done using the following command. Note that switching the
   mbm_assign_mode will reset all the MBM counters of all resctrl groups.

	# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	mbm_cntr_assign
	[default]
	
l. Unmount the resctrl
	 
	#umount /sys/fs/resctrl/
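
The mbm_L3_assignments format from example e can be handled in user space
along these lines. This is only a sketch in Python, assuming each listed line
has the form <event configuration>:<domain>=<type>[;<domain>=<type>...] as in
the output shown above. The function names are hypothetical, and only the '_'
and 'e' assignment types used in the examples are accepted.

```python
VALID_TYPES = {"_", "e"}  # assignment types that appear in the examples


def parse_assignments(text):
    """Parse listing text into {event_config: {domain_id: type}}."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        config, _, domains = line.partition(":")
        per_domain = {}
        for item in domains.split(";"):
            dom, _, kind = item.partition("=")
            if kind not in VALID_TYPES:
                raise ValueError(f"unknown assignment type in {item!r}")
            per_domain[int(dom)] = kind
        groups[config] = per_domain
    return groups


def format_request(config, kind, domain="*"):
    """Build the string to write, e.g. 'mbm_total_bytes:*=_'."""
    if kind not in VALID_TYPES:
        raise ValueError(f"unknown assignment type {kind!r}")
    return f"{config}:{domain}={kind}"
```

For instance, format_request("mbm_total_bytes", "_") builds the same string
that example e writes to unassign mbm_total_bytes on all domains.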
---
v13:
   Removed the 2 BMEC related patches which were in the previous series.
   They were related to an optimization which can be done later.

   Patches are created on top of FS/ARCH restructure. So, major changes
   are due to FS/ARCH restructure. The files are split between
   arch/x86/kernel/cpu/resctrl/ and fs/resctrl/. So, functions
   are moved between these files accordingly.

   Added fflag RFTYPE_RES_CACHE for mbm_assign_mode, num_mbm_cntrs, available_mbm_cntrs.

   Removed the references to "mbm_assign_control".
  
   Moved resctrl_arch_config_cntr() prototype to include/linux/resctrl.h.
   Changed resctrl_arch_config_cntr() to return void instead of int to simplify
   a few call sequences.

   Added the event configuration details inside the evt_list in monitor domains.
   This avoids the need for a new structure mbm_assign_config.

   Passed evtid to functions resctrl_alloc_config_cntr() and resctrl_assign_cntr_event().
   Event configuration value can be easily obtained from mon_evt list.

   Added new patch to pass the entire struct rdtgroup to __mon_event_count(),
   mbm_update(), and related functions. We can easily get RMID, CLOSID, etc. from rdtgroup.

   Added new function __cntr_id_read_phys() to handle ABMC event reading.

   Added a new patch to hide BMEC related files when mbm_cntr_assign mode is enabled.
  
   Added the call resctrl_init_evt_configuration() to setup the event configuration during init.

   Plus a few other commit message and user doc updates.

   Removed Reviewed-by from a few patches as the patches have changed due to the FS/ARCH restructure.

   Let me know if I missed something.

v12:
   This version is more of an RFC series with a new interface.

   Removed the Reviewed-by tag on a few patches where the patch has changed.

   Moved BMEC related patches (1 and 2) to the beginning of the series.
   Removed the ABMC feature's dependency on BMEC.

   Removed the unnecessary initialization of the mon_config_info structure.
   Changed to use wrmsrl instead of wrmsr to address the comment below.
   https://lore.kernel.org/lkml/0fc8dbd4-07d8-40bd-8eec-402b48762807@zytor.com/

   Fixed the conflicts due to recent changes in rdt_resource data structure.
   Added new mbm_cfg_mask field to resctrl_mon.
   
   Added the code to reset arch state inside _resctrl_abmc_enable().

   Added the check CONFIG_RESCTRL_ASSIGN_FIXED to take care of arm platforms.
   This will be defined only in arm and not in x86.

   Changed the code to display the max supported monitoring counters in each domain.
   
   Fixed the struct mbm_cntr_cfg code documentation.
   Moved the struct mbm_cntr_cfg definition to resctrl/internal.h as suggested by James.

   Replaced seq_puts(s, ";") with seq_putc(s, ';');
   Added missing rdt_last_cmd_clear() in resctrl_available_mbm_cntrs_show().

   Added the check to reset the architecture-specific state only when assign is requested.

   Added evt_cfg as the parameter to resctrl_arch_config_cntr() as the user will
   be passing the event configuration from /info/L3_MON/event_configs/.

   Changed the check in resctrl_alloc_config_cntr() to reduce the indentation.
   Fixed the handling error on first failure while assigning.
   Added new parameter event configuration (evt_cfg) to get the event configuration from user space.

   Added the support for reading ABMC counters. This is a fairly involved change and affects a lot of code.

   New patch to support event configurations via new counter_configs method.

   Removed mbm_cntr_reset() as it is not required while removing the group.

   Added new patch to handle auto assign on group creation ("mbm_assign_on_mkdir")

   Added a couple of patches to add the interface for "mbm_L3_assignments" on each mon group.

   Introduced mbm_cntr_free_all() and resctrl_reset_rmid_all() to clear counters and
   non-architectural states when monitor mode is changed.
   https://lore.kernel.org/lkml/b60b4f72-6245-46db-a126-428fb13b6310@intel.com/

   Moved the resctrl_arch_mbm_cntr_assign_set_one to domain_add_cpu_mon().

   Patches 17, 18, 19, 20, 21, 23, 24 are completely new to address the new interface requirement.

v11:
   The commit 2937f9c361f7a ("x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags")
   is already merged. Removed from the series.
   
   Resolved minor conflicts due to code displacement in latest code.
 
   Moved the monitoring related calls to monitor.c file when possible.
   Moved some of the changes from include/linux/resctrl.h to arch/x86/kernel/cpu/resctrl/internal.h
   as requested by Reinette. These changes will be moved back when arch and non-arch code is separated.
   
   Renamed rdtgroup_mbm_assign_mode_show() to resctrl_mbm_assign_mode_show().
   Renamed rdtgroup_num_mbm_cntrs_show() to resctrl_num_mbm_cntrs_show().

   Moved the mon_config_info structure definition to internal.h.
   Moved resctrl_arch_mon_event_config_get() and resctrl_arch_mon_event_config_set()
   to monitor.c file.

   Moved resctrl_arch_assign_cntr() and resctrl_abmc_config_one_amd() to monitor.c.
   Added the code to reset the arch state in resctrl_arch_assign_cntr().
   Also removed resctrl_arch_reset_rmid() inside IPI as the counters are reset from the callers.

   Renamed rdtgroup_assign_cntr_event() to resctrl_assign_cntr_event().
   Refactored the resctrl_assign_cntr_event().
   Added functionality to exit on the first error during assignment.
   Simplified mbm_cntr_free().
   Removed the function mbm_cntr_assigned(). Will be using mbm_cntr_get() to
   figure out if the counter is assigned or not.
   
   Renamed rdtgroup_unassign_cntr_event() to resctrl_unassign_cntr_event().
   Refactored the resctrl_unassign_cntr_event().

   Moved mbm_cntr_reset() to monitor.c.
   Added code to reset non-architectural state in mbm_cntr_reset().
   Added missing rdtgroup_unassign_cntrs() calls on failure path.

   Domain can be NULL with SNC support so moved the unassign check in rdtgroup_mondata_show().

   Renamed rdtgroup_mbm_assign_mode_write() to resctrl_mbm_assign_mode_write().
   Added more details in resctrl.rst about mbm_cntr_assign mode.
   Re-arranged the text in resctrl.rst file in section mbm_cntr_assign.

   Moved resctrl_arch_mbm_cntr_assign_set_one() to monitor.c

   Added non-arch RMID reset in mbm_config_write_domain().
   Removed resctrl_arch_reset_rmid() call in resctrl_abmc_config_one_amd(). Not required
   as the reset of arch and non-arch RMID counters is done from the callers. It simplifies the IPI code.

   Fixed printing the separator after each domain while listing the group assignments.
   Renamed rdtgroup_mbm_assign_control_show to resctrl_mbm_assign_control_show().

   Fixed the static check warning with initializing dom_id in resctrl_process_flags()

   Added change log in each patch for specific changes.

v10:
   Major change is related to domain specific assignment.
   Added struct mbm_cntr_cfg inside mon domains. This will handle
   the domain specific assignments as discussed in the thread below.
   https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
   I did not see the need to add cntr_id in mbm_state structure. Not used in the code.
   Following patches take care of these changes.
   Patch 12, 13, 15, 16, 17, 18.
   
   Added __init attribute to cache_alloc_hsw_probe(). Followed function
   prototype rules (preferred order is storage class before return type).
   
   Moved the mon_config_info structure definition to resctrl.h
   
   Added call resctrl_arch_reset_rmid() to reset the RMID in the domain inside IPI call
   resctrl_abmc_config_one_amd.
   
   SMP and non-SMP call support is not required in resctrl_arch_config_cntr with new
   domain specific assign approach/data structure.
   
   Assigned the counter before exposing the event files.
   Moved the call rdtgroup_assign_cntrs() inside mkdir_rdt_prepare_rmid_alloc().
   This is called both CNTR_MON and MON group creation.
   
   Call mbm_cntr_reset() when unmounted to clear all the assignments.
   
   Fixed the issue with finding the domain in multiple iterations in rdtgroup_process_flags().
   
   Printed full error message with domain information when assign fails.
   
   Taken care of other text comments in all the patches. Patch specific changes are in each patch.
   
   If I missed something, please point it out; it is not intentional.

v9:
   Patch 14 is a new addition. 
   Major change in patch 24.
   Moved the fix patch to address __init attribute to the beginning of the series.
   Fixed all the call sequences. Added additional Fixes tags.

   Added Reviewed-by where applicable.

   Took care of couple of minor merge conflicts with latest code.
   Re-ordered the MSR in couple of instances.
   Added available_mbm_cntrs (patch 14) to print the number of counters in a domain.

   Used MBM_EVENT_ARRAY_INDEX macro to get the event index.
   Introduced rdtgroup_cntr_id_init() to initialize the cntr_id

   Introduced new function resctrl_config_cntr to assign the counter, update
   the bitmap and reset the architectural state.
   Taken care of error handling(freeing the counter) when assignment fails.
  
   Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to return void.
   Updated couple of rdtgroup_unassign_cntrs() calls properly.

   Fixed problem changing the mode to mbm_cntr_assign mode when it is
   not supported. Added extra checks to detect if systems supports it.
   
   https://lore.kernel.org/lkml/03b278b5-6c15-4d09-9ab7-3317e84a409e@intel.com/
   As discussed in the above comment, introduced resctrl_mon_event_config_set to
   handle IPI. But sending another IPI inside IPI causes problem. Kernel
   reports SMP warning. So, introduced resctrl_arch_update_cntr() to send the
   command directly.

   Fixed handling of the special cases '//0=' and '//'.
   Removed extra strstr() call in rdtgroup_mbm_assign_control_write().
   Added generic failure text when assignment operation fails.
   Corrected user documentation format texts.

v8:
  Patches are getting into the final stages.
  A couple of changes in Patch 8, Patch 19 and Patch 23.
  Most of the other changes are related to rename and text message updates.

  Details are in each patch. Here is the summary.

  Added __init attribute to dom_data_init() in patch 8/25.
  Moved the mbm_cntrs_init() and mbm_cntrs_exit() functionality inside
  dom_data_init() and dom_data_exit() respectively.

  Renamed resctrl_mbm_evt_config_init() to arch_mbm_evt_config_init()
  Renamed resctrl_arch_event_config_get() to resctrl_arch_mon_event_config_get().
          resctrl_arch_event_config_set() to resctrl_arch_mon_event_config_set().

  Rename resctrl_arch_assign_cntr to resctrl_arch_config_cntr.
  Renamed rdtgroup_assign_cntr() to rdtgroup_assign_cntr_event().
  Added the code to return the error if rdtgroup_assign_cntr_event fails.
  Moved definition of MBM_EVENT_ARRAY_INDEX to resctrl/internal.h.
  Renamed rdtgroup_mbm_cntr_is_assigned to mbm_cntr_assigned_to_domain
  Added return error handling in resctrl_arch_config_cntr().
  Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
  Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
  Fixed the problem with unassigning the child MON groups of CTRL_MON group.
  Reset the internal counters after mbm_cntr_assign mode is changed.
  Renamed rdtgroup_mbm_cntr_reset() to mbm_cntr_reset()
  Renamed resctrl_arch_mbm_cntr_assign_configure to
            resctrl_arch_mbm_cntr_assign_set_one.

  Used the same IPI as event update to modify the assignment.
  Could not do it the way we discussed in the thread.
  https://lore.kernel.org/lkml/f77737ac-d3f6-3e4b-3565-564f79c86ca8@amd.com/
  Needed to figure out the event type to update the configuration.

  Moved unassign first and assign during the assign modification.
  Assign none "_" takes priority. Cannot be mixed with other flags.
  Updated the documentation and .rst file format. htmldoc looks ok.

v7:
   Major changes are related to the separation of FS and arch code.
   Changed a few interface names based on feedback.
   Here is the summary; each patch lists the changes specific to that patch.

   Removed WARN_ON for num_mbm_cntrs. Decided to dynamically allocate the bitmap.
   WARN_ON is not required anymore.
 
   Renamed the function resctrl_arch_get_abmc_enabled() to resctrl_arch_mbm_cntr_assign_enabled().

   Merged resctrl_arch_mbm_cntr_assign_enable, resctrl_arch_mbm_cntr_assign_disable
   and renamed to resctrl_arch_mbm_cntr_assign_set(). Passed the struct rdt_resource
   to these functions.

   Removed resctrl_arch_reset_rmid_all() from arch code. This will be done from the FS caller.

   Updated the descriptions/commit log in resctrl.rst to generic text. Removed ABMC references.
   Renamed mbm_mode to mbm_assign_mode.
   Renamed mbm_control to  mbm_assign_control.
   Introduced mutex lock in rdtgroup_mbm_mode_show().
 
   The 'legacy' mode is called 'default' mode. 

   Removed the static allocation and now allocating bitmap mbm_cntr_free_map dynamically.

   Merged rdtgroup_assign_cntr(), rdtgroup_alloc_cntr() into one.
   Merged rdtgroup_unassign_cntr(), rdtgroup_free_cntr() into one.
   
  Added struct rdt_resource to the interface functions resctrl_arch_assign_cntr ()
  and resctrl_arch_unassign_cntr().
  Rename rdtgroup_abmc_cfg() to resctrl_abmc_config_one_amd().
   
  Added a new patch to fix counter assignment on event config changes.

  Removed the references of ABMC from user interfaces.

  Simplified the parsing (strsep(&token, "//")) in rdtgroup_mbm_assign_control_write().
  Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.

  Thomas Gleixner asked us to update  https://gitlab.com/x86-cpuid.org/x86-cpuid-db. 
  It needs internal approval. We are working on it.

v6:
  We still need to finalize a few interface details on mbm_assign_mode and mbm_assign_control
  in case of ABMC and Soft-ABMC. We can continue the discussion with this series.

  Added support for domain-id '*' to update all the domains at once.
  Fixed assign interface to allocate the counter if counter is
  not assigned.   
  Fixed unassign interface to free the counter if the counter is not
  assigned in any of the domains.

  Renamed abmc_capable to mbm_cntr_assignable.

  Renamed abmc_enabled to mbm_cntr_assign_enabled.
  Used msr_set_bit and msr_clear_bit for msr updates.
  Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
  Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().

  Changed the display name from num_cntrs to num_mbm_cntrs.

  Removed the variable mbm_cntrs_free_map_len. This is not required.
  Removed the call mbm_cntrs_init() in arch code. This needs to be done at higher level.
  Used DECLARE_BITMAP to initialize mbm_cntrs_free_map.
  Removed unused config value definitions.

  Introduced mbm_cntr_map to track counters at domain level. With this
  we don't need to send an MSR read to read the counter configuration.

  Separated all the counter id management to upper level in FS code.

  Added checks to detect "Unassigned" before reading the RMID.

  More details in each patch.

v5:
  Rebase changes (because of SNC support)

  Interface changes.
   /sys/fs/resctrl/mbm_assign to /sys/fs/resctrl/mbm_assign_mode.
   /sys/fs/resctrl/mbm_assign_control to /sys/fs/resctrl/mbm_assign_control.

  Added few arch specific routines.
  resctrl_arch_get_abmc_enabled.
  resctrl_arch_abmc_enable.
  resctrl_arch_abmc_disable.

  Few renames
   num_cntrs_free_map -> mbm_cntrs_free_map
   num_cntrs_init -> mbm_cntrs_init
   arch_domain_mbm_evt_config -> resctrl_arch_mbm_evt_config

  Introduced resctrl_arch_event_config_get and
    resctrl_arch_event_config_set() to update event configuration.

  Removed mon_state field mongroup. Added MON_CNTR_UNSET to initialize counters.

  Renamed ctr_id to cntr_id for the hardware counter.
 
  Report "Unassigned" in case the user attempts to read the events without assigning the counter.
  
  ABMC is enabled during the boot up. Can be enabled or disabled later.

  Fixed opcode and flags combination.
    "=_" is valid.
    "-_" and "+_" are not valid.

  Addressed all the comments as far as I know. If I missed something, it is not intentional.

v4: 
  Main change is domain specific event assignment.
  Kept the ABMC feature as a default.
  Dynamic switching between ABMC and mbm_legacy is still allowed.
  We are still not clear about mount option.
  Moved the monitoring related data in resctrl_mon structure from rdt_resource.
  Fixed the display of legacy and ABMC mode.
  Used bitmap APIs when possible.
  Removed event configuration read from MSRs. We can use the
  internal saved data (patch 12).
  Added more comments about L3_QOS_ABMC_CFG MSR.
  Added IPIs to read the assignment status for each domain (patch 18 and 19)
  More details in each patch.

v3:
   This series adds the support for global assignment mode discussed in
   the thread. https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
   Removed the individual assignment mode and included the global assignment interface.
   Added following interface files.
   a. /sys/fs/resctrl/info/L3_MON/mbm_assign
      Used for displaying the current assignment mode and switch between
      ABMC and legacy mode.
   b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
      Used for listing the groups' assignment mode and modifying the assignment states.
   c. Most of the changes are related to the new interface.
   d. Addressed the comments from Reinette, James and Peter.
   e. Hope I have addressed most of the major feedbacks discussed. If I missed
      something then it is not intentional. Please feel free to comment.
   f. Sending this as an RFC as per Reinette's comment. So, this is still open
      for discussion.

v2:
   a. Major change is the way ABMC is enabled. Earlier, user needed to remount
      with -o abmc to enable ABMC feature. Removed that option now.
      Now users can enable ABMC by "echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
     
   b. Added new word 21 to x86/cpufeatures.h.

   c. Display unsupported if user attempts to read the events when ABMC is enabled
      and event is not assigned.

   d. Display monitor_state as "Unsupported" when ABMC is disabled.
  
   e. Text updates and rebase to latest tip tree (as of Jan 18).
 
   f. This series is still work in progress. I am yet to hear from ARM developers. 

--------------------------------------------------------------------------------------

Previous revisions:
v12: https://lore.kernel.org/lkml/cover.1743725907.git.babu.moger@amd.com/
v11: https://lore.kernel.org/lkml/cover.1737577229.git.babu.moger@amd.com/
v10: https://lore.kernel.org/lkml/cover.1734034524.git.babu.moger@amd.com/
v9: https://lore.kernel.org/lkml/cover.1730244116.git.babu.moger@amd.com/
v8: https://lore.kernel.org/lkml/cover.1728495588.git.babu.moger@amd.com/
v7: https://lore.kernel.org/lkml/cover.1725488488.git.babu.moger@amd.com/
v6: https://lore.kernel.org/lkml/cover.1722981659.git.babu.moger@amd.com/
v5: https://lore.kernel.org/lkml/cover.1720043311.git.babu.moger@amd.com/
v4: https://lore.kernel.org/lkml/cover.1716552602.git.babu.moger@amd.com/
v3: https://lore.kernel.org/lkml/cover.1711674410.git.babu.moger@amd.com/  
v2: https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
v1: https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/

Babu Moger (27):
  x86/cpufeatures: Add support for Assignable Bandwidth Monitoring
    Counters (ABMC)
  x86/resctrl: Add ABMC feature in the command line options
  x86/resctrl: Consolidate monitoring related data from rdt_resource
  x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  x86/resctrl: Add support to enable/disable AMD ABMC feature
  x86/resctrl: Introduce the interface to display monitor mode
  x86/resctrl: Introduce interface to display number of monitoring
    counters
  x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at
    domain
  x86/resctrl: Introduce interface to display number of free MBM
    counters
  x86/resctrl: Add data structures and definitions for ABMC assignment
  x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter
    with ABMC
  x86/resctrl: Introduce event configuration modes
  x86/resctrl: Add the functionality to assign MBM events
  x86/resctrl: Add the functionality to unassign MBM events
  x86/resctrl: Report 'Unassigned' for MBM events in mbm_cntr_assign
    mode
  x86/resctrl: Pass entire struct rdtgroup rather than passing
    individual members
  x86/resctrl: Add the support for reading ABMC counters
  x86/resctrl: Add definitions for MBM event configuration
  x86/resctrl: Add event configuration directory under info/L3_MON/
  x86/resctrl: Provide interface to update the event configurations
  x86/resctrl: Introduce mbm_assign_on_mkdir to configure assignments
  x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is
    enabled
  x86/resctrl: Introduce mbm_L3_assignments to list assignments in a
    group
  x86/resctrl: Introduce the interface to modify assignments in a group
  x86/resctrl: Hide the BMEC related files when mbm_cnt_assign is
    enabled
  x86/resctrl: Introduce the interface to switch between monitor modes
  x86/resctrl: Configure mbm_cntr_assign mode if supported

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/filesystems/resctrl.rst         | 188 +++++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   2 +
 arch/x86/kernel/cpu/resctrl/core.c            |  13 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  47 ++
 arch/x86/kernel/cpu/resctrl/monitor.c         | 176 +++-
 arch/x86/kernel/cpu/scattered.c               |   1 +
 fs/resctrl/ctrlmondata.c                      |  14 +
 fs/resctrl/internal.h                         |  37 +-
 fs/resctrl/monitor.c                          | 309 ++++++-
 fs/resctrl/rdtgroup.c                         | 768 +++++++++++++++++-
 include/linux/resctrl.h                       |  74 +-
 include/linux/resctrl_types.h                 |  11 +
 15 files changed, 1577 insertions(+), 68 deletions(-)

-- 
2.34.1


* [PATCH v13 01/27] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-22 20:51   ` Reinette Chatre
  2025-05-15 22:51 ` [PATCH v13 02/27] x86/resctrl: Add ABMC feature in the command line options Babu Moger
                   ` (27 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Users can create as many monitor groups as there are RMIDs supported by the
hardware. However, the bandwidth monitoring feature on AMD systems only
guarantees that RMIDs currently assigned to a processor will be tracked by
hardware. The counters of any other RMIDs that are no longer being tracked
will be reset to zero. The MBM event counters return "Unavailable" for the
RMIDs that are not tracked by hardware. So only a limited number of groups
can provide guaranteed monitoring numbers. With ever-changing
configurations there is no way to know definitively which of these groups
are being tracked at a given point in time. Users do not have the option to
monitor a group or set of groups for a certain period of time without
worrying about the RMID being reset in between.

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as it is
assigned. The assigned RMID will be tracked by the hardware until the user
unassigns it manually. There is no need to worry about counters being reset
during this period. Additionally, the user can specify a bitmask
identifying the specific bandwidth types from the given source to track
with the counter.

Without ABMC enabled, monitoring works in the current mode, without the
assignment option.

The Linux resctrl subsystem provides an interface that allows monitoring of
up to two memory bandwidth events per group, selected from a combination of
available total and local events. When ABMC is enabled, two events will be
assigned to each group by default, in line with the current interface
design. Users will also have the option to configure which types of memory
transactions are counted by these events.

Due to the limited number of available counters (32), users may quickly
exhaust the available counters. If the system runs out of assignable ABMC
counters, the kernel will report an error. In such cases, users will need
to unassign one or more active counters to free up counters for new
assignments. The interface will provide options to assign or unassign
events through the group-specific interface file.
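
The exhaustion behaviour described above can be modelled with a small
bitmap allocator. This is only an illustrative userspace sketch, not the
code from this series: cntr_alloc(), cntr_free(), cntr_map and NUM_CNTRS
are hypothetical names, the pool size of 32 is taken from the text above,
and returning -ENOSPC on exhaustion is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_CNTRS 32	/* counter count quoted in the commit log */

/* One bit per counter; a set bit means the counter is assigned. */
static uint32_t cntr_map;

/* Hypothetical: return the lowest free counter ID, or -ENOSPC when exhausted. */
static int cntr_alloc(void)
{
	for (int id = 0; id < NUM_CNTRS; id++) {
		if (!(cntr_map & (UINT32_C(1) << id))) {
			cntr_map |= UINT32_C(1) << id;
			return id;
		}
	}
	return -ENOSPC;
}

/* Hypothetical: release a counter so it can be assigned again. */
static void cntr_free(int id)
{
	assert(id >= 0 && id < NUM_CNTRS);
	cntr_map &= ~(UINT32_C(1) << id);
}
```

Once all 32 IDs are handed out, a further cntr_alloc() fails until some
caller releases a counter with cntr_free().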

The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
Bits Description
5    ABMC (Assignable Bandwidth Monitoring Counters)
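
As a rough userspace illustration of the detection described above (not
part of this patch), the bit test can be sketched as follows. The helper
names ebx_has_abmc() and cpu_has_abmc() are hypothetical;
__get_cpuid_count() comes from the GCC/Clang <cpuid.h> header, and the
query is only compiled on x86 builds.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Bit position of ABMC in CPUID Fn8000_0020 EBX, subleaf 0 (from the commit log). */
#define ABMC_CPUID_BIT 5

/* Hypothetical helper: decode the ABMC bit from a raw EBX value. */
static bool ebx_has_abmc(uint32_t ebx)
{
	return (ebx >> ABMC_CPUID_BIT) & 1u;
}

/* Hypothetical helper: query the running CPU; false if the leaf is absent. */
static bool cpu_has_abmc(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(0x80000020, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ebx_has_abmc(ebx);
#else
	return false;
#endif
}
```

Keeping the decode step separate from the CPUID query lets the bit test be
checked against fixed EBX values on any machine.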

The feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
Note: Checkpatch checks/warnings are ignored to maintain coding style.

v13: Updated the commit log with Linux interface details.

v12: Removed the dependency on X86_FEATURE_BMEC.
     Removed the Reviewed-by tag as patch has changed.

v11: No changes.

v10: No changes.

v9: Took care of a couple of minor merge conflicts. No other changes.

v8: No changes.

v7: Removed "" from feature flags. Not required anymore.
    https://lore.kernel.org/lkml/20240817145058.GCZsC40neU4wkPXeVR@fat_crate.local/

v6: Added Reinette's Reviewed-by. Moved the Checkpatch note below ---.

v5: Minor rebase change and subject line update.

v4: Changes because of rebase. Feature word 21 has a few more additions now.
    Changed the text to "tracked by hardware" instead of active.

v3: Change because of rebase. Actual patch did not change.

v2: Added dependency on X86_FEATURE_BMEC.
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 2 ++
 arch/x86/kernel/cpu/scattered.c    | 1 +
 3 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6c2c152d8a67..d5c14dc678df 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -481,6 +481,7 @@
 #define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
 #define X86_FEATURE_AMD_WORKLOAD_CLASS	(21*32 + 7) /* Workload Classification */
 #define X86_FEATURE_PREFER_YMM		(21*32 + 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_ABMC		(21*32 + 9) /* Assignable Bandwidth Monitoring Counters */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index a2fbea0be535..2f54831e04e5 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
 	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
+	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_TOTAL   },
+	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_LOCAL   },
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..3b72b72270f1 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_MBA,			CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_SMBA,			CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_BMEC,			CPUID_EBX,  3, 0x80000020, 0 },
+	{ X86_FEATURE_ABMC,			CPUID_EBX,  5, 0x80000020, 0 },
 	{ X86_FEATURE_AMD_WORKLOAD_CLASS,	CPUID_EAX, 22, 0x80000021, 0 },
 	{ X86_FEATURE_PERFMON_V2,		CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,		CPUID_EAX,  1, 0x80000022, 0 },
-- 
2.34.1


* [PATCH v13 02/27] x86/resctrl: Add ABMC feature in the command line options
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
  2025-05-15 22:51 ` [PATCH v13 01/27] x86/cpufeatures: Add support for " Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-15 22:51 ` [PATCH v13 03/27] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Add the command line option to enable or disable exposing the ABMC
(Assignable Bandwidth Monitoring Counters) hardware feature to resctrl.
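
For readers unfamiliar with the rdt= syntax, the following userspace
sketch shows how a list such as "cmt,!abmc" might be interpreted. It is
not the kernel's parser: struct rdt_opt and parse_rdt_list() are
hypothetical names used only for illustration, and the "!" prefix meaning
"turn off" is taken from kernel-parameters.txt.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-option state: forced on, forced off, or left at default. */
struct rdt_opt {
	const char *name;
	bool force_on;
	bool force_off;
};

/* Apply one comma-separated list such as "cmt,!abmc" to the option table. */
static void parse_rdt_list(char *list, struct rdt_opt *opts, size_t n)
{
	for (char *tok = strtok(list, ","); tok; tok = strtok(NULL, ",")) {
		bool off = (*tok == '!');

		if (off)
			tok++;	/* skip the '!' prefix */
		for (size_t i = 0; i < n; i++) {
			if (strcmp(tok, opts[i].name) != 0)
				continue;
			opts[i].force_off = off;
			opts[i].force_on = !off;
			break;
		}
	}
}
```

The list must be passed in a writable buffer because strtok() modifies it
in place. Options not named in the list keep their default state.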

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Removed the Reviewed-by as the file resctrl.rst is moved to
     Documentation/filesystems/resctrl.rst. In that sense patch has changed.

v12: No changes.

v11: No changes.

v10: No changes.

v9: No code changes. Added Reviewed-by.

v8: Commit message update.

v7: No changes

v6: No changes

v5: No changes

v4: No changes

v3: No changes

v2: No changes
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 Documentation/filesystems/resctrl.rst           | 1 +
 arch/x86/kernel/cpu/resctrl/core.c              | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d9fd26b95b34..ed9761bb2e4a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5988,7 +5988,7 @@
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba, smba, bmec.
+			mba, smba, bmec, abmc.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
 
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index c7949dd44f2f..c97fd77a107d 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -26,6 +26,7 @@ MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
 MBA (Memory Bandwidth Allocation)		"mba"
 SMBA (Slow Memory Bandwidth Allocation)         ""
 BMEC (Bandwidth Monitoring Event Configuration) ""
+ABMC (Assignable Bandwidth Monitoring Counters) ""
 ===============================================	================================
 
 Historically, new features were made visible by default in /proc/cpuinfo. This
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 224bed28f341..15a1dfa92923 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -704,6 +704,7 @@ enum {
 	RDT_FLAG_MBA,
 	RDT_FLAG_SMBA,
 	RDT_FLAG_BMEC,
+	RDT_FLAG_ABMC,
 };
 
 #define RDT_OPT(idx, n, f)	\
@@ -729,6 +730,7 @@ static struct rdt_options rdt_options[]  __ro_after_init = {
 	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
 	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
 	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
+	RDT_OPT(RDT_FLAG_ABMC,	    "abmc",	X86_FEATURE_ABMC),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
 
-- 
2.34.1


* [PATCH v13 03/27] x86/resctrl: Consolidate monitoring related data from rdt_resource
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
  2025-05-15 22:51 ` [PATCH v13 01/27] x86/cpufeatures: Add support for " Babu Moger
  2025-05-15 22:51 ` [PATCH v13 02/27] x86/resctrl: Add ABMC feature in the command line options Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-22 20:52   ` Reinette Chatre
  2025-05-15 22:51 ` [PATCH v13 04/27] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

The cache allocation and memory bandwidth allocation feature properties
are consolidated into struct resctrl_cache and struct resctrl_membw
respectively.

In preparation for more monitoring properties that will clobber the
existing resource struct more, re-organize the monitoring specific
properties to also be in a separate structure.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Changes due to FS/ARCH restructure.

v12: Fixed the conflicts due to recent changes in rdt_resource data structure.
     Added new mbm_cfg_mask field to resctrl_mon.
     Removed Reviewed-by tag as patch has changed.

v11: No changes.

v10: No changes.

v9: No changes.

v8: Added Reviewed-by from Reinette. No other changes.

v7: Added kernel doc for data structure. Minor text update.

v6: Update commit message and update kernel doc for rdt_resource.

v5: Commit message update.
    Also changes related to data structure updates due to SNC support.

v4: New patch.
---
 arch/x86/kernel/cpu/resctrl/core.c    |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c | 12 ++++++------
 fs/resctrl/monitor.c                  |  8 ++++----
 fs/resctrl/rdtgroup.c                 | 12 ++++++------
 include/linux/resctrl.h               | 22 +++++++++++++++-------
 5 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 15a1dfa92923..6859566398d6 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -106,7 +106,7 @@ u32 resctrl_arch_system_num_rmid_idx(void)
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 
 	/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
-	return r->num_rmid;
+	return r->mon.num_rmid;
 }
 
 struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
@@ -534,7 +534,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 
 	arch_mon_domain_online(r, d);
 
-	if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
 		mon_domain_free(hw_dom);
 		return;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3fc4d9f56f0d..aeb2a9283069 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -134,7 +134,7 @@ static int logical_rmid_to_physical_rmid(int cpu, int lrmid)
 	if (snc_nodes_per_l3_cache == 1)
 		return lrmid;
 
-	return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
+	return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->mon.num_rmid;
 }
 
 static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
@@ -208,11 +208,11 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
 
 	if (resctrl_arch_is_mbm_total_enabled())
 		memset(hw_dom->arch_mbm_total, 0,
-		       sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
+		       sizeof(*hw_dom->arch_mbm_total) * r->mon.num_rmid);
 
 	if (resctrl_arch_is_mbm_local_enabled())
 		memset(hw_dom->arch_mbm_local, 0,
-		       sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+		       sizeof(*hw_dom->arch_mbm_local) * r->mon.num_rmid);
 }
 
 static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
@@ -350,7 +350,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 
 	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
 	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
-	r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
+	r->mon.num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
 	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
 
 	if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
@@ -365,7 +365,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	 *
 	 * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
 	 */
-	threshold = resctrl_rmid_realloc_limit / r->num_rmid;
+	threshold = resctrl_rmid_realloc_limit / r->mon.num_rmid;
 
 	/*
 	 * Because num_rmid may not be a power of two, round the value
@@ -379,7 +379,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 
 		/* Detect list of bandwidth sources that can be tracked */
 		cpuid_count(0x80000020, 3, &eax, &ebx, &ecx, &edx);
-		r->mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
+		r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
 	}
 
 	r->mon_capable = true;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index bde2801289d3..6ffa9d14a8b4 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -866,14 +866,14 @@ static struct mon_evt mbm_local_event = {
  */
 static void l3_mon_evt_init(struct rdt_resource *r)
 {
-	INIT_LIST_HEAD(&r->evt_list);
+	INIT_LIST_HEAD(&r->mon.evt_list);
 
 	if (resctrl_arch_is_llc_occupancy_enabled())
-		list_add_tail(&llc_occupancy_event.list, &r->evt_list);
+		list_add_tail(&llc_occupancy_event.list, &r->mon.evt_list);
 	if (resctrl_arch_is_mbm_total_enabled())
-		list_add_tail(&mbm_total_event.list, &r->evt_list);
+		list_add_tail(&mbm_total_event.list, &r->mon.evt_list);
 	if (resctrl_arch_is_mbm_local_enabled())
-		list_add_tail(&mbm_local_event.list, &r->evt_list);
+		list_add_tail(&mbm_local_event.list, &r->mon.evt_list);
 }
 
 /**
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index ec28228f6a8d..9412c7b64523 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1139,7 +1139,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
 {
 	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
 
-	seq_printf(seq, "%d\n", r->num_rmid);
+	seq_printf(seq, "%d\n", r->mon.num_rmid);
 
 	return 0;
 }
@@ -1150,7 +1150,7 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
 	struct mon_evt *mevt;
 
-	list_for_each_entry(mevt, &r->evt_list, list) {
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
 		seq_printf(seq, "%s\n", mevt->name);
 		if (mevt->configurable)
 			seq_printf(seq, "%s_config\n", mevt->name);
@@ -1733,9 +1733,9 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
 	}
 
 	/* Value from user cannot be more than the supported set of events */
-	if ((val & r->mbm_cfg_mask) != val) {
+	if ((val & r->mon.mbm_cfg_mask) != val) {
 		rdt_last_cmd_printf("Invalid event configuration: max valid mask is 0x%02x\n",
-				    r->mbm_cfg_mask);
+				    r->mon.mbm_cfg_mask);
 		return -EINVAL;
 	}
 
@@ -3055,10 +3055,10 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
 	struct mon_evt *mevt;
 	int ret, domid;
 
-	if (WARN_ON(list_empty(&r->evt_list)))
+	if (WARN_ON(list_empty(&r->mon.evt_list)))
 		return -EPERM;
 
-	list_for_each_entry(mevt, &r->evt_list, list) {
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
 		domid = do_sum ? d->ci->id : d->hdr.id;
 		priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum);
 		if (WARN_ON_ONCE(!priv))
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9ba771f2ddea..2a8fa454d3e6 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -255,40 +255,48 @@ enum resctrl_schema_fmt {
 	RESCTRL_SCHEMA_RANGE,
 };
 
+/**
+ * struct resctrl_mon - Monitoring related data of a resctrl resource
+ * @num_rmid:		Number of RMIDs available
+ * @mbm_cfg_mask:	Bandwidth sources that can be tracked when bandwidth
+ *			monitoring events can be configured.
+ * @evt_list:		List of monitoring events
+ */
+struct resctrl_mon {
+	int			num_rmid;
+	unsigned int		mbm_cfg_mask;
+	struct list_head	evt_list;
+};
+
 /**
  * struct rdt_resource - attributes of a resctrl resource
  * @rid:		The index of the resource
  * @alloc_capable:	Is allocation available on this machine
  * @mon_capable:	Is monitor feature available on this machine
- * @num_rmid:		Number of RMIDs available
  * @ctrl_scope:		Scope of this resource for control functions
  * @mon_scope:		Scope of this resource for monitor functions
  * @cache:		Cache allocation related data
  * @membw:		If the component has bandwidth controls, their properties.
+ * @mon:		Monitoring related data.
  * @ctrl_domains:	RCU list of all control domains for this resource
  * @mon_domains:	RCU list of all monitor domains for this resource
  * @name:		Name to use in "schemata" file.
  * @schema_fmt:		Which format string and parser is used for this schema.
- * @evt_list:		List of monitoring events
- * @mbm_cfg_mask:	Bandwidth sources that can be tracked when bandwidth
- *			monitoring events can be configured.
  * @cdp_capable:	Is the CDP feature available on this resource
  */
 struct rdt_resource {
 	int			rid;
 	bool			alloc_capable;
 	bool			mon_capable;
-	int			num_rmid;
 	enum resctrl_scope	ctrl_scope;
 	enum resctrl_scope	mon_scope;
 	struct resctrl_cache	cache;
 	struct resctrl_membw	membw;
+	struct resctrl_mon	mon;
 	struct list_head	ctrl_domains;
 	struct list_head	mon_domains;
 	char			*name;
 	enum resctrl_schema_fmt	schema_fmt;
-	struct list_head	evt_list;
-	unsigned int		mbm_cfg_mask;
 	bool			cdp_capable;
 };
 
-- 
2.34.1


* [PATCH v13 04/27] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (2 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 03/27] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-22 20:54   ` Reinette Chatre
  2025-05-15 22:51 ` [PATCH v13 05/27] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
                   ` (24 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
Bits Description
15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
     Monitoring Counter ID + 1

The feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Detect the feature and number of assignable monitoring counters supported.
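
The counter-count computation can be sketched in isolation as follows.
The helper name abmc_num_cntrs() is hypothetical; the expression simply
mirrors the one used in this patch, which masks EBX[15:0] of CPUID
Fn8000_0020 subleaf 5 and adds one.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the expression used in this patch:
 * take EBX[15:0] and add one. Upper EBX bits are ignored.
 */
static uint32_t abmc_num_cntrs(uint32_t ebx)
{
	return (ebx & UINT32_C(0xffff)) + 1;
}
```

With this expression, a raw field value of 31 yields 32 counters.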

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: No changes.

v12: Resolved conflicts because of latest merge.
     Removed Reviewed-by as the patch has changed.

v11: No changes.

v10: No changes.

v9: Added Reviewed-by tag. No code changes

v8: Used GENMASK for the mask.

v7: Removed WARN_ON for num_mbm_cntrs. Decided to dynamically allocate the
    bitmap. WARN_ON is not required anymore.
    Removed redundant comments.

v6: Commit message update.
    Renamed abmc_capable to mbm_cntr_assignable.

v5: Name change num_cntrs to num_mbm_cntrs.
    Moved abmc_capable to resctrl_mon.

v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We don't
    need to separate this as arch code.

v3: Removed changes related to mon_features.
    Moved rdt_cpu_has to core.c and added new function resctrl_arch_has_abmc.
    Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
    rdt_resource. (James)

v2: Changed the field name to mbm_assign_capable from abmc_capable.
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++--
 include/linux/resctrl.h               | 4 ++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index aeb2a9283069..fd2761d9f3f7 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -345,6 +345,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	unsigned int threshold;
+	u32 eax, ebx, ecx, edx;
 
 	snc_nodes_per_l3_cache = snc_get_config();
 
@@ -375,13 +376,17 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);
 
 	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
-		u32 eax, ebx, ecx, edx;
-
 		/* Detect list of bandwidth sources that can be tracked */
 		cpuid_count(0x80000020, 3, &eax, &ebx, &ecx, &edx);
 		r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
 	}
 
+	if (rdt_cpu_has(X86_FEATURE_ABMC)) {
+		r->mon.mbm_cntr_assignable = true;
+		cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
+		r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
+	}
+
 	r->mon_capable = true;
 
 	return 0;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 2a8fa454d3e6..065fb6e38933 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -260,11 +260,15 @@ enum resctrl_schema_fmt {
  * @num_rmid:		Number of RMIDs available
  * @mbm_cfg_mask:	Bandwidth sources that can be tracked when bandwidth
  *			monitoring events can be configured.
+ * @num_mbm_cntrs:	Number of assignable monitoring counters
+ * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
  * @evt_list:		List of monitoring events
  */
 struct resctrl_mon {
 	int			num_rmid;
 	unsigned int		mbm_cfg_mask;
+	int			num_mbm_cntrs;
+	bool			mbm_cntr_assignable;
 	struct list_head	evt_list;
 };
 
-- 
2.34.1


* [PATCH v13 05/27] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (3 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 04/27] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-22 20:56   ` Reinette Chatre
  2025-05-15 22:51 ` [PATCH v13 06/27] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
                   ` (23 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Add the functionality to enable/disable AMD ABMC feature.

The AMD ABMC feature is enabled by setting the enable bit (bit 0) in the
L3_QOS_EXT_CFG MSR. When the state of ABMC is changed, the MSR needs
to be updated on all the logical processors in the QOS Domain.

Hardware counters will reset when ABMC state is changed.
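
The bit manipulation itself can be illustrated on a plain 64-bit value.
This is a userspace sketch, not the MSR access code in this patch:
abmc_cfg_update() is a hypothetical helper, and it operates on a copy of
the register contents rather than on the MSR.

```c
#include <stdbool.h>
#include <stdint.h>

#define ABMC_ENABLE_BIT 0	/* bit 0 of L3_QOS_EXT_CFG, per the commit log */

/* Hypothetical: return the register value with the enable bit set or cleared. */
static uint64_t abmc_cfg_update(uint64_t cfg, bool enable)
{
	uint64_t mask = UINT64_C(1) << ABMC_ENABLE_BIT;

	return enable ? (cfg | mask) : (cfg & ~mask);
}
```

All other bits of the value are preserved, which is the same
read-modify-write behaviour the patch obtains from msr_set_bit() and
msr_clear_bit().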

The ABMC feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Resolved minor conflicts with recent FS/ARCH restructure.

v12: Clarified the comment on _resctrl_abmc_enable().
     Added the code to reset arch state in _resctrl_abmc_enable().
     Resolved the conflicts with latest merge.

v11: Moved the monitoring related calls to monitor.c file.
     Moved the changes from include/linux/resctrl.h to
     arch/x86/kernel/cpu/resctrl/internal.h.
     Removed the Reviewed-by tag as patch changed.
     Actual code did not change.

v10: No changes.

v9: Re-ordered the MSR and added Reviewed-by tag.

v8: Commit message update and moved around the comments about L3_QOS_EXT_CFG
    to _resctrl_abmc_enable.

v7: Renamed the function
    resctrl_arch_get_abmc_enabled() to resctrl_arch_mbm_cntr_assign_enabled().

    Merged resctrl_arch_mbm_cntr_assign_enable, resctrl_arch_mbm_cntr_assign_disable
    and renamed to resctrl_arch_mbm_cntr_assign_set().

    Moved the function definition to linux/resctrl.h.

    Passed the struct rdt_resource to these functions.
    Removed resctrl_arch_reset_rmid_all() from arch code. This will be done
    from the caller.

v6: Renamed abmc_enabled to mbm_cntr_assign_enabled.
    Used msr_set_bit and msr_clear_bit for msr updates.
    Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
    Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().
    Made _resctrl_abmc_enable to return void.

v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
    Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
    Introduced resctrl_arch_get_abmc_enabled to get abmc state from
    non-arch code.
    Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
    Modified commit log to make it clear about AMD ABMC feature.

v3: No changes.

v2: Few text changes in commit message.
---
 arch/x86/include/asm/msr-index.h       |  1 +
 arch/x86/kernel/cpu/resctrl/internal.h |  5 +++
 arch/x86/kernel/cpu/resctrl/monitor.c  | 43 ++++++++++++++++++++++++++
 include/linux/resctrl.h                |  3 ++
 4 files changed, 52 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e6134ef2263d..3970e0b16e47 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1203,6 +1203,7 @@
 /* - AMD: */
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
+#define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
 #define MSR_IA32_EVT_CFG_BASE		0xc0000400
 
 /* AMD-V MSRs */
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5e3c41b36437..fcc9d23686a1 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -37,6 +37,9 @@ struct arch_mbm_state {
 	u64	prev_msr;
 };
 
+/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
+#define ABMC_ENABLE_BIT			0
+
 /**
  * struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
  *			       a resource for a control function
@@ -102,6 +105,7 @@ struct msr_param {
  * @mon_scale:		cqm counter * mon_scale = occupancy in bytes
  * @mbm_width:		Monitor width, to detect and correct for overflow.
  * @cdp_enabled:	CDP state of this resource
+ * @mbm_cntr_assign_enabled:	ABMC feature is enabled
  *
  * Members of this structure are either private to the architecture
  * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
@@ -115,6 +119,7 @@ struct rdt_hw_resource {
 	unsigned int		mon_scale;
 	unsigned int		mbm_width;
 	bool			cdp_enabled;
+	bool			mbm_cntr_assign_enabled;
 };
 
 static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index fd2761d9f3f7..ff4b2abfa044 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -405,3 +405,46 @@ void __init intel_rdt_mbm_apply_quirk(void)
 	mbm_cf_rmidthreshold = mbm_cf_table[cf_index].rmidthreshold;
 	mbm_cf = mbm_cf_table[cf_index].cf;
 }
+
+static void resctrl_abmc_set_one_amd(void *arg)
+{
+	bool *enable = arg;
+
+	if (*enable)
+		msr_set_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+	else
+		msr_clear_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+}
+
+/*
+ * ABMC enable/disable requires update of L3_QOS_EXT_CFG MSR on all the CPUs
+ * associated with all monitor domains.
+ */
+static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
+{
+	struct rdt_mon_domain *d;
+
+	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		on_each_cpu_mask(&d->hdr.cpu_mask,
+				 resctrl_abmc_set_one_amd, &enable, 1);
+		resctrl_arch_reset_rmid_all(r, d);
+	}
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+	if (r->mon.mbm_cntr_assignable &&
+	    hw_res->mbm_cntr_assign_enabled != enable) {
+		_resctrl_abmc_enable(r, enable);
+		hw_res->mbm_cntr_assign_enabled = enable;
+	}
+
+	return 0;
+}
+
+inline bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+	return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
+}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 065fb6e38933..bdb264875ef6 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -428,6 +428,9 @@ static inline u32 resctrl_get_config_index(u32 closid,
 bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level l);
 int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
 
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r);
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
+
 /*
  * Update the ctrl_val and apply this config right now.
  * Must be called on one of the domain's CPUs.
-- 
2.34.1



* [PATCH v13 06/27] x86/resctrl: Introduce the interface to display monitor mode
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (4 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 05/27] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-22 20:56   ` Reinette Chatre
  2025-05-15 22:51 ` [PATCH v13 07/27] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
                   ` (22 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Introduce the resctrl file "mbm_assign_mode" to list the supported monitor
modes.

The "mbm_cntr_assign" mode provides the option to assign a counter to
an RMID, event pair and monitor the bandwidth as long as it is assigned.

On AMD systems "mbm_cntr_assign" mode is backed by the ABMC (Assignable
Bandwidth Monitoring Counters) hardware feature and is enabled by default.

The "default" mode is the existing monitoring mode, which works without
explicit counter assignment and instead relies on dynamic counter
assignment by hardware. The hardware may not dedicate a counter to an
event, in which case reads of the monitoring data return "Unavailable".

Provide an interface to display the monitor mode on the system.
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_assign]
default

Add an IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED) check early so that the user
interface remains compatible with upcoming Arm64 support. On x86,
CONFIG_RESCTRL_ASSIGN_FIXED is not defined, and IS_ENABLED() safely
evaluates to 0 for an undefined configuration option. On Arm64, it will be
defined when the "mbm_cntr_assign" mode is supported.

As a result, for MPAM, the display would be either:
[default]
or
[mbm_cntr_assign]

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Updated the commit log with motivation for adding CONFIG_RESCTRL_ASSIGN_FIXED.
     Added fflag RFTYPE_RES_CACHE for mbm_assign_mode file.
     Updated user doc. Removed the references to "mbm_assign_control".
     Resolved the conflicts with latest FS/ARCH code restructure.

v12: Minor text update in change log and user documentation.
     Added the check CONFIG_RESCTRL_ASSIGN_FIXED to take care of arm platforms.
     This will be defined only in arm and not in x86.

v11: Renamed rdtgroup_mbm_assign_mode_show() to resctrl_mbm_assign_mode_show().
     Removed few texts in resctrl.rst about AMD specific information.
     Updated few texts.

v10: Added more text to the user documentation to clarify the default mode.

v9: Updated user documentation based on comments.

v8: Commit message update.

v7: Updated the descriptions/commit log in resctrl.rst to generic text.
    Thanks to James and Reinette.
    Rename mbm_mode to mbm_assign_mode.
    Introduced mutex lock in rdtgroup_mbm_mode_show().

v6: Added documentation for mbm_cntr_assign and legacy mode.
    Moved mbm_mode fflags initialization to static initialization.

v5: Changed interface name to mbm_mode.
    It will be always available even if ABMC feature is not supported.
    Added description in resctrl.rst about ABMC mode.
    Fixed to display abmc and legacy consistently.

v4: Fixed the checks for legacy and abmc mode. Default is ABMC.

v3: New patch to display ABMC capability.
---
 Documentation/filesystems/resctrl.rst | 27 +++++++++++++++++++
 fs/resctrl/rdtgroup.c                 | 37 +++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index c97fd77a107d..8013418b6ca2 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -257,6 +257,33 @@ with the following files:
 	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
 	    0=0x30;1=0x30;3=0x15;4=0x15
 
+"mbm_assign_mode":
+	Reports the list of monitoring modes supported. The enclosed brackets
+	indicate which mode is enabled.
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+	  [mbm_cntr_assign]
+	  default
+
+	"mbm_cntr_assign":
+
+	In mbm_cntr_assign mode, a monitoring event can only accumulate data
+	while it is backed by a hardware counter. Use "mbm_L3_assignments" found
+	in each CTRL_MON and MON group to specify which of the events should have
+	a counter assigned. The number of counters available is described in the
+	"num_mbm_cntrs" file. Changing the mode may cause all counters on the
+	resource to reset.
+
+	"default":
+
+	In default mode, resctrl assumes there is a hardware counter for each
+	event within every CTRL_MON and MON group. On AMD platforms, it is
+	recommended to use the mbm_cntr_assign mode, if supported, to prevent
+	reset of MBM events between reads resulting from hardware re-allocating
+	counters. This can result in misleading values or display "Unavailable"
+	if no counter is assigned to the event.
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 9412c7b64523..a9e1055df75f 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1801,6 +1801,36 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
+					struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+	bool enabled;
+
+	mutex_lock(&rdtgroup_mutex);
+	enabled = resctrl_arch_mbm_cntr_assign_enabled(r);
+
+	if (r->mon.mbm_cntr_assignable) {
+		if (enabled)
+			seq_puts(s, "[mbm_cntr_assign]\n");
+		else
+			seq_puts(s, "[default]\n");
+
+		if (!IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED)) {
+			if (enabled)
+				seq_puts(s, "default\n");
+			else
+				seq_puts(s, "mbm_cntr_assign\n");
+		}
+	} else {
+		seq_puts(s, "[default]\n");
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1913,6 +1943,13 @@ static struct rftype res_common_files[] = {
 		.seq_show	= mbm_local_bytes_config_show,
 		.write		= mbm_local_bytes_config_write,
 	},
+	{
+		.name		= "mbm_assign_mode",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= resctrl_mbm_assign_mode_show,
+		.fflags		= RFTYPE_MON_INFO | RFTYPE_RES_CACHE,
+	},
 	{
 		.name		= "cpus",
 		.mode		= 0644,
-- 
2.34.1



* [PATCH v13 07/27] x86/resctrl: Introduce interface to display number of monitoring counters
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (5 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 06/27] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-15 22:51 ` [PATCH v13 08/27] x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at domain Babu Moger
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

The mbm_cntr_assign mode provides an option to the user to assign a counter
to an RMID, event pair and monitor the bandwidth as long as the counter is
assigned. The number of assignments depends on the number of monitoring
counters available.

Create 'num_mbm_cntrs' resctrl file that displays the number of monitoring
counters supported in each domain. 'num_mbm_cntrs' is only visible to user
space when the system supports mbm_cntr_assign mode.
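
For illustration, a user-space tool could consume the per-domain
"<domain id>=<count>" list roughly as follows. This is a sketch under the
assumption that the file content has the "0=32;1=32" shape documented in
this patch; parse_domain_counts() is a hypothetical helper, not part of
resctrl.

```c
#include <stdio.h>

/*
 * Parse a line such as "0=32;1=32" into ids[] and counts[].
 * Returns the number of pairs parsed, or -1 if unexpected text remains
 * (including more pairs than the caller provided room for).
 */
static int parse_domain_counts(const char *line, int *ids, int *counts,
			       int max)
{
	int n = 0, used;

	while (n < max &&
	       sscanf(line, "%d=%d%n", &ids[n], &counts[n], &used) == 2) {
		n++;
		line += used;		/* skip the pair just consumed */
		if (*line != ';')
			break;
		line++;			/* skip the separator */
	}

	/* Accept only end of string or the trailing newline. */
	return (*line == '\0' || *line == '\n') ? n : -1;
}
```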

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Updated the changelog.
     Added fflags RFTYPE_RES_CACHE to the file num_mbm_cntrs.
     Replaced seq_puts with seq_putc where applicable.
     Resolved conflicts caused by the recent FS/ARCH code restructure.
     The files monitor.c/rdtgroup.c have been split between FS and ARCH directories.

v12: Changed the code to display the max supported monitoring counters in
     each domain. Also updated the documentation.
     Resolved the conflict with the latest code.

v11: Renamed rdtgroup_num_mbm_cntrs_show() to resctrl_num_mbm_cntrs_show().
     Few minor text updates.

v10: No changes.

v9: Updated user document based on the comments.
    Will add a new file available_mbm_cntrs later in the series.

v8: Commit message update and documentation update.

v7: Minor commit log text changes.

v6: No changes.

v5: Changed the display name from num_cntrs to num_mbm_cntrs.
    Updated the commit message.
    Moved the patch after mbm_mode is introduced.

v4: Changed the counter name to num_cntrs. And few text changes.

v3: Changed the field name to mbm_assign_cntrs.

v2: Changed the field name to mbm_assignable_counters from abmc_counter.
---
 Documentation/filesystems/resctrl.rst | 11 ++++++++++
 fs/resctrl/monitor.c                  |  4 ++++
 fs/resctrl/rdtgroup.c                 | 30 +++++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 8013418b6ca2..abbbbdf61062 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -284,6 +284,17 @@ with the following files:
 	counters. This can result in misleading values or display "Unavailable"
 	if no counter is assigned to the event.
 
+"num_mbm_cntrs":
+	The maximum number of monitoring counters (total of available and assigned
+	counters) in each domain when the system supports mbm_cntr_assign mode.
+
+	For example, on a system with maximum of 32 memory bandwidth monitoring
+	counters in each of its L3 domains:
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
+	  0=32;1=32
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 6ffa9d14a8b4..039516cf410d 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -918,6 +918,10 @@ int resctrl_mon_resource_init(void)
 	else if (resctrl_arch_is_mbm_total_enabled())
 		mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
 
+	if (r->mon.mbm_cntr_assignable)
+		resctrl_file_fflags_init("num_mbm_cntrs",
+					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
+
 	return 0;
 }
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index a9e1055df75f..51f8f8d3ccbc 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1831,6 +1831,30 @@ static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
+				      struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+	struct rdt_mon_domain *dom;
+	bool sep = false;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+		if (sep)
+			seq_putc(s, ';');
+
+		seq_printf(s, "%d=%d", dom->hdr.id, r->mon.num_mbm_cntrs);
+		sep = true;
+	}
+	seq_putc(s, '\n');
+
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1868,6 +1892,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdt_default_ctrl_show,
 		.fflags		= RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE,
 	},
+	{
+		.name		= "num_mbm_cntrs",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= resctrl_num_mbm_cntrs_show,
+	},
 	{
 		.name		= "min_cbm_bits",
 		.mode		= 0444,
-- 
2.34.1



* [PATCH v13 08/27] x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at domain
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (6 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 07/27] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-22 21:02   ` Reinette Chatre
  2025-05-15 22:51 ` [PATCH v13 09/27] x86/resctrl: Introduce interface to display number of free MBM counters Babu Moger
                   ` (20 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

In mbm_cntr_assign mode hardware counters are assigned/unassigned to an
MBM event of a monitor group. Hardware counters are assigned/unassigned
at monitoring domain level.

Manage a monitoring domain's hardware counters using a per monitoring
domain array of struct mbm_cntr_cfg that is indexed by the hardware
counter ID. A hardware counter's configuration contains the MBM event
ID and points to the monitoring group that it is assigned to, with a
NULL pointer meaning that the hardware counter is available for assignment.

There is no direct way to determine which hardware counters are assigned
to a particular monitoring group. To query which MBM events of a
monitoring group are tracked by hardware, check every entry of the
hardware counter configuration array in every monitoring domain. Such
queries are acceptable because the number of assignable counters is very
small (32 to 64).
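
A minimal user-space sketch of the lookup scheme described above, for one
domain. The struct and function names here (struct group, struct cntr_cfg,
find_cntr(), find_free_cntr(), count_free_cntrs()) are hypothetical
stand-ins chosen for the example; only the idea of an array indexed by
counter ID with a NULL group pointer marking a free counter comes from
this patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a monitoring group. */
struct group {
	int id;
};

/* One entry per hardware counter, indexed by counter ID. */
struct cntr_cfg {
	int evtid;		/* only meaningful when grp is not NULL */
	struct group *grp;	/* NULL means the counter is free */
};

/* Return the counter ID assigned to (grp, evtid) in this domain, or -1. */
static int find_cntr(const struct cntr_cfg *cfg, int num_cntrs,
		     const struct group *grp, int evtid)
{
	for (int i = 0; i < num_cntrs; i++)
		if (cfg[i].grp == grp && cfg[i].evtid == evtid)
			return i;
	return -1;
}

/* Return the first free counter ID in this domain, or -1 if none is left. */
static int find_free_cntr(const struct cntr_cfg *cfg, int num_cntrs)
{
	for (int i = 0; i < num_cntrs; i++)
		if (!cfg[i].grp)
			return i;
	return -1;
}

/* Count the free counters in this domain. */
static int count_free_cntrs(const struct cntr_cfg *cfg, int num_cntrs)
{
	int n = 0;

	for (int i = 0; i < num_cntrs; i++)
		if (!cfg[i].grp)
			n++;
	return n;
}
```

Each helper is a linear scan over one domain's array, which is what keeps
the bookkeeping simple at the cost of an O(number of counters) search per
query.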

Suggested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Resolved conflicts caused by the recent FS/ARCH code restructure.
     The files monitor.c/rdtgroup.c have been split between FS and ARCH directories.

v12: Fixed the struct mbm_cntr_cfg code documentation.
     Removed few strange characters in changelog.
     Added the counter range for better understanding.
     Moved the struct mbm_cntr_cfg definition to resctrl/internal.h as
     suggested by James.

v11: Refined the change log based on Reinette's feedback.
     Fixed few style issues.

v10: Patch changed completely to handle the counters at domain level.
     https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
     Removed Reviewed-by tag.
     Did not see the need to add cntr_id in mbm_state structure. Not used in the code.

v9: Added Reviewed-by tag. No other changes.

v8: Minor commit message changes.

v7: Added check mbm_cntr_assignable for allocating bitmap mbm_cntr_map

v6: New patch to add domain level assignment.
---
 fs/resctrl/rdtgroup.c   | 11 +++++++++++
 include/linux/resctrl.h | 16 ++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 51f8f8d3ccbc..e2005fc9acd9 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4085,6 +4085,7 @@ static void rdtgroup_setup_default(void)
 
 static void domain_destroy_mon_state(struct rdt_mon_domain *d)
 {
+	kfree(d->cntr_cfg);
 	bitmap_free(d->rmid_busy_llc);
 	kfree(d->mbm_total);
 	kfree(d->mbm_local);
@@ -4171,6 +4172,16 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
 			return -ENOMEM;
 		}
 	}
+	if (resctrl_is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
+		tsize = sizeof(*d->cntr_cfg);
+		d->cntr_cfg = kcalloc(r->mon.num_mbm_cntrs, tsize, GFP_KERNEL);
+		if (!d->cntr_cfg) {
+			bitmap_free(d->rmid_busy_llc);
+			kfree(d->mbm_total);
+			kfree(d->mbm_local);
+			return -ENOMEM;
+		}
+	}
 
 	return 0;
 }
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index bdb264875ef6..d77981d1fcb9 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -156,6 +156,20 @@ struct rdt_ctrl_domain {
 	u32				*mbps_val;
 };
 
+/**
+ * struct mbm_cntr_cfg - Assignable counter configuration
+ * @evtid:		MBM event to which the counter is assigned. Only valid
+ *			if @rdtgrp is not NULL.
+ * @evt_cfg:		Event configuration value.
+ * @rdtgrp:		resctrl group assigned to the counter. NULL if the
+ *			counter is free.
+ */
+struct mbm_cntr_cfg {
+	enum resctrl_event_id   evtid;
+	u32                     evt_cfg;
+	struct rdtgroup         *rdtgrp;
+};
+
 /**
  * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
  * @hdr:		common header for different domain types
@@ -167,6 +181,7 @@ struct rdt_ctrl_domain {
  * @cqm_limbo:		worker to periodically read CQM h/w counters
  * @mbm_work_cpu:	worker CPU for MBM h/w counters
  * @cqm_work_cpu:	worker CPU for CQM h/w counters
+ * @cntr_cfg:		assignable counters configuration
  */
 struct rdt_mon_domain {
 	struct rdt_domain_hdr		hdr;
@@ -178,6 +193,7 @@ struct rdt_mon_domain {
 	struct delayed_work		cqm_limbo;
 	int				mbm_work_cpu;
 	int				cqm_work_cpu;
+	struct mbm_cntr_cfg		*cntr_cfg;
 };
 
 /**
-- 
2.34.1



* [PATCH v13 09/27] x86/resctrl: Introduce interface to display number of free MBM counters
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (7 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 08/27] x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at domain Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-15 22:51 ` [PATCH v13 10/27] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Provide the interface to display the number of monitoring counters
available for assignment in each domain when mbm_cntr_assign mode is
enabled.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Resolved conflicts caused by the recent FS/ARCH code restructure.
     The files monitor.c and rdtgroup.c have now been split between
     the FS and ARCH directories.

v12: Minor change to change log.
     Updated the documentation text with an example.
     Replaced seq_puts(s, ";") with seq_putc(s, ';');
     Added missing rdt_last_cmd_clear() in resctrl_available_mbm_cntrs_show().

v11: Rename rdtgroup_available_mbm_cntrs_show() to resctrl_available_mbm_cntrs_show().
     Few minor text changes.

v10: Patch changed to handle the counters at domain level.
     https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
     So, display logic also changed now.

v9: New patch
---
 Documentation/filesystems/resctrl.rst | 11 ++++++
 fs/resctrl/monitor.c                  |  5 ++-
 fs/resctrl/rdtgroup.c                 | 48 +++++++++++++++++++++++++++
 3 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index abbbbdf61062..2bfad43aac9c 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -295,6 +295,17 @@ with the following files:
 	  # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
 	  0=32;1=32
 
+"available_mbm_cntrs":
+	The number of monitoring counters available for assignment in each
+	domain when mbm_cntr_assign mode is enabled on the system.
+
+	For example, on a system with 30 available [hardware] monitoring counters
+	in each of its L3 domains:
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
+	  0=30;1=30
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 039516cf410d..2548aee0151c 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -918,9 +918,12 @@ int resctrl_mon_resource_init(void)
 	else if (resctrl_arch_is_mbm_total_enabled())
 		mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
 
-	if (r->mon.mbm_cntr_assignable)
+	if (r->mon.mbm_cntr_assignable) {
 		resctrl_file_fflags_init("num_mbm_cntrs",
 					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
+		resctrl_file_fflags_init("available_mbm_cntrs",
+					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
+	}
 
 	return 0;
 }
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index e2005fc9acd9..752750e3e443 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1855,6 +1855,48 @@ static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
+					    struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+	struct rdt_mon_domain *dom;
+	bool sep = false;
+	u32 cntrs, i;
+	int ret = 0;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+		ret = -EINVAL;
+		goto unlock_cntrs_show;
+	}
+
+	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+		if (sep)
+			seq_putc(s, ';');
+
+		cntrs = 0;
+		for (i = 0; i < r->mon.num_mbm_cntrs; i++) {
+			if (!dom->cntr_cfg[i].rdtgrp)
+				cntrs++;
+		}
+
+		seq_printf(s, "%d=%u", dom->hdr.id, cntrs);
+		sep = true;
+	}
+	seq_putc(s, '\n');
+
+unlock_cntrs_show:
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1878,6 +1920,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdt_mon_features_show,
 		.fflags		= RFTYPE_MON_INFO,
 	},
+	{
+		.name		= "available_mbm_cntrs",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= resctrl_available_mbm_cntrs_show,
+	},
 	{
 		.name		= "num_rmids",
 		.mode		= 0444,
-- 
2.34.1



* [PATCH v13 10/27] x86/resctrl: Add data structures and definitions for ABMC assignment
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (8 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 09/27] x86/resctrl: Introduce interface to display number of free MBM counters Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-22 21:10   ` Reinette Chatre
  2025-05-15 22:51 ` [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
                   ` (18 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as the
counter is assigned. The bandwidth events will be tracked by the hardware
until the user changes the configuration. Each resctrl group can configure
at most two counters, one for the total event and one for the local event.

The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh).
ABMC counter assignment is done by setting the counter id, bandwidth
source (RMID) and bandwidth configuration. Users will have the option to
change the bandwidth configuration using resctrl interface which will be
introduced later in the series.

Attempts to read or write the MSR when ABMC is not enabled will result
in a #GP(0) exception.

Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG
(C000_03FDh):
=========================================================================
Bits 	Mnemonic	Description			Access Reset
							Type   Value
=========================================================================
63 	CfgEn 		Configuration Enable 		R/W 	0

62 	CtrEn 		Enable/disable counting		R/W 	0

61:53 	– 		Reserved 			MBZ 	0

52:48 	CtrID 		Counter Identifier		R/W	0

47 	IsCOS		BwSrc field is a CLOSID		R/W	0
			(not an RMID)

46:44 	–		Reserved			MBZ	0

43:32	BwSrc		Bandwidth Source		R/W	0
			(RMID or CLOSID)

31:0	BwType		Bandwidth configuration		R/W	0
			to track for this counter
==========================================================================
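
As an illustration of the layout in the table above, the 64-bit MSR value
can be assembled with explicit shifts and masks. This is a sketch only:
the ABMC_* macro names and abmc_cfg_pack() are hypothetical and chosen for
the example, it always programs an RMID source (IsCOS = 0), and the patch
itself describes the register with a bitfield union instead.

```c
#include <stdint.h>

/* Field positions taken from the L3_QOS_ABMC_CFG table (names assumed). */
#define ABMC_CFG_EN		(1ULL << 63)	/* bit 63: apply configuration */
#define ABMC_CTR_EN		(1ULL << 62)	/* bit 62: enable counting */
#define ABMC_CTR_ID_SHIFT	48		/* bits 52:48 */
#define ABMC_CTR_ID_MASK	0x1fULL
#define ABMC_BW_SRC_SHIFT	32		/* bits 43:32 */
#define ABMC_BW_SRC_MASK	0xfffULL

/*
 * Build a value that applies a configuration for counter cntr_id,
 * tracking bandwidth source rmid with event configuration bw_type.
 * Counting starts immediately only when count is non-zero.
 */
static uint64_t abmc_cfg_pack(unsigned int cntr_id, unsigned int rmid,
			      uint32_t bw_type, int count)
{
	uint64_t val = ABMC_CFG_EN;

	if (count)
		val |= ABMC_CTR_EN;
	val |= ((uint64_t)cntr_id & ABMC_CTR_ID_MASK) << ABMC_CTR_ID_SHIFT;
	val |= ((uint64_t)rmid & ABMC_BW_SRC_MASK) << ABMC_BW_SRC_SHIFT;
	val |= bw_type;			/* bits 31:0 */

	return val;
}
```

Reserved bits (61:53 and 46:44) are left at zero, matching the MBZ
requirement in the table.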

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Removed the Reviewed-by tag as there is commit log change to remove
     BMEC reference.

v12: No changes.

v11: No changes.

v10: No changes.

v9: Removed the references of L3_QOS_ABMC_DSC.
    Text changes about configuration in kernel doc.

v8: Update the configuration notes in kernel_doc.
    Few commit message updates.

v7: Removed the reference of L3_QOS_ABMC_DSC as it is not used anymore.
    Moved the configuration notes to kernel_doc.
    Adjusted the tabs for l3_qos_abmc_cfg and checkpatch seems happy.

v6: Removed all the fs related changes.
    Added note on CfgEn,CtrEn.
    Removed the definitions which are not used.
    Removed cntr_id initialization.

v5: Moved assignment flags here (patch 10/19 of v4).
    Added MON_CNTR_UNSET definition to initialize cntr_id's.
    More details in commit log.
    Renamed few fields in l3_qos_abmc_cfg for readability.

v4: Added more descriptions.
    Changed the name abmc_ctr_id to ctr_id.
    Added L3_QOS_ABMC_DSC. Used for reading the configuration.

v3: No changes.

v2: No changes.
---
 arch/x86/include/asm/msr-index.h       |  1 +
 arch/x86/kernel/cpu/resctrl/internal.h | 35 ++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3970e0b16e47..b5b5ebead24f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1203,6 +1203,7 @@
 /* - AMD: */
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
+#define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
 #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
 #define MSR_IA32_EVT_CFG_BASE		0xc0000400
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index fcc9d23686a1..db6b0c28ee6b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -164,6 +164,41 @@ union cpuid_0x10_x_edx {
 	unsigned int full;
 };
 
+/*
+ * ABMC counters are configured by writing to L3_QOS_ABMC_CFG.
+ * @bw_type		: Bandwidth configuration (supported by BMEC)
+ *			  tracked by the @cntr_id.
+ * @bw_src		: Bandwidth source (RMID or CLOSID).
+ * @reserved1		: Reserved.
+ * @is_clos		: @bw_src field is a CLOSID (not an RMID).
+ * @cntr_id		: Counter identifier.
+ * @reserved		: Reserved.
+ * @cntr_en		: Counting enable bit.
+ * @cfg_en		: Configuration enable bit.
+ *
+ * Configuration and counting:
+ * Counter can be configured across multiple writes to MSR. Configuration
+ * is applied only when @cfg_en = 1. Counter @cntr_id is reset when the
+ * configuration is applied.
+ * @cfg_en = 1, @cntr_en = 0 : Apply @cntr_id configuration but do not
+ *                             count events.
+ * @cfg_en = 1, @cntr_en = 1 : Apply @cntr_id configuration and start
+ *                             counting events.
+ */
+union l3_qos_abmc_cfg {
+	struct {
+		unsigned long bw_type  :32,
+			      bw_src   :12,
+			      reserved1: 3,
+			      is_clos  : 1,
+			      cntr_id  : 5,
+			      reserved : 9,
+			      cntr_en  : 1,
+			      cfg_en   : 1;
+	} split;
+	unsigned long full;
+};
+
 void rdt_ctrl_update(void *arg);
 
 int rdt_get_mon_l3_config(struct rdt_resource *r);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 114+ messages in thread
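
Editor's note: the field layout of union l3_qos_abmc_cfg above can be
sketched in userspace with explicit shifts. This is only an illustration,
assuming the low-to-high bitfield allocation used on x86-64; the macro names
and abmc_cfg_pack() are hypothetical and are not part of the patch, which
uses the bitfield union directly.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit offsets derived from the bitfield widths in union l3_qos_abmc_cfg
 * (32 + 12 + 3 + 1 + 5 + 9 + 1 + 1 = 64 bits). These names are
 * illustrative only; they do not exist in the kernel.
 */
#define ABMC_BW_SRC_SHIFT	32	/* bits 43:32, RMID or CLOSID */
#define ABMC_CNTR_ID_SHIFT	48	/* bits 52:48 */
#define ABMC_CNTR_EN_SHIFT	62	/* bit 62 */
#define ABMC_CFG_EN_SHIFT	63	/* bit 63 */

/* Build the 64-bit value that would be written to L3_QOS_ABMC_CFG. */
uint64_t abmc_cfg_pack(uint32_t evt_cfg, uint32_t rmid, uint32_t cntr_id,
		       int assign)
{
	uint64_t val = evt_cfg;			/* bw_type, bits 31:0 */

	/* Reject values that do not fit in the 12-bit and 5-bit fields. */
	assert(rmid <= 0xfff);
	assert(cntr_id <= 0x1f);

	val |= (uint64_t)rmid << ABMC_BW_SRC_SHIFT;
	val |= (uint64_t)cntr_id << ABMC_CNTR_ID_SHIFT;
	val |= (uint64_t)(assign ? 1 : 0) << ABMC_CNTR_EN_SHIFT;
	val |= 1ULL << ABMC_CFG_EN_SHIFT;	/* always apply the config */

	return val;
}
```

With cfg_en always set, the only difference between an assign and an
unassign request in this sketch is bit 62 (cntr_en). The is_clos bit (bit 47)
is left clear because bw_src carries an RMID here.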

* [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (9 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 10/27] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-22 21:51   ` Reinette Chatre
  2025-05-15 22:51 ` [PATCH v13 12/27] x86/resctrl: Introduce event configuration modes Babu Moger
                   ` (17 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

The ABMC feature lets the user assign a hardware counter to an RMID, event
pair and monitor the bandwidth for as long as the counter is assigned. The
hardware tracks the assigned RMID until the user unassigns it manually.

Implement an architecture-specific handler to assign and unassign the
counter. Configure counters by writing to the L3_QOS_ABMC_CFG MSR,
specifying the counter ID, bandwidth source (RMID), and event
configuration.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
    Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
    Monitoring (ABMC).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Moved resctrl_arch_config_cntr() prototype to include/linux/resctrl.h.
     Changed resctrl_arch_config_cntr() to return void instead of int.
     Updated the kernel-doc for the prototype.
     Updated the code comment.

v12: Added the check to reset the architecture-specific state only when
     assign is requested.
     Added evt_cfg as the parameter as the user will be passing the event
     configuration from /info/L3_MON/event_configs/.

v11: Moved resctrl_arch_assign_cntr() and resctrl_abmc_config_one_amd() to
     monitor.c.
     Added the code to reset the arch state in resctrl_arch_assign_cntr().
     Also removed resctrl_arch_reset_rmid() inside IPI as the counters are
     reset from the callers.
     Re-wrote commit message.

v10: Added call resctrl_arch_reset_rmid() to reset the RMID in the domain
     inside IPI call.
     SMP and non-SMP call support is not required in resctrl_arch_config_cntr
     with new domain specific assign approach/data structure.
     Commit message update.

v9: Removed the code to reset the architectural state. It will be done
    in another patch.

v8: Rename resctrl_arch_assign_cntr to resctrl_arch_config_cntr.

v7: Separated arch and fs functions. This patch only has arch implementation.
    Added struct rdt_resource to the interface resctrl_arch_assign_cntr.
    Rename rdtgroup_abmc_cfg() to resctrl_abmc_config_one_amd().

v6: Removed mbm_cntr_alloc() from this patch to keep fs and arch code
    separate.
    Added code to update the counter assignment at domain level.

v5: A few name changes to match cntr_id.
    Changed the function names to
      rdtgroup_assign_cntr
      resctrl_arch_assign_cntr
    More comments on commit log.
    Added function summary.

v4: Commit message update.
      Used bitmap APIs where applicable.
      Changed the interfaces considering MPAM(arm).
      Added domain specific assignment.

v3: Removed the static from the prototype of rdtgroup_assign_abmc.
      The function is not called directly from user anymore. These
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 37 +++++++++++++++++++++++++++
 include/linux/resctrl.h               | 17 ++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index ff4b2abfa044..e31084f7babd 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -448,3 +448,40 @@ inline bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
 {
 	return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
 }
+
+static void resctrl_abmc_config_one_amd(void *info)
+{
+	union l3_qos_abmc_cfg *abmc_cfg = info;
+
+	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
+}
+
+/*
+ * Send an IPI to the domain to assign the counter to RMID, event pair.
+ */
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
+			      u32 cntr_id, u32 evt_cfg, bool assign)
+{
+	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	union l3_qos_abmc_cfg abmc_cfg = { 0 };
+	struct arch_mbm_state *am;
+
+	abmc_cfg.split.cfg_en = 1;
+	abmc_cfg.split.cntr_en = assign ? 1 : 0;
+	abmc_cfg.split.cntr_id = cntr_id;
+	abmc_cfg.split.bw_src = rmid;
+	abmc_cfg.split.bw_type = evt_cfg;
+
+	smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &abmc_cfg, 1);
+
+	/*
+	 * The hardware counter is reset (because cfg_en == 1) so there is no
+	 * need to record initial non-zero counts.
+	 */
+	if (assign) {
+		am = get_arch_mbm_state(hw_dom, rmid, evtid);
+		if (am)
+			memset(am, 0, sizeof(*am));
+	}
+}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d77981d1fcb9..59a4fe60ab46 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -559,6 +559,23 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
  */
 void resctrl_arch_reset_all_ctrls(struct rdt_resource *r);
 
+/**
+ * resctrl_arch_config_cntr() - Configure the counter id to RMID, event
+ *				pair on the domain.
+ * @r:			Resource structure.
+ * @d:			Domain that the counter id to be configured.
+ * @evtid:		Event type to configure.
+ * @rmid:		RMID to configure.
+ * @closid:		CLOSID to configure.
+ * @cntr_id:		Counter ID to configure.
+ * @evt_cfg:		MBM event configuration value representing reads,
+ *			writes etc.
+ * @assign:		Assign or unassign.
+ */
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
+			      u32 cntr_id, u32 evt_cfg, bool assign);
+
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v13 12/27] x86/resctrl: Introduce event configuration modes
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (10 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-22 22:05   ` Reinette Chatre
  2025-05-15 22:51 ` [PATCH v13 13/27] x86/resctrl: Add the functionality to assign MBM events Babu Moger
                   ` (16 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

MBM events can be configured using either BMEC (Bandwidth Monitoring Event
Configuration) or the mbm_cntr_assign mode.

Introduce a data structure to represent the various event configuration
modes and their corresponding values.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: New patch to handle different event configuration types with
     mbm_cntr_assign mode.
---
 fs/resctrl/internal.h         |  6 ++++--
 fs/resctrl/monitor.c          |  4 ++--
 fs/resctrl/rdtgroup.c         |  2 +-
 include/linux/resctrl_types.h | 11 +++++++++++
 4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 9a8cf6f11151..0fae374559ba 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -55,13 +55,15 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
  * struct mon_evt - Entry in the event list of a resource
  * @evtid:		event id
  * @name:		name of the event
- * @configurable:	true if the event is configurable
+ * @mbm_mode:		monitoring mode (BMEC or mbm_cntr_assign)
+ * @evt_cfg:		event configuration value decoding reads, writes.
  * @list:		entry in &rdt_resource->evt_list
  */
 struct mon_evt {
 	enum resctrl_event_id	evtid;
 	char			*name;
-	bool			configurable;
+	enum resctrl_mbm_mode	mbm_mode;
+	u32			evt_cfg;
 	struct list_head	list;
 };
 
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 2548aee0151c..8e403587a02f 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -903,12 +903,12 @@ int resctrl_mon_resource_init(void)
 	l3_mon_evt_init(r);
 
 	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
-		mbm_total_event.configurable = true;
+		mbm_total_event.mbm_mode = MBM_MODE_BMEC;
 		resctrl_file_fflags_init("mbm_total_bytes_config",
 					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 	}
 	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_LOCAL_EVENT_ID)) {
-		mbm_local_event.configurable = true;
+		mbm_local_event.mbm_mode = MBM_MODE_BMEC;
 		resctrl_file_fflags_init("mbm_local_bytes_config",
 					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 	}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 752750e3e443..f192b2736a77 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1152,7 +1152,7 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 
 	list_for_each_entry(mevt, &r->mon.evt_list, list) {
 		seq_printf(seq, "%s\n", mevt->name);
-		if (mevt->configurable)
+		if (mevt->mbm_mode == MBM_MODE_BMEC)
 			seq_printf(seq, "%s_config\n", mevt->name);
 	}
 
diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index a25fb9c4070d..26cd1fec72db 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -47,4 +47,15 @@ enum resctrl_event_id {
 	QOS_NUM_EVENTS,
 };
 
+/*
+ * Event configuration mode.
+ * Events can be configured either in BMEC (Bandwidth Monitoring Event
+ * Configuration) mode or mbm_cntr_assign mode.
+ */
+enum resctrl_mbm_mode {
+	MBM_MODE_NONE,
+	MBM_MODE_BMEC,
+	MBM_MODE_ASSIGN,
+};
+
 #endif /* __LINUX_RESCTRL_TYPES_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 114+ messages in thread
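
Editor's note: the effect of replacing the boolean 'configurable' flag with
an event configuration mode can be sketched as below. This is a userspace
illustration only; the _sketch types and list_mon_features() are
hypothetical stand-ins for struct mon_evt and rdt_mon_features_show(), and
the evt_cfg member is carried along but not interpreted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors enum resctrl_mbm_mode from the patch; names are illustrative. */
enum mbm_mode_sketch { MODE_NONE, MODE_BMEC, MODE_ASSIGN };

struct mon_evt_sketch {
	const char *name;
	enum mbm_mode_sketch mbm_mode;
	uint32_t evt_cfg;	/* not interpreted by this sketch */
};

/*
 * Write the mon_features listing into buf. Only BMEC-mode events get a
 * "<name>_config" entry. Returns the number of bytes stored in buf,
 * not counting the terminating NUL.
 */
size_t list_mon_features(const struct mon_evt_sketch *evts, size_t n,
			 char *buf, size_t len)
{
	size_t off = 0;

	assert(buf && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < n && off < len; i++) {
		off += snprintf(buf + off, len - off, "%s\n", evts[i].name);
		if (evts[i].mbm_mode == MODE_BMEC && off < len)
			off += snprintf(buf + off, len - off, "%s_config\n",
					evts[i].name);
	}

	/* snprintf() reports the untruncated length; clamp on overflow. */
	return off < len ? off : len - 1;
}
```

An event in MODE_ASSIGN or MODE_NONE is listed by name only, which matches
the 'mevt->mbm_mode == MBM_MODE_BMEC' check in the rdtgroup.c hunk.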

* [PATCH v13 13/27] x86/resctrl: Add the functionality to assign MBM events
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (11 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 12/27] x86/resctrl: Introduce event configuration modes Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-22 22:41   ` Reinette Chatre
  2025-05-15 22:51 ` [PATCH v13 14/27] x86/resctrl: Add the functionality to unassign " Babu Moger
                   ` (15 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

The mbm_cntr_assign mode offers "num_mbm_cntrs" counters that can be
assigned to an RMID, event pair to monitor the bandwidth for as long as
the counter is assigned.

Add the functionality to allocate and assign a counter to an RMID, event
pair in the domain.

If all the counters are in use, the kernel will log the error message
"Unable to allocate counter in domain" in
/sys/fs/resctrl/info/last_cmd_status when a new assignment is requested.
Exit on the first failure when assigning counters across all the domains.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Updated changelog.
     Changed resctrl_arch_config_cntr() to return void instead of int.
     Pass only evtid to resctrl_alloc_config_cntr() and
     resctrl_assign_cntr_event(). The event configuration value can be
     easily obtained from the mon_evt list.
     Introduced new function mbm_get_mon_event() to get event configuration value.
     Added prototype descriptions to mbm_cntr_get() and mbm_cntr_alloc().
     Resolved conflicts caused by the recent FS/ARCH code restructure.
     The files monitor.c/rdtgroup.c have been split between FS and ARCH directories.

v12: Fixed typo in the subject line.
     Replaced several counters with "num_mbm_cntrs" counters.
     Changed the check in resctrl_alloc_config_cntr() to reduce the indentation.
     Fixed the error handling on first failure.
     Added domain id and event id on failure.
     Fixed the return error override.
     Added new parameter event configuration (evt_cfg) to get the event configuration
     from user space.

v11: Patch changed again quite a bit.
     Moved the functions to monitor.c.
     Renamed rdtgroup_assign_cntr_event() to resctrl_assign_cntr_event().
     Refactored the resctrl_assign_cntr_event().
     Added functionality to exit on the first error during assignment.
     Simplified mbm_cntr_free().
     Removed the function mbm_cntr_assigned(). Will be using mbm_cntr_get() to
     figure out if the counter is assigned or not.
     Updated commit message and code comments.

v10: Patch changed completely.
     Counters are managed at the domain based on the discussion.
     https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
     Reset non-architectural MBM state.
     Commit message update.

v9: Introduced new function resctrl_config_cntr to assign the counter, update
    the bitmap and reset the architectural state.
    Taken care of error handling(freeing the counter) when assignment fails.
    Moved mbm_cntr_assigned_to_domain here as it is used in this patch.
    Minor text changes.

v8: Renamed rdtgroup_assign_cntr() to rdtgroup_assign_cntr_event().
    Added the code to return the error if rdtgroup_assign_cntr_event fails.
    Moved definition of MBM_EVENT_ARRAY_INDEX to resctrl/internal.h.
    Updated typo in the comments.

v7: New patch. Moved all the FS code here.
    Merged rdtgroup_assign_cntr and rdtgroup_alloc_cntr.
    Added new #define MBM_EVENT_ARRAY_INDEX.
---
 fs/resctrl/internal.h |   3 +
 fs/resctrl/monitor.c  | 134 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 137 insertions(+)

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 0fae374559ba..ce4fcac91937 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -377,6 +377,9 @@ bool closid_allocated(unsigned int closid);
 
 int resctrl_find_cleanest_closid(void);
 
+int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
+
 #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
 int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
 
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 8e403587a02f..d76fd0840946 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -934,3 +934,137 @@ void resctrl_mon_resource_exit(void)
 
 	dom_data_exit(r);
 }
+
+/*
+ * Configure the counter for the event, RMID pair for the domain. Reset the
+ * non-architectural state to clear all the event counters.
+ */
+static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+				enum resctrl_event_id evtid, u32 rmid, u32 closid,
+				u32 cntr_id, u32 evt_cfg, bool assign)
+{
+	struct mbm_state *m;
+
+	resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id, evt_cfg, assign);
+
+	m = get_mbm_state(d, closid, rmid, evtid);
+	if (m)
+		memset(m, 0, sizeof(struct mbm_state));
+}
+
+/*
+ * mbm_cntr_get() - Return the cntr_id for the matching evtid and rdtgrp in
+ *		    cntr_cfg array.
+ */
+static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
+			struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+	int cntr_id;
+
+	for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
+		if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
+		    d->cntr_cfg[cntr_id].evtid == evtid)
+			return cntr_id;
+	}
+
+	return -ENOENT;
+}
+
+/*
+ * mbm_cntr_alloc() - Return the first free entry in cntr_cfg array.
+ */
+static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
+			  struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+	int cntr_id;
+
+	for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
+		if (!d->cntr_cfg[cntr_id].rdtgrp) {
+			d->cntr_cfg[cntr_id].rdtgrp = rdtgrp;
+			d->cntr_cfg[cntr_id].evtid = evtid;
+			return cntr_id;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+/*
+ * mbm_get_mon_event() - Return the mon_evt entry for the matching evtid.
+ */
+static struct mon_evt *mbm_get_mon_event(struct rdt_resource *r,
+					 enum resctrl_event_id evtid)
+{
+	struct mon_evt *mevt;
+
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
+		if (mevt->evtid == evtid)
+			return mevt;
+	}
+
+	return NULL;
+}
+
+/*
+ * Allocate a fresh counter and configure the event if not assigned already.
+ */
+static int resctrl_alloc_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+				     struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+	struct mon_evt *mevt;
+	int cntr_id;
+
+	/* No need to allocate a new counter if it is already assigned */
+	cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
+	if (cntr_id >= 0)
+		goto cntr_configure;
+
+	cntr_id = mbm_cntr_alloc(r, d, rdtgrp, evtid);
+	if (cntr_id <  0) {
+		rdt_last_cmd_printf("Unable to allocate counter in domain %d\n",
+				    d->hdr.id);
+		return cntr_id;
+	}
+
+cntr_configure:
+	mevt = mbm_get_mon_event(r, evtid);
+	if (!mevt) {
+		rdt_last_cmd_printf("Invalid event id %d\n", evtid);
+		return -EINVAL;
+	}
+
+	/*
+	 * Skip reconfiguration if the event setup is current; otherwise,
+	 * update and apply the new configuration to the domain.
+	 */
+	if (mevt->evt_cfg != d->cntr_cfg[cntr_id].evt_cfg) {
+		d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
+		resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid, rdtgrp->closid,
+				    cntr_id, mevt->evt_cfg, true);
+	}
+
+	return 0;
+}
+
+/*
+ * Assign a hardware counter to event @evtid of group @rdtgrp.
+ * Assign counters to all domains if @d is NULL; otherwise, assign the
+ * counter to the specified domain @d.
+ */
+int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+	int ret = 0;
+
+	if (!d) {
+		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+			ret = resctrl_alloc_config_cntr(r, d, rdtgrp, evtid);
+			if (ret)
+				return ret;
+		}
+	} else {
+		ret = resctrl_alloc_config_cntr(r, d, rdtgrp, evtid);
+	}
+
+	return ret;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 114+ messages in thread
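
Editor's note: the per-domain counter bookkeeping in the patch above
(reuse an existing assignment, otherwise claim the first free slot,
otherwise fail with -ENOSPC, and stop at the first failing domain) can be
sketched in userspace as follows. The _sketch types, the fixed
NUM_MBM_CNTRS value and the function names are assumptions made for the
example; the kernel code works on struct rdt_mon_domain and reads the
counter count from r->mon.num_mbm_cntrs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_MBM_CNTRS	2	/* hypothetical; hardware reports the real count */

struct cntr_cfg_sketch {
	const void *grp;	/* NULL means the counter is free */
	int evtid;
};

struct dom_sketch {
	struct cntr_cfg_sketch cntr_cfg[NUM_MBM_CNTRS];
};

/* Return the counter already bound to (grp, evtid), or -ENOENT. */
int cntr_get(const struct dom_sketch *d, const void *grp, int evtid)
{
	for (int i = 0; i < NUM_MBM_CNTRS; i++)
		if (d->cntr_cfg[i].grp == grp && d->cntr_cfg[i].evtid == evtid)
			return i;
	return -ENOENT;
}

/* Reuse an existing binding, else claim the first free slot, else -ENOSPC. */
int cntr_assign_one(struct dom_sketch *d, const void *grp, int evtid)
{
	int id;

	assert(grp != NULL);	/* NULL is reserved for "free" */

	id = cntr_get(d, grp, evtid);
	if (id >= 0)
		return id;

	for (id = 0; id < NUM_MBM_CNTRS; id++) {
		if (!d->cntr_cfg[id].grp) {
			d->cntr_cfg[id].grp = grp;
			d->cntr_cfg[id].evtid = evtid;
			return id;
		}
	}
	return -ENOSPC;
}

/*
 * Assign in every domain, stopping at the first failure. Domains handled
 * before the failure keep their assignment in this sketch.
 */
int cntr_assign_all(struct dom_sketch *doms, size_t ndoms, const void *grp,
		    int evtid)
{
	for (size_t i = 0; i < ndoms; i++) {
		int ret = cntr_assign_one(&doms[i], grp, evtid);

		if (ret < 0)
			return ret;
	}
	return 0;
}
```

Looking up an existing binding first keeps a repeated request for the same
group and event from consuming a second counter.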

* [PATCH v13 14/27] x86/resctrl: Add the functionality to unassign MBM events
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (12 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 13/27] x86/resctrl: Add the functionality to assign MBM events Babu Moger
@ 2025-05-15 22:51 ` Babu Moger
  2025-05-22 22:49   ` Reinette Chatre
  2025-05-15 22:52 ` [PATCH v13 15/27] x86/resctrl: Report 'Unassigned' for MBM events in mbm_cntr_assign mode Babu Moger
                   ` (14 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:51 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

The mbm_cntr_assign mode offers "num_mbm_cntrs" counters that can be
assigned to an RMID, event pair to monitor the bandwidth for as long as
the counter is assigned. If all the counters are in use, the kernel will
log the error message "Unable to allocate counter in domain" in
/sys/fs/resctrl/info/last_cmd_status when a new assignment is requested.

To make space for a new assignment, users must unassign an already
assigned counter and retry the assignment.

Add the functionality to unassign and free the counters in the domain.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Moved mbm_cntr_free() to this patch as it is used in here first.
     Not required to pass evt_cfg to resctrl_unassign_cntr_event(). It is
     available via mbm_get_mon_event().
     Resolved conflicts caused by the recent FS/ARCH code restructure.
     The monitor.c file has now been split between the FS and ARCH directories.

v12: Updated the commit text to make it a bit clearer.
     Replaced several counters with "num_mbm_cntrs" counters.
     Fixed typo in the subject line.
     Fixed the error handling on first failure.
     Added domain id and event id on failure.
     Added new parameter event configuration (evt_cfg) to provide the event from
     user space.

v11: Moved the functions to monitor.c.
     Renamed rdtgroup_unassign_cntr_event() to resctrl_unassign_cntr_event().
     Refactored the resctrl_unassign_cntr_event().
     Updated commit message and code comments.

v10: Patch changed again.
     Counters are managed at the domain based on the discussion.
     https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
     commit message update.

v9: Changes related to addition of new function resctrl_config_cntr().
    Removed rdtgroup_mbm_cntr_is_assigned() as it was introduced
    already.
    Text changes to address comments.

v8: Renamed rdtgroup_mbm_cntr_is_assigned to mbm_cntr_assigned_to_domain
    Added return error handling in resctrl_arch_config_cntr().

v7: Merged rdtgroup_unassign_cntr and rdtgroup_free_cntr functions.
    Renamed rdtgroup_mbm_cntr_test() to rdtgroup_mbm_cntr_is_assigned().
    Reworded the commit log a little bit.

v6: Removed mbm_cntr_free from this patch.
    Added counter test in all the domains and free if it is not assigned to
    any domains.

v5: A few name changes to match cntr_id.
    Changed the function names to rdtgroup_unassign_cntr
    More comments on commit log.

v4: Added domain specific unassign feature.
    Few name changes.

v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
    The function is not called directly from user anymore. These
    changes are related to global assignment interface.

v2: No changes.
---
 fs/resctrl/internal.h |  2 ++
 fs/resctrl/monitor.c  | 60 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index ce4fcac91937..64ddc107fcab 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -379,6 +379,8 @@ int resctrl_find_cleanest_closid(void);
 
 int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
 			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
+int resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+				struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
 
 #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
 int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index d76fd0840946..fbc938bd3b23 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -989,6 +989,14 @@ static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
 	return -ENOSPC;
 }
 
+/*
+ * mbm_cntr_free() -  Reset cntr_id to zero.
+ */
+static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
+{
+	memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
+}
+
 /*
  * mbm_get_mon_event() - Return the mon_evt entry for the matching evtid.
  */
@@ -1068,3 +1076,55 @@ int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
 
 	return ret;
 }
+
+/*
+ * Unassign and free the counter if assigned.
+ */
+static int resctrl_free_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+				    struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+	struct mon_evt *mevt;
+	int cntr_id;
+
+	cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
+
+	/* If there is no cntr_id assigned, nothing to do */
+	if (cntr_id < 0)
+		return 0;
+
+	mevt = mbm_get_mon_event(r, evtid);
+	if (!mevt) {
+		rdt_last_cmd_printf("Invalid event id %d\n", evtid);
+		return -EINVAL;
+	}
+
+	resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid, rdtgrp->closid,
+			    cntr_id, mevt->evt_cfg, false);
+
+	mbm_cntr_free(d, cntr_id);
+
+	return 0;
+}
+
+/*
+ * Unassign a hardware counter associated with @evtid from the domain and
+ * the group. Unassign the counters from all the domains if @d is NULL else
+ * unassign from @d.
+ */
+int  resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+				 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+	int ret = 0;
+
+	if (!d) {
+		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+			ret = resctrl_free_config_cntr(r, d, rdtgrp, evtid);
+			if (ret)
+				return ret;
+		}
+	} else {
+		ret = resctrl_free_config_cntr(r, d, rdtgrp, evtid);
+	}
+
+	return ret;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 114+ messages in thread
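
Editor's note: the "unassign, then retry" flow described in the commit
message above can be sketched in userspace as below. As with the patch,
unassigning an event that has no counter is treated as a no-op. The
_sketch types, NUM_MBM_CNTRS and the function names are hypothetical and
only stand in for the cntr_cfg array handling in fs/resctrl/monitor.c.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define NUM_MBM_CNTRS	2	/* hypothetical; hardware reports the real count */

struct cntr_cfg_sketch {
	const void *grp;	/* NULL means the counter is free */
	int evtid;
};

struct dom_sketch {
	struct cntr_cfg_sketch cntr_cfg[NUM_MBM_CNTRS];
};

/* Return the existing or newly claimed counter id, or -ENOSPC when full. */
int cntr_assign(struct dom_sketch *d, const void *grp, int evtid)
{
	int free_id = -1;

	assert(grp != NULL);	/* NULL is reserved for "free" */

	for (int i = 0; i < NUM_MBM_CNTRS; i++) {
		if (d->cntr_cfg[i].grp == grp && d->cntr_cfg[i].evtid == evtid)
			return i;		/* already assigned */
		if (!d->cntr_cfg[i].grp && free_id < 0)
			free_id = i;		/* remember first free slot */
	}

	if (free_id < 0)
		return -ENOSPC;

	d->cntr_cfg[free_id].grp = grp;
	d->cntr_cfg[free_id].evtid = evtid;
	return free_id;
}

/* Release the counter bound to (grp, evtid); do nothing if there is none. */
void cntr_unassign(struct dom_sketch *d, const void *grp, int evtid)
{
	for (int i = 0; i < NUM_MBM_CNTRS; i++) {
		if (d->cntr_cfg[i].grp == grp && d->cntr_cfg[i].evtid == evtid) {
			/* Clearing the whole entry marks the slot free again. */
			memset(&d->cntr_cfg[i], 0, sizeof(d->cntr_cfg[i]));
			return;
		}
	}
}
```

Once every slot is taken, cntr_assign() keeps returning -ENOSPC until some
other group's counter is released, after which the freed slot is reused.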

* [PATCH v13 15/27] x86/resctrl: Report 'Unassigned' for MBM events in mbm_cntr_assign mode
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (13 preceding siblings ...)
  2025-05-15 22:51 ` [PATCH v13 14/27] x86/resctrl: Add the functionality to unassign " Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-22 23:01   ` Reinette Chatre
  2025-05-15 22:52 ` [PATCH v13 16/27] x86/resctrl: Pass entire struct rdtgroup rather than passing individual members Babu Moger
                   ` (13 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

In mbm_cntr_assign mode, a hardware counter must be assigned to an MBM event
before the event can be read.

Report 'Unassigned' in case the user attempts to read the event without
assigning a hardware counter.

Export resctrl_is_mbm_event() and mbm_cntr_get() to allow usage from other
functions within fs/resctrl.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Minor commitlog and user doc update.
     Resolved conflicts caused by the recent FS/ARCH code restructure.
     The monitor.c/rdtgroup.c files have been split between the FS and ARCH directories.

v12: Updated the documentation for more clarity.

v11: Domain can be NULL with SNC support so moved the unassign check in
     rdtgroup_mondata_show().

v10: Moved the code to check the assign state inside mon_event_read().
     Fixed a few text comments.

v9: Used is_mbm_event() to check the event type.
    Minor user documentation update.

v8: Used MBM_EVENT_ARRAY_INDEX to get the index for the MBM event.
    Documentation update to make the text generic.

v7: Moved the documentation under "mon_data".
    Updated the text a little bit.

v6: Added more explanation in resctrl.rst.
    Added checks to detect "Unassigned" before reading RMID.

v5: New patch.
---
 Documentation/filesystems/resctrl.rst |  8 ++++++++
 fs/resctrl/ctrlmondata.c              | 14 ++++++++++++++
 fs/resctrl/internal.h                 |  2 ++
 fs/resctrl/monitor.c                  |  4 ++--
 fs/resctrl/rdtgroup.c                 |  2 +-
 include/linux/resctrl.h               |  1 +
 6 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 2bfad43aac9c..5cf2d742f04c 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -430,6 +430,14 @@ When monitoring is enabled all MON groups will also contain:
 	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
 	where "YY" is the node number.
 
+	The mbm_cntr_assign mode offers "num_mbm_cntrs" number of counters
+	and allows users to assign a counter to mon_hw_id, event pair enabling
+	bandwidth monitoring for as long as the counter remains assigned.
+	The hardware will continue tracking the assigned mon_hw_id until
+	the user manually unassigns it, ensuring that counters are not reset
+	during this period. An MBM event returns 'Unassigned' when the event
+	does not have a hardware counter assigned.
+
 "mon_hw_id":
 	Available only with debug option. The identifier used by hardware
 	for the monitor group. On x86 this is the RMID.
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 6ed2dfd4dbbd..f6b8ad24b0b5 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -643,6 +643,18 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 			goto out;
 		}
 		d = container_of(hdr, struct rdt_mon_domain, hdr);
+
+		/*
+		 * Report 'Unassigned' if mbm_cntr_assign mode is enabled and
+		 * counter is unassigned.
+		 */
+		if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
+		    resctrl_is_mbm_event(evtid) &&
+		    (mbm_cntr_get(r, d, rdtgrp, evtid) < 0)) {
+			rr.err = -ENOENT;
+			goto checkresult;
+		}
+
 		mon_event_read(&rr, r, d, rdtgrp, &d->hdr.cpu_mask, evtid, false);
 	}
 
@@ -652,6 +664,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		seq_puts(m, "Error\n");
 	else if (rr.err == -EINVAL)
 		seq_puts(m, "Unavailable\n");
+	else if (rr.err == -ENOENT)
+		seq_puts(m, "Unassigned\n");
 	else
 		seq_printf(m, "%llu\n", rr.val);
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 64ddc107fcab..0dfd2efe68fc 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -381,6 +381,8 @@ int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
 			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
 int resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
 				struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
+int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
+		 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
 
 #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
 int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index fbc938bd3b23..c98a61bde179 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -956,8 +956,8 @@ static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d
  * mbm_cntr_get() - Return the cntr_id for the matching evtid and rdtgrp in
  *		    cntr_cfg array.
  */
-static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
-			struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
+		 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
 {
 	int cntr_id;
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index f192b2736a77..72317a5adee2 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -127,7 +127,7 @@ static bool resctrl_is_mbm_enabled(void)
 		resctrl_arch_is_mbm_local_enabled());
 }
 
-static bool resctrl_is_mbm_event(int e)
+bool resctrl_is_mbm_event(int e)
 {
 	return (e >= QOS_L3_MBM_TOTAL_EVENT_ID &&
 		e <= QOS_L3_MBM_LOCAL_EVENT_ID);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 59a4fe60ab46..f78b6064230c 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -441,6 +441,7 @@ static inline u32 resctrl_get_config_index(u32 closid,
 	}
 }
 
+bool resctrl_is_mbm_event(int e);
 bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level l);
 int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
 
-- 
2.34.1



* [PATCH v13 16/27] x86/resctrl: Pass entire struct rdtgroup rather than passing individual members
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (14 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 15/27] x86/resctrl: Report 'Unassigned' for MBM events in mbm_cntr_assign mode Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-22 23:05   ` Reinette Chatre
  2025-05-15 22:52 ` [PATCH v13 17/27] x86/resctrl: Add the support for reading ABMC counters Babu Moger
                   ` (12 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

The mbm_cntr_assign mode requires a cntr_id to read event data. The
cntr_id is retrieved via mbm_cntr_get(), which takes a struct rdtgroup as
a parameter.

Passing the full rdtgroup also provides access to closid and rmid, both of
which are necessary to read monitoring events.

Refactor the code to pass the entire struct rdtgroup instead of individual
members in preparation for this requirement.
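
As a rough user-space illustration of why the whole group is passed
(a sketch only, not kernel code): struct read_keys, keys_for_group(),
cntr_lookup() and cntr_owner[] are hypothetical names, with
cntr_lookup() standing in for mbm_cntr_get(). Once the callee holds
the struct rdtgroup, it can derive closid, rmid and the counter ID
itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down, hypothetical stand-ins for the kernel structures. */
struct mongroup { uint32_t rmid; };
struct rdtgroup { uint32_t closid; struct mongroup mon; };

#define NUM_CNTRS 4
/* Which group owns each counter; NULL means the counter is free. */
static const struct rdtgroup *cntr_owner[NUM_CNTRS];

/* Hypothetical stand-in for mbm_cntr_get(): -1 when nothing is assigned. */
static int cntr_lookup(const struct rdtgroup *rdtgrp)
{
	for (int i = 0; i < NUM_CNTRS; i++)
		if (cntr_owner[i] == rdtgrp)
			return i;
	return -1;
}

struct read_keys { uint32_t closid; uint32_t rmid; int cntr_id; };

/* With the whole group in hand, all three identifiers are reachable. */
static struct read_keys keys_for_group(const struct rdtgroup *rdtgrp)
{
	assert(rdtgrp != NULL);

	struct read_keys k = {
		.closid = rdtgrp->closid,
		.rmid = rdtgrp->mon.rmid,
		.cntr_id = cntr_lookup(rdtgrp),
	};
	return k;
}
```

A callee that only received closid and rmid would have no key for the
counter lookup, which is the limitation this refactor removes.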

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: New patch to pass the entire struct rdtgroup to __mon_event_count(),
     mbm_update(), and related functions.
---
 fs/resctrl/monitor.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index c98a61bde179..a477be9cdb66 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -357,9 +357,11 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
 	}
 }
 
-static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
+static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
 	int cpu = smp_processor_id();
+	u32 closid = rdtgrp->closid;
+	u32 rmid = rdtgrp->mon.rmid;
 	struct rdt_mon_domain *d;
 	struct mbm_state *m;
 	int err, ret;
@@ -428,9 +430,11 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
  * __mon_event_count() is compared with the chunks value from the previous
  * invocation. This must be called once per second to maintain values in MBps.
  */
-static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
+static void mbm_bw_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
 	u64 cur_bw, bytes, cur_bytes;
+	u32 closid = rdtgrp->closid;
+	u32 rmid = rdtgrp->mon.rmid;
 	struct mbm_state *m;
 
 	m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
@@ -459,7 +463,7 @@ void mon_event_count(void *info)
 
 	rdtgrp = rr->rgrp;
 
-	ret = __mon_event_count(rdtgrp->closid, rdtgrp->mon.rmid, rr);
+	ret = __mon_event_count(rdtgrp, rr);
 
 	/*
 	 * For Ctrl groups read data from child monitor groups and
@@ -470,8 +474,7 @@ void mon_event_count(void *info)
 
 	if (rdtgrp->type == RDTCTRL_GROUP) {
 		list_for_each_entry(entry, head, mon.crdtgrp_list) {
-			if (__mon_event_count(entry->closid, entry->mon.rmid,
-					      rr) == 0)
+			if (__mon_event_count(entry, rr) == 0)
 				ret = 0;
 		}
 	}
@@ -602,7 +605,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
 }
 
 static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
-				 u32 closid, u32 rmid, enum resctrl_event_id evtid)
+				 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
 {
 	struct rmid_read rr = {0};
 
@@ -616,30 +619,30 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *
 		return;
 	}
 
-	__mon_event_count(closid, rmid, &rr);
+	__mon_event_count(rdtgrp, &rr);
 
 	/*
 	 * If the software controller is enabled, compute the
 	 * bandwidth for this event id.
 	 */
 	if (is_mba_sc(NULL))
-		mbm_bw_count(closid, rmid, &rr);
+		mbm_bw_count(rdtgrp, &rr);
 
 	resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
 }
 
 static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
-		       u32 closid, u32 rmid)
+		       struct rdtgroup *rdtgrp)
 {
 	/*
 	 * This is protected from concurrent reads from user as both
 	 * the user and overflow handler hold the global mutex.
 	 */
 	if (resctrl_arch_is_mbm_total_enabled())
-		mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_TOTAL_EVENT_ID);
+		mbm_update_one_event(r, d, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
 
 	if (resctrl_arch_is_mbm_local_enabled())
-		mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_LOCAL_EVENT_ID);
+		mbm_update_one_event(r, d, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
 }
 
 /*
@@ -712,11 +715,11 @@ void mbm_handle_overflow(struct work_struct *work)
 	d = container_of(work, struct rdt_mon_domain, mbm_over.work);
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
-		mbm_update(r, d, prgrp->closid, prgrp->mon.rmid);
+		mbm_update(r, d, prgrp);
 
 		head = &prgrp->mon.crdtgrp_list;
 		list_for_each_entry(crgrp, head, mon.crdtgrp_list)
-			mbm_update(r, d, crgrp->closid, crgrp->mon.rmid);
+			mbm_update(r, d, crgrp);
 
 		if (is_mba_sc(NULL))
 			update_mba_bw(prgrp, d);
-- 
2.34.1



* [PATCH v13 17/27] x86/resctrl: Add the support for reading ABMC counters
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (15 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 16/27] x86/resctrl: Pass entire struct rdtgroup rather than passing individual members Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-22 23:31   ` Reinette Chatre
  2025-05-15 22:52 ` [PATCH v13 18/27] x86/resctrl: Add definitions for MBM event configuration Babu Moger
                   ` (11 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Software can read the assignable counters using the QM_EVTSEL and QM_CTR
register pair.

QM_EVTSEL Register definition:
=======================================================
Bits	Mnemonic	Description
=======================================================
63:44	--		Reserved
43:32   RMID		Resource Monitoring Identifier
31	ExtEvtID	Extended Event Identifier
30:8	--		Reserved
7:0	EvtID		Event Identifier
=======================================================

The contents of a specific counter can be read by setting
QM_EVTSEL.ExtendedEvtID = 1 and QM_EVTSEL.EvtID = L3CacheABMC (=1), and
setting QM_EVTSEL[RMID] to the desired counter ID. Reading QM_CTR will then
return the contents of the specified counter. The E bit will be set if the
counter configuration was invalid, or if an invalid counter ID was set
in the QM_EVTSEL[RMID] field.
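
The register layout above can be sketched in user-space C as follows.
This is an illustration only: abmc_evtsel() and abmc_decode_ctr() are
hypothetical helpers, and the QM_CTR_ERROR/QM_CTR_UNAVAIL bit positions
(63 and 62) are assumed to match the kernel's RMID_VAL_ERROR and
RMID_VAL_UNAVAIL definitions.

```c
#include <assert.h>
#include <stdint.h>

#define ABMC_EXTENDED_EVT_ID	(1U << 31)	/* QM_EVTSEL[31], as in the patch */
#define ABMC_EVT_ID		1U		/* L3CacheABMC, as in the patch */

/* Assumption: same bit positions as RMID_VAL_ERROR / RMID_VAL_UNAVAIL. */
#define QM_CTR_ERROR		(1ULL << 63)
#define QM_CTR_UNAVAIL		(1ULL << 62)

/* Build the 64-bit QM_EVTSEL image: counter ID in 43:32, event in 31 and 7:0. */
static uint64_t abmc_evtsel(uint32_t cntr_id)
{
	assert(cntr_id < (1U << 12));	/* the RMID field is 12 bits wide */
	return ((uint64_t)cntr_id << 32) | ABMC_EXTENDED_EVT_ID | ABMC_EVT_ID;
}

/* Decode a QM_CTR image: 0 on success, negative when a status bit is set. */
static int abmc_decode_ctr(uint64_t msr_val, uint64_t *val)
{
	if (msr_val & QM_CTR_ERROR)
		return -1;	/* the patch maps this case to -EIO */
	if (msr_val & QM_CTR_UNAVAIL)
		return -2;	/* the patch maps this case to -EINVAL */

	*val = msr_val;
	return 0;
}
```

The wrmsr() call in the patch passes the same two halves separately:
the event bits as the low 32 bits and the counter ID as the high 32 bits.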

Introduce __cntr_id_read_phys() to read the counter ID event data.

Link: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Split the patch into 2. First one to handle the passing of rdtgroup structure to a few
     functions (__mon_event_count() and mbm_update()). Second one to handle ABMC counter reading.
     Added new function __cntr_id_read_phys() to handle ABMC event reading.
     Updated kernel doc for resctrl_arch_reset_rmid() and resctrl_arch_rmid_read().
     Resolved conflicts caused by the recent FS/ARCH code restructure.
     The monitor.c file has now been split between the FS and ARCH directories.

v12: New patch to support extended event mode when ABMC is enabled.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  6 +++
 arch/x86/kernel/cpu/resctrl/monitor.c  | 66 ++++++++++++++++++++++----
 fs/resctrl/monitor.c                   | 14 ++++--
 include/linux/resctrl.h                |  9 ++--
 4 files changed, 80 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index db6b0c28ee6b..3b0cdb5520c7 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -40,6 +40,12 @@ struct arch_mbm_state {
 /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
 #define ABMC_ENABLE_BIT			0
 
+/*
+ * ABMC Qos Event Identifiers.
+ */
+#define ABMC_EXTENDED_EVT_ID		BIT(31)
+#define ABMC_EVT_ID			1
+
 /**
  * struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
  *			       a resource for a control function
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index e31084f7babd..36a03dae6d8e 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -161,6 +161,41 @@ static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
 	return 0;
 }
 
+static int __cntr_id_read_phys(u32 cntr_id, u64 *val)
+{
+	u64 msr_val;
+
+	/*
+	 * QM_EVTSEL Register definition:
+	 * =======================================================
+	 * Bits    Mnemonic        Description
+	 * =======================================================
+	 * 63:44   --              Reserved
+	 * 43:32   RMID            Resource Monitoring Identifier
+	 * 31      ExtEvtID        Extended Event Identifier
+	 * 30:8    --              Reserved
+	 * 7:0     EvtID           Event Identifier
+	 * =======================================================
+	 * The contents of a specific counter can be read by setting
+	 * QM_EVTSEL.ExtendedEvtID = 1 and QM_EVTSEL.EvtID = L3CacheABMC (=1),
+	 * and setting QM_EVTSEL[RMID] to the desired counter ID. Reading
+	 * QM_CTR will then return the contents of the specified counter.
+	 * The E bit will be set if the counter configuration was invalid,
+	 * or if an invalid counter ID was set in the QM_EVTSEL[RMID]
+	 * field.
+	 */
+	wrmsr(MSR_IA32_QM_EVTSEL, ABMC_EXTENDED_EVT_ID | ABMC_EVT_ID, cntr_id);
+	rdmsrl(MSR_IA32_QM_CTR, msr_val);
+
+	if (msr_val & RMID_VAL_ERROR)
+		return -EIO;
+	if (msr_val & RMID_VAL_UNAVAIL)
+		return -EINVAL;
+
+	*val = msr_val;
+	return 0;
+}
+
 static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
 						 u32 rmid,
 						 enum resctrl_event_id eventid)
@@ -180,7 +215,7 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_do
 }
 
 void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
-			     u32 unused, u32 rmid,
+			     u32 unused, u32 rmid, int cntr_id,
 			     enum resctrl_event_id eventid)
 {
 	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
@@ -192,9 +227,16 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
 	if (am) {
 		memset(am, 0, sizeof(*am));
 
-		prmid = logical_rmid_to_physical_rmid(cpu, rmid);
-		/* Record any initial, non-zero count value. */
-		__rmid_read_phys(prmid, eventid, &am->prev_msr);
+		if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
+		    resctrl_is_mbm_event(eventid)) {
+			if (cntr_id < 0)
+				return;
+			__cntr_id_read_phys(cntr_id, &am->prev_msr);
+		} else {
+			prmid = logical_rmid_to_physical_rmid(cpu, rmid);
+			/* Record any initial, non-zero count value. */
+			__rmid_read_phys(prmid, eventid, &am->prev_msr);
+		}
 	}
 }
 
@@ -224,8 +266,8 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
 }
 
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
-			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
-			   u64 *val, void *ignored)
+			   u32 unused, u32 rmid, int cntr_id,
+			   enum resctrl_event_id eventid, u64 *val, void *ignored)
 {
 	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -237,8 +279,16 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
 
 	resctrl_arch_rmid_read_context_check();
 
-	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
-	ret = __rmid_read_phys(prmid, eventid, &msr_val);
+	if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
+	    resctrl_is_mbm_event(eventid)) {
+		if (cntr_id < 0)
+			return cntr_id;
+		ret = __cntr_id_read_phys(cntr_id, &msr_val);
+	} else {
+		prmid = logical_rmid_to_physical_rmid(cpu, rmid);
+		ret = __rmid_read_phys(prmid, eventid, &msr_val);
+	}
+
 	if (ret)
 		return ret;
 
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index a477be9cdb66..72f3dfb5b903 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -159,7 +159,11 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free)
 			break;
 
 		entry = __rmid_entry(idx);
-		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
+		/*
+		 * cntr_id is not relevant for QOS_L3_OCCUP_EVENT_ID.
+		 * Pass dummy value -1.
+		 */
+		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid, -1,
 					   QOS_L3_OCCUP_EVENT_ID, &val,
 					   arch_mon_ctx)) {
 			rmid_dirty = true;
@@ -359,6 +363,7 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
 
 static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
+	int cntr_id = mbm_cntr_get(rr->r, rr->d, rdtgrp, rr->evtid);
 	int cpu = smp_processor_id();
 	u32 closid = rdtgrp->closid;
 	u32 rmid = rdtgrp->mon.rmid;
@@ -368,7 +373,7 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 	u64 tval = 0;
 
 	if (rr->first) {
-		resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
+		resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, cntr_id, rr->evtid);
 		m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
 		if (m)
 			memset(m, 0, sizeof(struct mbm_state));
@@ -379,7 +384,7 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 		/* Reading a single domain, must be on a CPU in that domain. */
 		if (!cpumask_test_cpu(cpu, &rr->d->hdr.cpu_mask))
 			return -EINVAL;
-		rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid,
+		rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid, cntr_id,
 						 rr->evtid, &tval, rr->arch_mon_ctx);
 		if (rr->err)
 			return rr->err;
@@ -404,7 +409,8 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 	list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
 		if (d->ci->id != rr->ci->id)
 			continue;
-		err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
+		cntr_id = mbm_cntr_get(rr->r, d, rdtgrp, rr->evtid);
+		err = resctrl_arch_rmid_read(rr->r, d, closid, rmid, cntr_id,
 					     rr->evtid, &tval, rr->arch_mon_ctx);
 		if (!err) {
 			rr->val += tval;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f78b6064230c..cd24d1577e0a 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -473,6 +473,7 @@ void resctrl_offline_cpu(unsigned int cpu);
  *			counter may match traffic of both @closid and @rmid, or @rmid
  *			only.
  * @rmid:		rmid of the counter to read.
+ * @cntr_id:		cntr_id to read MBM events with mbm_cntr_assign mode.
  * @eventid:		eventid to read, e.g. L3 occupancy.
  * @val:		result of the counter read in bytes.
  * @arch_mon_ctx:	An architecture specific value from
@@ -490,8 +491,9 @@ void resctrl_offline_cpu(unsigned int cpu);
  * 0 on success, or -EIO, -EINVAL etc on error.
  */
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
-			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
-			   u64 *val, void *arch_mon_ctx);
+			   u32 closid, u32 rmid, int cntr_id,
+			   enum resctrl_event_id eventid, u64 *val,
+			   void *arch_mon_ctx);
 
 /**
  * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
@@ -532,12 +534,13 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
  * @closid:	closid that matches the rmid. Depending on the architecture, the
  *		counter may match traffic of both @closid and @rmid, or @rmid only.
  * @rmid:	The rmid whose counter values should be reset.
+ * @cntr_id:	The cntr_id to read MBM events with mbm_cntr_assign mode.
  * @eventid:	The eventid whose counter values should be reset.
  *
  * This can be called from any CPU.
  */
 void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
-			     u32 closid, u32 rmid,
+			     u32 closid, u32 rmid, int cntr_id,
 			     enum resctrl_event_id eventid);
 
 /**
-- 
2.34.1



* [PATCH v13 18/27] x86/resctrl: Add definitions for MBM event configuration
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (16 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 17/27] x86/resctrl: Add the support for reading ABMC counters Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-23  4:41   ` Reinette Chatre
  2025-05-15 22:52 ` [PATCH v13 19/27] x86/resctrl: Add event configuration directory under info/L3_MON/ Babu Moger
                   ` (10 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

The "mbm_cntr_assign" mode allows users to manually assign a hardware
counter to a specific RMID and event pair. The events available for
assignment are configurable.

By default, each resctrl group supports two MBM events: mbm_total_bytes
and mbm_local_bytes. Each event corresponds to an MBM configuration that
specifies the bandwidth sources tracked by the event.

Add definitions of supported bandwidth sources.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Updated the changelog.
     Removed the definitions from resctrl_types.h and moved to internal.h.
     Removed mbm_assign_config definition. Configurations will be part of
     mon_evt list.
     Resolved conflicts caused by the recent FS/ARCH code restructure.
     The rdtgroup.c file has now been split between the FS and ARCH directories.

v12: New patch to support event configurations via new counter_configs
     method.
---
 fs/resctrl/internal.h | 10 ++++++++++
 fs/resctrl/rdtgroup.c | 14 ++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 0dfd2efe68fc..019d00bf5adf 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -203,6 +203,16 @@ struct rdtgroup {
 	struct pseudo_lock_region	*plr;
 };
 
+/**
+ * struct mbm_evt_value - Specific type of memory events.
+ * @evt_name:		Name of memory transaction type (read, write etc).
+ * @evt_val:		Value representing the memory transaction.
+ */
+struct mbm_evt_value {
+	char    evt_name[32];
+	u32     evt_val;
+};
+
 /* rdtgroup.flags */
 #define	RDT_DELETED		1
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 72317a5adee2..b109e91096b0 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -75,6 +75,20 @@ static void rdtgroup_destroy_root(void);
 
 struct dentry *debugfs_resctrl;
 
+/* Number of memory transaction types that can be monitored */
+#define NUM_MBM_EVT_VALUES             7
+
+/* Decoded values for each type of memory events */
+struct mbm_evt_value mbm_evt_values[NUM_MBM_EVT_VALUES] = {
+	{"local_reads", READS_TO_LOCAL_MEM},
+	{"remote_reads", READS_TO_REMOTE_MEM},
+	{"local_non_temporal_writes", NON_TEMP_WRITE_TO_LOCAL_MEM},
+	{"remote_non_temporal_writes", NON_TEMP_WRITE_TO_REMOTE_MEM},
+	{"local_reads_slow_memory", READS_TO_LOCAL_S_MEM},
+	{"remote_reads_slow_memory", READS_TO_REMOTE_S_MEM},
+	{"dirty_victim_writes_all", DIRTY_VICTIMS_TO_ALL_MEM},
+};
+
 /*
  * Memory bandwidth monitoring event to use for the default CTRL_MON group
  * and each new CTRL_MON group created by the user.  Only relevant when
-- 
2.34.1



* [PATCH v13 19/27] x86/resctrl: Add event configuration directory under info/L3_MON/
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (17 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 18/27] x86/resctrl: Add definitions for MBM event configuration Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-23  4:43   ` Reinette Chatre
  2025-05-15 22:52 ` [PATCH v13 20/27] x86/resctrl: Provide interface to update the event configurations Babu Moger
                   ` (9 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Create the configuration directory and files for mbm_cntr_assign mode.
These configurations will be used to assign MBM events in mbm_cntr_assign
mode, with two default configurations created upon mounting.

Example:
$ cd /sys/fs/resctrl/
$ cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
  local_reads,remote_reads,local_non_temporal_writes,
  remote_non_temporal_writes,local_reads_slow_memory,
  remote_reads_slow_memory,dirty_victim_writes_all

$ cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
  local_reads,local_non_temporal_writes,local_reads_slow_memory
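
How a configuration bitmask turns into the list shown above can be
sketched in user-space C. This is an illustration under assumptions,
not the kernel code: filter_to_string() and struct evt_name are
hypothetical, and the bit positions are taken from the table this patch
adds to resctrl.rst. Like event_filter_show(), it joins the names with a
single comma.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct evt_name { const char *name; unsigned int bit; };

/* Bit positions follow the resctrl.rst table added by this patch. */
static const struct evt_name evt_names[] = {
	{ "local_reads",                1u << 0 },
	{ "remote_reads",               1u << 1 },
	{ "local_non_temporal_writes",  1u << 2 },
	{ "remote_non_temporal_writes", 1u << 3 },
	{ "local_reads_slow_memory",    1u << 4 },
	{ "remote_reads_slow_memory",   1u << 5 },
	{ "dirty_victim_writes_all",    1u << 6 },
};
#define NUM_EVT_NAMES (sizeof(evt_names) / sizeof(evt_names[0]))

/* Write the comma-separated names selected by cfg into buf; returns buf. */
static char *filter_to_string(unsigned int cfg, char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < NUM_EVT_NAMES; i++) {
		if (!(cfg & evt_names[i].bit))
			continue;

		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "," : "", evt_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of room: the output is truncated */
		used += (size_t)n;
	}
	return buf;
}
```

With these bit positions, the mbm_local_bytes default shown above
corresponds to bits 0, 2 and 4 (0x15), and the mbm_total_bytes default
to all seven bits (0x7f).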

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Updated user doc (resctrl.rst).
     Changed the name of the function resctrl_mkdir_info_configs to
     resctrl_mkdir_counter_configs().
     Replaced seq_puts() with seq_putc() where applicable.
     Removed RFTYPE_MON_CONFIG definition. Not required.
     Changed the name of the flag RFTYPE_CONFIG to RFTYPE_ASSIGN_CONFIG.
     Reinette suggested RFTYPE_MBM_EVENT_CONFIG but RFTYPE_ASSIGN_CONFIG
     seemed shorter and more precise.
     The configuration is created using evt_list.
     Resolved conflicts caused by the recent FS/ARCH code restructure.
     The monitor.c/rdtgroup.c files have been split between the FS and ARCH directories.

v12: New patch to hold the MBM event configurations for mbm_cntr_assign mode.
---
 Documentation/filesystems/resctrl.rst | 30 ++++++++++
 fs/resctrl/internal.h                 |  2 +
 fs/resctrl/monitor.c                  |  1 +
 fs/resctrl/rdtgroup.c                 | 80 +++++++++++++++++++++++++++
 4 files changed, 113 insertions(+)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 5cf2d742f04c..4eb9f007ba3d 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -306,6 +306,36 @@ with the following files:
 	  # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
 	  0=30;1=30
 
+"counter_configs":
+	When the "mbm_cntr_assign" mode is supported, a dedicated directory is created
+	under the "L3_MON" directory to store configuration files.
+
+	These files contain the list of configurable events. There are two default
+	configurations: mbm_local_bytes and mbm_total_bytes.
+
+	The following types of events are supported:
+
+	==== ========================== ========================================================
+	Bits Name                       Description
+	==== ========================== ========================================================
+	6    dirty_victim_writes_all    Dirty Victims from the QOS domain to all types of memory
+	5    remote_reads_slow_memory   Reads to slow memory in the non-local NUMA domain
+	4    local_reads_slow_memory    Reads to slow memory in the local NUMA domain
+	3    remote_non_temporal_writes Non-temporal writes to non-local NUMA domain
+	2    local_non_temporal_writes  Non-temporal writes to local NUMA domain
+	1    remote_reads               Reads to memory in the non-local NUMA domain
+	0    local_reads                Reads to memory in the local NUMA domain
+	==== ========================== ========================================================
+
+	For example::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
+	  local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
+	  local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
+
+	  # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
+	  local_reads, local_non_temporal_writes, local_reads_slow_memory
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 019d00bf5adf..446cc9cc61df 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -238,6 +238,8 @@ struct mbm_evt_value {
 
 #define RFTYPE_DEBUG			BIT(10)
 
+#define RFTYPE_ASSIGN_CONFIG		BIT(11)
+
 #define RFTYPE_CTRL_INFO		(RFTYPE_INFO | RFTYPE_CTRL)
 
 #define RFTYPE_MON_INFO			(RFTYPE_INFO | RFTYPE_MON)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 72f3dfb5b903..1f72249a5c93 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -932,6 +932,7 @@ int resctrl_mon_resource_init(void)
 					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 		resctrl_file_fflags_init("available_mbm_cntrs",
 					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
+		resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
 	}
 
 	return 0;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index b109e91096b0..cf84e3a382ac 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1911,6 +1911,25 @@ static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq, void *v)
+{
+	struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
+	bool sep = false;
+	int i;
+
+	for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
+		if (mevt->evt_cfg & mbm_evt_values[i].evt_val) {
+			if (sep)
+				seq_putc(seq, ',');
+			seq_printf(seq, "%s", mbm_evt_values[i].evt_name);
+			sep = true;
+		}
+	}
+	seq_putc(seq, '\n');
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -2035,6 +2054,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= mbm_local_bytes_config_show,
 		.write		= mbm_local_bytes_config_write,
 	},
+	{
+		.name		= "event_filter",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= event_filter_show,
+	},
 	{
 		.name		= "mbm_assign_mode",
 		.mode		= 0444,
@@ -2317,6 +2342,55 @@ static int rdtgroup_mkdir_info_resdir(void *priv, char *name,
 	return ret;
 }
 
+static int resctrl_mkdir_counter_configs(struct rdt_resource *r, char *name)
+{
+	struct kernfs_node *l3_mon_kn, *kn_subdir, *kn_subdir2;
+	struct mon_evt *mevt;
+	int ret;
+
+	l3_mon_kn = kernfs_find_and_get(kn_info, name);
+	if (!l3_mon_kn)
+		return -ENOENT;
+
+	kn_subdir = kernfs_create_dir(l3_mon_kn, "counter_configs", l3_mon_kn->mode, NULL);
+	if (IS_ERR(kn_subdir)) {
+		kernfs_put(l3_mon_kn);
+		return PTR_ERR(kn_subdir);
+	}
+
+	ret = rdtgroup_kn_set_ugid(kn_subdir);
+	if (ret) {
+		kernfs_put(l3_mon_kn);
+		return ret;
+	}
+
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
+		if (mevt->mbm_mode == MBM_MODE_ASSIGN) {
+			kn_subdir2 = kernfs_create_dir(kn_subdir, mevt->name,
+						       kn_subdir->mode, mevt);
+			if (IS_ERR(kn_subdir2)) {
+				ret = PTR_ERR(kn_subdir2);
+				goto config_out;
+			}
+
+			ret = rdtgroup_kn_set_ugid(kn_subdir2);
+			if (ret)
+				goto config_out;
+
+			ret = rdtgroup_add_files(kn_subdir2, RFTYPE_ASSIGN_CONFIG);
+			if (!ret)
+				kernfs_activate(kn_subdir);
+		}
+	}
+
+config_out:
+	kernfs_put(l3_mon_kn);
+	if (ret)
+		kernfs_remove(kn_subdir);
+
+	return ret;
+}
+
 static unsigned long fflags_from_resource(struct rdt_resource *r)
 {
 	switch (r->rid) {
@@ -2363,6 +2437,12 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
 		ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
 		if (ret)
 			goto out_destroy;
+
+		if (r->mon.mbm_cntr_assignable) {
+			ret = resctrl_mkdir_counter_configs(r, name);
+			if (ret)
+				goto out_destroy;
+		}
 	}
 
 	ret = rdtgroup_kn_set_ugid(kn_info);
-- 
2.34.1



* [PATCH v13 20/27] x86/resctrl: Provide interface to update the event configurations
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (18 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 19/27] x86/resctrl: Add event configuration directory under info/L3_MON/ Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-23  4:45   ` Reinette Chatre
  2025-05-15 22:52 ` [PATCH v13 21/27] x86/resctrl: Introduce mbm_assign_on_mkdir to configure assignments Babu Moger
                   ` (8 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Users can modify the event configuration by writing to the event_filter
interface file. The event configurations for mbm_cntr_assign mode are
located in /sys/fs/resctrl/info/L3_MON/counter_configs/.

Update the assignments of all groups when the event configuration is
modified.

Example:
$ cd /sys/fs/resctrl/

$ cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
  local_reads,local_non_temporal_writes,local_reads_slow_memory

$ echo "local_reads,local_non_temporal_writes" >
  info/L3_MON/counter_configs/mbm_total_bytes/event_filter

$ cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
  local_reads,local_non_temporal_writes
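
The parsing step behind a write to event_filter can be sketched in
user-space C as below. This is only an illustration, not the code in this
patch: the evt_values[] table, its bit positions, and the names trim() and
parse_event_filter() are assumptions made for the example. The real
name/value table (mbm_evt_values[]) is defined elsewhere in the series.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name-to-bit table; bit positions are illustrative only. */
struct evt_value {
	const char *name;
	uint32_t bit;
};

static const struct evt_value evt_values[] = {
	{ "local_reads",               1u << 0 },
	{ "local_non_temporal_writes", 1u << 2 },
	{ "local_reads_slow_memory",   1u << 4 },
};

/* Trim leading and trailing whitespace in place; return the new start. */
static char *trim(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return s;
}

/*
 * Parse a comma-separated list of event names (buf is modified in place)
 * into a bitmask. Returns 0 on success, -1 if any name is unknown.
 */
static int parse_event_filter(char *buf, uint32_t *val)
{
	char *cur = buf;

	assert(buf && val);
	*val = 0;

	while (cur && *cur != '\0') {
		char *comma = strchr(cur, ',');
		char *name;
		size_t i;
		int found = 0;

		if (comma)
			*comma = '\0';
		name = trim(cur);

		for (i = 0; i < sizeof(evt_values) / sizeof(evt_values[0]); i++) {
			if (!strcmp(evt_values[i].name, name)) {
				*val |= evt_values[i].bit;
				found = 1;
				break;
			}
		}
		if (!found)
			return -1;

		cur = comma ? comma + 1 : NULL;
	}

	return 0;
}
```

As in the patch, the first unknown name rejects the whole write, so a
partially valid list never changes the stored configuration.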

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Updated changelog to use imperative mood.
     Added function description in the prototype.
     Updated the user doc resctrl.rst to address some feedback.
     Resolved conflicts caused by the recent FS/ARCH code restructure.
     The rdtgroup.c/monitor.c file has now been split between the FS and ARCH directories.

v12: New patch to modify event configurations.
---
 Documentation/filesystems/resctrl.rst |  12 +++
 fs/resctrl/rdtgroup.c                 | 120 +++++++++++++++++++++++++-
 2 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 4eb9f007ba3d..9923276826db 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -336,6 +336,18 @@ with the following files:
 	  # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
 	  local_reads, local_non_temporal_writes, local_reads_slow_memory
 
+	Modify the event configuration by writing to the "event_filter" file within the
+	configuration directory. The read/write event_filter file contains the configuration
+	of the event that reflects which memory transactions are counted by it.
+
+	For example::
+
+	  # echo "local_reads, local_non_temporal_writes" >
+	    /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
+
+	  # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
+	   local_reads, local_non_temporal_writes
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index cf84e3a382ac..8c498b41be5d 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1930,6 +1930,123 @@ static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq,
 	return 0;
 }
 
+/**
+ * resctrl_group_assign - Update the counter assignments for the event in
+ *			  a group.
+ * @r:		Resource to which update needs to be done.
+ * @rdtgrp:	Resctrl group.
+ * @evtid:	Event ID.
+ * @evt_cfg:	Event configuration value.
+ */
+static int resctrl_group_assign(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+				enum resctrl_event_id evtid, u32 evt_cfg)
+{
+	struct rdt_mon_domain *d;
+	int cntr_id;
+
+	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
+		if (cntr_id >= 0 && d->cntr_cfg[cntr_id].evt_cfg != evt_cfg) {
+			d->cntr_cfg[cntr_id].evt_cfg = evt_cfg;
+			resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
+						 rdtgrp->closid, cntr_id, evt_cfg, true);
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * resctrl_update_assign - Update the counter assignments for the event for all
+ *			   the groups.
+ * @r:		Resource to which update needs to be done.
+ * @evtid:	Event ID.
+ * @evt_cfg:	Event configuration value.
+ */
+static int resctrl_update_assign(struct rdt_resource *r, enum resctrl_event_id evtid,
+				 u32 evt_cfg)
+{
+	struct rdtgroup *prgrp, *crgrp;
+
+	/* Check if the cntr_id is associated to the event type updated */
+	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
+		resctrl_group_assign(r, prgrp, evtid, evt_cfg);
+
+		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) {
+			resctrl_group_assign(r, crgrp, evtid, evt_cfg);
+		}
+	}
+
+	return 0;
+}
+
+static int resctrl_process_configs(char *tok, u32 *val)
+{
+	char *evt_str;
+	bool found;
+	int i;
+
+next_config:
+	if (!tok || tok[0] == '\0')
+		return 0;
+
+	/* Start processing the strings for each event type */
+	evt_str = strim(strsep(&tok, ","));
+	found = false;
+	for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
+		if (!strcmp(mbm_evt_values[i].evt_name, evt_str)) {
+			*val |=  mbm_evt_values[i].evt_val;
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		rdt_last_cmd_printf("Invalid event type %s\n", evt_str);
+		return -EINVAL;
+	}
+
+	goto next_config;
+}
+
+static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
+				  size_t nbytes, loff_t off)
+{
+	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+	struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
+	u32 evt_cfg = 0;
+	int ret = 0;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	buf[nbytes - 1] = '\0';
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+		ret = -EINVAL;
+		goto unlock_out;
+	}
+
+	ret = resctrl_process_configs(buf, &evt_cfg);
+	if (!ret && mevt->evt_val != evt_cfg) {
+		mevt->evt_val = evt_cfg;
+		resctrl_update_assign(r, mevt->evtid, evt_cfg);
+	}
+
+unlock_out:
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -2056,9 +2173,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "event_filter",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= event_filter_show,
+		.write		= event_filter_write,
 	},
 	{
 		.name		= "mbm_assign_mode",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v13 21/27] x86/resctrl: Introduce mbm_assign_on_mkdir to configure assignments
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (19 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 20/27] x86/resctrl: Provide interface to update the event configurations Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-23  4:48   ` Reinette Chatre
  2025-05-15 22:52 ` [PATCH v13 22/27] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
                   ` (7 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

The mbm_cntr_assign mode provides an option to the user to assign a
counter to an RMID, event pair and monitor the bandwidth as long as
the counter is assigned.

Introduce a configuration option to automatically assign counter IDs
when a resctrl group is created, provided the counters are available.
By default, this option is enabled at boot.

Suggested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Added Suggested-by tag.
     Resolved conflicts caused by the recent FS/ARCH code restructure.
     The rdtgroup.c/monitor.c file has now been split between the FS and ARCH directories.

v12: New patch. Added after the discussion on the list.
     https://lore.kernel.org/lkml/CALPaoCh8siZKjL_3yvOYGL4cF_n_38KpUFgHVGbQ86nD+Q2_SA@mail.gmail.com/
---
 Documentation/filesystems/resctrl.rst | 10 ++++++
 fs/resctrl/monitor.c                  |  2 ++
 fs/resctrl/rdtgroup.c                 | 44 +++++++++++++++++++++++++--
 include/linux/resctrl.h               |  2 ++
 4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 9923276826db..356f1f918a86 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -348,6 +348,16 @@ with the following files:
 	  # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
 	   local_reads, local_non_temporal_writes
 
+"mbm_assign_on_mkdir":
+	Automatically assign the monitoring counters on resctrl group creation
+	if the counters are available. It is enabled by default on boot and users
+	can disable it by writing to the interface.
+	::
+
+	  # echo 0 > /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
+	  0
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 1f72249a5c93..5f6c4b662f3b 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -933,6 +933,8 @@ int resctrl_mon_resource_init(void)
 		resctrl_file_fflags_init("available_mbm_cntrs",
 					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 		resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
+		resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
+					 RFTYPE_RES_CACHE);
 	}
 
 	return 0;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 8c498b41be5d..0093b323d858 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2035,8 +2035,8 @@ static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
 	}
 
 	ret = resctrl_process_configs(buf, &evt_cfg);
-	if (!ret && mevt->evt_val != evt_cfg) {
-		mevt->evt_val = evt_cfg;
+	if (!ret && mevt->evt_cfg != evt_cfg) {
+		mevt->evt_cfg = evt_cfg;
 		resctrl_update_assign(r, mevt->evtid, evt_cfg);
 	}
 
@@ -2047,6 +2047,39 @@ static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
 	return ret ?: nbytes;
 }
 
+static int resctrl_mbm_assign_on_mkdir_show(struct kernfs_open_file *of,
+					    struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+
+	seq_printf(s, "%u\n", r->mon.mbm_assign_on_mkdir);
+
+	return 0;
+}
+
+static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
+						 char *buf, size_t nbytes, loff_t off)
+{
+	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+	bool value;
+	int ret;
+
+	ret = kstrtobool(buf, &value);
+	if (ret)
+		return ret;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+	rdt_last_cmd_clear();
+
+	r->mon.mbm_assign_on_mkdir = value;
+
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -2056,6 +2089,13 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdt_last_cmd_status_show,
 		.fflags		= RFTYPE_TOP_INFO,
 	},
+	{
+		.name		= "mbm_assign_on_mkdir",
+		.mode		= 0644,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= resctrl_mbm_assign_on_mkdir_show,
+		.write		= resctrl_mbm_assign_on_mkdir_write,
+	},
 	{
 		.name		= "num_closids",
 		.mode		= 0444,
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index cd24d1577e0a..d6435abdde7b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -278,6 +278,7 @@ enum resctrl_schema_fmt {
  *			monitoring events can be configured.
  * @num_mbm_cntrs:	Number of assignable monitoring counters
  * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
+ * @mbm_assign_on_mkdir:Auto enable monitor assignment on mkdir?
  * @evt_list:		List of monitoring events
  */
 struct resctrl_mon {
@@ -285,6 +286,7 @@ struct resctrl_mon {
 	unsigned int		mbm_cfg_mask;
 	int			num_mbm_cntrs;
 	bool			mbm_cntr_assignable;
+	bool			mbm_assign_on_mkdir;
 	struct list_head	evt_list;
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v13 22/27] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (20 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 21/27] x86/resctrl: Introduce mbm_assign_on_mkdir to configure assignments Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-15 22:52 ` [PATCH v13 23/27] x86/resctrl: Introduce mbm_L3_assignments to list assignments in a group Babu Moger
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Automatically assign or unassign counters when a resctrl group is created
or deleted. By default, each group requires two counters: one for the MBM
total event and one for the MBM local event.

The mbm_cntr_assign mode offers "num_mbm_cntrs" counters, each of which
can be assigned to an RMID, event pair to monitor bandwidth for as long
as it remains assigned. If these counters are exhausted, the kernel will log
the error message "Unable to allocate counter in domain" in
/sys/fs/resctrl/info/last_cmd_status when a new group is created.

However, the creation of a group should not fail due to assignment
failures. Users have the flexibility to modify the assignments at a later
time.
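
The policy described above can be modelled with a small user-space sketch.
It is not the kernel implementation: NUM_CNTRS, struct domain, cntr_get(),
cntr_alloc(), group_assign_cntrs() and group_unassign_cntrs() are
hypothetical names for a single-domain model, and the sketch assumes that
assignment is attempted only when the mode is enabled and
mbm_assign_on_mkdir is set.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CNTRS 4	/* hypothetical; the real count is num_mbm_cntrs */

enum mbm_evt { EVT_TOTAL, EVT_LOCAL, EVT_MAX };

struct cntr {
	bool used;
	int rmid;
	enum mbm_evt evt;
};

struct domain {
	struct cntr cntrs[NUM_CNTRS];
};

/* Return the counter index assigned to (rmid, evt), or -1 if none. */
static int cntr_get(const struct domain *d, int rmid, enum mbm_evt evt)
{
	int i;

	for (i = 0; i < NUM_CNTRS; i++) {
		if (d->cntrs[i].used && d->cntrs[i].rmid == rmid &&
		    d->cntrs[i].evt == evt)
			return i;
	}
	return -1;
}

/* Reuse an existing assignment or take a free counter; -1 if exhausted. */
static int cntr_alloc(struct domain *d, int rmid, enum mbm_evt evt)
{
	int i = cntr_get(d, rmid, evt);

	if (i >= 0)
		return i;

	for (i = 0; i < NUM_CNTRS; i++) {
		if (!d->cntrs[i].used) {
			d->cntrs[i].used = true;
			d->cntrs[i].rmid = rmid;
			d->cntrs[i].evt = evt;
			return i;
		}
	}
	return -1;
}

/*
 * Called on group creation. Returns how many events got a counter.
 * Exhaustion is not an error: the group is still created.
 */
static int group_assign_cntrs(struct domain *d, int rmid,
			      bool mode_enabled, bool assign_on_mkdir)
{
	int evt, assigned = 0;

	assert(d);
	if (!mode_enabled || !assign_on_mkdir)
		return 0;

	for (evt = 0; evt < EVT_MAX; evt++) {
		if (cntr_alloc(d, rmid, (enum mbm_evt)evt) >= 0)
			assigned++;
	}
	return assigned;
}

/* Called on group deletion: release every counter owned by rmid. */
static void group_unassign_cntrs(struct domain *d, int rmid)
{
	int i;

	for (i = 0; i < NUM_CNTRS; i++) {
		if (d->cntrs[i].used && d->cntrs[i].rmid == rmid)
			d->cntrs[i].used = false;
	}
}
```

With four counters, two groups consume the whole pool and a third group is
created with no counters until one of the earlier groups is removed.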

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Changes due to calling of resctrl_assign_cntr_event() and resctrl_unassign_cntr_event().
     It only takes evtid. evt_cfg is not required anymore.
     Resolved conflicts caused by the recent FS/ARCH code restructure.
     The monitor.c/rdtgroup.c files have been split between the FS and ARCH directories.

v12: Removed mbm_cntr_reset() as it is not required while removing the group.
     Update the commit text.
     Added r->mon_capable check in rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs().

v11: Moved mbm_cntr_reset() to monitor.c.
     Added code to reset non-architectural state in mbm_cntr_reset().
     Added missing rdtgroup_unassign_cntrs() calls on failure path.

v10: Assigned the counter before exposing the event files.
    Moved the call to rdtgroup_assign_cntrs() inside mkdir_rdt_prepare_rmid_alloc().
    This is called for both CTRL_MON and MON group creation.
    Call mbm_cntr_reset() when unmounted to clear all the assignments.
    Addressed a few other feedback comments.

v9: Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to return void.
    Updated couple of rdtgroup_unassign_cntrs() calls properly.
    Updated function comments.

v8: Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
    Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
    Fixed the problem with unassigning the child MON groups of CTRL_MON group.

v7: Reworded the commit message.
    Removed the reference of ABMC with mbm_cntr_assign.
    Renamed the function rdtgroup_assign_cntrs to rdtgroup_assign_grp.

v6: Removed the redundant comments on all the calls of
    rdtgroup_assign_cntrs. Updated the commit message.
    Dropped printing error message on every call of rdtgroup_assign_cntrs.

v5: Removed the code to enable/disable ABMC during the mount.
    That will be another patch.
    Added arch callers to get the arch specific data.
    Renamed functions to match the other abmc function.
    Added code comments for assignment failures.

v4: Few name changes based on the upstream discussion.
    Commit message update.

v3: This is a new patch. Patch addresses the upstream comment to enable
    ABMC feature by default if the feature is available.
---
 arch/x86/kernel/cpu/resctrl/monitor.c |  1 +
 fs/resctrl/rdtgroup.c                 | 65 ++++++++++++++++++++++++++-
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 36a03dae6d8e..c3e15f4de0b4 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -435,6 +435,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 		r->mon.mbm_cntr_assignable = true;
 		cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
 		r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
+		r->mon.mbm_assign_on_mkdir = true;
 	}
 
 	r->mon_capable = true;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 0093b323d858..931ea355f159 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2946,6 +2946,49 @@ static void schemata_list_destroy(void)
 	}
 }
 
+/*
+ * Called when a new group is created. If "mbm_cntr_assign" mode is enabled,
+ * counters are automatically assigned. Each group can accommodate two counters:
+ * one for the total event and one for the local event. Assignments may fail
+ * due to the limited number of counters. However, it is not necessary to fail
+ * the group creation and thus no failure is returned. Users have the option
+ * to modify the counter assignments after the group has been created.
+ */
+static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
+{
+	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+
+	if (!r->mon_capable)
+		return;
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r) || !r->mon.mbm_assign_on_mkdir)
+		return;
+
+	if (resctrl_arch_is_mbm_total_enabled())
+		resctrl_assign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	if (resctrl_arch_is_mbm_local_enabled())
+		resctrl_assign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+}
+
+/*
+ * Called when a group is deleted. Counters are unassigned if they were in
+ * the assigned state.
+ */
+static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
+{
+	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+
+	if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r))
+		return;
+
+	if (resctrl_arch_is_mbm_total_enabled())
+		resctrl_unassign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	if (resctrl_arch_is_mbm_local_enabled())
+		resctrl_unassign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -3002,6 +3045,8 @@ static int rdt_get_tree(struct fs_context *fc)
 		if (ret < 0)
 			goto out_info;
 
+		rdtgroup_assign_cntrs(&rdtgroup_default);
+
 		ret = mkdir_mondata_all(rdtgroup_default.kn,
 					&rdtgroup_default, &kn_mondata);
 		if (ret < 0)
@@ -3040,8 +3085,10 @@ static int rdt_get_tree(struct fs_context *fc)
 	if (resctrl_arch_mon_capable())
 		kernfs_remove(kn_mondata);
 out_mongrp:
-	if (resctrl_arch_mon_capable())
+	if (resctrl_arch_mon_capable()) {
+		rdtgroup_unassign_cntrs(&rdtgroup_default);
 		kernfs_remove(kn_mongrp);
+	}
 out_info:
 	kernfs_remove(kn_info);
 out_closid_exit:
@@ -3187,6 +3234,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
 
 	head = &rdtgrp->mon.crdtgrp_list;
 	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
+		rdtgroup_unassign_cntrs(sentry);
 		free_rmid(sentry->closid, sentry->mon.rmid);
 		list_del(&sentry->mon.crdtgrp_list);
 
@@ -3227,6 +3275,8 @@ static void rmdir_all_sub(void)
 		cpumask_or(&rdtgroup_default.cpu_mask,
 			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
 
+		rdtgroup_unassign_cntrs(rdtgrp);
+
 		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 
 		kernfs_remove(rdtgrp->kn);
@@ -3311,6 +3361,7 @@ static void resctrl_fs_teardown(void)
 		return;
 
 	rmdir_all_sub();
+	rdtgroup_unassign_cntrs(&rdtgroup_default);
 	mon_put_kn_priv();
 	rdt_pseudo_lock_release();
 	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
@@ -3792,9 +3843,12 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
 	}
 	rdtgrp->mon.rmid = ret;
 
+	rdtgroup_assign_cntrs(rdtgrp);
+
 	ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
 	if (ret) {
 		rdt_last_cmd_puts("kernfs subdir error\n");
+		rdtgroup_unassign_cntrs(rdtgrp);
 		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 		return ret;
 	}
@@ -3804,8 +3858,10 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
 
 static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
 {
-	if (resctrl_arch_mon_capable())
+	if (resctrl_arch_mon_capable()) {
+		rdtgroup_unassign_cntrs(rgrp);
 		free_rmid(rgrp->closid, rgrp->mon.rmid);
+	}
 }
 
 /*
@@ -4079,6 +4135,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	update_closid_rmid(tmpmask, NULL);
 
 	rdtgrp->flags = RDT_DELETED;
+
+	rdtgroup_unassign_cntrs(rdtgrp);
+
 	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 
 	/*
@@ -4126,6 +4185,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
 	update_closid_rmid(tmpmask, NULL);
 
+	rdtgroup_unassign_cntrs(rdtgrp);
+
 	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 	closid_free(rdtgrp->closid);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v13 23/27] x86/resctrl: Introduce mbm_L3_assignments to list assignments in a group
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (21 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 22/27] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-23  4:47   ` Reinette Chatre
  2025-05-15 22:52 ` [PATCH v13 24/27] x86/resctrl: Introduce the interface to modify " Babu Moger
                   ` (5 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Introduce the interface to display the assignment states for each group
when mbm_cntr_assign mode is enabled.

The list is displayed in the following format:
<Event configuration>:<Domain id>=<Assignment type>

Event configuration: A valid event configuration listed in the
/sys/fs/resctrl/info/L3_MON/counter_configs directory.

Domain ID: A valid domain ID number.

The assignment type can be one of the following:

_ : No event configuration assigned

e : Event configuration assigned in exclusive mode

Example:
$ cd /sys/fs/resctrl
$ cat mbm_L3_assignments
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e
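
A user-space sketch of how one such line can be produced is shown below.
It only illustrates the format and is not the kernel's seq_file code:
format_assignments() and its parameters are hypothetical, and the
per-domain state is passed in as plain arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Format "<event>:<dom>=<type>;<dom>=<type>..." into out, where type is
 * 'e' for an assigned counter and '_' for none. Returns the number of
 * characters written (excluding the NUL), or -1 if out is too small.
 */
static int format_assignments(char *out, size_t size, const char *evt_name,
			      const int *dom_ids, const bool *assigned,
			      size_t ndoms)
{
	size_t len, i;
	int n;

	assert(out && evt_name);

	n = snprintf(out, size, "%s:", evt_name);
	if (n < 0 || (size_t)n >= size)
		return -1;
	len = (size_t)n;

	for (i = 0; i < ndoms; i++) {
		/* Domains are separated by ';', with no trailing separator. */
		n = snprintf(out + len, size - len, "%s%d=%c",
			     i ? ";" : "", dom_ids[i],
			     assigned[i] ? 'e' : '_');
		if (n < 0 || (size_t)n >= size - len)
			return -1;
		len += (size_t)n;
	}

	return (int)len;
}
```

The caller would emit one line per event that is in assign mode and
terminate each line with a newline.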

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Changelog update.
     A few changes in mbm_L3_assignments_show() after moving the event config to evt_list.
     Resolved conflicts caused by the recent FS/ARCH code restructure.
     The rdtgroup.c/monitor.c files have been split between the FS and ARCH directories.

v12: New patch:
     Assignment interface moved inside the group based on the discussion
     https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/#t
---
 Documentation/filesystems/resctrl.rst | 28 +++++++++++++++
 fs/resctrl/monitor.c                  |  1 +
 fs/resctrl/rdtgroup.c                 | 52 +++++++++++++++++++++++++++
 3 files changed, 81 insertions(+)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 356f1f918a86..2350c1f21f4e 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -504,6 +504,34 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
 	/sys/fs/resctrl/info/L3_MON/mon_features changes the input
 	event.
 
+"mbm_L3_assignments":
+	This interface file is created when the mbm_cntr_assign mode is supported
+	and shows the assignment status for each group.
+
+	The assignment list is displayed in the following format:
+
+	<Event configuration>:<Domain id>=<Assignment type>
+
+	Event configuration: A valid event configuration listed in the
+	/sys/fs/resctrl/info/L3_MON/counter_configs directory.
+
+	Domain ID: A valid domain ID number.
+
+	Assignment types:
+
+	_ : No event configuration assigned
+
+	e : Event configuration assigned in exclusive mode
+
+	Example:
+	To list the assignment states for the default group.
+	::
+
+	  # cd /sys/fs/resctrl
+	  # cat mbm_L3_assignments
+	    mbm_total_bytes:0=e;1=e
+	    mbm_local_bytes:0=e;1=e
+
 Resource allocation rules
 -------------------------
 
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 5f6c4b662f3b..b982540ce4e3 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -935,6 +935,7 @@ int resctrl_mon_resource_init(void)
 		resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
 		resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
 					 RFTYPE_RES_CACHE);
+		resctrl_file_fflags_init("mbm_L3_assignments", RFTYPE_MON_BASE);
 	}
 
 	return 0;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 931ea355f159..8d970b99bbbd 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2080,6 +2080,52 @@ static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+	struct rdt_mon_domain *d;
+	struct rdtgroup *rdtgrp;
+	struct mon_evt *mevt;
+	int ret = 0;
+	bool sep;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp)
+		return -ENOENT;
+
+	rdt_last_cmd_clear();
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		rdt_last_cmd_puts("mbm_cntr_assign mode not enabled\n");
+		ret = -ENOENT;
+		goto assign_out;
+	}
+
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
+		if (mevt->mbm_mode != MBM_MODE_ASSIGN)
+			continue;
+
+		sep = false;
+		seq_printf(s, "%s:", mevt->name);
+		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+			if (sep)
+				seq_putc(s, ';');
+
+			if (mbm_cntr_get(r, d, rdtgrp, mevt->evtid) >= 0)
+				seq_printf(s, "%d=e", d->hdr.id);
+			else
+				seq_printf(s, "%d=_", d->hdr.id);
+
+			sep = true;
+		}
+		seq_putc(s, '\n');
+	}
+
+assign_out:
+	rdtgroup_kn_unlock(of->kn);
+
+	return ret;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -2218,6 +2264,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= event_filter_show,
 		.write		= event_filter_write,
 	},
+	{
+		.name		= "mbm_L3_assignments",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= mbm_L3_assignments_show,
+	},
 	{
 		.name		= "mbm_assign_mode",
 		.mode		= 0444,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v13 24/27] x86/resctrl: Introduce the interface to modify assignments in a group
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (22 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 23/27] x86/resctrl: Introduce mbm_L3_assignments to list assignments in a group Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-26  9:48   ` Peter Newman
  2025-05-15 22:52 ` [PATCH v13 25/27] x86/resctrl: Hide the BMEC related files when mbm_cnt_assign is enabled Babu Moger
                   ` (4 subsequent siblings)
  28 siblings, 1 reply; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Introduce an interface to modify assignments within a group.

Modifications follow this format:
<Event configuration>:<Domain id>=<Assignment type>

The assignment type can be one of the following:

_ : No event configuration assigned

e : Event configuration assigned in exclusive mode

Domain id can be any valid domain ID number or '*' to update all the
domains.

Example:
$ cd /sys/fs/resctrl
$ cat mbm_L3_assignments
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e

To unassign the configuration of mbm_total_bytes on domain 0:

$ echo "mbm_total_bytes:0=_" > mbm_L3_assignments
$ cat mbm_L3_assignments
mbm_total_bytes:0=_;1=e
mbm_local_bytes:0=e;1=e

To unassign the mbm_total_bytes configuration on all domains:

$ echo "mbm_total_bytes:*=_" > mbm_L3_assignments
$ cat mbm_L3_assignments
mbm_total_bytes:0=_;1=_
mbm_local_bytes:0=e;1=e
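
The request format above can be parsed along the following lines. This is
a minimal user-space sketch under stated assumptions, not the parser added
by this patch: struct assign_req, DOM_ALL and parse_assignment() are
hypothetical, only a single "<domain>=<type>" token is handled, and the
assignment type names mirror the enum added to fs/resctrl/internal.h.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define DOM_ALL (-1)	/* hypothetical marker for the '*' wildcard */

enum assign_type { ASSIGN_NONE, ASSIGN_EXCLUSIVE, ASSIGN_INVALID };

struct assign_req {
	char evt_name[32];
	int dom_id;		/* domain ID, or DOM_ALL for '*' */
	enum assign_type type;
};

/*
 * Parse one "<event>:<domain>=<type>" request.
 * Returns 0 on success, -1 on malformed input (req is then unspecified).
 */
static int parse_assignment(const char *s, struct assign_req *req)
{
	const char *colon, *eq;
	size_t nlen;
	char *end;
	long id;

	assert(s && req);

	colon = strchr(s, ':');
	if (!colon)
		return -1;
	nlen = (size_t)(colon - s);
	if (nlen == 0 || nlen >= sizeof(req->evt_name))
		return -1;

	eq = strchr(colon + 1, '=');
	if (!eq || eq == colon + 1)
		return -1;

	if (eq == colon + 2 && colon[1] == '*') {
		req->dom_id = DOM_ALL;
	} else {
		/* Require a plain decimal number that ends exactly at '='. */
		if (colon[1] < '0' || colon[1] > '9')
			return -1;
		id = strtol(colon + 1, &end, 10);
		if (end != eq || id > INT_MAX)
			return -1;
		req->dom_id = (int)id;
	}

	/* Exactly one type character must follow '='. */
	if (eq[1] == '\0' || eq[2] != '\0')
		return -1;
	if (eq[1] == 'e')
		req->type = ASSIGN_EXCLUSIVE;
	else if (eq[1] == '_')
		req->type = ASSIGN_NONE;
	else
		return -1;

	memcpy(req->evt_name, s, nlen);
	req->evt_name[nlen] = '\0';
	return 0;
}
```

A caller would then look up the event by name and apply the request to one
domain, or to every domain when dom_id is DOM_ALL.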

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: A few changes in mbm_L3_assignments_write() after moving the event config to evt_list.
     Resolved conflicts caused by the recent FS/ARCH code restructure.

v12: New patch:
     Assignment interface moved inside the group based on the discussion
     https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/#t
---
 Documentation/filesystems/resctrl.rst |  29 ++++-
 fs/resctrl/internal.h                 |   9 ++
 fs/resctrl/rdtgroup.c                 | 165 +++++++++++++++++++++++++-
 3 files changed, 201 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 2350c1f21f4e..d779554a2f91 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -515,7 +515,7 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
 	Event configuration: A valid event configuration listed in the
 	/sys/fs/resctrl/info/L3_MON/counter_configs directory.
 
-	Domain ID: A valid domain ID number.
+	Domain ID: A valid domain ID number or '*' to update all the domains.
 
 	Assignment types:
 
@@ -532,6 +532,33 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
 	    mbm_total_bytes:0=e;1=e
 	    mbm_local_bytes:0=e;1=e
 
+	Modify the assignment states by writing to the interface file.
+
+	Example:
+	To unassign the configuration of mbm_total_bytes on domain 0:
+	::
+
+	 # echo "mbm_total_bytes:0=_" > mbm_L3_assignments
+	 # cat mbm_L3_assignments
+	 mbm_total_bytes:0=_;1=e
+	 mbm_local_bytes:0=e;1=e
+
+	To unassign the mbm_total_bytes configuration on all domains:
+	::
+
+	 # echo "mbm_total_bytes:*=_" > mbm_L3_assignments
+	 # cat mbm_L3_assignments
+	 mbm_total_bytes:0=_;1=_
+	 mbm_local_bytes:0=e;1=e
+
+	To assign the mbm_total_bytes configuration on all domains in exclusive mode:
+	::
+
+	 # echo "mbm_total_bytes:*=e" > mbm_L3_assignments
+	 # cat mbm_L3_assignments
+	 mbm_total_bytes:0=e;1=e
+	 mbm_local_bytes:0=e;1=e
+
 Resource allocation rules
 -------------------------
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 446cc9cc61df..a6069a5dfd49 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -51,6 +51,15 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
 	return container_of(kfc, struct rdt_fs_context, kfc);
 }
 
+/*
+ * Assignment types for mbm_cntr_assign mode
+ */
+enum {
+	ASSIGN_NONE		= 0,
+	ASSIGN_EXCLUSIVE,
+	ASSIGN_INVALID,
+};
+
 /**
  * struct mon_evt - Entry in the event list of a resource
  * @evtid:		event id
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 8d970b99bbbd..ea1782723f81 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2126,6 +2126,168 @@ static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file
 	return ret;
 }
 
+/*
+ * mbm_get_mon_event_by_name() - Return the mon_evt entry for the matching
+ * event name.
+ */
+static struct mon_evt *mbm_get_mon_event_by_name(struct rdt_resource *r,
+						 char *name)
+{
+	struct mon_evt *mevt;
+
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
+		if (!strcmp(mevt->name, name))
+			return mevt;
+	}
+
+	return NULL;
+}
+
+static unsigned int resctrl_get_assing_type(char *assign)
+{
+	unsigned int mon_state = ASSIGN_NONE;
+	int len = assign ? strlen(assign) : 0;
+
+	if (len != 1)
+		return ASSIGN_INVALID;
+
+	switch (*assign) {
+	case 'e':
+		mon_state = ASSIGN_EXCLUSIVE;
+		break;
+	case '_':
+		mon_state = ASSIGN_NONE;
+		break;
+	default:
+		mon_state = ASSIGN_INVALID;
+		break;
+	}
+
+	return mon_state;
+}
+
+static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+				  char *config, char *tok)
+{
+	struct rdt_mon_domain *d;
+	char *dom_str, *id_str;
+	unsigned long dom_id = 0;
+	struct mon_evt *mevt;
+	int assign_type;
+	char domain[10];
+	bool found;
+	int ret;
+
+	mevt = mbm_get_mon_event_by_name(r, config);
+	if (!mevt) {
+		rdt_last_cmd_printf("Invalid assign configuration %s\n", config);
+		return -ENOENT;
+	}
+
+next:
+	if (!tok || tok[0] == '\0')
+		return 0;
+
+	/* Start processing the strings for each domain */
+	dom_str = strim(strsep(&tok, ";"));
+
+	id_str = strsep(&dom_str, "=");
+
+	/* Check for domain id '*' which means all domains */
+	if (id_str && *id_str == '*') {
+		d = NULL;
+		goto check_state;
+	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+		rdt_last_cmd_puts("Missing domain id\n");
+		return -EINVAL;
+	}
+
+	/* Verify if the dom_id is valid */
+	found = false;
+	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		if (d->hdr.id == dom_id) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		rdt_last_cmd_printf("Invalid domain id %lu\n", dom_id);
+		return -EINVAL;
+	}
+
+check_state:
+	assign_type = resctrl_get_assing_type(dom_str);
+
+	switch (assign_type) {
+	case ASSIGN_NONE:
+		ret = resctrl_unassign_cntr_event(r, d, rdtgrp, mevt->evtid);
+		break;
+	case ASSIGN_EXCLUSIVE:
+		ret = resctrl_assign_cntr_event(r, d, rdtgrp, mevt->evtid);
+		break;
+	case ASSIGN_INVALID:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		goto out_fail;
+
+	goto next;
+
+out_fail:
+	snprintf(domain, sizeof(domain), d ? "%lu" : "*", dom_id);
+
+	rdt_last_cmd_printf("Assign operation '%s:%s=%s' failed\n", config, domain, dom_str);
+
+	return ret;
+}
+
+static ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf,
+					size_t nbytes, loff_t off)
+{
+	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+	struct rdtgroup *rdtgrp;
+	char *token, *config;
+	int ret = 0;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	buf[nbytes - 1] = '\0';
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		rdtgroup_kn_unlock(of->kn);
+		return -ENOENT;
+	}
+	rdt_last_cmd_clear();
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+		rdtgroup_kn_unlock(of->kn);
+		return -EINVAL;
+	}
+
+	while ((token = strsep(&buf, "\n")) != NULL) {
+		/*
+		 * The write command follows the following format:
+		 * "<Assign config>:<domain_id>=<assign mode>"
+		 * Extract Assign config first.
+		 */
+		config = strsep(&token, ":");
+
+		ret = resctrl_process_assign(r, rdtgrp, config, token);
+		if (ret)
+			break;
+	}
+
+	rdtgroup_kn_unlock(of->kn);
+
+	return ret ?: nbytes;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -2266,9 +2428,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_L3_assignments",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= mbm_L3_assignments_show,
+		.write		= mbm_L3_assignments_write,
 	},
 	{
 		.name		= "mbm_assign_mode",
-- 
2.34.1



* [PATCH v13 25/27] x86/resctrl: Hide the BMEC related files when mbm_cnt_assign is enabled
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (23 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 24/27] x86/resctrl: Introduce the interface to modify " Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-15 22:52 ` [PATCH v13 26/27] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

BMEC (Bandwidth Monitoring Event Configuration) and mbm_cntr_assign cannot
be used simultaneously.

When mbm_cntr_assign is active, suppress visibility of BMEC-related files
to prevent confusion.

The files /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config and
/sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config will not be visible
when mbm_cntr_assign mode is enabled.
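
The effect can be observed from user space. The following is a minimal sketch
under stated assumptions, not part of this patch: the helper name
bmec_files_visible() is hypothetical, and it assumes a hidden kernfs file
fails a plain existence check. The files are also absent on systems that do
not support BMEC at all, so a result of 0 does not by itself prove that
mbm_cntr_assign mode is enabled.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical user-space check: returns 1 if both BMEC configuration
 * files exist under @l3_mon_dir, 0 if either is missing or the path is
 * too long to build.
 */
static int bmec_files_visible(const char *l3_mon_dir)
{
	static const char *const files[] = {
		"mbm_total_bytes_config",
		"mbm_local_bytes_config",
	};
	char path[512];
	size_t i;

	assert(l3_mon_dir);

	for (i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
		int n = snprintf(path, sizeof(path), "%s/%s", l3_mon_dir,
				 files[i]);

		/* Treat an overlong path as "not visible". */
		if (n < 0 || (size_t)n >= sizeof(path))
			return 0;

		/* F_OK only tests for existence, not for permissions. */
		if (access(path, F_OK) != 0)
			return 0;
	}

	return 1;
}
```

A caller would pass "/sys/fs/resctrl/info/L3_MON" as the directory.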

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: New patch to hide BMEC related files.
---
 fs/resctrl/rdtgroup.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index ea1782723f81..d6bf2a50a105 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1815,6 +1815,33 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static void resctrl_bmec_files_show(struct rdt_resource *r, bool show)
+{
+	struct kernfs_node *kn_config, *l3_mon_kn;
+	char name[32];
+
+	sprintf(name, "%s_MON", r->name);
+	l3_mon_kn = kernfs_find_and_get(kn_info, name);
+	if (!l3_mon_kn)
+		return;
+
+	kn_config = kernfs_find_and_get(l3_mon_kn, "mbm_total_bytes_config");
+	if (kn_config) {
+		/* kernfs_find_and_get() already holds a reference */
+		kernfs_show(kn_config, show);
+		kernfs_put(kn_config);
+	}
+
+	kn_config = kernfs_find_and_get(l3_mon_kn, "mbm_local_bytes_config");
+	if (kn_config) {
+		/* kernfs_find_and_get() already holds a reference */
+		kernfs_show(kn_config, show);
+		kernfs_put(kn_config);
+	}
+
+	kernfs_put(l3_mon_kn);
+}
+
 static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
 					struct seq_file *s, void *v)
 {
@@ -2815,6 +2842,10 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
 			ret = resctrl_mkdir_counter_configs(r, name);
 			if (ret)
 				goto out_destroy;
+
+			/* Hide BMEC related files if mbm_cntr_assign is enabled */
+			if (resctrl_arch_mbm_cntr_assign_enabled(r))
+				resctrl_bmec_files_show(r, false);
 		}
 	}
 
-- 
2.34.1



* [PATCH v13 26/27] x86/resctrl: Introduce the interface to switch between monitor modes
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (24 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 25/27] x86/resctrl: Hide the BMEC related files when mbm_cnt_assign is enabled Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-15 22:52 ` [PATCH v13 27/27] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

The resctrl subsystem supports two monitoring modes, "mbm_cntr_assign" and
"default". In "mbm_cntr_assign" mode, a monitoring event can only accumulate
data while it is backed by a hardware counter. In "default" mode, resctrl
assumes there is a hardware counter for each event within every CTRL_MON
and MON group.

Introduce an interface to switch between the mbm_cntr_assign and default
modes.

$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_assign]
default

To enable the "mbm_cntr_assign" monitoring mode:
$ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode

To enable the "default" monitoring mode:
$ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode

MBM event counters are automatically reset as part of changing the mode.
Clear both architectural and non-architectural event states to prevent
overflow conditions during the next event read.
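
A tool that needs to know which mode is active can look for the bracketed
entry in the file contents shown above. The following is only a sketch under
stated assumptions, not part of this patch: the helper name active_mode() is
hypothetical, and it assumes exactly the "[enabled]" bracket convention seen
in the example output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space helper: copy the bracketed (enabled) mode from
 * the contents of mbm_assign_mode into @out.
 * Returns 0 on success, -1 if no "[...]" entry is found or @out is too
 * small to hold the mode name and its terminating NUL.
 */
static int active_mode(const char *text, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(text && out);

	open = strchr(text, '[');
	if (!open)
		return -1;

	close = strchr(open, ']');
	if (!close)
		return -1;

	/* Length of the text between the brackets. */
	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';

	return 0;
}
```

For the example output above, the helper stores "mbm_cntr_assign" in the
caller's buffer.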

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Resolved the conflicts due to FS/ARCH restructure.
     Introduced the new resctrl_init_evt_configuration() to initialize
     the event modes and configuration values.
     Added the call to resctrl_bmec_files_show() to hide/show BMEC related
     files.

v12: Fixed the documentation for consistency.
     Introduced mbm_cntr_free_all() and resctrl_reset_rmid_all() to clear
     counters and non-architectural states when monitor mode is changed.
     https://lore.kernel.org/lkml/b60b4f72-6245-46db-a126-428fb13b6310@intel.com/

v11: Changed the name of the function rdtgroup_mbm_assign_mode_write() to
     resctrl_mbm_assign_mode_write().
     Rewrote the commit message with context.
     Added few more details in resctrl.rst about mbm_cntr_assign mode.
     Re-arranged the text in resctrl.rst file.

v10: The call mbm_cntr_reset() has been moved to earlier patch.
     Minor documentation update.

v9: Fixed extra spaces in user documentation.
    Fixed problem changing the mode to mbm_cntr_assign mode when it is
    not supported. Added extra checks to detect if the system supports it.
    Used the rdtgroup_cntr_id_init to initialize cntr_id.

v8: Reset the internal counters after mbm_cntr_assign mode is changed.
    Renamed rdtgroup_mbm_cntr_reset() to mbm_cntr_reset()
    Updated the documentation to make text generic.

v7: Changed the interface name to mbm_assign_mode.
    Removed the references of ABMC.
    Added the changes to reset global and domain bitmaps.
    Added the changes to reset rmid.

v6: Changed the mode name to mbm_cntr_assign.
    Moved all the FS related code here.
    Added changes to reset mbm_cntr_map and resctrl group counters.

v5: Change log and mode description text correction.

v4: Minor commit text changes. Keep the default to ABMC when supported.
    Fixed comments to reflect changed interface "mbm_mode".

v3: New patch to address the review comments from upstream.
---
 Documentation/filesystems/resctrl.rst | 25 ++++++++++-
 fs/resctrl/internal.h                 |  3 ++
 fs/resctrl/monitor.c                  | 53 +++++++++++++++++++---
 fs/resctrl/rdtgroup.c                 | 65 ++++++++++++++++++++++++++-
 4 files changed, 138 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index d779554a2f91..7c304821ce93 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -259,7 +259,10 @@ with the following files:
 
 "mbm_assign_mode":
 	Reports the list of monitoring modes supported. The enclosed brackets
-	indicate which mode is enabled.
+	indicate which mode is enabled. The MBM events (mbm_total_bytes and/or
+	mbm_local_bytes) associated with counters may reset when "mbm_assign_mode"
+	is changed.
+
 	::
 
 	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
@@ -275,6 +278,16 @@ with the following files:
 	"num_mbm_cntrs" file. Changing the mode may cause all counters on the
 	resource to reset.
 
+	Moving to mbm_cntr_assign mode requires users to assign the counters to
+	the events. Otherwise, the MBM event counters will return 'Unassigned'
+	when read.
+
+	The mode is beneficial for AMD platforms that support more CTRL_MON
+	and MON groups than available hardware counters. By default, this
+	feature is enabled on AMD platforms with the ABMC (Assignable Bandwidth
+	Monitoring Counters) capability, ensuring counters remain assigned even
+	when the corresponding RMID is not actively used by any processor.
+
 	"default":
 
 	In default mode, resctrl assumes there is a hardware counter for each
@@ -284,6 +297,16 @@ with the following files:
 	counters. This can result in misleading values or display "Unavailable"
 	if no counter is assigned to the event.
 
+	* To enable "mbm_cntr_assign" monitoring mode:
+	  ::
+
+	    # echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+
+	* To enable "default" monitoring mode:
+	  ::
+
+	    # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+
 "num_mbm_cntrs":
 	The maximum number of monitoring counters (total of available and assigned
 	counters) in each domain when the system supports mbm_cntr_assign mode.
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index a6069a5dfd49..d5edb28a8df7 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -404,6 +404,9 @@ int resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d
 				struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
 int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
 		 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
+void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
+void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_init_evt_configuration(struct rdt_resource *r, bool enable);
 
 #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
 int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index b982540ce4e3..bebe83cf48d5 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -911,16 +911,13 @@ int resctrl_mon_resource_init(void)
 
 	l3_mon_evt_init(r);
 
-	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
-		mbm_total_event.mbm_mode = MBM_MODE_BMEC;
+	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID))
 		resctrl_file_fflags_init("mbm_total_bytes_config",
 					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
-	}
-	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_LOCAL_EVENT_ID)) {
-		mbm_local_event.mbm_mode = MBM_MODE_BMEC;
+
+	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_LOCAL_EVENT_ID))
 		resctrl_file_fflags_init("mbm_local_bytes_config",
 					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
-	}
 
 	if (resctrl_arch_is_mbm_local_enabled())
 		mba_mbps_default_event = QOS_L3_MBM_LOCAL_EVENT_ID;
@@ -938,6 +935,8 @@ int resctrl_mon_resource_init(void)
 		resctrl_file_fflags_init("mbm_L3_assignments", RFTYPE_MON_BASE);
 	}
 
+	resctrl_init_evt_configuration(r, true);
+
 	return 0;
 }
 
@@ -1010,6 +1009,25 @@ static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
 	memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
 }
 
+void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+{
+	memset(d->cntr_cfg, 0, sizeof(*d->cntr_cfg) * r->mon.num_mbm_cntrs);
+}
+
+/*
+ * Reset all non-architecture states for all the supported RMIDs.
+ */
+void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+{
+	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
+
+	if (resctrl_arch_is_mbm_total_enabled())
+		memset(d->mbm_total, 0, sizeof(struct mbm_state) * idx_limit);
+
+	if (resctrl_arch_is_mbm_local_enabled())
+		memset(d->mbm_local, 0, sizeof(struct mbm_state) * idx_limit);
+}
+
 /*
  * mbm_get_mon_event() - Return the mon_evt entry for the matching evtid.
  */
@@ -1119,6 +1137,29 @@ static int resctrl_free_config_cntr(struct rdt_resource *r, struct rdt_mon_domai
 	return 0;
 }
 
+/*
+ * Initialize the event modes and configuration values.
+ *
+ * total event is set to count all the supported memory transactions.
+ * local event is set to count all the local memory transactions.
+ */
+void resctrl_init_evt_configuration(struct rdt_resource *r, bool enable)
+{
+	if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		mbm_total_event.mbm_mode = MBM_MODE_ASSIGN;
+		mbm_total_event.evt_cfg = MAX_EVT_CONFIG_BITS;
+		mbm_local_event.mbm_mode = MBM_MODE_ASSIGN;
+		mbm_local_event.evt_cfg = READS_TO_LOCAL_MEM |
+					  NON_TEMP_WRITE_TO_LOCAL_MEM |
+					  READS_TO_LOCAL_S_MEM;
+	} else {
+		if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID))
+			mbm_total_event.mbm_mode = MBM_MODE_BMEC;
+		if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_LOCAL_EVENT_ID))
+			mbm_local_event.mbm_mode = MBM_MODE_BMEC;
+	}
+}
+
 /*
  * Unassign a hardware counter associated with @evtid from the domain and
  * the group. Unassign the counters from all the domains if @d is NULL else
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index d6bf2a50a105..c76d598e4d23 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1872,6 +1872,68 @@ static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static ssize_t resctrl_mbm_assign_mode_write(struct kernfs_open_file *of,
+					     char *buf, size_t nbytes, loff_t off)
+{
+	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+	struct rdt_mon_domain *d;
+	int ret = 0;
+	bool enable;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	buf[nbytes - 1] = '\0';
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	if (!strcmp(buf, "default")) {
+		enable = false;
+	} else if (!strcmp(buf, "mbm_cntr_assign")) {
+		if (r->mon.mbm_cntr_assignable) {
+			enable = true;
+		} else {
+			ret = -EINVAL;
+			rdt_last_cmd_puts("mbm_cntr_assign mode is not supported\n");
+			goto write_exit;
+		}
+	} else {
+		ret = -EINVAL;
+		rdt_last_cmd_puts("Unsupported assign mode\n");
+		goto write_exit;
+	}
+
+	if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
+		if (ret)
+			goto write_exit;
+
+		/* Initialize event configuration details accordingly */
+		resctrl_init_evt_configuration(r, enable);
+
+		/* Update the visibility of BMEC related files */
+		resctrl_bmec_files_show(r, !enable);
+
+		/*
+		 * Reset all the non-architectural RMID state and assignable counters.
+		 */
+		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+			mbm_cntr_free_all(r, d);
+			resctrl_reset_rmid_all(r, d);
+		}
+	}
+
+write_exit:
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 static int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
 				      struct seq_file *s, void *v)
 {
@@ -2462,9 +2524,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_assign_mode",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= resctrl_mbm_assign_mode_show,
+		.write		= resctrl_mbm_assign_mode_write,
 		.fflags		= RFTYPE_MON_INFO | RFTYPE_RES_CACHE,
 	},
 	{
-- 
2.34.1



* [PATCH v13 27/27] x86/resctrl: Configure mbm_cntr_assign mode if supported
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (25 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 26/27] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
@ 2025-05-15 22:52 ` Babu Moger
  2025-05-19 15:59 ` [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Peter Newman
  2025-05-22 20:44 ` Reinette Chatre
  28 siblings, 0 replies; 114+ messages in thread
From: Babu Moger @ 2025-05-15 22:52 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, babu.moger, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, peternewman, maciej.wieczor-retman,
	eranian, Xiaojian.Du, gautham.shenoy

Configure mbm_cntr_assign mode on AMD platforms. On AMD platforms, it is
recommended to use the mbm_cntr_assign mode, if supported, to prevent the
hardware from resetting counters between reads. Such resets can result in
misleading values, or in "Unavailable" being displayed if no counter is
assigned to the event.

The mbm_cntr_assign mode, referred to as ABMC (Assignable Bandwidth
Monitoring Counters) on AMD, is enabled by default when supported by the
system.

Update ABMC across all logical processors within the resctrl domain to
ensure proper functionality.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v13: Added the call to resctrl_init_evt_configuration() to set up the event
     configuration during init.
     Resolved conflicts caused by the recent FS/ARCH code restructure.

v12: Moved the resctrl_arch_mbm_cntr_assign_set_one to domain_add_cpu_mon().
     Updated the commit log.

v11: Commit text in imperative tone. Added few more details.
     Moved resctrl_arch_mbm_cntr_assign_set_one() to monitor.c.

v10: Commit text in imperative tone.

v9: Minor code change due to merge. Actual code did not change.

v8: Renamed resctrl_arch_mbm_cntr_assign_configure to
        resctrl_arch_mbm_cntr_assign_set_one.
    Adde r->mon_capable check.
    Commit message update.

v7: Introduced resctrl_arch_mbm_cntr_assign_configure() to configure.
    Moved the default settings to rdt_get_mon_l3_config(). It should be
    done before the hotplug handler is called. It cannot be done at
    rdtgroup_init().

v6: Keeping the default enablement in arch init code for now.
     This may need some discussion.
     Renamed resctrl_arch_configure_abmc to resctrl_arch_mbm_cntr_assign_configure.

v5: New patch to enable ABMC by default.
---
 arch/x86/kernel/cpu/resctrl/core.c     | 7 +++++++
 arch/x86/kernel/cpu/resctrl/internal.h | 1 +
 arch/x86/kernel/cpu/resctrl/monitor.c  | 8 ++++++++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6859566398d6..b59f5db96016 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -514,6 +514,9 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 		d = container_of(hdr, struct rdt_mon_domain, hdr);
 
 		cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+		/* Update the mbm_cntr_assign state for the CPU if supported */
+		if (r->mon.mbm_cntr_assignable)
+			resctrl_arch_mbm_cntr_assign_set_one(r);
 		return;
 	}
 
@@ -532,6 +535,10 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 	}
 	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
 
+	/* Update the mbm_cntr_assign state for the CPU if supported */
+	if (r->mon.mbm_cntr_assignable)
+		resctrl_arch_mbm_cntr_assign_set_one(r);
+
 	arch_mon_domain_online(r, d);
 
 	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 3b0cdb5520c7..85ebf60a9f1c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -214,5 +214,6 @@ bool rdt_cpu_has(int flag);
 void __init intel_rdt_mbm_apply_quirk(void);
 
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
+void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c3e15f4de0b4..51a99b8e69d6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -436,6 +436,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 		cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
 		r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
 		r->mon.mbm_assign_on_mkdir = true;
+		hw_res->mbm_cntr_assign_enabled = true;
 	}
 
 	r->mon_capable = true;
@@ -536,3 +537,10 @@ void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
 			memset(am, 0, sizeof(*am));
 	}
 }
+
+void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r)
+{
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+	resctrl_abmc_set_one_amd(&hw_res->mbm_cntr_assign_enabled);
+}
-- 
2.34.1



* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (26 preceding siblings ...)
  2025-05-15 22:52 ` [PATCH v13 27/27] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
@ 2025-05-19 15:59 ` Peter Newman
  2025-05-20 15:28   ` Moger, Babu
  2025-05-22 20:44 ` Reinette Chatre
  28 siblings, 1 reply; 114+ messages in thread
From: Peter Newman @ 2025-05-19 15:59 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen,
	james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, maciej.wieczor-retman, eranian, Xiaojian.Du,
	gautham.shenoy

Hi Babu,

On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>
>
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
>
> Series is written such that it is easier to support other assignable
> features supported from different vendors.
>
> The feature details are documented in the  APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> The patches are based on top of commit
> 92a09c47464d0 (tag: v6.15-rc5, tip/irq/merge) Linux 6.15-rc5
> plus
> https://lore.kernel.org/lkml/20250515165855.31452-1-james.morse@arm.com/
>
> It is very clear these patches will go after James's resctrl FS/ARCH
> restructure. Hoping to avoid one review cycle due to the merge.
>
> # Introduction
>
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that
> RMIDs currently assigned to a processor will be tracked by hardware.
> The counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, there can be only limited number of
> groups that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups
> are being tracked for certain point of time. Users do not have the option
> to monitor a group or set of groups for certain period of time without
> worrying about counter being reset in between.
>
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned.  The assigned RMID will be tracked by the hardware until the user
> unassigns it manually. There is no need to worry about counters being reset
> during this period. Additionally, the user can specify a bitmask identifying
> the specific bandwidth types from the given source to track with the counter.
>
> Without ABMC enabled, monitoring will work in current 'default' mode without
> assignment option.
>
> # History
>
> Earlier implementation of ABMC had dependancy on BMEC (Bandwidth Monitoring
> Event Configuration). Peter had concerns with that implementation because
> it may be not be compatible with ARM's MPAM.
>
> Here are the threads discussing the concerns and new interface to address the concerns.
> https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>
> Here are the finalized requirements based on the discussion:
>
> *   Remove BMEC dependency on the ABMC feature.
>
> *   Eliminate global assignment listing. The interface
>     /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.
>
> *   Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
>     The configuration file names should be free-form, allowing users to create them as needed.
>
> *   Perform assignment listing at the group level by introducing mbm_L3_assignments
>     in each monitoring group. The listing should provide the following details:
>
>     Event Configuration: Specifies the event configuration applied. This will be crucial
>     when "mkdir" on event configuration is added in the future, leading to the creation
>     of mon_data/mon_l3_*/<event configuration>.
>
>     Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.
>
>     Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).
>
> *   Provide option to enable or disable auto assignment when new group is created.

So far I was able to reenable MBM on AMD implementations (for some
users) while deferring on the counter assignment interface discussion
by just making shared assignment the default for newly-created groups.
Until they want to upgrade assignments to exclusive or break down
traffic with multiple counters to watch a particular group more
closely, they won't need to change any assignments.

Just pointing out that this turned out to be a useful first step in
deploying ABMC support.

>
> This series tries to address all the requirements listed above.
>
> # Implementation details
>
> Create a generic interface aimed to support user space assignment of scarce
> counters used for monitoring. First usage of interface is by ABMC with option
> to expand usage to "soft-ABMC" and MPAM counters in future.

I'll try to identify any issues I've encountered with "soft-ABMC".
Hopefully I'll be able to share a sample implementation based on these
patches soon.

There's now more interest at Google in allowing explicit control of
where RMIDs are assigned on Intel platforms. Even though the number of
RMIDs implemented by hardware tends to be roughly the number of
containers they want to support, they often still need to create
containers when all RMIDs have already been allocated, which is not
currently allowed. Once the container has been created and starts
running, it's no longer possible to move its threads into a monitoring
group when RMIDs become available again, so it's important
for resctrl to maintain an accurate task list for a container even
when RMIDs are not available.

>
> Feature adds following interface files:
>
> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
> monitoring features supported. The enclosed brackets indicate which
> feature is enabled.
>
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> counters available for assignment.

Earlier I discussed with Reinette[1] what num_mbm_cntrs should
represent in a "soft-ABMC" implementation where assignment is
implemented by assigning an RMID, which would result in all events
being assigned at once.

My main concern is how many "counters" you can assign by assigning
RMIDs. I recall Reinette proposed reporting the number of groups which
can be assigned separately from counters which can be assigned.

>
> /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
> counters free in each domain.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs : Directory to hold the counter configuration.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter : Default configuration
> for MBM total events.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter : Default configuration
> for MBM local events.

IIUC, this needs to be implemented now so you can drop BMEC with this series?

>
> /sys/fs/resctrl/mbm_L3_assignments: Interface to list or modify assignment states on each group.
>
> # Examples
>
> a. Check if ABMC support is available
>         # mount -t resctrl resctrl /sys/fs/resctrl/
>
>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>         [mbm_cntr_assign]
>         default
>
>         ABMC feature is detected and it is enabled.
>
> b. Check how many ABMC counters are available.
>
>         # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>         32
>
> c. Check how many ABMC counters are available in each domain.
>
>         # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>         0=30;1=30
>
> d. Check default counter configuration.
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>         local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>         local_reads, local_non_temporal_writes, local_reads_slow_memory
>
> e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
>    to list and modify any group's monitoring states.

To confirm, would we have "mbm_<resource_name>_assignments" for each
resource where MBM-ish events could be assigned?

>
>         The list is displayed in the following format:
>
>         <Event configuration>:<Domain id>=<Assignment type>

For soft-ABMC assignment, is there just a single event configuration
representing all the events tracked by the RMID?

>
>         Event configuration: A valid event configuration listed in the
>         /sys/fs/resctrl/info/L3_MON/counter_configs directory.
>
>         Domain ID: A valid domain ID number.
>
>         Assignment types:
>
>         _ : No event configuration assigned
>
>         e : Event configuration assigned in exclusive mode
>
>         To list the default group states:
>         # cat /sys/fs/resctrl/mbm_L3_assignments
>         mbm_total_bytes:0=e;1=e
>         mbm_local_bytes:0=e;1=e
>
>         To unassign the configuration of mbm_total_bytes on domain 0:
>         # echo "mbm_total_bytes:0=_" > mbm_L3_assignments
>         # cat mbm_L3_assignments
>         mbm_total_bytes:0=_;1=e
>         mbm_local_bytes:0=e;1=e
>
>         To unassign the mbm_total_bytes configuration on all domains:
>         # echo "mbm_total_bytes:*=_" > mbm_L3_assignments
>         # cat mbm_L3_assignments
>         mbm_total_bytes:0=_;1=_
>         mbm_local_bytes:0=e;1=e
>
>         To assign the mbm_total_bytes configuration on all domains in exclusive mode:
>         # echo "mbm_total_bytes:*=e" > mbm_L3_assignments
>         # cat mbm_L3_assignments
>         mbm_total_bytes:0=e;1=e
>         mbm_local_bytes:0=e;1=e
>
> g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
>    There is no change in reading the events with ABMC. If the event is unassigned
>    when reading, then the read will come back as "Unassigned".
>
>         # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>         779247936
>         # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>         765207488
>
> h. Check the default event configurations.
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>         local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>         local_reads, local_non_temporal_writes, local_reads_slow_memory

These look like the BMEC event names converted from camel case. Will
event filter programming be portable?

Thanks,
-Peter


[1] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/


* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-19 15:59 ` [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Peter Newman
@ 2025-05-20 15:28   ` Moger, Babu
  2025-05-20 16:06     ` Reinette Chatre
  0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-05-20 15:28 UTC (permalink / raw)
  To: Peter Newman
  Cc: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen,
	james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, maciej.wieczor-retman, eranian, Xiaojian.Du,
	gautham.shenoy

Hi Peter,

Thanks for trying the series.

On 5/19/25 10:59, Peter Newman wrote:
> Hi Babu,
> 
> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>
>>
>> This series adds the support for Assignable Bandwidth Monitoring Counters
>> (ABMC). It is also called QoS RMID Pinning feature
>>
>> Series is written such that it is easier to support other assignable
>> features supported from different vendors.
>>
>> The feature details are documented in the  APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> 92a09c47464d0 (tag: v6.15-rc5, tip/irq/merge) Linux 6.15-rc5
>> plus
>> https://lore.kernel.org/lkml/20250515165855.31452-1-james.morse@arm.com/
>>
>> It is very clear these patches will go after James's resctrl FS/ARCH
>> restructure. Hoping to avoid one review cycle due to the merge.
>>
>> # Introduction
>>
>> Users can create as many monitor groups as RMIDs supported by the hardware.
>> However, bandwidth monitoring feature on AMD system only guarantees that
>> RMIDs currently assigned to a processor will be tracked by hardware.
>> The counters of any other RMIDs which are no longer being tracked will be
>> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
>> that are not tracked by hardware. So, there can be only limited number of
>> groups that can give guaranteed monitoring numbers. With ever changing
>> configurations there is no way to definitely know which of these groups
>> are being tracked for certain point of time. Users do not have the option
>> to monitor a group or set of groups for certain period of time without
>> worrying about counter being reset in between.
>>
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>> assigned.  The assigned RMID will be tracked by the hardware until the user
>> unassigns it manually. There is no need to worry about counters being reset
>> during this period. Additionally, the user can specify a bitmask identifying
>> the specific bandwidth types from the given source to track with the counter.
>>
>> Without ABMC enabled, monitoring will work in current 'default' mode without
>> assignment option.
>>
>> # History
>>
>> Earlier implementation of ABMC had a dependency on BMEC (Bandwidth Monitoring
>> Event Configuration). Peter had concerns with that implementation because
>> it may not be compatible with ARM's MPAM.
>>
>> Here are the threads discussing the concerns and new interface to address the concerns.
>> https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>
>> Here are the finalized requirements based on the discussion:
>>
>> *   Remove BMEC dependency on the ABMC feature.
>>
>> *   Eliminate global assignment listing. The interface
>>     /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.
>>
>> *   Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>     The configuration file names should be free-form, allowing users to create them as needed.
>>
>> *   Perform assignment listing at the group level by introducing mbm_L3_assignments
>>     in each monitoring group. The listing should provide the following details:
>>
>>     Event Configuration: Specifies the event configuration applied. This will be crucial
>>     when "mkdir" on event configuration is added in the future, leading to the creation
>>     of mon_data/mon_l3_*/<event configuration>.
>>
>>     Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.
>>
>>     Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).
>>
>> *   Provide option to enable or disable auto assignment when new group is created.
> 
> So far I was able to reenable MBM on AMD implementations (for some
> users) while deferring on the counter assignment interface discussion
> by just making shared assignment the default for newly-created groups.
> Until they want to upgrade assignments to exclusive or break down
> traffic with multiple counters to watch a particular group more
> closely, they won't need to change any assignments.
> 
> Just pointing out that this turned out to be a useful first step in
> deploying ABMC support.

Thank you.

> 
>>
>> This series tries to address all the requirements listed above.
>>
>> # Implementation details
>>
>> Create a generic interface aimed to support user space assignment of scarce
>> counters used for monitoring. First usage of interface is by ABMC with option
>> to expand usage to "soft-ABMC" and MPAM counters in future.
> 
> I'll try to identify any issues I've encountered with "soft-ABMC".
> Hopefully I'll be able to share a sample implementation based on these
> patches soon.

That would be wonderful.

> 
> There's now more interest at Google in allowing explicit control of
> where RMIDs are assigned on Intel platforms. Even though the number of
> RMIDs implemented by hardware tends to be roughly the number of
> containers they want to support, they often still need to create
> containers when all RMIDs have already been allocated, which is not
> currently allowed. Once the container has been created and starts
> running, it's no longer possible to move its threads into a monitoring
> group when RMIDs become available again, so it's important
> for resctrl to maintain an accurate task list for a container even
> when RMIDs are not available.
> 
>>
>> Feature adds following interface files:
>>
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
>> monitoring features supported. The enclosed brackets indicate which
>> feature is enabled.
>>
>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>> counters available for assignment.
> 
> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
> represent in a "soft-ABMC" implementation where assignment is
> implemented by assigning an RMID, which would result in all events
> being assigned at once.
> 
> My main concern is how many "counters" you can assign by assigning
> RMIDs. I recall Reinette proposed reporting the number of groups which
> can be assigned separately from counters which can be assigned.

More context may be needed here. Currently, num_mbm_cntrs indicates the
number of counters available per domain, which is 32.

At the moment, we can assign 2 counters to each group, meaning each RMID
can be associated with 2 hardware counters. In theory, it's possible to
assign all 32 hardware counters to a group—allowing one RMID to be linked
with up to 32 counters. However, we currently lack the interface to
support that level of assignment.

For now, the plan is to support basic assignment and expand functionality
later once we have the necessary data structure and requirements.
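
To make the arithmetic above concrete, here is a rough sketch in Python
(not part of the series). It assumes the "<domain>=<count>" format shown
for available_mbm_cntrs in example (c) and two counters per group (one
each for mbm_total_bytes and mbm_local_bytes). The helper names are
hypothetical.

```python
def parse_available(text):
    """Parse a string such as '0=30;1=30' into {0: 30, 1: 30}."""
    result = {}
    for item in text.strip().split(";"):
        domain, count = item.split("=")
        result[int(domain)] = int(count)
    return result


def groups_fully_assignable(free_counters, events_per_group=2):
    """Number of groups that can get a counter for every event."""
    return free_counters // events_per_group


# Sample value taken from example (c) in the cover letter.
free = parse_available("0=30;1=30")
print({dom: groups_fully_assignable(n) for dom, n in free.items()})
# prints {0: 15, 1: 15}
```

With all 32 counters free, the same division gives 16 fully monitored
groups per domain.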

> 
>>
>> /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
>> counters free in each domain.
>>
>> /sys/fs/resctrl/info/L3_MON/counter_configs : Directory to hold the counter configuration.
>>
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter : Default configuration
>> for MBM total events.
>>
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter : Default configuration
>> for MBM local events.
> 
> IIUC, this needs to be implemented now so you can drop BMEC with this series?

This series hides the configuration files (mbm_local_bytes_config and
mbm_total_bytes_config) required for BMEC when ABMC is enabled.

When the user switches back to "default" mode, BMEC becomes available
again. I believe it's a good approach to keep it this way.
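
A user-space tool could tell which of the two cases applies by parsing
mbm_assign_mode. A minimal sketch, assuming the bracket convention shown
in example (a); the function names are hypothetical, and the BMEC check
only restates the behavior described above.

```python
def parse_assign_mode(text):
    """Return (enabled_mode, supported_modes) from mbm_assign_mode contents.

    The enabled mode is the one enclosed in square brackets.
    """
    enabled = None
    supported = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            enabled = name
        supported.append(name)
    return enabled, supported


def bmec_config_files_expected(enabled_mode):
    """Assumption from this reply: the BMEC config files are visible
    only while the "default" mode is enabled."""
    return enabled_mode == "default"


enabled, supported = parse_assign_mode("[mbm_cntr_assign]\ndefault\n")
print(enabled, supported, bmec_config_files_expected(enabled))
```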

> 
>>
>> /sys/fs/resctrl/mbm_L3_assignments: Interface to list or modify assignment states on each group.
>>
>> # Examples
>>
>> a. Check if ABMC support is available
>>         # mount -t resctrl resctrl /sys/fs/resctrl/
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>         [mbm_cntr_assign]
>>         default
>>
>>         ABMC feature is detected and it is enabled.
>>
>> b. Check how many ABMC counters are available.
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>>         32
>>
>> c. Check how many ABMC counters are available in each domain.
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>>         0=30;1=30
>>
>> d. Check default counter configuration.
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>         local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>         local_reads, local_non_temporal_writes, local_reads_slow_memory
>>
>> e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
>>    to list and modify any group's monitoring states.
> 
> To confirm, would we have "mbm_<resource_name>_assignments" for each
> resource where MBM-ish events could be assigned?

This is a group-level property—it resides within each group and is not
related to any specific resource.
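
For reference, a small sketch of how user space might parse the
per-group file, assuming the
"<Event configuration>:<Domain id>=<Assignment type>" format from the
cover letter. The function names are hypothetical and nothing here is
part of the series.

```python
def parse_assignments(text):
    """Parse mbm_L3_assignments contents into {config: {domain: state}}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        config, _, domains = line.partition(":")
        table[config] = {}
        for item in domains.split(";"):
            domain, _, state = item.partition("=")
            table[config][int(domain)] = state
    return table


def format_request(config, domain, state):
    """Build a string to write to the file; domain may be '*' for all."""
    return f"{config}:{domain}={state}"


sample = "mbm_total_bytes:0=_;1=e\nmbm_local_bytes:0=e;1=e\n"
print(parse_assignments(sample))
print(format_request("mbm_total_bytes", "*", "_"))
```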

> 
>>
>>         The list is displayed in the following format:
>>
>>         <Event configuration>:<Domain id>=<Assignment type>
> 
> For soft-ABMC assignment, is there just a single event configuration
> representing all the events tracked by the RMID?


I’m not sure about the details of how soft-ABMC will be supported.
Soft-ABMC is not available at the moment, but I believe such a
configuration can be added once soft-ABMC support is in place.

> 
>>
>>         Event configuration: A valid event configuration listed in the
>>         /sys/fs/resctrl/info/L3_MON/counter_configs directory.
>>
>>         Domain ID: A valid domain ID number.
>>
>>         Assignment types:
>>
>>         _ : No event configuration assigned
>>
>>         e : Event configuration assigned in exclusive mode
>>
>>         To list the default group states:
>>         # cat /sys/fs/resctrl/mbm_L3_assignments
>>         mbm_total_bytes:0=e;1=e
>>         mbm_local_bytes:0=e;1=e
>>
>>         To unassign the configuration of mbm_total_bytes on domain 0:
>>         # echo "mbm_total_bytes:0=_" > mbm_L3_assignments
>>         # cat mbm_L3_assignments
>>         mbm_total_bytes:0=_;1=e
>>         mbm_local_bytes:0=e;1=e
>>
>>         To unassign the mbm_total_bytes configuration on all domains:
>>         # echo "mbm_total_bytes:*=_" > mbm_L3_assignments
>>         # cat mbm_L3_assignments
>>         mbm_total_bytes:0=_;1=_
>>         mbm_local_bytes:0=e;1=e
>>
>>         To assign the mbm_total_bytes configuration on all domains in exclusive mode:
>>         # echo "mbm_total_bytes:*=e" > mbm_L3_assignments
>>         # cat mbm_L3_assignments
>>         mbm_total_bytes:0=e;1=e
>>         mbm_local_bytes:0=e;1=e
>>
>> g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
>>    There is no change in reading the events with ABMC. If the event is unassigned
>>    when reading, then the read will come back as "Unassigned".
>>
>>         # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>         779247936
>>         # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>         765207488
>>
>> h. Check the default event configurations.
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>         local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>         local_reads, local_non_temporal_writes, local_reads_slow_memory
> 
> These look like the BMEC event names converted from camel case. Will
> event filter programming be portable?


Yes, that’s correct. The event types (reads, writes, etc.) supported by
both BMEC and ABMC are the same, so I’ve used generalized names here.

As for portability, I can’t comment, since I’m not familiar with how event
configuration is handled in MPAM or other architectures.
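
For illustration, the event_filter contents from example (d) can be
treated as plain sets of names. The sketch below assumes the
comma-separated format shown there (the line wrap in the example may
just be an artifact of the mail formatting); parse_event_filter is a
hypothetical helper, not something provided by the series.

```python
def parse_event_filter(text):
    """Split comma-separated event_filter contents into a set of names."""
    names = text.replace("\n", ",").split(",")
    return {name.strip() for name in names if name.strip()}


# Sample values copied from example (d) in the cover letter.
total = parse_event_filter(
    "local_reads, remote_reads, local_non_temporal_writes, "
    "remote_non_temporal_writes,\n"
    "local_reads_slow_memory, remote_reads_slow_memory, "
    "dirty_victim_writes_all"
)
local = parse_event_filter(
    "local_reads, local_non_temporal_writes, local_reads_slow_memory"
)

# In these defaults, every local event also appears in the total set.
print(local <= total)
# Events counted by mbm_total_bytes but not by mbm_local_bytes.
print(sorted(total - local))
```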

> 
> Thanks,
> -Peter
> 
> 
> [1] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/

-- 
Thanks
Babu Moger


* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-20 15:28   ` Moger, Babu
@ 2025-05-20 16:06     ` Reinette Chatre
  2025-05-20 17:51       ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-20 16:06 UTC (permalink / raw)
  To: babu.moger, Peter Newman
  Cc: corbet, tony.luck, tglx, mingo, bp, dave.hansen, james.morse,
	dave.martin, fenghuay, x86, hpa, paulmck, akpm, thuth, rostedt,
	ardb, gregkh, daniel.sneddon, jpoimboe, alexandre.chartre,
	pawan.kumar.gupta, thomas.lendacky, perry.yuan, seanjc, kai.huang,
	xiaoyao.li, kan.liang, xin3.li, ebiggers, xin, sohil.mehta,
	andrew.cooper3, mario.limonciello, linux-doc, linux-kernel,
	maciej.wieczor-retman, eranian, Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/20/25 8:28 AM, Moger, Babu wrote:
> On 5/19/25 10:59, Peter Newman wrote:
>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:

...

>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>> counters available for assignment.
>>
>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>> represent in a "soft-ABMC" implementation where assignment is
>> implemented by assigning an RMID, which would result in all events
>> being assigned at once.
>>
>> My main concern is how many "counters" you can assign by assigning
>> RMIDs. I recall Reinette proposed reporting the number of groups which
>> can be assigned separately from counters which can be assigned.
> 
> More context may be needed here. Currently, num_mbm_cntrs indicates the
> number of counters available per domain, which is 32.
> 
> At the moment, we can assign 2 counters to each group, meaning each RMID
> can be associated with 2 hardware counters. In theory, it's possible to
> assign all 32 hardware counters to a group—allowing one RMID to be linked
> with up to 32 counters. However, we currently lack the interface to
> support that level of assignment.
> 
> For now, the plan is to support basic assignment and expand functionality
> later once we have the necessary data structure and requirements.

Looks like some requirements did not make it into this implementation.
Do you recall the discussion that resulted in you writing [2]? Looks like
there is a question to Peter in there on how to determine how many "counters"
are available in soft-ABMC. I interpreted [3] at that time to mean that this
information would be available in a future AMD publication.

Reinette

[2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/ 
[3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/


* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-20 16:06     ` Reinette Chatre
@ 2025-05-20 17:51       ` Moger, Babu
  2025-05-20 18:23         ` Reinette Chatre
  0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-05-20 17:51 UTC (permalink / raw)
  To: Reinette Chatre, Peter Newman
  Cc: corbet, tony.luck, tglx, mingo, bp, dave.hansen, james.morse,
	dave.martin, fenghuay, x86, hpa, paulmck, akpm, thuth, rostedt,
	ardb, gregkh, daniel.sneddon, jpoimboe, alexandre.chartre,
	pawan.kumar.gupta, thomas.lendacky, perry.yuan, seanjc, kai.huang,
	xiaoyao.li, kan.liang, xin3.li, ebiggers, xin, sohil.mehta,
	andrew.cooper3, mario.limonciello, linux-doc, linux-kernel,
	maciej.wieczor-retman, eranian, Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/20/25 11:06, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/20/25 8:28 AM, Moger, Babu wrote:
>> On 5/19/25 10:59, Peter Newman wrote:
>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
> 
> ...
> 
>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>> counters available for assignment.
>>>
>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>> represent in a "soft-ABMC" implementation where assignment is
>>> implemented by assigning an RMID, which would result in all events
>>> being assigned at once.
>>>
>>> My main concern is how many "counters" you can assign by assigning
>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>> can be assigned separately from counters which can be assigned.
>>
>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>> number of counters available per domain, which is 32.
>>
>> At the moment, we can assign 2 counters to each group, meaning each RMID
>> can be associated with 2 hardware counters. In theory, it's possible to
>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>> with up to 32 counters. However, we currently lack the interface to
>> support that level of assignment.
>>
>> For now, the plan is to support basic assignment and expand functionality
>> later once we have the necessary data structure and requirements.
> 
> Looks like some requirements did not make it into this implementation.
> Do you recall the discussion that resulted in you writing [2]? Looks like
> there is a question to Peter in there on how to determine how many "counters"
> are available in soft-ABMC. I interpreted [3] at that time to mean that this
> information would be available in a future AMD publication.

We already have a method to determine the number of counters in soft-ABMC
mode, which Peter has addressed [4].

[4]
https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/

This appears to be more of a workaround, and I doubt it will be included
in any official AMD documentation. Additionally, the long-term direction
is moving towards ABMC.

I don’t believe this workaround needs to be part of the current series. It
can be added later when soft-ABMC is implemented.

> 
> Reinette
> 
> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/ 
> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/

-- 
Thanks
Babu Moger


* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-20 17:51       ` Moger, Babu
@ 2025-05-20 18:23         ` Reinette Chatre
  2025-05-20 23:25           ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-20 18:23 UTC (permalink / raw)
  To: babu.moger, Peter Newman
  Cc: corbet, tony.luck, tglx, mingo, bp, dave.hansen, james.morse,
	dave.martin, fenghuay, x86, hpa, paulmck, akpm, thuth, rostedt,
	ardb, gregkh, daniel.sneddon, jpoimboe, alexandre.chartre,
	pawan.kumar.gupta, thomas.lendacky, perry.yuan, seanjc, kai.huang,
	xiaoyao.li, kan.liang, xin3.li, ebiggers, xin, sohil.mehta,
	andrew.cooper3, mario.limonciello, linux-doc, linux-kernel,
	maciej.wieczor-retman, eranian, Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/20/25 10:51 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 5/20/25 11:06, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 5/20/25 8:28 AM, Moger, Babu wrote:
>>> On 5/19/25 10:59, Peter Newman wrote:
>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>
>> ...
>>
>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>>> counters available for assignment.
>>>>
>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>>> represent in a "soft-ABMC" implementation where assignment is
>>>> implemented by assigning an RMID, which would result in all events
>>>> being assigned at once.
>>>>
>>>> My main concern is how many "counters" you can assign by assigning
>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>>> can be assigned separately from counters which can be assigned.
>>>
>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>>> number of counters available per domain, which is 32.
>>>
>>> At the moment, we can assign 2 counters to each group, meaning each RMID
>>> can be associated with 2 hardware counters. In theory, it's possible to
>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>>> with up to 32 counters. However, we currently lack the interface to
>>> support that level of assignment.
>>>
>>> For now, the plan is to support basic assignment and expand functionality
>>> later once we have the necessary data structure and requirements.
>>
>> Looks like some requirements did not make it into this implementation.
>> Do you recall the discussion that resulted in you writing [2]? Looks like
>> there is a question to Peter in there on how to determine how many "counters"
>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
>> information would be available in a future AMD publication.
> 
> We already have a method to determine the number of counters in soft-ABMC
> mode, which Peter has addressed [4].
> 
> [4]
> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
> 
> This appears to be more of a workaround, and I doubt it will be included
> in any official AMD documentation. Additionally, the long-term direction
> is moving towards ABMC.
> 
> I don’t believe this workaround needs to be part of the current series. It
> can be added later when soft-ABMC is implemented.

Agreed. What about the plans described in [2]? (Thanks to Peter for
catching this!).

It is important to keep track of requirements while working on a feature to
ensure that the implementation supports the planned use cases. Re-reading that
thread, it is not clear to me how soft-ABMC's per-group assignment would look.
Could you please share how you see it progressing from this implementation?
This includes the single event vs. multiple event assignment. I would like to
highlight that this is not a request for this to be supported in this implementation,
but there needs to be a plan for how this can be supported on top of the interfaces
established by this work.

Reinette

> 
>>
>> Reinette
>>
>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/ 
>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
> 



* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-20 18:23         ` Reinette Chatre
@ 2025-05-20 23:25           ` Moger, Babu
  2025-05-20 23:44             ` Reinette Chatre
  0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-05-20 23:25 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, Peter Newman
  Cc: corbet, tony.luck, tglx, mingo, bp, dave.hansen, james.morse,
	dave.martin, fenghuay, x86, hpa, paulmck, akpm, thuth, rostedt,
	ardb, gregkh, daniel.sneddon, jpoimboe, alexandre.chartre,
	pawan.kumar.gupta, thomas.lendacky, perry.yuan, seanjc, kai.huang,
	xiaoyao.li, kan.liang, xin3.li, ebiggers, xin, sohil.mehta,
	andrew.cooper3, mario.limonciello, linux-doc, linux-kernel,
	maciej.wieczor-retman, eranian, Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/20/2025 1:23 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/20/25 10:51 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 5/20/25 11:06, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
>>>> On 5/19/25 10:59, Peter Newman wrote:
>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>>
>>> ...
>>>
>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>>>> counters available for assignment.
>>>>>
>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>>>> represent in a "soft-ABMC" implementation where assignment is
>>>>> implemented by assigning an RMID, which would result in all events
>>>>> being assigned at once.
>>>>>
>>>>> My main concern is how many "counters" you can assign by assigning
>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>>>> can be assigned separately from counters which can be assigned.
>>>>
>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>>>> number of counters available per domain, which is 32.
>>>>
>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
>>>> can be associated with 2 hardware counters. In theory, it's possible to
>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>>>> with up to 32 counters. However, we currently lack the interface to
>>>> support that level of assignment.
>>>>
>>>> For now, the plan is to support basic assignment and expand functionality
>>>> later once we have the necessary data structure and requirements.
>>>
>>> Looks like some requirements did not make it into this implementation.
>>> Do you recall the discussion that resulted in you writing [2]? Looks like
>>> there is a question to Peter in there on how to determine how many "counters"
>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
>>> information would be available in a future AMD publication.
>>
>> We already have a method to determine the number of counters in soft-ABMC
>> mode, which Peter has addressed [4].
>>
>> [4]
>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
>>
>> This appears to be more of a workaround, and I doubt it will be included
>> in any official AMD documentation. Additionally, the long-term direction
>> is moving towards ABMC.
>>
>> I don’t believe this workaround needs to be part of the current series. It
>> can be added later when soft-ABMC is implemented.
> 
> Agreed. What about the plans described in [2]? (Thanks to Peter for
> catching this!).
> 
> It is important to keep track of requirements while working on a feature to
> ensure that the implementation supports the planned use cases. Re-reading that
> thread it is not clear to me how soft-ABMC's per-group assignment would look.
> Could you please share how you see it progress from this implementation?
> This includes the single event vs. multiple event assignment. I would like to
> highlight that this is not a request for this to be supported in this implementation
> but there needs to be a plan for how this can be supported on top of interfaces
> established by this work.
> 

Here’s my current understanding of soft-ABMC. Peter may have a more 
in-depth perspective on this.

Soft-ABMC:
a. num_mbm_cntrs: This is a software-defined limit based on the number 
of active RMIDs that can be supported. The value can be obtained using 
the code referenced in [4].

b. Assignments: No hardware configuration is required. We simply need to 
ensure that no more than num_mbm_cntrs RMIDs are active at any given time.

c. Configuration: Controlled via /info/L3_MON/mbm_total_bytes_config and 
mbm_local_bytes_config.

d. Events: Only two events can be assigned (local and total).

ABMC:
a. num_mbm_cntrs: This is defined by the hardware.
b. Assignments: Requires special MSR writes to assign counters.
c. Configuration: Comes from /info/L3_MON/counter_configs/.
d. Events: The hardware allows more than two events to be assigned to a 
group; the current implementation supports up to 2.

Commonalities:
a. Assignments can be either exclusive or shared in both these modes.

Given these, I believe we can easily accommodate soft-ABMC in this 
interface.

>>>
>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
>>
> 
> 


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-20 23:25           ` Moger, Babu
@ 2025-05-20 23:44             ` Reinette Chatre
  2025-05-21  9:18               ` Peter Newman
  2025-05-21 14:27               ` Peter Newman
  0 siblings, 2 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-05-20 23:44 UTC (permalink / raw)
  To: Moger, Babu, babu.moger, Peter Newman
  Cc: corbet, tony.luck, tglx, mingo, bp, dave.hansen, james.morse,
	dave.martin, fenghuay, x86, hpa, paulmck, akpm, thuth, rostedt,
	ardb, gregkh, daniel.sneddon, jpoimboe, alexandre.chartre,
	pawan.kumar.gupta, thomas.lendacky, perry.yuan, seanjc, kai.huang,
	xiaoyao.li, kan.liang, xin3.li, ebiggers, xin, sohil.mehta,
	andrew.cooper3, mario.limonciello, linux-doc, linux-kernel,
	maciej.wieczor-retman, eranian, Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/20/25 4:25 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 5/20/2025 1:23 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 5/20/25 10:51 AM, Moger, Babu wrote:
>>> Hi Reinette,
>>>
>>> On 5/20/25 11:06, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
>>>>> On 5/19/25 10:59, Peter Newman wrote:
>>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>>>
>>>> ...
>>>>
>>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>>>>> counters available for assignment.
>>>>>>
>>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>>>>> represent in a "soft-ABMC" implementation where assignment is
>>>>>> implemented by assigning an RMID, which would result in all events
>>>>>> being assigned at once.
>>>>>>
>>>>>> My main concern is how many "counters" you can assign by assigning
>>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>>>>> can be assigned separately from counters which can be assigned.
>>>>>
>>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>>>>> number of counters available per domain, which is 32.
>>>>>
>>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
>>>>> can be associated with 2 hardware counters. In theory, it's possible to
>>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>>>>> with up to 32 counters. However, we currently lack the interface to
>>>>> support that level of assignment.
>>>>>
>>>>> For now, the plan is to support basic assignment and expand functionality
>>>>> later once we have the necessary data structure and requirements.
>>>>
>>>> Looks like some requirements did not make it into this implementation.
>>>> Do you recall the discussion that resulted in you writing [2]? Looks like
>>>> there is a question to Peter in there on how to determine how many "counters"
>>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
>>>> information would be available in a future AMD publication.
>>>
>>> We already have a method to determine the number of counters in soft-ABMC
>>> mode, which Peter has addressed [4].
>>>
>>> [4]
>>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
>>>
>>> This appears to be more of a workaround, and I doubt it will be included
>>> in any official AMD documentation. Additionally, the long-term direction
>>> is moving towards ABMC.
>>>
>>> I don’t believe this workaround needs to be part of the current series. It
>>> can be added later when soft-ABMC is implemented.
>>
>> Agreed. What about the plans described in [2]? (Thanks to Peter for
>> catching this!).
>>
>> It is important to keep track of requirements while working on a feature to
>> ensure that the implementation supports the planned use cases. Re-reading that
>> thread it is not clear to me how soft-ABMC's per-group assignment would look.
>> Could you please share how you see it progress from this implementation?
>> This includes the single event vs. multiple event assignment. I would like to
>> highlight that this is not a request for this to be supported in this implementation
>> but there needs to be a plan for how this can be supported on top of interfaces
>> established by this work.
>>
> 
> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
> 
> Soft-ABMC:
> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
> 
> b. Assignments: No hardware configuration is required. We simply need to ensure that no more than num_mbm_cntrs RMIDs are active at any given time.
> 
> c. Configuration: Controlled via /info/L3_MON/mbm_total_bytes_config and mbm_local_bytes_config.
> 
> d. Events: Only two events can be assigned(local and total).
> 
> ABMC:
> a. num_mbm_cntrs: This is defined by the hardware.
> b. Assignments: Requires special MSR writes to assign counters.
> c. Configuration: Comes from /info/L3_MON/counter_configs/.
> d. Events: More than two events can be assigned to a group (currently up to 2).
> 
> Commonalities:
> a. Assignments can be either exclusive or shared in both these modes.
> 
> Given these, I believe we can easily accommodate soft-ABMC in this interface.

This is not so obvious to me. It looks to me as though the user is forced to interpret
the content of resctrl files differently depending on soft-ABMC vs. ABMC, which makes the
interface inconsistent and requires the user to know implementation details. This is what
the previous discussion I linked to aimed to address. It sounds to me as though you believe
that this is no longer an issue. Could you please show examples of what a user can expect
from the interfaces and how a user will interact with them on both a non-ABMC and an ABMC
system?

Thank you

Reinette

> 
>>>>
>>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
>>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
>>>
>>
>>
> 


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-20 23:44             ` Reinette Chatre
@ 2025-05-21  9:18               ` Peter Newman
  2025-05-21 23:03                 ` Reinette Chatre
  2025-05-21 14:27               ` Peter Newman
  1 sibling, 1 reply; 114+ messages in thread
From: Peter Newman @ 2025-05-21  9:18 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Moger, Babu, babu.moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen, james.morse, dave.martin, fenghuay, x86, hpa,
	paulmck, akpm, thuth, rostedt, ardb, gregkh, daniel.sneddon,
	jpoimboe, alexandre.chartre, pawan.kumar.gupta, thomas.lendacky,
	perry.yuan, seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu/Reinette,

On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Babu,
>
> On 5/20/25 4:25 PM, Moger, Babu wrote:
> > Hi Reinette,
> >
> > On 5/20/2025 1:23 PM, Reinette Chatre wrote:
> >> Hi Babu,
> >>
> >> On 5/20/25 10:51 AM, Moger, Babu wrote:
> >>> Hi Reinette,
> >>>
> >>> On 5/20/25 11:06, Reinette Chatre wrote:
> >>>> Hi Babu,
> >>>>
> >>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
> >>>>> On 5/19/25 10:59, Peter Newman wrote:
> >>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
> >>>>
> >>>> ...
> >>>>
> >>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> >>>>>>> counters available for assignment.
> >>>>>>
> >>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
> >>>>>> represent in a "soft-ABMC" implementation where assignment is
> >>>>>> implemented by assigning an RMID, which would result in all events
> >>>>>> being assigned at once.
> >>>>>>
> >>>>>> My main concern is how many "counters" you can assign by assigning
> >>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
> >>>>>> can be assigned separately from counters which can be assigned.
> >>>>>
> >>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
> >>>>> number of counters available per domain, which is 32.
> >>>>>
> >>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
> >>>>> can be associated with 2 hardware counters. In theory, it's possible to
> >>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
> >>>>> with up to 32 counters. However, we currently lack the interface to
> >>>>> support that level of assignment.
> >>>>>
> >>>>> For now, the plan is to support basic assignment and expand functionality
> >>>>> later once we have the necessary data structure and requirements.
> >>>>
> >>>> Looks like some requirements did not make it into this implementation.
> >>>> Do you recall the discussion that resulted in you writing [2]? Looks like
> >>>> there is a question to Peter in there on how to determine how many "counters"
> >>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
> >>>> information would be available in a future AMD publication.
> >>>
> >>> We already have a method to determine the number of counters in soft-ABMC
> >>> mode, which Peter has addressed [4].
> >>>
> >>> [4]
> >>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
> >>>
> >>> This appears to be more of a workaround, and I doubt it will be included
> >>> in any official AMD documentation. Additionally, the long-term direction
> >>> is moving towards ABMC.
> >>>
> >>> I don’t believe this workaround needs to be part of the current series. It
> >>> can be added later when soft-ABMC is implemented.
> >>
> >> Agreed. What about the plans described in [2]? (Thanks to Peter for
> >> catching this!).
> >>
> >> It is important to keep track of requirements while working on a feature to
> >> ensure that the implementation supports the planned use cases. Re-reading that
> >> thread it is not clear to me how soft-ABMC's per-group assignment would look.
> >> Could you please share how you see it progress from this implementation?
> >> This includes the single event vs. multiple event assignment. I would like to
> >> highlight that this is not a request for this to be supported in this implementation
> >> but there needs to be a plan for how this can be supported on top of interfaces
> >> established by this work.
> >>
> >
> > Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
> >
> > Soft-ABMC:
> > a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].

I would call it a hardware-defined limit that can be probed by software.

The main question is whether this file returns the exact number of
RMIDs hardware can track or double that number (mbm_total_bytes +
mbm_local_bytes) so that the value is always measured in events.

There's also the mongroup-RMID overcommit use case I described
above[1]. On Intel we can safely assume that there are counters to
back all RMIDs, so num_mbm_cntrs would be calculated directly from
num_rmids.

I realized this use case is more difficult to implement on MPAM,
because a PARTID is effectively a CLOSID+RMID, so deferring assigning
a unique PARTID to a group also results in it being in a different
allocation group. It will work if the unmonitored groups could find a
way to share PARTIDs, but this has consequences on allocation - but
hopefully no worse than sharing CLOSIDs on x86.

There's a lot of interest in monitoring ID overcommit in Google, so I
think it's worth it for me to investigate the additional structural
changes needed in resctrl (i.e., breaking the FS-level association
between mongroups and HW monitoring IDs). Such a framework could be a
better fit for soft-ABMC. For example, if overcommit is allowed, we
would just report the number of simultaneous RMIDs we were able to
probe as num_rmids. I would want the same shared assignment scheduler
to be able to work with RMIDs and counters, though.

Thanks,
-Peter

[1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-20 23:44             ` Reinette Chatre
  2025-05-21  9:18               ` Peter Newman
@ 2025-05-21 14:27               ` Peter Newman
  2025-05-21 23:05                 ` Reinette Chatre
  1 sibling, 1 reply; 114+ messages in thread
From: Peter Newman @ 2025-05-21 14:27 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Moger, Babu, babu.moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen, james.morse, dave.martin, fenghuay, x86, hpa,
	paulmck, akpm, thuth, rostedt, ardb, gregkh, daniel.sneddon,
	jpoimboe, alexandre.chartre, pawan.kumar.gupta, thomas.lendacky,
	perry.yuan, seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Babu,
>
> On 5/20/25 4:25 PM, Moger, Babu wrote:
> > Hi Reinette,
> >
> > On 5/20/2025 1:23 PM, Reinette Chatre wrote:
> >> Hi Babu,
> >>
> >> On 5/20/25 10:51 AM, Moger, Babu wrote:
> >>> Hi Reinette,
> >>>
> >>> On 5/20/25 11:06, Reinette Chatre wrote:
> >>>> Hi Babu,
> >>>>
> >>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
> >>>>> On 5/19/25 10:59, Peter Newman wrote:
> >>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
> >>>>
> >>>> ...
> >>>>
> >>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> >>>>>>> counters available for assignment.
> >>>>>>
> >>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
> >>>>>> represent in a "soft-ABMC" implementation where assignment is
> >>>>>> implemented by assigning an RMID, which would result in all events
> >>>>>> being assigned at once.
> >>>>>>
> >>>>>> My main concern is how many "counters" you can assign by assigning
> >>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
> >>>>>> can be assigned separately from counters which can be assigned.
> >>>>>
> >>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
> >>>>> number of counters available per domain, which is 32.
> >>>>>
> >>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
> >>>>> can be associated with 2 hardware counters. In theory, it's possible to
> >>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
> >>>>> with up to 32 counters. However, we currently lack the interface to
> >>>>> support that level of assignment.
> >>>>>
> >>>>> For now, the plan is to support basic assignment and expand functionality
> >>>>> later once we have the necessary data structure and requirements.
> >>>>
> >>>> Looks like some requirements did not make it into this implementation.
> >>>> Do you recall the discussion that resulted in you writing [2]? Looks like
> >>>> there is a question to Peter in there on how to determine how many "counters"
> >>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
> >>>> information would be available in a future AMD publication.
> >>>
> >>> We already have a method to determine the number of counters in soft-ABMC
> >>> mode, which Peter has addressed [4].
> >>>
> >>> [4]
> >>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
> >>>
> >>> This appears to be more of a workaround, and I doubt it will be included
> >>> in any official AMD documentation. Additionally, the long-term direction
> >>> is moving towards ABMC.
> >>>
> >>> I don’t believe this workaround needs to be part of the current series. It
> >>> can be added later when soft-ABMC is implemented.
> >>
> >> Agreed. What about the plans described in [2]? (Thanks to Peter for
> >> catching this!).
> >>
> >> It is important to keep track of requirements while working on a feature to
> >> ensure that the implementation supports the planned use cases. Re-reading that
> >> thread it is not clear to me how soft-ABMC's per-group assignment would look.
> >> Could you please share how you see it progress from this implementation?
> >> This includes the single event vs. multiple event assignment. I would like to
> >> highlight that this is not a request for this to be supported in this implementation
> >> but there needs to be a plan for how this can be supported on top of interfaces
> >> established by this work.
> >>
> >
> > Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
> >
> > Soft-ABMC:
> > a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
> >
> > b. Assignments: No hardware configuration is required. We simply need to ensure that no more than num_mbm_cntrs RMIDs are active at any given time.
> >
> > c. Configuration: Controlled via /info/L3_MON/mbm_total_bytes_config and mbm_local_bytes_config.
> >
> > d. Events: Only two events can be assigned(local and total).
> >
> > ABMC:
> > a. num_mbm_cntrs: This is defined by the hardware.
> > b. Assignments: Requires special MSR writes to assign counters.
> > c. Configuration: Comes from /info/L3_MON/counter_configs/.
> > d. Events: More than two events can be assigned to a group (currently up to 2).
> >
> > Commonalities:
> > a. Assignments can be either exclusive or shared in both these modes.
> >
> > Given these, I believe we can easily accommodate soft-ABMC in this interface.
>
> This is not so obvious to me. It looks to me as though the user is forced to interpret
> the content of resctrl files differently based on soft-ABMC vs ABMC making the interface
> inconsistent and user thus needing to know details of implementations. This is what the previous
> discussion I linked to aimed to address. It sounds to me as though you believe that this is no longer
> an issue. Could you please show examples of what a user can expect from the interfaces and how a user
> will interact with the interfaces on both a non-ABMC and ABMC system?

At the interface level, I think mbm_L3_assignments on a non-ABMC
system would only need to contain a single line:

0=s;1=s;...;31=s

But maybe for consistency we would synthesize a single, unmodifiable
counter configuration to reflect that allocating an RMID in a domain
results in assignment to all events and deallocating the RMID
unassigns all events. We could call it "group" to say it's assigning
at the group level, or perhaps just '*':

*:0=s;1=s;...;31=s

I'm not sure about allowing a '*' on ABMC hardware, because it could
be interpreted as allocating a lot of counters when a large number of
event configurations exist.

*:0=s;1=s;...;31=s
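
For illustration only, the line format discussed above could be parsed as in
the sketch below. This is a user-space model, not code from the series. The
function names are hypothetical, and reading 'e', 's' and '_' as exclusive,
shared and unassigned is an interpretation of the examples in this thread.
The sketch assumes every line carries an "event:" prefix, so it would accept
the synthesized '*' form but not the bare single-line form.

```python
# States seen in this thread's examples: 'e', 's' and '_'.
# Their meaning (exclusive, shared, unassigned) is an interpretation.
VALID_STATES = {"e", "s", "_"}


def parse_assignment_line(line):
    """Parse 'event:dom=state;dom=state' into (event, {domain_id: state})."""
    event, sep, domains = line.strip().partition(":")
    if not sep or not event:
        raise ValueError(f"missing 'event:' prefix in {line!r}")
    result = {}
    for item in domains.split(";"):
        dom, eq, state = item.partition("=")
        if not eq or not dom.isdigit() or state not in VALID_STATES:
            raise ValueError(f"bad domain entry {item!r}")
        result[int(dom)] = state
    return event, result


def format_assignment_line(event, states):
    """Inverse of parse_assignment_line(); domains are emitted in ascending order."""
    body = ";".join(f"{dom}={states[dom]}" for dom in sorted(states))
    return f"{event}:{body}"
```

Treating '*' as just another event name keeps one parser for both modes; whether
the kernel interface should accept it at all is the open question raised here.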

-Peter


>
> Thank you
>
> Reinette
>
> >
> >>>>
> >>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
> >>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
> >>>
> >>
> >>
> >
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-21  9:18               ` Peter Newman
@ 2025-05-21 23:03                 ` Reinette Chatre
  2025-05-21 23:43                   ` Luck, Tony
  2025-05-22 15:44                   ` Moger, Babu
  0 siblings, 2 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-05-21 23:03 UTC (permalink / raw)
  To: Peter Newman
  Cc: Moger, Babu, babu.moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen, james.morse, dave.martin, fenghuay, x86, hpa,
	paulmck, akpm, thuth, rostedt, ardb, gregkh, daniel.sneddon,
	jpoimboe, alexandre.chartre, pawan.kumar.gupta, thomas.lendacky,
	perry.yuan, seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Peter and Babu,

On 5/21/25 2:18 AM, Peter Newman wrote:
> Hi Babu/Reinette,
> 
> On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Babu,
>>
>> On 5/20/25 4:25 PM, Moger, Babu wrote:
>>> Hi Reinette,
>>>
>>> On 5/20/2025 1:23 PM, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 5/20/25 10:51 AM, Moger, Babu wrote:
>>>>> Hi Reinette,
>>>>>
>>>>> On 5/20/25 11:06, Reinette Chatre wrote:
>>>>>> Hi Babu,
>>>>>>
>>>>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
>>>>>>> On 5/19/25 10:59, Peter Newman wrote:
>>>>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>>>>>>> counters available for assignment.
>>>>>>>>
>>>>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>>>>>>> represent in a "soft-ABMC" implementation where assignment is
>>>>>>>> implemented by assigning an RMID, which would result in all events
>>>>>>>> being assigned at once.
>>>>>>>>
>>>>>>>> My main concern is how many "counters" you can assign by assigning
>>>>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>>>>>>> can be assigned separately from counters which can be assigned.
>>>>>>>
>>>>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>>>>>>> number of counters available per domain, which is 32.
>>>>>>>
>>>>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
>>>>>>> can be associated with 2 hardware counters. In theory, it's possible to
>>>>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>>>>>>> with up to 32 counters. However, we currently lack the interface to
>>>>>>> support that level of assignment.
>>>>>>>
>>>>>>> For now, the plan is to support basic assignment and expand functionality
>>>>>>> later once we have the necessary data structure and requirements.
>>>>>>
>>>>>> Looks like some requirements did not make it into this implementation.
>>>>>> Do you recall the discussion that resulted in you writing [2]? Looks like
>>>>>> there is a question to Peter in there on how to determine how many "counters"
>>>>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
>>>>>> information would be available in a future AMD publication.
>>>>>
>>>>> We already have a method to determine the number of counters in soft-ABMC
>>>>> mode, which Peter has addressed [4].
>>>>>
>>>>> [4]
>>>>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
>>>>>
>>>>> This appears to be more of a workaround, and I doubt it will be included
>>>>> in any official AMD documentation. Additionally, the long-term direction
>>>>> is moving towards ABMC.
>>>>>
>>>>> I don’t believe this workaround needs to be part of the current series. It
>>>>> can be added later when soft-ABMC is implemented.
>>>>
>>>> Agreed. What about the plans described in [2]? (Thanks to Peter for
>>>> catching this!).
>>>>
>>>> It is important to keep track of requirements while working on a feature to
>>>> ensure that the implementation supports the planned use cases. Re-reading that
>>>> thread it is not clear to me how soft-ABMC's per-group assignment would look.
>>>> Could you please share how you see it progress from this implementation?
>>>> This includes the single event vs. multiple event assignment. I would like to
>>>> highlight that this is not a request for this to be supported in this implementation
>>>> but there needs to be a plan for how this can be supported on top of interfaces
>>>> established by this work.
>>>>
>>>
>>> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
>>>
>>> Soft-ABMC:
>>> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
> 
> I would call it a hardware-defined limit that can be probed by software.
> 
> The main question is whether this file returns the exact number of
> RMIDs hardware can track or double that number (mbm_total_bytes +
> mbm_local_bytes) so that the value is always measured in events.

tl;dr: I continue [3] to find it most intuitive for num_mbm_cntrs to be the exact
number of "active" RMIDs that the system can support *and* changing the name of
the modes to help user interpret num_mbm_cntrs: "mbm_cntr_event_assign" for ABMC,
"mbm_cntr_group_assign" for soft-ABMC.

details
-------

We are now back to the previous discussion about what user can expect from
the interface. Let me try and re-cap that discussion so that we can all hopefully
get back on the same page. Please add corrections/updates where needed.

soft-ABMC
---------
  soft-ABMC manages "active" (term TBD) RMID assignment to monitor groups. When an
  "active" RMID is assigned to a monitor group then *all* MBM events (not LLC occupancy)
  in that monitor group are counted. "Active" RMID assignment can be done per domain.

  Requirement: resctrl should accurately reflect which events are counted. That is,
  we do not want resctrl to pretend to allow user to assign an "active" RMID to
  only one event in a monitor group while all events are actually counted.

  Caveat: To support rapid re-assignment of RMIDs to monitor groups, llc_occupancy
  event is disabled when soft-ABMC is enabled.

ABMC
----
  ABMC manages (hardware) counter assignment to (monitor group (RMID), event) pairs.
  When a hardware counter is assigned to an (RMID, event) pair, only that
  (RMID, event) pair is counted. Hardware counter assignment can be done per domain.


shared assignment
-----------------
A shared assignment applies to both soft-ABMC and ABMC. A user can designate a
"counter" (could be hardware counter or "active" RMID) as shared and that means
the counter within that domain is shared between different monitor groups and actual
assignment is scheduled by resctrl.  
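
As a rough illustration of what "scheduled by resctrl" could mean, the sketch
below rotates a fixed pool of counters among groups that asked for shared
assignment. The round-robin policy, the class name and the tick() method are
all assumptions made for this example; the thread does not specify a
scheduling policy.

```python
from collections import deque


class SharedCounterScheduler:
    """Rotate a fixed number of counters among groups with shared assignment."""

    def __init__(self, num_counters):
        self.num_counters = num_counters
        self.active = deque()   # groups currently holding a counter, oldest first
        self.waiting = deque()  # groups not currently counted, longest wait first

    def add_group(self, group):
        # A new group gets a counter right away if one is free, else it waits.
        if len(self.active) < self.num_counters:
            self.active.append(group)
        else:
            self.waiting.append(group)

    def tick(self):
        """Move the longest-held counter to the longest-waiting group."""
        if not self.waiting or not self.active:
            return
        evicted = self.active.popleft()
        self.active.append(self.waiting.popleft())
        self.waiting.append(evicted)
```

The same rotation applies whether the "counter" is a hardware counter or an
"active" RMID, which is the property the definition above relies on.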


user interface
--------------

Next, consider the interface while keeping above definitions and requirements in mind.

This series introduces (using implementation, not cover-letter):

/sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
"num_mbm_cntrs":                                                               
	The maximum number of monitoring counters (total of available and assigned
	counters) in each domain when the system supports mbm_cntr_assign mode. 

/sys/fs/resctrl/mbm_L3_assignments
"mbm_L3_assignments":                                                          
	This interface file is created when the mbm_cntr_assign mode is supported
	and shows the assignment status for each group.              

Consider "mbm_L3_assignments" first. The interface is documented for ABMC support
where it is possible to manage individual event assignment within monitor group.

For ABMC it is possible to assign just one event at a time and doing so consumes
one counter in that domain:

a) Starting state on system with 32 counters per domain, two events in default
   resource group consume two counters in that domain:
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=30;1=32
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=_
mbm_local_bytes:0=e;1=_

b) Assign counter to mbm_local_bytes in domain 1:
# echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=_
mbm_local_bytes:0=e;1=e
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=30;1=31
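
As a minimal user-space sketch of reading the two files in the ABMC example
above (not part of this series): it assumes the "event:domain=state" and
"domain=value" layouts shown in these examples, and the helper names
parse_domains(), parse_assignments() and assigned_per_domain() are made up
for illustration.

```python
def parse_domains(text):
    """Parse '0=30;1=32' style text into {domain_id: value_string}."""
    result = {}
    for field in text.strip().split(";"):
        if not field:
            continue
        dom, _, val = field.partition("=")
        result[int(dom)] = val
    return result


def parse_assignments(text):
    """Parse 'event:dom=state;...' lines into {event: {domain_id: state}}."""
    table = {}
    for line in text.strip().splitlines():
        event, _, domains = line.strip().partition(":")
        table[event] = parse_domains(domains)
    return table


def assigned_per_domain(table):
    """Count how many events hold a counter (state 'e') in each domain."""
    counts = {}
    for states in table.values():
        for dom, state in states.items():
            counts[dom] = counts.get(dom, 0) + (1 if state == "e" else 0)
    return counts


# State after step b) above, with 32 counters per domain.
after = parse_assignments("mbm_total_bytes:0=e;1=_\nmbm_local_bytes:0=e;1=e\n")
used = assigned_per_domain(after)
free = {dom: 32 - n for dom, n in used.items()}
print(free)  # prints {0: 30, 1: 31}
```

With ABMC each 'e' costs exactly one counter in its domain, which is why the
free count can be derived from the assignment table alone.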

The question is how this should look on a soft-ABMC system. Let's say hypothetically
that on a soft-ABMC system it is possible to have 32 "active" RMIDs.

a) Starting state on a system with 32 "active RMIDs" per domain, two events in the default
   resource group consume one RMID in that domain:

# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=_
mbm_local_bytes:0=e;1=_

What should num_mbm_cntrs display?

Option A (counters are RMIDs):
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=31;1=32

Option B (pretend RMIDs are events):
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=62;1=64

b) Assign counter to mbm_local_bytes in domain 1:
# echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e

Note that even though the user requested only mbm_local_bytes to be assigned, it
actually results in both mbm_total_bytes and mbm_local_bytes being assigned. This
ensures accurate state representation to user space but it also creates an
inconsistent user interface between soft-ABMC and ABMC: user space intends
to use the same interface, but "sometimes" assigning one event results in the assignment
of one event while "sometimes" it results in the assignment of multiple events.

wrt "num_mbm_cntrs"

Option A (counters are RMIDs):
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=31;1=31

Option B (pretend RMIDs are events):
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=62;1=62 
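
The arithmetic behind the two options can be sketched as follows. This is an
illustration only: the 32 "active" RMIDs per domain are the hypothetical value
used above, and available() and render() are made-up helper names.

```python
EVENTS_PER_RMID = 2  # mbm_total_bytes and mbm_local_bytes


def available(total_rmids, groups_assigned, per_event):
    """Free "counters" per domain.

    per_event=False is Option A (counters are RMIDs),
    per_event=True is Option B (pretend RMIDs are events).
    """
    free = {}
    for dom, total in total_rmids.items():
        free_rmids = total - groups_assigned.get(dom, 0)
        free[dom] = free_rmids * EVENTS_PER_RMID if per_event else free_rmids
    return free


def render(values):
    """Format {domain: value} the way the examples above display it."""
    return ";".join(f"{dom}={val}" for dom, val in sorted(values.items()))


total = {0: 32, 1: 32}
state_a = {0: 1, 1: 0}  # default group holds an RMID in domain 0 only
state_b = {0: 1, 1: 1}  # after assigning in domain 1

print(render(available(total, state_b, per_event=False)))  # prints 0=31;1=31
print(render(available(total, state_b, per_event=True)))   # prints 0=62;1=62
```

In both options a single assignment request changes the state of two events,
which is the inconsistency with ABMC.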

Neither option seems ideal to me since the interface cannot be consistent
between ABMC and soft-ABMC.
As I mentioned in [2] it is not possible to hide ABMC and soft-ABMC behind
the same interface. When user space wants to monitor a particular monitor group
then it should be clear how that can be accomplished. Not knowing if
an assignment/unassignment to/from an event would impact one or all events
and whether it will consume one or multiple counters does not sound like a good
interface to me. 

As I understand the current interface, the user is required to know how ABMC and soft-ABMC
are implemented to be able to configure the system. For example, if the user has a file like:
	# cat /sys/fs/resctrl/mbm_L3_assignments
	mbm_total_bytes:0=e;1=e
	mbm_local_bytes:0=e;1=e
the user must know the underlying implementation to be able to manage monitoring of
events and assignment of counters; otherwise it will be a surprise to lose monitoring
of all events when unassigning one event.

This is why I proposed in [3] that the name of the mode reflects how user can interact
with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
make it clear what the system is capable of wrt counter assignments.

Considering this the interface should be clear:
num_mbm_cntrs: reflects the number of counters in each domain that can be assigned. In
"mbm_cntr_event_assign" this will be the number of counters that can be assigned to 
each event within a monitoring group, in "mbm_cntr_group_assign" this will be the number
of counters that can be assigned to entire monitoring groups impacting all MBM events.

mbm_L3_assignments: manages the counter assignment in each group. When user knows the mode
is "mbm_cntr_event_assign"/"mbm_cntr_group_assign" then it should be clear to user space how the
interface behaves wrt assignment, no surprises of multiple events impacted when
assigning/unassigning single event.

For soft-ABMC I thus find it most intuitive for num_mbm_cntrs to be the exact number
of "active" RMIDs that the system can support *and* changing the name of the modes
to help user interpret num_mbm_cntrs.
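
To illustrate what the mode name buys user space, here is a small sketch. The
two mode strings are the proposed names from this mail and do not exist in the
series; events_affected() is a hypothetical helper.

```python
MBM_EVENTS = ("mbm_total_bytes", "mbm_local_bytes")


def events_affected(mode, event):
    """Events whose state changes when one counter is (un)assigned for 'event'."""
    if event not in MBM_EVENTS:
        raise ValueError(f"unknown event: {event}")
    if mode == "mbm_cntr_event_assign":
        # ABMC: one hardware counter backs exactly one (RMID, event) pair.
        return (event,)
    if mode == "mbm_cntr_group_assign":
        # soft-ABMC: one "active" RMID backs every MBM event of the group.
        return MBM_EVENTS
    raise ValueError(f"unknown mode: {mode}")


print(events_affected("mbm_cntr_event_assign", "mbm_local_bytes"))
print(events_affected("mbm_cntr_group_assign", "mbm_local_bytes"))
```

In either mode one unit of num_mbm_cntrs is consumed per assignment; the mode
tells user space whether that unit covers one event or the whole group.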

> 
> There's also the mongroup-RMID overcommit use case I described
> above[1]. On Intel we can safely assume that there are counters to
> back all RMIDs, so num_mbm_cntrs would be calculated directly from
> num_rmids.

This is about the:
	There's now more interest in Google for allowing explicit control of
	where RMIDs are assigned on Intel platforms. Even though the number of
	RMIDs implemented by hardware tends to be roughly the number of
	containers they want to support, they often still need to create
	containers when all RMIDs have already been allocated, which is not
	currently allowed. Once the container has been created and starts
	running, it's no longer possible to move its threads into a monitoring
	group whenever RMIDs should become available again, so it's important
	for resctrl to maintain an accurate task list for a container even
	when RMIDs are not available.

I see a monitor group as a collection of tasks that need to be monitored together.
The "task list" is the group of tasks that share a monitoring ID that
is required to be a valid ID since when any of the tasks are scheduled that ID is
written to the hardware. I intentionally tried to not use RMID since I believe
this is required for all archs.
I thus do not understand how a task can start running when it does not have
a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
there can never be "unmonitored tasks", no? I think I am missing something here.

> I realized this use case is more difficult to implement on MPAM,
> because a PARTID is effectively a CLOSID+RMID, so deferring assigning
> a unique PARTID to a group also results in it being in a different
> allocation group. It will work if the unmonitored groups could find a
> way to share PARTIDs, but this has consequences on allocation - but
> hopefully no worse than sharing CLOSIDs on x86.
> 
> There's a lot of interest in monitoring ID overcommit in Google, so I
> think it's worth it for me to investigate the additional structural
> changes needed in resctrl (i.e., breaking the FS-level association
> between mongroups and HW monitoring IDs). Such a framework could be a
> better fit for soft-ABMC. For example, if overcommit is allowed, we
> would just report the number of simultaneous RMIDs we were able to
> probe as num_rmids. I would want the same shared assignment scheduler
> to be able to work with RMIDs and counters, though.
> 
> Thanks,
> -Peter
> 
> [1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/

Reinette

[2] https://lore.kernel.org/lkml/b9e48e8f-3035-4a7e-a983-ce829bd9215a@intel.com/
[3] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/


* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-21 14:27               ` Peter Newman
@ 2025-05-21 23:05                 ` Reinette Chatre
  2025-05-22  9:14                   ` Peter Newman
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-21 23:05 UTC (permalink / raw)
  To: Peter Newman
  Cc: Moger, Babu, babu.moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen, james.morse, dave.martin, fenghuay, x86, hpa,
	paulmck, akpm, thuth, rostedt, ardb, gregkh, daniel.sneddon,
	jpoimboe, alexandre.chartre, pawan.kumar.gupta, thomas.lendacky,
	perry.yuan, seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Peter,

On 5/21/25 7:27 AM, Peter Newman wrote:
> On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 5/20/25 4:25 PM, Moger, Babu wrote:

...
>>>
>>> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
>>>
>>> Soft-ABMC:
>>> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
>>>
>>> b. Assignments: No hardware configuration is required. We simply need to ensure that no more than num_mbm_cntrs RMIDs are active at any given time.
>>>
>>> c. Configuration: Controlled via /info/L3_MON/mbm_total_bytes_config and mbm_local_bytes_config.
>>>
>>> d. Events: Only two events can be assigned(local and total).
>>>
>>> ABMC:
>>> a. num_mbm_cntrs: This is defined by the hardware.
>>> b. Assignments: Requires special MSR writes to assign counters.
>>> c. Configuration: Comes from /info/L3_MON/counter_configs/.
>>> d. Events: More than two events can be assigned to a group (currently up to 2).
>>>
>>> Commonalities:
>>> a. Assignments can be either exclusive or shared in both these modes.
>>>
>>> Given these, I believe we can easily accommodate soft-ABMC in this interface.
>>
>> This is not so obvious to me. It looks to me as though the user is forced to interpret
>> the content of resctrl files differently based on soft-ABMC vs ABMC making the interface
>> inconsistent and user thus needing to know details of implementations. This is what the previous
>> discussion I linked to aimed to address. It sounds to me as though you believe that this is no longer
>> an issue. Could you please show examples of what a user can expect from the interfaces and how a user
>> will interact with the interfaces on both a non-ABMC and ABMC system?
> 
> At the interface level, I think mbm_L3_assignments on a non-ABMC
> system would only need to contain a single line:
> 
> 0=s;1=s;...;31=s

It should be obvious to user space how to interpret the fields. When there is
thus a single "mbm_cntr_assign" mode used for ABMC and soft-ABMC a single
line like this would be difficult to parse since that would imply/require
that user space knows whether it is running on ABMC or soft-ABMC system,
which we should avoid.

If there are different modes, for example "mbm_cntr_event_assign" and
"mbm_cntr_group_assign" then this could be used by user space to distinguish
how to interact with mbm_L3_assignments making something like this possible.
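
A rough sketch of how user space could parse a line once the mode is known.
Both mode strings and the event-less (or '*'-prefixed) group line are
proposals from this thread, not an existing format, and parse_line() is a
hypothetical helper.

```python
MBM_EVENTS = ("mbm_total_bytes", "mbm_local_bytes")


def parse_line(mode, line):
    """Return {event: {domain_id: state}} for one mbm_L3_assignments line."""
    line = line.strip()
    if mode == "mbm_cntr_group_assign":
        # Group-level line: an optional prefix such as '*' and no event name,
        # so the per-domain state applies to every MBM event.
        _prefix, sep, rest = line.partition(":")
        domains = rest if sep else line
        events = MBM_EVENTS
    elif mode == "mbm_cntr_event_assign":
        event, sep, domains = line.partition(":")
        if not sep:
            raise ValueError("event name required in event assign mode")
        events = (event,)
    else:
        raise ValueError(f"unknown mode: {mode}")

    states = {}
    for field in domains.split(";"):
        dom, _, state = field.partition("=")
        states[int(dom)] = state
    return {event: dict(states) for event in events}


print(parse_line("mbm_cntr_group_assign", "0=s;1=s"))
print(parse_line("mbm_cntr_event_assign", "mbm_local_bytes:0=e;1=_"))
```

Without the mode, the first line cannot be told apart from a malformed
per-event line.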

> 
> But maybe for consistency we would synthesize a single, unmodifiable
> counter configuration to reflect that allocating an RMID in a domain
> results in assignment to all events and deallocating the RMID
> unassigns all events. We could call it "group" to say it's assigning
> at the group level, or perhaps just '*':
> 
> *:0=s;1=s;...;31=s
> 
> I'm not sure about allowing a '*' on ABMC hardware, because it could
> be interpreted as allocating a lot of counters when a large number of
> event configurations exist.
> 
> *:0=s;1=s;...;31=s
> 

Either could work also. Whether it is "group" or "*" ABMC systems could
respond with "not supported". Will think about this more but would
like to hear your opinion about the flexibility that distinguishing between
a "mbm_cntr_event_assign" and "mbm_cntr_group_assign" mode provides.

Reinette

> -Peter
> 
> 
>>
>> Thank you
>>
>> Reinette
>>
>>>
>>>>>>
>>>>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
>>>>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
>>>>>
>>>>
>>>>
>>>
>>



* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-21 23:03                 ` Reinette Chatre
@ 2025-05-21 23:43                   ` Luck, Tony
  2025-05-22  0:10                     ` Reinette Chatre
  2025-05-22 15:44                   ` Moger, Babu
  1 sibling, 1 reply; 114+ messages in thread
From: Luck, Tony @ 2025-05-21 23:43 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Peter Newman, Moger, Babu, babu.moger, corbet, tglx, mingo, bp,
	dave.hansen, james.morse, dave.martin, fenghuay, x86, hpa,
	paulmck, akpm, thuth, rostedt, ardb, gregkh, daniel.sneddon,
	jpoimboe, alexandre.chartre, pawan.kumar.gupta, thomas.lendacky,
	perry.yuan, seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

On Wed, May 21, 2025 at 04:03:37PM -0700, Reinette Chatre wrote:
> Hi Peter and Babu,
> 
> On 5/21/25 2:18 AM, Peter Newman wrote:
> > Hi Babu/Reinette,
> > 
> > On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >>
> >> Hi Babu,
> >>
> >> On 5/20/25 4:25 PM, Moger, Babu wrote:
> >>> Hi Reinette,
> >>>
> >>> On 5/20/2025 1:23 PM, Reinette Chatre wrote:
> >>>> Hi Babu,
> >>>>
> >>>> On 5/20/25 10:51 AM, Moger, Babu wrote:
> >>>>> Hi Reinette,
> >>>>>
> >>>>> On 5/20/25 11:06, Reinette Chatre wrote:
> >>>>>> Hi Babu,
> >>>>>>
> >>>>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
> >>>>>>> On 5/19/25 10:59, Peter Newman wrote:
> >>>>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
> >>>>>>
> >>>>>> ...
> >>>>>>
> >>>>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> >>>>>>>>> counters available for assignment.
> >>>>>>>>
> >>>>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
> >>>>>>>> represent in a "soft-ABMC" implementation where assignment is
> >>>>>>>> implemented by assigning an RMID, which would result in all events
> >>>>>>>> being assigned at once.
> >>>>>>>>
> >>>>>>>> My main concern is how many "counters" you can assign by assigning
> >>>>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
> >>>>>>>> can be assigned separately from counters which can be assigned.
> >>>>>>>
> >>>>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
> >>>>>>> number of counters available per domain, which is 32.
> >>>>>>>
> >>>>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
> >>>>>>> can be associated with 2 hardware counters. In theory, it's possible to
> >>>>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
> >>>>>>> with up to 32 counters. However, we currently lack the interface to
> >>>>>>> support that level of assignment.
> >>>>>>>
> >>>>>>> For now, the plan is to support basic assignment and expand functionality
> >>>>>>> later once we have the necessary data structure and requirements.
> >>>>>>
> >>>>>> Looks like some requirements did not make it into this implementation.
> >>>>>> Do you recall the discussion that resulted in you writing [2]? Looks like
> >>>>>> there is a question to Peter in there on how to determine how many "counters"
> >>>>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
> >>>>>> information would be available in a future AMD publication.
> >>>>>
> >>>>> We already have a method to determine the number of counters in soft-ABMC
> >>>>> mode, which Peter has addressed [4].
> >>>>>
> >>>>> [4]
> >>>>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
> >>>>>
> >>>>> This appears to be more of a workaround, and I doubt it will be included
> >>>>> in any official AMD documentation. Additionally, the long-term direction
> >>>>> is moving towards ABMC.
> >>>>>
> >>>>> I don’t believe this workaround needs to be part of the current series. It
> >>>>> can be added later when soft-ABMC is implemented.
> >>>>
> >>>> Agreed. What about the plans described in [2]? (Thanks to Peter for
> >>>> catching this!).
> >>>>
> >>>> It is important to keep track of requirements while working on a feature to
> >>>> ensure that the implementation supports the planned use cases. Re-reading that
> >>>> thread it is not clear to me how soft-ABMC's per-group assignment would look.
> >>>> Could you please share how you see it progress from this implementation?
> >>>> This includes the single event vs. multiple event assignment. I would like to
> >>>> highlight that this is not a request for this to be supported in this implementation
> >>>> but there needs to be a plan for how this can be supported on top of interfaces
> >>>> established by this work.
> >>>>
> >>>
> >>> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
> >>>
> >>> Soft-ABMC:
> >>> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
> > 
> > I would call it a hardware-defined limit that can be probed by software.
> > 
> > The main question is whether this file returns the exact number of
> > RMIDs hardware can track or double that number (mbm_total_bytes +
> > mbm_local_bytes) so that the value is always measured in events.
> 
> tl;dr: I continue [3] to find it most intuitive for num_mbm_cntrs to be the exact
> number of "active" RMIDs that the system can support *and* changing the name of
> the modes to help user interpret num_mbm_cntrs: "mbm_cntr_event_assign" for ABMC,
> "mbm_cntr_group_assign" for soft-ABMC.
> 
> details
> -------
> 
> We are now back to the previous discussion about what user can expect from
> the interface. Let me try and re-cap that discussion so that we can all hopefully
> get back on the same page. Please add corrections/updates where needed.
> 
> soft-ABMC
> ---------
>   soft-ABMC manages "active" (term TBD) RMID assignment to monitor groups. When an
>   "active" RMID is assigned to a monitor group then *all* MBM events (not LLC occupancy)
>   in that monitor group are counted. "Active" RMID assignment can be done per domain.
> 
>   Requirement: resctrl should accurately reflect which events are counted. That is,
>   we do not want resctrl to pretend to allow user to assign an "active" RMID to
>   only one event in a monitor group while all events are actually counted.
> 
>   Caveat: To support rapid re-assignment of RMIDs to monitor groups, llc_occupancy
>   event is disabled when soft-ABMC is enabled.
> 
> ABMC
> ----
>   ABMC manages (hardware) counter assignment to (monitor group (RMID), event) pairs.
>   When a hardware counter is assigned to an (RMID, event) pair then only that
>   (RMID, event) pair is counted. Hardware counter assignment can be done per domain.
> 
> 
> shared assignment
> -----------------
> A shared assignment applies to both soft-ABMC and ABMC. A user can designate a
> "counter" (could be hardware counter or "active" RMID) as shared and that means
> the counter within that domain is shared between different monitor groups and actual
> assignment is scheduled by resctrl.  
> 
> 
> user interface
> --------------
> 
> Next, consider the interface while keeping above definitions and requirements in mind.
> 
> This series introduces (using implementation, not cover-letter):
> 
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
> "num_mbm_cntrs":                                                               
> 	The maximum number of monitoring counters (total of available and assigned
> 	counters) in each domain when the system supports mbm_cntr_assign mode. 
> 
> /sys/fs/resctrl/mbm_L3_assignments
> "mbm_L3_assignments":                                                          
> 	This interface file is created when the mbm_cntr_assign mode is supported
> 	and shows the assignment status for each group.              
> 
> Consider "mbm_L3_assignments" first. The interface is documented for ABMC support
> where it is possible to manage individual event assignment within monitor group.
> 
> For ABMC it is possible to assign just one event at a time and doing so consumes
> one counter in that domain:
> 
> a) Starting state on system with 32 counters per domain, two events in default
>    resource group consumes two counters in that domain:
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=30;1=32
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=_
> 
> b) Assign counter to mbm_local_bytes in domain 1:
> # echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=e
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=30;1=31
> 
> The question is how this should look on soft-ABMC system. Let's say hypothetically
> that on a soft-ABMC system it is possible to have 32 "active" RMIDs.
> 
> a) Starting state on system with 32 "active RMIDs" per domain, two events in default
>    resource group consumes one RMID in that domain:
> 
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=_
> 
> What should num_mbm_cntrs display?
> 
> Option A (counters are RMIDs):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=31;1=32
> 
> Option B (pretend RMIDs are events):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=62;1=64
> 
> b) Assign counter to mbm_local_bytes in domain 1:
> # echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
> 
> Note that even though the user requested only mbm_local_bytes to be assigned, it
> actually results in both mbm_total_bytes and mbm_local_bytes being assigned. This
> ensures accurate state representation to user space but it also creates an
> inconsistent user interface between soft-ABMC and ABMC: user space intends
> to use the same interface, but "sometimes" assigning one event results in the assignment
> of one event while "sometimes" it results in the assignment of multiple events.
> 
> wrt "num_mbm_cntrs"
> 
> Option A (counters are RMIDs):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=31;1=31
> 
> Option B (pretend RMIDs are events):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=62;1=62 
> 
> Neither option seems ideal to me since the interface cannot be consistent
> between ABMC and soft-ABMC.
> As I mentioned in [2] it is not possible to hide ABMC and soft-ABMC behind
> the same interface. When user space wants to monitor a particular monitor group
> then it should be clear how that can be accomplished. Not knowing if
> an assignment/unassignment to/from an event would impact one or all events
> and whether it will consume one or multiple counters does not sound like a good
> interface to me. 
> 
> As I understand the current interface, the user is required to know how ABMC and soft-ABMC
> are implemented to be able to configure the system. For example, if the user has a file like:
> 	# cat /sys/fs/resctrl/mbm_L3_assignments
> 	mbm_total_bytes:0=e;1=e
> 	mbm_local_bytes:0=e;1=e
> the user must know the underlying implementation to be able to manage monitoring of
> events and assignment of counters; otherwise it will be a surprise to lose monitoring
> of all events when unassigning one event.
> 
> This is why I proposed in [3] that the name of the mode reflects how user can interact
> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
> make it clear what the system is capable of wrt counter assignments.
> 
> Considering this the interface should be clear:
> num_mbm_cntrs: reflects the number of counters in each domain that can be assigned. In
> "mbm_cntr_event_assign" this will be the number of counters that can be assigned to 
> each event within a monitoring group, in "mbm_cntr_group_assign" this will be the number
> of counters that can be assigned to entire monitoring groups impacting all MBM events.
> 
> mbm_L3_assignments: manages the counter assignment in each group. When user knows the mode
> is "mbm_cntr_event_assign"/"mbm_cntr_group_assign" then it should be clear to user space how the
> interface behaves wrt assignment, no surprises of multiple events impacted when
> assigning/unassigning single event.
> 
> For soft-ABMC I thus find it most intuitive for num_mbm_cntrs to be the exact number
> of "active" RMIDs that the system can support *and* changing the name of the modes
> to help user interpret num_mbm_cntrs.
> 
> > 
> > There's also the mongroup-RMID overcommit use case I described
> > above[1]. On Intel we can safely assume that there are counters to
> > back all RMIDs, so num_mbm_cntrs would be calculated directly from
> > num_rmids.
> 
> This is about the:
> 	There's now more interest in Google for allowing explicit control of
> 	where RMIDs are assigned on Intel platforms. Even though the number of
> 	RMIDs implemented by hardware tends to be roughly the number of
> 	containers they want to support, they often still need to create
> 	containers when all RMIDs have already been allocated, which is not
> 	currently allowed. Once the container has been created and starts
> 	running, it's no longer possible to move its threads into a monitoring
> 	group whenever RMIDs should become available again, so it's important
> 	for resctrl to maintain an accurate task list for a container even
> 	when RMIDs are not available.
> 
> I see a monitor group as a collection of tasks that need to be monitored together.
> The "task list" is the group of tasks that share a monitoring ID that
> is required to be a valid ID since when any of the tasks are scheduled that ID is
> written to the hardware. I intentionally tried to not use RMID since I believe
> this is required for all archs.
> I thus do not understand how a task can start running when it does not have
> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
> there can never be "unmonitored tasks", no? I think I am missing something here.

In the AMD/RMID implementation this might be achieved with something
extra in the task structure to denote whether a task is in a monitored
group or not. E.g. we add "task->rmid_valid" as well as "task->rmid".
Tasks in an unmonitored group retain their "task->rmid" (that's what
identifies them as a member of a group) but have task->rmid_valid set
to false.  Context switch code would be updated to load "0" into the
IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
would still be monitored, but activity would be bundled with all
tasks in the default resctrl group.
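
As a toy model of that selection (plain Python rather than kernel code, just
to show the logic): "rmid_valid" is the hypothetical field suggested above,
and pqr_assoc_rmid() is a made-up name for the value the context switch code
would load.

```python
from dataclasses import dataclass

DEFAULT_RMID = 0  # RMID of the default resctrl group


@dataclass
class Task:
    rmid: int         # identifies the group the task is a member of
    rmid_valid: bool  # False while the group has no RMID backing it


def pqr_assoc_rmid(task):
    """RMID that would be loaded into IA32_PQR_ASSOC.RMID for this task."""
    return task.rmid if task.rmid_valid else DEFAULT_RMID


monitored = Task(rmid=5, rmid_valid=True)
unmonitored = Task(rmid=7, rmid_valid=False)
print(pqr_assoc_rmid(monitored), pqr_assoc_rmid(unmonitored))  # prints 5 0
```

The task keeps its own rmid value throughout, so group membership survives
while the group is unmonitored.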

Presumably something analogous could be done for ARM/MPAM.

> > I realized this use case is more difficult to implement on MPAM,
> > because a PARTID is effectively a CLOSID+RMID, so deferring assigning
> > a unique PARTID to a group also results in it being in a different
> > allocation group. It will work if the unmonitored groups could find a
> > way to share PARTIDs, but this has consequences on allocation - but
> > hopefully no worse than sharing CLOSIDs on x86.
> > 
> > There's a lot of interest in monitoring ID overcommit in Google, so I
> > think it's worth it for me to investigate the additional structural
> > changes needed in resctrl (i.e., breaking the FS-level association
> > between mongroups and HW monitoring IDs). Such a framework could be a
> > better fit for soft-ABMC. For example, if overcommit is allowed, we
> > would just report the number of simultaneous RMIDs we were able to
> > probe as num_rmids. I would want the same shared assignment scheduler
> > to be able to work with RMIDs and counters, though.
> > 
> > Thanks,
> > -Peter
> > 
> > [1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/
> 
> Reinette
> 
> [2] https://lore.kernel.org/lkml/b9e48e8f-3035-4a7e-a983-ce829bd9215a@intel.com/
> [3] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/

-Tony


* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-21 23:43                   ` Luck, Tony
@ 2025-05-22  0:10                     ` Reinette Chatre
  2025-05-22  0:21                       ` Luck, Tony
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22  0:10 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Peter Newman, Moger, Babu, babu.moger, corbet, tglx, mingo, bp,
	dave.hansen, james.morse, dave.martin, fenghuay, x86, hpa,
	paulmck, akpm, thuth, rostedt, ardb, gregkh, daniel.sneddon,
	jpoimboe, alexandre.chartre, pawan.kumar.gupta, thomas.lendacky,
	perry.yuan, seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Tony,

On 5/21/25 4:43 PM, Luck, Tony wrote:
> On Wed, May 21, 2025 at 04:03:37PM -0700, Reinette Chatre wrote:
>> Hi Peter and Babu,
>>
>> On 5/21/25 2:18 AM, Peter Newman wrote:

..

>>> There's also the mongroup-RMID overcommit use case I described
>>> above[1]. On Intel we can safely assume that there are counters to
>>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
>>> num_rmids.
>>
>> This is about the:
>> 	There's now more interest in Google for allowing explicit control of
>> 	where RMIDs are assigned on Intel platforms. Even though the number of
>> 	RMIDs implemented by hardware tends to be roughly the number of
>> 	containers they want to support, they often still need to create
>> 	containers when all RMIDs have already been allocated, which is not
>> 	currently allowed. Once the container has been created and starts
>> 	running, it's no longer possible to move its threads into a monitoring
>> 	group whenever RMIDs should become available again, so it's important
>> 	for resctrl to maintain an accurate task list for a container even
>> 	when RMIDs are not available.
>>
>> I see a monitor group as a collection of tasks that need to be monitored together.
>> The "task list" is the group of tasks that share a monitoring ID that
>> is required to be a valid ID since when any of the tasks are scheduled that ID is
>> written to the hardware. I intentionally tried to not use RMID since I believe
>> this is required for all archs.
>> I thus do not understand how a task can start running when it does not have
>> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
>> there can never be "unmonitored tasks", no? I think I am missing something here.
> 
> In the AMD/RMID implementation this might be achieved with something
> extra in the task structure to denote whether a task is in a monitored
> group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
> Tasks in an unmonitored group retain their "task->rmid" (that's what
> identifies them as a member of a group) but have task->rmid_valid set
> to false.  Context switch code would be updated to load "0" into the
> IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
> would still be monitored, but activity would be bundled with all
> tasks in the default resctrl group.
> 
> Presumably something analogous could be done for ARM/MPAM.
> 

I do not interpret this as an unmonitored task but instead as a task that
belongs to the default resource group. Specifically, any data accumulated by
such a task is attributed to the default resource group. Having tasks
in a separate group but their monitoring data accumulating in/contributed to
the default resource group (that has its own set of tasks) sounds wrong to me.
Such an implementation makes any monitoring data of the default resource group
invalid, and by extension makes it impossible to use the default resource group to manage
an allocation for a group of monitor groups if user space needs insight
into monitoring data across all these monitor groups. User space will need to
interact with resctrl differently and individually query monitor groups instead
of querying the CTRL_MON group once.
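
A small numeric illustration of the concern, using made-up byte counts and
group names. It assumes the scheme quoted above where tasks without a valid
RMID run with RMID 0; read_counters() is a hypothetical helper.

```python
from collections import Counter

DEFAULT_RMID = 0


def read_counters(traffic):
    """Sum bytes per RMID that was loaded while the traffic was generated.

    traffic: list of (group_name, loaded_rmid, nbytes) tuples.
    """
    per_rmid = Counter()
    for _group, loaded_rmid, nbytes in traffic:
        per_rmid[loaded_rmid] += nbytes
    return per_rmid


traffic = [
    ("default", DEFAULT_RMID, 100),  # tasks that really are in the default group
    ("mon_a", 5, 70),                # monitor group with a valid RMID
    ("mon_b", DEFAULT_RMID, 40),     # group without a valid RMID, runs with RMID 0
]
counters = read_counters(traffic)
print(counters[DEFAULT_RMID])  # prints 140
```

The 140 read for the default group cannot be split back into the 100 that
belongs to it and the 40 that belongs to mon_b.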

Reinette



* RE: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-22  0:10                     ` Reinette Chatre
@ 2025-05-22  0:21                       ` Luck, Tony
  2025-05-22  8:47                         ` Peter Newman
  0 siblings, 1 reply; 114+ messages in thread
From: Luck, Tony @ 2025-05-22  0:21 UTC (permalink / raw)
  To: Chatre, Reinette
  Cc: Peter Newman, Moger, Babu, babu.moger@amd.com, corbet@lwn.net,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, james.morse@arm.com,
	dave.martin@arm.com, fenghuay@nvidia.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, akpm@linux-foundation.org,
	thuth@redhat.com, rostedt@goodmis.org, ardb@kernel.org,
	gregkh@linuxfoundation.org, daniel.sneddon@linux.intel.com,
	jpoimboe@kernel.org, alexandre.chartre@oracle.com,
	pawan.kumar.gupta@linux.intel.com, thomas.lendacky@amd.com,
	perry.yuan@amd.com, seanjc@google.com, Huang, Kai, Li, Xiaoyao,
	kan.liang@linux.intel.com, Li, Xin3, ebiggers@google.com,
	xin@zytor.com, Mehta, Sohil, andrew.cooper3@citrix.com,
	mario.limonciello@amd.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
	Eranian, Stephane, Xiaojian.Du@amd.com, gautham.shenoy@amd.com

> >>> There's also the mongroup-RMID overcommit use case I described
> >>> above[1]. On Intel we can safely assume that there are counters to
> >>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
> >>> num_rmids.
> >>
> >> This is about the:
> >>    There's now more interest in Google for allowing explicit control of
> >>    where RMIDs are assigned on Intel platforms. Even though the number of
> >>    RMIDs implemented by hardware tends to be roughly the number of
> >>    containers they want to support, they often still need to create
> >>    containers when all RMIDs have already been allocated, which is not
> >>    currently allowed. Once the container has been created and starts
> >>    running, it's no longer possible to move its threads into a monitoring
> >>    group whenever RMIDs should become available again, so it's important
> >>    for resctrl to maintain an accurate task list for a container even
> >>    when RMIDs are not available.
> >>
> >> I see a monitor group as a collection of tasks that need to be monitored together.
> >> The "task list" is the group of tasks that share a monitoring ID that
> >> is required to be a valid ID since when any of the tasks are scheduled that ID is
> >> written to the hardware. I intentionally tried to not use RMID since I believe
> >> this is required for all archs.
> >> I thus do not understand how a task can start running when it does not have
> >> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
> >> there can never be "unmonitored tasks", no? I think I am missing something here.
> >
> > In the AMD/RMID implementation this might be achieved with something
> > extra in the task structure to denote whether a task is in a monitored
> > group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
> > Tasks in an unmonitored group retain their "task->rmid" (that's what
> > identifies them as a member of a group) but have task->rmid_valid set
> > to false.  Context switch code would be updated to load "0" into the
> > IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
> > would still be monitored, but activity would be bundled with all
> > tasks in the default resctrl group.
> >
> > Presumably something analogous could be done for ARM/MPAM.
> >
>
> I do not interpret this as an unmonitored task but instead a task that
> belongs to the default resource group. Specifically, any data accumulated by
> such a task is attributed to the default resource group. Having tasks
> in a separate group but their monitoring data accumulating in/contributed to
> the default resource group (that has its own set of tasks) sounds wrong to me.
> Such an implementation makes any monitoring data of default resource group
> invalid, and by extension impossible to use default resource group to manage
> an allocation for a group of monitor groups if user space needs insight
> in monitoring data across all these monitor groups. User space will need to
> interact with resctrl differently and individually query monitor groups instead
> of CTRL_MON group once.

Maybe assign one of the limited supply of RMIDs for these "unmonitored"
tasks. Populate a resctrl group named "unmonitored" that lists all the
unmonitored tasks in a (read-only) "tasks" file. And supply all the counts
for these tasks in a normal-looking "mon_data" directory.

-Tony

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-22  0:21                       ` Luck, Tony
@ 2025-05-22  8:47                         ` Peter Newman
  2025-05-22 16:32                           ` Reinette Chatre
  2025-05-22 17:21                           ` Luck, Tony
  0 siblings, 2 replies; 114+ messages in thread
From: Peter Newman @ 2025-05-22  8:47 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Chatre, Reinette, Moger, Babu, babu.moger@amd.com, corbet@lwn.net,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, james.morse@arm.com,
	dave.martin@arm.com, fenghuay@nvidia.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, akpm@linux-foundation.org,
	thuth@redhat.com, rostedt@goodmis.org, ardb@kernel.org,
	gregkh@linuxfoundation.org, daniel.sneddon@linux.intel.com,
	jpoimboe@kernel.org, alexandre.chartre@oracle.com,
	pawan.kumar.gupta@linux.intel.com, thomas.lendacky@amd.com,
	perry.yuan@amd.com, seanjc@google.com, Huang, Kai, Li, Xiaoyao,
	kan.liang@linux.intel.com, Li, Xin3, ebiggers@google.com,
	xin@zytor.com, Mehta, Sohil, andrew.cooper3@citrix.com,
	mario.limonciello@amd.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
	Eranian, Stephane, Xiaojian.Du@amd.com, gautham.shenoy@amd.com

Hi Tony, Reinette,

On Thu, May 22, 2025 at 2:21 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> > >>> There's also the mongroup-RMID overcommit use case I described
> > >>> above[1]. On Intel we can safely assume that there are counters to
> > >>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
> > >>> num_rmids.
> > >>
> > >> This is about the:
> > >>    There's now more interest in Google for allowing explicit control of
> > >>    where RMIDs are assigned on Intel platforms. Even though the number of
> > >>    RMIDs implemented by hardware tends to be roughly the number of
> > >>    containers they want to support, they often still need to create
> > >>    containers when all RMIDs have already been allocated, which is not
> > >>    currently allowed. Once the container has been created and starts
> > >>    running, it's no longer possible to move its threads into a monitoring
> > >>    group whenever RMIDs should become available again, so it's important
> > >>    for resctrl to maintain an accurate task list for a container even
> > >>    when RMIDs are not available.
> > >>
> > >> I see a monitor group as a collection of tasks that need to be monitored together.
> > >> The "task list" is the group of tasks that share a monitoring ID that
> > >> is required to be a valid ID since when any of the tasks are scheduled that ID is
> > >> written to the hardware. I intentionally tried to not use RMID since I believe
> > >> this is required for all archs.
> > >> I thus do not understand how a task can start running when it does not have
> > >> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
> > >> there can never be "unmonitored tasks", no? I think I am missing something here.

You are correct. I did forget to mention something...

> > >
> > > In the AMD/RMID implementation this might be achieved with something
> > > extra in the task structure to denote whether a task is in a monitored
> > > group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
> > > Tasks in an unmonitored group retain their "task->rmid" (that's what
> > > identifies them as a member of a group) but have task->rmid_valid set
> > > to false.  Context switch code would be updated to load "0" into the
> > > IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
> > > would still be monitored, but activity would be bundled with all
> > > tasks in the default resctrl group.
> > >
> > > Presumably something analogous could be done for ARM/MPAM.
> > >
> >
> > I do not interpret this as an unmonitored task but instead a task that
> > belongs to the default resource group. Specifically, any data accumulated by
> > such a task is attributed to the default resource group. Having tasks
> > in a separate group but their monitoring data accumulating in/contributed to
> > the default resource group (that has its own set of tasks) sounds wrong to me.
> > Such an implementation makes any monitoring data of default resource group
> > invalid, and by extension impossible to use default resource group to manage
> > an allocation for a group of monitor groups if user space needs insight
> > in monitoring data across all these monitor groups. User space will need to
> > interact with resctrl differently and individually query monitor groups instead
> > of CTRL_MON group once.
>
> Maybe assign one of the limited supply of RMIDs for these "unmonitored"
> tasks. Populate a resctrl group named "unmonitored" that lists all the
> unmonitored tasks in a (read-only) "tasks" file. And supply all the counts
> for these tasks in normal looking "mon_data" directory.

I needed to switch to an rdtgroup struct pointer rather than hardware
IDs in the task structure to indicate group membership[1], otherwise
it's not possible to determine which tasks are in a group when it
doesn't have a unique HW ID value.

Also this is required for shared assignment so that changing a group's
IDs in a domain only requires updating running tasks rather than
needing to search the entire task list, which would lead to the same
problem we encountered in mongroup rename[2].
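
To make the group-pointer point concrete, here is a minimal user-space model
in Python (not kernel code). The names Group, Task, effective_rmid and
tasks_in are hypothetical, and the fallback to RMID 0 for a group without a
hardware ID follows Tony's suggestion quoted above; it is only one possible
policy, not something this series implements.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_RMID = 0  # assumption: RMID used by the default resctrl group


@dataclass(eq=False)
class Group:
    name: str
    rmid: Optional[int] = None  # None: no hardware monitoring ID assigned


@dataclass(eq=False)
class Task:
    pid: int
    group: Group  # membership is the reference itself, not a hardware ID


def effective_rmid(task: Task) -> int:
    """ID that a context switch would load for this task in this model."""
    rmid = task.group.rmid
    return DEFAULT_RMID if rmid is None else rmid


def tasks_in(group: Group, tasks: List[Task]) -> List[int]:
    """Membership can be answered even if the group has no unique ID."""
    return [t.pid for t in tasks if t.group is group]


a = Group("job_a", rmid=5)
b = Group("job_b")  # created while no RMID was free
tasks = [Task(100, a), Task(200, b), Task(201, b)]

print(tasks_in(b, tasks))        # [200, 201]
print(effective_rmid(tasks[1]))  # 0
b.rmid = 7                       # a later assignment touches only the group
print(effective_rmid(tasks[1]))  # 7
```

In this model, assigning or revoking an ID changes one field on the group,
and every member task picks it up the next time its ID is resolved.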

-Peter

[1] https://lore.kernel.org/lkml/20240325172707.73966-5-peternewman@google.com/
[2] https://lore.kernel.org/lkml/CALPaoCh0SbG1+VbbgcxjubE7Cc2Pb6QqhG3NH6X=WwsNfqNjtA@mail.gmail.com/

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-21 23:05                 ` Reinette Chatre
@ 2025-05-22  9:14                   ` Peter Newman
  2025-05-22 16:33                     ` Reinette Chatre
  0 siblings, 1 reply; 114+ messages in thread
From: Peter Newman @ 2025-05-22  9:14 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Moger, Babu, babu.moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen, james.morse, dave.martin, fenghuay, x86, hpa,
	paulmck, akpm, thuth, rostedt, ardb, gregkh, daniel.sneddon,
	jpoimboe, alexandre.chartre, pawan.kumar.gupta, thomas.lendacky,
	perry.yuan, seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On Thu, May 22, 2025 at 1:05 AM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter,
>
> On 5/21/25 7:27 AM, Peter Newman wrote:
> > On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >> On 5/20/25 4:25 PM, Moger, Babu wrote:
>
> ...
> >>>
> >>> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
> >>>
> >>> Soft-ABMC:
> >>> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
> >>>
> >>> b. Assignments: No hardware configuration is required. We simply need to ensure that no more than num_mbm_cntrs RMIDs are active at any given time.
> >>>
> >>> c. Configuration: Controlled via /info/L3_MON/mbm_total_bytes_config and mbm_local_bytes_config.
> >>>
> >>> d. Events: Only two events can be assigned(local and total).
> >>>
> >>> ABMC:
> >>> a. num_mbm_cntrs: This is defined by the hardware.
> >>> b. Assignments: Requires special MSR writes to assign counters.
> >>> c. Configuration: Comes from /info/L3_MON/counter_configs/.
> >>> d. Events: More than two events can be assigned to a group (currently up to 2).
> >>>
> >>> Commonalities:
> >>> a. Assignments can be either exclusive or shared in both these modes.
> >>>
> >>> Given these, I believe we can easily accommodate soft-ABMC in this interface.
> >>
> >> This is not so obvious to me. It looks to me as though the user is forced to interpret
> >> the content of resctrl files differently based on soft-ABMC vs ABMC making the interface
> >> inconsistent and user thus needing to know details of implementations. This is what the previous
> >> discussion I linked to aimed to address. It sounds to me as though you believe that this is no longer
> >> an issue. Could you please show examples of what a user can expect from the interfaces and how a user
> >> will interact with the interfaces on both a non-ABMC and ABMC system?
> >
> > At the interface level, I think mbm_L3_assignments on a non-ABMC
> > system would only need to contain a single line:
> >
> > 0=s;1=s;...;31=s
>
> It should be obvious to user space how to interpret the fields. When there is
> thus a single "mbm_cntr_assign" mode used for ABMC and soft-ABMC a single
> line like this would be difficult to parse since that would imply/require
> that user space knows whether it is running on ABMC or soft-ABMC system,
> which we should avoid.
>
> If there are different modes, for example "mbm_cntr_event_assign" and
> "mbm_cntr_group_assign" then this could be used by user space to distinguish
> how to interact with mbm_L3_assignments making something like this possible.

I meant to say I was proposing the format of this file when in the
group assignment mode. I didn't mean to imply that a separate mode
wasn't needed.
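
As a rough illustration of that format, here is a small Python sketch of how
user space might parse either layout of mbm_L3_assignments. The per-event
lines follow the examples elsewhere in this thread; the "group:" line is only
the format being proposed here and is not final. The function name and the
returned dict layout are hypothetical, and the state characters are kept as
opaque strings.

```python
from typing import Dict


def parse_assignments(text: str) -> Dict[str, Dict[int, str]]:
    """Map each line label (event name or "group") to {domain: state}."""
    result: Dict[str, Dict[int, str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        label, _, domains = line.partition(":")
        states: Dict[int, str] = {}
        for item in domains.split(";"):
            dom, _, state = item.partition("=")
            states[int(dom)] = state
        result[label] = states
    return result


# Per-event layout, as in the ABMC examples in this thread.
event_mode = "mbm_total_bytes:0=e;1=_\nmbm_local_bytes:0=e;1=e\n"
# Proposed single-line layout for group assignment (assumption).
group_mode = "group:0=s;1=s\n"

print(parse_assignments(event_mode))
print(parse_assignments(group_mode))
```

A caller would still need to read the assignment mode first to know which of
the two layouts to expect.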

>
> >
> > But maybe for consistency we would synthesize a single, unmodifiable
> > counter configuration to reflect that allocating an RMID in a domain
> > results in assignment to all events and deallocating the RMID
> > unassigns all events. We could call it "group" to say it's assigning
> > at the group level, or perhaps just '*':
> >
> > *:0=s;1=s;...;31=s
> >
> > I'm not sure about allowing a '*' on ABMC hardware, because it could
> > be interpreted as allocating a lot of counters when a large number of
> > event configurations exist.
> >
> > *:0=s;1=s;...;31=s
> >
>
> Either could work also. Whether it is "group" or "*" ABMC systems could
> respond with "not supported". Will think about this more but would
> like to hear your opinion about the flexibility that distinguishing between
> a "mbm_cntr_event_assign" and "mbm_cntr_group_assign" mode provides.

I agree it's clearer when they are separate modes. Between "*" and
"group", I prefer "group" because it seems the least ambiguous.

I just want to make sure we'd never want both modes at the same time,
such as an implementation with both a small number of monitoring IDs
and a small number of MBM counters. I support one MPAM implementation
that has a small number of PARTIDs and only one MBWU counter per
domain. Fingers crossed that the number of PARTIDs it supports isn't
small compared to the number of jobs we would run on it. Otherwise
maybe it will work out to just pick the more limited of the two
(monitor IDs or counters) and make allocation of one drive the other.

(In case you read this before my earlier reply[1], see the note about
rdtgroup pointers in the task_struct, as this is a prerequisite for
overcommitting HW monitor IDs.)

Thanks,
-Peter

[1] https://lore.kernel.org/lkml/CALPaoCjh_NXQLtNBqei=7a6Jsr17fEnPO+kqMaNq4xNu2UPDJA@mail.gmail.com/

>
> Reinette
>
> > -Peter
> >
> >
> >>
> >> Thank you
> >>
> >> Reinette
> >>
> >>>
> >>>>>>
> >>>>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
> >>>>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
> >>>>>
> >>>>
> >>>>
> >>>
> >>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-21 23:03                 ` Reinette Chatre
  2025-05-21 23:43                   ` Luck, Tony
@ 2025-05-22 15:44                   ` Moger, Babu
  2025-05-22 16:33                     ` Reinette Chatre
  1 sibling, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-05-22 15:44 UTC (permalink / raw)
  To: Reinette Chatre, Peter Newman
  Cc: Moger, Babu, corbet, tony.luck, tglx, mingo, bp, dave.hansen,
	james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, maciej.wieczor-retman, eranian, Xiaojian.Du,
	gautham.shenoy

Hi Reinette,

On 5/21/25 18:03, Reinette Chatre wrote:
> Hi Peter and Babu,
> 
> On 5/21/25 2:18 AM, Peter Newman wrote:
>> Hi Babu/Reinette,
>>
>> On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
>> <reinette.chatre@intel.com> wrote:
>>>
>>> Hi Babu,
>>>
>>> On 5/20/25 4:25 PM, Moger, Babu wrote:
>>>> Hi Reinette,
>>>>
>>>> On 5/20/2025 1:23 PM, Reinette Chatre wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On 5/20/25 10:51 AM, Moger, Babu wrote:
>>>>>> Hi Reinette,
>>>>>>
>>>>>> On 5/20/25 11:06, Reinette Chatre wrote:
>>>>>>> Hi Babu,
>>>>>>>
>>>>>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
>>>>>>>> On 5/19/25 10:59, Peter Newman wrote:
>>>>>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>>>>>>>> counters available for assignment.
>>>>>>>>>
>>>>>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>>>>>>>> represent in a "soft-ABMC" implementation where assignment is
>>>>>>>>> implemented by assigning an RMID, which would result in all events
>>>>>>>>> being assigned at once.
>>>>>>>>>
>>>>>>>>> My main concern is how many "counters" you can assign by assigning
>>>>>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>>>>>>>> can be assigned separately from counters which can be assigned.
>>>>>>>>
>>>>>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>>>>>>>> number of counters available per domain, which is 32.
>>>>>>>>
>>>>>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
>>>>>>>> can be associated with 2 hardware counters. In theory, it's possible to
>>>>>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>>>>>>>> with up to 32 counters. However, we currently lack the interface to
>>>>>>>> support that level of assignment.
>>>>>>>>
>>>>>>>> For now, the plan is to support basic assignment and expand functionality
>>>>>>>> later once we have the necessary data structure and requirements.
>>>>>>>
>>>>>>> Looks like some requirements did not make it into this implementation.
>>>>>>> Do you recall the discussion that resulted in you writing [2]? Looks like
>>>>>>> there is a question to Peter in there on how to determine how many "counters"
>>>>>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
>>>>>>> information would be available in a future AMD publication.
>>>>>>
>>>>>> We already have a method to determine the number of counters in soft-ABMC
>>>>>> mode, which Peter has addressed [4].
>>>>>>
>>>>>> [4]
>>>>>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
>>>>>>
>>>>>> This appears to be more of a workaround, and I doubt it will be included
>>>>>> in any official AMD documentation. Additionally, the long-term direction
>>>>>> is moving towards ABMC.
>>>>>>
>>>>>> I don’t believe this workaround needs to be part of the current series. It
>>>>>> can be added later when soft-ABMC is implemented.
>>>>>
>>>>> Agreed. What about the plans described in [2]? (Thanks to Peter for
>>>>> catching this!).
>>>>>
>>>>> It is important to keep track of requirements while working on a feature to
>>>>> ensure that the implementation supports the planned use cases. Re-reading that
>>>>> thread it is not clear to me how soft-ABMC's per-group assignment would look.
>>>>> Could you please share how you see it progress from this implementation?
>>>>> This includes the single event vs. multiple event assignment. I would like to
>>>>> highlight that this is not a request for this to be supported in this implementation
>>>>> but there needs to be a plan for how this can be supported on top of interfaces
>>>>> established by this work.
>>>>>
>>>>
>>>> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
>>>>
>>>> Soft-ABMC:
>>>> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
>>
>> I would call it a hardware-defined limit that can be probed by software.
>>
>> The main question is whether this file returns the exact number of
>> RMIDs hardware can track or double that number (mbm_total_bytes +
>> mbm_local_bytes) so that the value is always measured in events.
> 
> tl;dr: I continue [3] to find it most intuitive for num_mbm_cntrs to be the exact
> number of "active" RMIDs that the system can support *and* changing the name of
> the modes to help user interpret num_mbm_cntrs: "mbm_cntr_event_assign" for ABMC,
> "mbm_cntr_group_assign" for soft-ABMC.
> 
> details
> -------
> 
> We are now back to the previous discussion about what user can expect from
> the interface. Let me try and re-cap that discussion so that we can all hopefully
> get back on the same page. Please add corrections/updates where needed.
> 
> soft-ABMC
> ---------
>   soft-ABMC manages "active" (term TBD) RMID assignment to monitor groups. When an
>   "active" RMID is assigned to a monitor group then *all* MBM events (not LLC occupancy)
>   in that monitor group are counted. "Active" RMID assignment can be done per domain.
> 
>   Requirement: resctrl should accurately reflect which events are counted. That is,
>   we do not want resctrl to pretend to allow user to assign an "active" RMID to
>   only one event in a monitor group while all events are actually counted.
> 
>   Caveat: To support rapid re-assignment of RMIDs to monitor groups, llc_occupancy
>   event is disabled when soft-ABMC is enabled.
> 
> ABMC
> ----
>   ABMC manages (hardware) counter assignment to monitor group (RMID), event pairs.
>   When a hardware counter is assigned to an RMID, event pair then only that
>   RMID, event is counted. Hardware counter assignment can be done per domain.
> 
> 
> shared assignment
> -----------------
> A shared assignment applies to both soft-ABMC and ABMC. A user can designate a
> "counter" (could be hardware counter or "active" RMID) as shared and that means
> the counter within that domain is shared between different monitor groups and actual
> assignment is scheduled by resctrl.  

Good summary. Thanks.

> 
> 
> user interface
> --------------
> 
> Next, consider the interface while keeping above definitions and requirements in mind.
> 
> This series introduces (using implementation, not cover-letter):
> 
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
> "num_mbm_cntrs":                                                               
> 	The maximum number of monitoring counters (total of available and assigned
> 	counters) in each domain when the system supports mbm_cntr_assign mode. 
> 
> /sys/fs/resctrl/mbm_L3_assignments
> "mbm_L3_assignments":                                                          
> 	This interface file is created when the mbm_cntr_assign mode is supported
> 	and shows the assignment status for each group.              
> 
> Consider "mbm_L3_assignments" first. The interface is documented for ABMC support
> where it is possible to manage individual event assignment within monitor group.
> 
> For ABMC it is possible to assign just one event at a time and doing so consumes
> one counter in that domain:
> 
> a) Starting state on system with 32 counters per domain, two events in default
>    resource group consumes two counters in that domain:
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=30;1=32
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=_
> 
> b) Assign counter to mbm_local_bytes in domain 1:
> # echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=e
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=30;1=31
> 
> The question is how this should look on soft-ABMC system. Let's say hypothetically
> that on a soft-ABMC system it is possible to have 32 "active" RMIDs.
> 
> a) Starting state on system with 32 "active RMIDs" per domain, two events in default
>    resource group consumes one RMID in that domain:
> 
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=_
> 
> What should num_mbm_cntrs display?
> 
> Option A (counters are RMIDs):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=31;1=32
> 
> Option B (pretend RMIDs are events):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=62;1=64
> 
> b) Assign counter to mbm_local_bytes in domain 1:
> # echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
> 
> Note that even though user requested only mbm_local_bytes to be assigned, it
> actually results in both mbm_total_bytes and mbm_local_bytes to be assigned. This
> ensures accurate state representation to user space but this also creates an
> inconsistent user interface between soft-ABMC and ABMC since user space intends
> to use the same interface but "sometimes" assigning one event results in assign
> of one event while "sometimes" it results in assign of multiple events.
> 
> wrt "num_mbm_cntrs"
> 
> Option A (counters are RMIDs):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=31;1=31
> 
> Option B (pretend RMIDs are events):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=62;1=62 
> 
> Neither option seems ideal to me since the interface cannot be consistent
> between ABMC and soft-ABMC.
> As I mentioned in [2] it is not possible to hide ABMC and soft-ABMC behind
> the same interface. When user space wants to monitor a particular monitor group
> then it should be clear how that can be accomplished. Not knowing if
> an assignment/unassignment to/from an event would impact one or all events
> and whether it will consume one or multiple counters does not sound like a good
> interface to me. 
> 
> As I understand current interface, user is required to know how ABMC and soft-ABMC
> is implemented to be able to configure the system. For example, if user has file like:
> 	# cat /sys/fs/resctrl/mbm_L3_assignments
> 	mbm_total_bytes:0=e;1=e
> 	mbm_local_bytes:0=e;1=e
> user must know underlying implementation to be able to manage monitoring of
> events and assigning counters otherwise it will be a surprise to lose monitoring
> of all events when unassigning one event.
> 
> This is why I proposed in [3] that the name of the mode reflects how user can interact
> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
> make it clear what the system is capable of wrt counter assignments.

Yes, that makes sense. Perhaps we can also simplify it further:

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_evt_assign] <- for ABMC
 mbm_cntr_grp_assign  <- for soft-ABMC
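
A minimal sketch, in Python, of how a tool could pick the active mode out of
such a file, assuming the bracketed entry marks the mode currently in effect
as in the listing above. The mode names are the ones suggested here and may
change; parse_modes is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def parse_modes(text: str) -> Tuple[Optional[str], List[str]]:
    """Return (active_mode, all_modes); the active one is in [brackets]."""
    active = None
    modes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]  # ignore any trailing annotation on the line
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        modes.append(name)
    return active, modes


sample = "[mbm_cntr_evt_assign]\n mbm_cntr_grp_assign\n"
print(parse_modes(sample))
```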

> 
> Considering this the interface should be clear:
> num_mbm_cntrs: reflects the number of counters in each domain that can be assigned. In
> "mbm_cntr_event_assign" this will be the number of counters that can be assigned to 
> each event within a monitoring group, in "mbm_cntr_group_assign" this will be the number
> of counters that can be assigned to entire monitoring groups impacting all MBM events.
> 
> mbm_L3_assignments: manages the counter assignment in each group. When user knows the mode
> is "mbm_cntr_event_assign"/"mbm_cntr_group_assign" then it should be clear to user space how the
> interface behaves wrt assignment, no surprises of multiple events impacted when
> assigning/unassigning single event.
> 
> For soft-ABMC I thus find it most intuitive for num_mbm_cntrs to be the exact number
> of "active" RMIDs that the system can support *and* changing the name of the modes
> to help user interpret num_mbm_cntrs.

Sure. Option A fits well here, then.

 Option A (counters are RMIDs):
 # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
 0=31;1=31
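
For reference, a short Python sketch of the arithmetic behind the two options,
using the hypothetical figure of 32 "active" RMIDs per domain from the example
above. parse_counts and free_counters are illustrative helpers, not part of
any resctrl tooling.

```python
from typing import Dict


def parse_counts(text: str) -> Dict[int, int]:
    """Parse a per-domain list such as "0=31;1=31" into {domain: count}."""
    counts = {}
    for item in text.strip().split(";"):
        dom, _, value = item.partition("=")
        counts[int(dom)] = int(value)
    return counts


def free_counters(active_rmids: int, groups_assigned: int,
                  events_per_group: int = 1) -> int:
    """Free "counters" in one domain.

    events_per_group=1 models Option A (a counter is an RMID);
    events_per_group=2 models Option B (total and local counted separately).
    """
    return (active_rmids - groups_assigned) * events_per_group


print(parse_counts("0=31;1=31"))  # {0: 31, 1: 31}
print(free_counters(32, 1))       # 31 (Option A)
print(free_counters(32, 1, 2))    # 62 (Option B)
```

With Option A the reported value drops by one per assigned group, which
matches what a group-level assignment actually consumes.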

> 
>>
>> There's also the mongroup-RMID overcommit use case I described
>> above[1]. On Intel we can safely assume that there are counters to
>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
>> num_rmids.
> 
> This is about the:
> 	There's now more interest in Google for allowing explicit control of
> 	where RMIDs are assigned on Intel platforms. Even though the number of
> 	RMIDs implemented by hardware tends to be roughly the number of
> 	containers they want to support, they often still need to create
> 	containers when all RMIDs have already been allocated, which is not
> 	currently allowed. Once the container has been created and starts
> 	running, it's no longer possible to move its threads into a monitoring
> 	group whenever RMIDs should become available again, so it's important
> 	for resctrl to maintain an accurate task list for a container even
> 	when RMIDs are not available.
> 
> I see a monitor group as a collection of tasks that need to be monitored together.
> The "task list" is the group of tasks that share a monitoring ID that
> is required to be a valid ID since when any of the tasks are scheduled that ID is
> written to the hardware. I intentionally tried to not use RMID since I believe
> this is required for all archs.
> I thus do not understand how a task can start running when it does not have
> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
> there can never be "unmonitored tasks", no? I think I am missing something here.
> 
>> I realized this use case is more difficult to implement on MPAM,
>> because a PARTID is effectively a CLOSID+RMID, so deferring assigning
>> a unique PARTID to a group also results in it being in a different
>> allocation group. It will work if the unmonitored groups could find a
>> way to share PARTIDs, but this has consequences on allocation - but
>> hopefully no worse than sharing CLOSIDs on x86.
>>
>> There's a lot of interest in monitoring ID overcommit in Google, so I
>> think it's worth it for me to investigate the additional structural
>> changes needed in resctrl (i.e., breaking the FS-level association
>> between mongroups and HW monitoring IDs). Such a framework could be a
>> better fit for soft-ABMC. For example, if overcommit is allowed, we
>> would just report the number of simultaneous RMIDs we were able to
>> probe as num_rmids. I would want the same shared assignment scheduler
>> to be able to work with RMIDs and counters, though.
>>
>> Thanks,
>> -Peter
>>
>> [1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/
> 
> Reinette
> 
> [2] https://lore.kernel.org/lkml/b9e48e8f-3035-4a7e-a983-ce829bd9215a@intel.com/
> [3] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-22  8:47                         ` Peter Newman
@ 2025-05-22 16:32                           ` Reinette Chatre
  2025-05-22 17:21                           ` Luck, Tony
  1 sibling, 0 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 16:32 UTC (permalink / raw)
  To: Peter Newman, Luck, Tony
  Cc: Moger, Babu, babu.moger@amd.com, corbet@lwn.net,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, james.morse@arm.com,
	dave.martin@arm.com, fenghuay@nvidia.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, akpm@linux-foundation.org,
	thuth@redhat.com, rostedt@goodmis.org, ardb@kernel.org,
	gregkh@linuxfoundation.org, daniel.sneddon@linux.intel.com,
	jpoimboe@kernel.org, alexandre.chartre@oracle.com,
	pawan.kumar.gupta@linux.intel.com, thomas.lendacky@amd.com,
	perry.yuan@amd.com, seanjc@google.com, Huang, Kai, Li, Xiaoyao,
	kan.liang@linux.intel.com, Li, Xin3, ebiggers@google.com,
	xin@zytor.com, Mehta, Sohil, andrew.cooper3@citrix.com,
	mario.limonciello@amd.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
	Eranian, Stephane, Xiaojian.Du@amd.com, gautham.shenoy@amd.com

Hi Peter,

On 5/22/25 1:47 AM, Peter Newman wrote:
> Hi Tony, Reinette,
> 
> On Thu, May 22, 2025 at 2:21 AM Luck, Tony <tony.luck@intel.com> wrote:
>>
>>>>>> There's also the mongroup-RMID overcommit use case I described
>>>>>> above[1]. On Intel we can safely assume that there are counters to
>>>>>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
>>>>>> num_rmids.
>>>>>
>>>>> This is about the:
>>>>>    There's now more interest in Google for allowing explicit control of
>>>>>    where RMIDs are assigned on Intel platforms. Even though the number of
>>>>>    RMIDs implemented by hardware tends to be roughly the number of
>>>>>    containers they want to support, they often still need to create
>>>>>    containers when all RMIDs have already been allocated, which is not
>>>>>    currently allowed. Once the container has been created and starts
>>>>>    running, it's no longer possible to move its threads into a monitoring
>>>>>    group whenever RMIDs should become available again, so it's important
>>>>>    for resctrl to maintain an accurate task list for a container even
>>>>>    when RMIDs are not available.
>>>>>
>>>>> I see a monitor group as a collection of tasks that need to be monitored together.
>>>>> The "task list" is the group of tasks that share a monitoring ID that
>>>>> is required to be a valid ID since when any of the tasks are scheduled that ID is
>>>>> written to the hardware. I intentionally tried to not use RMID since I believe
>>>>> this is required for all archs.
>>>>> I thus do not understand how a task can start running when it does not have
>>>>> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
>>>>> there can never be "unmonitored tasks", no? I think I am missing something here.
> 
> You are correct. I did forget to mention something...
> 
>>>>
>>>> In the AMD/RMID implemenentation this might be achieved with something
>>>> extra in the task structure to denote whether a task is in a monitored
>>>> group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
>>>> Tasks in an unmonitored group retain their "task->rmid" (that's what
>>>> identifies them as a member of a group) but have task->rmid_valid set
>>>> to false.  Context switch code would be updated to load "0" into the
>>>> IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
>>>> would still be monitored, but activity would be bundled with all
>>>> tasks in the default resctrl group.
>>>>
>>>> Presumably something analogous could be done for ARM/MPAM.
>>>>
>>>
>>> I do not interpret this as an unmonitored task but instead a task that
>>> belongs to the default resource group. Specifically, any data accumulated by
>>> such a task is attributed to the default resource group. Having tasks
>>> in a separate group but their monitoring data accumulating in/contributed to
>>> the default resource group (that has its own set of tasks) sounds wrong to me.
>>> Such an implementation makes any monitoring data of default resource group
>>> invalid, and by extension impossible to use default resource group to manage
>>> an allocation for a group of monitor groups if user space needs insight
>>> in monitoring data across all these monitor groups. User space will need to
>>> interact with resctrl differently and individually query monitor groups instead
>>> of CTRL_MON group once.
>>
>> Maybe assign one of the limited supply of RMIDs for these "unmonitored"
>> tasks. Populate a resctrl group named "unmonitored" that lists all the
>> unmonitored tasks in a (read-only) "tasks" file. And supply all the counts
>> for these tasks in normal looking "mon_data" directory.
> 
> I needed to switch to an rdtgroup struct pointer rather than hardware
> IDs in the task structure to indicate group membership[1], otherwise
> it's not possible to determine which tasks are in a group when it
> doesn't have a unique HW ID value.

Whether the task struct contains a pointer (albeit accompanied by its
own complexities) does not address the issue that I am concerned about.

Looking at [1] I expect this new feature handles "unmonitored" groups by
placing them in the default monitoring group, following Tony's first [3]
suggestion.

When considering [1] by itself in the context of current resctrl, all tasks
should be members of resource groups that have valid HW monitoring IDs allocated.
Using the default resource group in this way seems like addressing edge cases
where the pointer is not yet valid (unclear what these scenarios may be) instead of
routing many tasks to the default group. I am not sure and I'll have to study
that change more closely to reason accurately.

From what I understand the new proposal that builds on [1] involves creating
new monitor groups that are "unmonitored" for any length of time and when backed
by the implementation in [1] this would mean these groups will actually
still be monitored but the data attributed to the default resource group.

As I mentioned in response [4] to Tony this fundamentally changes the
behavior users can expect from the default resource group. In addition,
this breaks the first of the "Resource monitoring rules" from
Documentation/filesystems/resctrl.rst:

1) If a task is a member of a MON group, or non-default CTRL_MON group          
   then RDT events for the task will be reported in that group.  
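
To make the concern concrete, here is a small user-space model in Python
(not kernel code). It assumes the "task->rmid_valid" idea quoted above, where
a task without a valid RMID runs with RMID 0. The Task class, the byte counts
and the RMID values 5 and 7 are made up for illustration only.

```python
from collections import Counter

DEFAULT_RMID = 0  # RMID of the default resource group


class Task:
    """Hypothetical stand-in for the per-task monitoring state."""

    def __init__(self, name, rmid, rmid_valid=True):
        self.name = name
        self.rmid = rmid              # identifies group membership
        self.rmid_valid = rmid_valid  # False while the group is "unmonitored"


def rmid_for_context_switch(task):
    """RMID that would be loaded when the task is scheduled in."""
    return task.rmid if task.rmid_valid else DEFAULT_RMID


def account(samples):
    """Accumulate bytes per RMID as the hardware would see them."""
    counts = Counter()
    for task, nbytes in samples:
        counts[rmid_for_context_switch(task)] += nbytes
    return counts


samples = [
    (Task("default_task", rmid=0), 100),
    (Task("group_a_task", rmid=5), 200),
    (Task("group_b_task", rmid=7, rmid_valid=False), 300),
]
counts = account(samples)
print(dict(counts))
```

In this model the 300 bytes from the "unmonitored" group land in RMID 0, so
the count reported for the default group no longer reflects only its own tasks.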

How does this fit with the ABMC work? I continue to think that I am missing
parts of the discussion, as it seems this new feature discussion is mixed in
with the ABMC work.

Reinette

> 
> Also this is required for shared assignment so that changing a group's
> IDs in a domain only requires updating running tasks rather than
> needing to search the entire task list, which would lead to the same
> problem we encountered in mongroup rename[2].
> 
> -Peter
> 
> [1] https://lore.kernel.org/lkml/20240325172707.73966-5-peternewman@google.com/
> [2] https://lore.kernel.org/lkml/CALPaoCh0SbG1+VbbgcxjubE7Cc2Pb6QqhG3NH6X=WwsNfqNjtA@mail.gmail.com/
[3] https://lore.kernel.org/lkml/aC5lL_qY00vd8qp4@agluck-desk3/
[4] https://lore.kernel.org/lkml/a131e8ed-88b2-4fed-983b-5deea955a9a5@intel.com/


* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-22  9:14                   ` Peter Newman
@ 2025-05-22 16:33                     ` Reinette Chatre
  0 siblings, 0 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 16:33 UTC (permalink / raw)
  To: Peter Newman
  Cc: Moger, Babu, babu.moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen, james.morse, dave.martin, fenghuay, x86, hpa,
	paulmck, akpm, thuth, rostedt, ardb, gregkh, daniel.sneddon,
	jpoimboe, alexandre.chartre, pawan.kumar.gupta, thomas.lendacky,
	perry.yuan, seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li,
	ebiggers, xin, sohil.mehta, andrew.cooper3, mario.limonciello,
	linux-doc, linux-kernel, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Peter,

On 5/22/25 2:14 AM, Peter Newman wrote:
> On Thu, May 22, 2025 at 1:05 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 5/21/25 7:27 AM, Peter Newman wrote:

...

>>> At the interface level, I think mbm_L3_assignments on a non-ABMC
>>> system would only need to contain a single line:
>>>
>>> 0=s;1=s;...;31=s
>>
>> It should be obvious to user space how to interpret the fields. When there is
>> thus a single "mbm_cntr_assign" mode used for ABMC and soft-ABMC a single
>> line like this would be difficult to parse since that would imply/require
>> that user space knows whether it is running on ABMC or soft-ABMC system,
>> which we should avoid.
>>
>> If there are different modes, for example "mbm_cntr_event_assign" and
>> "mbm_cntr_group_assign" then this could be used by user space to distinguish
>> how to interact with mbm_L3_assignments making something like this possible.
> 
> I meant to say I was proposing the format of this file when in the
> group assignment mode. I didn't mean to imply that a separate mode
> wasn't needed.

Thanks for confirming.

>>> But maybe for consistency we would synthesize a single, unmodifiable
>>> counter configuration to reflect that allocating an RMID in a domain
>>> results in assignment to all events and deallocating the RMID
>>> unassigns all events. We could call it "group" to say it's assigning
>>> at the group level, or perhaps just '*':
>>>
>>> *:0=s;1=s;...;31=s
>>>
>>> I'm not sure about allowing a '*' on ABMC hardware, because it could
>>> be interpreted as allocating a lot of counters when a large number of
>>> event configurations exist.
>>>
>>> *:0=s;1=s;...;31=s
>>>
>>
>> Either could work also. Whether it is "group" or "*" ABMC systems could
>> respond with "not supported". Will think about this more but would
>> like to hear your opinion about the flexibility that distinguishing between
>> a "mbm_cntr_event_assign" and "mbm_cntr_group_assign" mode provides.
> 
> I agree it's clearer when they are separate modes. Between "*" and
> "group", I prefer "group" because it seems the least ambiguous.

Sounds good to me. resctrl will need extra guards to prevent the user
from creating an event named "group" but this matches what resctrl already
needs to do for other parts (e.g. the user cannot create a monitor group named
"mon_groups").

> 
> I just want to make sure we'd never want both modes at the same time,
> such as an implementation with both a small number of monitoring IDs

hmmm ... my assumption was that a system could only support one of these
modes ("mbm_cntr_event_assign" or "mbm_cntr_group_assign") but it could
be possible for a system to support both. But beyond that, to have
both active at the *same* time? That will take a lot of wrangling
during runtime.

> and a small number of MBM counters. I support one MPAM implementation
> that has a small number of PARTIDs and only one MBWU counter per
> domain. Fingers crossed that the number of PARTIDs it supports isn't
> small compared to the number of jobs we would run on it. Otherwise
> maybe it will work out to just pick the more limited of the two
> (monitor IDs or counters) and make allocation of one drive the other.

Could a scenario like this be addressed by "mbm_cntr_event_assign" mode
gaining support for "shared assignment"?


> (In case you read this before my earlier reply[1], see the note about
> rdtgroup pointers in the task_struct, as this is a prerequisite for
> overcommitting HW monitor IDs.)
> 

Reinette

> [1] https://lore.kernel.org/lkml/CALPaoCjh_NXQLtNBqei=7a6Jsr17fEnPO+kqMaNq4xNu2UPDJA@mail.gmail.com/

...

>>>>>>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
>>>>>>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/




* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-22 15:44                   ` Moger, Babu
@ 2025-05-22 16:33                     ` Reinette Chatre
  2025-05-22 19:15                       ` Moger, Babu
  2025-06-10 23:19                       ` Moger, Babu
  0 siblings, 2 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 16:33 UTC (permalink / raw)
  To: babu.moger, Peter Newman
  Cc: Moger, Babu, corbet, tony.luck, tglx, mingo, bp, dave.hansen,
	james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, maciej.wieczor-retman, eranian, Xiaojian.Du,
	gautham.shenoy

Hi Babu,

On 5/22/25 8:44 AM, Moger, Babu wrote:
> On 5/21/25 18:03, Reinette Chatre wrote:

...

>> This is why I proposed in [3] that the name of the mode reflects how user can interact
>> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
>> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
>> make it clear what the system is capable of wrt counter assignments.
> 
> Yes, that makes sense. Perhaps we can also simplify it further:
> 
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:
> [mbm_cntr_evt_assign] <- for ABMC
>  mbm_cntr_grp_assign  <- for soft-ABMC

Looks good to me. Thank you.
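
As a rough sketch of how user space might read such a file, assuming the
bracket convention shown above marks the enabled mode and each mode sits on
its own line. The function name is made up, and the mode names are the ones
proposed in this thread, not a settled interface.

```python
def parse_assign_mode(text):
    """Return (enabled_mode, all_modes) from mbm_assign_mode-style text."""
    enabled = None
    modes = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        # The enabled mode is enclosed in square brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            enabled = name
        modes.append(name)
    return enabled, modes


sample = "[mbm_cntr_evt_assign]\n mbm_cntr_grp_assign\n"
enabled, modes = parse_assign_mode(sample)
print(enabled, modes)
```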

>> Considering this the interface should be clear:
>> num_mbm_cntrs: reflects the number of counters in each domain that can be assigned. In
>> "mbm_cntr_event_assign" this will be the number of counters that can be assigned to 
>> each event within a monitoring group, in "mbm_cntr_group_assign" this will be the number
>> of counters that can be assigned to entire monitoring groups impacting all MBM events.
>>
>> mbm_L3_assignments: manages the counter assignment in each group. When user knows the mode
>> is "mbm_cntr_event_assign"/"mbm_cntr_group_assign" then it should be clear to user space how the
>> interface behaves wrt assignment, no surprises of multiple events impacted when
>> assigning/unassigning single event.
>>
>> For soft-ABMC I thus find it most intuitive for num_mbm_cntrs to be the exact number
>> of "active" RMIDs that the system can support *and* changing the name of the modes
>> to help user interpret num_mbm_cntrs.
> 
> Sure. The option A: fits well here then.
> 
>  Option A (counters are RMIDs):
>  # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>  0=31;1=31

Thank you for considering.

Please add the requirements from this discussion to your running list. Also please keep in mind
how soft-ABMC intends to use the interfaces created by this work so that the documentation that
accompanies the ABMC support in this series leaves enough "wiggle room" for soft-ABMC to be built on top.
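
For reference, the per-domain "<domain id>=<count>" format in Option A could
be consumed by user space along these lines. This is only a sketch; the
function name is made up and the sample string is the one from Option A above.

```python
def parse_per_domain(text):
    """Parse '0=31;1=31' into {domain_id: count}."""
    result = {}
    for field in text.strip().split(";"):
        if not field:
            continue  # tolerate empty input or a trailing ';'
        dom, _, val = field.partition("=")
        result[int(dom)] = int(val)
    return result


counters = parse_per_domain("0=31;1=31\n")
print(counters)
```

With a single mode-independent format like this, user space only needs the
mode name to know what each count means (per-event counters or per-group RMIDs).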

>>> [1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/
>>
>> Reinette
>>
>> [2] https://lore.kernel.org/lkml/b9e48e8f-3035-4a7e-a983-ce829bd9215a@intel.com/
>> [3] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/
>>
> 

Reinette


* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-22  8:47                         ` Peter Newman
  2025-05-22 16:32                           ` Reinette Chatre
@ 2025-05-22 17:21                           ` Luck, Tony
  1 sibling, 0 replies; 114+ messages in thread
From: Luck, Tony @ 2025-05-22 17:21 UTC (permalink / raw)
  To: Peter Newman
  Cc: Chatre, Reinette, Moger, Babu, babu.moger@amd.com, corbet@lwn.net,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, james.morse@arm.com,
	dave.martin@arm.com, fenghuay@nvidia.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, akpm@linux-foundation.org,
	thuth@redhat.com, rostedt@goodmis.org, ardb@kernel.org,
	gregkh@linuxfoundation.org, daniel.sneddon@linux.intel.com,
	jpoimboe@kernel.org, alexandre.chartre@oracle.com,
	pawan.kumar.gupta@linux.intel.com, thomas.lendacky@amd.com,
	perry.yuan@amd.com, seanjc@google.com, Huang, Kai, Li, Xiaoyao,
	kan.liang@linux.intel.com, Li, Xin3, ebiggers@google.com,
	xin@zytor.com, Mehta, Sohil, andrew.cooper3@citrix.com,
	mario.limonciello@amd.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
	Eranian, Stephane, Xiaojian.Du@amd.com, gautham.shenoy@amd.com

> > Maybe assign one of the limited supply of RMIDs for these "unmonitored"
> > tasks. Populate a resctrl group named "unmonitored" that lists all the
> > unmonitored tasks in a (read-only) "tasks" file. And supply all the counts
> > for these tasks in normal looking "mon_data" directory.
> 
> I needed to switch to an rdtgroup struct pointer rather than hardware
> IDs in the task structure to indicate group membership[1], otherwise
> it's not possible to determine which tasks are in a group when it
> doesn't have a unique HW ID value.
> 
> Also this is required for shared assignment so that changing a group's
> IDs in a domain only requires updating running tasks rather than
> needing to search the entire task list, which would lead to the same
> problem we encountered in mongroup rename[2].

Having a pointer to the rdtgroup in the task structure does make
file system operations easier. But the cost appears to be more
complexity (and memory references) in the context switch code.

Your patch[1] seems to do some extra work outside of the static_branch
protected sections. So it adds a cost to context switch even if resctrl
is not in use.

Chasing pointers "closid = rgrp->mon.parent->closid;" could be
expensive when those miss in the cache.

> 
> -Peter
> 
> [1] https://lore.kernel.org/lkml/20240325172707.73966-5-peternewman@google.com/
> [2] https://lore.kernel.org/lkml/CALPaoCh0SbG1+VbbgcxjubE7Cc2Pb6QqhG3NH6X=WwsNfqNjtA@mail.gmail.com/

-Tony


* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-22 16:33                     ` Reinette Chatre
@ 2025-05-22 19:15                       ` Moger, Babu
  2025-06-10 23:19                       ` Moger, Babu
  1 sibling, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-22 19:15 UTC (permalink / raw)
  To: Reinette Chatre, Peter Newman
  Cc: Moger, Babu, corbet, tony.luck, tglx, mingo, bp, dave.hansen,
	james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, maciej.wieczor-retman, eranian, Xiaojian.Du,
	gautham.shenoy

Hi Reinette,

On 5/22/25 11:33, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/22/25 8:44 AM, Moger, Babu wrote:
>> On 5/21/25 18:03, Reinette Chatre wrote:
> 
> ...
> 
>>> This is why I proposed in [3] that the name of the mode reflects how user can interact
>>> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
>>> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
>>> make it clear what the system is capable of wrt counter assignments.
>>
>> Yes, that makes sense. Perhaps we can also simplify it further:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:
>> [mbm_cntr_evt_assign] <- for ABMC
>>  mbm_cntr_grp_assign  <- for soft-ABMC
> 
> Looks good to me. Thank you.
> 
>>> Considering this the interface should be clear:
>>> num_mbm_cntrs: reflects the number of counters in each domain that can be assigned. In
>>> "mbm_cntr_event_assign" this will be the number of counters that can be assigned to 
>>> each event within a monitoring group, in "mbm_cntr_group_assign" this will be the number
>>> of counters that can be assigned to entire monitoring groups impacting all MBM events.
>>>
>>> mbm_L3_assignments: manages the counter assignment in each group. When user knows the mode
>>> is "mbm_cntr_event_assign"/"mbm_cntr_group_assign" then it should be clear to user space how the
>>> interface behaves wrt assignment, no surprises of multiple events impacted when
>>> assigning/unassigning single event.
>>>
>>> For soft-ABMC I thus find it most intuitive for num_mbm_cntrs to be the exact number
>>> of "active" RMIDs that the system can support *and* changing the name of the modes
>>> to help user interpret num_mbm_cntrs.
>>
>> Sure. The option A: fits well here then.
>>
>>  Option A (counters are RMIDs):
>>  # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>>  0=31;1=31
> 
> Thank you for considering.
> 
> Please add the requirements from this discussion to your running list. Also please keep in mind
> how soft-ABMC intends to use the interfaces created by this work so that the documentation that
> accompanies the ABMC support in this series leaves enough "wiggle room" for soft-ABMC to be built on top.

Sure. Thanks

> 
>>>> [1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/
>>>
>>> Reinette
>>>
>>> [2] https://lore.kernel.org/lkml/b9e48e8f-3035-4a7e-a983-ce829bd9215a@intel.com/
>>> [3] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/
>>>
>>
> 
> Reinette
> 

-- 
Thanks
Babu Moger


* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (27 preceding siblings ...)
  2025-05-19 15:59 ` [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Peter Newman
@ 2025-05-22 20:44 ` Reinette Chatre
  28 siblings, 0 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 20:44 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:
> 
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
> 
> Series is written such that it is easier to support other assignable
> features supported from different vendors.
> 
> The feature details are documented in the  APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> 
> The patches are based on top of commit
> 92a09c47464d0 (tag: v6.15-rc5, tip/irq/merge) Linux 6.15-rc5
> plus 
> https://lore.kernel.org/lkml/20250515165855.31452-1-james.morse@arm.com/
> 
> It is very clear these patches will go after James's resctrl FS/ARCH
> restructure. Hoping to avoid one review cycle due to the merge.
> 
> # Introduction
> 
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that

"bandwidth monitoring feature on AMD system" -> " the bandwidth monitoring
feature on AMD systems"? or "the bandwidth monitoring feature on an AMD system".
Not sure.

> RMIDs currently assigned to a processor will be tracked by hardware.
> The counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, there can be only limited number of
> groups that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups
> are being tracked for certain point of time. Users do not have the option

"for certain point of time" -> "during a particular/certain(?) time"?

> to monitor a group or set of groups for certain period of time without

"for certain period of time" -> "for a certain period of time"?

This series contains many duplicate snippets. When you update one, please
check that all the duplicates are updated also.


> worrying about counter being reset in between.
>     
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned.  The assigned RMID will be tracked by the hardware until the user
> unassigns it manually. There is no need to worry about counters being reset
> during this period. Additionally, the user can specify a bitmask identifying
> the specific bandwidth types from the given source to track with the counter.

Instead of tacking it on as an "additionally" I see this capability now as essential
to this new implementation. I tried to give this series a thorough review to help finalize
this work but I kept being turned around by all the descriptions and finally it dawned
that all the descriptions are at their core still based on the original "event ID"
based implementation with either a small append or as little change as possible to
adjust to the "extended event ID" based implementation.

The wording of the previous implementation, still used (and copy&pasted many times) in
these descriptions as "assign a hardware counter to an RMID, event pair", can only be
accurate for this new implementation if an event is re-defined ... it is no longer the
original constrained "event IDs" but instead an MBM event has become a generic name that
identifies the configurable "bandwidth types" (but, see note about terminology later) to
be monitored. This re-definition is never done.

I assume "the given source" is the assigned RMID? If so I think it will be easier to
understand if this is made specific: "bandwidth types from the assigned RMID ..."

I find this series to use several terms for the same concept,
for example, "bandwidth types", "memory transactions", "types of L3 transactions",
"bandwidth sources", etc. This work will be easier to consume if it uses consistent
and specific terminology.

> Without ABMC enabled, monitoring will work in current 'default' mode without
> assignment option.
> 
> # History
> 
> Earlier implementation of ABMC had dependancy on BMEC (Bandwidth Monitoring
> Event Configuration). Peter had concerns with that implementation because
> it may be not be compatible with ARM's MPAM.
> 
> Here are the threads discussing the concerns and new interface to address the concerns.
> https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
> 
> Here are the finalized requirements based on the discussion:
> 
> *   Remove BMEC dependency on the ABMC feature.

Even stronger, BMEC and ABMC are now "incompatible" in that resctrl will not let them be used
at the same time.

> 
> *   Eliminate global assignment listing. The interface
>     /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.
> 
> *   Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
>     The configuration file names should be free-form, allowing users to create them as needed.
> 
> *   Perform assignment listing at the group level by introducing mbm_L3_assignments

"the group level" -> "the monitoring group level"

>     in each monitoring group. The listing should provide the following details:
> 
>     Event Configuration: Specifies the event configuration applied. This will be crucial
>     when "mkdir" on event configuration is added in the future, leading to the creation
>     of mon_data/mon_l3_*/<event configuration>.

hmmm ... sounds like it has become more natural to refer to it as "event configuration", which is
a good match for what the purpose is. This thus sounds like good motivation to change "counter_configs"
to "event_configs".

> 
>     Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.
> 
>     Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).

Could you please add a definition of what "exclusive" and "shared" mean?

> 
> *   Provide option to enable or disable auto assignment when new group is created.
> 
> This series tries to address all the requirements listed above.
> 
> # Implementation details
> 
> Create a generic interface aimed to support user space assignment of scarce
> counters used for monitoring. First usage of interface is by ABMC with option
> to expand usage to "soft-ABMC" and MPAM counters in future.
> 
> Feature adds following interface files:
> 
> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
> monitoring features supported. The enclosed brackets indicate which
> feature is enabled.
> 
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> counters available for assignment.

Please aim to use consistent and clear terms to help understand this work. It is
confusing that the above uses "available" in the description for num_mbm_cntrs and then
below there is a new interface "available_mbm_cntrs" that uses the "available" term in
its name but not in its description.

> 
> /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
> counters free in each domain.
> 
> /sys/fs/resctrl/info/L3_MON/counter_configs : Directory to hold the counter configuration.

Everywhere else seems to refer to this as "event configurations". Please just stick to one,
"event configuration" seems most appropriate.

> 
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter : Default configuration
> for MBM total events.

I think "default" should be dropped to make it clear that this is the actual configuration
that is always used, not a static "default" that may be used in "some" circumstances.

> 
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter : Default configuration
> for MBM local events.

Same wrt "default"

> 
> /sys/fs/resctrl/mbm_L3_assignments: Interface to list or modify assignment states on each group.

"Per monitor group interface to list or modify counters assigned to the group."? (Please improve.)

> 
> # Examples
> 
> a. Check if ABMC support is available

Please drop the "ABMC" from all the descriptions since this is intended to be a generic interface.

> 	#mount -t resctrl resctrl /sys/fs/resctrl/
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 	[mbm_cntr_assign]
> 	default
> 

I believe the naming has been finalized in
https://lore.kernel.org/lkml/7628cec8-5914-4895-8289-027e7821777e@amd.com/.

> 	ABMC feature is detected and it is enabled.
> 
> b. Check how many ABMC counters are available. 

available -> supported? This will help distinguish it from the
next interface file named "available_mbm_cntrs".

> 
> 	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
> 	32

Please update to reflect what implementation does.

> 
> c. Check how many ABMC counters are available in each domain.

"available" -> "available for assignment"

> 
> 	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
> 	0=30;1=30
> 
> d. Check default counter configuration.
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter 
> 	local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter 
> 	local_reads, local_non_temporal_writes, local_reads_slow_memory

Does not look like this matches implementation wrt spacing?

> 
> e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
>    to list and modify any group's monitoring states.

"any group's" -> "that group's"

> 
> 	The list is displayed in the following format:
> 
>         <Event configuration>:<Domain id>=<Assignment type>
> 
>         Event configuration: A valid event configuration listed in the
>         /sys/fs/resctrl/info/L3_MON/counter_configs directory.
> 
>         Domain ID: A valid domain ID number.

"A valid domain ID number" -> "A valid domain ID"

> 
>         Assignment types:
> 
>         _ : No event configuration assigned
> 
>         e : Event configuration assigned in exclusive mode
> 
> 	To list the default group states:
> 	# cat /sys/fs/resctrl/mbm_L3_assignments
> 	mbm_total_bytes:0=e;1=e
> 	mbm_local_bytes:0=e;1=e
> 
> 	To unassign the configuration of mbm_total_bytes on domain 0:

This unassigns a counter, as opposed to a configuration, no? How about
"To unassign the counter associated with the mbm_total_bytes event"?

> 	#echo "mbm_total_bytes:0=_" > mbm_L3_assignments
> 	#cat mbm_L3_assignments

(May help to follow if the examples consistently use the full path.)

> 	mbm_total_bytes:0=_;1=e
> 	mbm_local_bytes:0=e;1=e
> 
> 	To unassign the mbm_total_bytes configuration on all domains:

same wrt unassigning a counter

>     	$echo "mbm_total_bytes:*=_" > mbm_L3_assignments
> 	$cat mbm_L3_assignments

# prompt is usually used for administrator and $ for user without
administrator privileges. Switching between # and $ in these examples 
is confusing.

> 	mbm_total_bytes:0=_;1=_
> 	mbm_local_bytes:0=e;1=e
> 
> 	To assign the mbm_total_bytes configuration on all domains in exclusive mode:

same wrt assigning a counter

>     	$echo "mbm_total_bytes:*=e" > mbm_L3_assignments
> 	$cat mbm_L3_assignments
> 	mbm_total_bytes:0=e;1=e
> 	mbm_local_bytes:0=e;1=e
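
The listing format described in (e), <event>:<domain id>=<type> with
domains separated by ';', can be parsed as sketched below. This is a
userspace illustration only: parse_assignment_line(), struct assignment
and MAX_DOMAINS are hypothetical names, and the sketch handles only the
format that is read back ('_' and 'e' types), not the '*' wildcard that
is accepted on write.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DOMAINS 8	/* arbitrary limit chosen for this sketch */

struct assignment {
	int  domain_id;
	char type;	/* '_' = no counter assigned, 'e' = exclusive */
};

/*
 * Parse one line such as "mbm_total_bytes:0=_;1=e".
 * On success the event name is copied into @event and the number of
 * parsed domains is returned. Returns -1 on malformed input.
 */
static int parse_assignment_line(const char *line, char *event,
				 size_t event_len,
				 struct assignment *out, int max)
{
	const char *colon = strchr(line, ':');
	const char *p;
	int n = 0;

	if (!colon || colon == line || (size_t)(colon - line) >= event_len)
		return -1;
	memcpy(event, line, (size_t)(colon - line));
	event[colon - line] = '\0';

	p = colon + 1;
	while (*p && *p != '\n') {
		char *end;
		long id = strtol(p, &end, 10);

		/* Expect "<id>=<type>" with a known type character. */
		if (end == p || id < 0 || *end != '=')
			return -1;
		if (end[1] != '_' && end[1] != 'e')
			return -1;
		if (n == max)
			return -1;

		out[n].domain_id = (int)id;
		out[n].type = end[1];
		n++;

		/* Skip past the type and an optional ';' separator. */
		p = end + 2;
		if (*p == ';')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}

	return n > 0 ? n : -1;
}
```
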
> 
> g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
>    There is no change in reading the events with ABMC. If the event is unassigned
>    when reading, then the read will come back as "Unassigned".
> 	
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	779247936
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
> 	765207488
> 	
> h. Check the default event configurations.
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
> 	local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
> 	local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 	local_reads, local_non_temporal_writes, local_reads_slow_memory
> 
> i. Change the event configuration for mbm_local_bytes.
> 
> 	#echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" >
> 	/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 	local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads
> 	
>         This will update the assignments where mbm_local_bytes are configured.

"This will update all (across all domains of all monitor groups) counter assignments 
associated with the mbm_local_bytes event." (Please improve).

> 	
> j. Now read the total event again. The first read may come back with "Unavailable"
>    status. The subsequent read of mbm_total_bytes will display only the read events.

Was this intended to be an example of reading *local* bytes after the modification in the previous step?

> 	
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	Unavailable
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	314101
> 
> k. Users will have the option to go back to 'default' mbm_assign_mode if required.

"Users will have the option" -> "Users have the option"

>    This can be done using the following command. Note that switching the
>    mbm_assign_mode will reset all the MBM counters of all resctrl groups.

"all the MBM counters " -> "all the MBM counters (and thus all MBM events)"? 

> 
> 	# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 	mbm_cntr_assign
> 	[default]
> 	
> l. Unmount the resctrl
> 	 
> 	#umount /sys/fs/resctrl/
> ---

Reinette


* Re: [PATCH v13 01/27] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-15 22:51 ` [PATCH v13 01/27] x86/cpufeatures: Add support for " Babu Moger
@ 2025-05-22 20:51   ` Reinette Chatre
  2025-05-27 17:23     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 20:51 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that
> RMIDs currently assigned to a processor will be tracked by hardware. The
> counters of any other RMIDs which are no longer being tracked will be reset
> to zero. The MBM event counters return "Unavailable" for the RMIDs that are
> not tracked by hardware. So, there can be only limited number of groups
> that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups are
> being tracked for certain point of time. Users do not have the option to
> monitor a group or set of groups for certain period of time without
> worrying about RMID being reset in between.
> 
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned. The assigned RMID will be tracked by the hardware until the user
> unassigns it manually. There is no need to worry about counters being reset
> during this period. Additionally, the user can specify a bitmask
> identifying the specific bandwidth types from the given source to track
> with the counter.
> 
> Without ABMC enabled, monitoring will work in current mode without
> assignment option.
> 
> The Linux resctrl subsystem provides an interface that allows monitoring of
> up to two memory bandwidth events per group, selected from a combination of
> available total and local events. When ABMC is enabled, two events will be
> assigned to each group by default, in line with the current interface
> design. Users will also have the option to configure which types of memory
> transactions are counted by these events.
> 
> Due to the limited number of available counters (32), users may quickly
> exhaust the available counters. If the system runs out of assignable ABMC
> counters, the kernel will report an error. In such cases, users will nee
> dto unassign one or more active counters to free up countes for new

"nee dto" -> "need to"
"countes" -> "counters"


> assignments. The interface will provide options to assign or unassign

"The interface will" -> "resctrl will"?

> events through the group-specific interface file.
> 
> The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.

"The feature can be detected" -> "The feature is detected"

> Bits Description
> 5    ABMC (Assignable Bandwidth Monitoring Counters)
> 
> The feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  arch/x86/kernel/cpu/cpuid-deps.c   | 2 ++
>  arch/x86/kernel/cpu/scattered.c    | 1 +
>  3 files changed, 4 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 6c2c152d8a67..d5c14dc678df 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -481,6 +481,7 @@
>  #define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
>  #define X86_FEATURE_AMD_WORKLOAD_CLASS	(21*32 + 7) /* Workload Classification */
>  #define X86_FEATURE_PREFER_YMM		(21*32 + 8) /* Avoid ZMM registers due to downclocking */
> +#define X86_FEATURE_ABMC		(21*32 + 9) /* Assignable Bandwidth Monitoring Counters */
>  
>  /*
>   * BUG word(s)
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index a2fbea0be535..2f54831e04e5 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>  	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
>  	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
>  	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_TOTAL   },
> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_LOCAL   },

Is this dependency still accurate now that the implementation switched to the 
"extended event ID" variant of ABMC that no longer uses the event IDs associated
with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?

>  	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
>  	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
>  	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 16f3ca30626a..3b72b72270f1 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_MBA,			CPUID_EBX,  6, 0x80000008, 0 },
>  	{ X86_FEATURE_SMBA,			CPUID_EBX,  2, 0x80000020, 0 },
>  	{ X86_FEATURE_BMEC,			CPUID_EBX,  3, 0x80000020, 0 },
> +	{ X86_FEATURE_ABMC,			CPUID_EBX,  5, 0x80000020, 0 },
>  	{ X86_FEATURE_AMD_WORKLOAD_CLASS,	CPUID_EAX, 22, 0x80000021, 0 },
>  	{ X86_FEATURE_PERFMON_V2,		CPUID_EAX,  0, 0x80000022, 0 },
>  	{ X86_FEATURE_AMD_LBR_V2,		CPUID_EAX,  1, 0x80000022, 0 },

Reinette


* Re: [PATCH v13 03/27] x86/resctrl: Consolidate monitoring related data from rdt_resource
  2025-05-15 22:51 ` [PATCH v13 03/27] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2025-05-22 20:52   ` Reinette Chatre
  2025-05-27 18:49     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 20:52 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:

> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 9ba771f2ddea..2a8fa454d3e6 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -255,40 +255,48 @@ enum resctrl_schema_fmt {
>  	RESCTRL_SCHEMA_RANGE,
>  };
>  
> +/**
> + * struct resctrl_mon - Monitoring related data of a resctrl resource
> + * @num_rmid:		Number of RMIDs available
> + * @mbm_cfg_mask:	Bandwidth sources that can be tracked when bandwidth
> + *			monitoring events can be configured.
> + * @evt_list:		List of monitoring events
> + */

Nit: since this comment portion is new it can start with a clean slate: give every
sentence good structure by starting with upper case and ending with a period.

Reinette


* Re: [PATCH v13 04/27] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2025-05-15 22:51 ` [PATCH v13 04/27] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2025-05-22 20:54   ` Reinette Chatre
  2025-05-27 19:52     ` Moger, Babu
  2025-05-27 20:15     ` Moger, Babu
  0 siblings, 2 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 20:54 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:
> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
> Bits Description
> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
>      Monitoring Counter ID + 1
> 
> The feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> 
> Detect the feature and number of assignable monitoring counters supported.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...

> ---
>  arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++--
>  include/linux/resctrl.h               | 4 ++++
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index aeb2a9283069..fd2761d9f3f7 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -345,6 +345,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>  	unsigned int threshold;
> +	u32 eax, ebx, ecx, edx;
>  
>  	snc_nodes_per_l3_cache = snc_get_config();
>  
> @@ -375,13 +376,17 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  	resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);
>  
>  	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
> -		u32 eax, ebx, ecx, edx;
> -
>  		/* Detect list of bandwidth sources that can be tracked */
>  		cpuid_count(0x80000020, 3, &eax, &ebx, &ecx, &edx);
>  		r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
>  	}
>  
> +	if (rdt_cpu_has(X86_FEATURE_ABMC)) {
> +		r->mon.mbm_cntr_assignable = true;
> +		cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
> +		r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
> +	}
> +

Shouldn't ABMC detection also include enumeration of which configurations
are supported? From what I can tell, looking ahead, patch #18 hardcodes definitions
of all known "bandwidth types" (which term to use TBD) and then patch #20 allows
*any* of these types to be configured irrespective of whether the system
supports it.
The AMD spec mentions "The types of L3 transactions that ABMC can track are
configurable and identified by CPUID Fn8000_0020_ECX_x3." It thus looks
like the enumeration of r->mon.mbm_cfg_mask when BMEC is enabled is
also required for ABMC and used by this implementation.
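
To illustrate the point, here is a userspace sketch of how the two
enumerated values could be combined to reject unsupported
configurations. Everything here is hypothetical: struct abmc_caps,
abmc_decode() and abmc_cfg_supported() are made-up names, the counter
count follows the "+ 1" computation in the hunk above, and the 7-bit
configuration mask is an assumption about how many bandwidth types
exist.

```c
#include <stdbool.h>
#include <stdint.h>

#define ABMC_MAX_CNTR_MASK	0xffffu	/* EBX[15:0], leaf 0x80000020 subleaf 5 */
#define EVT_CONFIG_MASK		0x7fu	/* assumed: seven bandwidth types */

struct abmc_caps {
	unsigned int num_cntrs;	/* number of assignable counters */
	uint32_t cfg_mask;	/* bandwidth types the hardware can track */
};

/*
 * Decode raw CPUID register values: @ebx_sub5 is EBX of subleaf 5,
 * @ecx_sub3 is ECX of subleaf 3.
 */
static struct abmc_caps abmc_decode(uint32_t ebx_sub5, uint32_t ecx_sub3)
{
	struct abmc_caps caps;

	caps.num_cntrs = (ebx_sub5 & ABMC_MAX_CNTR_MASK) + 1;
	caps.cfg_mask = ecx_sub3 & EVT_CONFIG_MASK;
	return caps;
}

/* A requested configuration is valid only if it is a non-empty subset. */
static bool abmc_cfg_supported(const struct abmc_caps *caps, uint32_t requested)
{
	return requested && !(requested & ~caps->cfg_mask);
}
```

With such a check a write of an event configuration could fail early
when it names a type the system does not report as supported.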

>  	r->mon_capable = true;
>  
>  	return 0;
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 2a8fa454d3e6..065fb6e38933 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -260,11 +260,15 @@ enum resctrl_schema_fmt {
>   * @num_rmid:		Number of RMIDs available
>   * @mbm_cfg_mask:	Bandwidth sources that can be tracked when bandwidth
>   *			monitoring events can be configured.
> + * @num_mbm_cntrs:	Number of assignable monitoring counters
> + * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?

"monitor assignment" has not been used so far, was this intended to be
"counter assignment"?

Reinette


* Re: [PATCH v13 05/27] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2025-05-15 22:51 ` [PATCH v13 05/27] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2025-05-22 20:56   ` Reinette Chatre
  2025-05-27 20:21     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 20:56 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:
> Add the functionality to enable/disable AMD ABMC feature.
> 
> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
> L3_QOS_EXT_CFG. When the state of ABMC is changed, the MSR needs
> to be updated on all the logical processors in the QOS Domain.
> 
> Hardware counters will reset when ABMC state is changed.
> 
> The ABMC feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...

> ---
>  arch/x86/include/asm/msr-index.h       |  1 +
>  arch/x86/kernel/cpu/resctrl/internal.h |  5 +++
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 43 ++++++++++++++++++++++++++
>  include/linux/resctrl.h                |  3 ++
>  4 files changed, 52 insertions(+)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e6134ef2263d..3970e0b16e47 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1203,6 +1203,7 @@
>  /* - AMD: */
>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
> +#define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
>  #define MSR_IA32_EVT_CFG_BASE		0xc0000400
>  
>  /* AMD-V MSRs */
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 5e3c41b36437..fcc9d23686a1 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -37,6 +37,9 @@ struct arch_mbm_state {
>  	u64	prev_msr;
>  };
>  
> +/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
> +#define ABMC_ENABLE_BIT			0
> +
>  /**
>   * struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
>   *			       a resource for a control function
> @@ -102,6 +105,7 @@ struct msr_param {
>   * @mon_scale:		cqm counter * mon_scale = occupancy in bytes
>   * @mbm_width:		Monitor width, to detect and correct for overflow.
>   * @cdp_enabled:	CDP state of this resource
> + * @mbm_cntr_assign_enabled:	ABMC feature is enabled
>   *
>   * Members of this structure are either private to the architecture
>   * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
> @@ -115,6 +119,7 @@ struct rdt_hw_resource {
>  	unsigned int		mon_scale;
>  	unsigned int		mbm_width;
>  	bool			cdp_enabled;
> +	bool			mbm_cntr_assign_enabled;
>  };
>  
>  static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index fd2761d9f3f7..ff4b2abfa044 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -405,3 +405,46 @@ void __init intel_rdt_mbm_apply_quirk(void)
>  	mbm_cf_rmidthreshold = mbm_cf_table[cf_index].rmidthreshold;
>  	mbm_cf = mbm_cf_table[cf_index].cf;
>  }
> +
> +static void resctrl_abmc_set_one_amd(void *arg)
> +{
> +	bool *enable = arg;
> +
> +	if (*enable)
> +		msr_set_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
> +	else
> +		msr_clear_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
> +}
> +
> +/*
> + * ABMC enable/disable requires update of L3_QOS_EXT_CFG MSR on all the CPUs
> + * associated with all monitor domains.
> + */
> +static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
> +{
> +	struct rdt_mon_domain *d;
> +

It remains a challenge to consider these building blocks without insight into
how/when they will be used. To help out, please add guardrails to help with review.
For example, this could benefit from a:

	lockdep_assert_cpus_held();

> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +		on_each_cpu_mask(&d->hdr.cpu_mask,
> +				 resctrl_abmc_set_one_amd, &enable, 1);
> +		resctrl_arch_reset_rmid_all(r, d);
> +	}
> +}
> +
> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
> +{
> +	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> +
> +	if (r->mon.mbm_cntr_assignable &&
> +	    hw_res->mbm_cntr_assign_enabled != enable) {
> +		_resctrl_abmc_enable(r, enable);
> +		hw_res->mbm_cntr_assign_enabled = enable;
> +	}
> +
> +	return 0;
> +}
> +
> +inline bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)

This "inline" in the .c file is unexpected. Why is this needed?

> +{
> +	return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
> +}
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 065fb6e38933..bdb264875ef6 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -428,6 +428,9 @@ static inline u32 resctrl_get_config_index(u32 closid,
>  bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level l);
>  int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
>  
> +bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r);
> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
> +
>  /*
>   * Update the ctrl_val and apply this config right now.
>   * Must be called on one of the domain's CPUs.

Reinette



* Re: [PATCH v13 06/27] x86/resctrl: Introduce the interface to display monitor mode
  2025-05-15 22:51 ` [PATCH v13 06/27] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
@ 2025-05-22 20:56   ` Reinette Chatre
  2025-05-27 20:33     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 20:56 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:

No comments on changelog since I expect it to be reworked based on 
https://lore.kernel.org/lkml/7628cec8-5914-4895-8289-027e7821777e@amd.com/

> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -257,6 +257,33 @@ with the following files:
>  	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>  	    0=0x30;1=0x30;3=0x15;4=0x15
>  
> +"mbm_assign_mode":
> +	Reports the list of monitoring modes supported. The enclosed brackets

Please try to avoid unnecessary words. For example,
"Reports the list of monitoring modes supported." -> "The supported monitoring modes."

Reinette


* Re: [PATCH v13 08/27] x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at domain
  2025-05-15 22:51 ` [PATCH v13 08/27] x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at domain Babu Moger
@ 2025-05-22 21:02   ` Reinette Chatre
  2025-05-28 16:56     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 21:02 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

shortlog: "at domain" -> "per domain"?

On 5/15/25 3:51 PM, Babu Moger wrote:
> In mbm_cntr_assign mode hardware counters are assigned/unassigned to an
> MBM event of a monitor group. Hardware counters are assigned/unassigned
> at monitoring domain level.
> 
> Manage a monitoring domain's hardware counters using a per monitoring
> domain array of struct mbm_cntr_cfg that is indexed by the hardware
> counter ID. A hardware counter's configuration contains the MBM event
> ID and points to the monitoring group that it is assigned to, with a
> NULL pointer meaning that the hardware counter is available for assignment.
> 
> There is no direct way to determine which hardware counters are assigned
> to a particular monitoring group. Check every entry of every hardware
> counter configuration array in every monitoring domain to query which
> MBM events of a monitoring group are tracked by hardware. Such queries are
> acceptable because of a very small number of assignable counters (32
> to 64).
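
The linear scans described in the changelog can be pictured with the
userspace sketch below. It is not the patch's code: struct group and
struct cntr_cfg are simplified stand-ins for struct rdtgroup and
struct mbm_cntr_cfg, and cntr_find_free()/cntr_find() are hypothetical
helper names. It only shows the "NULL group pointer means free"
convention on one domain's array.

```c
#include <stddef.h>

struct group {
	int id;			/* stand-in for a monitor group */
};

struct cntr_cfg {
	int evtid;		/* only meaningful when grp != NULL */
	struct group *grp;	/* NULL: counter is free */
};

/* Return the ID of the first free counter in the domain, or -1 if none. */
static int cntr_find_free(const struct cntr_cfg *cfg, int num)
{
	int i;

	for (i = 0; i < num; i++) {
		if (!cfg[i].grp)
			return i;
	}
	return -1;
}

/* Return the counter ID assigned to (@grp, @evtid), or -1 if none. */
static int cntr_find(const struct cntr_cfg *cfg, int num,
		     const struct group *grp, int evtid)
{
	int i;

	for (i = 0; i < num; i++) {
		if (cfg[i].grp == grp && cfg[i].evtid == evtid)
			return i;
	}
	return -1;
}
```

Answering "which events of this group are tracked" means running the
second scan for every domain, which stays cheap with 32 to 64 counters
per domain.
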
> 
> Suggested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v13: Resolved conflicts caused by the recent FS/ARCH code restructure.
>      The files monitor.c/rdtgroup.c have been split between FS and ARCH directories.
> 
> v12: Fixed the struct mbm_cntr_cfg code documentation.
>      Removed few strange characters in changelog.
>      Added the counter range for better understanding.
>      Moved the struct mbm_cntr_cfg definition to resctrl/internal.h as
>      suggested by James.
> 
> v11: Refined the change log based on Reinette's feedback.
>      Fixed few style issues.
> 
> v10: Patch changed completely to handle the counters at domain level.
>      https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
>      Removed Reviewed-by tag.
>      Did not see the need to add cntr_id in mbm_state structure. Not used in the code.
> 
> v9: Added Reviewed-by tag. No other changes.
> 
> v8: Minor commit message changes.
> 
> v7: Added check mbm_cntr_assignable for allocating bitmap mbm_cntr_map
> 
> v6: New patch to add domain level assignment.
> ---
>  fs/resctrl/rdtgroup.c   | 11 +++++++++++
>  include/linux/resctrl.h | 16 ++++++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 51f8f8d3ccbc..e2005fc9acd9 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -4085,6 +4085,7 @@ static void rdtgroup_setup_default(void)
>  
>  static void domain_destroy_mon_state(struct rdt_mon_domain *d)
>  {
> +	kfree(d->cntr_cfg);
>  	bitmap_free(d->rmid_busy_llc);
>  	kfree(d->mbm_total);
>  	kfree(d->mbm_local);
> @@ -4171,6 +4172,16 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
>  			return -ENOMEM;
>  		}
>  	}
> +	if (resctrl_is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
> +		tsize = sizeof(*d->cntr_cfg);
> +		d->cntr_cfg = kcalloc(r->mon.num_mbm_cntrs, tsize, GFP_KERNEL);
> +		if (!d->cntr_cfg) {
> +			bitmap_free(d->rmid_busy_llc);
> +			kfree(d->mbm_total);
> +			kfree(d->mbm_local);
> +			return -ENOMEM;
> +		}
> +	}
>  
>  	return 0;
>  }
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index bdb264875ef6..d77981d1fcb9 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -156,6 +156,20 @@ struct rdt_ctrl_domain {
>  	u32				*mbps_val;
>  };
>  
> +/**
> + * struct mbm_cntr_cfg - Assignable counter configuration
> + * @evtid:		MBM event to which the counter is assigned. Only valid
> + *			if @rdtgrp is not NULL.
> + * @evt_cfg:		Event configuration value.

@evt_cfg is not introduced in the changelog nor defined here. Please add a snippet here
on what @evt_cfg's values represent. This is important since this is exposed
as resctrl fs API to architectures, so all architectures need to use the same values when
interacting with resctrl.
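
As an illustration of the kind of definition meant here, the sketch
below maps the type names from the cover letter examples to bit
positions. The bit order is an assumption borrowed from the existing
BMEC documentation in resctrl.rst, not something this patch defines,
and evt_cfg_names[]/evt_cfg_from_names() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: array index == bit position in the configuration value. */
static const char *const evt_cfg_names[] = {
	"local_reads",			/* bit 0 */
	"remote_reads",			/* bit 1 */
	"local_non_temporal_writes",	/* bit 2 */
	"remote_non_temporal_writes",	/* bit 3 */
	"local_reads_slow_memory",	/* bit 4 */
	"remote_reads_slow_memory",	/* bit 5 */
	"dirty_victim_writes_all",	/* bit 6 */
};

#define EVT_CFG_NBITS (sizeof(evt_cfg_names) / sizeof(evt_cfg_names[0]))

/* Return the bitmask for @n type names, or 0 if any name is unknown. */
static uint32_t evt_cfg_from_names(const char *const *names, size_t n)
{
	uint32_t mask = 0;
	size_t i, bit;

	for (i = 0; i < n; i++) {
		for (bit = 0; bit < EVT_CFG_NBITS; bit++) {
			if (!strcmp(names[i], evt_cfg_names[bit]))
				break;
		}
		if (bit == EVT_CFG_NBITS)
			return 0;
		mask |= 1u << bit;
	}
	return mask;
}
```

Under this assumed layout the mbm_local_bytes list from the cover
letter maps to 0x15 and the mbm_total_bytes list maps to 0x7f.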

> + * @rdtgrp:		resctrl group assigned to the counter. NULL if the
> + *			counter is free.
> + */
> +struct mbm_cntr_cfg {
> +	enum resctrl_event_id   evtid;
> +	u32                     evt_cfg;
> +	struct rdtgroup         *rdtgrp;

Please align struct member names using TABs.

> +};
> +
>  /**
>   * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
>   * @hdr:		common header for different domain types
> @@ -167,6 +181,7 @@ struct rdt_ctrl_domain {
>   * @cqm_limbo:		worker to periodically read CQM h/w counters
>   * @mbm_work_cpu:	worker CPU for MBM h/w counters
>   * @cqm_work_cpu:	worker CPU for CQM h/w counters
> + * @cntr_cfg:		assignable counters configuration

"array of assignable counters' configuration (indexed by counter ID)"

>   */
>  struct rdt_mon_domain {
>  	struct rdt_domain_hdr		hdr;
> @@ -178,6 +193,7 @@ struct rdt_mon_domain {
>  	struct delayed_work		cqm_limbo;
>  	int				mbm_work_cpu;
>  	int				cqm_work_cpu;
> +	struct mbm_cntr_cfg		*cntr_cfg;
>  };
>  
>  /**

Reinette


* Re: [PATCH v13 10/27] x86/resctrl: Add data structures and definitions for ABMC assignment
  2025-05-15 22:51 ` [PATCH v13 10/27] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2025-05-22 21:10   ` Reinette Chatre
  2025-05-28 19:15     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 21:10 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as the
> counter is assigned. The bandwidth events will be tracked by the hardware
> until the user changes the configuration. Each resctrl group can configure
> maximum two counters, one for total event and one for local event.

(please update, above describes previous design)

> 
> The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh).
> ABMC counter assignment is done by setting the counter id, bandwidth
> source (RMID) and bandwidth configuration. Users will have the option to
> change the bandwidth configuration using resctrl interface which will be
> introduced later in the series.

"will be introduced later in the series" is similar to "in a subsequent patch"
and should not be used in a changelog. Just describe what this patch does.

> 
> Attempts to read or write the MSR when ABMC is not enabled will result
> in a #GP(0) exception.
> 
> Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG
> (0xC000_03FDh):
> =========================================================================
> Bits 	Mnemonic	Description			Access Reset
> 							Type   Value
> =========================================================================
> 63 	CfgEn 		Configuration Enable 		R/W 	0
> 
> 62 	CtrEn 		Enable/disable counting		R/W 	0
> 
> 61:53 	– 		Reserved 			MBZ 	0
> 
> 52:48 	CtrID 		Counter Identifier		R/W	0
> 
> 47 	IsCOS		BwSrc field is a CLOSID		R/W	0
> 			(not an RMID)
> 
> 46:44 	–		Reserved			MBZ	0
> 
> 43:32	BwSrc		Bandwidth Source		R/W	0
> 			(RMID or CLOSID)
> 
> 31:0	BwType		Bandwidth configuration		R/W	0
> 			to track for this counter
> ==========================================================================
> 
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v13: Removed the Reviewed-by tag as there is commit log change to remove
>      BMEC reference.
> 
> v12: No changes.
> 
> v11: No changes.
> 
> v10: No changes.
> 
> v9: Removed the references of L3_QOS_ABMC_DSC.
>     Text changes about configuration in kernel doc.
> 
> v8: Update the configuration notes in kernel_doc.
>     Few commit message update.
> 
> v7: Removed the reference of L3_QOS_ABMC_DSC as it is not used anymore.
>     Moved the configuration notes to kernel_doc.
>     Adjusted the tabs for l3_qos_abmc_cfg and checkpatch seems happy.
> 
> v6: Removed all the fs related changes.
>     Added note on CfgEn,CtrEn.
>     Removed the definitions which are not used.
>     Removed cntr_id initialization.
> 
> v5: Moved assignment flags here (path 10/19 of v4).
>     Added MON_CNTR_UNSET definition to initialize cntr_id's.
>     More details in commit log.
>     Renamed few fields in l3_qos_abmc_cfg for readability.
> 
> v4: Added more descriptions.
>     Changed the name abmc_ctr_id to ctr_id.
>     Added L3_QOS_ABMC_DSC. Used for reading the configuration.
> 
> v3: No changes.
> 
> v2: No changes.
> ---
>  arch/x86/include/asm/msr-index.h       |  1 +
>  arch/x86/kernel/cpu/resctrl/internal.h | 35 ++++++++++++++++++++++++++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 3970e0b16e47..b5b5ebead24f 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1203,6 +1203,7 @@
>  /* - AMD: */
>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
> +#define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
>  #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
>  #define MSR_IA32_EVT_CFG_BASE		0xc0000400
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index fcc9d23686a1..db6b0c28ee6b 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -164,6 +164,41 @@ union cpuid_0x10_x_edx {
>  	unsigned int full;
>  };
>  
> +/*
> + * ABMC counters are configured by writing to L3_QOS_ABMC_CFG.
> + * @bw_type		: Bandwidth configuration (supported by BMEC)
> + *			  tracked by the @cntr_id.

The "supported by BMEC" is unexpected with the new design that separated
the two features.

> + * @bw_src		: Bandwidth source (RMID or CLOSID).
> + * @reserved1		: Reserved.
> + * @is_clos		: @bw_src field is a CLOSID (not an RMID).
> + * @cntr_id		: Counter identifier.
> + * @reserved		: Reserved.
> + * @cntr_en		: Counting enable bit.
> + * @cfg_en		: Configuration enable bit.
> + *
> + * Configuration and counting:
> + * Counter can be configured across multiple writes to MSR. Configuration
> + * is applied only when @cfg_en = 1. Counter @cntr_id is reset when the
> + * configuration is applied.
> + * @cfg_en = 1, @cntr_en = 0 : Apply @cntr_id configuration but do not
> + *                             count events.
> + * @cfg_en = 1, @cntr_en = 1 : Apply @cntr_id configuration and start
> + *                             counting events.
> + */
> +union l3_qos_abmc_cfg {
> +	struct {
> +		unsigned long bw_type  :32,
> +			      bw_src   :12,
> +			      reserved1: 3,
> +			      is_clos  : 1,
> +			      cntr_id  : 5,
> +			      reserved : 9,
> +			      cntr_en  : 1,
> +			      cfg_en   : 1;
> +	} split;
> +	unsigned long full;
> +};
> +
>  void rdt_ctrl_update(void *arg);
>  
>  int rdt_get_mon_l3_config(struct rdt_resource *r);
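
As a cross-check of the bitfield order against the MSR table in the changelog,
the same value can be built with explicit shifts. This is only a user-space
sketch, not part of the patch: abmc_pack() and the ABMC_*_SHIFT names are
hypothetical, and the bit positions are copied from the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the L3_QOS_ABMC_CFG table in the changelog. */
#define ABMC_BW_TYPE_SHIFT	0	/* bits 31:0  */
#define ABMC_BW_SRC_SHIFT	32	/* bits 43:32 */
#define ABMC_IS_CLOS_SHIFT	47	/* bit  47    */
#define ABMC_CNTR_ID_SHIFT	48	/* bits 52:48 */
#define ABMC_CNTR_EN_SHIFT	62	/* bit  62    */
#define ABMC_CFG_EN_SHIFT	63	/* bit  63    */

/* Hypothetical helper: build the MSR value without relying on bitfield layout. */
static uint64_t abmc_pack(uint32_t bw_type, uint32_t bw_src, uint32_t cntr_id,
			  int is_clos, int cntr_en, int cfg_en)
{
	assert(bw_src < (1u << 12));	/* BwSrc is 12 bits wide */
	assert(cntr_id < (1u << 5));	/* CtrID is 5 bits wide */

	return ((uint64_t)bw_type << ABMC_BW_TYPE_SHIFT) |
	       ((uint64_t)bw_src << ABMC_BW_SRC_SHIFT) |
	       ((uint64_t)!!is_clos << ABMC_IS_CLOS_SHIFT) |
	       ((uint64_t)cntr_id << ABMC_CNTR_ID_SHIFT) |
	       ((uint64_t)!!cntr_en << ABMC_CNTR_EN_SHIFT) |
	       ((uint64_t)!!cfg_en << ABMC_CFG_EN_SHIFT);
}
```

The reserved ranges (46:44 and 61:53) are never set here, which matches the MBZ
requirement in the table.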

Reinette

* Re: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-15 22:51 ` [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
@ 2025-05-22 21:51   ` Reinette Chatre
  2025-05-22 22:16     ` Luck, Tony
  2025-05-28 21:39     ` Moger, Babu
  0 siblings, 2 replies; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 21:51 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it
> is assigned. The assigned RMID will be tracked by the hardware until the
> user unassigns it manually.

(please review this often repeated snippet to match new design)

> 
> Implement an architecture-specific handler to assign and unassign the
> counter. Configure counters by writing to the L3_QOS_ABMC_CFG MSR,
> specifying the counter ID, bandwidth source (RMID), and event
> configuration.
> 

...

> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index ff4b2abfa044..e31084f7babd 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -448,3 +448,40 @@ inline bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
>  {
>  	return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
>  }
> +
> +static void resctrl_abmc_config_one_amd(void *info)
> +{
> +	union l3_qos_abmc_cfg *abmc_cfg = info;
> +
> +	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
> +}
> +
> +/*
> + * Send an IPI to the domain to assign the counter to RMID, event pair.
> + */
> +void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> +			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
> +			      u32 cntr_id, u32 evt_cfg, bool assign)
> +{
> +	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
> +	struct arch_mbm_state *am;
> +
> +	abmc_cfg.split.cfg_en = 1;
> +	abmc_cfg.split.cntr_en = assign ? 1 : 0;
> +	abmc_cfg.split.cntr_id = cntr_id;
> +	abmc_cfg.split.bw_src = rmid;
> +	abmc_cfg.split.bw_type = evt_cfg;

Does evt_cfg really need to be programmed when unassigning a counter? Looking ahead at
patch #14, resctrl_free_config_cntr() needs an extra list walk to get this data,
but why would hardware need an accurate event configuration to *unassign* a counter?

It seems unnecessary to provide both the event ID *and* the configuration.
resctrl_arch_config_cntr() could drop the "evt_cfg" parameter and instead there
could be a new resctrl utility that the architecture can use to query the event's configuration.
This would be similar to resctrl_is_mon_event_enabled(), which exposes an event property
and is introduced in
https://lore.kernel.org/lkml/20250521225049.132551-3-tony.luck@intel.com/

It looks to me as though there are a couple of changes in the telemetry work
that would benefit this work. https://lore.kernel.org/lkml/20250521225049.132551-2-tony.luck@intel.com/
switches the monitor events to be maintained in an array indexed by event ID, eliminating the
need for searching the evt_list that this work does in a couple of places. Also note the handy
new for_each_mbm_event() helper (https://lore.kernel.org/lkml/20250521225049.132551-5-tony.luck@intel.com/).
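
To illustrate the idea (a rough user-space sketch only, not code from either
series): with events kept in an array indexed by event ID, a query helper
reduces to a bounds check and an index. The enum, the table contents and
event_lookup() are all hypothetical stand-ins, and the evt_cfg values are
placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for enum resctrl_event_id. */
enum event_id {
	EVT_L3_OCCUP = 1,
	EVT_MBM_TOTAL,
	EVT_MBM_LOCAL,
	EVT_NUM,
};

/* Stand-in for struct mon_evt; the evt_cfg values below are placeholders. */
struct mon_evt {
	const char *name;
	uint32_t evt_cfg;
};

/* Events kept in an array indexed by event ID, so no list walk is needed. */
static const struct mon_evt mon_event_all[EVT_NUM] = {
	[EVT_L3_OCCUP]  = { "llc_occupancy",   0 },
	[EVT_MBM_TOTAL] = { "mbm_total_bytes", 0x7f },
	[EVT_MBM_LOCAL] = { "mbm_local_bytes", 0x15 },
};

/*
 * Hypothetical query helper: return the event entry for @evtid, or NULL if
 * @evtid is out of range or has no entry in the table.
 */
static const struct mon_evt *event_lookup(int evtid)
{
	if (evtid <= 0 || evtid >= EVT_NUM || !mon_event_all[evtid].name)
		return NULL;

	return &mon_event_all[evtid];
}
```

With something like this available to the architecture, the caller would only
need to pass the event ID and the arch code could fetch the configuration when
it actually needs it (that is, on assign).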


> +
> +	smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &abmc_cfg, 1);
> +
> +	/*
> +	 * The hardware counter is reset (because cfg_en == 1) so there is no
> +	 * need to record initial non-zero counts.
> +	 */
> +	if (assign) {
> +		am = get_arch_mbm_state(hw_dom, rmid, evtid);
> +		if (am)
> +			memset(am, 0, sizeof(*am));
> +	}
> +}
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index d77981d1fcb9..59a4fe60ab46 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -559,6 +559,23 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
>   */
>  void resctrl_arch_reset_all_ctrls(struct rdt_resource *r);
>  
> +/**
> + * resctrl_arch_config_cntr() - Configure the counter id to RMID, event
> + *				pair on the domain.

The sentence seems strange, should "Configure the counter" perhaps be
"Assign the counter"? Or if the naming requires "configure" ...
"Configure the counter with its new RMID and event details."? Please feel
free to improve.

> + * @r:			Resource structure.
> + * @d:			Domain that the counter id to be configured.

I am unable to parse description of @d.

> + * @evtid:		Event type to configure.
> + * @rmid:		RMID to configure.
> + * @closid:		CLOSID to configure.
> + * @cntr_id:		Counter ID to configure.

All four parameter descriptions end with "to configure" ... but it is actually only
the counter that is configured while the rest is the data that the counter is configured with, no?

> + * @evt_cfg:		MBM event configuration value representing reads,
> + *			writes etc.

Needs a definition of what the contents of @evt_cfg mean. This is the API ... it
cannot be vague like "reads, writes etc." but should be specific about which bit means
what.

> + * @assign:		Assign or unassign.

"True to assign the counter, false to unassign the counter."


Needs some context here about what architecture can expect on how this function will
be called. For example, "Can be called from any CPU."

> + */
> +void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> +			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
> +			      u32 cntr_id, u32 evt_cfg, bool assign);
> +
>  extern unsigned int resctrl_rmid_realloc_threshold;
>  extern unsigned int resctrl_rmid_realloc_limit;
>  

Reinette

* Re: [PATCH v13 12/27] x86/resctrl: Introduce event configuration modes
  2025-05-15 22:51 ` [PATCH v13 12/27] x86/resctrl: Introduce event configuration modes Babu Moger
@ 2025-05-22 22:05   ` Reinette Chatre
  2025-05-29 15:21     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 22:05 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:
> MBM events can be configured using either BMEC (Bandwidth Monitoring Event
> Configuration) or the mbm_cntr_assign mode.
> 
> Introduce a data structure to represent the various event configuration
> modes and their corresponding values.
> 
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>

I cannot recall suggesting this.

(/me digs)

Are you perhaps referring to https://lore.kernel.org/lkml/d2966a26-4483-4808-a538-bb20973dd2a1@intel.com/

This is not referring to new modes but the existing mbm_cntr_assign modes.
resctrl knows which "mbm_cntr_assign" mode is active and it can use that
to determine whether BMEC can be exposed to user space or not. There is
already enough information in resctrl to know whether BMEC files should be
exposed or not.

I think this work itself makes clear that these modes are useless since
patch #25 that determines whether to hide BMEC files doesn't even
use it.


> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v13: New patch to handle different event configuration types with
>      mbm_cntr_assign mode.
> ---
>  fs/resctrl/internal.h         |  6 ++++--
>  fs/resctrl/monitor.c          |  4 ++--
>  fs/resctrl/rdtgroup.c         |  2 +-
>  include/linux/resctrl_types.h | 11 +++++++++++
>  4 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 9a8cf6f11151..0fae374559ba 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -55,13 +55,15 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
>   * struct mon_evt - Entry in the event list of a resource
>   * @evtid:		event id
>   * @name:		name of the event
> - * @configurable:	true if the event is configurable
> + * @mbm_mode:		monitoring mode (BMEC or mbm_cntr_assign)
> + * @evt_cfg:		event configuration value decoding reads, writes.
>   * @list:		entry in &rdt_resource->evt_list
>   */
>  struct mon_evt {
>  	enum resctrl_event_id	evtid;
>  	char			*name;
> -	bool			configurable;
> +	enum resctrl_mbm_mode	mbm_mode;
> +	u32			evt_cfg;

This very important yet totally unrelated member sneaked in without
any mention.

>  	struct list_head	list;
>  };
>  
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 2548aee0151c..8e403587a02f 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -903,12 +903,12 @@ int resctrl_mon_resource_init(void)
>  	l3_mon_evt_init(r);
>  
>  	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
> -		mbm_total_event.configurable = true;
> +		mbm_total_event.mbm_mode = MBM_MODE_BMEC;
>  		resctrl_file_fflags_init("mbm_total_bytes_config",
>  					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>  	}
>  	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_LOCAL_EVENT_ID)) {
> -		mbm_local_event.configurable = true;
> +		mbm_local_event.mbm_mode = MBM_MODE_BMEC;
>  		resctrl_file_fflags_init("mbm_local_bytes_config",
>  					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>  	}
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 752750e3e443..f192b2736a77 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1152,7 +1152,7 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
>  
>  	list_for_each_entry(mevt, &r->mon.evt_list, list) {
>  		seq_printf(seq, "%s\n", mevt->name);
> -		if (mevt->configurable)
> +		if (mevt->mbm_mode == MBM_MODE_BMEC)

This can instead be a call to a utility that returns whether BMEC should be
visible based on resctrl_mon::mbm_cntr_assignable and rdt_hw_resource::mbm_cntr_assign_enabled
(via resctrl_arch_mbm_cntr_assign_enabled() of course).
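
Something along these lines, as a user-space sketch of one possible reading of
the above. The struct and bmec_files_visible() are hypothetical stand-ins for
the resctrl_mon/rdt_hw_resource state and are not existing resctrl code:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state resctrl already tracks. */
struct mon_state {
	bool bmec_supported;		/* event is configurable via BMEC */
	bool mbm_cntr_assignable;	/* counter assignment is supported */
	bool mbm_cntr_assign_enabled;	/* counter assignment is active */
};

/*
 * Hypothetical utility: BMEC config files are visible only when BMEC is
 * supported and mbm_cntr_assign mode is not the active mode.
 */
static bool bmec_files_visible(const struct mon_state *s)
{
	if (!s->bmec_supported)
		return false;

	return !(s->mbm_cntr_assignable && s->mbm_cntr_assign_enabled);
}
```

No per-event mode needs to be stored for this; the answer follows from state
that is already known.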

>  			seq_printf(seq, "%s_config\n", mevt->name);
>  	}
>  
> diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
> index a25fb9c4070d..26cd1fec72db 100644
> --- a/include/linux/resctrl_types.h
> +++ b/include/linux/resctrl_types.h
> @@ -47,4 +47,15 @@ enum resctrl_event_id {
>  	QOS_NUM_EVENTS,
>  };
>  
> +/*
> + * Event configuration mode.
> + * Events can be configured either in BMEC (Bandwidth Monitoring Event
> + * Configuration) mode or mbm_cntr_assign mode.
> + */
> +enum resctrl_mbm_mode {
> +	MBM_MODE_NONE,
> +	MBM_MODE_BMEC,
> +	MBM_MODE_ASSIGN,
> +};
> +
>  #endif /* __LINUX_RESCTRL_TYPES_H */

Reinette

* RE: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-22 21:51   ` Reinette Chatre
@ 2025-05-22 22:16     ` Luck, Tony
  2025-05-23 21:08       ` Luck, Tony
  2025-05-28 21:39     ` Moger, Babu
  1 sibling, 1 reply; 114+ messages in thread
From: Luck, Tony @ 2025-05-22 22:16 UTC (permalink / raw)
  To: Chatre, Reinette, Babu Moger, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: james.morse@arm.com, dave.martin@arm.com, fenghuay@nvidia.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
	ardb@kernel.org, gregkh@linuxfoundation.org,
	daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
	alexandre.chartre@oracle.com, pawan.kumar.gupta@linux.intel.com,
	thomas.lendacky@amd.com, perry.yuan@amd.com, seanjc@google.com,
	Huang, Kai, Li, Xiaoyao, kan.liang@linux.intel.com, Li, Xin3,
	ebiggers@google.com, xin@zytor.com, Mehta, Sohil,
	andrew.cooper3@citrix.com, mario.limonciello@amd.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	peternewman@google.com, Wieczor-Retman, Maciej, Eranian, Stephane,
	Xiaojian.Du@amd.com, gautham.shenoy@amd.com

> It looks to me as though there are a couple of changes in the telemetry work
> that would benefit this work. https://lore.kernel.org/lkml/20250521225049.132551-2-tony.luck@intel.com/
> switches the monitor events to be maintained in an array indexed by event ID, eliminating the
> need for searching the evt_list that this work does in a couple of places. Also note the handy
> new for_each_mbm_event() helper (https://lore.kernel.org/lkml/20250521225049.132551-5-tony.luck@intel.com/).

Yesterday I ran through the exercise of rebasing my AET patches on top of these
ABMC patches in order to check whether the ABMC patches painted resctrl
into some corner that would be hard to get back out of.

Good news: they don't.

There was a bunch of manual patching to make the first four patches fit on top
of the ABMC code, but I also noticed a few places where things were simpler
after combining the two series.

Maybe a good path forward would be to take those first four patches from
my AET series and then build ABMC on top of those.

-Tony

* Re: [PATCH v13 13/27] x86/resctrl: Add the functionality to assign MBM events
  2025-05-15 22:51 ` [PATCH v13 13/27] x86/resctrl: Add the functionality to assign MBM events Babu Moger
@ 2025-05-22 22:41   ` Reinette Chatre
  2025-05-29 16:05     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 22:41 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:
> The mbm_cntr_assign mode offers "num_mbm_cntrs" number of counters that
> can be assigned to RMID, event pair and monitor the bandwidth as long

"RMID, event pairs"? (assuming at this point in new version it will be
obvious what is meant by "event").

> as it is assigned.
> 
> Add the functionality to allocate and assign a counter to am RMID, event

"am" -> "an"

> pair in the domain.
> 
> If all the counters are in use, kernel will log the error message "Unable
> to allocate counter in domain" in /sys/fs/resctrl/info/last_cmd_status
> when a new assignment is requested. Exit on the first failure when
> assigning counters across all the domains.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...

> ---
>  fs/resctrl/internal.h |   3 +
>  fs/resctrl/monitor.c  | 134 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 137 insertions(+)
> 
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 0fae374559ba..ce4fcac91937 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -377,6 +377,9 @@ bool closid_allocated(unsigned int closid);
>  
>  int resctrl_find_cleanest_closid(void);
>  
> +int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> +			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
> +
>  #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
>  int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
>  
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 8e403587a02f..d76fd0840946 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -934,3 +934,137 @@ void resctrl_mon_resource_exit(void)
>  
>  	dom_data_exit(r);
>  }
> +
> +/*
> + * Configure the counter for the event, RMID pair for the domain. Reset the
> + * non-architectural state to clear all the event counters.

clear *all* the event counters?

"Reset the non-architectural state to clear all the event counters." ->
"Reset the associated non-architectural state."?

Also, please see https://lore.kernel.org/lkml/20250429003359.375508-3-tony.luck@intel.com/

> + */
> +static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> +				enum resctrl_event_id evtid, u32 rmid, u32 closid,
> +				u32 cntr_id, u32 evt_cfg, bool assign)
> +{
> +	struct mbm_state *m;
> +
> +	resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id, evt_cfg, assign);
> +
> +	m = get_mbm_state(d, closid, rmid, evtid);
> +	if (m)
> +		memset(m, 0, sizeof(struct mbm_state));
> +}
> +
> +/*
> + * mbm_cntr_get() - Return the cntr_id for the matching evtid and rdtgrp in
> + *		    cntr_cfg array.

Please prefix parameter names with @ in the description to make obvious what is
referred to. "cntr_id" is a local variable though, so it may be easier to parse
if cntr_id is replaced with the actual term "counter ID" while keeping the rest as
actual parameters. That makes cntr_cfg unneeded.

If intending to explain function context then failure return should also
be documented. Even better would be to follow typical style of kernel-doc
(even if not using /** start) and not mix and match so randomly.

> + */
> +static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
> +			struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{

A subtle issue here is only evident from later patches, for example patch #17,
that calls mbm_cntr_get() with a non MBM event ID from __mon_event_count().

If this usage is expected then these utilities need extra checks to
ensure they are only called with valid MBM event IDs.
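
For example, the lookup could reject non-MBM events up front. This is a
user-space sketch only: the types, is_mbm_event() and cntr_get() are simplified
hypothetical stand-ins for the resctrl ones, and returning -EINVAL for a
non-MBM event is an assumption, not something the patch defines.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for enum resctrl_event_id. */
enum event_id {
	EVT_L3_OCCUP = 1,
	EVT_MBM_TOTAL,
	EVT_MBM_LOCAL,
};

/* Simplified stand-in for struct mbm_cntr_cfg. */
struct cntr_cfg {
	const void *grp;	/* owning group, NULL when the counter is free */
	enum event_id evtid;
};

static bool is_mbm_event(enum event_id evtid)
{
	return evtid == EVT_MBM_TOTAL || evtid == EVT_MBM_LOCAL;
}

/* Return the counter ID assigned to (@grp, @evtid), or a negative errno. */
static int cntr_get(const struct cntr_cfg *cfg, int num_cntrs,
		    const void *grp, enum event_id evtid)
{
	int cntr_id;

	/* Only MBM events can have a counter assigned. */
	if (!is_mbm_event(evtid))
		return -EINVAL;

	for (cntr_id = 0; cntr_id < num_cntrs; cntr_id++) {
		if (cfg[cntr_id].grp == grp && cfg[cntr_id].evtid == evtid)
			return cntr_id;
	}

	return -ENOENT;
}

/* Demo data: counter 1 is assigned to demo_grp's local event. */
static int demo_grp;
static const struct cntr_cfg demo_cfg[4] = {
	[1] = { &demo_grp, EVT_MBM_LOCAL },
};
```

With the demo table, looking up the local event for demo_grp finds counter 1,
the total event is reported as not assigned, and an occupancy event is rejected
before the array is searched.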

> +	int cntr_id;
> +
> +	for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
> +		if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
> +		    d->cntr_cfg[cntr_id].evtid == evtid)
> +			return cntr_id;
> +	}
> +
> +	return -ENOENT;
> +}
> +
> +/*
> + * mbm_cntr_alloc() - Return the first free entry in cntr_cfg array.

"Return the first ...array."  -> "Initialize and return the ID of a new counter, return -ENOSPC on failure."?
This is still an awkward use of kernel-doc ... better to be properly formatted.

> + */
> +static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
> +			  struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> +	int cntr_id;
> +
> +	for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
> +		if (!d->cntr_cfg[cntr_id].rdtgrp) {
> +			d->cntr_cfg[cntr_id].rdtgrp = rdtgrp;
> +			d->cntr_cfg[cntr_id].evtid = evtid;
> +			return cntr_id;
> +		}
> +	}
> +
> +	return -ENOSPC;
> +}
> +
> +/*
> + * mbm_get_mon_event() - Return the mon_evt entry for the matching evtid.
> + */
> +static struct mon_evt *mbm_get_mon_event(struct rdt_resource *r,
> +					 enum resctrl_event_id evtid)
> +{
> +	struct mon_evt *mevt;
> +
> +	list_for_each_entry(mevt, &r->mon.evt_list, list) {
> +		if (mevt->evtid == evtid)
> +			return mevt;
> +	}

With the changes from the telemetry series this becomes an array lookup.

> +
> +	return NULL;
> +}
> +
> +/*
> + * Allocate a fresh counter and configure the event if not assigned already.
> + */
> +static int resctrl_alloc_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> +				     struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> +	struct mon_evt *mevt;
> +	int cntr_id;
> +
> +	/* No need to allocate a new counter if it is already assigned */
> +	cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
> +	if (cntr_id >= 0)
> +		goto cntr_configure;
> +
> +	cntr_id = mbm_cntr_alloc(r, d, rdtgrp, evtid);
> +	if (cntr_id <  0) {
> +		rdt_last_cmd_printf("Unable to allocate counter in domain %d\n",
> +				    d->hdr.id);
> +		return cntr_id;
> +	}
> +
> +cntr_configure:
> +	mevt = mbm_get_mon_event(r, evtid);
> +	if (!mevt) {
> +		rdt_last_cmd_printf("Invalid event id %d\n", evtid);

Difficult to see at this point but it seems that this is in kernel bug territory since
user space provided text that is translated to event ID and here translated back to
monitor event. This must succeed. Could this be simplified and back-and-forth avoided
by passing the mon_evt instead of event ID?

> +		return -EINVAL;
> +	}



> +
> +	/*
> +	 * Skip reconfiguration if the event setup is current; otherwise,
> +	 * update and apply the new configuration to the domain.
> +	 */
> +	if (mevt->evt_cfg != d->cntr_cfg[cntr_id].evt_cfg) {

Lost me. The previous patch silently created mon_evt::evt_cfg without initializing it.
Here it is compared and treated as the "source of truth" ... where does its value
come from?

> +		d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
> +		resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid, rdtgrp->closid,
> +				    cntr_id, mevt->evt_cfg, true);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Assign a hardware counter to event @evtid of group @rdtgrp.
> + * Assign counters to all domains if @d is NULL; otherwise, assign the
> + * counter to the specified domain @d.

This could also mention what the changelog states: the function exits on the first
failure, which means a partial assignment can remain when it exits on such a failure.
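
A small user-space model of that behavior, as a sketch only (struct domain,
domain_assign() and assign_all() are hypothetical and much simpler than the
resctrl code):

```c
#include <errno.h>
#include <stddef.h>

/* Simplified model of a monitoring domain with a pool of counters. */
struct domain {
	int id;
	int free_cntrs;
	int assigned;	/* 1 once a counter is assigned in this domain */
};

/* Assign one counter in @d, or fail with -ENOSPC when none are free. */
static int domain_assign(struct domain *d)
{
	if (d->free_cntrs == 0)
		return -ENOSPC;

	d->free_cntrs--;
	d->assigned = 1;
	return 0;
}

/* Walk all domains and stop at the first failure, without rolling back. */
static int assign_all(struct domain *doms, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = domain_assign(&doms[i]);
		if (ret)
			return ret;
	}

	return 0;
}

/* Demo data: domain 1 has no free counters. */
static struct domain demo_doms[3] = {
	{ 0, 1, 0 },
	{ 1, 0, 0 },
	{ 2, 1, 0 },
};
```

Calling assign_all() on the demo data fails in domain 1, leaves domain 0
assigned and never reaches domain 2, which is the partial state the comment
should warn about.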

> + */
> +int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> +			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> +	int ret = 0;
> +
> +	if (!d) {
> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +			ret = resctrl_alloc_config_cntr(r, d, rdtgrp, evtid);
> +			if (ret)
> +				return ret;
> +		}
> +	} else {
> +		ret = resctrl_alloc_config_cntr(r, d, rdtgrp, evtid);
> +	}
> +
> +	return ret;
> +}

Reinette

* Re: [PATCH v13 14/27] x86/resctrl: Add the functionality to unassign MBM events
  2025-05-15 22:51 ` [PATCH v13 14/27] x86/resctrl: Add the functionality to unassign " Babu Moger
@ 2025-05-22 22:49   ` Reinette Chatre
  2025-05-29 16:25     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 22:49 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:
> The mbm_cntr_assign mode offers "num_mbm_cntrs" number of counters that
> can be assigned to an RMID, event pair and monitor the bandwidth as long
> as it is assigned. If all the counters are in use, the kernel will log the
> error message "Unable to allocate counter in domain" in
> /sys/fs/resctrl/info/last_cmd_status when a new assignment is requested.
> 
> To make space for a new assignment, users must unassign an already
> assigned counter and retry the assignment again.
> 
> Add the functionality to unassign and free the counters in the domain.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...

> ---
>  fs/resctrl/internal.h |  2 ++
>  fs/resctrl/monitor.c  | 60 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 62 insertions(+)
> 
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index ce4fcac91937..64ddc107fcab 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -379,6 +379,8 @@ int resctrl_find_cleanest_closid(void);
>  
>  int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>  			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
> +int resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> +				struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>  
>  #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
>  int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index d76fd0840946..fbc938bd3b23 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -989,6 +989,14 @@ static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
>  	return -ENOSPC;
>  }
>  
> +/*
> + * mbm_cntr_free() -  Reset cntr_id to zero.

"Reset cntr_id to zero"? cntr_id is an index to an array.
Please provide accurate and useful descriptions.

> + */
> +static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
> +{
> +	memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
> +}
> +
>  /*
>   * mbm_get_mon_event() - Return the mon_evt entry for the matching evtid.
>   */
> @@ -1068,3 +1076,55 @@ int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>  
>  	return ret;
>  }
> +
> +/*
> + * Unassign and free the counter if assigned.
> + */
> +static int resctrl_free_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> +				    struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> +	struct mon_evt *mevt;
> +	int cntr_id;
> +
> +	cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
> +
> +	/* If there is no cntr_id assigned, nothing to do */
> +	if (cntr_id < 0)
> +		return 0;
> +
> +	mevt = mbm_get_mon_event(r, evtid);
> +	if (!mevt) {
> +		rdt_last_cmd_printf("Invalid event id %d\n", evtid);

Similar to previous comment this is in kernel bug territory and could be simplified
by passing mon_evt instead. Although this is the unassign portion where 
evt_cfg seems unnecessary.

> +		return -EINVAL;
> +	}
> +
> +	resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid, rdtgrp->closid,
> +			    cntr_id, mevt->evt_cfg, false);
> +
> +	mbm_cntr_free(d, cntr_id);
> +
> +	return 0;
> +}
> +
> +/*
> + * Unassign a hardware counter associated with @evtid from the domain and
> + * the group. Unassign the counters from all the domains if @d is NULL else
> + * unassign from @d.
> + */
> +int  resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> +				 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> +	int ret;
> +
> +	if (!d) {
> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +			ret = resctrl_free_config_cntr(r, d, rdtgrp, evtid);
> +			if (ret)
> +				return ret;
> +		}
> +	} else {
> +		ret = resctrl_free_config_cntr(r, d, rdtgrp, evtid);
> +	}
> +
> +	return ret;
> +}

Reinette


* Re: [PATCH v13 15/27] x86/resctrl: Report 'Unassigned' for MBM events in mbm_cntr_assign mode
  2025-05-15 22:52 ` [PATCH v13 15/27] x86/resctrl: Report 'Unassigned' for MBM events in mbm_cntr_assign mode Babu Moger
@ 2025-05-22 23:01   ` Reinette Chatre
  2025-05-29 16:58     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 23:01 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:52 PM, Babu Moger wrote:
> In mbm_cntr_assign mode, the hardware counter should be assigned to read
> the MBM events.
> 
> Report 'Unassigned' in case the user attempts to read the event without
> assigning a hardware counter.
> 
> Export resctrl_is_mbm_event() and mbm_cntr_get() to allow usage from other
> functions within fs/resctrl.

Please clarify that these two functions are exposed differently:
resctrl_is_mbm_event() is added to include/linux/resctrl.h (also note the similar
change in https://lore.kernel.org/lkml/20250429003359.375508-3-tony.luck@intel.com/),
so it is exposed not just to fs/resctrl but to resctrl fs as well as arch code,
while mbm_cntr_get() remains internal to resctrl fs by being added to
fs/resctrl/internal.h.


> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---


> ---
>  Documentation/filesystems/resctrl.rst |  8 ++++++++
>  fs/resctrl/ctrlmondata.c              | 14 ++++++++++++++
>  fs/resctrl/internal.h                 |  2 ++
>  fs/resctrl/monitor.c                  |  4 ++--
>  fs/resctrl/rdtgroup.c                 |  2 +-
>  include/linux/resctrl.h               |  1 +
>  6 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index 2bfad43aac9c..5cf2d742f04c 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -430,6 +430,14 @@ When monitoring is enabled all MON groups will also contain:
>  	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>  	where "YY" is the node number.
>  
> +	The mbm_cntr_assign mode offers "num_mbm_cntrs" number of counters
> +	and allows users to assign a counter to mon_hw_id, event pair enabling
> +	bandwidth monitoring for as long as the counter remains assigned.
> +	The hardware will continue tracking the assigned mon_hw_id until
> +	the user manually unassigns it, ensuring that counters are not reset
> +	during this period. An MBM event returns 'Unassigned' when the event
> +	does not have a hardware counter assigned.

(please rework based on "event" vs "group" assignment ... not intending
that "group" assignment be documented but the "event" assignment needs
to be accurate for "group" assignment to be a simple extension)

> +
>  "mon_hw_id":
>  	Available only with debug option. The identifier used by hardware
>  	for the monitor group. On x86 this is the RMID.
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index 6ed2dfd4dbbd..f6b8ad24b0b5 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -643,6 +643,18 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  			goto out;
>  		}
>  		d = container_of(hdr, struct rdt_mon_domain, hdr);
> +
> +		/*
> +		 * Report 'Unassigned' if mbm_cntr_assign mode is enabled and
> +		 * counter is unassigned.
> +		 */
> +		if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
> +		    resctrl_is_mbm_event(evtid) &&
> +		    (mbm_cntr_get(r, d, rdtgrp, evtid) < 0)) {
> +			rr.err = -ENOENT;
> +			goto checkresult;
> +		}
> +
>  		mon_event_read(&rr, r, d, rdtgrp, &d->hdr.cpu_mask, evtid, false);
>  	}
>  
> @@ -652,6 +664,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  		seq_puts(m, "Error\n");
>  	else if (rr.err == -EINVAL)
>  		seq_puts(m, "Unavailable\n");
> +	else if (rr.err == -ENOENT)
> +		seq_puts(m, "Unassigned\n");
>  	else
>  		seq_printf(m, "%llu\n", rr.val);
>  

It may be unexpected that this is treated as -ENOENT while the function returns
success. This can be addressed with a comment where the error code is compared
against the other (hardware) error codes.
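
As a rough sketch of how the three strings relate (userspace C, not the patch
itself; mon_err_to_str() is a hypothetical helper and the -EIO case is assumed
from the hunk context), with the kind of comment that could mark -ENOENT as
the software-generated case:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: map the error recorded while reading an event
 * (rr.err in the quoted code) to the string shown to the user.
 */
static const char *mon_err_to_str(int err)
{
	switch (err) {
	case -EIO:
		/* Assumed from context: hardware reported an error. */
		return "Error";
	case -EINVAL:
		/* Hardware reported the data as unavailable. */
		return "Unavailable";
	case -ENOENT:
		/* Set by software: no counter is assigned to the event. */
		return "Unassigned";
	default:
		/* No error: the caller prints the counter value instead. */
		return NULL;
	}
}
```

In all of these cases the read of the file itself still succeeds; only the
text shown to the user differs.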

> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 64ddc107fcab..0dfd2efe68fc 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -381,6 +381,8 @@ int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>  			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>  int resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>  				struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
> +int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
> +		 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>  
>  #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
>  int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index fbc938bd3b23..c98a61bde179 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -956,8 +956,8 @@ static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d
>   * mbm_cntr_get() - Return the cntr_id for the matching evtid and rdtgrp in
>   *		    cntr_cfg array.
>   */
> -static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
> -			struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
> +		 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>  {
>  	int cntr_id;
>  
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index f192b2736a77..72317a5adee2 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -127,7 +127,7 @@ static bool resctrl_is_mbm_enabled(void)
>  		resctrl_arch_is_mbm_local_enabled());
>  }
>  
> -static bool resctrl_is_mbm_event(int e)
> +bool resctrl_is_mbm_event(int e)
>  {
>  	return (e >= QOS_L3_MBM_TOTAL_EVENT_ID &&
>  		e <= QOS_L3_MBM_LOCAL_EVENT_ID);
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 59a4fe60ab46..f78b6064230c 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -441,6 +441,7 @@ static inline u32 resctrl_get_config_index(u32 closid,
>  	}
>  }
>  
> +bool resctrl_is_mbm_event(int e);
>  bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level l);
>  int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
>  

Reinette



* Re: [PATCH v13 16/27] x86/resctrl: Pass entire struct rdtgroup rather than passing individual members
  2025-05-15 22:52 ` [PATCH v13 16/27] x86/resctrl: Pass entire struct rdtgroup rather than passing individual members Babu Moger
@ 2025-05-22 23:05   ` Reinette Chatre
  2025-05-29 18:07     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 23:05 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:52 PM, Babu Moger wrote:
> The mbm_cntr_assign mode requires a cntr_id to read event data. The

cntr_id -> "counter ID"

> cntr_id is retrieved via mbm_cntr_get, which takes a struct rdtgroup as

cntr_id -> "counter ID"

mbm_cntr_get -> mbm_cntr_get()

> a parameter.
> 
> Passing the full rdtgroup also provides access to closid and rmid, both of

closid -> CLOSID
rmid -> RMID

> which are necessary to read monitoring events.
> 
> Refactor the code to pass the entire struct rdtgroup instead of individual

"the entire" -> "a pointer to"

> members in preparation for this requirement.
> 
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
Patch looks good.

Reinette



* Re: [PATCH v13 17/27] x86/resctrl: Add the support for reading ABMC counters
  2025-05-15 22:52 ` [PATCH v13 17/27] x86/resctrl: Add the support for reading ABMC counters Babu Moger
@ 2025-05-22 23:31   ` Reinette Chatre
  2025-05-29 18:25     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-22 23:31 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:52 PM, Babu Moger wrote:
> Software can read the assignable counters using the QM_EVTSEL and QM_CTR
> register pair.

Please append more context on how the register pair is used to support the
changes in this patch.

> 
> QM_EVTSEL Register definition:
> =======================================================
> Bits	Mnemonic	Description
> =======================================================
> 63:44	--		Reserved
> 43:32   RMID		Resource Monitoring Identifier
> 31	ExtEvtID	Extended Event Identifier
> 30:8	--		Reserved
> 7:0	EvtID		Event Identifier
> =======================================================
> 
> The contents of a specific counter can be read by setting the following
> fields in QM_EVTSEL.ExtendedEvtID = 1, QM_EVTSEL.EvtID = L3CacheABMC (=1)
> and setting [RMID] to the desired counter ID. Reading QM_CTR will then
> return the contents of the specified counter. The E bit will be set if the
> counter configuration was invalid, or if an invalid counter ID was set
> in the QM_EVTSEL[RMID] field.

Please rewrite above in imperative tone.
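
For illustration, the register programming described above could be expressed
as follows. This is a userspace sketch only: the macro and function names are
hypothetical, and the field positions are taken from the QM_EVTSEL table in
the changelog.

```c
#include <stdint.h>

/* Field layout as given in the quoted QM_EVTSEL register definition. */
#define QM_EVTSEL_RMID_SHIFT	32
#define QM_EVTSEL_RMID_MASK	0xfffULL	/* bits 43:32 */
#define QM_EVTSEL_EXT_EVT_ID	(1ULL << 31)	/* ExtEvtID */
#define QM_EVTSEL_ABMC_EVT_ID	1ULL		/* EvtID = L3CacheABMC */

/*
 * Hypothetical helper: build the QM_EVTSEL value that selects the
 * assignable counter @cntr_id. The counter ID goes in the RMID field.
 */
static uint64_t qm_evtsel_for_cntr(uint32_t cntr_id)
{
	return (((uint64_t)cntr_id & QM_EVTSEL_RMID_MASK) << QM_EVTSEL_RMID_SHIFT) |
	       QM_EVTSEL_EXT_EVT_ID | QM_EVTSEL_ABMC_EVT_ID;
}
```

The wrmsr() call in __cntr_id_read_phys() writes the same value as two 32-bit
halves, with the counter ID in the high half.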

> 
> Introduce __cntr_id_read_phys() to read the counter ID event data.
> 
> Link: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v13: Split the patch into 2. First one to handle the passing of rdtgroup structure to few
>      functions( __mon_event_count and mbm_update(). Second one to handle ABMC counter reading.
>      Added new function __cntr_id_read_phys() to handle ABMC event reading.
>      Updated kernel doc for resctrl_arch_reset_rmid() and resctrl_arch_rmid_read().
>      Resolved conflicts caused by the recent FS/ARCH code restructure.
>      The monitor.c file has now been split between the FS and ARCH directories.
> 
> v12: New patch to support extended event mode when ABMC is enabled.
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |  6 +++
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 66 ++++++++++++++++++++++----
>  fs/resctrl/monitor.c                   | 14 ++++--
>  include/linux/resctrl.h                |  9 ++--
>  4 files changed, 80 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index db6b0c28ee6b..3b0cdb5520c7 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -40,6 +40,12 @@ struct arch_mbm_state {
>  /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
>  #define ABMC_ENABLE_BIT			0
>  
> +/*
> + * ABMC Qos Event Identifiers.

QoS?

> + */
> +#define ABMC_EXTENDED_EVT_ID		BIT(31)
> +#define ABMC_EVT_ID			1

Please use BIT(0) to be consistent.

> +
>  /**
>   * struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
>   *			       a resource for a control function
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index e31084f7babd..36a03dae6d8e 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -161,6 +161,41 @@ static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
>  	return 0;
>  }
>  
> +static int __cntr_id_read_phys(u32 cntr_id, u64 *val)
> +{
> +	u64 msr_val;
> +
> +	/*
> +	 * QM_EVTSEL Register definition:
> +	 * =======================================================
> +	 * Bits    Mnemonic        Description
> +	 * =======================================================
> +	 * 63:44   --              Reserved
> +	 * 43:32   RMID            Resource Monitoring Identifier
> +	 * 31      ExtEvtID        Extended Event Identifier
> +	 * 30:8    --              Reserved
> +	 * 7:0     EvtID           Event Identifier
> +	 * =======================================================
> +	 * The contents of a specific counter can be read by setting the
> +	 * following fields in QM_EVTSEL.ExtendedEvtID(=1) and
> +	 * QM_EVTSEL.EvtID = L3CacheABMC (=1) and setting [RMID] to the
> +	 * desired counter ID. Reading QM_CTR will then return the
> +	 * contents of the specified counter. The E bit will be set if the
> +	 * counter configuration was invalid, or if an invalid counter ID
> +	 * was set in the QM_EVTSEL[RMID] field.
> +	 */
> +	wrmsr(MSR_IA32_QM_EVTSEL, ABMC_EXTENDED_EVT_ID | ABMC_EVT_ID, cntr_id);
> +	rdmsrl(MSR_IA32_QM_CTR, msr_val);
> +
> +	if (msr_val & RMID_VAL_ERROR)
> +		return -EIO;
> +	if (msr_val & RMID_VAL_UNAVAIL)
> +		return -EINVAL;
> +
> +	*val = msr_val;
> +	return 0;
> +}
> +
>  static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
>  						 u32 rmid,
>  						 enum resctrl_event_id eventid)
> @@ -180,7 +215,7 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_do
>  }
>  
>  void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
> -			     u32 unused, u32 rmid,
> +			     u32 unused, u32 rmid, int cntr_id,
>  			     enum resctrl_event_id eventid)
>  {
>  	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> @@ -192,9 +227,16 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
>  	if (am) {
>  		memset(am, 0, sizeof(*am));
>  
> -		prmid = logical_rmid_to_physical_rmid(cpu, rmid);
> -		/* Record any initial, non-zero count value. */
> -		__rmid_read_phys(prmid, eventid, &am->prev_msr);
> +		if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
> +		    resctrl_is_mbm_event(eventid)) {
> +			if (cntr_id < 0)

This would be a bug, no? How about WARN_ON_ONCE()?

> +				return;
> +			__cntr_id_read_phys(cntr_id, &am->prev_msr);
> +		} else {
> +			prmid = logical_rmid_to_physical_rmid(cpu, rmid);
> +			/* Record any initial, non-zero count value. */
> +			__rmid_read_phys(prmid, eventid, &am->prev_msr);
> +		}
>  	}
>  }
>  
> @@ -224,8 +266,8 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
>  }
>  
>  int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
> -			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
> -			   u64 *val, void *ignored)
> +			   u32 unused, u32 rmid, int cntr_id,
> +			   enum resctrl_event_id eventid, u64 *val, void *ignored)
>  {
>  	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> @@ -237,8 +279,16 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
>  
>  	resctrl_arch_rmid_read_context_check();
>  
> -	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
> -	ret = __rmid_read_phys(prmid, eventid, &msr_val);
> +	if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
> +	    resctrl_is_mbm_event(eventid)) {
> +		if (cntr_id < 0)

WARN_ON_ONCE()?

> +			return cntr_id;
> +		ret = __cntr_id_read_phys(cntr_id, &msr_val);
> +	} else {
> +		prmid = logical_rmid_to_physical_rmid(cpu, rmid);
> +		ret = __rmid_read_phys(prmid, eventid, &msr_val);
> +	}
> +
>  	if (ret)
>  		return ret;
>  
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index a477be9cdb66..72f3dfb5b903 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -159,7 +159,11 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free)
>  			break;
>  
>  		entry = __rmid_entry(idx);
> -		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
> +		/*
> +		 * cntr_id is not relevant for QOS_L3_OCCUP_EVENT_ID.
> +		 * Pass dummy value -1.
> +		 */
> +		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid, -1,
>  					   QOS_L3_OCCUP_EVENT_ID, &val,
>  					   arch_mon_ctx)) {
>  			rmid_dirty = true;
> @@ -359,6 +363,7 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
>  
>  static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
>  {
> +	int cntr_id = mbm_cntr_get(rr->r, rr->d, rdtgrp, rr->evtid);

So mbm_cntr_get() is called on *all* events (even non-MBM) whether assignable counters
are supported or not. I assume it relies on num_mbm_cntrs being zero on non-ABMC systems,
but I think it needs to be explicit that mbm_cntr_get() returns -ENOENT in these cases.
Any developer attempting to modify mbm_cntr_get() needs to be aware of this usage.

It is quite subtle that resctrl_arch_reset_rmid() and resctrl_arch_rmid_read()
can be called with a negative counter ID. To help with code health this needs to
be highlighted (more later).
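
A simplified userspace model of the behavior being relied on, as a sketch
only: the struct layouts and names below are stand-ins and not the kernel
definitions. With num_mbm_cntrs == 0 the loop body never runs and the lookup
falls through to -ENOENT.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for one cntr_cfg entry; member names are assumptions. */
struct cntr_cfg_model {
	const void	*rdtgrp;	/* Group owning the counter, NULL when free */
	int		evtid;		/* Event the counter is assigned to */
};

/* Stand-in for the per-domain counter state. */
struct mon_domain_model {
	struct cntr_cfg_model	*cntr_cfg;
	int			num_mbm_cntrs;	/* 0 without assignable counters */
};

/*
 * Return the ID of the counter assigned to (@rdtgrp, @evtid) in @d.
 * Return -ENOENT when no such counter exists, which includes domains
 * with no assignable counters and events that are not MBM events.
 */
static int cntr_get_model(const struct mon_domain_model *d,
			  const void *rdtgrp, int evtid)
{
	int cntr_id;

	for (cntr_id = 0; cntr_id < d->num_mbm_cntrs; cntr_id++) {
		if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
		    d->cntr_cfg[cntr_id].evtid == evtid)
			return cntr_id;
	}

	return -ENOENT;
}
```

Callers such as the two arch helpers then have to treat any negative value as
"no counter" rather than as an index.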

>  	int cpu = smp_processor_id();
>  	u32 closid = rdtgrp->closid;
>  	u32 rmid = rdtgrp->mon.rmid;
> @@ -368,7 +373,7 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
>  	u64 tval = 0;
>  
>  	if (rr->first) {
> -		resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
> +		resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, cntr_id, rr->evtid);
>  		m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
>  		if (m)
>  			memset(m, 0, sizeof(struct mbm_state));
> @@ -379,7 +384,7 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
>  		/* Reading a single domain, must be on a CPU in that domain. */
>  		if (!cpumask_test_cpu(cpu, &rr->d->hdr.cpu_mask))
>  			return -EINVAL;
> -		rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid,
> +		rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid, cntr_id,
>  						 rr->evtid, &tval, rr->arch_mon_ctx);
>  		if (rr->err)
>  			return rr->err;
> @@ -404,7 +409,8 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
>  	list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
>  		if (d->ci->id != rr->ci->id)
>  			continue;
> -		err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
> +		cntr_id = mbm_cntr_get(rr->r, d, rdtgrp, rr->evtid);
> +		err = resctrl_arch_rmid_read(rr->r, d, closid, rmid, cntr_id,
>  					     rr->evtid, &tval, rr->arch_mon_ctx);
>  		if (!err) {
>  			rr->val += tval;
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index f78b6064230c..cd24d1577e0a 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -473,6 +473,7 @@ void resctrl_offline_cpu(unsigned int cpu);
>   *			counter may match traffic of both @closid and @rmid, or @rmid
>   *			only.
>   * @rmid:		rmid of the counter to read.
> + * @cntr_id:		cntr_id to read MBM events with mbm_cntr_assign mode.

"Counter ID used to read MBM events in mbm_cntr_evt_assign mode. Only valid when
 mbm_cntr_evt_assign mode is enabled and @eventid is an MBM event. Can be negative
 when invalid." (Please feel free to improve)

>   * @eventid:		eventid to read, e.g. L3 occupancy.
>   * @val:		result of the counter read in bytes.
>   * @arch_mon_ctx:	An architecture specific value from
> @@ -490,8 +491,9 @@ void resctrl_offline_cpu(unsigned int cpu);
>   * 0 on success, or -EIO, -EINVAL etc on error.
>   */
>  int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
> -			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
> -			   u64 *val, void *arch_mon_ctx);
> +			   u32 closid, u32 rmid, int cntr_id,
> +			   enum resctrl_event_id eventid, u64 *val,
> +			   void *arch_mon_ctx);
>  
>  /**
>   * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
> @@ -532,12 +534,13 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
>   * @closid:	closid that matches the rmid. Depending on the architecture, the
>   *		counter may match traffic of both @closid and @rmid, or @rmid only.
>   * @rmid:	The rmid whose counter values should be reset.
> + * @cntr_id:	The cntr_id to read MBM events with mbm_cntr_assign mode.

Same as above.

>   * @eventid:	The eventid whose counter values should be reset.
>   *
>   * This can be called from any CPU.
>   */
>  void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
> -			     u32 closid, u32 rmid,
> +			     u32 closid, u32 rmid, int cntr_id,
>  			     enum resctrl_event_id eventid);
>  
>  /**

Reinette


* Re: [PATCH v13 18/27] x86/resctrl: Add definitions for MBM event configuration
  2025-05-15 22:52 ` [PATCH v13 18/27] x86/resctrl: Add definitions for MBM event configuration Babu Moger
@ 2025-05-23  4:41   ` Reinette Chatre
  2025-05-29 19:00     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-23  4:41 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:52 PM, Babu Moger wrote:
> The "mbm_cntr_assign" mode allows users to manually assign a hardware
> counter to a specific RMID and event pair. The events available for
> assignment are configurable.
> 
> By default, each resctrl group supports two MBM events: mbm_total_bytes
> and mbm_local_bytes. Each event corresponds to an MBM configuration that
> specifies the bandwidth sources tracked by the event.

hmmm ... earlier I thought "bandwidth source" means RMID but here it
seems to mean the memory transactions? The various terms are confusing.

> 
> Add definitions of supported bandwidth sources.

The changelog uses "bandwidth sources" while the comments in the patch
use "memory transactions" ... please be consistent with terms.

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v13: Updated the changelog.
>      Removed the definitions from resctrl_types.h and moved to internal.h.
>      Removed mbm_assign_config definition. Configurations will be part of
>      mon_evt list.
>      Resolved conflicts caused by the recent FS/ARCH code restructure.
>      The rdtgroup.c file has now been split between the FS and ARCH directories.
> 
> v12: New patch to support event configurations via new counter_configs
>      method.
> ---
>  fs/resctrl/internal.h | 10 ++++++++++
>  fs/resctrl/rdtgroup.c | 14 ++++++++++++++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 0dfd2efe68fc..019d00bf5adf 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -203,6 +203,16 @@ struct rdtgroup {
>  	struct pseudo_lock_region	*plr;
>  };
>  
> +/**
> + * struct mbm_evt_value - Specific type of memory events.

I am trying to decipher the terminology. If these are events, then it becomes confusing
since it becomes "these events are used to configure events". You mention "memory
transaction" below, this sounds more accurate to me. Above could thus be:

struct mbm_evt_value - Memory transaction an MBM event can be configured with.

The name of the struct could also do with a rename to avoid the "event" term that
conflicts with the actual MBM events. Maybe "mbm_cfg_value" ... I do not think this
is a good name so please consider what would work better.

> + * @evt_name:		Name of memory transaction type (read, write etc).

Unclear what "type" means ... maybe just "Name of memory transaction (read, write ...)"?

The "evt_" prefix looks unnecessary.

> + * @evt_val:		Value representing the memory transaction.

This could just be "val" and the description could be specific:

"The bit used to represent the memory transaction within an event's configuration."
Please feel free to improve.

> + */
> +struct mbm_evt_value {
> +	char    evt_name[32];
> +	u32     evt_val;

Please space member names with TABs.

> +};
> +
>  /* rdtgroup.flags */
>  #define	RDT_DELETED		1
>  
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 72317a5adee2..b109e91096b0 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -75,6 +75,20 @@ static void rdtgroup_destroy_root(void);
>  
>  struct dentry *debugfs_resctrl;
>  
> +/* Number of memory transaction types that can be monitored */

"Number of memory transactions that an MBM event can be configured with."?

> +#define NUM_MBM_EVT_VALUES             7
> +
> +/* Decoded values for each type of memory events */

Please be consistent with terminology. In the above lines it switches
between "memory transaction types" and "memory events".

> +struct mbm_evt_value mbm_evt_values[NUM_MBM_EVT_VALUES] = {
> +	{"local_reads", READS_TO_LOCAL_MEM},
> +	{"remote_reads", READS_TO_REMOTE_MEM},
> +	{"local_non_temporal_writes", NON_TEMP_WRITE_TO_LOCAL_MEM},
> +	{"remote_non_temporal_writes", NON_TEMP_WRITE_TO_REMOTE_MEM},
> +	{"local_reads_slow_memory", READS_TO_LOCAL_S_MEM},
> +	{"remote_reads_slow_memory", READS_TO_REMOTE_S_MEM},
> +	{"dirty_victim_writes_all", DIRTY_VICTIMS_TO_ALL_MEM},
> +};
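
To make the terminology concrete, here is a userspace sketch of how such a
table could be used to turn an event's configuration into the list of memory
transactions it counts. All names are hypothetical, and the bit positions are
an assumption taken from the resctrl.rst table in patch 19, since the
READS_TO_LOCAL_MEM etc. values are not visible in this hunk.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the struct under review, with shorter member names. */
struct mbm_txn_model {
	const char	*name;	/* Name of the memory transaction */
	uint32_t	val;	/* Bit representing it in an event's configuration */
};

/* Bit positions assumed from the documentation table in patch 19. */
static const struct mbm_txn_model mbm_txns[] = {
	{ "local_reads",		1U << 0 },
	{ "remote_reads",		1U << 1 },
	{ "local_non_temporal_writes",	1U << 2 },
	{ "remote_non_temporal_writes",	1U << 3 },
	{ "local_reads_slow_memory",	1U << 4 },
	{ "remote_reads_slow_memory",	1U << 5 },
	{ "dirty_victim_writes_all",	1U << 6 },
};

/*
 * Write the comma separated names of all memory transactions set in
 * @evt_cfg to @buf. Stops early, dropping the partial name, if @buf
 * is too small. Returns @buf.
 */
static char *mbm_txn_decode(uint32_t evt_cfg, char *buf, size_t len)
{
	size_t i, off = 0;
	int n;

	if (!len)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(mbm_txns) / sizeof(mbm_txns[0]); i++) {
		if (!(evt_cfg & mbm_txns[i].val))
			continue;
		n = snprintf(buf + off, len - off, "%s%s",
			     off ? ", " : "", mbm_txns[i].name);
		if (n < 0 || (size_t)n >= len - off) {
			buf[off] = '\0';
			break;
		}
		off += (size_t)n;
	}

	return buf;
}
```

With these assumed bit positions, a configuration of 0x15 decodes to the three
local transactions shown for mbm_local_bytes in the event_filter example of
patch 19.
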
> +
>  /*
>   * Memory bandwidth monitoring event to use for the default CTRL_MON group
>   * and each new CTRL_MON group created by the user.  Only relevant when

Reinette


* Re: [PATCH v13 19/27] x86/resctrl: Add event configuration directory under info/L3_MON/
  2025-05-15 22:52 ` [PATCH v13 19/27] x86/resctrl: Add event configuration directory under info/L3_MON/ Babu Moger
@ 2025-05-23  4:43   ` Reinette Chatre
  2025-05-29 19:54     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-23  4:43 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:52 PM, Babu Moger wrote:
> Create the configuration directory and files for mbm_cntr_assign mode.
> These configurations will be used to assign MBM events in mbm_cntr_assign
> mode, with two default configurations created upon mounting.

This just jumps in with what the patch does. Requirements for a proper changelog
should be familiar by now. The changelog *always* starts with context.

Sample:

"When assignable counters are supported the
/sys/fs/resctrl/info/L3_MON/event_configs directory contains a sub-directory
for each MBM event that can be assigned to a counter. The MBM event
sub-directory contains a file named "event_filter" that is used to
view and modify which memory transactions the MBM event is configured with.

Create the /sys/fs/resctrl/info/L3_MON/event_configs directory on resctrl
mount and pre-populate it with directories for the two existing MBM events:
mbm_total_bytes and mbm_local_bytes. Create the "event_filter" file within
each MBM event directory with the needed *show() that displays the memory
transactions with which the MBM event is configured."

> 
> Example:
> $ cd /sys/fs/resctrl/
> $ cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>   local_reads, remote_reads, local_non_temporal_writes,
>   remote_non_temporal_writes, local_reads_slow_memory,
>   remote_reads_slow_memory, dirty_victim_writes_all
> 
> $ cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>   local_reads, local_non_temporal_writes, local_reads_slow_memory
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v13: Updated user doc (resctrl.rst).
>      Changed the name of the function resctrl_mkdir_info_configs to
>      resctrl_mkdir_counter_configs().
>      Replaced seq_puts() with seq_putc() where applicable.
>      Removed RFTYPE_MON_CONFIG definition. Not required.
>      Changed the name of the flag RFTYPE_CONFIG to RFTYPE_ASSIGN_CONFIG.
>      Reinette suggested RFTYPE_MBM_EVENT_CONFIG but RFTYPE_ASSIGN_CONFIG
>      seemed shorter and pricise.
>      The configuration is created using evt_list.
>      Resolved conflicts caused by the recent FS/ARCH code restructure.
>      The monitor.c/rdtgroup.c files have been split between the FS and ARCH directories.
> 
> v12: New patch to hold the MBM event configurations for mbm_cntr_assign mode.
> ---
>  Documentation/filesystems/resctrl.rst | 30 ++++++++++
>  fs/resctrl/internal.h                 |  2 +
>  fs/resctrl/monitor.c                  |  1 +
>  fs/resctrl/rdtgroup.c                 | 80 +++++++++++++++++++++++++++
>  4 files changed, 113 insertions(+)
> 
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index 5cf2d742f04c..4eb9f007ba3d 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -306,6 +306,36 @@ with the following files:
>  	  # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>  	  0=30;1=30
>  
> +"counter_configs":
> +	When the "mbm_cntr_assign" mode is supported, a dedicated directory is created
> +	under the "L3_MON" directory to store configuration files.

? it does not contain files but directories for each event, no?

It will help if the text is specific. For example,
	"event_configs":
	Directory that exists when mbm_cntr_evt_assign is supported. Contains a sub-directory
	for each MBM event that can be assigned to a counter. Each MBM event
	sub-directory ...

> +
> +	These files contain the list of configurable events. There are two default

So confusing ... terminology is all over the place. Which files are even talked about here?
"configurable events" ... are these the memory transactions or MBM events? 

> +	configurations: mbm_local_bytes and mbm_total_bytes.

"two default configurations"? These are not "configurations" but "events", no?

> +
> +	Following types of events are supported:

events -> memory transactions?

I am unable to parse the above.


> +
> +	==== ========================= ============================================================
> +	Bits Name   		         Description
> +	==== ========================= ============================================================
> +	6    dirty_victim_writes_all     Dirty Victims from the QOS domain to all types of memory
> +	5    remote_reads_slow_memory    Reads to slow memory in the non-local NUMA domain
> +	4    local_reads_slow_memory     Reads to slow memory in the local NUMA domain
> +	3    remote_non_temporal_writes  Non-temporal writes to non-local NUMA domain
> +	2    local_non_temporal_writes   Non-temporal writes to local NUMA domain
> +	1    remote_reads                Reads to memory in the non-local NUMA domain
> +	0    local_reads                 Reads to memory in the local NUMA domain
> +	==== ========================= ==========================================================

Why does user need to know the bit position used to represent the memory transaction?

> +
> +	For example::
> +
> +	  # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
> +	  local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
> +	  local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
> +
> +	  # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> +	  local_reads, local_non_temporal_writes, local_reads_slow_memory
> +
>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
>  		bytes) at which a previously used LLC_occupancy
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 019d00bf5adf..446cc9cc61df 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -238,6 +238,8 @@ struct mbm_evt_value {
>  
>  #define RFTYPE_DEBUG			BIT(10)
>  
> +#define RFTYPE_ASSIGN_CONFIG		BIT(11)
> +
>  #define RFTYPE_CTRL_INFO		(RFTYPE_INFO | RFTYPE_CTRL)
>  
>  #define RFTYPE_MON_INFO			(RFTYPE_INFO | RFTYPE_MON)
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 72f3dfb5b903..1f72249a5c93 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -932,6 +932,7 @@ int resctrl_mon_resource_init(void)
>  					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>  		resctrl_file_fflags_init("available_mbm_cntrs",
>  					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
> +		resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
>  	}
>  
>  	return 0;
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index b109e91096b0..cf84e3a382ac 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1911,6 +1911,25 @@ static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
>  	return ret;
>  }
>  
> +static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq, void *v)
> +{
> +	struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
> +	bool sep = false;
> +	int i;
> +
> +	for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
> +		if (mevt->evt_cfg & mbm_evt_values[i].evt_val) {

Still no idea where mevt->evt_cfg comes from. Patch ordering issue?

> +			if (sep)
> +				seq_putc(seq, ',');
> +			seq_printf(seq, "%s", mbm_evt_values[i].evt_name);
> +			sep = true;
> +		}
> +	}
> +	seq_putc(seq, '\n');
> +
> +	return 0;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{
> @@ -2035,6 +2054,12 @@ static struct rftype res_common_files[] = {
>  		.seq_show	= mbm_local_bytes_config_show,
>  		.write		= mbm_local_bytes_config_write,
>  	},
> +	{
> +		.name		= "event_filter",
> +		.mode		= 0444,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= event_filter_show,
> +	},
>  	{
>  		.name		= "mbm_assign_mode",
>  		.mode		= 0444,
> @@ -2317,6 +2342,55 @@ static int rdtgroup_mkdir_info_resdir(void *priv, char *name,
>  	return ret;
>  }
>  
> +static int resctrl_mkdir_counter_configs(struct rdt_resource *r, char *name)
> +{
> +	struct kernfs_node *l3_mon_kn, *kn_subdir, *kn_subdir2;
> +	struct mon_evt *mevt;
> +	int ret;
> +
> +	l3_mon_kn = kernfs_find_and_get(kn_info, name);
> +	if (!l3_mon_kn)
> +		return -ENOENT;
> +
> +	kn_subdir = kernfs_create_dir(l3_mon_kn, "counter_configs", l3_mon_kn->mode, NULL);
> +	if (IS_ERR(kn_subdir)) {
> +		kernfs_put(l3_mon_kn);
> +		return PTR_ERR(kn_subdir);
> +	}
> +
> +	ret = rdtgroup_kn_set_ugid(kn_subdir);
> +	if (ret) {
> +		kernfs_put(l3_mon_kn);
> +		return ret;
> +	}
> +
> +	list_for_each_entry(mevt, &r->mon.evt_list, list) {
> +		if (mevt->mbm_mode == MBM_MODE_ASSIGN) {

I do not think this "mbm_mode" is needed. resctrl_mon::mbm_cntr_assignable is already used
earlier, so would for_each_mbm_event() from the telemetry work be useful here?

> +			kn_subdir2 = kernfs_create_dir(kn_subdir, mevt->name,
> +						       kn_subdir->mode, mevt);
> +			if (IS_ERR(kn_subdir2)) {
> +				ret = PTR_ERR(kn_subdir2);
> +				goto config_out;

"grep goto fs/resctrl/rdtgroup.c" for naming conventions.

> +			}
> +
> +			ret = rdtgroup_kn_set_ugid(kn_subdir2);
> +			if (ret)
> +				goto config_out;
> +
> +			ret = rdtgroup_add_files(kn_subdir2, RFTYPE_ASSIGN_CONFIG);
> +			if (!ret)
> +				kernfs_activate(kn_subdir);
> +		}
> +	}
> +
> +config_out:
> +	kernfs_put(l3_mon_kn);
> +	if (ret)
> +		kernfs_remove(kn_subdir);

This looks unnecessary since caller does kernfs_remove() on error return. Compare
with how rdtgroup_mkdir_info_resdir() handles errors.

> +
> +	return ret;
> +}
> +
>  static unsigned long fflags_from_resource(struct rdt_resource *r)
>  {
>  	switch (r->rid) {
> @@ -2363,6 +2437,12 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
>  		ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
>  		if (ret)
>  			goto out_destroy;
> +
> +		if (r->mon.mbm_cntr_assignable) {
> +			ret = resctrl_mkdir_counter_configs(r, name);
> +			if (ret)
> +				goto out_destroy;
> +		}
>  	}
>  
>  	ret = rdtgroup_kn_set_ugid(kn_info);

Reinette


* Re: [PATCH v13 20/27] x86/resctrl: Provide interface to update the event configurations
  2025-05-15 22:52 ` [PATCH v13 20/27] x86/resctrl: Provide interface to update the event configurations Babu Moger
@ 2025-05-23  4:45   ` Reinette Chatre
  2025-05-29 22:35     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-23  4:45 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:52 PM, Babu Moger wrote:
> Users can modify the event configuration by writing to the event_filter
> interface file. The event configurations for mbm_cntr_assign mode are
> located in /sys/fs/resctrl/info/event_configs/.

heh ... looks like you also started thinking that "event_configs"
is a better name (also missing L3_MON).

> 
> Update the assignments of all groups when the event configuration is
> modified.
> 
> Example:
> $ cd /sys/fs/resctrl/
> 
> $ cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>   local_reads,local_non_temporal_writes,local_reads_slow_memory
> 
> $ echo "local_reads,local_non_temporal_writes" >
>   info/L3_MON/counter_configs/mbm_total_bytes/event_filter
> 
> $ cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>   local_reads,local_non_temporal_writes
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v13: Updated changelog for imperative mode.
>      Added function description in the prototype.
>      Updated the user doc resctrl.rst to address few feedback.
>      Resolved conflicts caused by the recent FS/ARCH code restructure.
>      The rdtgroup.c/monitor.c file has now been split between the FS and ARCH directories.
> 
> v12: New patch to modify event configurations.
> ---
>  Documentation/filesystems/resctrl.rst |  12 +++
>  fs/resctrl/rdtgroup.c                 | 120 +++++++++++++++++++++++++-
>  2 files changed, 131 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index 4eb9f007ba3d..9923276826db 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst

...

> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index cf84e3a382ac..8c498b41be5d 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1930,6 +1930,123 @@ static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq,
>  	return 0;
>  }
>  
> +/**
> + * resctrl_group_assign - Update the counter assignments for the event in
> + *			  a group.

This name is very generic with an unexpected namespace. "rdtgroup_" prefix
is often used for a function that operates on a rdtgroup. This can thus be
"rdtgroup_assign_cntr()".

> + * @r:		Resource to which update needs to be done.
> + * @rdtgrp:	Resctrl group.
> + * @evtid:	Event ID.
> + * @evt_cfg:	Event configuration value.
> + */
> +static int resctrl_group_assign(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> +				enum resctrl_event_id evtid, u32 evt_cfg)
> +{
> +	struct rdt_mon_domain *d;
> +	int cntr_id;
> +
> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +		cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
> +		if (cntr_id >= 0 && d->cntr_cfg[cntr_id].evt_cfg != evt_cfg) {
> +			d->cntr_cfg[cntr_id].evt_cfg = evt_cfg;
> +			resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
> +						 rdtgrp->closid, cntr_id, evt_cfg, true);
> +		}
> +	}
> +
> +	return 0;

Can just return void?

> +}
> +
> +/**
> + * resctrl_update_assign - Update the counter assignments for the event for all
> + *			   the groups.

Again very generic with "update" and "assign" that seem redundant? How about
"resctrl_assign_cntr_allrdtgrp()"?

> + * @r:		Resource to which update needs to be done.
> + * @evtid:	Event ID.
> + * @evt_cfg:	Event configuration value.

Why are both event ID and evt_cfg needed? Could just passing mon_evt simplify this?

> + */
> +static int resctrl_update_assign(struct rdt_resource *r, enum resctrl_event_id evtid,
> +				 u32 evt_cfg)
> +{
> +	struct rdtgroup *prgrp, *crgrp;
> +
> +	/* Check if the cntr_id is associated to the event type updated */

Comment does not match code.

> +	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> +		resctrl_group_assign(r, prgrp, evtid, evt_cfg);
> +
> +		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) {
> +			resctrl_group_assign(r, crgrp, evtid, evt_cfg);
> +		}

Unnecessary braces?

> +	}
> +
> +	return 0;

return void?

> +}
> +
> +static int resctrl_process_configs(char *tok, u32 *val)
> +{
> +	char *evt_str;
> +	bool found;
> +	int i;
> +
> +next_config:
> +	if (!tok || tok[0] == '\0')
> +		return 0;
> +
> +	/* Start processing the strings for each event type */

Does comment intend to describe one iteration or all iterations?
Also, "event type" -> "memory transaction"?

> +	evt_str = strim(strsep(&tok, ","));
> +	found = false;
> +	for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
> +		if (!strcmp(mbm_evt_values[i].evt_name, evt_str)) {
> +			*val |=  mbm_evt_values[i].evt_val;

check spacing.

> +			found = true;
> +			break;
> +		}
> +	}
> +
> +	if (!found) {
> +		rdt_last_cmd_printf("Invalid event type %s\n", evt_str);
> +		return -EINVAL;

Looks like this will return partially initialized data. Please use a local
variable in which to gather the new configuration and only assign that
to the provided pointer on success.
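
To illustrate the local-variable approach, here is a minimal userspace
sketch, not the patch's code. The names evt_values[], trim() and
parse_event_filter() are hypothetical stand-ins for mbm_evt_values[],
strim() and resctrl_process_configs(). The bit values are assumed from the
table in the documentation hunk of the previous patch.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's mbm_evt_values[] table. */
struct evt_value {
	const char *name;
	uint32_t val;
};

static const struct evt_value evt_values[] = {
	{ "local_reads",                1u << 0 },
	{ "remote_reads",               1u << 1 },
	{ "local_non_temporal_writes",  1u << 2 },
	{ "remote_non_temporal_writes", 1u << 3 },
	{ "local_reads_slow_memory",    1u << 4 },
	{ "remote_reads_slow_memory",   1u << 5 },
	{ "dirty_victim_writes_all",    1u << 6 },
};

#define NUM_EVT_VALUES (sizeof(evt_values) / sizeof(evt_values[0]))

/* Strip leading and trailing whitespace in place (userspace strim() analogue). */
static char *trim(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return s;
}

/*
 * Parse a comma-separated list of memory transaction names from @buf
 * (modified in place). The result is gathered in a local variable and
 * stored to @val only when every token is valid, so a failed parse never
 * leaves a partial configuration behind.
 */
static int parse_event_filter(char *buf, uint32_t *val)
{
	uint32_t cfg = 0;
	char *next = buf;

	while (next && *next != '\0') {
		char *tok = next;
		char *comma = strchr(tok, ',');
		size_t i;

		if (comma) {
			*comma = '\0';
			next = comma + 1;
		} else {
			next = NULL;
		}

		tok = trim(tok);
		for (i = 0; i < NUM_EVT_VALUES; i++) {
			if (!strcmp(evt_values[i].name, tok)) {
				cfg |= evt_values[i].val;
				break;
			}
		}
		if (i == NUM_EVT_VALUES)
			return -EINVAL;	/* *val is left untouched */
	}

	*val = cfg;
	return 0;
}
```

With this shape the caller's evt_cfg keeps its previous value when any
token is rejected, so event_filter_write() would not need to care about
the state of the output argument on error.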

> +	}
> +
> +	goto next_config;
> +}
> +
> +static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
> +				  size_t nbytes, loff_t off)
> +{
> +	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> +	struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
> +	u32 evt_cfg = 0;
> +	int ret = 0;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +
> +	buf[nbytes - 1] = '\0';
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +
> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> +		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
> +		ret = -EINVAL;
> +		goto unlock_out;

"grep goto fs/resctrl/rdtgroup.c"

> +	}
> +
> +	ret = resctrl_process_configs(buf, &evt_cfg);
> +	if (!ret && mevt->evt_val != evt_cfg) {
> +		mevt->evt_val = evt_cfg;

ah ... here it is. hmmm ... but it is mon_evt::evt_cfg, no? ah,
fixed in next patch.

I still seem to be missing something because I expected mon_evt::evt_cfg
of mbm_total_bytes and mbm_local_bytes to be initialized with a starting
default. I missed where this is done in this series.

> +		resctrl_update_assign(r, mevt->evtid, evt_cfg);
> +	}
> +
> +unlock_out:
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{
> @@ -2056,9 +2173,10 @@ static struct rftype res_common_files[] = {
>  	},
>  	{
>  		.name		= "event_filter",
> -		.mode		= 0444,
> +		.mode		= 0644,
>  		.kf_ops		= &rdtgroup_kf_single_ops,
>  		.seq_show	= event_filter_show,
> +		.write		= event_filter_write,
>  	},
>  	{
>  		.name		= "mbm_assign_mode",

Reinette


* Re: [PATCH v13 23/27] x86/resctrl: Introduce mbm_L3_assignments to list assignments in a group
  2025-05-15 22:52 ` [PATCH v13 23/27] x86/resctrl: Introduce mbm_L3_assignments to list assignments in a group Babu Moger
@ 2025-05-23  4:47   ` Reinette Chatre
  2025-05-30  0:55     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-23  4:47 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:52 PM, Babu Moger wrote:
> Introduce the interface to display the assignment states for each group
> when mbm_cntr_assign mode is enabled.
> 
> The list is displayed in the following format:
> <Event configuration>:<Domain id>=<Assignment type>

Should this just be <Event>? The information is just the event name, not
its configuration that will be in the "event_filter" file.

> 
> Event configuration: A valid event configuration listed in the
> /sys/fs/resctrl/info/L3_MON/counter_configs directory.
> 
> Domain ID: A valid domain ID number.
> 
> The assignment type can be one of the following:
> 
> _ : No event configuration assigned
> 
> e : Event configuration assigned in exclusive mode
> 
> Example:
> $cd /sys/fs/resctrl
> $cat mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v13: Changelog update.
>      Few changes in mbm_L3_assignments_show() after moving the event config to evt_list.
>      Resolved conflicts caused by the recent FS/ARCH code restructure.
>      The rdtgroup.c/monitor.c files have been split between the FS and ARCH directories.
> 
> v12: New patch:
>      Assignment interface moved inside the group based the discussion
>      https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/#t
> ---
>  Documentation/filesystems/resctrl.rst | 28 +++++++++++++++
>  fs/resctrl/monitor.c                  |  1 +
>  fs/resctrl/rdtgroup.c                 | 52 +++++++++++++++++++++++++++
>  3 files changed, 81 insertions(+)
> 
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index 356f1f918a86..2350c1f21f4e 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -504,6 +504,34 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
>  	/sys/fs/resctrl/info/L3_MON/mon_features changes the input
>  	event.
>  
> +"mbm_L3_assignments":
> +	This interface file is created when the mbm_cntr_assign mode is supported

"This interface file is created when" -> "Exists when mbm_cntr_assign mode is supported"?

> +	and shows the assignment status for each group.

This doc is in the portion documenting files in monitor groups. So rather:
"the assignment status for each group" -> "the counter assignment status for the MON group"?

> +
> +	The assignment list is displayed in the following format:
> +
> +	<Event configuration>:<Domain id>=<Assignment type>

<Event configuration> -> <Event>

> +
> +	Event configuration: A valid event configuration listed in the

"A valid event in the /sys/fs/resctrl/info/L3_MON/event_configs directory"

> +	/sys/fs/resctrl/info/L3_MON/counter_configs directory.
> +
> +	Domain ID: A valid domain ID number.

"A valid domain ID"

> +
> +	Assignment types:
> +
> +	_ : No event configuration assigned

hmmm ... since the line has event as first field, would this not reflect the
counter? That is "No counter assigned"

> +
> +	e : Event configuration assigned in exclusive mode

"Counter assigned exclusively"? (with exclusive defined somewhere)

> +
> +	Example:
> +	To list the assignment states for the default group.

"the counter assignment states"?

> +	::
> +
> +	  # cd /sys/fs/resctrl
> +	  # cat mbm_L3_assignments
> +	    mbm_total_bytes:0=e;1=e
> +	    mbm_local_bytes:0=e;1=e
> +
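
As a check that the documented line format is unambiguous for tools, here
is a minimal userspace sketch of a parser for one line of
mbm_L3_assignments. It is not part of the patch. struct dom_assign and
parse_assignment_line() are hypothetical names, and the sketch assumes only
the two assignment types shown above ('_' and 'e').

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One "<Domain id>=<Assignment type>" entry (hypothetical type). */
struct dom_assign {
	int dom_id;
	char type;	/* '_' or 'e' */
};

/*
 * Parse one line such as "mbm_total_bytes:0=e;1=e".
 * The event name is copied to @event, the per-domain entries to @out.
 * Returns the number of entries parsed, or -1 on malformed input.
 */
static int parse_assignment_line(const char *line, char *event, size_t event_len,
				 struct dom_assign *out, int max_out)
{
	const char *p = strchr(line, ':');
	int n = 0;

	if (!p || p == line || (size_t)(p - line) >= event_len)
		return -1;
	memcpy(event, line, (size_t)(p - line));
	event[p - line] = '\0';
	p++;

	while (*p != '\0' && *p != '\n') {
		char *end;
		long id = strtol(p, &end, 10);

		/* Expect "<number>=<type>" and room in the output array. */
		if (end == p || *end != '=' || n >= max_out)
			return -1;
		if (end[1] != '_' && end[1] != 'e')
			return -1;

		out[n].dom_id = (int)id;
		out[n].type = end[1];
		n++;

		/* Entries are separated by ';' and end at newline or NUL. */
		p = end + 2;
		if (*p == ';')
			p++;
		else if (*p != '\0' && *p != '\n')
			return -1;
	}

	return n;
}
```

Since the first field is just the event name, a consumer only needs the
name and the per-domain state. The event's configuration stays in the
"event_filter" file.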
>  Resource allocation rules
>  -------------------------
>  
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 5f6c4b662f3b..b982540ce4e3 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -935,6 +935,7 @@ int resctrl_mon_resource_init(void)
>  		resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
>  		resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
>  					 RFTYPE_RES_CACHE);
> +		resctrl_file_fflags_init("mbm_L3_assignments", RFTYPE_MON_BASE);
>  	}
>  
>  	return 0;
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 931ea355f159..8d970b99bbbd 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2080,6 +2080,52 @@ static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
>  	return ret ?: nbytes;
>  }
>  
> +static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file *s, void *v)
> +{
> +	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> +	struct rdt_mon_domain *d;
> +	struct rdtgroup *rdtgrp;
> +	struct mon_evt *mevt;
> +	int ret = 0;
> +	bool sep;
> +
> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
> +	if (!rdtgrp)
> +		return -ENOENT;

Missing a rdtgroup_kn_unlock()?

> +
> +	rdt_last_cmd_clear();
> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> +		rdt_last_cmd_puts("mbm_cntr_assign mode not enabled\n");
> +		ret = -ENOENT;
> +		goto assign_out;

grep goto fs/resctrl/rdtgroup.c

> +	}
> +
> +	list_for_each_entry(mevt, &r->mon.evt_list, list) {

Can use for_each_mbm_event(), and then the check below will not be needed?

> +		if (mevt->mbm_mode != MBM_MODE_ASSIGN)
> +			continue;
> +
> +		sep = false;
> +		seq_printf(s, "%s:", mevt->name);
> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +			if (sep)
> +				seq_putc(s, ';');
> +
> +			if (mbm_cntr_get(r, d, rdtgrp, mevt->evtid) >= 0)
> +				seq_printf(s, "%d=e", d->hdr.id);
> +			else
> +				seq_printf(s, "%d=_", d->hdr.id);
> +
> +			sep = true;
> +		}
> +		seq_putc(s, '\n');
> +	}
> +
> +assign_out:
> +	rdtgroup_kn_unlock(of->kn);
> +
> +	return ret;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{
> @@ -2218,6 +2264,12 @@ static struct rftype res_common_files[] = {
>  		.seq_show	= event_filter_show,
>  		.write		= event_filter_write,
>  	},
> +	{
> +		.name		= "mbm_L3_assignments",
> +		.mode		= 0444,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= mbm_L3_assignments_show,
> +	},
>  	{
>  		.name		= "mbm_assign_mode",
>  		.mode		= 0444,

Reinette


* Re: [PATCH v13 21/27] x86/resctrl: Introduce mbm_assign_on_mkdir to configure assignments
  2025-05-15 22:52 ` [PATCH v13 21/27] x86/resctrl: Introduce mbm_assign_on_mkdir to configure assignments Babu Moger
@ 2025-05-23  4:48   ` Reinette Chatre
  2025-05-29 23:03     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-23  4:48 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/15/25 3:52 PM, Babu Moger wrote:
> The mbm_cntr_assign mode provides an option to the user to assign a
> counter to an RMID, event pair and monitor the bandwidth as long as
> the counter is assigned.
> 
> Introduce a configuration option to automatically assign counter IDs

"assign counter IDs" -> "assign counter IDs to <what?>"

> when a resctrl group is created, provided the counters are available.
> By default, this option is enabled at boot.
> 
> Suggested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v13: Added Suggested-by tag.
>      Resolved conflicts caused by the recent FS/ARCH code restructure.
>      The rdtgroup.c/monitor.c file has now been split between the FS and ARCH directories.
> 
> v12: New patch. Added after the discussion on the list.
>      https://lore.kernel.org/lkml/CALPaoCh8siZKjL_3yvOYGL4cF_n_38KpUFgHVGbQ86nD+Q2_SA@mail.gmail.com/
> ---
>  Documentation/filesystems/resctrl.rst | 10 ++++++
>  fs/resctrl/monitor.c                  |  2 ++
>  fs/resctrl/rdtgroup.c                 | 44 +++++++++++++++++++++++++--
>  include/linux/resctrl.h               |  2 ++
>  4 files changed, 56 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index 9923276826db..356f1f918a86 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -348,6 +348,16 @@ with the following files:
>  	  # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>  	   local_reads, local_non_temporal_writes
>  
> +"mbm_assign_on_mkdir":
> +	Automatically assign the monitoring counters on resctrl group creation

assign the monitoring counters to what?

> +	if the counters are available. It is enabled by default on boot and users
> +	can disable by writing to the interface.
> +	::
> +
> +	  # echo 0 > /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
> +	  0

Please be explicit in docs what possible values are and what they mean.

> +
>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
>  		bytes) at which a previously used LLC_occupancy
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 1f72249a5c93..5f6c4b662f3b 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -933,6 +933,8 @@ int resctrl_mon_resource_init(void)
>  		resctrl_file_fflags_init("available_mbm_cntrs",
>  					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>  		resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
> +		resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
> +					 RFTYPE_RES_CACHE);
>  	}
>  
>  	return 0;
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 8c498b41be5d..0093b323d858 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2035,8 +2035,8 @@ static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
>  	}
>  
>  	ret = resctrl_process_configs(buf, &evt_cfg);
> -	if (!ret && mevt->evt_val != evt_cfg) {
> -		mevt->evt_val = evt_cfg;
> +	if (!ret && mevt->evt_cfg != evt_cfg) {
> +		mevt->evt_cfg = evt_cfg;
>  		resctrl_update_assign(r, mevt->evtid, evt_cfg);
>  	}
>  

Needs to be squashed.

> @@ -2047,6 +2047,39 @@ static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
>  	return ret ?: nbytes;
>  }
>  
> +static int resctrl_mbm_assign_on_mkdir_show(struct kernfs_open_file *of,
> +					    struct seq_file *s, void *v)
> +{
> +	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
> +
> +	seq_printf(s, "%u\n", r->mon.mbm_assign_on_mkdir);
> +
> +	return 0;
> +}
> +
> +static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
> +						 char *buf, size_t nbytes, loff_t off)
> +{
> +	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
> +	bool value;
> +	int ret;
> +
> +	ret = kstrtobool(buf, &value);
> +	if (ret)
> +		return ret;
> +
> +	cpus_read_lock();

Not traversing the domain list, so the hotplug lock is not needed.

> +	mutex_lock(&rdtgroup_mutex);

rdtgroup_mutex seems only needed because the message buffer is cleared below, and this is why it
is not required in the show()?

> +	rdt_last_cmd_clear();
> +
> +	r->mon.mbm_assign_on_mkdir = value;
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{
> @@ -2056,6 +2089,13 @@ static struct rftype res_common_files[] = {
>  		.seq_show	= rdt_last_cmd_status_show,
>  		.fflags		= RFTYPE_TOP_INFO,
>  	},
> +	{
> +		.name		= "mbm_assign_on_mkdir",
> +		.mode		= 0644,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= resctrl_mbm_assign_on_mkdir_show,
> +		.write		= resctrl_mbm_assign_on_mkdir_write,
> +	},
>  	{
>  		.name		= "num_closids",
>  		.mode		= 0444,
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index cd24d1577e0a..d6435abdde7b 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -278,6 +278,7 @@ enum resctrl_schema_fmt {
>   *			monitoring events can be configured.
>   * @num_mbm_cntrs:	Number of assignable monitoring counters
>   * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
> + * @mbm_assign_on_mkdir:Auto enable monitor assignment on mkdir?

How is "monitor assignment" different from "counter assignment"?

>   * @evt_list:		List of monitoring events
>   */
>  struct resctrl_mon {
> @@ -285,6 +286,7 @@ struct resctrl_mon {
>  	unsigned int		mbm_cfg_mask;
>  	int			num_mbm_cntrs;
>  	bool			mbm_cntr_assignable;
> +	bool			mbm_assign_on_mkdir;
>  	struct list_head	evt_list;
>  };
>  

Reinette


* Re: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-22 22:16     ` Luck, Tony
@ 2025-05-23 21:08       ` Luck, Tony
  2025-05-26 13:14         ` Peter Newman
  0 siblings, 1 reply; 114+ messages in thread
From: Luck, Tony @ 2025-05-23 21:08 UTC (permalink / raw)
  To: Chatre, Reinette, Babu Moger, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: james.morse@arm.com, dave.martin@arm.com, fenghuay@nvidia.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
	ardb@kernel.org, gregkh@linuxfoundation.org,
	daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
	alexandre.chartre@oracle.com, pawan.kumar.gupta@linux.intel.com,
	thomas.lendacky@amd.com, perry.yuan@amd.com, seanjc@google.com,
	Huang, Kai, Li, Xiaoyao, kan.liang@linux.intel.com, Li, Xin3,
	ebiggers@google.com, xin@zytor.com, Mehta, Sohil,
	andrew.cooper3@citrix.com, mario.limonciello@amd.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	peternewman@google.com, Wieczor-Retman, Maciej, Eranian, Stephane,
	Xiaojian.Du@amd.com, gautham.shenoy@amd.com

On Thu, May 22, 2025 at 10:16:16PM +0000, Luck, Tony wrote:
> > It looks to me as though there are a couple of changes in the telemetry work
> > that would benefit this work. https://lore.kernel.org/lkml/20250521225049.132551-2-tony.luck@intel.com/
> > switches the monitor events to be maintained in an array indexed by event ID, eliminating the
> > need for searching the evt_list that this work does in a couple of places. Also note the handy
> > new for_each_mbm_event() helper (https://lore.kernel.org/lkml/20250521225049.132551-5-tony.luck@intel.com/).
> 
> Yesterday I ran through the exercise of rebasing my AET patches on top of these
> ABMC patches in order to check whether the ABMC patches painted resctrl
> into some corner that would be hard to get back out of.
> 
> Good news: they don't.
> 
> There was a bunch of manual patching to make the first four patches fit on top
> of the ABMC code, but I also noticed a few places where things were simpler
> after combining the two series.
> 
> Maybe a good path forward would be to take those first four patches from
> my AET series and then build ABMC on top of those.

As an encouragement to try this direction, I took my four patches
on top of tip x86/cache and then applied Babu's ABMC series.

Changes to Babu's code:
1) Adapt where needed for removal of evt_list. Use event array instead.
2) Use for_each_mbm_event() [Maybe didn't get all places?]
3) Bring the s/evt_val/evt_cfg/ fix into patch 20 from 21
4) Fix fir tree declaration for resctrl_process_assign()

I don't have an AMD system to check if the ABMC parts still work. But
it does pass the resctrl self tests, so legacy isn't broken.

Patches in the "my_mbm_plus_babu_abmc" branch of my kernel.org
repo: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git

-Tony


* Re: [PATCH v13 24/27] x86/resctrl: Introduce the interface to modify assignments in a group
  2025-05-15 22:52 ` [PATCH v13 24/27] x86/resctrl: Introduce the interface to modify " Babu Moger
@ 2025-05-26  9:48   ` Peter Newman
  2025-05-27 15:24     ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Peter Newman @ 2025-05-26  9:48 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen,
	james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, maciej.wieczor-retman, eranian, Xiaojian.Du,
	gautham.shenoy

Hi Babu,

On Fri, May 16, 2025 at 12:56 AM Babu Moger <babu.moger@amd.com> wrote:

> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 8d970b99bbbd..ea1782723f81 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2126,6 +2126,168 @@ static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file
>         return ret;
>  }
>
> +/*
> + * mbm_get_mon_event_by_name() - Return the mon_evt entry for the matching
> + * event name.
> + */
> +static struct mon_evt *mbm_get_mon_event_by_name(struct rdt_resource *r,
> +                                                char *name)
> +{
> +       struct mon_evt *mevt;
> +
> +       list_for_each_entry(mevt, &r->mon.evt_list, list) {
> +               if (!strcmp(mevt->name, name))
> +                       return mevt;
> +       }
> +
> +       return NULL;
> +}
> +
> +static unsigned int resctrl_get_assing_type(char *assign)
> +{
> +       unsigned int mon_state = ASSIGN_NONE;
> +       int len = strlen(assign);

[  395.013183] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  395.013426] #PF: supervisor read access in kernel mode
[  395.013600] #PF: error_code(0x0000) - not-present page
[  395.013779] PGD 39322c067 P4D 2a4f49067 PUD 2a4f4a067 PMD 0
[  395.013973] Oops: Oops: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[  395.014156] CPU: 37 UID: 0 PID: 24147 Comm: bash Not tainted 6.15.0-dbg-DEV #13 NONE
[  395.014403] Hardware name: Google Astoria-Turin/astoria, BIOS 0.20241223.2-0 01/17/2025
[  395.014652] RIP: 0010:strlen+0xb/0x20
[  395.014778] Code: 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 c7 c0 ff ff ff ff <80> 7c 07 01 00 48 8d 40 01 75 f5 c3 cc cc cc cc cc 0f 1f 40 00 90
[  395.015356] RSP: 0018:ffa000002f743d58 EFLAGS: 00010246
[  395.015522] RAX: ffffffffffffffff RBX: ff11000129a00600 RCX: 0000000000000000
[  395.015747] RDX: ff110001299f5253 RSI: ffffffff827b9651 RDI: 0000000000000000
[  395.015968] RBP: 0000000000000000 R08: 000000000000003d R09: 0000000000000000
[  395.016202] R10: ffffffff827b9652 R11: 0000000000000000 R12: ffffffff8305b7f8
[  395.016421] R13: ff110001299f5240 R14: 0000000000000014 R15: 0000000000000000
[  395.016644] FS:  00007f1281ff8b80(0000) GS:ff1100bdc8276000(0000) knlGS:0000000000000000
[  395.016893] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  395.017071] CR2: 0000000000000000 CR3: 0000000420bc8002 CR4: 0000000000771ef0
[  395.017298] PKRU: 55555554
[  395.017388] Call Trace:
[  395.017471]  <TASK>
[  395.017545]  mbm_L3_assignments_write+0x2d4/0x4e0
[  395.017700]  kernfs_fop_write_iter+0x132/0x1c0
[  395.017851]  vfs_write+0x2bf/0x3c0
[  395.017963]  ksys_write+0x82/0x100
[  395.018074]  do_syscall_64+0xee/0x210
[  395.018198]  ? exc_page_fault+0x81/0xe0
[  395.018321]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  395.018482] RIP: 0033:0x7f128177f8b3
[  395.018598] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 05 99 91 07 00 83 38 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 4d c3 55 48 89 e5 41 57 41 56 53 50 48 89 d3
[  395.019167] RSP: 002b:00007ffff66e80f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  395.019409] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007f128177f8b3
[  395.019636] RDX: 0000000000000014 RSI: 0000000001eedb60 RDI: 0000000000000001
[  395.019861] RBP: 00007ffff66e8120 R08: 0000000000000000 R09: 0000000000000000
[  395.020081] R10: 00007ffff66e81b0 R11: 0000000000000246 R12: 0000000001eedb60
[  395.020303] R13: 0000000000000001 R14: 00007f12817fa650 R15: 0000000000000014
[  395.020532]  </TASK>

> +
> +       if (!len || len > 1)
> +               return ASSIGN_INVALID;
> +
> +       switch (*assign) {
> +       case 'e':
> +               mon_state = ASSIGN_EXCLUSIVE;
> +               break;
> +       case '_':
> +               mon_state = ASSIGN_NONE;
> +               break;
> +       default:
> +               mon_state = ASSIGN_INVALID;
> +               break;
> +       }
> +
> +       return mon_state;
> +}
> +
> +static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> +                                 char *config, char *tok)
> +{
> +       struct rdt_mon_domain *d;
> +       char *dom_str, *id_str;
> +       unsigned long dom_id = 0;
> +       struct mon_evt *mevt;
> +       int assign_type;
> +       char domain[10];
> +       bool found;
> +       int ret;
> +
> +       mevt = mbm_get_mon_event_by_name(r, config);
> +       if (!mevt) {
> +               rdt_last_cmd_printf("Invalid assign configuration %s\n", config);
> +               return  -ENOENT;
> +       }
> +
> +next:
> +       if (!tok || tok[0] == '\0')
> +               return 0;
> +
> +       /* Start processing the strings for each domain */
> +       dom_str = strim(strsep(&tok, ";"));
> +
> +       id_str = strsep(&dom_str, "=");

If there's no '=' then dom_str becomes NULL...

> +
> +       /* Check for domain id '*' which means all domains */
> +       if (id_str && *id_str == '*') {
> +               d = NULL;
> +               goto check_state;
> +       } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
> +               rdt_last_cmd_puts("Missing domain id\n");
> +               return -EINVAL;
> +       }
> +
> +       /* Verify if the dom_id is valid */
> +       found = false;
> +       list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +               if (d->hdr.id == dom_id) {
> +                       found = true;
> +                       break;
> +               }
> +       }
> +
> +       if (!found) {
> +               rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
> +               return -EINVAL;
> +       }
> +
> +check_state:
> +       assign_type = resctrl_get_assing_type(dom_str);

then the resulting type of whatever this is supposed to mean is "panic"
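
To make the failure mode concrete, here is a minimal user-space sketch of
the strsep() behaviour involved. sketch_strsep() is a hypothetical stand-in
that follows the documented strsep() semantics (it is not the kernel's
function), and split_dom_token() is an illustrative helper, not code from
the patch:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in with strsep() semantics: cut *s at the first
 * delimiter found, return the piece before it, and leave *s pointing
 * past the delimiter, or NULL when no delimiter was present.
 */
static char *sketch_strsep(char **s, const char *delims)
{
	char *begin = *s;
	char *end;

	if (!begin)
		return NULL;

	end = strpbrk(begin, delims);
	if (end)
		*end++ = '\0';
	*s = end;

	return begin;
}

/*
 * Illustrative helper: split "<id>=<state>" in place.
 * Returns 0 on success and -1 when the '=' is missing, which is the
 * case that left dom_str NULL in the trace above.
 */
static int split_dom_token(char *token, char **id, char **state)
{
	char *rest = token;

	*id = sketch_strsep(&rest, "=");
	*state = rest;		/* NULL if there was no '=' */

	return *state ? 0 : -1;
}
```

Under these assumptions, a token like "0=e" splits into id "0" and state
"e", while a bare "0" leaves the state pointer NULL, so the caller has to
reject it before passing it to anything that calls strlen().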

Thanks,
-Peter

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-23 21:08       ` Luck, Tony
@ 2025-05-26 13:14         ` Peter Newman
  2025-05-27 21:41           ` Luck, Tony
  0 siblings, 1 reply; 114+ messages in thread
From: Peter Newman @ 2025-05-26 13:14 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Chatre, Reinette, Babu Moger, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	james.morse@arm.com, dave.martin@arm.com, fenghuay@nvidia.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
	ardb@kernel.org, gregkh@linuxfoundation.org,
	daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
	alexandre.chartre@oracle.com, pawan.kumar.gupta@linux.intel.com,
	thomas.lendacky@amd.com, perry.yuan@amd.com, seanjc@google.com,
	Huang, Kai, Li, Xiaoyao, kan.liang@linux.intel.com, Li, Xin3,
	ebiggers@google.com, xin@zytor.com, Mehta, Sohil,
	andrew.cooper3@citrix.com, mario.limonciello@amd.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wieczor-Retman, Maciej, Eranian, Stephane, Xiaojian.Du@amd.com,
	gautham.shenoy@amd.com

Hi Tony,

On Fri, May 23, 2025 at 11:08 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Thu, May 22, 2025 at 10:16:16PM +0000, Luck, Tony wrote:
> > > It looks to me as though there are a couple of changes in the telemetry work
> > > that would benefit this work. https://lore.kernel.org/lkml/20250521225049.132551-2-tony.luck@intel.com/
> > > switches the monitor events to be maintained in an array indexed by event ID, eliminating the
> > > need for searching the evt_list that this work does in a couple of places. Also note the handy
> > > new for_each_mbm_event() helper (https://lore.kernel.org/lkml/20250521225049.132551-5-tony.luck@intel.com/).
> >
> > Yesterday I ran through the exercise of rebasing my AET patches on top of these
> > ABMC patches in order to check whether the ABMC patches painted resctrl
> > into some corner that would be hard to get back out of.
> >
> > Good news: they don't.
> >
> > There was a bunch of manual patching to make the first four patches fit on top
> > of the ABMC code, but I also noticed a few places where things were simpler
> > after combining the two series.
> >
> > Maybe a good path forward would be to take those first four patches from
> > my AET series and then build ABMC on top of those.
>
> As an encouragement to try this direction, I took my four patches
> on top of tip x86/cache and then applied Babu's ABMC series.

I did the same thing last week, except in the other order, so I
switched to your changes to test.

>
> Changes to Babu's code:
> 1) Adapt where needed for removal of evt_list. Use event array instead.
> 2) Use for_each_mbm_event() [Maybe didn't get all places?]
> 3) Bring the s/evt_val/evt_cfg/ fix into patch 20 from 21
> 4) Fix fir tree declaration for resctrl_process_assign()
>
> I don't have an AMD system to check if the ABMC parts still work. But
> it does pass the resctrl self tests, so legacy isn't broken.
>
> Patches in the "my_mbm_plus_babu_abmc" branch of my kernel.org
> repo: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git

Thanks for applying my suggestion[1] about the array entry sizes, but
you needed one more dereference:

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1db6a61e27746..0c27e0a5a7b96 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -399,7 +399,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
  */
 static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
 {
-       size_t tsize = sizeof(hw_dom->arch_mbm_states[0]);
+       size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
        enum resctrl_event_id evt;
        int idx;

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 098ff002d2232..44ec33cb165f7 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4819,7 +4823,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
 static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
 {
        u32 idx_limit = resctrl_arch_system_num_rmid_idx();
-       size_t tsize = sizeof(d->mbm_states[0]);
+       size_t tsize = sizeof(*d->mbm_states[0]);
        enum resctrl_event_id evt;
        int idx;


You should be able to repro an array overrun without ABMC, and a page
fault is likely if the system implements a lot of RMIDs. The AMD EPYC
9B45 I tested on implements 4096 RMIDs.
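
As an illustration of why the extra dereference matters, here is a small
user-space sketch. The struct and field names are stand-ins made up for the
example (they are not the resctrl definitions); the point is only that
sizeof() of an element of an array of pointers yields the pointer size, not
the size of the object being allocated:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NUM_EVENTS 2

/* Stand-in for a per-RMID state record; the layout is an assumption. */
struct sketch_mbm_state {
	uint64_t chunks;
	uint64_t prev_msr;
};

/* Stand-in for a domain holding one state array per event. */
struct sketch_dom {
	struct sketch_mbm_state *states[SKETCH_NUM_EVENTS];
};

/* Size of one array slot, i.e. of a pointer: too small per entry. */
static size_t slot_size(const struct sketch_dom *d)
{
	return sizeof(d->states[0]);
}

/* Size of the object each slot points to: the intended entry size. */
static size_t entry_size(const struct sketch_dom *d)
{
	return sizeof(*d->states[0]);
}

/* Allocate num_rmid entries per event using the correct entry size. */
static int sketch_dom_alloc(struct sketch_dom *d, size_t num_rmid)
{
	for (int i = 0; i < SKETCH_NUM_EVENTS; i++) {
		d->states[i] = calloc(num_rmid, entry_size(d));
		if (!d->states[i]) {
			/* Undo the allocations that already succeeded. */
			while (i--)
				free(d->states[i]);
			return -1;
		}
	}
	return 0;
}
```

With the 16-byte record assumed here, sizing the allocation by slot_size()
on a 64-bit build would cover only half of the entries, which is the kind
of overrun described above.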

Thanks,
-Peter


[1] https://lore.kernel.org/lkml/CALPaoCj8yfzJ=5CkxTPQXc0-WRWpu0xKRX8v4FAWFGQKtXtMUw@mail.gmail.com/

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 24/27] x86/resctrl: Introduce the interface to modify assignments in a group
  2025-05-26  9:48   ` Peter Newman
@ 2025-05-27 15:24     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-27 15:24 UTC (permalink / raw)
  To: Peter Newman
  Cc: corbet, tony.luck, reinette.chatre, tglx, mingo, bp, dave.hansen,
	james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, maciej.wieczor-retman, eranian, Xiaojian.Du,
	gautham.shenoy

Hi Peter,


On 5/26/25 04:48, Peter Newman wrote:
> Hi Babu,
> 
> On Fri, May 16, 2025 at 12:56 AM Babu Moger <babu.moger@amd.com> wrote:
> 
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 8d970b99bbbd..ea1782723f81 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -2126,6 +2126,168 @@ static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file
>>         return ret;
>>  }
>>
>> +/*
>> + * mbm_get_mon_event_by_name() - Return the mon_evt entry for the matching
>> + * event name.
>> + */
>> +static struct mon_evt *mbm_get_mon_event_by_name(struct rdt_resource *r,
>> +                                                char *name)
>> +{
>> +       struct mon_evt *mevt;
>> +
>> +       list_for_each_entry(mevt, &r->mon.evt_list, list) {
>> +               if (!strcmp(mevt->name, name))
>> +                       return mevt;
>> +       }
>> +
>> +       return NULL;
>> +}
>> +
>> +static unsigned int resctrl_get_assing_type(char *assign)
>> +{
>> +       unsigned int mon_state = ASSIGN_NONE;
>> +       int len = strlen(assign);
> 
> [  395.013183] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  395.013426] #PF: supervisor read access in kernel mode
> [  395.013600] #PF: error_code(0x0000) - not-present page
> [  395.013779] PGD 39322c067 P4D 2a4f49067 PUD 2a4f4a067 PMD 0
> [  395.013973] Oops: Oops: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
> [  395.014156] CPU: 37 UID: 0 PID: 24147 Comm: bash Not tainted 6.15.0-dbg-DEV #13 NONE
> [  395.014403] Hardware name: Google Astoria-Turin/astoria, BIOS 0.20241223.2-0 01/17/2025
> [  395.014652] RIP: 0010:strlen+0xb/0x20
> [  395.014778] Code: 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 c7 c0 ff ff ff ff <80> 7c 07 01 00 48 8d 40 01 75 f5 c3 cc cc cc cc cc 0f 1f 40 00 90
> [  395.015356] RSP: 0018:ffa000002f743d58 EFLAGS: 00010246
> [  395.015522] RAX: ffffffffffffffff RBX: ff11000129a00600 RCX: 0000000000000000
> [  395.015747] RDX: ff110001299f5253 RSI: ffffffff827b9651 RDI: 0000000000000000
> [  395.015968] RBP: 0000000000000000 R08: 000000000000003d R09: 0000000000000000
> [  395.016202] R10: ffffffff827b9652 R11: 0000000000000000 R12: ffffffff8305b7f8
> [  395.016421] R13: ff110001299f5240 R14: 0000000000000014 R15: 0000000000000000
> [  395.016644] FS:  00007f1281ff8b80(0000) GS:ff1100bdc8276000(0000) knlGS:0000000000000000
> [  395.016893] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  395.017071] CR2: 0000000000000000 CR3: 0000000420bc8002 CR4: 0000000000771ef0
> [  395.017298] PKRU: 55555554
> [  395.017388] Call Trace:
> [  395.017471]  <TASK>
> [  395.017545]  mbm_L3_assignments_write+0x2d4/0x4e0
> [  395.017700]  kernfs_fop_write_iter+0x132/0x1c0
> [  395.017851]  vfs_write+0x2bf/0x3c0
> [  395.017963]  ksys_write+0x82/0x100
> [  395.018074]  do_syscall_64+0xee/0x210
> [  395.018198]  ? exc_page_fault+0x81/0xe0
> [  395.018321]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  395.018482] RIP: 0033:0x7f128177f8b3
> [  395.018598] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 05 99 91 07 00 83 38 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 4d c3 55 48 89 e5 41 57 41 56 53 50 48 89 d3
> [  395.019167] RSP: 002b:00007ffff66e80f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  395.019409] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007f128177f8b3
> [  395.019636] RDX: 0000000000000014 RSI: 0000000001eedb60 RDI: 0000000000000001
> [  395.019861] RBP: 00007ffff66e8120 R08: 0000000000000000 R09: 0000000000000000
> [  395.020081] R10: 00007ffff66e81b0 R11: 0000000000000246 R12: 0000000001eedb60
> [  395.020303] R13: 0000000000000001 R14: 00007f12817fa650 R15: 0000000000000014
> [  395.020532]  </TASK>
> 

Yes, got it. The NULL check was missing. I have simplified the function now. Thanks.


static unsigned int resctrl_get_assign_type(char *assign)
{
        if (!assign || strlen(assign) != 1)
                return ASSIGN_INVALID;

        switch (*assign) {
        case 'e':
                return ASSIGN_EXCLUSIVE;
        case '_':
                return ASSIGN_NONE;
        default:
                return ASSIGN_INVALID;
        }
}
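
A quick user-space version of the revised helper, for illustration only.
The enum values are placeholders picked so the snippet compiles on its own
(the real ASSIGN_* definitions come from the series), and the sketch_ name
is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Placeholder values; assumed for this sketch only. */
enum sketch_assign_type {
	ASSIGN_NONE,
	ASSIGN_EXCLUSIVE,
	ASSIGN_INVALID,
};

/*
 * Same logic as the revised helper: a NULL pointer or anything other
 * than a single character is rejected before *assign is read.
 */
static unsigned int sketch_get_assign_type(const char *assign)
{
	if (!assign || strlen(assign) != 1)
		return ASSIGN_INVALID;

	switch (*assign) {
	case 'e':
		return ASSIGN_EXCLUSIVE;
	case '_':
		return ASSIGN_NONE;
	default:
		return ASSIGN_INVALID;
	}
}
```

With this shape, a token without '=' (which leaves the state string NULL)
maps to ASSIGN_INVALID instead of reaching strlen() with a NULL pointer.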


>> +
>> +       if (!len || len > 1)
>> +               return ASSIGN_INVALID;
>> +
>> +       switch (*assign) {
>> +       case 'e':
>> +               mon_state = ASSIGN_EXCLUSIVE;
>> +               break;
>> +       case '_':
>> +               mon_state = ASSIGN_NONE;
>> +               break;
>> +       default:
>> +               mon_state = ASSIGN_INVALID;
>> +               break;
>> +       }
>> +
>> +       return mon_state;
>> +}
>> +
>> +static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> +                                 char *config, char *tok)
>> +{
>> +       struct rdt_mon_domain *d;
>> +       char *dom_str, *id_str;
>> +       unsigned long dom_id = 0;
>> +       struct mon_evt *mevt;
>> +       int assign_type;
>> +       char domain[10];
>> +       bool found;
>> +       int ret;
>> +
>> +       mevt = mbm_get_mon_event_by_name(r, config);
>> +       if (!mevt) {
>> +               rdt_last_cmd_printf("Invalid assign configuration %s\n", config);
>> +               return  -ENOENT;
>> +       }
>> +
>> +next:
>> +       if (!tok || tok[0] == '\0')
>> +               return 0;
>> +
>> +       /* Start processing the strings for each domain */
>> +       dom_str = strim(strsep(&tok, ";"));
>> +
>> +       id_str = strsep(&dom_str, "=");
> 
> If there's no '=' then dom_str becomes NULL...

Yes, that is correct.

> 
>> +
>> +       /* Check for domain id '*' which means all domains */
>> +       if (id_str && *id_str == '*') {
>> +               d = NULL;
>> +               goto check_state;
>> +       } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>> +               rdt_last_cmd_puts("Missing domain id\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       /* Verify if the dom_id is valid */
>> +       found = false;
>> +       list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +               if (d->hdr.id == dom_id) {
>> +                       found = true;
>> +                       break;
>> +               }
>> +       }
>> +
>> +       if (!found) {
>> +               rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
>> +               return -EINVAL;
>> +       }
>> +
>> +check_state:
>> +       assign_type = resctrl_get_assing_type(dom_str);
> 
> then the resulting type of whatever this is supposed to mean is "panic"
> 
> Thanks,
> -Peter
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 01/27] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-22 20:51   ` Reinette Chatre
@ 2025-05-27 17:23     ` Moger, Babu
  2025-05-27 17:54       ` Reinette Chatre
  0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-05-27 17:23 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 15:51, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:51 PM, Babu Moger wrote:
>> Users can create as many monitor groups as RMIDs supported by the hardware.
>> However, bandwidth monitoring feature on AMD system only guarantees that
>> RMIDs currently assigned to a processor will be tracked by hardware. The
>> counters of any other RMIDs which are no longer being tracked will be reset
>> to zero. The MBM event counters return "Unavailable" for the RMIDs that are
>> not tracked by hardware. So, there can be only limited number of groups
>> that can give guaranteed monitoring numbers. With ever changing
>> configurations there is no way to definitely know which of these groups are
>> being tracked for certain point of time. Users do not have the option to
>> monitor a group or set of groups for certain period of time without
>> worrying about RMID being reset in between.
>>
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>> assigned. The assigned RMID will be tracked by the hardware until the user
>> unassigns it manually. There is no need to worry about counters being reset
>> during this period. Additionally, the user can specify a bitmask
>> identifying the specific bandwidth types from the given source to track
>> with the counter.
>>
>> Without ABMC enabled, monitoring will work in current mode without
>> assignment option.
>>
>> The Linux resctrl subsystem provides an interface that allows monitoring of
>> up to two memory bandwidth events per group, selected from a combination of
>> available total and local events. When ABMC is enabled, two events will be
>> assigned to each group by default, in line with the current interface
>> design. Users will also have the option to configure which types of memory
>> transactions are counted by these events.
>>
>> Due to the limited number of available counters (32), users may quickly
>> exhaust the available counters. If the system runs out of assignable ABMC
>> counters, the kernel will report an error. In such cases, users will nee
>> dto unassign one or more active counters to free up countes for new
> 
> "nee dto" -> "need to"
> "countes" -> "counters"

Sure.

> 
>> assignments. The interface will provide options to assign or unassign
> 
> "The interface will" -> "resctrl will"?
> 

Sure.

>> events through the group-specific interface file.
>>
>> The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
> 
> "The feature can be detected" -> "The feature is detected"
> 

Sure.
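
For readers following along, the check described in the quoted commit
message (CPUID leaf 0x80000020, subleaf 0, EBX bit 5) reduces to a single
bit test. A minimal sketch with hypothetical names, taking the EBX value as
a parameter rather than executing CPUID itself:

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID Fn8000_0020 (subleaf 0), EBX bit 5: ABMC, per the commit message. */
#define SKETCH_ABMC_EBX_BIT	5

/* Return true when the ABMC bit is set in the supplied EBX value. */
static bool sketch_has_abmc(uint32_t ebx)
{
	return (ebx >> SKETCH_ABMC_EBX_BIT) & 1u;
}
```

In user space the EBX value could come from __get_cpuid_count() in the
compiler's <cpuid.h>; in the kernel the scattered.c table entry in this
patch does the equivalent job.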

>> Bits Description
>> 5    ABMC (Assignable Bandwidth Monitoring Counters)
>>
>> The feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
>>  arch/x86/include/asm/cpufeatures.h | 1 +
>>  arch/x86/kernel/cpu/cpuid-deps.c   | 2 ++
>>  arch/x86/kernel/cpu/scattered.c    | 1 +
>>  3 files changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 6c2c152d8a67..d5c14dc678df 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -481,6 +481,7 @@
>>  #define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
>>  #define X86_FEATURE_AMD_WORKLOAD_CLASS	(21*32 + 7) /* Workload Classification */
>>  #define X86_FEATURE_PREFER_YMM		(21*32 + 8) /* Avoid ZMM registers due to downclocking */
>> +#define X86_FEATURE_ABMC		(21*32 + 9) /* Assignable Bandwidth Monitoring Counters */
>>  
>>  /*
>>   * BUG word(s)
>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
>> index a2fbea0be535..2f54831e04e5 100644
>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>>  	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
>>  	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
>>  	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
>> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_TOTAL   },
>> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_LOCAL   },
> 
> Is this dependency still accurate now that the implementation switched to the 
> "extended event ID" variant of ABMC that no longer uses the event IDs associated
> with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?

That's a good question. Unfortunately, we may need to retain this
dependency for now, as a significant portion of the code relies on
functions like resctrl_is_mbm_event(), resctrl_is_mbm_enabled(),
resctrl_arch_is_mbm_total_enabled(), and others.


> 
>>  	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
>>  	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
>>  	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
>> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
>> index 16f3ca30626a..3b72b72270f1 100644
>> --- a/arch/x86/kernel/cpu/scattered.c
>> +++ b/arch/x86/kernel/cpu/scattered.c
>> @@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>>  	{ X86_FEATURE_MBA,			CPUID_EBX,  6, 0x80000008, 0 },
>>  	{ X86_FEATURE_SMBA,			CPUID_EBX,  2, 0x80000020, 0 },
>>  	{ X86_FEATURE_BMEC,			CPUID_EBX,  3, 0x80000020, 0 },
>> +	{ X86_FEATURE_ABMC,			CPUID_EBX,  5, 0x80000020, 0 },
>>  	{ X86_FEATURE_AMD_WORKLOAD_CLASS,	CPUID_EAX, 22, 0x80000021, 0 },
>>  	{ X86_FEATURE_PERFMON_V2,		CPUID_EAX,  0, 0x80000022, 0 },
>>  	{ X86_FEATURE_AMD_LBR_V2,		CPUID_EAX,  1, 0x80000022, 0 },
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 01/27] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-27 17:23     ` Moger, Babu
@ 2025-05-27 17:54       ` Reinette Chatre
  2025-05-27 18:40         ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-27 17:54 UTC (permalink / raw)
  To: babu.moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/27/25 10:23 AM, Moger, Babu wrote:
> On 5/22/25 15:51, Reinette Chatre wrote:
>> On 5/15/25 3:51 PM, Babu Moger wrote:

>>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
>>> index a2fbea0be535..2f54831e04e5 100644
>>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>>> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>>>  	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
>>>  	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
>>>  	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
>>> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_TOTAL   },
>>> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_LOCAL   },
>>
>> Is this dependency still accurate now that the implementation switched to the 
>> "extended event ID" variant of ABMC that no longer uses the event IDs associated
>> with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?
> 
> That's a good question. Unfortunately, we may need to retain this
> dependency for now, as a significant portion of the code relies on
> functions like resctrl_is_mbm_event(), resctrl_is_mbm_enabled(),
> resctrl_arch_is_mbm_total_enabled(), and others.
> 

Avoiding code changes is not a valid reason to keep the dependency.

Even without this dependency, I think the code will
still rely on "functions like resctrl_is_mbm_event(),
resctrl_is_mbm_enabled(), resctrl_arch_is_mbm_total_enabled(),
and others."

The core shift is to stop treating QOS_L3_MBM_TOTAL_EVENT_ID
as meaning the same as X86_FEATURE_CQM_MBM_TOTAL, and similarly to
stop treating QOS_L3_MBM_LOCAL_EVENT_ID as meaning the same as
X86_FEATURE_CQM_MBM_LOCAL.

I expected that for backwards compatibility ABMC will start by
enabling QOS_L3_MBM_TOTAL_EVENT_ID and QOS_L3_MBM_LOCAL_EVENT_ID 
as part of its initialization, configuring them with the current
defaults for which memory transactions are expected to be monitored
by each. With these events enabled the existing flows using, for
example, resctrl_is_mbm_event(), will continue to work as expected, no?

This would require more familiarity with L3 monitoring enumeration
on AMD since it will still be required to determine the number of
RMIDs etc. but if ABMC does not actually depend on these CQM features
then the current enumeration would need to be re-worked anyway.
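
A rough sketch of that idea, with hypothetical names throughout (none of
these are existing resctrl symbols, and the event numbering is assumed):
event state lives in an array indexed by event ID, and an init path enables
the two MBM events with caller-supplied default configurations, without
consulting the legacy total/local feature bits:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event IDs; the numbering is an assumption of this sketch. */
enum sketch_event_id {
	SKETCH_LLC_OCCUPANCY,
	SKETCH_MBM_TOTAL,
	SKETCH_MBM_LOCAL,
	SKETCH_NUM_EVENTS,
};

/* Per-event state, indexed by event ID rather than derived from CPUID. */
struct sketch_mon_evt {
	bool enabled;
	uint32_t evt_cfg;	/* which memory transactions are counted */
};

static struct sketch_mon_evt sketch_events[SKETCH_NUM_EVENTS];

/* True for the two bandwidth events, regardless of how they were enabled. */
static bool sketch_is_mbm_event(enum sketch_event_id id)
{
	return id == SKETCH_MBM_TOTAL || id == SKETCH_MBM_LOCAL;
}

/* True when at least one bandwidth event is enabled. */
static bool sketch_is_mbm_enabled(void)
{
	return sketch_events[SKETCH_MBM_TOTAL].enabled ||
	       sketch_events[SKETCH_MBM_LOCAL].enabled;
}

/* Enable both MBM events with the given default configurations. */
static void sketch_abmc_init(uint32_t total_cfg, uint32_t local_cfg)
{
	sketch_events[SKETCH_MBM_TOTAL] =
		(struct sketch_mon_evt){ .enabled = true, .evt_cfg = total_cfg };
	sketch_events[SKETCH_MBM_LOCAL] =
		(struct sketch_mon_evt){ .enabled = true, .evt_cfg = local_cfg };
}
```

Helpers keyed on the event ID keep working in such a layout because they
only look at the per-event state, not at which feature bit led to it.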

Reinette

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 01/27] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-27 17:54       ` Reinette Chatre
@ 2025-05-27 18:40         ` Moger, Babu
  2025-05-27 23:42           ` Reinette Chatre
  0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-05-27 18:40 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/27/25 12:54, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/27/25 10:23 AM, Moger, Babu wrote:
>> On 5/22/25 15:51, Reinette Chatre wrote:
>>> On 5/15/25 3:51 PM, Babu Moger wrote:
> 
>>>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
>>>> index a2fbea0be535..2f54831e04e5 100644
>>>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>>>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>>>> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>>>>  	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
>>>>  	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
>>>>  	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
>>>> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_TOTAL   },
>>>> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_LOCAL   },
>>>
>>> Is this dependency still accurate now that the implementation switched to the 
>>> "extended event ID" variant of ABMC that no longer uses the event IDs associated
>>> with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?
>>
>> That's a good question. Unfortunately, we may need to retain this
>> dependency for now, as a significant portion of the code relies on
>> functions like resctrl_is_mbm_event(), resctrl_is_mbm_enabled(),
>> resctrl_arch_is_mbm_total_enabled(), and others.
>>
> 
> Avoiding needing to change code is not a valid reason. 
> 
> I think that without this dependency the code will
> still rely on "functions like resctrl_is_mbm_event(),
> resctrl_is_mbm_enabled(), resctrl_arch_is_mbm_total_enabled(),
> and others." though.
> 
> The core shift is to stop thinking about QOS_L3_MBM_TOTAL_EVENT_ID
> to mean the same as X86_FEATURE_CQM_MBM_TOTAL, similarly to stop
> thinking about QOS_L3_MBM_LOCAL_EVENT_ID to mean the same as
> X86_FEATURE_CQM_MBM_LOCAL.

Oh, OK.

> 
> I expected that for backwards compatibility ABMC will start by
> enabling QOS_L3_MBM_TOTAL_EVENT_ID and QOS_L3_MBM_LOCAL_EVENT_ID 
> as part of its initialization, configuring them with the current
> defaults for which memory transactions are expected to be monitored
> by each. With these events enabled the existing flows using, for
> example, resctrl_is_mbm_event(), will continue to work as expected, no?

Yes. It will work because it uses the event ID.
> 
> This would require more familiarity with L3 monitoring enumeration
> on AMD since it will still be required to determine the number of
> RMIDs etc. but if ABMC does not actually depend on these CQM features
> then the current enumeration would need to be re-worked anyway.

Are you suggesting that we remove the dependency and rework the ABMC
enumeration in get_rdt_mon_resources()?

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 03/27] x86/resctrl: Consolidate monitoring related data from rdt_resource
  2025-05-22 20:52   ` Reinette Chatre
@ 2025-05-27 18:49     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-27 18:49 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 15:52, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:51 PM, Babu Moger wrote:
> 
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 9ba771f2ddea..2a8fa454d3e6 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -255,40 +255,48 @@ enum resctrl_schema_fmt {
>>  	RESCTRL_SCHEMA_RANGE,
>>  };
>>  
>> +/**
>> + * struct resctrl_mon - Monitoring related data of a resctrl resource
>> + * @num_rmid:		Number of RMIDs available
>> + * @mbm_cfg_mask:	Bandwidth sources that can be tracked when bandwidth
>> + *			monitoring events can be configured.
>> + * @evt_list:		List of monitoring events
>> + */
> 
> Nit: this new comment portion can start with a clean slate by all sentences
> having good structure by starting with upper case and ending with period.

Sure. Will do.
-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 04/27] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2025-05-22 20:54   ` Reinette Chatre
@ 2025-05-27 19:52     ` Moger, Babu
  2025-05-27 20:15     ` Moger, Babu
  1 sibling, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-27 19:52 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 15:54, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:51 PM, Babu Moger wrote:
>> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
>> Bits Description
>> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
>>      Monitoring Counter ID + 1
>>
>> The feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Detect the feature and number of assignable monitoring counters supported.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
> 
>> ---
>>  arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++--
>>  include/linux/resctrl.h               | 4 ++++
>>  2 files changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index aeb2a9283069..fd2761d9f3f7 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -345,6 +345,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
>>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>  	unsigned int threshold;
>> +	u32 eax, ebx, ecx, edx;
>>  
>>  	snc_nodes_per_l3_cache = snc_get_config();
>>  
>> @@ -375,13 +376,17 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>  	resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);
>>  
>>  	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
>> -		u32 eax, ebx, ecx, edx;
>> -
>>  		/* Detect list of bandwidth sources that can be tracked */
>>  		cpuid_count(0x80000020, 3, &eax, &ebx, &ecx, &edx);
>>  		r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
>>  	}
>>  
>> +	if (rdt_cpu_has(X86_FEATURE_ABMC)) {
>> +		r->mon.mbm_cntr_assignable = true;
>> +		cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>> +		r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>> +	}
>> +
> 
> Shouldn't ABMC detection also include enumeration of which configurations
> are supported? From what I can tell, looking ahead patch #18 hardcodes definitions
> of all known "bandwidth types" (which term to use TBD) and then patch #20 allows
> *any* of these types to be configured irrespective of whether system
> supports it.
> AMD spec mentions "The types of L3 transactions that ABMC can track are
> configurable and identified by CPUID Fn8000_0020_ECX_x3."  It thus looks
> like the enumeration of r->mon.mbm_cfg_mask when BMEC is enabled is
> required for ABMC also and used by this implementation.

That is true. Will add the following check.

if (rdt_cpu_has(X86_FEATURE_BMEC) || rdt_cpu_has(X86_FEATURE_ABMC)) {
	/* Detect list of bandwidth sources that can be tracked */
	cpuid_count(0x80000020, 3, &eax, &ebx, &ecx, &edx);
	r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
}
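
For illustration only (not part of the patch), the register decoding
discussed here can be modelled in user space. This is a minimal sketch that
takes raw CPUID register values as arguments instead of executing CPUID.
struct mon_caps, decode_mon_caps() and the two mask macros are hypothetical
names, and the 0x7f mask assumes that MAX_EVT_CONFIG_BITS is GENMASK(6, 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for the kernel's MAX_EVT_CONFIG_BITS, taken as GENMASK(6, 0). */
#define EVT_CONFIG_MASK		0x7fu
/* The patch reads the counter field from EBX[15:0] of CPUID Fn8000_0020, subleaf 5. */
#define ABMC_CNTR_FIELD_MASK	0xffffu

/* Hypothetical summary of what the enumeration records. */
struct mon_caps {
	bool		cntr_assignable;
	uint32_t	num_mbm_cntrs;
	uint32_t	mbm_cfg_mask;
};

/*
 * Decode raw register values: ecx_x3 is ECX from subleaf 3 and ebx_x5 is
 * EBX from subleaf 5. The configuration mask is recorded when either BMEC
 * or ABMC is present, as proposed in the reply above.
 */
static void decode_mon_caps(struct mon_caps *caps, bool has_bmec, bool has_abmc,
			    uint32_t ecx_x3, uint32_t ebx_x5)
{
	assert(caps);
	*caps = (struct mon_caps){ 0 };

	if (has_bmec || has_abmc)
		caps->mbm_cfg_mask = ecx_x3 & EVT_CONFIG_MASK;

	if (has_abmc) {
		caps->cntr_assignable = true;
		/* Same arithmetic as the patch: field value plus one. */
		caps->num_mbm_cntrs = (ebx_x5 & ABMC_CNTR_FIELD_MASK) + 1;
	}
}
```

With ABMC present and BMEC absent, the mask is still populated, which is the
point of the combined check.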


> 
>>  	r->mon_capable = true;
>>  
>>  	return 0;
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 2a8fa454d3e6..065fb6e38933 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -260,11 +260,15 @@ enum resctrl_schema_fmt {
>>   * @num_rmid:		Number of RMIDs available
>>   * @mbm_cfg_mask:	Bandwidth sources that can be tracked when bandwidth
>>   *			monitoring events can be configured.
>> + * @num_mbm_cntrs:	Number of assignable monitoring counters
>> + * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
> 
> "monitor assignment" has not been used so far, was this intended to be
> "counter assignment"?
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 04/27] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2025-05-22 20:54   ` Reinette Chatre
  2025-05-27 19:52     ` Moger, Babu
@ 2025-05-27 20:15     ` Moger, Babu
  1 sibling, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-27 20:15 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 15:54, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:51 PM, Babu Moger wrote:
>> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
>> Bits Description
>> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
>>      Monitoring Counter ID + 1
>>
>> The feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Detect the feature and number of assignable monitoring counters supported.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
> 
>> ---
>>  arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++--
>>  include/linux/resctrl.h               | 4 ++++
>>  2 files changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index aeb2a9283069..fd2761d9f3f7 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -345,6 +345,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
>>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>  	unsigned int threshold;
>> +	u32 eax, ebx, ecx, edx;
>>  
>>  	snc_nodes_per_l3_cache = snc_get_config();
>>  
>> @@ -375,13 +376,17 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>  	resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);
>>  
>>  	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
>> -		u32 eax, ebx, ecx, edx;
>> -
>>  		/* Detect list of bandwidth sources that can be tracked */
>>  		cpuid_count(0x80000020, 3, &eax, &ebx, &ecx, &edx);
>>  		r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
>>  	}
>>  
>> +	if (rdt_cpu_has(X86_FEATURE_ABMC)) {
>> +		r->mon.mbm_cntr_assignable = true;
>> +		cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>> +		r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>> +	}
>> +
> 
> Shouldn't ABMC detection also include enumeration of which configurations
> are supported? From what I can tell, looking ahead patch #18 hardcodes definitions
> of all known "bandwidth types" (which term to use TBD) and then patch #20 allows
> *any* of these types to be configured irrespective of whether system
> supports it.
> AMD spec mentions "The types of L3 transactions that ABMC can track are
> configurable and identified by CPUID Fn8000_0020_ECX_x3."  It thus looks
> like the enumeration of r->mon.mbm_cfg_mask when BMEC is enabled is
> required for ABMC also and used by this implementation.
> 
>>  	r->mon_capable = true;
>>  
>>  	return 0;
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 2a8fa454d3e6..065fb6e38933 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -260,11 +260,15 @@ enum resctrl_schema_fmt {
>>   * @num_rmid:		Number of RMIDs available
>>   * @mbm_cfg_mask:	Bandwidth sources that can be tracked when bandwidth
>>   *			monitoring events can be configured.
>> + * @num_mbm_cntrs:	Number of assignable monitoring counters
>> + * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
> 
> "monitor assignment" has not been used so far, was this intended to be
> "counter assignment"?

I missed responding to this comment. Yes, will correct it.
-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 05/27] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2025-05-22 20:56   ` Reinette Chatre
@ 2025-05-27 20:21     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-27 20:21 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 15:56, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:51 PM, Babu Moger wrote:
>> Add the functionality to enable/disable AMD ABMC feature.
>>
>> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
>> L3_QOS_EXT_CFG. When the state of ABMC is changed, the MSR needs
>> to be updated on all the logical processors in the QOS Domain.
>>
>> Hardware counters will reset when ABMC state is changed.
>>
>> The ABMC feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
> 
>> ---
>>  arch/x86/include/asm/msr-index.h       |  1 +
>>  arch/x86/kernel/cpu/resctrl/internal.h |  5 +++
>>  arch/x86/kernel/cpu/resctrl/monitor.c  | 43 ++++++++++++++++++++++++++
>>  include/linux/resctrl.h                |  3 ++
>>  4 files changed, 52 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index e6134ef2263d..3970e0b16e47 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1203,6 +1203,7 @@
>>  /* - AMD: */
>>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>> +#define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
>>  #define MSR_IA32_EVT_CFG_BASE		0xc0000400
>>  
>>  /* AMD-V MSRs */
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 5e3c41b36437..fcc9d23686a1 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -37,6 +37,9 @@ struct arch_mbm_state {
>>  	u64	prev_msr;
>>  };
>>  
>> +/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
>> +#define ABMC_ENABLE_BIT			0
>> +
>>  /**
>>   * struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
>>   *			       a resource for a control function
>> @@ -102,6 +105,7 @@ struct msr_param {
>>   * @mon_scale:		cqm counter * mon_scale = occupancy in bytes
>>   * @mbm_width:		Monitor width, to detect and correct for overflow.
>>   * @cdp_enabled:	CDP state of this resource
>> + * @mbm_cntr_assign_enabled:	ABMC feature is enabled
>>   *
>>   * Members of this structure are either private to the architecture
>>   * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
>> @@ -115,6 +119,7 @@ struct rdt_hw_resource {
>>  	unsigned int		mon_scale;
>>  	unsigned int		mbm_width;
>>  	bool			cdp_enabled;
>> +	bool			mbm_cntr_assign_enabled;
>>  };
>>  
>>  static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index fd2761d9f3f7..ff4b2abfa044 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -405,3 +405,46 @@ void __init intel_rdt_mbm_apply_quirk(void)
>>  	mbm_cf_rmidthreshold = mbm_cf_table[cf_index].rmidthreshold;
>>  	mbm_cf = mbm_cf_table[cf_index].cf;
>>  }
>> +
>> +static void resctrl_abmc_set_one_amd(void *arg)
>> +{
>> +	bool *enable = arg;
>> +
>> +	if (*enable)
>> +		msr_set_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
>> +	else
>> +		msr_clear_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
>> +}
>> +
>> +/*
>> + * ABMC enable/disable requires update of L3_QOS_EXT_CFG MSR on all the CPUs
>> + * associated with all monitor domains.
>> + */
>> +static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
>> +{
>> +	struct rdt_mon_domain *d;
>> +
> 
> It remains a challenge to consider these building blocks without insight into
> how/when they will be used. To help out, please add guardrails to help with review.
> For example, this could benefit from a:
> 
> 	lockdep_assert_cpus_held();

Yes. Sure.

> 
>> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +		on_each_cpu_mask(&d->hdr.cpu_mask,
>> +				 resctrl_abmc_set_one_amd, &enable, 1);
>> +		resctrl_arch_reset_rmid_all(r, d);
>> +	}
>> +}
>> +
>> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
>> +{
>> +	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> +
>> +	if (r->mon.mbm_cntr_assignable &&
>> +	    hw_res->mbm_cntr_assign_enabled != enable) {
>> +		_resctrl_abmc_enable(r, enable);
>> +		hw_res->mbm_cntr_assign_enabled = enable;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +inline bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
> 
> This "inline" in the .c file is unexpected. Why is this needed?

Not required. No specific reason. Will remove it.
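
For illustration only, the enable/disable flow in the quoted patch can be
modelled in user space. This is a minimal sketch, assuming one plain 64-bit
variable per domain in place of the L3_QOS_EXT_CFG MSR. struct fake_domain,
struct fake_resource, abmc_set_one() and abmc_assign_set() are hypothetical
names. The kernel-only pieces (on_each_cpu_mask(), the suggested
lockdep_assert_cpus_held()) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ABMC_ENABLE_BIT	0	/* bit 0 of L3_QOS_EXT_CFG, per the patch */

/* Hypothetical stand-in for one monitoring domain. */
struct fake_domain {
	uint64_t	l3_qos_ext_cfg;	/* models the MSR contents */
	unsigned int	resets;		/* number of counter resets seen */
};

/* Hypothetical stand-in for the resource state used by the patch. */
struct fake_resource {
	bool			assignable;	/* models r->mon.mbm_cntr_assignable */
	bool			enabled;	/* models hw_res->mbm_cntr_assign_enabled */
	struct fake_domain	*domains;
	size_t			num_domains;
};

/* Set or clear only the enable bit and note that counters were reset. */
static void abmc_set_one(struct fake_domain *d, bool enable)
{
	if (enable)
		d->l3_qos_ext_cfg |= UINT64_C(1) << ABMC_ENABLE_BIT;
	else
		d->l3_qos_ext_cfg &= ~(UINT64_C(1) << ABMC_ENABLE_BIT);
	d->resets++;
}

/* Touch the domains only when the requested state differs from the cached one. */
static int abmc_assign_set(struct fake_resource *r, bool enable)
{
	size_t i;

	assert(r);
	if (!r->assignable || r->enabled == enable)
		return 0;

	for (i = 0; i < r->num_domains; i++)
		abmc_set_one(&r->domains[i], enable);
	r->enabled = enable;

	return 0;
}
```

A repeated request for the current state is a no-op in this model, so the
counters are not reset needlessly.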

> 
>> +{
>> +	return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
>> +}
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 065fb6e38933..bdb264875ef6 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -428,6 +428,9 @@ static inline u32 resctrl_get_config_index(u32 closid,
>>  bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level l);
>>  int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
>>  
>> +bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r);
>> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
>> +
>>  /*
>>   * Update the ctrl_val and apply this config right now.
>>   * Must be called on one of the domain's CPUs.
> 
> Reinette
> 
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 06/27] x86/resctrl: Introduce the interface to display monitor mode
  2025-05-22 20:56   ` Reinette Chatre
@ 2025-05-27 20:33     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-27 20:33 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 15:56, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:51 PM, Babu Moger wrote:
> 
> No comments on changelog since I expect it to be reworked based on 
> https://lore.kernel.org/lkml/7628cec8-5914-4895-8289-027e7821777e@amd.com/

Yes. Sure.

> 
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -257,6 +257,33 @@ with the following files:
>>  	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>>  	    0=0x30;1=0x30;3=0x15;4=0x15
>>  
>> +"mbm_assign_mode":
>> +	Reports the list of monitoring modes supported. The enclosed brackets
> 
> Please try to avoid unnecessary words. For example,
> "Reports the list of monitoring modes supported." -> "The supported monitoring modes."

Sure.
-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* RE: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-26 13:14         ` Peter Newman
@ 2025-05-27 21:41           ` Luck, Tony
  2025-05-28 21:41             ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Luck, Tony @ 2025-05-27 21:41 UTC (permalink / raw)
  To: Peter Newman
  Cc: Chatre, Reinette, Babu Moger, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	james.morse@arm.com, dave.martin@arm.com, fenghuay@nvidia.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
	ardb@kernel.org, gregkh@linuxfoundation.org,
	daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
	alexandre.chartre@oracle.com, pawan.kumar.gupta@linux.intel.com,
	thomas.lendacky@amd.com, perry.yuan@amd.com, seanjc@google.com,
	Huang, Kai, Li, Xiaoyao, kan.liang@linux.intel.com, Li, Xin3,
	ebiggers@google.com, xin@zytor.com, Mehta, Sohil,
	andrew.cooper3@citrix.com, mario.limonciello@amd.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wieczor-Retman, Maciej, Eranian, Stephane, Xiaojian.Du@amd.com,
	gautham.shenoy@amd.com


> Thanks for applying my suggestion[1] about the array entry sizes, but
> you needed one more dereference:

> -       size_t tsize = sizeof(hw_dom->arch_mbm_states[0]);
> +       size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);

> -       size_t tsize = sizeof(d->mbm_states[0]);
> +       size_t tsize = sizeof(*d->mbm_states[0]);

Indeed yes. Thanks.
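
For illustration only, the difference between the two sizeof expressions
can be shown with a small user-space model. This is a sketch that assumes
the state arrays are arrays of pointers, one per event. struct
mbm_state_model and struct dom_model are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-RMID state, loosely modelled on struct arch_mbm_state. */
struct mbm_state_model {
	uint64_t	chunks;
	uint64_t	prev_msr;
};

/* Hypothetical domain holding one pointer per MBM event. */
struct dom_model {
	struct mbm_state_model	*states[2];
};

/* Size of one array slot: a pointer, not the state it points to. */
static size_t slot_size(const struct dom_model *d)
{
	assert(d);
	return sizeof(d->states[0]);
}

/*
 * Size of the pointed-to element. sizeof does not evaluate its operand
 * here, so no pointer is dereferenced at run time.
 */
static size_t elem_size(const struct dom_model *d)
{
	assert(d);
	return sizeof(*d->states[0]);
}
```

Sizing an allocation with slot_size() would reserve one pointer's worth of
bytes per entry instead of one full state structure.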

-Tony



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 01/27] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-27 18:40         ` Moger, Babu
@ 2025-05-27 23:42           ` Reinette Chatre
  2025-05-28 16:18             ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-27 23:42 UTC (permalink / raw)
  To: babu.moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/27/25 11:40 AM, Moger, Babu wrote:
> On 5/27/25 12:54, Reinette Chatre wrote:
>> On 5/27/25 10:23 AM, Moger, Babu wrote:
>>> On 5/22/25 15:51, Reinette Chatre wrote:
>>>> On 5/15/25 3:51 PM, Babu Moger wrote:
>>
>>>>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
>>>>> index a2fbea0be535..2f54831e04e5 100644
>>>>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>>>>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>>>>> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>>>>>  	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
>>>>>  	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
>>>>>  	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
>>>>> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_TOTAL   },
>>>>> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_LOCAL   },
>>>>
>>>> Is this dependency still accurate now that the implementation switched to the 
>>>> "extended event ID" variant of ABMC that no longer uses the event IDs associated
>>>> with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?
>>>
>>> That's a good question. Unfortunately, we may need to retain this
>>> dependency for now, as a significant portion of the code relies on
>>> functions like resctrl_is_mbm_event(), resctrl_is_mbm_enabled(),
>>> resctrl_arch_is_mbm_total_enabled(), and others.
>>>
>>
>> Avoiding needing to change code is not a valid reason. 
>>
>> I think that without this dependency the code will
>> still rely on "functions like resctrl_is_mbm_event(),
>> resctrl_is_mbm_enabled(), resctrl_arch_is_mbm_total_enabled(),
>> and others." though.
>>
>> The core shift is to stop thinking about QOS_L3_MBM_TOTAL_EVENT_ID
>> to mean the same as X86_FEATURE_CQM_MBM_TOTAL, similarly to stop
>> thinking about QOS_L3_MBM_LOCAL_EVENT_ID to mean the same as
>> X86_FEATURE_CQM_MBM_LOCAL.
> 
> oh. ok.
> 
>>
>> I expected that for backwards compatibility ABMC will start by
>> enabling QOS_L3_MBM_TOTAL_EVENT_ID and QOS_L3_MBM_LOCAL_EVENT_ID 
>> as part of its initialization, configuring them with the current
>> defaults for which memory transactions are expected to be monitored
>> by each. With these events enabled the existing flows using, for
>> example, resctrl_is_mbm_event(), will continue to work as expected, no?
> 
> Yes. It will work as it uses event id.
>>
>> This would require more familiarity with L3 monitoring enumeration
>> on AMD since it will still be required to determine the number of
>> RMIDs etc. but if ABMC does not actually depend on these CQM features
>> then the current enumeration would need to be re-worked anyway.
> 
> Are you suggesting to remove the dependency and rework ABMC enumeration in
> get_rdt_mon_resources()?
> 

If you have an alternative proposal that would accurately reflect the ABMC
and existing L3 MON features then we can surely consider it.

Reinette

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 01/27] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-27 23:42           ` Reinette Chatre
@ 2025-05-28 16:18             ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-28 16:18 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/27/2025 6:42 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/27/25 11:40 AM, Moger, Babu wrote:
>> On 5/27/25 12:54, Reinette Chatre wrote:
>>> On 5/27/25 10:23 AM, Moger, Babu wrote:
>>>> On 5/22/25 15:51, Reinette Chatre wrote:
>>>>> On 5/15/25 3:51 PM, Babu Moger wrote:
>>>
>>>>>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
>>>>>> index a2fbea0be535..2f54831e04e5 100644
>>>>>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>>>>>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>>>>>> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>>>>>>   	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
>>>>>>   	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
>>>>>>   	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
>>>>>> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_TOTAL   },
>>>>>> +	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_LOCAL   },
>>>>>
>>>>> Is this dependency still accurate now that the implementation switched to the
>>>>> "extended event ID" variant of ABMC that no longer uses the event IDs associated
>>>>> with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?
>>>>
>>>> That's a good question. Unfortunately, we may need to retain this
>>>> dependency for now, as a significant portion of the code relies on
>>>> functions like resctrl_is_mbm_event(), resctrl_is_mbm_enabled(),
>>>> resctrl_arch_is_mbm_total_enabled(), and others.
>>>>
>>>
>>> Avoiding needing to change code is not a valid reason.
>>>
>>> I think that without this dependency the code will
>>> still rely on "functions like resctrl_is_mbm_event(),
>>> resctrl_is_mbm_enabled(), resctrl_arch_is_mbm_total_enabled(),
>>> and others." though.
>>>
>>> The core shift is to stop thinking about QOS_L3_MBM_TOTAL_EVENT_ID
>>> to mean the same as X86_FEATURE_CQM_MBM_TOTAL, similarly to stop
>>> thinking about QOS_L3_MBM_LOCAL_EVENT_ID to mean the same as
>>> X86_FEATURE_CQM_MBM_LOCAL.
>>
>> oh. ok.
>>
>>>
>>> I expected that for backwards compatibility ABMC will start by
>>> enabling QOS_L3_MBM_TOTAL_EVENT_ID and QOS_L3_MBM_LOCAL_EVENT_ID
>>> as part of its initialization, configuring them with the current
>>> defaults for which memory transactions are expected to be monitored
>>> by each. With these events enabled the existing flows using, for
>>> example, resctrl_is_mbm_event(), will continue to work as expected, no?
>>
>> Yes. It will work as it uses event id.
>>>
>>> This would require more familiarity with L3 monitoring enumeration
>>> on AMD since it will still be required to determine the number of
>>> RMIDs etc. but if ABMC does not actually depend on these CQM features
>>> then the current enumeration would need to be re-worked anyway.
>>
>> Are you suggesting to remove the dependency and rework ABMC enumeration in
>> get_rdt_mon_resources()?
>>
> 
> If you have an alternative proposal that would accurately reflect the ABMC
> and existing L3 MON features then we can surely consider it.

I don't see any other option at this point. Will change it in the next revision.
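
For illustration only, the separation between event IDs and CPUID feature
flags discussed above can be modelled in user space. This is a minimal
sketch, assuming the event ID values 0x01 to 0x03. event_enabled[],
enable_mbm_events() and the simplified helpers are hypothetical and are not
the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* L3 monitoring event IDs (values assumed for this sketch). */
enum event_id {
	QOS_L3_OCCUP_EVENT_ID		= 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
};

/* Hypothetical enable table, indexed by event ID. */
static bool event_enabled[QOS_L3_MBM_LOCAL_EVENT_ID + 1];

/* Depends only on the event ID, not on any CPUID feature flag. */
static bool is_mbm_event(int evt)
{
	return evt >= QOS_L3_MBM_TOTAL_EVENT_ID &&
	       evt <= QOS_L3_MBM_LOCAL_EVENT_ID;
}

/*
 * Hypothetical enumeration step: each MBM event is enabled when either the
 * matching legacy CQM MBM feature or ABMC can back it.
 */
static void enable_mbm_events(bool has_cqm_mbm_total, bool has_cqm_mbm_local,
			      bool has_abmc)
{
	event_enabled[QOS_L3_MBM_TOTAL_EVENT_ID] = has_cqm_mbm_total || has_abmc;
	event_enabled[QOS_L3_MBM_LOCAL_EVENT_ID] = has_cqm_mbm_local || has_abmc;
}

static bool is_mbm_enabled(void)
{
	return event_enabled[QOS_L3_MBM_TOTAL_EVENT_ID] ||
	       event_enabled[QOS_L3_MBM_LOCAL_EVENT_ID];
}
```

In this model the event-based helpers keep working when only ABMC is
reported, which is the behaviour the review asks for.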

Thanks
Babu




^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 08/27] x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at domain
  2025-05-22 21:02   ` Reinette Chatre
@ 2025-05-28 16:56     ` Moger, Babu
  2025-05-28 17:34       ` Reinette Chatre
  0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-05-28 16:56 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/2025 4:02 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> shortlog: "at domain" -> "per domain"?
> 
Sure.

> On 5/15/25 3:51 PM, Babu Moger wrote:
>> In mbm_cntr_assign mode hardware counters are assigned/unassigned to an
>> MBM event of a monitor group. Hardware counters are assigned/unassigned
>> at monitoring domain level.
>>
>> Manage a monitoring domain's hardware counters using a per monitoring
>> domain array of struct mbm_cntr_cfg that is indexed by the hardware
>> counter ID. A hardware counter's configuration contains the MBM event
>> ID and points to the monitoring group that it is assigned to, with a
>> NULL pointer meaning that the hardware counter is available for assignment.
>>
>> There is no direct way to determine which hardware counters are assigned
>> to a particular monitoring group. Check every entry of every hardware
>> counter configuration array in every monitoring domain to query which
>> MBM events of a monitoring group are tracked by hardware. Such queries are
>> acceptable because of a very small number of assignable counters (32
>> to 64).
>>
>> Suggested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v13: Resolved conflicts caused by the recent FS/ARCH code restructure.
>>       The files monitor.c/rdtgroup.c have been split between FS and ARCH directories.
>>
>> v12: Fixed the struct mbm_cntr_cfg code documentation.
>>       Removed a few strange characters in changelog.
>>       Added the counter range for better understanding.
>>       Moved the struct mbm_cntr_cfg definition to resctrl/internal.h as
>>       suggested by James.
>>
>> v11: Refined the change log based on Reinette's feedback.
>>       Fixed few style issues.
>>
>> v10: Patch changed completely to handle the counters at domain level.
>>       https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
>>       Removed Reviewed-by tag.
>>       Did not see the need to add cntr_id in mbm_state structure. Not used in the code.
>>
>> v9: Added Reviewed-by tag. No other changes.
>>
>> v8: Minor commit message changes.
>>
>> v7: Added check mbm_cntr_assignable for allocating bitmap mbm_cntr_map
>>
>> v6: New patch to add domain level assignment.
>> ---
>>   fs/resctrl/rdtgroup.c   | 11 +++++++++++
>>   include/linux/resctrl.h | 16 ++++++++++++++++
>>   2 files changed, 27 insertions(+)
>>
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 51f8f8d3ccbc..e2005fc9acd9 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -4085,6 +4085,7 @@ static void rdtgroup_setup_default(void)
>>   
>>   static void domain_destroy_mon_state(struct rdt_mon_domain *d)
>>   {
>> +	kfree(d->cntr_cfg);
>>   	bitmap_free(d->rmid_busy_llc);
>>   	kfree(d->mbm_total);
>>   	kfree(d->mbm_local);
>> @@ -4171,6 +4172,16 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
>>   			return -ENOMEM;
>>   		}
>>   	}
>> +	if (resctrl_is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
>> +		tsize = sizeof(*d->cntr_cfg);
>> +		d->cntr_cfg = kcalloc(r->mon.num_mbm_cntrs, tsize, GFP_KERNEL);
>> +		if (!d->cntr_cfg) {
>> +			bitmap_free(d->rmid_busy_llc);
>> +			kfree(d->mbm_total);
>> +			kfree(d->mbm_local);
>> +			return -ENOMEM;
>> +		}
>> +	}
>>   
>>   	return 0;
>>   }
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index bdb264875ef6..d77981d1fcb9 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -156,6 +156,20 @@ struct rdt_ctrl_domain {
>>   	u32				*mbps_val;
>>   };
>>   
>> +/**
>> + * struct mbm_cntr_cfg - Assignable counter configuration
>> + * @evtid:		MBM event to which the counter is assigned. Only valid
>> + *			if @rdtgroup is not NULL.
>> + * @evt_cfg:		Event configuration value.
> 
> @evt_cfg is not introduced in changelog nor defined here. Please add a snippet here
> on what @evt_cfg's values represent. This is important since this is exposed
> as resctrl fs API to architectures so all architectures need to use same values when
> interacting with resctrl.

Sure.

@evt_cfg: A value that represents memory transactions (e.g., reads,
writes, etc.).
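
For illustration only, such a value can be pictured as a bitmask of memory
transaction types. This is a minimal sketch, assuming the bit layout of the
BMEC event configuration table in the resctrl documentation. The macro and
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per memory transaction type (layout assumed from the BMEC table). */
#define EVT_READS_LOCAL		(1u << 0)	/* reads to local memory */
#define EVT_READS_REMOTE	(1u << 1)	/* reads to non-local memory */
#define EVT_NT_WRITES_LOCAL	(1u << 2)	/* non-temporal writes, local */
#define EVT_NT_WRITES_REMOTE	(1u << 3)	/* non-temporal writes, non-local */
#define EVT_READS_SLOW_LOCAL	(1u << 4)	/* reads to slow memory, local */
#define EVT_READS_SLOW_REMOTE	(1u << 5)	/* reads to slow memory, non-local */
#define EVT_DIRTY_VICTIMS	(1u << 6)	/* dirty victims to all memory */

/* Configuration covering only local transactions. */
static uint32_t evt_cfg_local(void)
{
	return EVT_READS_LOCAL | EVT_NT_WRITES_LOCAL | EVT_READS_SLOW_LOCAL;
}

/* Configuration covering every transaction type listed above. */
static uint32_t evt_cfg_total(void)
{
	return evt_cfg_local() | EVT_READS_REMOTE | EVT_NT_WRITES_REMOTE |
	       EVT_READS_SLOW_REMOTE | EVT_DIRTY_VICTIMS;
}

/* A requested configuration is valid only if it stays within the supported mask. */
static bool evt_cfg_supported(uint32_t evt_cfg, uint32_t supported_mask)
{
	assert(supported_mask != 0);
	return (evt_cfg & ~supported_mask) == 0;
}
```

The local configuration evaluates to 0x15, matching the value shown for
mbm_local_bytes_config in the documentation excerpt quoted elsewhere in
this thread.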

> 
>> + * @rdtgrp:		resctrl group assigned to the counter. NULL if the
>> + *			counter is free.
>> + */
>> +struct mbm_cntr_cfg {
>> +	enum resctrl_event_id   evtid;
>> +	u32                     evt_cfg;
>> +	struct rdtgroup         *rdtgrp;
> 
> Please align struct member names using TABs.

Sure.

> 
>> +};
>> +
>>   /**
>>    * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
>>    * @hdr:		common header for different domain types
>> @@ -167,6 +181,7 @@ struct rdt_ctrl_domain {
>>    * @cqm_limbo:		worker to periodically read CQM h/w counters
>>    * @mbm_work_cpu:	worker CPU for MBM h/w counters
>>    * @cqm_work_cpu:	worker CPU for CQM h/w counters
>> + * @cntr_cfg:		assignable counters configuration
> 
> "array of assignable counters' configuration (indexed by counter ID)"

Sure.

> 
>>    */
>>   struct rdt_mon_domain {
>>   	struct rdt_domain_hdr		hdr;
>> @@ -178,6 +193,7 @@ struct rdt_mon_domain {
>>   	struct delayed_work		cqm_limbo;
>>   	int				mbm_work_cpu;
>>   	int				cqm_work_cpu;
>> +	struct mbm_cntr_cfg		*cntr_cfg;
>>   };
>>   
>>   /**
> 
> Reinette
> 
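
For reference, a minimal userspace sketch of the per-domain array described above: one entry per counter ID, with a NULL rdtgrp marking a free counter. The _demo types and the cntr_cfg_*() helpers are made up for illustration and are not part of the patch; calloc() stands in for kcalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the kernel types; only what the sketch needs. */
enum resctrl_event_id_demo { DEMO_MBM_TOTAL = 2, DEMO_MBM_LOCAL = 3 };
struct rdtgroup_demo { int id; };

struct mbm_cntr_cfg_demo {
	enum resctrl_event_id_demo	evtid;
	uint32_t			evt_cfg;
	struct rdtgroup_demo		*rdtgrp;	/* NULL: counter is free */
};

/* Allocate a zeroed array with one entry per counter ID. */
struct mbm_cntr_cfg_demo *cntr_cfg_alloc(size_t num_cntrs)
{
	return calloc(num_cntrs, sizeof(struct mbm_cntr_cfg_demo));
}

/* Return the first free counter ID, or -1 if every counter is in use. */
int cntr_cfg_find_free(const struct mbm_cntr_cfg_demo *cfg, size_t num_cntrs)
{
	for (size_t i = 0; i < num_cntrs; i++)
		if (!cfg[i].rdtgrp)
			return (int)i;
	return -1;
}

/* Record that counter @id now tracks @evtid for group @grp. */
void cntr_cfg_claim(struct mbm_cntr_cfg_demo *cfg, size_t id,
		    struct rdtgroup_demo *grp,
		    enum resctrl_event_id_demo evtid, uint32_t evt_cfg)
{
	assert(grp);
	cfg[id].evtid = evtid;
	cfg[id].evt_cfg = evt_cfg;
	cfg[id].rdtgrp = grp;
}

/* Mark counter @id as free again. */
void cntr_cfg_release(struct mbm_cntr_cfg_demo *cfg, size_t id)
{
	memset(&cfg[id], 0, sizeof(cfg[id]));
}
```

Since @evtid is only valid while @rdtgrp is set, the sketch clears the whole entry on release rather than only the pointer.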

thanks
Babu

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 08/27] x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at domain
  2025-05-28 16:56     ` Moger, Babu
@ 2025-05-28 17:34       ` Reinette Chatre
  2025-05-28 19:05         ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-28 17:34 UTC (permalink / raw)
  To: Moger, Babu, Babu Moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/28/25 9:56 AM, Moger, Babu wrote:
> On 5/22/2025 4:02 PM, Reinette Chatre wrote:
>> On 5/15/25 3:51 PM, Babu Moger wrote:

>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>> index bdb264875ef6..d77981d1fcb9 100644
>>> --- a/include/linux/resctrl.h
>>> +++ b/include/linux/resctrl.h
>>> @@ -156,6 +156,20 @@ struct rdt_ctrl_domain {
>>>       u32                *mbps_val;
>>>   };
>>>   +/**
>>> + * struct mbm_cntr_cfg - Assignable counter configuration
>>> + * @evtid:        MBM event to which the counter is assigned. Only valid
>>> + *            if @rdtgroup is not NULL.
>>> + * @evt_cfg:        Event configuration value.
>>
>> @evt_cfg is not introduced in changelog nor defined here. Please add a snippet here
>> on what @evt_cfg's values represent. This is important since this is exposed
>> as resctrl fs API to architectures so all architectures need to use same values when
>> interacting with resctrl.
> 
> Sure.
> 
> @evt_cfg: A value that represents memory transactions (e.g., reads, writes, etc.).

This still does not explain how an @evt_cfg value should be interpreted. For example, it
could be something like below (please feel free to improve).

@evt_cfg: Event configuration created using the READS_TO_LOCAL_MEM, READS_TO_REMOTE_MEM, etc. bits
	  that represent the memory transactions being counted.
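
To make the interpretation concrete, here is a small userspace sketch of how such a value could be composed and tested. The bit positions are assumed to match the existing BMEC definitions, and the evt_cfg_*() helpers are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory transaction bits; positions assumed from the BMEC definitions. */
#define READS_TO_LOCAL_MEM		(1u << 0)
#define READS_TO_REMOTE_MEM		(1u << 1)
#define NON_TEMP_WRITE_TO_LOCAL_MEM	(1u << 2)
#define NON_TEMP_WRITE_TO_REMOTE_MEM	(1u << 3)
#define READS_TO_LOCAL_S_MEM		(1u << 4)
#define READS_TO_REMOTE_S_MEM		(1u << 5)
#define DIRTY_VICTIMS_TO_ALL_MEM	(1u << 6)

/* A "total" style configuration: count every transaction type. */
uint32_t evt_cfg_total(void)
{
	return READS_TO_LOCAL_MEM | READS_TO_REMOTE_MEM |
	       NON_TEMP_WRITE_TO_LOCAL_MEM | NON_TEMP_WRITE_TO_REMOTE_MEM |
	       READS_TO_LOCAL_S_MEM | READS_TO_REMOTE_S_MEM |
	       DIRTY_VICTIMS_TO_ALL_MEM;
}

/* A "local" style configuration: count only local transactions. */
uint32_t evt_cfg_local(void)
{
	return READS_TO_LOCAL_MEM | NON_TEMP_WRITE_TO_LOCAL_MEM |
	       READS_TO_LOCAL_S_MEM;
}

/* True if every transaction type in @txn is counted by @evt_cfg. */
bool evt_cfg_counts(uint32_t evt_cfg, uint32_t txn)
{
	assert(txn != 0);
	return (evt_cfg & txn) == txn;
}
```

With these positions the two helpers evaluate to 0x7f and 0x15 respectively.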

Reinette

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 08/27] x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at domain
  2025-05-28 17:34       ` Reinette Chatre
@ 2025-05-28 19:05         ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-28 19:05 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/28/2025 12:34 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/28/25 9:56 AM, Moger, Babu wrote:
>> On 5/22/2025 4:02 PM, Reinette Chatre wrote:
>>> On 5/15/25 3:51 PM, Babu Moger wrote:
> 
>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>>> index bdb264875ef6..d77981d1fcb9 100644
>>>> --- a/include/linux/resctrl.h
>>>> +++ b/include/linux/resctrl.h
>>>> @@ -156,6 +156,20 @@ struct rdt_ctrl_domain {
>>>>        u32                *mbps_val;
>>>>    };
>>>>    +/**
>>>> + * struct mbm_cntr_cfg - Assignable counter configuration
>>>> + * @evtid:        MBM event to which the counter is assigned. Only valid
>>>> + *            if @rdtgroup is not NULL.
>>>> + * @evt_cfg:        Event configuration value.
>>>
>>> @evt_cfg is not introduced in changelog nor defined here. Please add a snippet here
>>> on what @evt_cfg's values represent. This is important since this is exposed
>>> as resctrl fs API to architectures so all architectures need to use same values when
>>> interacting with resctrl.
>>
>> Sure.
>>
>> @evt_cfg: A value that represents memory transactions (e.g., reads, writes, etc.).
> 
> This still does not explain how an @evt_cfg value should be interpreted. For example, it
> could be something like below (please feel free to improve).
> 
> @evt_cfg: Event configuration created using the READS_TO_LOCAL_MEM, READS_TO_REMOTE_MEM, etc. bits
> 	  that represent the memory transactions being counted.
> 

Looks good.
Thanks
Babu

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 10/27] x86/resctrl: Add data structures and definitions for ABMC assignment
  2025-05-22 21:10   ` Reinette Chatre
@ 2025-05-28 19:15     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-28 19:15 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/2025 4:10 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:51 PM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as the
>> counter is assigned. The bandwidth events will be tracked by the hardware
>> until the user changes the configuration. Each resctrl group can configure
>> maximum two counters, one for total event and one for local event.
> 
> (please update, above describes previous design)
> 

Ok. Will drop the last line.

>>
>> The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh).
>> ABMC counter assignment is done by setting the counter id, bandwidth
>> source (RMID) and bandwidth configuration. Users will have the option to
>> change the bandwidth configuration using resctrl interface which will be
>> introduced later in the series.
> 
> "will be introduced later in the series" is similar to "in a subsequent patch"
> and should not be used in a changelog. Just describe what this patch does.

ok. Last line is really not required.

> 
>>
>> Attempts to read or write the MSR when ABMC is not enabled will result
>> in a #GP(0) exception.
>>
>> Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG
>> (C000_03FDh):
>> =========================================================================
>> Bits 	Mnemonic	Description			Access Reset
>> 							Type   Value
>> =========================================================================
>> 63 	CfgEn 		Configuration Enable 		R/W 	0
>>
>> 62 	CtrEn 		Enable/disable counting		R/W 	0
>>
>> 61:53 	– 		Reserved 			MBZ 	0
>>
>> 52:48 	CtrID 		Counter Identifier		R/W	0
>>
>> 47 	IsCOS		BwSrc field is a CLOSID		R/W	0
>> 			(not an RMID)
>>
>> 46:44 	–		Reserved			MBZ	0
>>
>> 43:32	BwSrc		Bandwidth Source		R/W	0
>> 			(RMID or CLOSID)
>>
>> 31:0	BwType		Bandwidth configuration		R/W	0
>> 			to track for this counter
>> ==========================================================================
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v13: Removed the Reviewed-by tag as there is commit log change to remove
>>       BMEC reference.
>>
>> v12: No changes.
>>
>> v11: No changes.
>>
>> v10: No changes.
>>
>> v9: Removed the references of L3_QOS_ABMC_DSC.
>>      Text changes about configuration in kernel doc.
>>
>> v8: Update the configuration notes in kernel_doc.
>>      Few commit message update.
>>
>> v7: Removed the reference of L3_QOS_ABMC_DSC as it is not used anymore.
>>      Moved the configuration notes to kernel_doc.
>>      Adjusted the tabs for l3_qos_abmc_cfg and checkpatch seems happy.
>>
>> v6: Removed all the fs related changes.
>>      Added note on CfgEn,CtrEn.
>>      Removed the definitions which are not used.
>>      Removed cntr_id initialization.
>>
>> v5: Moved assignment flags here (patch 10/19 of v4).
>>      Added MON_CNTR_UNSET definition to initialize cntr_id's.
>>      More details in commit log.
>>      Renamed few fields in l3_qos_abmc_cfg for readability.
>>
>> v4: Added more descriptions.
>>      Changed the name abmc_ctr_id to ctr_id.
>>      Added L3_QOS_ABMC_DSC. Used for reading the configuration.
>>
>> v3: No changes.
>>
>> v2: No changes.
>> ---
>>   arch/x86/include/asm/msr-index.h       |  1 +
>>   arch/x86/kernel/cpu/resctrl/internal.h | 35 ++++++++++++++++++++++++++
>>   2 files changed, 36 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index 3970e0b16e47..b5b5ebead24f 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1203,6 +1203,7 @@
>>   /* - AMD: */
>>   #define MSR_IA32_MBA_BW_BASE		0xc0000200
>>   #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>> +#define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
>>   #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
>>   #define MSR_IA32_EVT_CFG_BASE		0xc0000400
>>   
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index fcc9d23686a1..db6b0c28ee6b 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -164,6 +164,41 @@ union cpuid_0x10_x_edx {
>>   	unsigned int full;
>>   };
>>   
>> +/*
>> + * ABMC counters are configured by writing to L3_QOS_ABMC_CFG.
>> + * @bw_type		: Bandwidth configuration (supported by BMEC)
>> + *			  tracked by the @cntr_id.
> 
> The "supported by BMEC" is unexpected with the new design that separated
> the two features.

My bad. Will remove it.


>> + * @bw_src		: Bandwidth source (RMID or CLOSID).
>> + * @reserved1		: Reserved.
>> + * @is_clos		: @bw_src field is a CLOSID (not an RMID).
>> + * @cntr_id		: Counter identifier.
>> + * @reserved		: Reserved.
>> + * @cntr_en		: Counting enable bit.
>> + * @cfg_en		: Configuration enable bit.
>> + *
>> + * Configuration and counting:
>> + * Counter can be configured across multiple writes to MSR. Configuration
>> + * is applied only when @cfg_en = 1. Counter @cntr_id is reset when the
>> + * configuration is applied.
>> + * @cfg_en = 1, @cntr_en = 0 : Apply @cntr_id configuration but do not
>> + *                             count events.
>> + * @cfg_en = 1, @cntr_en = 1 : Apply @cntr_id configuration and start
>> + *                             counting events.
>> + */
>> +union l3_qos_abmc_cfg {
>> +	struct {
>> +		unsigned long bw_type  :32,
>> +			      bw_src   :12,
>> +			      reserved1: 3,
>> +			      is_clos  : 1,
>> +			      cntr_id  : 5,
>> +			      reserved : 9,
>> +			      cntr_en  : 1,
>> +			      cfg_en   : 1;
>> +	} split;
>> +	unsigned long full;
>> +};
>> +
>>   void rdt_ctrl_update(void *arg);
>>   
>>   int rdt_get_mon_l3_config(struct rdt_resource *r);
> 
> Reinette
> 
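
As a cross-check of the register layout in the table above, here is a userspace sketch that packs and unpacks the fields with explicit shifts and masks instead of a bitfield union. The ABMC_* macros, struct abmc_fields and the two helpers are illustrative names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions taken from the L3_QOS_ABMC_CFG table. */
#define ABMC_BW_TYPE_SHIFT	0	/* bits 31:0  BwType */
#define ABMC_BW_SRC_SHIFT	32	/* bits 43:32 BwSrc  */
#define ABMC_IS_CLOS_SHIFT	47	/* bit  47    IsCOS  */
#define ABMC_CNTR_ID_SHIFT	48	/* bits 52:48 CtrID  */
#define ABMC_CNTR_EN_SHIFT	62	/* bit  62    CtrEn  */
#define ABMC_CFG_EN_SHIFT	63	/* bit  63    CfgEn  */

#define ABMC_BW_SRC_MASK	0xfffull
#define ABMC_CNTR_ID_MASK	0x1full

struct abmc_fields {
	uint32_t bw_type;
	uint16_t bw_src;
	uint8_t  is_clos;
	uint8_t  cntr_id;
	uint8_t  cntr_en;
	uint8_t  cfg_en;
};

/* Build the 64-bit MSR value; reserved bits stay zero (MBZ). */
uint64_t abmc_cfg_pack(const struct abmc_fields *f)
{
	assert(f->bw_src <= ABMC_BW_SRC_MASK);
	assert(f->cntr_id <= ABMC_CNTR_ID_MASK);

	return ((uint64_t)f->bw_type << ABMC_BW_TYPE_SHIFT) |
	       ((uint64_t)f->bw_src << ABMC_BW_SRC_SHIFT) |
	       ((uint64_t)(f->is_clos & 1) << ABMC_IS_CLOS_SHIFT) |
	       ((uint64_t)f->cntr_id << ABMC_CNTR_ID_SHIFT) |
	       ((uint64_t)(f->cntr_en & 1) << ABMC_CNTR_EN_SHIFT) |
	       ((uint64_t)(f->cfg_en & 1) << ABMC_CFG_EN_SHIFT);
}

/* Split a 64-bit MSR value back into its fields. */
struct abmc_fields abmc_cfg_unpack(uint64_t v)
{
	struct abmc_fields f = {
		.bw_type = (uint32_t)(v >> ABMC_BW_TYPE_SHIFT),
		.bw_src  = (uint16_t)((v >> ABMC_BW_SRC_SHIFT) & ABMC_BW_SRC_MASK),
		.is_clos = (uint8_t)((v >> ABMC_IS_CLOS_SHIFT) & 1),
		.cntr_id = (uint8_t)((v >> ABMC_CNTR_ID_SHIFT) & ABMC_CNTR_ID_MASK),
		.cntr_en = (uint8_t)((v >> ABMC_CNTR_EN_SHIFT) & 1),
		.cfg_en  = (uint8_t)((v >> ABMC_CFG_EN_SHIFT) & 1),
	};
	return f;
}
```

The widths add up to 64 (32 + 12 + 3 + 1 + 5 + 9 + 1 + 1), which is the same split the union uses.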

Thanks
Babu

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-22 21:51   ` Reinette Chatre
  2025-05-22 22:16     ` Luck, Tony
@ 2025-05-28 21:39     ` Moger, Babu
  1 sibling, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-28 21:39 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/2025 4:51 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:51 PM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it
>> is assigned. The assigned RMID will be tracked by the hardware until the
>> user unassigns it manually.
> 
> (please review this often repeated snippet to match new design)

Sure.
> 
>>
>> Implement an architecture-specific handler to assign and unassign the
>> counter. Configure counters by writing to the L3_QOS_ABMC_CFG MSR,
>> specifying the counter ID, bandwidth source (RMID), and event
>> configuration.
>>
> 
> ...
> 
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index ff4b2abfa044..e31084f7babd 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -448,3 +448,40 @@ inline bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
>>   {
>>   	return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
>>   }
>> +
>> +static void resctrl_abmc_config_one_amd(void *info)
>> +{
>> +	union l3_qos_abmc_cfg *abmc_cfg = info;
>> +
>> +	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg->full);
>> +}
>> +
>> +/*
>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>> + */
>> +void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> +			      u32 cntr_id, u32 evt_cfg, bool assign)
>> +{
>> +	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
>> +	struct arch_mbm_state *am;
>> +
>> +	abmc_cfg.split.cfg_en = 1;
>> +	abmc_cfg.split.cntr_en = assign ? 1 : 0;
>> +	abmc_cfg.split.cntr_id = cntr_id;
>> +	abmc_cfg.split.bw_src = rmid;
>> +	abmc_cfg.split.bw_type = evt_cfg;
> 
> Is evt_cfg really needed to be programmed when unassigning a counter? Looking ahead at
> patch #14 resctrl_free_config_cntr() needs to go through extra list walk to get this data
> but why would hardware need an accurate event configuration to *unassign* a counter?

evt_cfg is not required during unassign. I can remove it.

> 
> It seems unnecessary to provide both the event ID *and* the configuration.
> resctrl_arch_config_cntr() could drop the "evt_cfg" parameter and instead there
> can be a new resctrl utility that architecture can use to query the event's configuration.
> Similar to resctrl_is_mon_event_enabled() introduced in
> https://lore.kernel.org/lkml/20250521225049.132551-3-tony.luck@intel.com/ that exposes an
> event property.

Sounds good.
I can add a new function resctrl_get_mon_event_config(evtid) and call it 
only during the "assign". It will be called inside 
resctrl_arch_config_cntr().
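
Roughly what I have in mind, as a userspace sketch only. It assumes the events end up in an array indexed by event ID as in the telemetry series; the demo_* names, the table contents and the default values are placeholders, not the final interface.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in event IDs; index 0 is deliberately unused. */
enum demo_event_id {
	DEMO_LLC_OCCUPANCY = 1,
	DEMO_MBM_TOTAL = 2,
	DEMO_MBM_LOCAL = 3,
	DEMO_NUM_EVENTS,
};

struct demo_mon_evt {
	const char *name;
	uint32_t evt_cfg;	/* memory transaction bits; 0 if not configurable */
};

/* Events kept in an array indexed by event ID, so no list walk is needed. */
static const struct demo_mon_evt demo_events[DEMO_NUM_EVENTS] = {
	[DEMO_LLC_OCCUPANCY] = { "llc_occupancy", 0 },
	[DEMO_MBM_TOTAL]     = { "mbm_total_bytes", 0x7f },
	[DEMO_MBM_LOCAL]     = { "mbm_local_bytes", 0x15 },
};

/* Hypothetical counterpart of resctrl_get_mon_event_config(evtid). */
uint32_t demo_get_mon_event_config(enum demo_event_id evtid)
{
	assert(evtid > 0 && evtid < DEMO_NUM_EVENTS);
	return demo_events[evtid].evt_cfg;
}
```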

> 
> It looks to me as though there are a couple of changes in the telemetry work
> that would benefit this work. https://lore.kernel.org/lkml/20250521225049.132551-2-tony.luck@intel.com/
> switches the monitor events to be maintained in an array indexed by event ID, eliminating the
> need for searching the evt_list that this work does in a couple of places. Also note the handy
> new for_each_mbm_event() helper (https://lore.kernel.org/lkml/20250521225049.132551-5-tony.luck@intel.com/).

Sure. Looking at it now.

> 
> 
>> +
>> +	smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &abmc_cfg, 1);
>> +
>> +	/*
>> +	 * The hardware counter is reset (because cfg_en == 1) so there is no
>> +	 * need to record initial non-zero counts.
>> +	 */
>> +	if (assign) {
>> +		am = get_arch_mbm_state(hw_dom, rmid, evtid);
>> +		if (am)
>> +			memset(am, 0, sizeof(*am));
>> +	}
>> +}
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index d77981d1fcb9..59a4fe60ab46 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -559,6 +559,23 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
>>    */
>>   void resctrl_arch_reset_all_ctrls(struct rdt_resource *r);
>>   
>> +/**
>> + * resctrl_arch_config_cntr() - Configure the counter id to RMID, event
>> + *				pair on the domain.
> 
> The sentence seems strange, should "Configure the counter" perhaps be
> "Assign the counter"? Or if the naming requires "configure" ...
> "Configure the counter with its new RMID and event details."? Please feel
> free to improve.

Last one looks good.

> 
>> + * @r:			Resource structure.
>> + * @d:			Domain that the counter id to be configured.
> 
> I am unable to parse description of @d.

The domain in which the counter ID is to be configured.

> 
>> + * @evtid:		Event type to configure.
>> + * @rmid:		RMID to configure.
>> + * @closid:		CLOSID to configure.
>> + * @cntr_id:		Counter ID to configure.
> 
> All four parameters descriptions end with "to configure" ... but it is actually only
> the counter that is configured while the rest is the data that the counter is configured with, no?

Will remove "to configure" from all the other fields except the cntr_id.

> 
>> + * @evt_cfg:		MBM event configuration value representing reads,
>> + *			writes etc.
> 
> Needs definition about what the contents of @evt_cfg means. This is the API ...it
> cannot be vague like "reads, writes, etc." but should be specific about which bit means
> what.

Copying your comment on other patch
https://lore.kernel.org/lkml/14ca1527-ee25-448d-949b-ed8df546c916@intel.com/

@evt_cfg: Event configuration created using the READS_TO_LOCAL_MEM, 
READS_TO_REMOTE_MEM, etc. bits that represent the memory transactions 
being counted.


> 
>> + * @assign:		Assign or unassign.
> 
> "True to assign the counter, false to unassign the counter."
> 

Sure.

> 
> Needs some context here about what architecture can expect on how this function will
> be called. For example, "Can be called from any CPU."
> 

Sure.

>> + */
>> +void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> +			      u32 cntr_id, u32 evt_cfg, bool assign);
>> +
>>   extern unsigned int resctrl_rmid_realloc_threshold;
>>   extern unsigned int resctrl_rmid_realloc_limit;
>>   
> 
> Reinette
> 
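
Putting the comments above together, a userspace sketch of the value that would be written on assign and on unassign. It assumes the field positions from the L3_QOS_ABMC_CFG layout (BwType 31:0, BwSrc 43:32, CtrID 52:48, CtrEn 62, CfgEn 63); demo_abmc_cfg() is an illustrative name and the code is not the v14 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compose the configuration for counter @cntr_id.
 * @assign true:  apply the configuration and start counting @evt_cfg.
 * @assign false: apply the configuration with counting disabled;
 *                @evt_cfg is ignored because it is not needed to unassign.
 */
uint64_t demo_abmc_cfg(uint32_t cntr_id, uint32_t rmid, uint32_t evt_cfg,
		       bool assign)
{
	uint64_t v;

	assert(cntr_id < 32);	/* CtrID is 5 bits wide */
	assert(rmid < 4096);	/* BwSrc is 12 bits wide */

	v  = 1ull << 63;		/* cfg_en: apply this configuration */
	v |= (uint64_t)cntr_id << 48;	/* cntr_id */
	v |= (uint64_t)rmid << 32;	/* bw_src */

	if (assign) {
		v |= 1ull << 62;	/* cntr_en: start counting */
		v |= evt_cfg;		/* bw_type */
	}

	return v;
}
```

cfg_en is set in both cases, so the hardware counter is reset each time the configuration is applied.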

Thanks
Babu

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: RE: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-27 21:41           ` Luck, Tony
@ 2025-05-28 21:41             ` Moger, Babu
  2025-05-28 22:00               ` Luck, Tony
  2025-06-09 14:01               ` Moger, Babu
  0 siblings, 2 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-28 21:41 UTC (permalink / raw)
  To: Luck, Tony, Peter Newman
  Cc: Chatre, Reinette, Babu Moger, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	james.morse@arm.com, dave.martin@arm.com, fenghuay@nvidia.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
	ardb@kernel.org, gregkh@linuxfoundation.org,
	daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
	alexandre.chartre@oracle.com, pawan.kumar.gupta@linux.intel.com,
	thomas.lendacky@amd.com, perry.yuan@amd.com, seanjc@google.com,
	Huang, Kai, Li, Xiaoyao, kan.liang@linux.intel.com, Li, Xin3,
	ebiggers@google.com, xin@zytor.com, Mehta, Sohil,
	andrew.cooper3@citrix.com, mario.limonciello@amd.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wieczor-Retman, Maciej, Eranian, Stephane, Xiaojian.Du@amd.com,
	gautham.shenoy@amd.com

Hi Tony, Peter,

On 5/27/2025 4:41 PM, Luck, Tony wrote:
> 
>> Thanks for applying my suggestion[1] about the array entry sizes, but
>> you needed one more dereference:
> 
>> -       size_t tsize = sizeof(hw_dom->arch_mbm_states[0]);
>> +       size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
> 
>> -       size_t tsize = sizeof(d->mbm_states[0]);
>> +       size_t tsize = sizeof(*d->mbm_states[0]);
> 
> Indeed yes. Thanks.
> 
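
For anyone following along, a small userspace sketch of why the extra dereference matters. It assumes the state arrays are arrays of pointers (one pointer per MBM event), which is what the diff above implies; the demo_* struct and helpers are stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the per-RMID state; the fields are illustrative. */
struct demo_mbm_state {
	unsigned long long chunks;
	unsigned long long prev_msr;
	unsigned long long prev_bw_bytes;
};

/* One pointer per MBM event, each pointing at an array of per-RMID state. */
struct demo_dom {
	struct demo_mbm_state *states[2];
};

/* Size of one array slot: this is only the size of a pointer. */
size_t demo_wrong_tsize(const struct demo_dom *d)
{
	return sizeof(d->states[0]);
}

/* Size of what the slot points to: the element size the allocation needs. */
size_t demo_right_tsize(const struct demo_dom *d)
{
	assert(d);
	return sizeof(*d->states[0]);
}
```

sizeof does not evaluate its operand here, so the dereference is safe even before the pointers are allocated.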

Tony, thanks for porting the patches.

I can actually pick your branch [1] and apply review comments on top for 
v14 series. Hope that is fine with everyone.
[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git/log/?h=my_mbm_plus_babu_abmc

One question though: Where will Peter's fix [2] go?
[2] 
https://lore.kernel.org/lkml/CALPaoCj7FBv_vfDp+4tgqo4p8T7Eov_Ys+CQRoAX6u43a4OTDQ@mail.gmail.com/

thanks
Babu


^ permalink raw reply	[flat|nested] 114+ messages in thread

* RE: RE: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-28 21:41             ` Moger, Babu
@ 2025-05-28 22:00               ` Luck, Tony
  2025-05-28 22:13                 ` Luck, Tony
  2025-06-09 14:01               ` Moger, Babu
  1 sibling, 1 reply; 114+ messages in thread
From: Luck, Tony @ 2025-05-28 22:00 UTC (permalink / raw)
  To: Moger, Babu, Peter Newman
  Cc: Chatre, Reinette, Babu Moger, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	james.morse@arm.com, dave.martin@arm.com, fenghuay@nvidia.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
	ardb@kernel.org, gregkh@linuxfoundation.org,
	daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
	alexandre.chartre@oracle.com, pawan.kumar.gupta@linux.intel.com,
	thomas.lendacky@amd.com, perry.yuan@amd.com, seanjc@google.com,
	Huang, Kai, Li, Xiaoyao, kan.liang@linux.intel.com, Li, Xin3,
	ebiggers@google.com, xin@zytor.com, Mehta, Sohil,
	andrew.cooper3@citrix.com, mario.limonciello@amd.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wieczor-Retman, Maciej, Eranian, Stephane, Xiaojian.Du@amd.com,
	gautham.shenoy@amd.com

> One question though: Where will Peter's fix [2] go?
> [2] 
> https://lore.kernel.org/lkml/CALPaoCj7FBv_vfDp+4tgqo4p8T7Eov_Ys+CQRoAX6u43a4OTDQ@mail.gmail.com/

Babu,

I'll backport into my patches and then rebase the whole branch.  Hopefully should not take long.

-Tony



^ permalink raw reply	[flat|nested] 114+ messages in thread

* RE: RE: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-28 22:00               ` Luck, Tony
@ 2025-05-28 22:13                 ` Luck, Tony
  2025-05-28 23:48                   ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Luck, Tony @ 2025-05-28 22:13 UTC (permalink / raw)
  To: Moger, Babu, Peter Newman
  Cc: Chatre, Reinette, Babu Moger, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	james.morse@arm.com, dave.martin@arm.com, fenghuay@nvidia.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
	ardb@kernel.org, gregkh@linuxfoundation.org,
	daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
	alexandre.chartre@oracle.com, pawan.kumar.gupta@linux.intel.com,
	thomas.lendacky@amd.com, perry.yuan@amd.com, seanjc@google.com,
	Huang, Kai, Li, Xiaoyao, kan.liang@linux.intel.com, Li, Xin3,
	ebiggers@google.com, xin@zytor.com, Mehta, Sohil,
	andrew.cooper3@citrix.com, mario.limonciello@amd.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wieczor-Retman, Maciej, Eranian, Stephane, Xiaojian.Du@amd.com,
	gautham.shenoy@amd.com

> I'll backport into my patches and then rebase the whole branch.  Hopefully should not take long.

Done and force pushed to the same branch at kernel.org.

Both Peter's changes went into my 4th patch, and your patches didn't touch those functions
so git did all the real work.

-Tony

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: RE: RE: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-28 22:13                 ` Luck, Tony
@ 2025-05-28 23:48                   ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-28 23:48 UTC (permalink / raw)
  To: Luck, Tony, Peter Newman
  Cc: Chatre, Reinette, Babu Moger, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	james.morse@arm.com, dave.martin@arm.com, fenghuay@nvidia.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
	ardb@kernel.org, gregkh@linuxfoundation.org,
	daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
	alexandre.chartre@oracle.com, pawan.kumar.gupta@linux.intel.com,
	thomas.lendacky@amd.com, perry.yuan@amd.com, seanjc@google.com,
	Huang, Kai, Li, Xiaoyao, kan.liang@linux.intel.com, Li, Xin3,
	ebiggers@google.com, xin@zytor.com, Mehta, Sohil,
	andrew.cooper3@citrix.com, mario.limonciello@amd.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wieczor-Retman, Maciej, Eranian, Stephane, Xiaojian.Du@amd.com,
	gautham.shenoy@amd.com

Hi Tony,

On 5/28/2025 5:13 PM, Luck, Tony wrote:
>> I'll backport into my patches and then rebase the whole branch.  Hopefully should not take long.
> 
> Done and force pushed to the same branch at kernel.org.
> 
> Both Peter's changes went into my 4th patch, and your patches didn't touch those functions
> so git did all the real work.
> 

I will base my v14 on top of this. Thanks
Babu

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 12/27] x86/resctrl: Introduce event configuration modes
  2025-05-22 22:05   ` Reinette Chatre
@ 2025-05-29 15:21     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-29 15:21 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 17:05, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:51 PM, Babu Moger wrote:
>> MBM events can be configured using either BMEC (Bandwidth Monitoring Event
>> Configuration) or the mbm_cntr_assign mode.
>>
>> Introduce a data structure to represent the various event configuration
>> modes and their corresponding values.
>>
>> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> 
> I cannot recall suggesting this.
> 
> (/me digs)
> 
> Are you perhaps referring to https://lore.kernel.org/lkml/d2966a26-4483-4808-a538-bb20973dd2a1@intel.com/

Yes.
> 
> This is not referring to new modes but the existing mbm_cntr_assign modes.
> resctrl knows which "mbm_cntr_assign" mode is active and it can use that
> to determine whether BMEC can be exposed to user space or not. There is
> already enough information in resctrl to know whether BMEC files should be
> exposed or not.
> 
> I think this work itself makes clear that these modes are useless since
> patch #25 that determines whether to hide BMEC files doesn't even
> use it.

Sure. I will remove the changes related to mbm_mode.

> 
> 
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v13: New patch to handle different event configuration types with
>>      mbm_cntr_assign mode.
>> ---
>>  fs/resctrl/internal.h         |  6 ++++--
>>  fs/resctrl/monitor.c          |  4 ++--
>>  fs/resctrl/rdtgroup.c         |  2 +-
>>  include/linux/resctrl_types.h | 11 +++++++++++
>>  4 files changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 9a8cf6f11151..0fae374559ba 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -55,13 +55,15 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
>>   * struct mon_evt - Entry in the event list of a resource
>>   * @evtid:		event id
>>   * @name:		name of the event
>> - * @configurable:	true if the event is configurable
>> + * @mbm_mode:		monitoring mode (BMEC or mbm_cntr_assign)
>> + * @evt_cfg:		event configuration value decoding reads, writes.
>>   * @list:		entry in &rdt_resource->evt_list
>>   */
>>  struct mon_evt {
>>  	enum resctrl_event_id	evtid;
>>  	char			*name;
>> -	bool			configurable;
>> +	enum resctrl_mbm_mode	mbm_mode;
>> +	u32			evt_cfg;
> 
> This very important yet totally unrelated member sneaked in without
> any mention.

Yes. We only need evt_cfg. I will just add it as a separate patch.

> 
>>  	struct list_head	list;
>>  };
>>  
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 2548aee0151c..8e403587a02f 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -903,12 +903,12 @@ int resctrl_mon_resource_init(void)
>>  	l3_mon_evt_init(r);
>>  
>>  	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
>> -		mbm_total_event.configurable = true;
>> +		mbm_total_event.mbm_mode = MBM_MODE_BMEC;
>>  		resctrl_file_fflags_init("mbm_total_bytes_config",
>>  					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>>  	}
>>  	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_LOCAL_EVENT_ID)) {
>> -		mbm_local_event.configurable = true;
>> +		mbm_local_event.mbm_mode = MBM_MODE_BMEC;
>>  		resctrl_file_fflags_init("mbm_local_bytes_config",
>>  					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>>  	}
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 752750e3e443..f192b2736a77 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -1152,7 +1152,7 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
>>  
>>  	list_for_each_entry(mevt, &r->mon.evt_list, list) {
>>  		seq_printf(seq, "%s\n", mevt->name);
>> -		if (mevt->configurable)
>> +		if (mevt->mbm_mode == MBM_MODE_BMEC)
> 
> This can instead be a call to a utility that returns whether BMEC should be
> visible based on resctrl_mon::mbm_cntr_assignable and rdt_hw_resource::mbm_cntr_assign_enabled
> (via resctrl_arch_mbm_cntr_assign_enabled() of course).

Sure. Will do.

> 
>>  			seq_printf(seq, "%s_config\n", mevt->name);
>>  	}
>>  
>> diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
>> index a25fb9c4070d..26cd1fec72db 100644
>> --- a/include/linux/resctrl_types.h
>> +++ b/include/linux/resctrl_types.h
>> @@ -47,4 +47,15 @@ enum resctrl_event_id {
>>  	QOS_NUM_EVENTS,
>>  };
>>  
>> +/*
>> + * Event configuration mode.
>> + * Events can be configured either in BMEC (Bandwidth Monitoring Event
>> + * Configuration) mode or mbm_cntr_assign mode.
>> + */
>> +enum resctrl_mbm_mode {
>> +	MBM_MODE_NONE,
>> +	MBM_MODE_BMEC,
>> +	MBM_MODE_ASSIGN,
>> +};
>> +
>>  #endif /* __LINUX_RESCTRL_TYPES_H */
> 
> Reinette
> 
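
A rough sketch of the kind of utility suggested above, in userspace form. The rule it encodes (show the BMEC files only when the event is configurable and mbm_cntr_assign mode is not both supported and enabled) is my reading of the comment and should be treated as an assumption; struct demo_mon and demo_bmec_visible() are placeholder names.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the two properties the helper would consult. */
struct demo_mon {
	bool mbm_cntr_assignable;	/* hardware supports assignable counters */
	bool mbm_cntr_assign_enabled;	/* mbm_cntr_assign mode currently active */
};

/*
 * Decide whether the "<event>_config" (BMEC) file should be listed.
 * @evt_configurable: the architecture reports the event as BMEC configurable.
 */
bool demo_bmec_visible(const struct demo_mon *m, bool evt_configurable)
{
	assert(m);

	if (!evt_configurable)
		return false;

	/* Hide BMEC while counter assignment is supported and enabled. */
	return !(m->mbm_cntr_assignable && m->mbm_cntr_assign_enabled);
}
```

With a helper like this, rdt_mon_features_show() would not need a per-event mode at all.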

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 13/27] x86/resctrl: Add the functionality to assign MBM events
  2025-05-22 22:41   ` Reinette Chatre
@ 2025-05-29 16:05     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-29 16:05 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 17:41, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:51 PM, Babu Moger wrote:
>> The mbm_cntr_assign mode offers "num_mbm_cntrs" number of counters that
>> can be assigned to RMID, event pair and monitor the bandwidth as long
> 
> "RMID, event pairs"? (assuming at this point in new version it will be
> obvious what is meant by "event").

Sure.

> 
>> as it is assigned.
>>
>> Add the functionality to allocate and assign a counter to am RMID, event
> 
> "am" -> "an"
> 

sure.

>> pair in the domain.
>>
>> If all the counters are in use, kernel will log the error message "Unable
>> to allocate counter in domain" in /sys/fs/resctrl/info/last_cmd_status
>> when a new assignment is requested. Exit on the first failure when
>> assigning counters across all the domains.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
> 
>> ---
>>  fs/resctrl/internal.h |   3 +
>>  fs/resctrl/monitor.c  | 134 ++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 137 insertions(+)
>>
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 0fae374559ba..ce4fcac91937 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -377,6 +377,9 @@ bool closid_allocated(unsigned int closid);
>>  
>>  int resctrl_find_cleanest_closid(void);
>>  
>> +int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>> +
>>  #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
>>  int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
>>  
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 8e403587a02f..d76fd0840946 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -934,3 +934,137 @@ void resctrl_mon_resource_exit(void)
>>  
>>  	dom_data_exit(r);
>>  }
>> +
>> +/*
>> + * Configure the counter for the event, RMID pair for the domain. Reset the
>> + * non-architectural state to clear all the event counters.
> 
> clear *all* the event counters?
> 
> "Reset the non-architectural state to clear all the event counters." ->
> "Reset the associated non-architectural state."?

ok.

> 
> Also, please see https://lore.kernel.org/lkml/20250429003359.375508-3-tony.luck@intel.com/

Yes. Sure.

> 
>> + */
>> +static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +				enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> +				u32 cntr_id, u32 evt_cfg, bool assign)
>> +{
>> +	struct mbm_state *m;
>> +
>> +	resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id, evt_cfg, assign);
>> +
>> +	m = get_mbm_state(d, closid, rmid, evtid);
>> +	if (m)
>> +		memset(m, 0, sizeof(struct mbm_state));
>> +}
>> +
>> +/*
>> + * mbm_cntr_get() - Return the cntr_id for the matching evtid and rdtgrp in
>> + *		    cntr_cfg array.
> 
> Please prefix parameter names with @ in description to make obvious what is
> referred to. Although "cntr_id" is a local variable so may be easier to parse
> if cntr_id is replaced with actual "counter ID" term while keeping rest as
> actual parameters. That makes cntr_cfg unneeded.

Sure.


> If intending to explain function context then failure return should also
> be documented. Even better would be to follow typical style of kernel-doc
> (even if not using /** start) and not mix and match so randomly.

Sure.

> 
>> + */
>> +static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +			struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
> 
> A subtle issue here is only evident from later patches, for example patch #17,
> that calls mbm_cntr_get() with a non MBM event ID from __mon_event_count().
> 
> If this usage is expected then these utilities need extra checks to
> ensure they are only called with valid MBM event IDs.

Sure. Will add the check resctrl_is_mbm_event().
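
A minimal sketch of what such a check could look like, using simplified
stand-in types rather than the real resctrl structures. The names
`struct group`, `struct domain`, `is_mbm_event()` and `cntr_get()` are
hypothetical; only the lookup logic and the -ENOENT return mirror the patch.
It also covers the case where no assignable counters exist.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum event_id { EVT_OCCUP = 1, EVT_MBM_TOTAL = 2, EVT_MBM_LOCAL = 3 };

struct group { int id; };

struct cntr_cfg {
	struct group *grp;	/* NULL means the counter is free */
	enum event_id evtid;
};

struct domain {
	struct cntr_cfg *cfg;	/* array of num_cntrs entries */
	int num_cntrs;		/* zero when assignable counters are unsupported */
};

bool is_mbm_event(enum event_id e)
{
	return e >= EVT_MBM_TOTAL && e <= EVT_MBM_LOCAL;
}

/*
 * Return the counter ID assigned to (@grp, @evtid) in @d, or -ENOENT when
 * @evtid is not an MBM event, @d has no counters, or nothing is assigned.
 */
int cntr_get(const struct domain *d, const struct group *grp,
	     enum event_id evtid)
{
	int i;

	assert(d != NULL);

	if (!is_mbm_event(evtid))
		return -ENOENT;

	for (i = 0; i < d->num_cntrs; i++) {
		if (d->cfg[i].grp == grp && d->cfg[i].evtid == evtid)
			return i;
	}

	return -ENOENT;
}
```

With the early return, a caller that passes an occupancy event gets the same
-ENOENT as a caller on a system without counters, so both cases can be
handled in one place.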

> 
>> +	int cntr_id;
>> +
>> +	for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
>> +		if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
>> +		    d->cntr_cfg[cntr_id].evtid == evtid)
>> +			return cntr_id;
>> +	}
>> +
>> +	return -ENOENT;
>> +}
>> +
>> +/*
>> + * mbm_cntr_alloc() - Return the first free entry in cntr_cfg array.
> 
> "Return the first ...array."  -> "Initialize and return ID of a new counter, return -ENOSPC on failure." ?
> This is still an awkward use of kernel-doc ... better to be properly formatted.

Sure.

> 
>> + */
>> +static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +			  struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> +	int cntr_id;
>> +
>> +	for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
>> +		if (!d->cntr_cfg[cntr_id].rdtgrp) {
>> +			d->cntr_cfg[cntr_id].rdtgrp = rdtgrp;
>> +			d->cntr_cfg[cntr_id].evtid = evtid;
>> +			return cntr_id;
>> +		}
>> +	}
>> +
>> +	return -ENOSPC;
>> +}
>> +
>> +/*
>> + * mbm_get_mon_event() - Return the mon_evt entry for the matching evtid.
>> + */
>> +static struct mon_evt *mbm_get_mon_event(struct rdt_resource *r,
>> +					 enum resctrl_event_id evtid)
>> +{
>> +	struct mon_evt *mevt;
>> +
>> +	list_for_each_entry(mevt, &r->mon.evt_list, list) {
>> +		if (mevt->evtid == evtid)
>> +			return mevt;
>> +	}
> 
> With changes from  telemetry series this becomes an array lookup.

Sure. Will look into this.

> 
>> +
>> +	return NULL;
>> +}
>> +
>> +/*
>> + * Allocate a fresh counter and configure the event if not assigned already.
>> + */
>> +static int resctrl_alloc_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +				     struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> +	struct mon_evt *mevt;
>> +	int cntr_id;
>> +
>> +	/* No need to allocate a new counter if it is already assigned */
>> +	cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
>> +	if (cntr_id >= 0)
>> +		goto cntr_configure;
>> +
>> +	cntr_id = mbm_cntr_alloc(r, d, rdtgrp, evtid);
>> +	if (cntr_id <  0) {
>> +		rdt_last_cmd_printf("Unable to allocate counter in domain %d\n",
>> +				    d->hdr.id);
>> +		return cntr_id;
>> +	}
>> +
>> +cntr_configure:
>> +	mevt = mbm_get_mon_event(r, evtid);
>> +	if (!mevt) {
>> +		rdt_last_cmd_printf("Invalid event id %d\n", evtid);
> 
> Difficult to see at this point but it seems that this is in kernel bug territory since
> user space provided text that is translated to event ID and here translated back to
> monitor event. This must succeed. Could this be simplified and back-and-forth avoided
> by passing the mon_evt instead of event ID?

We can do that.

> 
>> +		return -EINVAL;
>> +	}
> 
> 
> 
>> +
>> +	/*
>> +	 * Skip reconfiguration if the event setup is current; otherwise,
>> +	 * update and apply the new configuration to the domain.
>> +	 */
>> +	if (mevt->evt_cfg != d->cntr_cfg[cntr_id].evt_cfg) {
> 
> Lost me. Previous patch silently created mon_event::evt_cfg without initializing it.
> Here it is compared and treated as the "source of truth" ... where does its value
> come from?

Yes. That is correct. Will have to initialize evt_cfg when it is first
introduced. Will do.


> 
>> +		d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
>> +		resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid, rdtgrp->closid,
>> +				    cntr_id, mevt->evt_cfg, true);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Assign a hardware counter to event @evtid of group @rdtgrp.
>> + * Assign counters to all domains if @d is NULL; otherwise, assign the
>> + * counter to the specified domain @d.
> 
> Can add here what is mentioned in changelog that this exits on first failure
> and so highlight that this can have partial assignment when exit on such failure.

Sure.
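
To make the partial-assignment behaviour concrete, here is a small sketch
under simplified assumptions: each domain is modelled as a count of free
counters plus a flag for the one event being assigned. `struct domain`,
`assign_one()` and `assign_event()` are hypothetical names, not the functions
in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-domain state: free counters and a flag for our event. */
struct domain {
	int free_cntrs;
	int assigned;
};

/* Assign in one domain; -ENOSPC when the domain has no free counter. */
int assign_one(struct domain *d)
{
	if (d->assigned)
		return 0;
	if (d->free_cntrs == 0)
		return -ENOSPC;

	d->free_cntrs--;
	d->assigned = 1;
	return 0;
}

/*
 * Assign in all @n domains of @doms when @d is NULL, otherwise only in @d.
 * Stops at the first failure and does not undo earlier assignments, so the
 * caller may be left with a partial assignment.
 */
int assign_event(struct domain *doms, size_t n, struct domain *d)
{
	size_t i;
	int ret = 0;

	assert(doms != NULL || d != NULL);

	if (d)
		return assign_one(d);

	for (i = 0; i < n; i++) {
		ret = assign_one(&doms[i]);
		if (ret)
			return ret;
	}

	return ret;
}
```

If the second of three domains is full, the first domain keeps its counter
and the third is never attempted, which is the state the function comment
should warn about.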

> 
>> + */
>> +int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> +	int ret = 0;
>> +
>> +	if (!d) {
>> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +			ret = resctrl_alloc_config_cntr(r, d, rdtgrp, evtid);
>> +			if (ret)
>> +				return ret;
>> +		}
>> +	} else {
>> +		ret = resctrl_alloc_config_cntr(r, d, rdtgrp, evtid);
>> +	}
>> +
>> +	return ret;
>> +}
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 14/27] x86/resctrl: Add the functionality to unassign MBM events
  2025-05-22 22:49   ` Reinette Chatre
@ 2025-05-29 16:25     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-29 16:25 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 17:49, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:51 PM, Babu Moger wrote:
>> The mbm_cntr_assign mode offers "num_mbm_cntrs" number of counters that
>> can be assigned to an RMID, event pair and monitor the bandwidth as long
>> as it is assigned. If all the counters are in use, the kernel will log the
>> error message "Unable to allocate counter in domain" in
>> /sys/fs/resctrl/info/last_cmd_status when a new assignment is requested.
>>
>> To make space for a new assignment, users must unassign an already
>> assigned counter and retry the assignment again.
>>
>> Add the functionality to unassign and free the counters in the domain.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> ...
> 
>> ---
>>  fs/resctrl/internal.h |  2 ++
>>  fs/resctrl/monitor.c  | 60 +++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 62 insertions(+)
>>
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index ce4fcac91937..64ddc107fcab 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -379,6 +379,8 @@ int resctrl_find_cleanest_closid(void);
>>  
>>  int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>>  			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>> +int resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +				struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>>  
>>  #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
>>  int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index d76fd0840946..fbc938bd3b23 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -989,6 +989,14 @@ static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
>>  	return -ENOSPC;
>>  }
>>  
>> +/*
>> + * mbm_cntr_free() -  Reset cntr_id to zero.
> 
> "Reset cntr_id to zero"? cntr_id is an index to an array.
> Please provide accurate and useful descriptions.

Yes. My bad. Will correct it.

> 
>> + */
>> +static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
>> +{
>> +	memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
>> +}
>> +
>>  /*
>>   * mbm_get_mon_event() - Return the mon_evt entry for the matching evtid.
>>   */
>> @@ -1068,3 +1076,55 @@ int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>>  
>>  	return ret;
>>  }
>> +
>> +/*
>> + * Unassign and free the counter if assigned.
>> + */
>> +static int resctrl_free_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +				    struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> +	struct mon_evt *mevt;
>> +	int cntr_id;
>> +
>> +	cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
>> +
>> +	/* If there is no cntr_id assigned, nothing to do */
>> +	if (cntr_id < 0)
>> +		return 0;
>> +
>> +	mevt = mbm_get_mon_event(r, evtid);
>> +	if (!mevt) {
>> +		rdt_last_cmd_printf("Invalid event id %d\n", evtid);
> 
> Similar to previous comment this is in kernel bug territory and could be simplified
> by passing mon_evt instead. Although this is the unassign portion where 
> evt_cfg seems unnecessary.

Sure. The call mbm_get_mon_event() is not required anymore in this path.

> 
>> +		return -EINVAL;
>> +	}
>> +
>> +	resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid, rdtgrp->closid,
>> +			    cntr_id, mevt->evt_cfg, false);
>> +
>> +	mbm_cntr_free(d, cntr_id);
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Unassign a hardware counter associated with @evtid from the domain and
>> + * the group. Unassign the counters from all the domains if @d is NULL else
>> + * unassign from @d.
>> + */
>> +int  resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +				 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> +	int ret;
>> +
>> +	if (!d) {
>> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +			ret = resctrl_free_config_cntr(r, d, rdtgrp, evtid);
>> +			if (ret)
>> +				return ret;
>> +		}
>> +	} else {
>> +		ret = resctrl_free_config_cntr(r, d, rdtgrp, evtid);
>> +	}
>> +
>> +	return ret;
>> +}
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 15/27] x86/resctrl: Report 'Unassigned' for MBM events in mbm_cntr_assign mode
  2025-05-22 23:01   ` Reinette Chatre
@ 2025-05-29 16:58     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-29 16:58 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 18:01, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:52 PM, Babu Moger wrote:
>> In mbm_cntr_assign mode, the hardware counter should be assigned to read
>> the MBM events.
>>
>> Report 'Unassigned' in case the user attempts to read the event without
>> assigning a hardware counter.
>>
>> Export resctrl_is_mbm_event() and mbm_cntr_get() to allow usage from other
>> functions within fs/resctrl.
> 
> Please clarify that these two functions are exposed differently, resctrl_is_mbm_event()
> is added to include/linux/resctrl.h (also note similar change in 
> https://lore.kernel.org/lkml/20250429003359.375508-3-tony.luck@intel.com/)
> so not just exposed to fs/resctrl but instead to resctrl fs as well as
> arch code
> while mbm_cntr_get() remains internal to resctrl fs by being added to
> fs/resctrl/internal.h.

Sure. Will update the comment.
With Tony's changes
(https://lore.kernel.org/lkml/20250429003359.375508-3-tony.luck@intel.com/),
resctrl_is_mbm_event() no longer needs to be exported here. It is already
available.

So I only have to update the comment on mbm_cntr_get(). Will do.

>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> 
>> ---
>>  Documentation/filesystems/resctrl.rst |  8 ++++++++
>>  fs/resctrl/ctrlmondata.c              | 14 ++++++++++++++
>>  fs/resctrl/internal.h                 |  2 ++
>>  fs/resctrl/monitor.c                  |  4 ++--
>>  fs/resctrl/rdtgroup.c                 |  2 +-
>>  include/linux/resctrl.h               |  1 +
>>  6 files changed, 28 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index 2bfad43aac9c..5cf2d742f04c 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -430,6 +430,14 @@ When monitoring is enabled all MON groups will also contain:
>>  	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>>  	where "YY" is the node number.
>>  
>> +	The mbm_cntr_assign mode offers "num_mbm_cntrs" number of counters
>> +	and allows users to assign a counter to mon_hw_id, event pair enabling
>> +	bandwidth monitoring for as long as the counter remains assigned.
>> +	The hardware will continue tracking the assigned mon_hw_id until
>> +	the user manually unassigns it, ensuring that counters are not reset
>> +	during this period. An MBM event returns 'Unassigned' when the event
>> +	does not have a hardware counter assigned.
> 
> (please rework based on "event" vs "group" assignment ... not intending
> that "group" assignment be documented but the "event" assignment needs
> to be accurate for "group" assignment to be a simple extension)

Sure.

> 
>> +
>>  "mon_hw_id":
>>  	Available only with debug option. The identifier used by hardware
>>  	for the monitor group. On x86 this is the RMID.
>> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
>> index 6ed2dfd4dbbd..f6b8ad24b0b5 100644
>> --- a/fs/resctrl/ctrlmondata.c
>> +++ b/fs/resctrl/ctrlmondata.c
>> @@ -643,6 +643,18 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>  			goto out;
>>  		}
>>  		d = container_of(hdr, struct rdt_mon_domain, hdr);
>> +
>> +		/*
>> +		 * Report 'Unassigned' if mbm_cntr_assign mode is enabled and
>> +		 * counter is unassigned.
>> +		 */
>> +		if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
>> +		    resctrl_is_mbm_event(evtid) &&
>> +		    (mbm_cntr_get(r, d, rdtgrp, evtid) < 0)) {
>> +			rr.err = -ENOENT;
>> +			goto checkresult;
>> +		}
>> +
>>  		mon_event_read(&rr, r, d, rdtgrp, &d->hdr.cpu_mask, evtid, false);
>>  	}
>>  
>> @@ -652,6 +664,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>  		seq_puts(m, "Error\n");
>>  	else if (rr.err == -EINVAL)
>>  		seq_puts(m, "Unavailable\n");
>> +	else if (rr.err == -ENOENT)
>> +		seq_puts(m, "Unassigned\n");
>>  	else
>>  		seq_printf(m, "%llu\n", rr.val);
>>  
> 
> It may be unexpected that this is treated as "-ENOENT" but the function returns
> success. This can be addressed with a comment when comparing the return codes to
> other hardware return codes.

Will add the comment.
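
For reference, a sketch of the mapping the comment would describe, written
as a standalone helper. `format_mon_result()` is a hypothetical name, and the
-EIO to "Error" mapping is an assumption based on how __cntr_id_read_phys()
reports RMID_VAL_ERROR; the -EINVAL and -ENOENT cases follow the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Format a monitor read result into @buf and return the number of
 * characters written. Error codes are turned into text for user space;
 * the read itself still completes successfully from the file's view.
 */
int format_mon_result(char *buf, size_t len, int err, unsigned long long val)
{
	assert(buf != NULL && len > 0);

	if (err == -EIO)	/* hardware flagged an error (assumed mapping) */
		return snprintf(buf, len, "Error\n");
	if (err == -EINVAL)	/* data not available */
		return snprintf(buf, len, "Unavailable\n");
	if (err == -ENOENT)	/* no counter assigned to the event */
		return snprintf(buf, len, "Unassigned\n");

	return snprintf(buf, len, "%llu\n", val);
}
```

Unlike the first two codes, -ENOENT is generated by resctrl fs itself rather
than derived from a hardware status bit, which is the distinction worth
spelling out in the comment.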

> 
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 64ddc107fcab..0dfd2efe68fc 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -381,6 +381,8 @@ int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>>  			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>>  int resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
>>  				struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>> +int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +		 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>>  
>>  #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
>>  int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index fbc938bd3b23..c98a61bde179 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -956,8 +956,8 @@ static void resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d
>>   * mbm_cntr_get() - Return the cntr_id for the matching evtid and rdtgrp in
>>   *		    cntr_cfg array.
>>   */
>> -static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
>> -			struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +		 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>>  {
>>  	int cntr_id;
>>  
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index f192b2736a77..72317a5adee2 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -127,7 +127,7 @@ static bool resctrl_is_mbm_enabled(void)
>>  		resctrl_arch_is_mbm_local_enabled());
>>  }
>>  
>> -static bool resctrl_is_mbm_event(int e)
>> +bool resctrl_is_mbm_event(int e)
>>  {
>>  	return (e >= QOS_L3_MBM_TOTAL_EVENT_ID &&
>>  		e <= QOS_L3_MBM_LOCAL_EVENT_ID);
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 59a4fe60ab46..f78b6064230c 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -441,6 +441,7 @@ static inline u32 resctrl_get_config_index(u32 closid,
>>  	}
>>  }
>>  
>> +bool resctrl_is_mbm_event(int e);
>>  bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level l);
>>  int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
>>  
> 
> Reinette
> 
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 16/27] x86/resctrl: Pass entire struct rdtgroup rather than passing individual members
  2025-05-22 23:05   ` Reinette Chatre
@ 2025-05-29 18:07     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-29 18:07 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 18:05, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:52 PM, Babu Moger wrote:
>> The mbm_cntr_assign mode requires a cntr_id to read event data. The
> 
> cntr_id -> "counter ID"
> 

Sure.

>> cntr_id is retrieved via mbm_cntr_get, which takes a struct rdtgroup as
> 
> cntr_id -> "counter ID"
>

Sure.


> mbm_cntr_get -> mbm_cntr_get()
> 

Sure.

>> a parameter.
>>
>> Passing the full rdtgroup also provides access to closid and rmid, both of
> 
> closid -> CLOSID
> rmid -> RMID
> 

Sure.

>> which are necessary to read monitoring events.
>>
>> Refactor the code to pass the entire struct rdtgroup instead of individual
> 
> "the entire" -> "a pointer to"
> 

Sure.

>> members in preparation for this requirement.
>>
>> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> Patch looks good.

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 17/27] x86/resctrl: Add the support for reading ABMC counters
  2025-05-22 23:31   ` Reinette Chatre
@ 2025-05-29 18:25     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-29 18:25 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 18:31, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:52 PM, Babu Moger wrote:
>> Software can read the assignable counters using the QM_EVTSEL and QM_CTR
>> register pair.
> 
> Please append with more context on how register pair is used to support the
> changes in this patch.

Sure.

> 
>>
>> QM_EVTSEL Register definition:
>> =======================================================
>> Bits	Mnemonic	Description
>> =======================================================
>> 63:44	--		Reserved
>> 43:32   RMID		Resource Monitoring Identifier
>> 31	ExtEvtID	Extended Event Identifier
>> 30:8	--		Reserved
>> 7:0	EvtID		Event Identifier
>> =======================================================
>>
>> The contents of a specific counter can be read by setting the following
>> fields in QM_EVTSEL.ExtendedEvtID = 1, QM_EVTSEL.EvtID = L3CacheABMC (=1)
>> and setting [RMID] to the desired counter ID. Reading QM_CTR will then
>> return the contents of the specified counter. The E bit will be set if the
>> counter configuration was invalid, or if an invalid counter ID was set
>> in the QM_EVTSEL[RMID] field.
> 
> Please rewrite above in imperative tone.

Sure.
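
As a worked example of the register layout in the table above, the sketch
below composes the 64-bit QM_EVTSEL value for reading a counter. The macro
and function names are hypothetical; only the bit positions (RMID in 43:32,
ExtEvtID in bit 31, EvtID in 7:0) and the L3CacheABMC value of 1 come from
the changelog.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the QM_EVTSEL table. */
#define QM_EVTSEL_EXT_EVT_ID	(UINT64_C(1) << 31)	/* bit 31 */
#define QM_EVTSEL_EVT_ID_ABMC	UINT64_C(1)		/* L3CacheABMC, bits 7:0 */
#define QM_EVTSEL_RMID_SHIFT	32			/* RMID field, bits 43:32 */
#define QM_EVTSEL_RMID_MASK	UINT64_C(0xfff)

/*
 * Build the QM_EVTSEL value that selects counter @cntr_id: the counter ID
 * goes into the RMID field, with the extended event bit and the ABMC event
 * ID set in the low 32 bits.
 */
uint64_t qm_evtsel_for_cntr(uint32_t cntr_id)
{
	assert(cntr_id <= QM_EVTSEL_RMID_MASK);

	return ((uint64_t)cntr_id << QM_EVTSEL_RMID_SHIFT) |
	       QM_EVTSEL_EXT_EVT_ID | QM_EVTSEL_EVT_ID_ABMC;
}
```

The low half of the result is always 0x80000001 and the high half is the
counter ID, which matches the two 32-bit arguments the patch passes to
wrmsr().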

> 
>>
>> Introduce __cntr_id_read_phys() to read the counter ID event data.
>>
>> Link: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v13: Split the patch into 2. First one to handle the passing of rdtgroup structure to few
>>      functions( __mon_event_count and mbm_update(). Second one to handle ABMC counter reading.
>>      Added new function __cntr_id_read_phys() to handle ABMC event reading.
>>      Updated kernel doc for resctrl_arch_reset_rmid() and resctrl_arch_rmid_read().
>>      Resolved conflicts caused by the recent FS/ARCH code restructure.
>>      The monitor.c file has now been split between the FS and ARCH directories.
>>
>> v12: New patch to support extended event mode when ABMC is enabled.
>> ---
>>  arch/x86/kernel/cpu/resctrl/internal.h |  6 +++
>>  arch/x86/kernel/cpu/resctrl/monitor.c  | 66 ++++++++++++++++++++++----
>>  fs/resctrl/monitor.c                   | 14 ++++--
>>  include/linux/resctrl.h                |  9 ++--
>>  4 files changed, 80 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index db6b0c28ee6b..3b0cdb5520c7 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -40,6 +40,12 @@ struct arch_mbm_state {
>>  /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
>>  #define ABMC_ENABLE_BIT			0
>>  
>> +/*
>> + * ABMC Qos Event Identifiers.
> 
> QoS?

Sure.

> 
>> + */
>> +#define ABMC_EXTENDED_EVT_ID		BIT(31)
>> +#define ABMC_EVT_ID			1
> 
> Please use BIT(0) to be consistent.
> 
Sure.

>> +
>>  /**
>>   * struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
>>   *			       a resource for a control function
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index e31084f7babd..36a03dae6d8e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -161,6 +161,41 @@ static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
>>  	return 0;
>>  }
>>  
>> +static int __cntr_id_read_phys(u32 cntr_id, u64 *val)
>> +{
>> +	u64 msr_val;
>> +
>> +	/*
>> +	 * QM_EVTSEL Register definition:
>> +	 * =======================================================
>> +	 * Bits    Mnemonic        Description
>> +	 * =======================================================
>> +	 * 63:44   --              Reserved
>> +	 * 43:32   RMID            Resource Monitoring Identifier
>> +	 * 31      ExtEvtID        Extended Event Identifier
>> +	 * 30:8    --              Reserved
>> +	 * 7:0     EvtID           Event Identifier
>> +	 * =======================================================
>> +	 * The contents of a specific counter can be read by setting the
>> +	 * following fields in QM_EVTSEL.ExtendedEvtID(=1) and
>> +	 * QM_EVTSEL.EvtID = L3CacheABMC (=1) and setting [RMID] to the
>> +	 * desired counter ID. Reading QM_CTR will then return the
>> +	 * contents of the specified counter. The E bit will be set if the
>> +	 * counter configuration was invalid, or if an invalid counter ID
>> +	 * was set in the QM_EVTSEL[RMID] field.
>> +	 */
>> +	wrmsr(MSR_IA32_QM_EVTSEL, ABMC_EXTENDED_EVT_ID | ABMC_EVT_ID, cntr_id);
>> +	rdmsrl(MSR_IA32_QM_CTR, msr_val);
>> +
>> +	if (msr_val & RMID_VAL_ERROR)
>> +		return -EIO;
>> +	if (msr_val & RMID_VAL_UNAVAIL)
>> +		return -EINVAL;
>> +
>> +	*val = msr_val;
>> +	return 0;
>> +}
>> +
>>  static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
>>  						 u32 rmid,
>>  						 enum resctrl_event_id eventid)
>> @@ -180,7 +215,7 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_do
>>  }
>>  
>>  void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
>> -			     u32 unused, u32 rmid,
>> +			     u32 unused, u32 rmid, int cntr_id,
>>  			     enum resctrl_event_id eventid)
>>  {
>>  	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>> @@ -192,9 +227,16 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
>>  	if (am) {
>>  		memset(am, 0, sizeof(*am));
>>  
>> -		prmid = logical_rmid_to_physical_rmid(cpu, rmid);
>> -		/* Record any initial, non-zero count value. */
>> -		__rmid_read_phys(prmid, eventid, &am->prev_msr);
>> +		if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
>> +		    resctrl_is_mbm_event(eventid)) {
>> +			if (cntr_id < 0)
> 
> This would be a bug, no? how about WARN_ON_ONCE()?

Yes. Will do that.

> 
>> +				return;
>> +			__cntr_id_read_phys(cntr_id, &am->prev_msr);
>> +		} else {
>> +			prmid = logical_rmid_to_physical_rmid(cpu, rmid);
>> +			/* Record any initial, non-zero count value. */
>> +			__rmid_read_phys(prmid, eventid, &am->prev_msr);
>> +		}
>>  	}
>>  }
>>  
>> @@ -224,8 +266,8 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
>>  }
>>  
>>  int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
>> -			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
>> -			   u64 *val, void *ignored)
>> +			   u32 unused, u32 rmid, int cntr_id,
>> +			   enum resctrl_event_id eventid, u64 *val, void *ignored)
>>  {
>>  	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> @@ -237,8 +279,16 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
>>  
>>  	resctrl_arch_rmid_read_context_check();
>>  
>> -	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
>> -	ret = __rmid_read_phys(prmid, eventid, &msr_val);
>> +	if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
>> +	    resctrl_is_mbm_event(eventid)) {
>> +		if (cntr_id < 0)
> 
> WARN_ON_ONCE()?
> 

Yes.

>> +			return cntr_id;
>> +		ret = __cntr_id_read_phys(cntr_id, &msr_val);
>> +	} else {
>> +		prmid = logical_rmid_to_physical_rmid(cpu, rmid);
>> +		ret = __rmid_read_phys(prmid, eventid, &msr_val);
>> +	}
>> +
>>  	if (ret)
>>  		return ret;
>>  
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index a477be9cdb66..72f3dfb5b903 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -159,7 +159,11 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free)
>>  			break;
>>  
>>  		entry = __rmid_entry(idx);
>> -		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
>> +		/*
>> +		 * cntr_id is not relevant for QOS_L3_OCCUP_EVENT_ID.
>> +		 * Pass dummy value -1.
>> +		 */
>> +		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid, -1,
>>  					   QOS_L3_OCCUP_EVENT_ID, &val,
>>  					   arch_mon_ctx)) {
>>  			rmid_dirty = true;
>> @@ -359,6 +363,7 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
>>  
>>  static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
>>  {
>> +	int cntr_id = mbm_cntr_get(rr->r, rr->d, rdtgrp, rr->evtid);
> 
> So mbm_cntr_get() is called on *all* events (even non MBM) whether assignable counters
> are supported or not. I assume it relies on num_mbm_cntrs to be zero on non-ABMC systems
> but I think this needs to be explicit that mbm_cntr_get() returns -ENOENT in these cases.
> Any developer attempting to modify mbm_cntr_get() needs to be aware of this usage.
> 

Yes. Good point.
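
To make the expectation concrete, here is a rough userspace sketch of the lookup with the "not applicable" cases handled explicitly instead of relying on a zero counter count. All names here (cntr_get(), struct mon_domain, the event IDs) are hypothetical stand-ins, not the actual resctrl definitions:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical event IDs; the real ones live in enum resctrl_event_id. */
enum evt_id { EVT_L3_OCCUP = 1, EVT_MBM_TOTAL, EVT_MBM_LOCAL };

/* Hypothetical per-counter bookkeeping: which group/event owns the counter. */
struct cntr_cfg {
	int		group_id;	/* -1 when the counter is free */
	enum evt_id	evtid;
};

/* Hypothetical, reduced model of a monitoring domain. */
struct mon_domain {
	bool		assign_enabled;	/* stands in for mbm_cntr_assign mode */
	int		num_cntrs;
	struct cntr_cfg	*cntr_cfg;
};

static bool is_mbm_event(enum evt_id evtid)
{
	return evtid == EVT_MBM_TOTAL || evtid == EVT_MBM_LOCAL;
}

/*
 * Return the counter index assigned to (group_id, evtid) in domain d,
 * or -ENOENT. Callers may pass any event, so the cases where no counter
 * can exist are rejected up front rather than implied by num_cntrs == 0.
 */
static int cntr_get(const struct mon_domain *d, int group_id, enum evt_id evtid)
{
	int i;

	if (!d->assign_enabled || !is_mbm_event(evtid))
		return -ENOENT;

	for (i = 0; i < d->num_cntrs; i++) {
		if (d->cntr_cfg[i].group_id == group_id &&
		    d->cntr_cfg[i].evtid == evtid)
			return i;
	}

	return -ENOENT;
}
```

With this shape, a caller such as the limbo check can pass an occupancy event and always gets -ENOENT back, which the read path then treats as "no counter".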

> This is quite subtle that resctrl_arch_reset_rmid() and resctrl_arch_rmid_read()
> can be called with a negative counter ID. To help with code health this needs to
> be highlighted (more later). 

Sure.

> 
>>  	int cpu = smp_processor_id();
>>  	u32 closid = rdtgrp->closid;
>>  	u32 rmid = rdtgrp->mon.rmid;
>> @@ -368,7 +373,7 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
>>  	u64 tval = 0;
>>  
>>  	if (rr->first) {
>> -		resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
>> +		resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, cntr_id, rr->evtid);
>>  		m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
>>  		if (m)
>>  			memset(m, 0, sizeof(struct mbm_state));
>> @@ -379,7 +384,7 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
>>  		/* Reading a single domain, must be on a CPU in that domain. */
>>  		if (!cpumask_test_cpu(cpu, &rr->d->hdr.cpu_mask))
>>  			return -EINVAL;
>> -		rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid,
>> +		rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid, cntr_id,
>>  						 rr->evtid, &tval, rr->arch_mon_ctx);
>>  		if (rr->err)
>>  			return rr->err;
>> @@ -404,7 +409,8 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
>>  	list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
>>  		if (d->ci->id != rr->ci->id)
>>  			continue;
>> -		err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
>> +		cntr_id = mbm_cntr_get(rr->r, d, rdtgrp, rr->evtid);
>> +		err = resctrl_arch_rmid_read(rr->r, d, closid, rmid, cntr_id,
>>  					     rr->evtid, &tval, rr->arch_mon_ctx);
>>  		if (!err) {
>>  			rr->val += tval;
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index f78b6064230c..cd24d1577e0a 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -473,6 +473,7 @@ void resctrl_offline_cpu(unsigned int cpu);
>>   *			counter may match traffic of both @closid and @rmid, or @rmid
>>   *			only.
>>   * @rmid:		rmid of the counter to read.
>> + * @cntr_id:		cntr_id to read MBM events with mbm_cntr_assign mode.
> 
> "Counter ID used to read MBM events in mbm_cntr_evt_assign mode. Only valid when
>  mbm_cntr_evt_assign mode is enabled and @eventid is an MBM event. Can be negative
>  when invalid." (Please feel free to improve)

Looks good.

> 
>>   * @eventid:		eventid to read, e.g. L3 occupancy.
>>   * @val:		result of the counter read in bytes.
>>   * @arch_mon_ctx:	An architecture specific value from
>> @@ -490,8 +491,9 @@ void resctrl_offline_cpu(unsigned int cpu);
>>   * 0 on success, or -EIO, -EINVAL etc on error.
>>   */
>>  int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
>> -			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
>> -			   u64 *val, void *arch_mon_ctx);
>> +			   u32 closid, u32 rmid, int cntr_id,
>> +			   enum resctrl_event_id eventid, u64 *val,
>> +			   void *arch_mon_ctx);
>>  
>>  /**
>>   * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
>> @@ -532,12 +534,13 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
>>   * @closid:	closid that matches the rmid. Depending on the architecture, the
>>   *		counter may match traffic of both @closid and @rmid, or @rmid only.
>>   * @rmid:	The rmid whose counter values should be reset.
>> + * @cntr_id:	The cntr_id to read MBM events with mbm_cntr_assign mode.
> 
> Same as above.
> 

Sure.

>>   * @eventid:	The eventid whose counter values should be reset.
>>   *
>>   * This can be called from any CPU.
>>   */
>>  void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
>> -			     u32 closid, u32 rmid,
>> +			     u32 closid, u32 rmid, int cntr_id,
>>  			     enum resctrl_event_id eventid);
>>  
>>  /**
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 18/27] x86/resctrl: Add definitions for MBM event configuration
  2025-05-23  4:41   ` Reinette Chatre
@ 2025-05-29 19:00     ` Moger, Babu
  2025-05-29 20:58       ` Reinette Chatre
  0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-05-29 19:00 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 23:41, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:52 PM, Babu Moger wrote:
>> The "mbm_cntr_assign" mode allows users to manually assign a hardware
>> counter to a specific RMID and event pair. The events available for
>> assignment are configurable.
>>
>> By default, each resctrl group supports two MBM events: mbm_total_bytes
>> and mbm_local_bytes. Each event corresponds to an MBM configuration that
>> specifies the bandwidth sources tracked by the event.
> 
> hmmm ... earlier I thought "bandwidth source" means RMID but here it
> seems to mean the memory transactions? The various terms are confusing.

My bad. Yes. "bandwidth source" means RMID.

I should have said "memory transactions".

> 
>>
>> Add definitions of supported bandwidth sources.
> 
> changelog uses "bandwidth sources" while the comments of patch
> uses "memory transactions" ... please be consistent with terms.

Sure.

> 
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v13: Updated the changelog.
>>      Removed the definitions from resctrl_types.h and moved to internal.h.
>>      Removed mbm_assign_config definition. Configurations will be part of
>>      mon_evt list.
>>      Resolved conflicts caused by the recent FS/ARCH code restructure.
>>      The rdtgroup.c file has now been split between the FS and ARCH directories.
>>
>> v12: New patch to support event configurations via new counter_configs
>>      method.
>> ---
>>  fs/resctrl/internal.h | 10 ++++++++++
>>  fs/resctrl/rdtgroup.c | 14 ++++++++++++++
>>  2 files changed, 24 insertions(+)
>>
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 0dfd2efe68fc..019d00bf5adf 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -203,6 +203,16 @@ struct rdtgroup {
>>  	struct pseudo_lock_region	*plr;
>>  };
>>  
>> +/**
>> + * struct mbm_evt_value - Specific type of memory events.
> 
> I am trying to decipher the terminology. If these are events, then it becomes confusing
> since it becomes "these events are used to configure events". You mention "memory
> transaction" below, this sounds more accurate to me. Above could thus be:
> 
> struct mbm_evt_value - Memory transaction an MBM event can be configured with.

Sure.

> 
> The name of the struct could also do with a rename to avoid the "event" term that
> conflicts with the actual MBM events. Maybe "mbm_cfg_value" ... I do not think this
> is a good name so please consider what would work better.

I can change it to "mbm_config_value".

> 
>> + * @evt_name:		Name of memory transaction type (read, write etc).
> 
> Unclear what "type" means ... maybe just "Name of memory transaction (read, write ...)"?

sure.

> 
> The "evt_" prefix looks unnecessary.

ok

> 
>> + * @evt_val:		Value representing the memory transaction.
> 
> This could just be "val" and the description could be specific:

ok.

> 
> "The bit used to represent the memory transaction within an event's configuration."
> Please feel free to improve.

Sounds good.

> 
>> + */
>> +struct mbm_evt_value {
>> +	char    evt_name[32];
>> +	u32     evt_val;
> 
> Please space member names with TABs.

Sure.

> 
>> +};
>> +
>>  /* rdtgroup.flags */
>>  #define	RDT_DELETED		1
>>  
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 72317a5adee2..b109e91096b0 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -75,6 +75,20 @@ static void rdtgroup_destroy_root(void);
>>  
>>  struct dentry *debugfs_resctrl;
>>  
>> +/* Number of memory transaction types that can be monitored */
> 
> "Number of memory transactions that an MBM event can be configured with."?

Sure.

> 
>> +#define NUM_MBM_EVT_VALUES             7
>> +
>> +/* Decoded values for each type of memory events */
> 
> Please be consistent with terminology. In the above lines it switches
> between "memory transaction types" and "memory events".

"Decoded values for each type of memory transaction types"

> 
>> +struct mbm_evt_value mbm_evt_values[NUM_MBM_EVT_VALUES] = {
>> +	{"local_reads", READS_TO_LOCAL_MEM},
>> +	{"remote_reads", READS_TO_REMOTE_MEM},
>> +	{"local_non_temporal_writes", NON_TEMP_WRITE_TO_LOCAL_MEM},
>> +	{"remote_non_temporal_writes", NON_TEMP_WRITE_TO_REMOTE_MEM},
>> +	{"local_reads_slow_memory", READS_TO_LOCAL_S_MEM},
>> +	{"remote_reads_slow_memory", READS_TO_REMOTE_S_MEM},
>> +	{"dirty_victim_writes_all", DIRTY_VICTIMS_TO_ALL_MEM},
>> +};
>> +
>>  /*
>>   * Memory bandwidth monitoring event to use for the default CTRL_MON group
>>   * and each new CTRL_MON group created by the user.  Only relevant when
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 19/27] x86/resctrl: Add event configuration directory under info/L3_MON/
  2025-05-23  4:43   ` Reinette Chatre
@ 2025-05-29 19:54     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-29 19:54 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/25 23:43, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:52 PM, Babu Moger wrote:
>> Create the configuration directory and files for mbm_cntr_assign mode.
>> These configurations will be used to assign MBM events in mbm_cntr_assign
>> mode, with two default configurations created upon mounting.
> 
> This just jumps in with what the patch does. Requirements for proper changelog
> should be familiar by now. The changelog *always* starts with a context.
> 
> Sample:
> 
> "When assignable counters are supported the
> /sys/fs/resctrl/info/L3_MON/event_configs directory contains a sub-directory
> for each MBM event that can be assigned to a counter. The MBM event
> sub-directory contains a file named "event_filter" that is used to
> view and modify which memory transactions the MBM event is configured with.
> 
> Create the /sys/fs/resctrl/info/L3_MON/event_configs directory on resctrl
> mount and pre-populate it with directories for the two existing MBM events:
> mbm_total_bytes and mbm_local_bytes. Create the "event_filter" file within
> each MBM event directory with the needed *show() that displays the memory
> transactions with which the MBM event is configured."
> 

Looks good. Thanks. The directory name will be "event_configs". Will
change it in the code as well.

>>
>> Example:
>> $ cd /sys/fs/resctrl/
>> $ cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>   local_reads, remote_reads, local_non_temporal_writes,
>>   remote_non_temporal_writes, local_reads_slow_memory,
>>   remote_reads_slow_memory, dirty_victim_writes_all
>>
>> $ cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   local_reads, local_non_temporal_writes, local_reads_slow_memory
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v13: Updated user doc (resctrl.rst).
>>      Changed the name of the function resctrl_mkdir_info_configs to
>>      resctrl_mkdir_counter_configs().
>>      Replaced seq_puts() with seq_putc() where applicable.
>>      Removed RFTYPE_MON_CONFIG definition. Not required.
>>      Changed the name of the flag RFTYPE_CONFIG to RFTYPE_ASSIGN_CONFIG.
>>      Reinette suggested RFTYPE_MBM_EVENT_CONFIG but RFTYPE_ASSIGN_CONFIG
>>      seemed shorter and precise.
>>      The configuration is created using evt_list.
>>      Resolved conflicts caused by the recent FS/ARCH code restructure.
>>      The monitor.c/rdtgroup.c files have been split between the FS and ARCH directories.
>>
>> v12: New patch to hold the MBM event configurations for mbm_cntr_assign mode.
>> ---
>>  Documentation/filesystems/resctrl.rst | 30 ++++++++++
>>  fs/resctrl/internal.h                 |  2 +
>>  fs/resctrl/monitor.c                  |  1 +
>>  fs/resctrl/rdtgroup.c                 | 80 +++++++++++++++++++++++++++
>>  4 files changed, 113 insertions(+)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index 5cf2d742f04c..4eb9f007ba3d 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -306,6 +306,36 @@ with the following files:
>>  	  # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>>  	  0=30;1=30
>>  
>> +"counter_configs":
>> +	When the "mbm_cntr_assign" mode is supported, a dedicated directory is created
>> +	under the "L3_MON" directory to store configuration files.
> 
> ? it does not contain files but directories for each event, no?
> 
> It will help if the text is specific. For example,
> 	"event_configs":
> 	Directory that exists when mbm_cntr_evt_assign is supported. Contains sub-directory
> 	for each MBM event that can be assigned to a counter. Each MBM event
> 	sub-directory ...

Sounds good.


> 
>> +
>> +	These files contain the list of configurable events. There are two default
> 
> So confusing ... terminology is all over the place. Which files are even talked about here?
> "configurable events" ... are these the memory transactions or MBM events? 

Should be "memory transactions"

> 
>> +	configurations: mbm_local_bytes and mbm_total_bytes.
> 
> "two default configurations"? These are not "configurations" but "events", no?

Sure. Should be "two default events"

> 
>> +
>> +	Following types of events are supported:
> 
> events -> memory transactions?

Sure.

> 
> I am unable to parse the above.

The following are the types of memory transactions that an MBM event can
be configured with:

> 
> 
>> +
>> +	==== ========================== ========================================================
>> +	Bits Name                       Description
>> +	==== ========================== ========================================================
>> +	6    dirty_victim_writes_all    Dirty Victims from the QOS domain to all types of memory
>> +	5    remote_reads_slow_memory   Reads to slow memory in the non-local NUMA domain
>> +	4    local_reads_slow_memory    Reads to slow memory in the local NUMA domain
>> +	3    remote_non_temporal_writes Non-temporal writes to non-local NUMA domain
>> +	2    local_non_temporal_writes  Non-temporal writes to local NUMA domain
>> +	1    remote_reads               Reads to memory in the non-local NUMA domain
>> +	0    local_reads                Reads to memory in the local NUMA domain
>> +	==== ========================== ========================================================
> 
> Why does user need to know the bit position used to represent the memory transaction?

Not required. Will remove it.
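
The user only ever sees the names. As a rough userspace sketch of how a configuration value maps to the comma-separated names shown by event_filter (bit positions taken from the table above; the struct, array, and function names are hypothetical and only for illustration):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical name/bit table mirroring the documented transactions. */
struct mbm_config_value {
	const char	*name;
	uint32_t	val;
};

static const struct mbm_config_value mbm_config_values[] = {
	{ "local_reads",		1u << 0 },
	{ "remote_reads",		1u << 1 },
	{ "local_non_temporal_writes",	1u << 2 },
	{ "remote_non_temporal_writes",	1u << 3 },
	{ "local_reads_slow_memory",	1u << 4 },
	{ "remote_reads_slow_memory",	1u << 5 },
	{ "dirty_victim_writes_all",	1u << 6 },
};

#define NUM_CONFIG_VALUES \
	(sizeof(mbm_config_values) / sizeof(mbm_config_values[0]))

/*
 * Write the names of all transactions set in cfg into out, separated
 * by commas. Returns the string length, or -1 if out is too small
 * (out may then hold a partial list).
 */
static int format_event_filter(uint32_t cfg, char *out, size_t len)
{
	size_t used = 0, i;
	bool sep = false;
	int n;

	if (!len)
		return -1;
	out[0] = '\0';

	for (i = 0; i < NUM_CONFIG_VALUES; i++) {
		if (!(cfg & mbm_config_values[i].val))
			continue;
		n = snprintf(out + used, len - used, "%s%s",
			     sep ? "," : "", mbm_config_values[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
		sep = true;
	}

	return (int)used;
}
```

Here a value with bits 0, 2 and 4 set produces the same list as the mbm_local_bytes example, without the user needing to know any bit position.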

> 
>> +
>> +	For example::
>> +
>> +	  # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>> +	  local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>> +	  local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>> +
>> +	  # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> +	  local_reads, local_non_temporal_writes, local_reads_slow_memory
>> +
>>  "max_threshold_occupancy":
>>  		Read/write file provides the largest value (in
>>  		bytes) at which a previously used LLC_occupancy
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 019d00bf5adf..446cc9cc61df 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -238,6 +238,8 @@ struct mbm_evt_value {
>>  
>>  #define RFTYPE_DEBUG			BIT(10)
>>  
>> +#define RFTYPE_ASSIGN_CONFIG		BIT(11)
>> +
>>  #define RFTYPE_CTRL_INFO		(RFTYPE_INFO | RFTYPE_CTRL)
>>  
>>  #define RFTYPE_MON_INFO			(RFTYPE_INFO | RFTYPE_MON)
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 72f3dfb5b903..1f72249a5c93 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -932,6 +932,7 @@ int resctrl_mon_resource_init(void)
>>  					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>>  		resctrl_file_fflags_init("available_mbm_cntrs",
>>  					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>> +		resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
>>  	}
>>  
>>  	return 0;
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index b109e91096b0..cf84e3a382ac 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -1911,6 +1911,25 @@ static int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
>>  	return ret;
>>  }
>>  
>> +static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq, void *v)
>> +{
>> +	struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
>> +	bool sep = false;
>> +	int i;
>> +
>> +	for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
>> +		if (mevt->evt_cfg & mbm_evt_values[i].evt_val) {
> 
> Still no idea where mevt->evt_cfg comes from. Patch ordering issue?

Yes.
Need to introduce the evt_cfg member and also initialize the default
values during init. Will reorder the patches to make this a bit clearer.


> 
>> +			if (sep)
>> +				seq_putc(seq, ',');
>> +			seq_printf(seq, "%s", mbm_evt_values[i].evt_name);
>> +			sep = true;
>> +		}
>> +	}
>> +	seq_putc(seq, '\n');
>> +
>> +	return 0;
>> +}
>> +
>>  /* rdtgroup information files for one cache resource. */
>>  static struct rftype res_common_files[] = {
>>  	{
>> @@ -2035,6 +2054,12 @@ static struct rftype res_common_files[] = {
>>  		.seq_show	= mbm_local_bytes_config_show,
>>  		.write		= mbm_local_bytes_config_write,
>>  	},
>> +	{
>> +		.name		= "event_filter",
>> +		.mode		= 0444,
>> +		.kf_ops		= &rdtgroup_kf_single_ops,
>> +		.seq_show	= event_filter_show,
>> +	},
>>  	{
>>  		.name		= "mbm_assign_mode",
>>  		.mode		= 0444,
>> @@ -2317,6 +2342,55 @@ static int rdtgroup_mkdir_info_resdir(void *priv, char *name,
>>  	return ret;
>>  }
>>  
>> +static int resctrl_mkdir_counter_configs(struct rdt_resource *r, char *name)
>> +{
>> +	struct kernfs_node *l3_mon_kn, *kn_subdir, *kn_subdir2;
>> +	struct mon_evt *mevt;
>> +	int ret;
>> +
>> +	l3_mon_kn = kernfs_find_and_get(kn_info, name);
>> +	if (!l3_mon_kn)
>> +		return -ENOENT;
>> +
>> +	kn_subdir = kernfs_create_dir(l3_mon_kn, "counter_configs", l3_mon_kn->mode, NULL);
>> +	if (IS_ERR(kn_subdir)) {
>> +		kernfs_put(l3_mon_kn);
>> +		return PTR_ERR(kn_subdir);
>> +	}
>> +
>> +	ret = rdtgroup_kn_set_ugid(kn_subdir);
>> +	if (ret) {
>> +		kernfs_put(l3_mon_kn);
>> +		return ret;
>> +	}
>> +
>> +	list_for_each_entry(mevt, &r->mon.evt_list, list) {
>> +		if (mevt->mbm_mode == MBM_MODE_ASSIGN) {
> 
> I do not think this "mbm_mode" is needed, resctrl_mon::mbm_cntr_assignable is already used
> earlier, so would for_each_mbm_event() from the telemetry work be useful here?

Yes. Will remove mbm_mode and use Tony's telemetry work.

> 
>> +			kn_subdir2 = kernfs_create_dir(kn_subdir, mevt->name,
>> +						       kn_subdir->mode, mevt);
>> +			if (IS_ERR(kn_subdir2)) {
>> +				ret = PTR_ERR(kn_subdir2);
>> +				goto config_out;
> 
> "grep goto fs/resctrl/rdtgroup.c" for naming conventions.

Yes. I see it. It should be "out_config".

> 
>> +			}
>> +
>> +			ret = rdtgroup_kn_set_ugid(kn_subdir2);
>> +			if (ret)
>> +				goto config_out;
>> +
>> +			ret = rdtgroup_add_files(kn_subdir2, RFTYPE_ASSIGN_CONFIG);
>> +			if (!ret)
>> +				kernfs_activate(kn_subdir);
>> +		}
>> +	}
>> +
>> +config_out:
>> +	kernfs_put(l3_mon_kn);
>> +	if (ret)
>> +		kernfs_remove(kn_subdir);
> 
> This looks unnecessary since caller does kernfs_remove() on error return. Compare
> with how rdtgroup_mkdir_info_resdir() handles errors.

Yes. Will remove it.

> 
>> +
>> +	return ret;
>> +}
>> +
>>  static unsigned long fflags_from_resource(struct rdt_resource *r)
>>  {
>>  	switch (r->rid) {
>> @@ -2363,6 +2437,12 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
>>  		ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
>>  		if (ret)
>>  			goto out_destroy;
>> +
>> +		if (r->mon.mbm_cntr_assignable) {
>> +			ret = resctrl_mkdir_counter_configs(r, name);
>> +			if (ret)
>> +				goto out_destroy;
>> +		}
>>  	}
>>  
>>  	ret = rdtgroup_kn_set_ugid(kn_info);
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 18/27] x86/resctrl: Add definitions for MBM event configuration
  2025-05-29 19:00     ` Moger, Babu
@ 2025-05-29 20:58       ` Reinette Chatre
  2025-06-03 13:41         ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-29 20:58 UTC (permalink / raw)
  To: babu.moger, corbet, tony.luck, tglx, mingo, bp, dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/29/25 12:00 PM, Moger, Babu wrote:
> On 5/22/25 23:41, Reinette Chatre wrote:
>> On 5/15/25 3:52 PM, Babu Moger wrote:


>>> +/**
>>> + * struct mbm_evt_value - Specific type of memory events.
>>
>> I am trying to decipher the terminology. If these are events, then it becomes confusing
>> since it becomes "these events are used to configure events". You mention "memory
>> transaction" below, this sounds more accurate to me. Above could thus be:
>>
>> struct mbm_evt_value - Memory transaction an MBM event can be configured with.
> 
> Sure.
> 
>>
>> The name of the struct could also do with a rename to avoid the "event" term that
>> conflicts with the actual MBM events. Maybe "mbm_cfg_value" ... I do not think this
>> is a good name so please consider what would work better.
> 
> I can change it to "mbm_config_value".

Looks good, thank you.

...

>>> +#define NUM_MBM_EVT_VALUES             7
>>> +
>>> +/* Decoded values for each type of memory events */
>>
>> Please be consistent with terminology. In the above lines it switches
>> between "memory transaction types" and "memory events".
> 
> "Decoded values for each type of memory transaction types"

I do not think "type" is needed twice. Could also be:
"Decoded values of each memory transaction type."

> 
>>
>>> +struct mbm_evt_value mbm_evt_values[NUM_MBM_EVT_VALUES] = {
>>> +	{"local_reads", READS_TO_LOCAL_MEM},
>>> +	{"remote_reads", READS_TO_REMOTE_MEM},
>>> +	{"local_non_temporal_writes", NON_TEMP_WRITE_TO_LOCAL_MEM},
>>> +	{"remote_non_temporal_writes", NON_TEMP_WRITE_TO_REMOTE_MEM},
>>> +	{"local_reads_slow_memory", READS_TO_LOCAL_S_MEM},
>>> +	{"remote_reads_slow_memory", READS_TO_REMOTE_S_MEM},
>>> +	{"dirty_victim_writes_all", DIRTY_VICTIMS_TO_ALL_MEM},
>>> +};


Reinette


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 20/27] x86/resctrl: Provide interface to update the event configurations
  2025-05-23  4:45   ` Reinette Chatre
@ 2025-05-29 22:35     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-29 22:35 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/2025 11:45 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:52 PM, Babu Moger wrote:
>> Users can modify the event configuration by writing to the event_filter
>> interface file. The event configurations for mbm_cntr_assign mode are
>> located in /sys/fs/resctrl/info/event_configs/.
> 
> heh ... looks like you also started thinking that "event_configs"
> is a better name (also missing L3_MON).

Interesting... 'event_configs' seems to have made its way in somehow.
Alright, I’ll go ahead and add L3_MON.

> 
>>
>> Update the assignments of all groups when the event configuration is
>> modified.
>>
>> Example:
>> $ cd /sys/fs/resctrl/
>>
>> $ cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>    local_reads,local_non_temporal_writes,local_reads_slow_memory
>>
>> $ echo "local_reads,local_non_temporal_writes" >
>>    info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>
>> $ cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>    local_reads,local_non_temporal_writes
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v13: Updated changelog for imperative mode.
>>       Added function description in the prototype.
>>       Updated the user doc resctrl.rst to address few feedback.
>>       Resolved conflicts caused by the recent FS/ARCH code restructure.
>>       The rdtgroup.c/monitor.c file has now been split between the FS and ARCH directories.
>>
>> v12: New patch to modify event configurations.
>> ---
>>   Documentation/filesystems/resctrl.rst |  12 +++
>>   fs/resctrl/rdtgroup.c                 | 120 +++++++++++++++++++++++++-
>>   2 files changed, 131 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index 4eb9f007ba3d..9923276826db 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
> 
> ...
> 
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index cf84e3a382ac..8c498b41be5d 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -1930,6 +1930,123 @@ static int event_filter_show(struct kernfs_open_file *of, struct seq_file *seq,
>>   	return 0;
>>   }
>>   
>> +/**
>> + * resctrl_group_assign - Update the counter assignments for the event in
>> + *			  a group.
> 
> This name is very generic with an unexpected namespace. "rdtgroup_" prefix
> is often used for a function that operates on a rdtgroup. This can thus be
> "rdtgroup_assign_cntr()".

Sure.

> 
>> + * @r:		Resource to which update needs to be done.
>> + * @rdtgrp:	Resctrl group.
>> + * @evtid:	Event ID.
>> + * @evt_cfg:	Event configuration value.
>> + */
>> +static int resctrl_group_assign(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> +				enum resctrl_event_id evtid, u32 evt_cfg)
>> +{
>> +	struct rdt_mon_domain *d;
>> +	int cntr_id;
>> +
>> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +		cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
>> +		if (cntr_id >= 0 && d->cntr_cfg[cntr_id].evt_cfg != evt_cfg) {
>> +			d->cntr_cfg[cntr_id].evt_cfg = evt_cfg;
>> +			resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>> +						 rdtgrp->closid, cntr_id, evt_cfg, true);
>> +		}
>> +	}
>> +
>> +	return 0;
> 
> Can just return void?

sure.

> 
>> +}
>> +
>> +/**
>> + * resctrl_update_assign - Update the counter assignments for the event for all
>> + *			   the groups.
> 
> Again very generic with "update" and "assign" that seem redundant? How about
> "resctrl_assign_cntr_allrdtgrp()"?

Yes.

> 
>> + * @r:		Resource to which update needs to be done.
>> + * @evtid:	Event ID.
>> + * @evt_cfg:	Event configuration value.
> 
> Why are both event ID and evt_cfg needed? Could just passing mon_evt simplify this?
> 
>> + */
>> +static int resctrl_update_assign(struct rdt_resource *r, enum resctrl_event_id evtid,
>> +				 u32 evt_cfg)
>> +{
>> +	struct rdtgroup *prgrp, *crgrp;
>> +
>> +	/* Check if the cntr_id is associated to the event type updated */
> 
> Comment does not match code.

Will correct it.

> 
>> +	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>> +		resctrl_group_assign(r, prgrp, evtid, evt_cfg);
>> +
>> +		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) {
>> +			resctrl_group_assign(r, crgrp, evtid, evt_cfg);
>> +		}
> 
> Unnecessary braces?

Yes.

> 
>> +	}
>> +
>> +	return 0;
> 
> return void?
> 

Sure.

>> +}
>> +
>> +static int resctrl_process_configs(char *tok, u32 *val)
>> +{
>> +	char *evt_str;
>> +	bool found;
>> +	int i;
>> +
>> +next_config:
>> +	if (!tok || tok[0] == '\0')
>> +		return 0;
>> +
>> +	/* Start processing the strings for each event type */
> 
> Does comment intend to describe one iteration or all iterations?
> Also, "event type" -> "memory transaction"?

Sure.

> 
>> +	evt_str = strim(strsep(&tok, ","));
>> +	found = false;
>> +	for (i = 0; i < NUM_MBM_EVT_VALUES; i++) {
>> +		if (!strcmp(mbm_evt_values[i].evt_name, evt_str)) {
>> +			*val |=  mbm_evt_values[i].evt_val;
> 
> check spacing.

ok.

> 
>> +			found = true;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!found) {
>> +		rdt_last_cmd_printf("Invalid event type %s\n", evt_str);
>> +		return -EINVAL;
> 
> Looks like this will return partially initialized data. Please use a local
> variable in which to gather the new configuration and only assign that
> to provided pointer on success.

Yes. Makes sense.
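
Roughly along these lines, as a userspace sketch only: the parser gathers bits in a local variable and writes to the caller's pointer only once every token has matched. The table, the bit values (taken from the documented bit positions), and the names trim() and parse_event_filter() are hypothetical, not the code that will be posted:

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name/bit table for the configurable memory transactions. */
struct mbm_config_value {
	const char	*name;
	uint32_t	val;
};

static const struct mbm_config_value mbm_config_values[] = {
	{ "local_reads",		1u << 0 },
	{ "remote_reads",		1u << 1 },
	{ "local_non_temporal_writes",	1u << 2 },
	{ "remote_non_temporal_writes",	1u << 3 },
	{ "local_reads_slow_memory",	1u << 4 },
	{ "remote_reads_slow_memory",	1u << 5 },
	{ "dirty_victim_writes_all",	1u << 6 },
};

#define NUM_CONFIG_VALUES \
	(sizeof(mbm_config_values) / sizeof(mbm_config_values[0]))

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return s;
}

/*
 * Parse a comma-separated list of transaction names (modifies buf).
 * On success store the combined bits in *val and return 0. On an
 * unknown name return -EINVAL and leave *val untouched.
 */
static int parse_event_filter(char *buf, uint32_t *val)
{
	uint32_t cfg = 0;	/* gathered locally, committed only on success */
	char *tok, *next;
	size_t i;

	for (tok = buf; tok && *tok; tok = next) {
		next = strchr(tok, ',');
		if (next)
			*next++ = '\0';
		tok = trim(tok);

		for (i = 0; i < NUM_CONFIG_VALUES; i++) {
			if (!strcmp(mbm_config_values[i].name, tok)) {
				cfg |= mbm_config_values[i].val;
				break;
			}
		}
		if (i == NUM_CONFIG_VALUES)
			return -EINVAL;
	}

	*val = cfg;
	return 0;
}
```

A loop also replaces the backward goto, and a failed write can no longer leak a half-built configuration to the caller.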

> 
>> +	}
>> +
>> +	goto next_config;
>> +}
>> +
>> +static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
>> +				  size_t nbytes, loff_t off)
>> +{
>> +	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>> +	struct mon_evt *mevt = rdt_kn_parent_priv(of->kn);
>> +	u32 evt_cfg = 0;
>> +	int ret = 0;
>> +
>> +	/* Valid input requires a trailing newline */
>> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> +		return -EINVAL;
>> +
>> +	buf[nbytes - 1] = '\0';
>> +
>> +	cpus_read_lock();
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> +		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>> +		ret = -EINVAL;
>> +		goto unlock_out;
> 
> "grep goto fs/resctrl/rdtgroup.c"

Will change it to "out_unlock".

> 
>> +	}
>> +
>> +	ret = resctrl_process_configs(buf, &evt_cfg);
>> +	if (!ret && mevt->evt_val != evt_cfg) {
>> +		mevt->evt_val = evt_cfg;
> 
> ah ... here it is. hmmm ... but it is mon_evt::evt_cfg, no? ah,
> fixed in next patch.

Yea. My bad.

> 
> I still seem to be missing something because I expected mon_evt::evt_cfg
> of mbm_total_bytes and mbm_local_bytes to be initialized with a starting
> default. I missed where this is done in this series.

It's not done in this series. Will do it in the next revision.

> 
>> +		resctrl_update_assign(r, mevt->evtid, evt_cfg);
>> +	}
>> +
>> +unlock_out:
>> +	mutex_unlock(&rdtgroup_mutex);
>> +	cpus_read_unlock();
>> +
>> +	return ret ?: nbytes;
>> +}
>> +
>>   /* rdtgroup information files for one cache resource. */
>>   static struct rftype res_common_files[] = {
>>   	{
>> @@ -2056,9 +2173,10 @@ static struct rftype res_common_files[] = {
>>   	},
>>   	{
>>   		.name		= "event_filter",
>> -		.mode		= 0444,
>> +		.mode		= 0644,
>>   		.kf_ops		= &rdtgroup_kf_single_ops,
>>   		.seq_show	= event_filter_show,
>> +		.write		= event_filter_write,
>>   	},
>>   	{
>>   		.name		= "mbm_assign_mode",
> 
> Reinette
> 


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 21/27] x86/resctrl: Introduce mbm_assign_on_mkdir to configure assignments
  2025-05-23  4:48   ` Reinette Chatre
@ 2025-05-29 23:03     ` Moger, Babu
  2025-05-30 20:54       ` Reinette Chatre
  0 siblings, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-05-29 23:03 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/2025 11:48 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:52 PM, Babu Moger wrote:
>> The mbm_cntr_assign mode provides an option to the user to assign a
>> counter to an RMID, event pair and monitor the bandwidth as long as
>> the counter is assigned.
>>
>> Introduce a configuration option to automatically assign counter IDs
> 
> "assign counter IDs" -> "assign counter IDs to <what?>"

"Introduce a configuration option to automatically assign counter IDs
to an RMID, event pair when a resctrl group is created, provided the
counter IDs are available."

> 
>> when a resctrl group is created, provided the counters are available.
>> By default, this option is enabled at boot.
>>
>> Suggested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v13: Added Suggested-by tag.
>>       Resolved conflicts caused by the recent FS/ARCH code restructure.
>>       The rdtgroup.c/monitor.c file has now been split between the FS and ARCH directories.
>>
>> v12: New patch. Added after the discussion on the list.
>>       https://lore.kernel.org/lkml/CALPaoCh8siZKjL_3yvOYGL4cF_n_38KpUFgHVGbQ86nD+Q2_SA@mail.gmail.com/
>> ---
>>   Documentation/filesystems/resctrl.rst | 10 ++++++
>>   fs/resctrl/monitor.c                  |  2 ++
>>   fs/resctrl/rdtgroup.c                 | 44 +++++++++++++++++++++++++--
>>   include/linux/resctrl.h               |  2 ++
>>   4 files changed, 56 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index 9923276826db..356f1f918a86 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -348,6 +348,16 @@ with the following files:
>>   	  # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>   	   local_reads, local_non_temporal_writes
>>   
>> +"mbm_assign_on_mkdir":
>> +	Automatically assign the monitoring counters on resctrl group creation
> 
> assign the monitoring counters to what?

"Automatically assign counter IDs to an RMID, event pair on resctrl 
group creation if the counter IDs are available. It is enabled by 
default on boot and users can disable by writing to the interface."

>> +	if the counters are available. It is enabled by default on boot and users
>> +	can disable by writing to the interface.
>> +	::
>> +
>> +	  # echo 0 > /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>> +	  0
> 
> Please be explicit in docs what possible values are and what they mean.

Sure. I can print "enabled" or "disabled".

> 
>> +
>>   "max_threshold_occupancy":
>>   		Read/write file provides the largest value (in
>>   		bytes) at which a previously used LLC_occupancy
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 1f72249a5c93..5f6c4b662f3b 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -933,6 +933,8 @@ int resctrl_mon_resource_init(void)
>>   		resctrl_file_fflags_init("available_mbm_cntrs",
>>   					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>>   		resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
>> +		resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
>> +					 RFTYPE_RES_CACHE);
>>   	}
>>   
>>   	return 0;
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 8c498b41be5d..0093b323d858 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -2035,8 +2035,8 @@ static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
>>   	}
>>   
>>   	ret = resctrl_process_configs(buf, &evt_cfg);
>> -	if (!ret && mevt->evt_val != evt_cfg) {
>> -		mevt->evt_val = evt_cfg;
>> +	if (!ret && mevt->evt_cfg != evt_cfg) {
>> +		mevt->evt_cfg = evt_cfg;
>>   		resctrl_update_assign(r, mevt->evtid, evt_cfg);
>>   	}
>>   
> 
> Needs to be squashed.

Sure.

> 
>> @@ -2047,6 +2047,39 @@ static ssize_t event_filter_write(struct kernfs_open_file *of, char *buf,
>>   	return ret ?: nbytes;
>>   }
>>   
>> +static int resctrl_mbm_assign_on_mkdir_show(struct kernfs_open_file *of,
>> +					    struct seq_file *s, void *v)
>> +{
>> +	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
>> +
>> +	seq_printf(s, "%u\n", r->mon.mbm_assign_on_mkdir);
>> +
>> +	return 0;
>> +}
>> +
>> +static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
>> +						 char *buf, size_t nbytes, loff_t off)
>> +{
>> +	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
>> +	bool value;
>> +	int ret;
>> +
>> +	ret = kstrtobool(buf, &value);
>> +	if (ret)
>> +		return ret;
>> +
>> +	cpus_read_lock();
> 
> not traversing the domain list so hotplug lock not needed.

ok. Sure.

> 
>> +	mutex_lock(&rdtgroup_mutex);
> 
> rdtgroup_mutex seems only needed because the message buffer is cleared below, and this is why it
> is not required in the show()?

Hmm. I didn't think about that. Do you think it is required?

> 
>> +	rdt_last_cmd_clear();
>> +
>> +	r->mon.mbm_assign_on_mkdir = value;
>> +
>> +	mutex_unlock(&rdtgroup_mutex);
>> +	cpus_read_unlock();
>> +
>> +	return ret ?: nbytes;
>> +}
>> +
>>   /* rdtgroup information files for one cache resource. */
>>   static struct rftype res_common_files[] = {
>>   	{
>> @@ -2056,6 +2089,13 @@ static struct rftype res_common_files[] = {
>>   		.seq_show	= rdt_last_cmd_status_show,
>>   		.fflags		= RFTYPE_TOP_INFO,
>>   	},
>> +	{
>> +		.name		= "mbm_assign_on_mkdir",
>> +		.mode		= 0644,
>> +		.kf_ops		= &rdtgroup_kf_single_ops,
>> +		.seq_show	= resctrl_mbm_assign_on_mkdir_show,
>> +		.write		= resctrl_mbm_assign_on_mkdir_write,
>> +	},
>>   	{
>>   		.name		= "num_closids",
>>   		.mode		= 0444,
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index cd24d1577e0a..d6435abdde7b 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -278,6 +278,7 @@ enum resctrl_schema_fmt {
>>    *			monitoring events can be configured.
>>    * @num_mbm_cntrs:	Number of assignable monitoring counters
>>    * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
>> + * @mbm_assign_on_mkdir:Auto enable monitor assignment on mkdir?
> 
> How is "monitor assignment" different from "counter assignment"?

It should be:

"Auto enable counter ID assignment on mkdir"

> 
>>    * @evt_list:		List of monitoring events
>>    */
>>   struct resctrl_mon {
>> @@ -285,6 +286,7 @@ struct resctrl_mon {
>>   	unsigned int		mbm_cfg_mask;
>>   	int			num_mbm_cntrs;
>>   	bool			mbm_cntr_assignable;
>> +	bool			mbm_assign_on_mkdir;
>>   	struct list_head	evt_list;
>>   };
>>   
> 
> Reinette
> 

Thanks
Babu

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 23/27] x86/resctrl: Introduce mbm_L3_assignments to list assignments in a group
  2025-05-23  4:47   ` Reinette Chatre
@ 2025-05-30  0:55     ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-05-30  0:55 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/2025 11:47 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/15/25 3:52 PM, Babu Moger wrote:
>> Introduce the interface to display the assignment states for each group
>> when mbm_cntr_assign mode is enabled.
>>
>> The list is displayed in the following format:
>> <Event configuration>:<Domain id>=<Assignment type>
> 
> Should this just be <Event>? The information is just the event name, not
> its configuration that will be in the "event_filter" file.

Yes. Sure.

> 
>>
>> Event configuration: A valid event configuration listed in the
>> /sys/fs/resctrl/info/L3_MON/counter_configs directory.
>>
>> Domain ID: A valid domain ID number.
>>
>> The assignment type can be one of the following:
>>
>> _ : No event configuration assigned
>>
>> e : Event configuration assigned in exclusive mode
>>

This needs to change as well, based on the comments below (resctrl.rst).
This is a note for me.

>> Example:
>> $cd /sys/fs/resctrl
>> $cat mbm_L3_assignments
>> mbm_total_bytes:0=e;1=e
>> mbm_local_bytes:0=e;1=e
>>
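
For reference, a small user-space sketch of how one line in the format above
could be built. It is only an illustration under assumptions: struct
dom_state and format_assignments() are hypothetical names, and the real show
function uses seq_file helpers rather than snprintf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-domain state: domain ID plus "is a counter assigned". */
struct dom_state {
	int id;
	bool assigned;
};

/*
 * Build "<event>:<dom>=<type>;<dom>=<type>..." into out.
 * 'e' marks an assigned counter, '_' marks no counter assigned.
 * Returns the string length, or -1 if the buffer is too small.
 */
static int format_assignments(char *out, size_t len, const char *event,
			      const struct dom_state *doms, size_t ndoms)
{
	size_t used, i;
	int n;

	assert(out && event);

	n = snprintf(out, len, "%s:", event);
	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;

	for (i = 0; i < ndoms; i++) {
		/* ';' separates domains, so it is skipped before the first one */
		n = snprintf(out + used, len - used, "%s%d=%c",
			     i ? ";" : "", doms[i].id,
			     doms[i].assigned ? 'e' : '_');
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}

	return (int)used;
}
```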
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v13: Changelog update.
>>       Few changes in mbm_L3_assignments_show() after moving the event config to evt_list.
>>       Resolved conflicts caused by the recent FS/ARCH code restructure.
>>       The rdtgroup.c/monitor.c files have been split between the FS and ARCH directories.
>>
>> v12: New patch:
>>       Assignment interface moved inside the group based on the discussion
>>       https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/#t
>> ---
>>   Documentation/filesystems/resctrl.rst | 28 +++++++++++++++
>>   fs/resctrl/monitor.c                  |  1 +
>>   fs/resctrl/rdtgroup.c                 | 52 +++++++++++++++++++++++++++
>>   3 files changed, 81 insertions(+)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index 356f1f918a86..2350c1f21f4e 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -504,6 +504,34 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
>>   	/sys/fs/resctrl/info/L3_MON/mon_features changes the input
>>   	event.
>>   
>> +"mbm_L3_assignments":
>> +	This interface file is created when the mbm_cntr_assign mode is supported
> 
> "This interface file is created when" -> "Exists when mbm_cntr_assign mode is supported"?

Sure.

> 
>> +	and shows the assignment status for each group.
> 
> This doc is in the portion documenting files in monitor groups. So rather:
> "the assignment status for each group" -> "the counter assignment status for the MON group"?

ok.

> 
>> +
>> +	The assignment list is displayed in the following format:
>> +
>> +	<Event configuration>:<Domain id>=<Assignment type>
> 
> <Event configuration> -> <Event>
> 
Ok.

>> +
>> +	Event configuration: A valid event configuration listed in the
> 
> "A valid event in the /sys/fs/resctrl/info/L3_MON/event_configs directory"

Sure.

> 
>> +	/sys/fs/resctrl/info/L3_MON/counter_configs directory.
>> +
>> +	Domain ID: A valid domain ID number.
> 
> "A valid domain ID"
> 
>> +
>> +	Assignment types:
>> +
>> +	_ : No event configuration assigned
> 
> hmmm ... since the line has event as first field, would this not reflect the
> counter? That is "No counter assigned"

Sure.

> 
>> +
>> +	e : Event configuration assigned in exclusive mode
> 
> "Counter assigned exclusively"? (with exclusive defined somewhere)

Sure.

> 
>> +
>> +	Example:
>> +	To list the assignment states for the default group.
> 
> "the counter assignment states"?

ok

> 
>> +	::
>> +
>> +	  # cd /sys/fs/resctrl
>> +	  # cat mbm_L3_assignments
>> +	    mbm_total_bytes:0=e;1=e
>> +	    mbm_local_bytes:0=e;1=e
>> +
>>   Resource allocation rules
>>   -------------------------
>>   
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 5f6c4b662f3b..b982540ce4e3 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -935,6 +935,7 @@ int resctrl_mon_resource_init(void)
>>   		resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
>>   		resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
>>   					 RFTYPE_RES_CACHE);
>> +		resctrl_file_fflags_init("mbm_L3_assignments", RFTYPE_MON_BASE);
>>   	}
>>   
>>   	return 0;
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 931ea355f159..8d970b99bbbd 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -2080,6 +2080,52 @@ static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
>>   	return ret ?: nbytes;
>>   }
>>   
>> +static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file *s, void *v)
>> +{
>> +	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>> +	struct rdt_mon_domain *d;
>> +	struct rdtgroup *rdtgrp;
>> +	struct mon_evt *mevt;
>> +	int ret = 0;
>> +	bool sep;
>> +
>> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>> +	if (!rdtgrp)
>> +		return -ENOENT;
> 
> Missing a rdtgroup_kn_unlock()?

Yes. Will add it.

> 
>> +
>> +	rdt_last_cmd_clear();
>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> +		rdt_last_cmd_puts("mbm_cntr_assign mode not enabled\n");
>> +		ret = -ENOENT;
>> +		goto assign_out;
> 
> grep goto fs/resctrl/rdtgroup.c

Will change it to "out_assign".

> 
>> +	}
>> +
>> +	list_for_each_entry(mevt, &r->mon.evt_list, list) {
> 
> can use for_each_mbm_event() and then below will not be needed?

Yes.

> 
>> +		if (mevt->mbm_mode != MBM_MODE_ASSIGN)
>> +			continue;
>> +
>> +		sep = false;
>> +		seq_printf(s, "%s:", mevt->name);
>> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +			if (sep)
>> +				seq_putc(s, ';');
>> +
>> +			if (mbm_cntr_get(r, d, rdtgrp, mevt->evtid) >= 0)
>> +				seq_printf(s, "%d=e", d->hdr.id);
>> +			else
>> +				seq_printf(s, "%d=_", d->hdr.id);
>> +
>> +			sep = true;
>> +		}
>> +		seq_putc(s, '\n');
>> +	}
>> +
>> +assign_out:
>> +	rdtgroup_kn_unlock(of->kn);
>> +
>> +	return ret;
>> +}
>> +
>>   /* rdtgroup information files for one cache resource. */
>>   static struct rftype res_common_files[] = {
>>   	{
>> @@ -2218,6 +2264,12 @@ static struct rftype res_common_files[] = {
>>   		.seq_show	= event_filter_show,
>>   		.write		= event_filter_write,
>>   	},
>> +	{
>> +		.name		= "mbm_L3_assignments",
>> +		.mode		= 0444,
>> +		.kf_ops		= &rdtgroup_kf_single_ops,
>> +		.seq_show	= mbm_L3_assignments_show,
>> +	},
>>   	{
>>   		.name		= "mbm_assign_mode",
>>   		.mode		= 0444,
> 
> Reinette
> 

Thanks
Babu

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 21/27] x86/resctrl: Introduce mbm_assign_on_mkdir to configure assignments
  2025-05-29 23:03     ` Moger, Babu
@ 2025-05-30 20:54       ` Reinette Chatre
  2025-06-03 14:00         ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-05-30 20:54 UTC (permalink / raw)
  To: Moger, Babu, Babu Moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Babu,

On 5/29/25 4:03 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 5/22/2025 11:48 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 5/15/25 3:52 PM, Babu Moger wrote:
>>> The mbm_cntr_assign mode provides an option to the user to assign a
>>> counter to an RMID, event pair and monitor the bandwidth as long as
>>> the counter is assigned.
>>>
>>> Introduce a configuration option to automatically assign counter IDs
>>
>> "assign counter IDs" -> "assign counter IDs to <what?>"
> 
> "Introduce a configuration option to automatically assign counter IDs to an RMID, event pair when a resctrl group is created, provided the counter IDs are available."

Stating that "counter IDs" (plural) are assigned to "an RMID, event pair" (singular)
can be confusing.

How about something like (please feel free to improve):
"Introduce a user-configurable option that determines if a counter will automatically
be assigned to an RMID, event pair when its associated monitor group is created via mkdir."


> 
>>
>>> when a resctrl group is created, provided the counters are available.
>>> By default, this option is enabled at boot.
>>>
>>> Suggested-by: Peter Newman <peternewman@google.com>
>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>> ---
>>> v13: Added Suggested-by tag.
>>>       Resolved conflicts caused by the recent FS/ARCH code restructure.
>>>       The rdtgroup.c/monitor.c file has now been split between the FS and ARCH directories.
>>>
>>> v12: New patch. Added after the discussion on the list.
>>>       https://lore.kernel.org/lkml/CALPaoCh8siZKjL_3yvOYGL4cF_n_38KpUFgHVGbQ86nD+Q2_SA@mail.gmail.com/
>>> ---
>>>   Documentation/filesystems/resctrl.rst | 10 ++++++
>>>   fs/resctrl/monitor.c                  |  2 ++
>>>   fs/resctrl/rdtgroup.c                 | 44 +++++++++++++++++++++++++--
>>>   include/linux/resctrl.h               |  2 ++
>>>   4 files changed, 56 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>>> index 9923276826db..356f1f918a86 100644
>>> --- a/Documentation/filesystems/resctrl.rst
>>> +++ b/Documentation/filesystems/resctrl.rst
>>> @@ -348,6 +348,16 @@ with the following files:
>>>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>>          local_reads, local_non_temporal_writes
>>>   +"mbm_assign_on_mkdir":
>>> +    Automatically assign the monitoring counters on resctrl group creation
>>
>> assign the monitoring counters to what?
> 
> "Automatically assign counter IDs to an RMID, event pair on resctrl group creation if the counter IDs are available. It is enabled by default on boot and users can disable by writing to the interface."

Same here, please take care with the plural/singular usage.

> 
>>> +    if the counters are available. It is enabled by default on boot and users
>>> +    can disable by writing to the interface.
>>> +    ::
>>> +
>>> +      # echo 0 > /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>>> +      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>>> +      0
>>
>> Please be explicit in docs what possible values are and what they mean.
> 
> Sure. I can print "enabled" or "disabled".

I am not requesting a change in the user interface itself but instead clear documentation about
what the input/output values mean. Even if the interface changes to "enabled"/"disabled"
I assume the interface will still accept boolean values? Compare to the "sparse_masks"
documentation on how the possible values are explicitly documented.

...

>>> +static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
>>> +                         char *buf, size_t nbytes, loff_t off)
>>> +{
>>> +    struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
>>> +    bool value;
>>> +    int ret;
>>> +
>>> +    ret = kstrtobool(buf, &value);
>>> +    if (ret)
>>> +        return ret;
>>> +
>>> +    cpus_read_lock();
>>
>> not traversing the domain list so hotplug lock not needed.
> 
> ok. Sure.
> 
>>
>>> +    mutex_lock(&rdtgroup_mutex);
>>
>> rdtgroup_mutex seems only needed because the message buffer is cleared below, and this is why it
>> is not required in the show()?
> 
> Hmm. I didn't think about that. Do you think it is required?

It is certainly required to be able to call rdt_last_cmd_clear() and since it then
covers mbm_assign_on_mkdir I would prefer symmetry in consistently acquiring
rdtgroup_mutex on both read and write while resctrl is mounted. Note that
there is also other read usage on resctrl mount that is done with
mutex held. Having the mutex acquired consistently will help to keep things
simple.
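
A rough user-space illustration of the symmetry described above. A pthread
mutex stands in for rdtgroup_mutex and a plain buffer stands in for the last
command status; all names here are hypothetical and this is not the resctrl
code, only the locking shape.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for rdtgroup_mutex, the flag, and the last-command buffer. */
static pthread_mutex_t group_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool assign_on_mkdir = true;
static char last_cmd_status[64];

/* Writer: the mutex covers both the status-buffer reset and the flag. */
static void assign_on_mkdir_write(bool value)
{
	pthread_mutex_lock(&group_mutex);
	last_cmd_status[0] = '\0';	/* plays the role of rdt_last_cmd_clear() */
	assign_on_mkdir = value;
	pthread_mutex_unlock(&group_mutex);
}

/* Reader: takes the same mutex, so read and write are symmetric. */
static int assign_on_mkdir_show(char *out, size_t len)
{
	int n;

	assert(out);

	pthread_mutex_lock(&group_mutex);
	n = snprintf(out, len, "%d\n", assign_on_mkdir);
	pthread_mutex_unlock(&group_mutex);

	return n;
}
```

The hotplug lock has no counterpart here on purpose, since neither path
walks a domain list.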

...

>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>> index cd24d1577e0a..d6435abdde7b 100644
>>> --- a/include/linux/resctrl.h
>>> +++ b/include/linux/resctrl.h
>>> @@ -278,6 +278,7 @@ enum resctrl_schema_fmt {
>>>    *            monitoring events can be configured.
>>>    * @num_mbm_cntrs:    Number of assignable monitoring counters
>>>    * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
>>> + * @mbm_assign_on_mkdir:Auto enable monitor assignment on mkdir?
>>
>> How is "monitor assignment" different from "counter assignment"?
> 
> It should be:
> 
> "Auto enable counter ID assignment on mkdir"

hmmm ... I do not think this is about "Auto enable".
How about something like "Automatic counter assignment during monitor group create via mkdir?"
or "True if counters should automatically be assigned to MBM events of monitor groups
created via mkdir."

Reinette

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 18/27] x86/resctrl: Add definitions for MBM event configuration
  2025-05-29 20:58       ` Reinette Chatre
@ 2025-06-03 13:41         ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-03 13:41 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/29/2025 3:58 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/29/25 12:00 PM, Moger, Babu wrote:
>> On 5/22/25 23:41, Reinette Chatre wrote:
>>> On 5/15/25 3:52 PM, Babu Moger wrote:
> 
> 
>>>> +/**
>>>> + * struct mbm_evt_value - Specific type of memory events.
>>>
>>> I am trying to decipher the terminology. If these are events, then it becomes confusing
>>> since it becomes "these events are used to configure events". You mention "memory
>>> transaction" below, this sounds more accurate to me. Above could thus be:
>>>
>>> struct mbm_evt_value - Memory transaction an MBM event can be configured with.
>>
>> Sure.
>>
>>>
>>> The name of the struct could also do with a rename to avoid the "event" term that
>>> conflicts with the actual MBM events. Maybe "mbm_cfg_value" ... I do not think this
>>> is a good name so please consider what would work better.
>>
>> I can change it to "mbm_config_value".
> 
> Looks good, thank you.
> 
> ...
> 
>>>> +#define NUM_MBM_EVT_VALUES             7
>>>> +
>>>> +/* Decoded values for each type of memory events */
>>>
>>> Please be consistent with terminology. In the above lines it switches
>>> between "memory transaction types" and "memory events".
>>
>> "Decoded values for each type of memory transaction types"
> 
> I do not think "type" is needed twice. Could also be:
> "Decoded values of each memory transaction type."

Sure.

Thanks
Babu


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 21/27] x86/resctrl: Introduce mbm_assign_on_mkdir to configure assignments
  2025-05-30 20:54       ` Reinette Chatre
@ 2025-06-03 14:00         ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-03 14:00 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, tglx, mingo, bp,
	dave.hansen
  Cc: james.morse, dave.martin, fenghuay, x86, hpa, paulmck, akpm,
	thuth, rostedt, ardb, gregkh, daniel.sneddon, jpoimboe,
	alexandre.chartre, pawan.kumar.gupta, thomas.lendacky, perry.yuan,
	seanjc, kai.huang, xiaoyao.li, kan.liang, xin3.li, ebiggers, xin,
	sohil.mehta, andrew.cooper3, mario.limonciello, linux-doc,
	linux-kernel, peternewman, maciej.wieczor-retman, eranian,
	Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/30/2025 3:54 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/29/25 4:03 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 5/22/2025 11:48 PM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 5/15/25 3:52 PM, Babu Moger wrote:
>>>> The mbm_cntr_assign mode provides an option to the user to assign a
>>>> counter to an RMID, event pair and monitor the bandwidth as long as
>>>> the counter is assigned.
>>>>
>>>> Introduce a configuration option to automatically assign counter IDs
>>>
>>> "assign counter IDs" -> "assign counter IDs to <what?>"
>>
>> "Introduce a configuration option to automatically assign counter IDs to an RMID, event pair when a resctrl group is created, provided the counter IDs are available."
> 
> Stating that "counter IDs" (plural) are assigned to "an RMID, event pair" (singular)
> can be confusing.
> 
> How about something like (please feel free to improve):
> "Introduce a user-configurable option that determines if a counter will automatically
> be assigned to an RMID, event pair when its associated monitor group is created via mkdir."

Sure.

> 
> 
>>
>>>
>>>> when a resctrl group is created, provided the counters are available.
>>>> By default, this option is enabled at boot.
>>>>
>>>> Suggested-by: Peter Newman <peternewman@google.com>
>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>>> ---
>>>> v13: Added Suggested-by tag.
>>>>        Resolved conflicts caused by the recent FS/ARCH code restructure.
>>>>        The rdtgroup.c/monitor.c file has now been split between the FS and ARCH directories.
>>>>
>>>> v12: New patch. Added after the discussion on the list.
>>>>        https://lore.kernel.org/lkml/CALPaoCh8siZKjL_3yvOYGL4cF_n_38KpUFgHVGbQ86nD+Q2_SA@mail.gmail.com/
>>>> ---
>>>>    Documentation/filesystems/resctrl.rst | 10 ++++++
>>>>    fs/resctrl/monitor.c                  |  2 ++
>>>>    fs/resctrl/rdtgroup.c                 | 44 +++++++++++++++++++++++++--
>>>>    include/linux/resctrl.h               |  2 ++
>>>>    4 files changed, 56 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>>>> index 9923276826db..356f1f918a86 100644
>>>> --- a/Documentation/filesystems/resctrl.rst
>>>> +++ b/Documentation/filesystems/resctrl.rst
>>>> @@ -348,6 +348,16 @@ with the following files:
>>>>          # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>>>           local_reads, local_non_temporal_writes
>>>>    +"mbm_assign_on_mkdir":
>>>> +    Automatically assign the monitoring counters on resctrl group creation
>>>
>>> assign the monitoring counters to what?
>>
>> "Automatically assign counter IDs to an RMID, event pair on resctrl group creation if the counter IDs are available. It is enabled by default on boot and users can disable by writing to the interface."
> 
> Same here, please take care with the plural/singular usage.

Sure.

> 
>>
>>>> +    if the counters are available. It is enabled by default on boot and users
>>>> +    can disable by writing to the interface.
>>>> +    ::
>>>> +
>>>> +      # echo 0 > /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>>>> +      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>>>> +      0
>>>
>>> Please be explicit in docs what possible values are and what they mean.
>>
>> Sure. I can print "enabled" or "disabled".
> 
> I am not requesting a change in the user interface itself but instead clear documentation about
> what the input/output values mean. Even if the interface changes to "enabled"/"disabled"
> I assume the interface will still accept boolean values? Compare to the "sparse_masks"
> documentation on how the possible values are explicitly documented.
> 

ok. Will look into that.

> ...
> 
>>>> +static ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of,
>>>> +                         char *buf, size_t nbytes, loff_t off)
>>>> +{
>>>> +    struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
>>>> +    bool value;
>>>> +    int ret;
>>>> +
>>>> +    ret = kstrtobool(buf, &value);
>>>> +    if (ret)
>>>> +        return ret;
>>>> +
>>>> +    cpus_read_lock();
>>>
>>> not traversing the domain list so hotplug lock not needed.
>>
>> ok. Sure.
>>
>>>
>>>> +    mutex_lock(&rdtgroup_mutex);
>>>
>>> rdtgroup_mutex seems only needed because the message buffer is cleared below, and this is why it
>>> is not required in the show()?
>>
>> Hmm. I didn't think about that. Do you think it is required?
> 
> It is certainly required to be able to call rdt_last_cmd_clear() and since it then
> covers mbm_assign_on_mkdir I would prefer symmetry in consistently acquiring
> rdtgroup_mutex on both read and write while resctrl is mounted. Note that
> there is also other read usage on resctrl mount that is done with
> mutex held. Having the mutex acquired consistently will help to keep things
> simple.
> 

Sure. will do.

> ...
> 
>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>>> index cd24d1577e0a..d6435abdde7b 100644
>>>> --- a/include/linux/resctrl.h
>>>> +++ b/include/linux/resctrl.h
>>>> @@ -278,6 +278,7 @@ enum resctrl_schema_fmt {
>>>>     *            monitoring events can be configured.
>>>>     * @num_mbm_cntrs:    Number of assignable monitoring counters
>>>>     * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
>>>> + * @mbm_assign_on_mkdir:Auto enable monitor assignment on mkdir?
>>>
>>> How is "monitor assignment" different from "counter assignment"?
>>
>> It should be:
>>
>> "Auto enable counter ID assignment on mkdir"
> 
> hmmm ... I do not think this is about "Auto enable".
> How about something like "Automatic counter assignment during monitor group create via mkdir?"
> or "True if counters should automatically be assigned to MBM events of monitor groups
> created via mkdir."

Sure. Thanks
Babu

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: RE: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2025-05-28 21:41             ` Moger, Babu
  2025-05-28 22:00               ` Luck, Tony
@ 2025-06-09 14:01               ` Moger, Babu
  1 sibling, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-09 14:01 UTC (permalink / raw)
  To: Moger, Babu, Luck, Tony, Peter Newman
  Cc: Chatre, Reinette, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	james.morse@arm.com, dave.martin@arm.com, fenghuay@nvidia.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
	ardb@kernel.org, gregkh@linuxfoundation.org,
	daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
	alexandre.chartre@oracle.com, pawan.kumar.gupta@linux.intel.com,
	thomas.lendacky@amd.com, perry.yuan@amd.com, seanjc@google.com,
	Huang, Kai, Li, Xiaoyao, kan.liang@linux.intel.com, Li, Xin3,
	ebiggers@google.com, xin@zytor.com, Mehta, Sohil,
	andrew.cooper3@citrix.com, mario.limonciello@amd.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wieczor-Retman, Maciej, Eranian, Stephane, Xiaojian.Du@amd.com,
	gautham.shenoy@amd.com

Hi Tony,

On 5/28/25 16:41, Moger, Babu wrote:
> Hi Tony, Peter,
> 
> On 5/27/2025 4:41 PM, Luck, Tony wrote:
>>
>>> Thanks for applying my suggestion[1] about the array entry sizes, but
>>> you needed one more dereference:
>>
>>> -       size_t tsize = sizeof(hw_dom->arch_mbm_states[0]);
>>> +       size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
>>
>>> -       size_t tsize = sizeof(d->mbm_states[0]);
>>> +       size_t tsize = sizeof(*d->mbm_states[0]);
>>
>> Indeed yes. Thanks.
>>
> 
> Tony, Thanks for porting patches.
> 
> I can actually pick your branch [1] and apply review comments on top for
> v14 series. Hope that is fine with everyone.
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git/log/?h=my_mbm_plus_babu_abmc
> 
> One question though: Where will the Peter's fix [2] go?
> [2]
> https://lore.kernel.org/lkml/CALPaoCj7FBv_vfDp+4tgqo4p8T7Eov_Ys+CQRoAX6u43a4OTDQ@mail.gmail.com/
> 
> thanks
> Babu
> 
> 
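
For anyone following the sizeof fix quoted above, here is a minimal sketch of why the extra dereference matters. It assumes, as the corrected diff implies, that each mbm_states[] entry is itself a pointer to a separately allocated array of state records. The struct layouts and the *_example names are made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-RMID MBM state record. */
struct mbm_state_example {
	unsigned long long chunks;
	unsigned long long prev_msr;
	unsigned long long prev_bw;
};

#define NUM_MBM_EVENTS_EXAMPLE 2

/* Hypothetical domain: one separately allocated state array per event. */
struct domain_example {
	struct mbm_state_example *mbm_states[NUM_MBM_EVENTS_EXAMPLE];
};

/* Size of one array slot: this is a pointer, not a state record. */
static size_t slot_size(const struct domain_example *d)
{
	assert(d);
	return sizeof(d->mbm_states[0]);
}

/* Size of one state record: the value a per-record allocation must scale by. */
static size_t record_size(const struct domain_example *d)
{
	assert(d);
	/* sizeof does not evaluate its operand, so a NULL entry is harmless here. */
	return sizeof(*d->mbm_states[0]);
}
```

Without the dereference, tsize would be the size of a pointer, so any allocation or memset scaled by it would cover too little of each record array.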

I'm currently working on v14 and plan to post the updated ABMC series
tomorrow. I've used your multi-event support patches as the base:

    x86, fs/resctrl: Consolidate monitor event descriptions

    x86, fs/resctrl: Replace architecture event enabled checks

    x86/resctrl: Remove 'rdt_mon_features' global variable

    x86, fs/resctrl: Prepare for more monitor events

I noticed there are a few comments on your series here:
https://lore.kernel.org/lkml/20250521225049.132551-1-tony.luck@intel.com/

Let me know if you've updated the patches. If so, I’ll incorporate the
latest version. Otherwise, I’ll proceed with the current base as-is.
-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-05-22 16:33                     ` Reinette Chatre
  2025-05-22 19:15                       ` Moger, Babu
@ 2025-06-10 23:19                       ` Moger, Babu
  2025-06-11 18:29                         ` Reinette Chatre
  1 sibling, 1 reply; 114+ messages in thread
From: Moger, Babu @ 2025-06-10 23:19 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, Peter Newman
  Cc: corbet, tony.luck, tglx, mingo, bp, dave.hansen, james.morse,
	dave.martin, fenghuay, x86, hpa, paulmck, akpm, thuth, rostedt,
	ardb, gregkh, daniel.sneddon, jpoimboe, alexandre.chartre,
	pawan.kumar.gupta, thomas.lendacky, perry.yuan, seanjc, kai.huang,
	xiaoyao.li, kan.liang, xin3.li, ebiggers, xin, sohil.mehta,
	andrew.cooper3, mario.limonciello, linux-doc, linux-kernel,
	maciej.wieczor-retman, eranian, Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 5/22/2025 11:33 AM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/22/25 8:44 AM, Moger, Babu wrote:
>> On 5/21/25 18:03, Reinette Chatre wrote:
> 
> ...
> 
>>> This is why I proposed in [3] that the name of the mode reflects how user can interact
>>> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
>>> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
>>> make it clear what the system is capable of wrt counter assignments.
>>
>> Yes, that makes sense. Perhaps we can also simplify it further:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:
>> [mbm_cntr_evt_assign] <- for ABMC
>>   mbm_cntr_grp_assign  <- for soft-ABMC
> 
> Looks good to me. Thank you.

I am actually ready with the v14 series. I have a good feeling that we are 
getting closer to making these changes final.

So, looking back again, it might make more sense to rename a few 
user-visible interfaces.

1. # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
    [mbm_assign_event] <- for ABMC
     mbm_assign_group  <- for soft-ABMC

    This looks much cleaner. It matches "mbm_assign_mode".

Similarly, we can rename a few functions and variables to make them a 
little more readable.

2. mbm_cntr_assignable -> mbm_assignable

3. resctrl_arch_mbm_cntr_assign_enabled
  -> resctrl_arch_mbm_assign_enabled

4. mbm_cntr_assign_enabled -> mbm_assign_enabled

5. resctrl_arch_mbm_cntr_assign_set_one
  -> resctrl_arch_mbm_assign_set_one

6. There will be a few more functions. I will look into those if you agree 
with the approach.

7. No need to change the ones below. These are related to actual 
counters.
    num_mbm_cntrs
    available_mbm_cntrs

What do you think?

Thanks
Babu Moger



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-06-10 23:19                       ` Moger, Babu
@ 2025-06-11 18:29                         ` Reinette Chatre
  2025-06-11 21:21                           ` Moger, Babu
  0 siblings, 1 reply; 114+ messages in thread
From: Reinette Chatre @ 2025-06-11 18:29 UTC (permalink / raw)
  To: Moger, Babu, babu.moger, Peter Newman
  Cc: corbet, tony.luck, tglx, mingo, bp, dave.hansen, james.morse,
	dave.martin, fenghuay, x86, hpa, paulmck, akpm, thuth, rostedt,
	ardb, gregkh, daniel.sneddon, jpoimboe, alexandre.chartre,
	pawan.kumar.gupta, thomas.lendacky, perry.yuan, seanjc, kai.huang,
	xiaoyao.li, kan.liang, xin3.li, ebiggers, xin, sohil.mehta,
	andrew.cooper3, mario.limonciello, linux-doc, linux-kernel,
	maciej.wieczor-retman, eranian, Xiaojian.Du, gautham.shenoy

Hi Babu,

On 6/10/25 4:19 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 5/22/2025 11:33 AM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 5/22/25 8:44 AM, Moger, Babu wrote:
>>> On 5/21/25 18:03, Reinette Chatre wrote:
>>
>> ...
>>
>>>> This is why I proposed in [3] that the name of the mode reflects how user can interact
>>>> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
>>>> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
>>>> make it clear what the system is capable of wrt counter assignments.
>>>
>>> Yes, that makes sense. Perhaps we can also simplify it further:
>>>
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:
>>> [mbm_cntr_evt_assign] <- for ABMC
>>>   mbm_cntr_grp_assign  <- for soft-ABMC
>>
>> Looks good to me. Thank you.
> 
> I am actually ready with v14 series. I have good feeling that we are getting closer to making these changes final.
> 
> So, Looking back again, it might make more sense to rename few user visible interfaces.
> 
> 1. # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode.
>    [mbm_assign_event] <- for ABMC
>     mbm_assign_group  <- for soft-ABMC
> 
>    This looks much more cleaner.  It matches with "mbm_assign_mode"

Ah, I see: dropping "cntr" reduces the confusion between ABMC assigning counters
and soft-ABMC assigning RMIDs. This looks good.

Taking this further, the "assign" term in "mbm_assign_event" and "mbm_assign_group" may also
be redundant considering that the filename, "mbm_assign_mode", already has "assign" in its name.

> 
> Similarly, we can rename few functions and variable names to make little more readable.
> 
> 2. mbm_cntr_assignable -> mbm_assignable
> 

I have no insight into how the soft-ABMC implementation will look and thus if it will
build on this property. If soft-ABMC uses the property then making it more generic may
help, but if it does not then it may make the code harder to read. Since this is all
internal I'd vote for keeping it mbm_cntr_assignable since the current implementation
directly associates it with hardware counters. I do not know if there will be a scenario
where a system may support *both* event and group assignable counters. The idea did
briefly come up[1]. If that may be possible then resctrl would need to distinguish them.
Also, interesting to note that the example used in (1) above notes a system that
supports both event and group assignment.

> 3. resctrl_arch_mbm_cntr_assign_enabled
>  -> >resctrl_arch_mbm_assign_enabled
> 

This is directly connected to the choice for (2).

> 4. mbm_cntr_assign_enabled -> mbm_assign_enabled

hmmm ... here mbm_cntr_assign_enabled is even more directly associated with hardware
support for counter assignment. It is not clear what the benefit is to make it generic.

> 
> 5. resctrl_arch_mbm_cntr_assign_set_one ->
> 
>    resctrl_arch_mbm_assign_set_one.

Same as (4)

> 
> 6. There will few more functions. I will look into that if you agree with approach.
> 
> 7. No need to change few of these below. These are related to actual counters.
>    num_mbm_cntrs
>    available_mbm_cntrs
> 
> What do you think?

It sounds to me as though you are aiming to make the ABMC implementation more
generic in preparation for soft-ABMC support. If you have insight into the soft-ABMC
implementation then please share the details for this to be taken into account.
Until then I think it will be simpler for the implementation to be specific to
the feature being enabled here. When soft-ABMC enabling arrives the needed changes
can be made. Since this is about internals of resctrl (not the user interface) we
are not as pressured to "get it right" while not having all information required
to make these choices.

Reinette
 
[1] https://lore.kernel.org/lkml/CALPaoCj438UfH3QA_VnGo-pj2a_48sJufUWjBKT3MQatcMJ_Uw@mail.gmail.com/

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2025-06-11 18:29                         ` Reinette Chatre
@ 2025-06-11 21:21                           ` Moger, Babu
  0 siblings, 0 replies; 114+ messages in thread
From: Moger, Babu @ 2025-06-11 21:21 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, Peter Newman
  Cc: corbet, tony.luck, tglx, mingo, bp, dave.hansen, james.morse,
	dave.martin, fenghuay, x86, hpa, paulmck, akpm, thuth, rostedt,
	ardb, gregkh, daniel.sneddon, jpoimboe, alexandre.chartre,
	pawan.kumar.gupta, thomas.lendacky, perry.yuan, seanjc, kai.huang,
	xiaoyao.li, kan.liang, xin3.li, ebiggers, xin, sohil.mehta,
	andrew.cooper3, mario.limonciello, linux-doc, linux-kernel,
	maciej.wieczor-retman, eranian, Xiaojian.Du, gautham.shenoy

Hi Reinette,

On 6/11/2025 1:29 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 6/10/25 4:19 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 5/22/2025 11:33 AM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 5/22/25 8:44 AM, Moger, Babu wrote:
>>>> On 5/21/25 18:03, Reinette Chatre wrote:
>>>
>>> ...
>>>
>>>>> This is why I proposed in [3] that the name of the mode reflects how user can interact
>>>>> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
>>>>> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
>>>>> make it clear what the system is capable of wrt counter assignments.
>>>>
>>>> Yes, that makes sense. Perhaps we can also simplify it further:
>>>>
>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:
>>>> [mbm_cntr_evt_assign] <- for ABMC
>>>>    mbm_cntr_grp_assign  <- for soft-ABMC
>>>
>>> Looks good to me. Thank you.
>>
>> I am actually ready with v14 series. I have good feeling that we are getting closer to making these changes final.
>>
>> So, Looking back again, it might make more sense to rename few user visible interfaces.
>>
>> 1. # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode.
>>     [mbm_assign_event] <- for ABMC
>>      mbm_assign_group  <- for soft-ABMC
>>
>>     This looks much more cleaner.  It matches with "mbm_assign_mode"
> 
> ah, I see, by dropping "cntr" it reduces confusion where ABMC assigns counters
> and soft-ABMC assigned RMID. This looks good.
> 
> Taking this further, the "assign" term in "mbm_assign_event" and "mbm_assign_group" may also
> be redundant considering that the filename, "mbm_assign_mode", already has "assign" in its name.

OK, sure. It will be:

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
   [mbm_event] <- for ABMC
    mbm_group  <- for soft-ABMC
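
To illustrate how userspace might consume this listing, here is a minimal sketch that extracts the active mode, assuming the file keeps the bracket convention shown above (active mode in square brackets, other modes listed unbracketed). The "<- for ..." annotations are commentary in this mail, not expected file content, and active_mode_example() is a hypothetical helper, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) mode name from the text of a mode listing
 * into out. Returns 0 on success, or -1 if no well-formed "[name]" is
 * found or the name does not fit in out.
 */
static int active_mode_example(const char *listing, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(listing && out);

	open = strchr(listing, '[');
	if (!open)
		return -1;

	close = strchr(open + 1, ']');
	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```

A caller would read the file contents into a buffer first and pass that buffer as listing.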


> 
>>
>> Similarly, we can rename few functions and variable names to make little more readable.
>>
>> 2. mbm_cntr_assignable -> mbm_assignable
>>
> 
> I have no insight into how the soft-ABMC implementation will look and thus if it will
> build on this property. If soft-ABMC uses the property then making it more generic may
> help, but if it does not then it may make the code harder to read. Since this is all
> internal I'd vote for keeping it mbm_cntr_assignable since the current implementation
> directly associates it with hardware counters. I do not know if there will be a scenario
> where a system may support *both* event and group assignable counters. The idea did
> briefly come up[1]. If that may be possible then resctrl would need to distinguish them.
> Also, interesting to note that the example used in (1) above notes a system that
> supports both event and group assignment.

OK, that is fine. Let's keep it as is, then.

> 
>> 3. resctrl_arch_mbm_cntr_assign_enabled
>>   -> >resctrl_arch_mbm_assign_enabled
>>
> 
> This is directly connected to choice for (2)

Ok.

> 
>> 4. mbm_cntr_assign_enabled -> mbm_assign_enabled
> 
> hmmm ... here mbm_cntr_assign_enabled is even more directly associated with hardware
> support for counter assignment. It is not clear what the benefit is to make it generic.

Ok.

> 
>>
>> 5. resctrl_arch_mbm_cntr_assign_set_one ->
>>
>>     resctrl_arch_mbm_assign_set_one.
> 
> Same as (4)
> 
>>
>> 6. There will few more functions. I will look into that if you agree with approach.
>>
>> 7. No need to change few of these below. These are related to actual counters.
>>     num_mbm_cntrs
>>     available_mbm_cntrs
>>
>> What do you think?
> 
> It sounds to me as though you are aiming to make the ABMC implementation more
> generic in preparation for soft-ABMC support. If you have insight into the soft-ABMC
> implementation then please share the details for this to be taken into account.
> Until then I think it will be simpler for the implementation to be specific to
> the feature being enabled here. When soft-ABMC enabling arrives the needed changes
> can be made. Since this is about internals of resctrl (not the user interface) we
> are not as pressured to "get it right" while not having all information required
> to make these choices.

OK, sure. That is fine. Let's keep the internals implementation-specific 
for now.

Thanks
Babu

> 
> Reinette
>   
> [1] https://lore.kernel.org/lkml/CALPaoCj438UfH3QA_VnGo-pj2a_48sJufUWjBKT3MQatcMJ_Uw@mail.gmail.com/
> 


^ permalink raw reply	[flat|nested] 114+ messages in thread

end of thread, other threads:[~2025-06-11 21:22 UTC | newest]

Thread overview: 114+ messages
2025-05-15 22:51 [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2025-05-15 22:51 ` [PATCH v13 01/27] x86/cpufeatures: Add support for " Babu Moger
2025-05-22 20:51   ` Reinette Chatre
2025-05-27 17:23     ` Moger, Babu
2025-05-27 17:54       ` Reinette Chatre
2025-05-27 18:40         ` Moger, Babu
2025-05-27 23:42           ` Reinette Chatre
2025-05-28 16:18             ` Moger, Babu
2025-05-15 22:51 ` [PATCH v13 02/27] x86/resctrl: Add ABMC feature in the command line options Babu Moger
2025-05-15 22:51 ` [PATCH v13 03/27] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
2025-05-22 20:52   ` Reinette Chatre
2025-05-27 18:49     ` Moger, Babu
2025-05-15 22:51 ` [PATCH v13 04/27] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
2025-05-22 20:54   ` Reinette Chatre
2025-05-27 19:52     ` Moger, Babu
2025-05-27 20:15     ` Moger, Babu
2025-05-15 22:51 ` [PATCH v13 05/27] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
2025-05-22 20:56   ` Reinette Chatre
2025-05-27 20:21     ` Moger, Babu
2025-05-15 22:51 ` [PATCH v13 06/27] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
2025-05-22 20:56   ` Reinette Chatre
2025-05-27 20:33     ` Moger, Babu
2025-05-15 22:51 ` [PATCH v13 07/27] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
2025-05-15 22:51 ` [PATCH v13 08/27] x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at domain Babu Moger
2025-05-22 21:02   ` Reinette Chatre
2025-05-28 16:56     ` Moger, Babu
2025-05-28 17:34       ` Reinette Chatre
2025-05-28 19:05         ` Moger, Babu
2025-05-15 22:51 ` [PATCH v13 09/27] x86/resctrl: Introduce interface to display number of free MBM counters Babu Moger
2025-05-15 22:51 ` [PATCH v13 10/27] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
2025-05-22 21:10   ` Reinette Chatre
2025-05-28 19:15     ` Moger, Babu
2025-05-15 22:51 ` [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
2025-05-22 21:51   ` Reinette Chatre
2025-05-22 22:16     ` Luck, Tony
2025-05-23 21:08       ` Luck, Tony
2025-05-26 13:14         ` Peter Newman
2025-05-27 21:41           ` Luck, Tony
2025-05-28 21:41             ` Moger, Babu
2025-05-28 22:00               ` Luck, Tony
2025-05-28 22:13                 ` Luck, Tony
2025-05-28 23:48                   ` Moger, Babu
2025-06-09 14:01               ` Moger, Babu
2025-05-28 21:39     ` Moger, Babu
2025-05-15 22:51 ` [PATCH v13 12/27] x86/resctrl: Introduce event configuration modes Babu Moger
2025-05-22 22:05   ` Reinette Chatre
2025-05-29 15:21     ` Moger, Babu
2025-05-15 22:51 ` [PATCH v13 13/27] x86/resctrl: Add the functionality to assign MBM events Babu Moger
2025-05-22 22:41   ` Reinette Chatre
2025-05-29 16:05     ` Moger, Babu
2025-05-15 22:51 ` [PATCH v13 14/27] x86/resctrl: Add the functionality to unassign " Babu Moger
2025-05-22 22:49   ` Reinette Chatre
2025-05-29 16:25     ` Moger, Babu
2025-05-15 22:52 ` [PATCH v13 15/27] x86/resctrl: Report 'Unassigned' for MBM events in mbm_cntr_assign mode Babu Moger
2025-05-22 23:01   ` Reinette Chatre
2025-05-29 16:58     ` Moger, Babu
2025-05-15 22:52 ` [PATCH v13 16/27] x86/resctrl: Pass entire struct rdtgroup rather than passing individual members Babu Moger
2025-05-22 23:05   ` Reinette Chatre
2025-05-29 18:07     ` Moger, Babu
2025-05-15 22:52 ` [PATCH v13 17/27] x86/resctrl: Add the support for reading ABMC counters Babu Moger
2025-05-22 23:31   ` Reinette Chatre
2025-05-29 18:25     ` Moger, Babu
2025-05-15 22:52 ` [PATCH v13 18/27] x86/resctrl: Add definitions for MBM event configuration Babu Moger
2025-05-23  4:41   ` Reinette Chatre
2025-05-29 19:00     ` Moger, Babu
2025-05-29 20:58       ` Reinette Chatre
2025-06-03 13:41         ` Moger, Babu
2025-05-15 22:52 ` [PATCH v13 19/27] x86/resctrl: Add event configuration directory under info/L3_MON/ Babu Moger
2025-05-23  4:43   ` Reinette Chatre
2025-05-29 19:54     ` Moger, Babu
2025-05-15 22:52 ` [PATCH v13 20/27] x86/resctrl: Provide interface to update the event configurations Babu Moger
2025-05-23  4:45   ` Reinette Chatre
2025-05-29 22:35     ` Moger, Babu
2025-05-15 22:52 ` [PATCH v13 21/27] x86/resctrl: Introduce mbm_assign_on_mkdir to configure assignments Babu Moger
2025-05-23  4:48   ` Reinette Chatre
2025-05-29 23:03     ` Moger, Babu
2025-05-30 20:54       ` Reinette Chatre
2025-06-03 14:00         ` Moger, Babu
2025-05-15 22:52 ` [PATCH v13 22/27] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
2025-05-15 22:52 ` [PATCH v13 23/27] x86/resctrl: Introduce mbm_L3_assignments to list assignments in a group Babu Moger
2025-05-23  4:47   ` Reinette Chatre
2025-05-30  0:55     ` Moger, Babu
2025-05-15 22:52 ` [PATCH v13 24/27] x86/resctrl: Introduce the interface to modify " Babu Moger
2025-05-26  9:48   ` Peter Newman
2025-05-27 15:24     ` Moger, Babu
2025-05-15 22:52 ` [PATCH v13 25/27] x86/resctrl: Hide the BMEC related files when mbm_cnt_assign is enabled Babu Moger
2025-05-15 22:52 ` [PATCH v13 26/27] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
2025-05-15 22:52 ` [PATCH v13 27/27] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
2025-05-19 15:59 ` [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Peter Newman
2025-05-20 15:28   ` Moger, Babu
2025-05-20 16:06     ` Reinette Chatre
2025-05-20 17:51       ` Moger, Babu
2025-05-20 18:23         ` Reinette Chatre
2025-05-20 23:25           ` Moger, Babu
2025-05-20 23:44             ` Reinette Chatre
2025-05-21  9:18               ` Peter Newman
2025-05-21 23:03                 ` Reinette Chatre
2025-05-21 23:43                   ` Luck, Tony
2025-05-22  0:10                     ` Reinette Chatre
2025-05-22  0:21                       ` Luck, Tony
2025-05-22  8:47                         ` Peter Newman
2025-05-22 16:32                           ` Reinette Chatre
2025-05-22 17:21                           ` Luck, Tony
2025-05-22 15:44                   ` Moger, Babu
2025-05-22 16:33                     ` Reinette Chatre
2025-05-22 19:15                       ` Moger, Babu
2025-06-10 23:19                       ` Moger, Babu
2025-06-11 18:29                         ` Reinette Chatre
2025-06-11 21:21                           ` Moger, Babu
2025-05-21 14:27               ` Peter Newman
2025-05-21 23:05                 ` Reinette Chatre
2025-05-22  9:14                   ` Peter Newman
2025-05-22 16:33                     ` Reinette Chatre
2025-05-22 20:44 ` Reinette Chatre
