linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
@ 2024-10-09 17:39 Babu Moger
  2024-10-09 17:39 ` [PATCH v8 01/25] x86/cpufeatures: Add support for " Babu Moger
                   ` (25 more replies)
  0 siblings, 26 replies; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse


This series adds the support for Assignable Bandwidth Monitoring Counters
(ABMC). It is also called QoS RMID Pinning feature

Series is written such that it is easier to support other assignable
features supported from different vendors.

The feature details are documented in the  APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on top of commit
5b0c5f05fb2fe (tip/master) Merge branch into tip/master: 'x86/splitlock'

# Introduction

Users can create as many monitor groups as RMIDs supported by the hardware.
However, bandwidth monitoring feature on AMD system only guarantees that
RMIDs currently assigned to a processor will be tracked by hardware.
The counters of any other RMIDs which are no longer being tracked will be
reset to zero. The MBM event counters return "Unavailable" for the RMIDs
that are not tracked by hardware. So, there can be only limited number of
groups that can give guaranteed monitoring numbers. With ever changing
configurations there is no way to definitely know which of these groups
are being tracked for certain point of time. Users do not have the option
to monitor a group or set of groups for certain period of time without
worrying about RMID being reset in between.
    
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as it is
assigned.  The assigned RMID will be tracked by the hardware until the user
unassigns it manually. There is no need to worry about counters being reset
during this period. Additionally, the user can specify a bitmask identifying
the specific bandwidth types from the given source to track with the counter.

Without ABMC enabled, monitoring will work in current 'default' mode without
assignment option.

# Linux Implementation

Create a generic interface aimed to support user space assignment
of scarce counters used for monitoring. First usage of interface
is by ABMC with option to expand usage to "soft-ABMC" and MPAM
counters in future.

Feature adds following interface files:

/sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
monitoring features supported. The enclosed brackets indicate which
feature is enabled.

/sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
counters available for assignment.

/sys/fs/resctrl/info/L3_MON/mbm_assign_control: Reports the resctrl group and monitor
status of each group. Assignment state can be updated by writing to the
interface.

# Examples

a. Check if ABMC support is available
	#mount -t resctrl resctrl /sys/fs/resctrl/

	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	[mbm_cntr_assign]
	default

	ABMC feature is detected and it is enabled.

b. Check how many ABMC counters are available. 

	#cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
	32

c. Create few resctrl groups.

	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp


d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
   to list and modify any group's monitoring states. File provides single place
   to list monitoring states of all the resctrl groups. It makes it easier for
   user space to learn about the used counters without needing to traverse all
   the groups thus reducing the number of file system calls.

	The list follows the following format:

	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

	Format for specific type of groups:

	* Default CTRL_MON group:
	 "//<domain_id>=<flags>"

       * Non-default CTRL_MON group:
               "<CTRL_MON group>//<domain_id>=<flags>"

       * Child MON group of default CTRL_MON group:
               "/<MON group>/<domain_id>=<flags>"

       * Child MON group of non-default CTRL_MON group:
               "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

       Flags can be one of the following:

        t  MBM total event is enabled.
        l  MBM local event is enabled.
        tl Both total and local MBM events are enabled.
        _  None of the MBM events are enabled

	Examples:

	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control 
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=tl;
	
	There are four groups and all the groups have local and total
	event enabled on domain 0 and 1.

e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.

 	The write format is similar to the above list format with addition
	of opcode for the assignment operation.
    	“<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”

	
	* Default CTRL_MON group:
	        "//<domain_id><opcode><flags>"
	
	* Non-default CTRL_MON group:
	        "<CTRL_MON group>//<domain_id><opcode><flags>"
	
	* Child MON group of default CTRL_MON group:
	        "/<MON group>/<domain_id><opcode><flags>"
	
	* Child MON group of non-default CTRL_MON group:
	        "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
	
	Opcode can be one of the following:
	
	= Update the assignment to match the flags.
	+ Assign a new MBM event without impacting existing assignments.
	- Unassign a MBM event from currently assigned events.

	Flags can be one of the following:

        t  MBM total event.
        l  MBM local event.
        tl Both total and local MBM events.
        _  None of the MBM events. Only works with '=' opcode. This flag cannot be combined with other flags.
	
	Initial group status:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=tl;

	To update the default group to enable only total event on domain 0:
	# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=tl;

	To update the MON group child_default_mon_grp to remove total event on domain 1:
	# echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=l;

	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
	remove both local and total events on domain 1:
	# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
	       /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=l;

	To update the default group to add a local event domain 0.
	# echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=l;

	To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all
	the MBM events on all the domains.
	# echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=_;1=_;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=l;


f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
   There is no change in reading the events with ABMC. If the event is unassigned
   when reading, then the read will come back as "Unassigned".
	
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	779247936
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
	765207488
	
g. Check the bandwidth configuration for the group. Note that bandwidth
   configuration has a domain scope. Total event defaults to 0x7F (to
   count all the events) and local event defaults to 0x15 (to count all
   the local numa events). The event bitmap decoding is available at
   https://www.kernel.org/doc/Documentation/x86/resctrl.rst
   in section "mbm_total_bytes_config", "mbm_local_bytes_config":
	
	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	0=0x7f;1=0x7f
	
	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config 
	0=0x15;1=0x15
	
h. Change the bandwidth source for domain 0 for the total event to count only reads.
   Note that this change effects total events on the domain 0.
	
	#echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	0=0x33;1=0x7F
	
i. Now read the total event again. The first read will come back with "Unavailable"
   status. The subsequent read of mbm_total_bytes will display only the read events.
	
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	Unavailable
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	314101

j. Users will have the option to go back to 'default' mbm_assign_mode if required.
   This can be done using the following command. Note that switching the
   mbm_assign_mode will reset all the MBM counters of all resctrl groups.

	# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	mbm_cntr_assign
	[default]

	
k. Unmount the resctrl
	 
	#umount /sys/fs/resctrl/
---
v8:
  Patches are getting into final stages. 
  Couple of changes Patch 8, Patch 19 and Patch 23.
  Most of the other changes are related to rename and text message updates.

  Details are in each patch. Here is the summary.

  Added __init attribute to dom_data_init() in patch 8/25.
  Moved the mbm_cntrs_init() and mbm_cntrs_exit() functionality inside
  dom_data_init() and dom_data_exit() respectively.

  Renamed resctrl_mbm_evt_config_init() to arch_mbm_evt_config_init()
  Renamed resctrl_arch_event_config_get() to resctrl_arch_mon_event_config_get().
          resctrl_arch_event_config_set() to resctrl_arch_mon_event_config_set().

  Rename resctrl_arch_assign_cntr to resctrl_arch_config_cntr.
  Renamed rdtgroup_assign_cntr() to rdtgroup_assign_cntr_event().
  Added the code to return the error if rdtgroup_assign_cntr_event fails.
  Moved definition of MBM_EVENT_ARRAY_INDEX to resctrl/internal.h.
  Renamed rdtgroup_mbm_cntr_is_assigned to mbm_cntr_assigned_to_domain
  Added return error handling in resctrl_arch_config_cntr().
  Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
  Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
  Fixed the problem with unassigning the child MON groups of CTRL_MON group.
  Reset the internal counters after mbm_cntr_assign mode is changed.
  Renamed rdtgroup_mbm_cntr_reset() to mbm_cntr_reset()
  Renamed resctrl_arch_mbm_cntr_assign_configure to
            resctrl_arch_mbm_cntr_assign_set_one.

  Used the same IPI as event update to modify the assignment.
  Could not do the way we discussed in the thread.
  https://lore.kernel.org/lkml/f77737ac-d3f6-3e4b-3565-564f79c86ca8@amd.com/
  Needed to figure out event type to update the configuration.

  Moved unassign first and assign during the assign modification.
  Assign none "_" takes priority. Cannot be mixed with other flags.
  Updated the documentation and .rst file format. htmldoc looks ok.

v7:
   Major changes are related to FS and arch codes separation.
   Changed few interface names based on feedback.
   Here are the summary and each patch contains changes specific the patch.

   Removed WARN_ON for num_mbm_cntrs. Decided to dynamically allocate the bitmap.
   WARN_ON is not required anymore.
 
   Renamed the function resctrl_arch_get_abmc_enabled() to resctrl_arch_mbm_cntr_assign_enabled().

   Merged resctrl_arch_mbm_cntr_assign_disable, resctrl_arch_mbm_cntr_assign_disable
   and renamed to resctrl_arch_mbm_cntr_assign_set(). Passed the struct rdt_resource
   to these functions.

   Removed resctrl_arch_reset_rmid_all() from arch code. This will be done from FS the caller.

   Updated the descriptions/commit log in resctrl.rst to generic text. Removed ABMC references.
   Renamed mbm_mode to mbm_assign_mode.
   Renamed mbm_control to  mbm_assign_control.
   Introduced mutex lock in rdtgroup_mbm_mode_show().
 
   The 'legacy' mode is called 'default' mode. 

   Removed the static allocation and now allocating bitmap mbm_cntr_free_map dynamically.

   Merged rdtgroup_assign_cntr(), rdtgroup_alloc_cntr() into one.
   Merged rdtgroup_unassign_cntr(), rdtgroup_free_cntr() into one.
   
  Added struct rdt_resource to the interface functions resctrl_arch_assign_cntr ()
  and resctrl_arch_unassign_cntr().
  Rename rdtgroup_abmc_cfg() to resctrl_abmc_config_one_amd().
   
  Added a new patch to fix counter assignment on event config changes.

  Removed the references of ABMC from user interfaces.

  Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
  Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.

  Thomas Gleixner asked us to update  https://gitlab.com/x86-cpuid.org/x86-cpuid-db. 
  It needs internal approval. We are working on it.

v6:
  We still need to finalize few interface details on mbm_assign_mode and mbm_assign_control
  in case of ABMC and Soft-ABMC. We can continue the discussion with this series.

  Added support for domain-id '*' to update all the domains at once.
  Fixed assign interface to allocate the counter if counter is
  not assigned.   
  Fixed unassign interface to free the counter if the counter is not
  assigned in any of the domains.

  Renamed abmc_capable to mbm_cntr_assignable.

  Renamed abmc_enabled to mbm_cntr_assign_enabled.
  Used msr_set_bit and msr_clear_bit for msr updates.
  Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
  Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().

  Changed the display name from num_cntrs to num_mbm_cntrs.

  Removed the variable mbm_cntrs_free_map_len. This is not required.
  Removed the call mbm_cntrs_init() in arch code. This needs to be done at higher level.
  Used DECLARE_BITMAP to initialize mbm_cntrs_free_map.
  Removed unused config value definitions.

  Introduced mbm_cntr_map to track counters at domain level. With this
  we dont need to send MSR read to read the counter configuration.

  Separated all the counter id management to upper level in FS code.

  Added checks to detect "Unassigned" before reading the RMID.

  More details in each patch.

v5:
  Rebase changes (because of SNC support)

  Interface changes.
   /sys/fs/resctrl/mbm_assign to /sys/fs/resctrl/mbm_assign_mode.
   /sys/fs/resctrl/mbm_assign_control to /sys/fs/resctrl/mbm_assign_control.

  Added few arch specific routines.
  resctrl_arch_get_abmc_enabled.
  resctrl_arch_abmc_enable.
  resctrl_arch_abmc_disable.

  Few renames
   num_cntrs_free_map -> mbm_cntrs_free_map
   num_cntrs_init -> mbm_cntrs_init
   arch_domain_mbm_evt_config -> resctrl_arch_mbm_evt_config

  Introduced resctrl_arch_event_config_get and
    resctrl_arch_event_config_set() to update event configuration.

  Removed mon_state field mongroup. Added MON_CNTR_UNSET to initialize counters.

  Renamed ctr_id to cntr_id for the hardware counter.
 
  Report "Unassigned" in case the user attempts to read the events without assigning the counter.
  
  ABMC is enabled during the boot up. Can be enabled or disabled later.

  Fixed opcode and flags combination.
    '=_" is valid.
    "-_" amd "+_" is not valid.

 Added all the comments as far as I know. If I missed something, it is not intentional.

v4: 
  Main change is domain specific event assignment.
  Kept the ABMC feature as a default.
  Dynamcic switching between ABMC and mbm_legacy is still allowed.
  We are still not clear about mount option.
  Moved the monitoring related data in resctrl_mon structure from rdt_resource.
  Fixed the display of legacy and ABMC mode.
  Used bimap APIs when possible.
  Removed event configuration read from MSRs. We can use the
  internal saved data.(patch 12)
  Added more comments about L3_QOS_ABMC_CFG MSR.
  Added IPIs to read the assignment status for each domain (patch 18 and 19)
  More details in each patch.

v3:
   This series adds the support for global assignment mode discussed in
   the thread. https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
   Removed the individual assignment mode and included the global assignment interface.
   Added following interface files.
   a. /sys/fs/resctrl/info/L3_MON/mbm_assign
      Used for displaying the current assignment mode and switch between
      ABMC and legacy mode.
   b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
      Used for lising the groups assignment mode and modify the assignment states.
   c. Most of the changes are related to the new interface.
   d. Addressed the comments from Reinette, James and Peter.
   e. Hope I have addressed most of the major feedbacks discussed. If I missed
      something then it is not intentional. Please feel free to comment.
   f. Sending this as an RFC as per Reinette's comment. So, this is still open
      for discussion.

v2:
   a. Major change is the way ABMC is enabled. Earlier, user needed to remount
      with -o abmc to enable ABMC feature. Removed that option now.
      Now users can enable ABMC by "$echo 1 to /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
     
   b. Added new word 21 to x86/cpufeatures.h.

   c. Display unsupported if user attempts to read the events when ABMC is enabled
      and event is not assigned.

   d. Display monitor_state as "Unsupported" when ABMC is disabled.
  
   e. Text updates and rebase to latest tip tree (as of Jan 18).
 
   f. This series is still work in progress. I am yet to hear from ARM developers. 

v7:
  https://lore.kernel.org/lkml/cover.1725488488.git.babu.moger@amd.com/

v6:
  https://lore.kernel.org/lkml/cover.1722981659.git.babu.moger@amd.com/

v5:
  https://lore.kernel.org/lkml/cover.1720043311.git.babu.moger@amd.com/

v4:
  https://lore.kernel.org/lkml/cover.1716552602.git.babu.moger@amd.com/

v3:
 https://lore.kernel.org/lkml/cover.1711674410.git.babu.moger@amd.com/  

v2:
  https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/

v1 :
   https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/


Babu Moger (24):
  x86/cpufeatures: Add support for Assignable Bandwidth Monitoring
    Counters (ABMC)
  x86/resctrl: Add ABMC feature in the command line options
  x86/resctrl: Consolidate monitoring related data from rdt_resource
  x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
  x86/resctrl: Add support to enable/disable AMD ABMC feature
  x86/resctrl: Introduce the interface to display monitor mode
  x86/resctrl: Introduce interface to display number of monitoring
    counters
  x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable
    counters
  x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct
    rdt_hw_mon_domain
  x86/resctrl: Remove MSR reading of event configuration value
  x86/resctrl: Introduce mbm_cntr_map to track counters at domain
  x86/resctrl: Add data structures and definitions for ABMC assignment
  x86/resctrl: Introduce cntr_id in mongroup for assignments
  x86/resctrl: Implement resctrl_arch_assign_cntr to assign a counter
    with ABMC
  x86/resctrl: Add the interface to assign/update counter assignment
  x86/resctrl: Add the interface to unassign a MBM counter
  x86/resctrl: Auto Assign/unassign counters when mbm_cntr_assign is
    enabled
  x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign
    mode
  x86/resctrl: Introduce the interface to switch between monitor modes
  x86/resctrl: Configure mbm_cntr_assign mode if supported
  x86/resctrl: Update assignments on event configuration changes
  x86/resctrl: Introduce interface to list assignment states of all the
    groups
  x86/resctrl: Introduce interface to modify assignment states of the
    groups

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/arch/x86/resctrl.rst            | 198 ++++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
 arch/x86/kernel/cpu/resctrl/core.c            |  19 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |  13 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  77 +-
 arch/x86/kernel/cpu/resctrl/monitor.c         |  90 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 875 ++++++++++++++++--
 arch/x86/kernel/cpu/scattered.c               |   1 +
 include/linux/resctrl.h                       |  31 +-
 12 files changed, 1227 insertions(+), 85 deletions(-)

-- 
2.34.1



Babu Moger (25):
  x86/cpufeatures: Add support for Assignable Bandwidth Monitoring
    Counters (ABMC)
  x86/resctrl: Add ABMC feature in the command line options
  x86/resctrl: Consolidate monitoring related data from rdt_resource
  x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
  x86/resctrl: Add support to enable/disable AMD ABMC feature
  x86/resctrl: Introduce the interface to display monitor mode
  x86/resctrl: Introduce interface to display number of monitoring
    counters
  x86/resctrl: Add __init attribute to dom_data_init()
  x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable
    counters
  x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct
    rdt_hw_mon_domain
  x86/resctrl: Remove MSR reading of event configuration value
  x86/resctrl: Introduce mbm_cntr_map to track assignable counters at
    domain
  x86/resctrl: Add data structures and definitions for ABMC assignment
  x86/resctrl: Introduce cntr_id in mongroup for assignments
  x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter
    with ABMC
  x86/resctrl: Add the interface to assign/update counter assignment
  x86/resctrl: Add the interface to unassign a MBM counter
  x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is
    enabled
  x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign
    mode
  x86/resctrl: Introduce the interface to switch between monitor modes
  x86/resctrl: Configure mbm_cntr_assign mode if supported
  x86/resctrl: Update assignments on event configuration changes
  x86/resctrl: Introduce interface to list assignment states of all the
    groups
  x86/resctrl: Introduce interface to modify assignment states of the
    groups

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/arch/x86/resctrl.rst            | 221 +++++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
 arch/x86/kernel/cpu/resctrl/core.c            |  19 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |  13 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  87 +-
 arch/x86/kernel/cpu/resctrl/monitor.c         | 110 ++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 899 ++++++++++++++++--
 arch/x86/kernel/cpu/scattered.c               |   1 +
 include/linux/resctrl.h                       |  31 +-
 12 files changed, 1293 insertions(+), 96 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 124+ messages in thread

* [PATCH v8 01/25] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-09 17:39 ` [PATCH v8 02/25] x86/resctrl: Add ABMC feature in the command line options Babu Moger
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Users can create as many monitor groups as RMIDs supported by the hardware.
However, bandwidth monitoring feature on AMD system only guarantees that
RMIDs currently assigned to a processor will be tracked by hardware. The
counters of any other RMIDs which are no longer being tracked will be
reset to zero. The MBM event counters return "Unavailable" for the RMIDs
that are not tracked by hardware. So, there can be only limited number of
groups that can give guaranteed monitoring numbers. With ever changing
configurations there is no way to definitely know which of these groups
are being tracked for certain point of time. Users do not have the option
to monitor a group or set of groups for certain period of time without
worrying about RMID being reset in between.

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as it is
assigned. The assigned RMID will be tracked by the hardware until the user
unassigns it manually. There is no need to worry about counters being reset
during this period. Additionally, the user can specify a bitmask identifying
the specific bandwidth types from the given source to track with the counter.

Without ABMC enabled, monitoring will work in current mode without
assignment option.

Linux resctrl subsystem provides the interface to count maximum of two
memory bandwidth events per group, from a combination of available total
and local events. Keeping the current interface, users can enable a maximum
of 2 ABMC counters per group. User will also have the option to enable only
one counter to the group. If the system runs out of assignable ABMC
counters, kernel will display an error. Users need to disable an already
enabled counter to make space for new assignments.

The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
Bits Description
5    ABMC (Assignable Bandwidth Monitoring Counters)

The feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
Note: Checkpatch checks/warnings are ignored to maintain coding style.

v8: No changes.

v7: Removed "" from feature flags. Not required anymore.
    https://lore.kernel.org/lkml/20240817145058.GCZsC40neU4wkPXeVR@fat_crate.local/

v6: Added Reinette's Reviewed-by. Moved the Checkpatch note below ---.

v5: Minor rebase change and subject line update.

v4: Changes because of rebase. Feature word 21 has few more additions now.
    Changed the text to "tracked by hardware" instead of active.

v3: Change because of rebase. Actual patch did not change.

v2: Added dependency on X86_FEATURE_BMEC.
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 3 +++
 arch/x86/kernel/cpu/scattered.c    | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dd4682857c12..4c514cb245ff 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -473,6 +473,7 @@
 #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* BHI_DIS_S HW control enabled */
 #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* Clear branch history at vmexit using SW loop */
 #define X86_FEATURE_FAST_CPPC		(21*32 + 5) /* AMD Fast CPPC */
+#define X86_FEATURE_ABMC		(21*32 + 6) /* Assignable Bandwidth Monitoring Counters */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 8bd84114c2d9..7e4d63b381d6 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -70,6 +70,9 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
 	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
+	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_TOTAL   },
+	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_LOCAL   },
+	{ X86_FEATURE_ABMC,			X86_FEATURE_BMEC      },
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index c84c30188fdf..87f63e6b2994 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
+	{ X86_FEATURE_ABMC,		CPUID_EBX,  5, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_PMC_FREEZE,	CPUID_EAX,  2, 0x80000022, 0 },
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 02/25] x86/resctrl: Add ABMC feature in the command line options
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
  2024-10-09 17:39 ` [PATCH v8 01/25] x86/cpufeatures: Add support for " Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:06   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 03/25] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
                   ` (23 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Add the command line option to enable or disable exposing the ABMC
(Assignable Bandwidth Monitoring Counters) hardware feature to resctrl.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Commit message update.

v7: No changes

v6: No changes

v5: No changes

v4: No changes

v3: No changes

v2: No changes
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 Documentation/arch/x86/resctrl.rst              | 1 +
 arch/x86/kernel/cpu/resctrl/core.c              | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1518343bbe22..b3b3ca564220 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5677,7 +5677,7 @@
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba, smba, bmec.
+			mba, smba, bmec, abmc.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
 
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a824affd741d..30586728a4cd 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -26,6 +26,7 @@ MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
 MBA (Memory Bandwidth Allocation)		"mba"
 SMBA (Slow Memory Bandwidth Allocation)         ""
 BMEC (Bandwidth Monitoring Event Configuration) ""
+ABMC (Assignable Bandwidth Monitoring Counters) ""
 ===============================================	================================
 
 Historically, new features were made visible by default in /proc/cpuinfo. This
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 8591d53c144b..668148ceda0b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -809,6 +809,7 @@ enum {
 	RDT_FLAG_MBA,
 	RDT_FLAG_SMBA,
 	RDT_FLAG_BMEC,
+	RDT_FLAG_ABMC,
 };
 
 #define RDT_OPT(idx, n, f)	\
@@ -834,6 +835,7 @@ static struct rdt_options rdt_options[]  __initdata = {
 	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
 	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
 	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
+	RDT_OPT(RDT_FLAG_ABMC,	    "abmc",	X86_FEATURE_ABMC),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 03/25] x86/resctrl: Consolidate monitoring related data from rdt_resource
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
  2024-10-09 17:39 ` [PATCH v8 01/25] x86/cpufeatures: Add support for " Babu Moger
  2024-10-09 17:39 ` [PATCH v8 02/25] x86/resctrl: Add ABMC feature in the command line options Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-09 17:39 ` [PATCH v8 04/25] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The cache allocation and memory bandwidth allocation feature properties
are consolidated into struct resctrl_cache and struct resctrl_membw
respectively.

In preparation for more monitoring properties that will clobber the
existing resource struct more, re-organize the monitoring specific
properties to also be in a separate structure.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v8: Added Reviewed-by from Reinette. No other changes.

v7: Added kernel doc for data structure. Minor text update.

v6: Update commit message and update kernel doc for rdt_resource.

v5: Commit message update.
    Also changes related to data structure updates does to SNC support.

v4: New patch.
---
 arch/x86/kernel/cpu/resctrl/core.c     |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c  | 18 +++++++++---------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  8 ++++----
 include/linux/resctrl.h                | 16 ++++++++++++----
 4 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 668148ceda0b..73bfc8d7a438 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -124,7 +124,7 @@ u32 resctrl_arch_system_num_rmid_idx(void)
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 
 	/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
-	return r->num_rmid;
+	return r->mon.num_rmid;
 }
 
 /*
@@ -625,7 +625,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 
 	arch_mon_domain_online(r, d);
 
-	if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
 		mon_domain_free(hw_dom);
 		return;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 851b561850e0..795fe91a8feb 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -222,7 +222,7 @@ static int logical_rmid_to_physical_rmid(int cpu, int lrmid)
 	if (snc_nodes_per_l3_cache == 1)
 		return lrmid;
 
-	return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
+	return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->mon.num_rmid;
 }
 
 static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
@@ -297,11 +297,11 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
 
 	if (is_mbm_total_enabled())
 		memset(hw_dom->arch_mbm_total, 0,
-		       sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
+		       sizeof(*hw_dom->arch_mbm_total) * r->mon.num_rmid);
 
 	if (is_mbm_local_enabled())
 		memset(hw_dom->arch_mbm_local, 0,
-		       sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+		       sizeof(*hw_dom->arch_mbm_local) * r->mon.num_rmid);
 }
 
 static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
@@ -1083,14 +1083,14 @@ static struct mon_evt mbm_local_event = {
  */
 static void l3_mon_evt_init(struct rdt_resource *r)
 {
-	INIT_LIST_HEAD(&r->evt_list);
+	INIT_LIST_HEAD(&r->mon.evt_list);
 
 	if (is_llc_occupancy_enabled())
-		list_add_tail(&llc_occupancy_event.list, &r->evt_list);
+		list_add_tail(&llc_occupancy_event.list, &r->mon.evt_list);
 	if (is_mbm_total_enabled())
-		list_add_tail(&mbm_total_event.list, &r->evt_list);
+		list_add_tail(&mbm_total_event.list, &r->mon.evt_list);
 	if (is_mbm_local_enabled())
-		list_add_tail(&mbm_local_event.list, &r->evt_list);
+		list_add_tail(&mbm_local_event.list, &r->mon.evt_list);
 }
 
 /*
@@ -1186,7 +1186,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 
 	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
 	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
-	r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
+	r->mon.num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
 	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
 
 	if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
@@ -1201,7 +1201,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	 *
 	 * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
 	 */
-	threshold = resctrl_rmid_realloc_limit / r->num_rmid;
+	threshold = resctrl_rmid_realloc_limit / r->mon.num_rmid;
 
 	/*
 	 * Because num_rmid may not be a power of two, round the value
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d7163b764c62..f9f3b5db1987 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1097,7 +1097,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
 {
 	struct rdt_resource *r = of->kn->parent->priv;
 
-	seq_printf(seq, "%d\n", r->num_rmid);
+	seq_printf(seq, "%d\n", r->mon.num_rmid);
 
 	return 0;
 }
@@ -1108,7 +1108,7 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 	struct rdt_resource *r = of->kn->parent->priv;
 	struct mon_evt *mevt;
 
-	list_for_each_entry(mevt, &r->evt_list, list) {
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
 		seq_printf(seq, "%s\n", mevt->name);
 		if (mevt->configurable)
 			seq_printf(seq, "%s_config\n", mevt->name);
@@ -3057,13 +3057,13 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
 	struct mon_evt *mevt;
 	int ret;
 
-	if (WARN_ON(list_empty(&r->evt_list)))
+	if (WARN_ON(list_empty(&r->mon.evt_list)))
 		return -EPERM;
 
 	priv.u.rid = r->rid;
 	priv.u.domid = do_sum ? d->ci->id : d->hdr.id;
 	priv.u.sum = do_sum;
-	list_for_each_entry(mevt, &r->evt_list, list) {
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
 		priv.u.evtid = mevt->evtid;
 		ret = mon_addfile(kn, mevt->name, priv.priv);
 		if (ret)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d94abba1c716..3c2307c7c106 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -182,16 +182,26 @@ enum resctrl_scope {
 	RESCTRL_L3_NODE,
 };
 
+/**
+ * struct resctrl_mon - Monitoring related data of a resctrl resource
+ * @num_rmid:		Number of RMIDs available
+ * @evt_list:		List of monitoring events
+ */
+struct resctrl_mon {
+	int			num_rmid;
+	struct list_head	evt_list;
+};
+
 /**
  * struct rdt_resource - attributes of a resctrl resource
  * @rid:		The index of the resource
  * @alloc_capable:	Is allocation available on this machine
  * @mon_capable:	Is monitor feature available on this machine
- * @num_rmid:		Number of RMIDs available
  * @ctrl_scope:		Scope of this resource for control functions
  * @mon_scope:		Scope of this resource for monitor functions
  * @cache:		Cache allocation related data
  * @membw:		If the component has bandwidth controls, their properties.
+ * @mon:		Monitoring related data.
  * @ctrl_domains:	RCU list of all control domains for this resource
  * @mon_domains:	RCU list of all monitor domains for this resource
  * @name:		Name to use in "schemata" file.
@@ -199,7 +209,6 @@ enum resctrl_scope {
  * @default_ctrl:	Specifies default cache cbm or memory B/W percent.
  * @format_str:		Per resource format string to show domain value
  * @parse_ctrlval:	Per resource function pointer to parse control values
- * @evt_list:		List of monitoring events
  * @fflags:		flags to choose base and info files
  * @cdp_capable:	Is the CDP feature available on this resource
  */
@@ -207,11 +216,11 @@ struct rdt_resource {
 	int			rid;
 	bool			alloc_capable;
 	bool			mon_capable;
-	int			num_rmid;
 	enum resctrl_scope	ctrl_scope;
 	enum resctrl_scope	mon_scope;
 	struct resctrl_cache	cache;
 	struct resctrl_membw	membw;
+	struct resctrl_mon	mon;
 	struct list_head	ctrl_domains;
 	struct list_head	mon_domains;
 	char			*name;
@@ -221,7 +230,6 @@ struct rdt_resource {
 	int			(*parse_ctrlval)(struct rdt_parse_data *data,
 						 struct resctrl_schema *s,
 						 struct rdt_ctrl_domain *d);
-	struct list_head	evt_list;
 	unsigned long		fflags;
 	bool			cdp_capable;
 };
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 04/25] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (2 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 03/25] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:06   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 05/25] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
                   ` (21 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
Bits Description
15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
     Monitoring Counter ID + 1

The feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Detect the feature and number of assignable monitoring counters supported.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Used GENMASK for the mask.

v7: Removed WARN_ON for num_mbm_cntrs. Decided to dynamically allocate the
    bitmap. WARN_ON is not required anymore.
    Removed redundant comments.

v6: Commit message update.
    Renamed abmc_capable to mbm_cntr_assignable.

v5: Name change num_cntrs to num_mbm_cntrs.
    Moved abmc_capable to resctrl_mon.

v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We dont
    need to separate this as arch code.

v3: Removed changes related to mon_features.
    Moved rdt_cpu_has to core.c and added new function resctrl_arch_has_abmc.
    Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
    rdt_resource. (James)

v2: Changed the field name to mbm_assign_capable from abmc_capable.
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++++++
 include/linux/resctrl.h               | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 795fe91a8feb..41a8b587f4f5 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1229,6 +1229,12 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 			mbm_local_event.configurable = true;
 			mbm_config_rftype_init("mbm_local_bytes_config");
 		}
+
+		if (rdt_cpu_has(X86_FEATURE_ABMC)) {
+			r->mon.mbm_cntr_assignable = true;
+			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
+			r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
+		}
 	}
 
 	l3_mon_evt_init(r);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 3c2307c7c106..511cfce8fc21 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -185,10 +185,14 @@ enum resctrl_scope {
 /**
  * struct resctrl_mon - Monitoring related data of a resctrl resource
  * @num_rmid:		Number of RMIDs available
+ * @num_mbm_cntrs:	Number of assignable monitoring counters
+ * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
  * @evt_list:		List of monitoring events
  */
 struct resctrl_mon {
 	int			num_rmid;
+	int			num_mbm_cntrs;
+	bool			mbm_cntr_assignable;
 	struct list_head	evt_list;
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 05/25] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (3 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 04/25] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-09 17:39 ` [PATCH v8 06/25] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
fflags for resctrl files.

Adding new files will involve adding another function to initialize
the fflags. This can be simplified by adding a new function
resctrl_file_fflags_init() and passing the file name and flags
to be initialized.

Consolidate fflags initialization into resctrl_file_fflags_init() and
remove thread_throttle_mode_init() and mbm_config_rftype_init().

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v8: No changes.

v7: No changes.

v6: Added Reviewed-by from Reinette.

v5: Commit message update.

v4: Commit message update.

v3: New patch to display ABMC capability.
---
 arch/x86/kernel/cpu/resctrl/core.c     |  4 +++-
 arch/x86/kernel/cpu/resctrl/internal.h |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c  |  6 ++++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 +++-------------
 4 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 73bfc8d7a438..186d8047578b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -234,7 +234,9 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
 		r->membw.throttle_mode = THREAD_THROTTLE_PER_THREAD;
 	else
 		r->membw.throttle_mode = THREAD_THROTTLE_MAX;
-	thread_throttle_mode_init();
+
+	resctrl_file_fflags_init("thread_throttle_mode",
+				 RFTYPE_CTRL_INFO | RFTYPE_RES_MB);
 
 	r->alloc_capable = true;
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 955999aecfca..2bd207624eec 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -647,8 +647,8 @@ void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_mon_domain *d);
 void __check_limbo(struct rdt_mon_domain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
-void __init thread_throttle_mode_init(void);
-void __init mbm_config_rftype_init(const char *config);
+void __init resctrl_file_fflags_init(const char *config,
+				     unsigned long fflags);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 41a8b587f4f5..2f3bf4529498 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1223,11 +1223,13 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 
 		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
 			mbm_total_event.configurable = true;
-			mbm_config_rftype_init("mbm_total_bytes_config");
+			resctrl_file_fflags_init("mbm_total_bytes_config",
+						 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 		}
 		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
 			mbm_local_event.configurable = true;
-			mbm_config_rftype_init("mbm_local_bytes_config");
+			resctrl_file_fflags_init("mbm_local_bytes_config",
+						 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 		}
 
 		if (rdt_cpu_has(X86_FEATURE_ABMC)) {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f9f3b5db1987..7e76f8d839fc 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2020,24 +2020,14 @@ static struct rftype *rdtgroup_get_rftype_by_name(const char *name)
 	return NULL;
 }
 
-void __init thread_throttle_mode_init(void)
-{
-	struct rftype *rft;
-
-	rft = rdtgroup_get_rftype_by_name("thread_throttle_mode");
-	if (!rft)
-		return;
-
-	rft->fflags = RFTYPE_CTRL_INFO | RFTYPE_RES_MB;
-}
-
-void __init mbm_config_rftype_init(const char *config)
+void __init resctrl_file_fflags_init(const char *config,
+				     unsigned long fflags)
 {
 	struct rftype *rft;
 
 	rft = rdtgroup_get_rftype_by_name(config);
 	if (rft)
-		rft->fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE;
+		rft->fflags = fflags;
 }
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 06/25] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (4 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 05/25] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-11 18:14   ` Tony Luck
  2024-10-16  3:07   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
                   ` (19 subsequent siblings)
  25 siblings, 2 replies; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Add the functionality to enable/disable AMD ABMC feature.

AMD ABMC feature is enabled by setting enabled bit(0) in MSR
L3_QOS_EXT_CFG. When the state of ABMC is changed, the MSR needs
to be updated on all the logical processors in the QOS Domain.

Hardware counters will reset when ABMC state is changed.

The ABMC feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8:
  Commit message update and moved around the comments about L3_QOS_EXT_CFG
  to _resctrl_abmc_enable.

v7:
  Renamed the function
   resctrl_arch_get_abmc_enabled() to resctrl_arch_mbm_cntr_assign_enabled().

  Merged resctrl_arch_mbm_cntr_assign_disable, resctrl_arch_mbm_cntr_assign_disable
  and renamed to resctrl_arch_mbm_cntr_assign_set().

  Moved the function definition to linux/resctrl.h.

  Passed the struct rdt_resource to these functions.
  Removed resctrl_arch_reset_rmid_all() from arch code. This will be done
  from the caller.

v6: Renamed abmc_enabled to mbm_cntr_assign_enabled.
    Used msr_set_bit and msr_clear_bit for msr updates.
    Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
    Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().
    Made _resctrl_abmc_enable to return void.

v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
    Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
    Introduced resctrl_arch_get_abmc_enabled to get abmc state from
    non-arch code.
    Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
    Modified commit log to make it clear about AMD ABMC feature.

v3: No changes.

v2: Few text changes in commit message.
---
 arch/x86/include/asm/msr-index.h       |  1 +
 arch/x86/kernel/cpu/resctrl/core.c     |  5 ++++
 arch/x86/kernel/cpu/resctrl/internal.h |  5 ++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 36 ++++++++++++++++++++++++++
 include/linux/resctrl.h                |  3 +++
 5 files changed, 50 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3ae84c3b8e6d..43c9dc473aba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1195,6 +1195,7 @@
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
 #define MSR_IA32_EVT_CFG_BASE		0xc0000400
+#define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
 
 /* AMD-V MSRs */
 #define MSR_VM_CR                       0xc0010114
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 186d8047578b..49d147e2e4e5 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -405,6 +405,11 @@ void rdt_ctrl_update(void *arg)
 	hw_res->msr_update(m);
 }
 
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+	return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
+}
+
 /*
  * rdt_find_domain - Search for a domain id in a resource domain list.
  *
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 2bd207624eec..a45ae410274c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -56,6 +56,9 @@
 /* Max event bits supported */
 #define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
 
+/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
+#define ABMC_ENABLE_BIT			0
+
 /**
  * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
  *			        aren't marked nohz_full
@@ -477,6 +480,7 @@ struct rdt_parse_data {
  * @mbm_cfg_mask:	Bandwidth sources that can be tracked when Bandwidth
  *			Monitoring Event Configuration (BMEC) is supported.
  * @cdp_enabled:	CDP state of this resource
+ * @mbm_cntr_assign_enabled:	ABMC feature is enabled
  *
  * Members of this structure are either private to the architecture
  * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
@@ -491,6 +495,7 @@ struct rdt_hw_resource {
 	unsigned int		mbm_width;
 	unsigned int		mbm_cfg_mask;
 	bool			cdp_enabled;
+	bool			mbm_cntr_assign_enabled;
 };
 
 static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 7e76f8d839fc..6bfa8312a4b2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2402,6 +2402,42 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
 	return 0;
 }
 
+static void resctrl_abmc_set_one_amd(void *arg)
+{
+	bool *enable = arg;
+
+	if (*enable)
+		msr_set_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+	else
+		msr_clear_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+}
+
+/*
+ * Update L3_QOS_EXT_CFG MSR on all the CPUs associated with the monitor
+ * domain.
+ */
+static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
+{
+	struct rdt_mon_domain *d;
+
+	list_for_each_entry(d, &r->mon_domains, hdr.list)
+		on_each_cpu_mask(&d->hdr.cpu_mask,
+				 resctrl_abmc_set_one_amd, &enable, 1);
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+	if (r->mon.mbm_cntr_assignable &&
+	    hw_res->mbm_cntr_assign_enabled != enable) {
+		_resctrl_abmc_enable(r, enable);
+		hw_res->mbm_cntr_assign_enabled = enable;
+	}
+
+	return 0;
+}
+
 /*
  * We don't allow rdtgroup directories to be created anywhere
  * except the root directory. Thus when looking for the rdtgroup
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 511cfce8fc21..f11d6fdfd977 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -355,4 +355,7 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
 
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r);
+
 #endif /* _RESCTRL_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (5 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 06/25] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-09 22:42   ` Tony Luck
  2024-10-16  3:12   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
                   ` (18 subsequent siblings)
  25 siblings, 2 replies; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Introduce the interface file "mbm_assign_mode" to list monitor modes
supported.

The "mbm_cntr_assign" mode provides the option to assign a counter to
an RMID, event pair and monitor the bandwidth as long as it is assigned.

On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
Bandwidth Monitoring Counters) hardware feature and is enabled by default.

The "default" mode is the existing monitoring mode that works without the
explicit counter assignment, instead relying on dynamic counter assignment
by hardware that may result in hardware not dedicating a counter resulting
in monitoring data reads returning "Unavailable".

Provide an interface to display the monitor mode on the system.
$cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_assign]
default

Switching the mbm_assign_mode will reset all the MBM counters of all
resctrl groups.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Commit message update.

v7: Updated the descriptions/commit log in resctrl.rst to generic text.
    Thanks to James and Reinette.
    Rename mbm_mode to mbm_assign_mode.
    Introduced mutex lock in rdtgroup_mbm_mode_show().

v6: Added documentation for mbm_cntr_assign and legacy mode.
    Moved mbm_mode fflags initialization to static initialization.

v5: Changed interface name to mbm_mode.
    It will be always available even if ABMC feature is not supported.
    Added description in resctrl.rst about ABMC mode.
    Fixed display abmc and legacy consistantly.

v4: Fixed the checks for legacy and abmc mode. Default it ABMC.

v3: New patch to display ABMC capability.
---
 Documentation/arch/x86/resctrl.rst     | 34 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 31 +++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 30586728a4cd..e4a7d6e815f6 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -257,6 +257,40 @@ with the following files:
 	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
 	    0=0x30;1=0x30;3=0x15;4=0x15
 
+"mbm_assign_mode":
+	Reports the list of monitoring modes supported. The enclosed brackets
+	indicate which mode is enabled.
+	::
+
+	  cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+	  [mbm_cntr_assign]
+	  default
+
+	"mbm_cntr_assign":
+
+	In mbm_cntr_assign mode user-space is able to specify which control
+	or monitor groups in resctrl should have a counter assigned using the
+	'mbm_assign_control' file. The number of counters available is described
+	in the 'num_mbm_cntrs' file. Changing the mode may cause all counters on
+	a resource to reset.
+
+	The mode is useful on platforms which support more control and monitor
+	groups than hardware counters, meaning 'unassigned' control or monitor
+	groups will report 'Unavailable' or count the traffic in an unpredictable
+	way.
+
+	AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
+	enable this mode by default so that counters remain assigned even when the
+	corresponding RMID is not in use by any processor.
+
+	"default":
+
+	By default resctrl assumes each control and monitor group has a hardware
+	counter. Hardware that does not support 'mbm_cntr_assign' mode will still
+	allow more control or monitor groups than 'num_rmids' to be created. In
+	that case reading the mbm_total_bytes and mbm_local_bytes may report
+	'Unavailable' if there is no counter associated with that group.
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6bfa8312a4b2..895264c207c7 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -845,6 +845,30 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
+					 struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	if (r->mon.mbm_cntr_assignable) {
+		if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
+			seq_puts(s, "[mbm_cntr_assign]\n");
+			seq_puts(s, "default\n");
+		} else {
+			seq_puts(s, "mbm_cntr_assign\n");
+			seq_puts(s, "[default]\n");
+		}
+	} else {
+		seq_puts(s, "[default]\n");
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -1901,6 +1925,13 @@ static struct rftype res_common_files[] = {
 		.seq_show	= mbm_local_bytes_config_show,
 		.write		= mbm_local_bytes_config_write,
 	},
+	{
+		.name		= "mbm_assign_mode",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_mbm_assign_mode_show,
+		.fflags		= RFTYPE_MON_INFO,
+	},
 	{
 		.name		= "cpus",
 		.mode		= 0644,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (6 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-09 22:49   ` Tony Luck
  2024-10-09 17:39 ` [PATCH v8 09/25] x86/resctrl: Add __init attribute to dom_data_init() Babu Moger
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The mbm_cntr_assign mode provides an option to the user to assign a
counter to an RMID, event pair and monitor the bandwidth as long as
the counter is assigned. Number of assignments depend on number of
monitoring counters available.

Provide the interface to display the number of monitoring counters
supported. The interface file 'num_mbm_cntrs' is available when an
architecture supports mbm_cntr_assign mode.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Commit message update and documentation update.

v7: Minor commit log text changes.

v6: No changes.

v5: Changed the display name from num_cntrs to num_mbm_cntrs.
    Updated the commit message.
    Moved the patch after mbm_mode is introduced.

v4: Changed the counter name to num_cntrs. And few text changes.

v3: Changed the field name to mbm_assign_cntrs.

v2: Changed the field name to mbm_assignable_counters from abmc_counte
---
 Documentation/arch/x86/resctrl.rst     |  4 ++++
 arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index e4a7d6e815f6..1b5c05a35793 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -291,6 +291,10 @@ with the following files:
 	that case reading the mbm_total_bytes and mbm_local_bytes may report
 	'Unavailable' if there is no counter associated with that group.
 
+"num_mbm_cntrs":
+	The number of monitoring counters available for assignment when the
+	architecture supports mbm_cntr_assign mode.
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 2f3bf4529498..7aa579a99501 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1236,6 +1236,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 			r->mon.mbm_cntr_assignable = true;
 			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
 			r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
+			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
 		}
 	}
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 895264c207c7..c48b5450e6c2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -869,6 +869,16 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
+				       struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(s, "%d\n", r->mon.num_mbm_cntrs);
+
+	return 0;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -1940,6 +1950,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdtgroup_cpus_show,
 		.fflags		= RFTYPE_BASE,
 	},
+	{
+		.name		= "num_mbm_cntrs",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_num_mbm_cntrs_show,
+	},
 	{
 		.name		= "cpus_list",
 		.mode		= 0644,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 09/25] x86/resctrl: Add __init attribute to dom_data_init()
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (7 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:13   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 10/25] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters Babu Moger
                   ` (16 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

dom_data_init() is only called during the __init sequence.
Add __init attribute like the rest of call sequence.

While at it, pass 'struct rdt_resource' to dom_data_init() and
dom_data_exit() which will be used for mbm counter __init and__exit
call sequence.

Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: New patch.
---
 arch/x86/kernel/cpu/resctrl/core.c     | 2 +-
 arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
 arch/x86/kernel/cpu/resctrl/monitor.c  | 8 ++++----
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 49d147e2e4e5..00ad00258df2 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -1140,7 +1140,7 @@ static void __exit resctrl_exit(void)
 	rdtgroup_exit();
 
 	if (r->mon_capable)
-		rdt_put_mon_l3_config();
+		rdt_put_mon_l3_config(r);
 }
 
 __exitcall(resctrl_exit);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a45ae410274c..92eae4672312 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -633,7 +633,7 @@ void closid_free(int closid);
 int alloc_rmid(u32 closid);
 void free_rmid(u32 closid, u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
-void __exit rdt_put_mon_l3_config(void);
+void __exit rdt_put_mon_l3_config(struct rdt_resource *r);
 bool __init rdt_cpu_has(int flag);
 void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 7aa579a99501..66b06574f660 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -983,7 +983,7 @@ void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_
 		schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
 }
 
-static int dom_data_init(struct rdt_resource *r)
+static __init int dom_data_init(struct rdt_resource *r)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	u32 num_closid = resctrl_arch_get_num_closid(r);
@@ -1044,7 +1044,7 @@ static int dom_data_init(struct rdt_resource *r)
 	return err;
 }
 
-static void __exit dom_data_exit(void)
+static void __exit dom_data_exit(struct rdt_resource *r)
 {
 	mutex_lock(&rdtgroup_mutex);
 
@@ -1247,9 +1247,9 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	return 0;
 }
 
-void __exit rdt_put_mon_l3_config(void)
+void __exit rdt_put_mon_l3_config(struct rdt_resource *r)
 {
-	dom_data_exit();
+	dom_data_exit(r);
 }
 
 void __init intel_rdt_mbm_apply_quirk(void)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 10/25] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (8 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 09/25] x86/resctrl: Add __init attribute to dom_data_init() Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:14   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 11/25] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain Babu Moger
                   ` (15 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hardware provides a set of counters when mbm_assign_mode is supported.
These counters are assigned to the MBM monitoring events of a MON group
that needs to be tracked. The kernel must manage and track the available
counters.

Introduce mbm_cntr_free_map bitmap to track available counters and set
of routines to allocate and free the counters. Move dom_data_init() after
mbm_cntr_assign detection.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Moved the init and exit functionality inside dom_data_init()
    and dom_data_exit() respectively.

v7: Removed the static allocation and now allocating bitmap mbm_cntr_free_map
    dynamically.
    Passed the struct rdt_resource mbm_cntr_alloc and mbm_cntr_free.
    Removed the reference of ABMC and changed it mbm_cntr_assign.
    Few other text changes.

v6: Removed the variable mbm_cntrs_free_map_len. This is not required.
    Removed the call mbm_cntrs_init() in arch code. This needs to be
    done at higher level.
    Used DECLARE_BITMAP to initialize mbm_cntrs_free_map.
    Moved all the counter interfaces mbm_cntr_alloc() and mbm_cntr_free()
    in here as part of separating arch and fs bits.

v5:
   Updated the comments and commit log.
   Few renames
    num_cntrs_free_map -> mbm_cntrs_free_map
    num_cntrs_init -> mbm_cntrs_init
    Added initialization in rdt_get_tree because the default ABMC
    enablement happens during the init.

v4: Changed the name to num_cntrs where applicable.
     Used bitmap apis.
     Added more comments for the globals.

v3: Changed the bitmap name to assign_cntrs_free_map. Removed abmc
     from the name.

v2: Changed the bitmap name to assignable_counter_free_map from
     abmc_counter_free_map.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
 arch/x86/kernel/cpu/resctrl/monitor.c  | 43 +++++++++++++++++++++++---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++++++
 include/linux/resctrl.h                |  2 ++
 4 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 92eae4672312..99f9103a35ba 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -654,6 +654,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void __init resctrl_file_fflags_init(const char *config,
 				     unsigned long fflags);
+int mbm_cntr_alloc(struct rdt_resource *r);
+void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 66b06574f660..5c2a28565747 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -983,6 +983,27 @@ void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_
 		schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
 }
 
+/*
+ * Counter bitmap for tracking the available counters.
+ * 'mbm_cntr_assign' mode provides set of hardware counters for assigning
+ * RMID, event pair. Each RMID and event pair takes one hardware counter.
+ */
+static __init unsigned long *mbm_cntrs_init(struct rdt_resource *r)
+{
+	r->mon.mbm_cntr_free_map = bitmap_zalloc(r->mon.num_mbm_cntrs,
+						 GFP_KERNEL);
+	if (r->mon.mbm_cntr_free_map)
+		bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
+
+	return r->mon.mbm_cntr_free_map;
+}
+
+static  __exit void mbm_cntrs_exit(struct rdt_resource *r)
+{
+	bitmap_free(r->mon.mbm_cntr_free_map);
+	r->mon.mbm_cntr_free_map = NULL;
+}
+
 static __init int dom_data_init(struct rdt_resource *r)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
@@ -1020,6 +1041,17 @@ static __init int dom_data_init(struct rdt_resource *r)
 		goto out_unlock;
 	}
 
+	if (r->mon.mbm_cntr_assignable && !mbm_cntrs_init(r)) {
+		if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
+			kfree(closid_num_dirty_rmid);
+			closid_num_dirty_rmid = NULL;
+		}
+		kfree(rmid_ptrs);
+		rmid_ptrs = NULL;
+		err = -ENOMEM;
+		goto out_unlock;
+	}
+
 	for (i = 0; i < idx_limit; i++) {
 		entry = &rmid_ptrs[i];
 		INIT_LIST_HEAD(&entry->list);
@@ -1056,6 +1088,9 @@ static void __exit dom_data_exit(struct rdt_resource *r)
 	kfree(rmid_ptrs);
 	rmid_ptrs = NULL;
 
+	if (r->mon.mbm_cntr_assignable)
+		mbm_cntrs_exit(r);
+
 	mutex_unlock(&rdtgroup_mutex);
 }
 
@@ -1210,10 +1245,6 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	 */
 	resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);
 
-	ret = dom_data_init(r);
-	if (ret)
-		return ret;
-
 	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
 		u32 eax, ebx, ecx, edx;
 
@@ -1240,6 +1271,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 		}
 	}
 
+	ret = dom_data_init(r);
+	if (ret)
+		return ret;
+
 	l3_mon_evt_init(r);
 
 	r->mon_capable = true;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index c48b5450e6c2..8ffebd203c31 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -185,6 +185,25 @@ bool closid_allocated(unsigned int closid)
 	return !test_bit(closid, &closid_free_map);
 }
 
+int mbm_cntr_alloc(struct rdt_resource *r)
+{
+	int cntr_id;
+
+	cntr_id = find_first_bit(r->mon.mbm_cntr_free_map,
+				 r->mon.num_mbm_cntrs);
+	if (cntr_id >= r->mon.num_mbm_cntrs)
+		return -ENOSPC;
+
+	__clear_bit(cntr_id, r->mon.mbm_cntr_free_map);
+
+	return cntr_id;
+}
+
+void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id)
+{
+	__set_bit(cntr_id, r->mon.mbm_cntr_free_map);
+}
+
 /**
  * rdtgroup_mode_by_closid - Return mode of resource group with closid
  * @closid: closid if the resource group
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f11d6fdfd977..5a4d6adec974 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -187,12 +187,14 @@ enum resctrl_scope {
  * @num_rmid:		Number of RMIDs available
  * @num_mbm_cntrs:	Number of assignable monitoring counters
  * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
+ * @mbm_cntr_free_map:	bitmap of free MBM counters
  * @evt_list:		List of monitoring events
  */
 struct resctrl_mon {
 	int			num_rmid;
 	int			num_mbm_cntrs;
 	bool			mbm_cntr_assignable;
+	unsigned long		*mbm_cntr_free_map;
 	struct list_head	evt_list;
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 11/25] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (9 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 10/25] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:15   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 12/25] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
                   ` (14 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

If the BMEC (Bandwidth Monitoring Event Configuration) feature is
supported, the bandwidth events can be configured to track specific
events. The event configuration is domain specific. ABMC (Assignable
Bandwidth Monitoring Counters) feature needs event configuration
information to assign a hardware counter to an RMID. Event configurations
are not stored in resctrl but instead always read from or written to
hardware directly when prompted by user space.

Read the event configuration from the hardware during the domain
initialization. Save the configuration value in struct rdt_hw_mon_domain,
so it can be used for counter assignment.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Renamed resctrl_mbm_evt_config_init() to arch_mbm_evt_config_init()
    Minor commit message update.

v7: Fixed initializing INVALID_CONFIG_VALUE to mbm_local_cfg in case of error.

v6: Renamed resctrl_arch_mbm_evt_config -> resctrl_mbm_evt_config_init
    Initialized value to INVALID_CONFIG_VALUE if it is not configurable.
    Minor commit message update.

v5: Exported mon_event_config_index_get.
    Renamed arch_domain_mbm_evt_config to resctrl_arch_mbm_evt_config.

v4: Read the configuration information from the hardware to initialize.
    Added few commit messages.
    Fixed the tab spaces.

v3: Minor changes related to rebase in mbm_config_write_domain.

v2: No changes.
---
 arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
 arch/x86/kernel/cpu/resctrl/internal.h |  9 +++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c  | 26 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  4 +---
 4 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 00ad00258df2..a4f88c327b40 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -632,6 +632,8 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 
 	arch_mon_domain_online(r, d);
 
+	arch_mbm_evt_config_init(hw_dom);
+
 	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
 		mon_domain_free(hw_dom);
 		return;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 99f9103a35ba..86e3e188c119 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -56,6 +56,9 @@
 /* Max event bits supported */
 #define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
 
+#define INVALID_CONFIG_VALUE		U32_MAX
+#define INVALID_CONFIG_INDEX		UINT_MAX
+
 /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
 #define ABMC_ENABLE_BIT			0
 
@@ -401,6 +404,8 @@ struct rdt_hw_ctrl_domain {
  * @d_resctrl:	Properties exposed to the resctrl file system
  * @arch_mbm_total:	arch private state for MBM total bandwidth
  * @arch_mbm_local:	arch private state for MBM local bandwidth
+ * @mbm_total_cfg:	MBM total bandwidth configuration
+ * @mbm_local_cfg:	MBM local bandwidth configuration
  *
  * Members of this structure are accessed via helpers that provide abstraction.
  */
@@ -408,6 +413,8 @@ struct rdt_hw_mon_domain {
 	struct rdt_mon_domain		d_resctrl;
 	struct arch_mbm_state		*arch_mbm_total;
 	struct arch_mbm_state		*arch_mbm_local;
+	u32				mbm_total_cfg;
+	u32				mbm_local_cfg;
 };
 
 static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
@@ -656,6 +663,8 @@ void __init resctrl_file_fflags_init(const char *config,
 				     unsigned long fflags);
 int mbm_cntr_alloc(struct rdt_resource *r);
 void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id);
+void arch_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
+unsigned int mon_event_config_index_get(u32 evtid);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 5c2a28565747..6b4cf4813a4b 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1282,6 +1282,32 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	return 0;
 }
 
+void arch_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom)
+{
+	unsigned int index;
+	u64 msrval;
+
+	/*
+	 * Read the configuration registers QOS_EVT_CFG_n, where <n> is
+	 * the BMEC event number (EvtID).
+	 */
+	if (mbm_total_event.configurable) {
+		index = mon_event_config_index_get(QOS_L3_MBM_TOTAL_EVENT_ID);
+		rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
+		hw_dom->mbm_total_cfg = msrval & MAX_EVT_CONFIG_BITS;
+	} else {
+		hw_dom->mbm_total_cfg = INVALID_CONFIG_VALUE;
+	}
+
+	if (mbm_local_event.configurable) {
+		index = mon_event_config_index_get(QOS_L3_MBM_LOCAL_EVENT_ID);
+		rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
+		hw_dom->mbm_local_cfg = msrval & MAX_EVT_CONFIG_BITS;
+	} else {
+		hw_dom->mbm_local_cfg = INVALID_CONFIG_VALUE;
+	}
+}
+
 void __exit rdt_put_mon_l3_config(struct rdt_resource *r)
 {
 	dom_data_exit(r);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 8ffebd203c31..91ffd9d24883 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1601,8 +1601,6 @@ struct mon_config_info {
 	u32 mon_config;
 };
 
-#define INVALID_CONFIG_INDEX   UINT_MAX
-
 /**
  * mon_event_config_index_get - get the hardware index for the
  *                              configurable event
@@ -1612,7 +1610,7 @@ struct mon_config_info {
  *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
  *         INVALID_CONFIG_INDEX for invalid evtid
  */
-static inline unsigned int mon_event_config_index_get(u32 evtid)
+unsigned int mon_event_config_index_get(u32 evtid)
 {
 	switch (evtid) {
 	case QOS_L3_MBM_TOTAL_EVENT_ID:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 12/25] x86/resctrl: Remove MSR reading of event configuration value
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (10 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 11/25] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:16   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 13/25] x86/resctrl: Introduce mbm_cntr_map to track assignable counters at domain Babu Moger
                   ` (13 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The event configuration is domain specific and initialized during domain
initialization. The values are stored in struct rdt_hw_mon_domain.

It is not required to read the configuration register every time user asks
for it. Use the value stored in struct rdt_hw_mon_domain instead.

Introduce resctrl_arch_mon_event_config_get() and
resctrl_arch_mon_event_config_set() to get/set architecture domain specific
mbm_total_cfg/mbm_local_cfg values.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Renamed
    resctrl_arch_event_config_get() to resctrl_arch_mon_event_config_get().
    resctrl_arch_event_config_set() to resctrl_arch_mon_event_config_set().

v7: Removed check if (val == INVALID_CONFIG_VALUE) as resctrl_arch_event_config_get
    already prints warning.
    Kept the Event config value definitions as is.

v6: Fixed inconstancy with types. Made all the types to u32 for config
    value.
    Removed few rdt_last_cmd_puts as it is not necessary.
    Removed unused config value definitions.
    Few more updates to commit message.

v5: Introduced resctrl_arch_event_config_get and
    resctrl_arch_event_config_get() based on our discussion.
    https://lore.kernel.org/lkml/68e861f9-245d-4496-a72e-46fc57d19c62@amd.com/

v4: New patch.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 103 ++++++++++++++-----------
 include/linux/resctrl.h                |   4 +
 2 files changed, 62 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 91ffd9d24883..ba90e520150f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1597,10 +1597,57 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 }
 
 struct mon_config_info {
+	struct rdt_mon_domain *d;
 	u32 evtid;
 	u32 mon_config;
 };
 
+u32 resctrl_arch_mon_event_config_get(struct rdt_mon_domain *d,
+				      enum resctrl_event_id eventid)
+{
+	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+
+	switch (eventid) {
+	case QOS_L3_OCCUP_EVENT_ID:
+		break;
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		return hw_dom->mbm_total_cfg;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		return hw_dom->mbm_local_cfg;
+	}
+
+	/* Never expect to get here */
+	WARN_ON_ONCE(1);
+
+	return INVALID_CONFIG_VALUE;
+}
+
+void resctrl_arch_mon_event_config_set(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	struct rdt_hw_mon_domain *hw_dom;
+	unsigned int index;
+
+	index = mon_event_config_index_get(mon_info->evtid);
+	if (index == INVALID_CONFIG_INDEX)
+		return;
+
+	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
+
+	hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
+
+	switch (mon_info->evtid) {
+	case QOS_L3_OCCUP_EVENT_ID:
+		break;
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		hw_dom->mbm_total_cfg = mon_info->mon_config;
+		break;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		hw_dom->mbm_local_cfg =  mon_info->mon_config;
+		break;
+	}
+}
+
 /**
  * mon_event_config_index_get - get the hardware index for the
  *                              configurable event
@@ -1623,33 +1670,11 @@ unsigned int mon_event_config_index_get(u32 evtid)
 	}
 }
 
-static void mon_event_config_read(void *info)
-{
-	struct mon_config_info *mon_info = info;
-	unsigned int index;
-	u64 msrval;
-
-	index = mon_event_config_index_get(mon_info->evtid);
-	if (index == INVALID_CONFIG_INDEX) {
-		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
-		return;
-	}
-	rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
-
-	/* Report only the valid event configuration bits */
-	mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
-}
-
-static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
-{
-	smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
-}
-
 static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
 {
-	struct mon_config_info mon_info = {0};
 	struct rdt_mon_domain *dom;
 	bool sep = false;
+	u32 val;
 
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
@@ -1658,11 +1683,8 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
 		if (sep)
 			seq_puts(s, ";");
 
-		memset(&mon_info, 0, sizeof(struct mon_config_info));
-		mon_info.evtid = evtid;
-		mondata_config_read(dom, &mon_info);
-
-		seq_printf(s, "%d=0x%02x", dom->hdr.id, mon_info.mon_config);
+		val = resctrl_arch_mon_event_config_get(dom, evtid);
+		seq_printf(s, "%d=0x%02x", dom->hdr.id, val);
 		sep = true;
 	}
 	seq_puts(s, "\n");
@@ -1693,33 +1715,23 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
 	return 0;
 }
 
-static void mon_event_config_write(void *info)
-{
-	struct mon_config_info *mon_info = info;
-	unsigned int index;
-
-	index = mon_event_config_index_get(mon_info->evtid);
-	if (index == INVALID_CONFIG_INDEX) {
-		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
-		return;
-	}
-	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
-}
 
 static void mbm_config_write_domain(struct rdt_resource *r,
 				    struct rdt_mon_domain *d, u32 evtid, u32 val)
 {
 	struct mon_config_info mon_info = {0};
+	u32 config_val;
 
 	/*
-	 * Read the current config value first. If both are the same then
+	 * Check the current config value first. If both are the same then
 	 * no need to write it again.
 	 */
-	mon_info.evtid = evtid;
-	mondata_config_read(d, &mon_info);
-	if (mon_info.mon_config == val)
+	config_val = resctrl_arch_mon_event_config_get(d, evtid);
+	if (config_val == INVALID_CONFIG_VALUE || config_val == val)
 		return;
 
+	mon_info.d = d;
+	mon_info.evtid = evtid;
 	mon_info.mon_config = val;
 
 	/*
@@ -1728,7 +1740,8 @@ static void mbm_config_write_domain(struct rdt_resource *r,
 	 * are scoped at the domain level. Writing any of these MSRs
 	 * on one CPU is observed by all the CPUs in the domain.
 	 */
-	smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_write,
+	smp_call_function_any(&d->hdr.cpu_mask,
+			      resctrl_arch_mon_event_config_set,
 			      &mon_info, 1);
 
 	/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 5a4d6adec974..54eacc8a9d49 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -354,6 +354,10 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
  */
 void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
 
+void resctrl_arch_mon_event_config_set(void *info);
+u32 resctrl_arch_mon_event_config_get(struct rdt_mon_domain *d,
+				      enum resctrl_event_id eventid);
+
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 13/25] x86/resctrl: Introduce mbm_cntr_map to track assignable counters at domain
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (11 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 12/25] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:19   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 14/25] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
                   ` (12 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The MBM counters are allocated globally and assigned to an RMID, event pair
in a resctrl group. It is tracked by mbm_cntr_free_map. Counters are
assigned to the domain based on the user input. It needs to be tracked
at domain level also.

Add the mbm_cntr_map bitmap in struct rdt_mon_domain to keep track of
assignment at domain level. The global counter at mbm_cntr_free_map can
be released when assignment at all the domains are cleared.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Minor commit message changes.

v7: Added check mbm_cntr_assignable for allocating bitmap mbm_cntr_map

v6: New patch to add domain level assignment.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 ++++++++++
 include/linux/resctrl.h                |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ba90e520150f..610eae64b13a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4093,6 +4093,7 @@ static void __init rdtgroup_setup_default(void)
 
 static void domain_destroy_mon_state(struct rdt_mon_domain *d)
 {
+	bitmap_free(d->mbm_cntr_map);
 	bitmap_free(d->rmid_busy_llc);
 	kfree(d->mbm_total);
 	kfree(d->mbm_local);
@@ -4166,6 +4167,15 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
 			return -ENOMEM;
 		}
 	}
+	if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
+		d->mbm_cntr_map = bitmap_zalloc(r->mon.num_mbm_cntrs, GFP_KERNEL);
+		if (!d->mbm_cntr_map) {
+			bitmap_free(d->rmid_busy_llc);
+			kfree(d->mbm_total);
+			kfree(d->mbm_local);
+			return -ENOMEM;
+		}
+	}
 
 	return 0;
 }
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 54eacc8a9d49..329fe23474ff 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -105,6 +105,7 @@ struct rdt_ctrl_domain {
  * @cqm_limbo:		worker to periodically read CQM h/w counters
  * @mbm_work_cpu:	worker CPU for MBM h/w counters
  * @cqm_work_cpu:	worker CPU for CQM h/w counters
+ * @mbm_cntr_map:	bitmap to track domain counter assignment
  */
 struct rdt_mon_domain {
 	struct rdt_domain_hdr		hdr;
@@ -116,6 +117,7 @@ struct rdt_mon_domain {
 	struct delayed_work		cqm_limbo;
 	int				mbm_work_cpu;
 	int				cqm_work_cpu;
+	unsigned long			*mbm_cntr_map;
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 14/25] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (12 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 13/25] x86/resctrl: Introduce mbm_cntr_map to track assignable counters at domain Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:21   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 15/25] x86/resctrl: Introduce cntr_id in mongroup for assignments Babu Moger
                   ` (11 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as the
counter is assigned. The bandwidth events will be tracked by the hardware
until the user changes the configuration. Each resctrl group can configure
maximum two counters, one for total event and one for local event.

The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh).
Configuration is done by setting the counter id, bandwidth source (RMID)
and bandwidth configuration supported by BMEC (Bandwidth Monitoring Event
Configuration).

Attempts to read or write the MSR when ABMC is not enabled will result
in a #GP(0) exception.

Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG
(0xC000_03FDh):
=========================================================================
Bits 	Mnemonic	Description			Access Reset
							Type   Value
=========================================================================
63 	CfgEn 		Configuration Enable 		R/W 	0

62 	CtrEn 		Enable/disable counting		R/W 	0

61:53 	– 		Reserved 			MBZ 	0

52:48 	CtrID 		Counter Identifier		R/W	0

47 	IsCOS		BwSrc field is a CLOSID		R/W	0
			(not an RMID)

46:44 	–		Reserved			MBZ	0

43:32	BwSrc		Bandwidth Source		R/W	0
			(RMID or CLOSID)

31:0	BwType		Bandwidth configuration		R/W	0
			to track for this counter
==========================================================================

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Update the configuration notes in kernel_doc.
    Few commit message update.

v7: Removed the reference of L3_QOS_ABMC_DSC as it is not used anymore.
    Moved the configuration notes to kernel_doc.
    Adjusted the tabs for l3_qos_abmc_cfg and checkpatch seems happy.

v6: Removed all the fs related changes.
    Added note on CfgEn,CtrEn.
    Removed the definitions which are not used.
    Removed cntr_id initialization.

v5: Moved assignment flags here (path 10/19 of v4).
    Added MON_CNTR_UNSET definition to initialize cntr_id's.
    More details in commit log.
    Renamed few fields in l3_qos_abmc_cfg for readability.

v4: Added more descriptions.
    Changed the name abmc_ctr_id to ctr_id.
    Added L3_QOS_ABMC_DSC. Used for reading the configuration.

v3: No changes.

v2: No changes.
---
 arch/x86/include/asm/msr-index.h       |  1 +
 arch/x86/kernel/cpu/resctrl/internal.h | 33 ++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 43c9dc473aba..2c281c977342 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1196,6 +1196,7 @@
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
 #define MSR_IA32_EVT_CFG_BASE		0xc0000400
 #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
+#define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
 
 /* AMD-V MSRs */
 #define MSR_VM_CR                       0xc0010114
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 86e3e188c119..de397468b945 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -602,6 +602,39 @@ union cpuid_0x10_x_edx {
 	unsigned int full;
 };
 
+/*
+ * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
+ * Reading L3_QOS_ABMC_DSC returns the configuration of the counter id
+ * specified in L3_QOS_ABMC_CFG.cntr_id.
+ * @bw_type		: Bandwidth configuration(supported by BMEC)
+ *			  tracked by the @cntr_id.
+ * @bw_src		: Bandwidth source (RMID or CLOSID).
+ * @reserved1		: Reserved.
+ * @is_clos		: @bw_src field is a CLOSID (not an RMID).
+ * @cntr_id		: Counter identifier.
+ * @reserved		: Reserved.
+ * @cntr_en		: Counting enable bit.
+ * @cfg_en		: Configuration enable bit.
+ *
+ * Configuration and counting:
+ * cfg_en=0,            : No configuration changes applied.
+ * cfg_en=1, cntr_en=0  : Configure cntr_id and but no counting the events.
+ * cfg_en=1, cntr_en=1  : Configure cntr_id and start counting the events.
+ */
+union l3_qos_abmc_cfg {
+	struct {
+		unsigned long bw_type  :32,
+			      bw_src   :12,
+			      reserved1: 3,
+			      is_clos  : 1,
+			      cntr_id  : 5,
+			      reserved : 9,
+			      cntr_en  : 1,
+			      cfg_en   : 1;
+	} split;
+	unsigned long full;
+};
+
 void rdt_last_cmd_clear(void);
 void rdt_last_cmd_puts(const char *s);
 __printf(1, 2)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 15/25] x86/resctrl: Introduce cntr_id in mongroup for assignments
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (13 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 14/25] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:22   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 16/25] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

mbm_cntr_assign feature provides an option to the user to assign a counter
to an RMID, event pair and monitor the bandwidth as long as the counter is
assigned. There can be two counters per monitor group, one for MBM total
event and another for MBM local event.

Introduce cntr_id to manage the assignments.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Minor commit message update.

v7: Minor comment update for cntr_id.

v6: New patch.
    Separated FS and arch bits.
---
 arch/x86/kernel/cpu/resctrl/internal.h | 7 +++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index de397468b945..58298db9034f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -62,6 +62,11 @@
 /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
 #define ABMC_ENABLE_BIT			0
 
+/* Maximum assignable counters per resctrl group */
+#define MAX_CNTRS			2
+
+#define MON_CNTR_UNSET			U32_MAX
+
 /**
  * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
  *			        aren't marked nohz_full
@@ -231,12 +236,14 @@ enum rdtgrp_mode {
  * @parent:			parent rdtgrp
  * @crdtgrp_list:		child rdtgroup node list
  * @rmid:			rmid for this rdtgroup
+ * @cntr_id:			IDs of hardware counters assigned to monitor group
  */
 struct mongroup {
 	struct kernfs_node	*mon_data_kn;
 	struct rdtgroup		*parent;
 	struct list_head	crdtgrp_list;
 	u32			rmid;
+	u32			cntr_id[MAX_CNTRS];
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 610eae64b13a..03b670b95c49 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3530,6 +3530,9 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
 	}
 	rdtgrp->mon.rmid = ret;
 
+	rdtgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+	rdtgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+
 	ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
 	if (ret) {
 		rdt_last_cmd_puts("kernfs subdir error\n");
@@ -4084,6 +4087,9 @@ static void __init rdtgroup_setup_default(void)
 	rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
 	rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
 	rdtgroup_default.type = RDTCTRL_GROUP;
+	rdtgroup_default.mon.cntr_id[0] = MON_CNTR_UNSET;
+	rdtgroup_default.mon.cntr_id[1] = MON_CNTR_UNSET;
+
 	INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
 
 	list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 16/25] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (14 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 15/25] x86/resctrl: Introduce cntr_id in mongroup for assignments Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:23   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 17/25] x86/resctrl: Add the interface to assign/update counter assignment Babu Moger
                   ` (9 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as it is
assigned. The assigned RMID will be tracked by the hardware until the user
unassigns it manually.

Counters are configured by writing to L3_QOS_ABMC_CFG MSR and
specifying the counter id, bandwidth source, and bandwidth types.

Provide the interface to assign the counter ids to RMID.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
    Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
    Monitoring (ABMC).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Rename resctrl_arch_assign_cntr to resctrl_arch_config_cntr.

v7: Separated arch and fs functions. This patch only has arch implementation.
    Added struct rdt_resource to the interface resctrl_arch_assign_cntr.
    Rename rdtgroup_abmc_cfg() to resctrl_abmc_config_one_amd().

v6: Removed mbm_cntr_alloc() from this patch to keep fs and arch code
    separate.
    Added code to update the counter assignment at domain level.

v5: Few name changes to match cntr_id.
    Changed the function names to
      rdtgroup_assign_cntr
      resctr_arch_assign_cntr
      More comments on commit log.
      Added function summary.

v4: Commit message update.
      User bitmap APIs where applicable.
      Changed the interfaces considering MPAM(arm).
      Added domain specific assignment.

v3: Removed the static from the prototype of rdtgroup_assign_abmc.
      The function is not called directly from user anymore. These
      changes are related to global assignment interface.

v2: Minor text changes in commit message.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  3 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 45 ++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 58298db9034f..6d4df0490186 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -705,6 +705,9 @@ int mbm_cntr_alloc(struct rdt_resource *r);
 void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id);
 void arch_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
 unsigned int mon_event_config_index_get(u32 evtid);
+int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
+			     u32 cntr_id, bool assign);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 03b670b95c49..4ab1a18010c9 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1853,6 +1853,51 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static void resctrl_abmc_config_one_amd(void *info)
+{
+	u64 *msrval = info;
+
+	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
+}
+
+/*
+ * Send an IPI to the domain to assign the counter to RMID, event pair.
+ */
+int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
+			     u32 cntr_id, bool assign)
+{
+	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	union l3_qos_abmc_cfg abmc_cfg = { 0 };
+	struct arch_mbm_state *arch_mbm;
+
+	abmc_cfg.split.cfg_en = 1;
+	abmc_cfg.split.cntr_en = assign ? 1 : 0;
+	abmc_cfg.split.cntr_id = cntr_id;
+	abmc_cfg.split.bw_src = rmid;
+
+	/* Update the event configuration from the domain */
+	if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
+		abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
+		arch_mbm = &hw_dom->arch_mbm_total[rmid];
+	} else {
+		abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
+		arch_mbm = &hw_dom->arch_mbm_local[rmid];
+	}
+
+	smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd,
+			      &abmc_cfg, 1);
+
+	/*
+	 * Reset the architectural state so that reading of hardware
+	 * counter is not considered as an overflow in next update.
+	 */
+	if (arch_mbm)
+		memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 17/25] x86/resctrl: Add the interface to assign/update counter assignment
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (15 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 16/25] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:25   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 18/25] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
                   ` (8 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The mbm_cntr_assign mode offers several hardware counters that can be
assigned to an RMID-event pair and monitor the bandwidth as long as it
is assigned.

Counters are managed at two levels. The global assignment is tracked
using the mbm_cntr_free_map field in the struct resctrl_mon, while
domain-specific assignments are tracked using the mbm_cntr_map field
in the struct rdt_mon_domain. Allocation begins at the global level
and is then applied individually to each domain.

Introduce an interface to allocate these counters and update the
corresponding domains accordingly.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Renamed rdtgroup_assign_cntr() to rdtgroup_assign_cntr_event().
    Added the code to return the error if rdtgroup_assign_cntr_event fails.
    Moved definition of MBM_EVENT_ARRAY_INDEX to resctrl/internal.h.
    Updated typo in the comments.

v7: New patch. Moved all the FS code here.
    Merged rdtgroup_assign_cntr and rdtgroup_alloc_cntr.
    Adde new #define MBM_EVENT_ARRAY_INDEX.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  9 +++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 47 ++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 6d4df0490186..900e18aea2c4 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -67,6 +67,13 @@
 
 #define MON_CNTR_UNSET			U32_MAX
 
+/*
+ * Get the counter index for the assignable counter
+ * 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
+ * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
+ */
+#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
+
 /**
  * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
  *			        aren't marked nohz_full
@@ -708,6 +715,8 @@ unsigned int mon_event_config_index_get(u32 evtid);
 int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
 			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
 			     u32 cntr_id, bool assign);
+int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+			       struct rdt_mon_domain *d, enum resctrl_event_id evtid);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 4ab1a18010c9..e4f628e6fe65 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1898,6 +1898,53 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
 	return 0;
 }
 
+/*
+ * Assign a hardware counter to the group.
+ * Counter will be assigned to all the domains if rdt_mon_domain is NULL
+ * else the counter will be allocated to specific domain.
+ */
+int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+			       struct rdt_mon_domain *d, enum resctrl_event_id evtid)
+{
+	int index = MBM_EVENT_ARRAY_INDEX(evtid);
+	int cntr_id = rdtgrp->mon.cntr_id[index];
+	int ret;
+
+	/*
+	 * Allocate a new counter id to the event if the counter is not
+	 * assigned already.
+	 */
+	if (cntr_id == MON_CNTR_UNSET) {
+		cntr_id = mbm_cntr_alloc(r);
+		if (cntr_id < 0) {
+			rdt_last_cmd_puts("Out of MBM assignable counters\n");
+			return -ENOSPC;
+		}
+		rdtgrp->mon.cntr_id[index] = cntr_id;
+	}
+
+	if (!d) {
+		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+			ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
+						       rdtgrp->closid, cntr_id, true);
+			if (ret)
+				goto out_done_assign;
+
+			set_bit(cntr_id, d->mbm_cntr_map);
+		}
+	} else {
+		ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
+					       rdtgrp->closid, cntr_id, true);
+		if (ret)
+			goto out_done_assign;
+
+		set_bit(cntr_id, d->mbm_cntr_map);
+	}
+
+out_done_assign:
+	return ret;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 18/25] x86/resctrl: Add the interface to unassign a MBM counter
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (16 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 17/25] x86/resctrl: Add the interface to assign/update counter assignment Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:29   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
                   ` (7 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The mbm_cntr_assign mode provides a limited number of hardware counters
that can be assigned to an RMID-event pair to monitor bandwidth while
assigned. If all counters are in use, the kernel will show an error
message: "Out of MBM assignable counters" when a new assignment is
requested. To make space for a new assignment, users must unassign an
already assigned counter.

Introduce an interface that allows for the unassignment of counter IDs
from both the group and the domain. Additionally, ensure that the global
counter is released if it is no longer assigned to any domains.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Renamed rdtgroup_mbm_cntr_is_assigned to mbm_cntr_assigned_to_domain
    Added return error handling in resctrl_arch_config_cntr().

v7: Merged rdtgroup_unassign_cntr and rdtgroup_free_cntr functions.
    Renamed rdtgroup_mbm_cntr_test() to rdtgroup_mbm_cntr_is_assigned().
    Reworded the commit log little bit.

v6: Removed mbm_cntr_free from this patch.
    Added counter test in all the domains and free if it is not assigned to
    any domains.

v5: Few name changes to match cntr_id.
    Changed the function names to rdtgroup_unassign_cntr
    More comments on commit log.

v4: Added domain specific unassign feature.
    Few name changes.

v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
    The function is not called directly from user anymore. These
    changes are related to global assignment interface.

v2: No changes.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  2 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 56 ++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 900e18aea2c4..6f388d20fb22 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -717,6 +717,8 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
 			     u32 cntr_id, bool assign);
 int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 			       struct rdt_mon_domain *d, enum resctrl_event_id evtid);
+int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+				 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e4f628e6fe65..791258adcbda 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1945,6 +1945,62 @@ int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 	return ret;
 }
 
+static bool mbm_cntr_assigned_to_domain(struct rdt_resource *r, u32 cntr_id)
+{
+	struct rdt_mon_domain *d;
+
+	list_for_each_entry(d, &r->mon_domains, hdr.list)
+		if (test_bit(cntr_id, d->mbm_cntr_map))
+			return 1;
+
+	return 0;
+}
+
+/*
+ * Unassign a hardware counter from the domain and the group.
+ * Counter will be unassigned in all the domains if rdt_mon_domain is NULL
+ * else the counter will be assigned to specific domain.
+ * Global counter will be freed once it is unassigned from all the domains.
+ */
+int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+				 struct rdt_mon_domain *d, enum resctrl_event_id evtid)
+{
+	int index = MBM_EVENT_ARRAY_INDEX(evtid);
+	int cntr_id = rdtgrp->mon.cntr_id[index];
+	int ret;
+
+	/* Return early if the counter is unassigned already */
+	if (cntr_id == MON_CNTR_UNSET)
+		return 0;
+
+	if (!d) {
+		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+			ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
+						       rdtgrp->closid, cntr_id, false);
+			if (ret)
+				goto out_done_unassign;
+
+			clear_bit(cntr_id, d->mbm_cntr_map);
+		}
+	} else {
+		ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
+					       rdtgrp->closid, cntr_id, false);
+		if (ret)
+			goto out_done_unassign;
+
+		clear_bit(cntr_id, d->mbm_cntr_map);
+	}
+
+	/* Update the counter bitmap */
+	if (!mbm_cntr_assigned_to_domain(r, cntr_id)) {
+		mbm_cntr_free(r, cntr_id);
+		rdtgrp->mon.cntr_id[index] = MON_CNTR_UNSET;
+	}
+
+out_done_unassign:
+	return ret;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (17 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 18/25] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-11 17:17   ` Tony Luck
  2024-10-16  3:30   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 20/25] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
                   ` (6 subsequent siblings)
  25 siblings, 2 replies; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Assign/unassign counters on resctrl group creation/deletion. Two counters
are required per group, one for MBM total event and one for MBM local
event.

There are a limited number of counters available for assignment. If these
counters are exhausted, the kernel will display the error message: "Out of
MBM assignable counters". However, it is not necessary to fail the
creation of a group due to assignment failures. Users have the flexibility
to modify the assignments at a later time.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
    Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
    Fixed the problem with unassigning the child MON groups of CTRL_MON group.

v7: Reworded the commit message.
    Removed the reference of ABMC with mbm_cntr_assign.
    Renamed the function rdtgroup_assign_cntrs to rdtgroup_assign_grp.

v6: Removed the redundant comments on all the calls of
    rdtgroup_assign_cntrs. Updated the commit message.
    Dropped printing error message on every call of rdtgroup_assign_cntrs.

v5: Removed the code to enable/disable ABMC during the mount.
    That will be another patch.
    Added arch callers to get the arch specific data.
    Renamed fuctions to match the other abmc function.
    Added code comments for assignment failures.

v4: Few name changes based on the upstream discussion.
    Commit message update.

v3: This is a new patch. Patch addresses the upstream comment to enable
    ABMC feature by default if the feature is available.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 64 ++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 791258adcbda..cb2c60c0319e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2875,6 +2875,52 @@ static void schemata_list_destroy(void)
 	}
 }
 
+/*
+ * Called when a new group is created. If `mbm_cntr_assign` mode is enabled,
+ * counters are automatically assigned. Each group requires two counters:
+ * one for the total event and one for the local event. Due to the limited
+ * number of counters, assignments may fail in some cases. However, it is
+ * not necessary to fail the group creation. Users have the option to
+ * modify the assignments after the group has been created.
+ */
+static int rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	int ret = 0;
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
+		return 0;
+
+	if (is_mbm_total_enabled())
+		ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	if (!ret && is_mbm_local_enabled())
+		ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	return ret;
+}
+
+/*
+ * Called when a group is deleted. Counters are unassigned if it was in
+ * assigned state.
+ */
+static int rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	int ret = 0;
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
+		return 0;
+
+	if (is_mbm_total_enabled())
+		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	if (!ret && is_mbm_local_enabled())
+		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	return ret;
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -2934,6 +2980,8 @@ static int rdt_get_tree(struct fs_context *fc)
 		if (ret < 0)
 			goto out_mongrp;
 		rdtgroup_default.mon.mon_data_kn = kn_mondata;
+
+		rdtgroup_assign_cntrs(&rdtgroup_default);
 	}
 
 	ret = rdt_pseudo_lock_init();
@@ -2964,6 +3012,7 @@ static int rdt_get_tree(struct fs_context *fc)
 out_psl:
 	rdt_pseudo_lock_release();
 out_mondata:
+	rdtgroup_unassign_cntrs(&rdtgroup_default);
 	if (resctrl_arch_mon_capable())
 		kernfs_remove(kn_mondata);
 out_mongrp:
@@ -3144,6 +3193,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
 
 	head = &rdtgrp->mon.crdtgrp_list;
 	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
+		rdtgroup_unassign_cntrs(sentry);
 		free_rmid(sentry->closid, sentry->mon.rmid);
 		list_del(&sentry->mon.crdtgrp_list);
 
@@ -3184,6 +3234,8 @@ static void rmdir_all_sub(void)
 		cpumask_or(&rdtgroup_default.cpu_mask,
 			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
 
+		rdtgroup_unassign_cntrs(rdtgrp);
+
 		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 
 		kernfs_remove(rdtgrp->kn);
@@ -3223,6 +3275,8 @@ static void rdt_kill_sb(struct super_block *sb)
 		resctrl_arch_disable_alloc();
 	if (resctrl_arch_mon_capable())
 		resctrl_arch_disable_mon();
+
+	rdtgroup_unassign_cntrs(&rdtgroup_default);
 	resctrl_mounted = false;
 	kernfs_kill_sb(sb);
 	mutex_unlock(&rdtgroup_mutex);
@@ -3814,6 +3868,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
 		goto out_unlock;
 	}
 
+	rdtgroup_assign_cntrs(rdtgrp);
+
 	kernfs_activate(rdtgrp->kn);
 
 	/*
@@ -3858,6 +3914,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 	if (ret)
 		goto out_closid_free;
 
+	rdtgroup_assign_cntrs(rdtgrp);
+
 	kernfs_activate(rdtgrp->kn);
 
 	ret = rdtgroup_init_alloc(rdtgrp);
@@ -3883,6 +3941,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 out_del_list:
 	list_del(&rdtgrp->rdtgroup_list);
 out_rmid_free:
+	rdtgroup_unassign_cntrs(rdtgrp);
 	mkdir_rdt_prepare_rmid_free(rdtgrp);
 out_closid_free:
 	closid_free(closid);
@@ -3953,6 +4012,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	update_closid_rmid(tmpmask, NULL);
 
 	rdtgrp->flags = RDT_DELETED;
+
+	rdtgroup_unassign_cntrs(rdtgrp);
+
 	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 
 	/*
@@ -3999,6 +4061,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
 	update_closid_rmid(tmpmask, NULL);
 
+	rdtgroup_unassign_cntrs(rdtgrp);
+
 	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 	closid_free(rdtgrp->closid);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 20/25] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (18 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-11 17:23   ` Tony Luck
  2024-10-16  3:31   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 21/25] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
                   ` (5 subsequent siblings)
  25 siblings, 2 replies; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

In mbm_cntr_assign mode, the hardware counter should be assigned to read
the MBM events.

Report "Unassigned" in case the user attempts to read the events without
assigning the counter.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Used MBM_EVENT_ARRAY_INDEX to get the index for the MBM event.
    Documentation update to make the text generic.

v7: Moved the documentation under "mon_data".
    Updated the text little bit.

v6: Added more explaination in the resctrl.rst
    Added checks to detect "Unassigned" before reading RMID.

v5: New patch.
---
 Documentation/arch/x86/resctrl.rst        | 10 ++++++++++
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 ++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 1b5c05a35793..99ee9c87952b 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -419,6 +419,16 @@ When monitoring is enabled all MON groups will also contain:
 	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
 	where "YY" is the node number.
 
+	When supported the 'mbm_cntr_assign' mode allows users to assign a
+	counter to mon_hw_id, event pair enabling bandwidth monitoring for
+	as long as the counter remains assigned. The hardware will continue
+	tracking the assigned mon_hw_id until the user manually unassigns
+	it, ensuring that counters are not reset during this period. With
+	a limited number of counters, the system may run out of assignable
+	counters at some point. In that case, MBM event counters will return
+	"Unassigned" when the event is read. Users must manually assign a
+	counter to read the events.
+
 "mon_hw_id":
 	Available only with debug option. The identifier used by hardware
 	for the monitor group. On x86 this is the RMID.
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 50fa1fe9a073..5a9d15b2c319 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -562,7 +562,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	struct rdtgroup *rdtgrp;
 	struct rdt_resource *r;
 	union mon_data_bits md;
-	int ret = 0;
+	int ret = 0, index;
 
 	rdtgrp = rdtgroup_kn_lock_live(of->kn);
 	if (!rdtgrp) {
@@ -576,6 +576,15 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	evtid = md.u.evtid;
 	r = &rdt_resources_all[resid].r_resctrl;
 
+	if (resctrl_arch_mbm_cntr_assign_enabled(r) && evtid != QOS_L3_OCCUP_EVENT_ID) {
+		index = MBM_EVENT_ARRAY_INDEX(evtid);
+		if (index != INVALID_CONFIG_INDEX &&
+		    rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET) {
+			rr.err = -ENOENT;
+			goto checkresult;
+		}
+	}
+
 	if (md.u.sum) {
 		/*
 		 * This file requires summing across all domains that share
@@ -613,6 +622,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		seq_puts(m, "Error\n");
 	else if (rr.err == -EINVAL)
 		seq_puts(m, "Unavailable\n");
+	else if (rr.err == -ENOENT)
+		seq_puts(m, "Unassigned\n");
 	else
 		seq_printf(m, "%llu\n", rr.val);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 21/25] x86/resctrl: Introduce the interface to switch between monitor modes
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (19 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 20/25] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:36   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 22/25] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
                   ` (4 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Introduce interface to switch between mbm_cntr_assign and default modes.

$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_assign]
default

To enable the "mbm_cntr_assign" mode:
$ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode

To enable the default monitoring mode:
$ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode

MBM event counters will reset when mbm_assign_mode is changed.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Reset the internal counters after mbm_cntr_assign mode is changed.
    Renamed rdtgroup_mbm_cntr_reset() to mbm_cntr_reset()
    Updated the documentation to make text generic.

v7: Changed the interface name to mbm_assign_mode.
    Removed the references of ABMC.
    Added the changes to reset global and domain bitmaps.
    Added the changes to reset rmid.

v6: Changed the mode name to mbm_cntr_assign.
    Moved all the FS related code here.
    Added changes to reset mbm_cntr_map and resctrl group counters.

v5: Change log and mode description text correction.

v4: Minor commit text changes. Keep the default to ABMC when supported.
    Fixed comments to reflect changed interface "mbm_mode".

v3: New patch to address the review comments from upstream.
---
 Documentation/arch/x86/resctrl.rst     | 15 ++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 75 +++++++++++++++++++++++++-
 2 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 99ee9c87952b..d9574078f735 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -291,6 +291,21 @@ with the following files:
 	that case reading the mbm_total_bytes and mbm_local_bytes may report
 	'Unavailable' if there is no counter associated with that group.
 
+	* To enable "mbm_cntr_assign" mode:
+	  ::
+
+	    # echo  "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+
+	* To enable default monitoring mode:
+	  ::
+
+	    # echo  "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+
+	The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated counters
+	may reset when the mode is changed. Moving to mbm_cntr_assign mode will
+	require users to assign the counters to the events. Otherwise, the MBM
+	event counters will return "Unassigned" when read.
+
 "num_mbm_cntrs":
 	The number of monitoring counters available for assignment when the
 	architecture supports mbm_cntr_assign mode.
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index cb2c60c0319e..88eda3cf5c82 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -888,6 +888,78 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static void mbm_cntr_reset(struct rdt_resource *r)
+{
+	struct rdtgroup *prgrp, *crgrp;
+	struct rdt_mon_domain *dom;
+
+	/*
+	 * Hardware counters will reset after switching the monitor mode.
+	 * Reset the architectural state so that reading of hardware
+	 * counter is not considered as an overflow in the next update.
+	 * Also reset the domain counter bitmap.
+	 */
+	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+		bitmap_zero(dom->mbm_cntr_map, r->mon.num_mbm_cntrs);
+		resctrl_arch_reset_rmid_all(r, dom);
+	}
+
+	/* Reset global MBM counter map */
+	bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
+
+	/* Reset the cntr_id's for all the monitor groups */
+	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
+		prgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+		prgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list,
+				    mon.crdtgrp_list) {
+			crgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+			crgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+		}
+	}
+}
+
+static ssize_t rdtgroup_mbm_assign_mode_write(struct kernfs_open_file *of,
+					      char *buf, size_t nbytes, loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	int ret = 0;
+	bool enable;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	buf[nbytes - 1] = '\0';
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	if (!strcmp(buf, "default")) {
+		enable = 0;
+	} else if (!strcmp(buf, "mbm_cntr_assign")) {
+		enable = 1;
+	} else {
+		ret = -EINVAL;
+		rdt_last_cmd_puts("Unsupported assign mode\n");
+		goto write_exit;
+	}
+
+	if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
+		if (!ret)
+			mbm_cntr_reset(r);
+	}
+
+write_exit:
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
 				       struct seq_file *s, void *v)
 {
@@ -2115,9 +2187,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_assign_mode",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= rdtgroup_mbm_assign_mode_show,
+		.write		= rdtgroup_mbm_assign_mode_write,
 		.fflags		= RFTYPE_MON_INFO,
 	},
 	{
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 22/25] x86/resctrl: Configure mbm_cntr_assign mode if supported
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (20 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 21/25] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-09 17:39 ` [PATCH v8 23/25] x86/resctrl: Update assignments on event configuration changes Babu Moger
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Configure mbm_cntr_assign on AMD. 'mbm_cntr_assign' mode in AMD is ABMC
(Assignable Bandwidth Monitoring Counters). It is enabled by default when
supported on the system.

When the ABMC is updated, it must be updated on all the logical processors
in the resctrl domain.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Renamed resctrl_arch_mbm_cntr_assign_configure to
	resctrl_arch_mbm_cntr_assign_set_one.
    Adde r->mon_capable check.
    Commit message update.

v7: Introduced resctrl_arch_mbm_cntr_assign_configure() to configure.
    Moved the default settings to rdt_get_mon_l3_config(). It should be
    done before the hotplug handler is called. It cannot be done at
    rdtgroup_init().

v6: Keeping the default enablement in arch init code for now.
     This may need some discussion.
     Renamed resctrl_arch_configure_abmc to resctrl_arch_mbm_cntr_assign_configure.

v5: New patch to enable ABMC by default.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  1 +
 arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
 3 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 6f388d20fb22..a6f40d3115f4 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -719,6 +719,7 @@ int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 			       struct rdt_mon_domain *d, enum resctrl_event_id evtid);
 int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 				 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
+void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 6b4cf4813a4b..395d99984893 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1268,6 +1268,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
 			r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
 			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
+			hw_res->mbm_cntr_assign_enabled = true;
 		}
 	}
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 88eda3cf5c82..f890d294e002 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2736,6 +2736,13 @@ int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
 	return 0;
 }
 
+void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r)
+{
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+	resctrl_abmc_set_one_amd(&hw_res->mbm_cntr_assign_enabled);
+}
+
 /*
  * We don't allow rdtgroup directories to be created anywhere
  * except the root directory. Thus when looking for the rdtgroup
@@ -4523,9 +4530,13 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
 
 void resctrl_online_cpu(unsigned int cpu)
 {
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
 	mutex_lock(&rdtgroup_mutex);
 	/* The CPU is set in default rdtgroup after online. */
 	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
+	if (r->mon_capable && r->mon.mbm_cntr_assignable)
+		resctrl_arch_mbm_cntr_assign_set_one(r);
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 23/25] x86/resctrl: Update assignments on event configuration changes
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (21 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 22/25] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:40   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 24/25] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
                   ` (2 subsequent siblings)
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Users can modify the configuration of assignable events. Whenever the
event configuration is updated, MBM assignments must be revised across
all monitor groups within the impacted domains.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Patch changed completely.
    Updated the assignment on same IPI as the event is updated.
    Could not do the way we discussed in the thread.
    https://lore.kernel.org/lkml/f77737ac-d3f6-3e4b-3565-564f79c86ca8@amd.com/
    Needed to figure out event type to update the configuration.

v7: New patch to update the assignments. Missed it earlier.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 49 ++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f890d294e002..cf2e0ad0e4f4 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1669,6 +1669,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 }
 
 struct mon_config_info {
+	struct rdt_resource *r;
 	struct rdt_mon_domain *d;
 	u32 evtid;
 	u32 mon_config;
@@ -1694,11 +1695,46 @@ u32 resctrl_arch_mon_event_config_get(struct rdt_mon_domain *d,
 	return INVALID_CONFIG_VALUE;
 }
 
+static void mbm_cntr_event_update(int cntr_id, unsigned int index, u32 val)
+{
+	union l3_qos_abmc_cfg abmc_cfg = { 0 };
+	struct rdtgroup *prgrp, *crgrp;
+	int update = 0;
+
+	/* Check if the cntr_id is associated to the event type updated */
+	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
+		if (prgrp->mon.cntr_id[index] == cntr_id) {
+			abmc_cfg.split.bw_src = prgrp->mon.rmid;
+			update = 1;
+			goto out_update;
+		}
+		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) {
+			if (crgrp->mon.cntr_id[index] == cntr_id) {
+				abmc_cfg.split.bw_src = crgrp->mon.rmid;
+				update = 1;
+				goto out_update;
+			}
+		}
+	}
+
+out_update:
+	if (update) {
+		abmc_cfg.split.cfg_en = 1;
+		abmc_cfg.split.cntr_en = 1;
+		abmc_cfg.split.cntr_id = cntr_id;
+		abmc_cfg.split.bw_type = val;
+		wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg.full);
+	}
+}
+
 void resctrl_arch_mon_event_config_set(void *info)
 {
 	struct mon_config_info *mon_info = info;
+	struct rdt_mon_domain *d = mon_info->d;
+	struct rdt_resource *r = mon_info->r;
 	struct rdt_hw_mon_domain *hw_dom;
 	unsigned int index;
+	int cntr_id;
 
 	index = mon_event_config_index_get(mon_info->evtid);
 	if (index == INVALID_CONFIG_INDEX)
@@ -1718,6 +1754,18 @@ void resctrl_arch_mon_event_config_set(void *info)
 		hw_dom->mbm_local_cfg =  mon_info->mon_config;
 		break;
 	}
+
+	/*
+	 * Update the assignment if the domain has the cntr_id's assigned
+	 * to event type updated.
+	 */
+	if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
+			if (test_bit(cntr_id, d->mbm_cntr_map))
+				mbm_cntr_event_update(cntr_id, index,
+						      mon_info->mon_config);
+		}
+	}
 }
 
 /**
@@ -1805,6 +1853,7 @@ static void mbm_config_write_domain(struct rdt_resource *r,
 	mon_info.d = d;
 	mon_info.evtid = evtid;
 	mon_info.mon_config = val;
+	mon_info.r = r;
 
 	/*
 	 * Update MSR_IA32_EVT_CFG_BASE MSR on one of the CPUs in the
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 24/25] x86/resctrl: Introduce interface to list assignment states of all the groups
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (22 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 23/25] x86/resctrl: Update assignments on event configuration changes Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:40   ` Reinette Chatre
  2024-10-09 17:39 ` [PATCH v8 25/25] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
  2024-10-16  3:05 ` [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Provide the interface to list the assignment states of all the resctrl
groups in mbm_cntr_assign mode.

Example:
$mount -t resctrl resctrl /sys/fs/resctrl/
$cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=tl;1=tl;

List follows the following format:

"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

Format for specific type of groups:

- Default CTRL_MON group:
  "//<domain_id>=<flags>"

- Non-default CTRL_MON group:
  "<CTRL_MON group>//<domain_id>=<flags>"

- Child MON group of default CTRL_MON group:
  "/<MON group>/<domain_id>=<flags>"

- Child MON group of non-default CTRL_MON group:
  "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

Flags can be one of the following:
t  MBM total event is enabled
l  MBM local event is enabled
tl Both total and local MBM events are enabled
_  None of the MBM events are enabled

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Moved resctrl_mbm_event_assigned() in here as it is first used here.
    Moved rdt_last_cmd_clear() before making any call.
    Updated the commit log.
    Corrected the doc format.

v7: Renamed the interface name from 'mbm_control' to 'mbm_assign_control'
    to match 'mbm_assign_mode'.
    Removed Arch references from FS code.
    Added rdt_last_cmd_clear() before the command processing.
    Added rdtgroup_mutex before all the calls.
    Removed references of ABMC from FS code.

v6: The domain specific assignment can be determined looking at mbm_cntr_map.
    Removed rdtgroup_abmc_dom_cfg() and rdtgroup_abmc_dom_state().
    Removed the switch statement for the domain_state detection.
    Determined the flags incremently.
    Removed special handling of default group while printing..

v5: Replaced "assignment flags" with "flags".
    Changes related to mon structure.
    Changes related renaming the interface from mbm_assign_control to
    mbm_control.

v4: Added functionality to query domain specific assigment in.
    rdtgroup_abmc_dom_state().

v3: New patch.
    Addresses the feedback to provide the global assignment interface.
    https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
---
 Documentation/arch/x86/resctrl.rst     | 44 +++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 76 ++++++++++++++++++++++++++
 3 files changed, 121 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index d9574078f735..b85d3bc3e301 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -310,6 +310,50 @@ with the following files:
 	The number of monitoring counters available for assignment when the
 	architecture supports mbm_cntr_assign mode.
 
+"mbm_assign_control":
+	Reports the resctrl group and monitor status of each group.
+
+	List follows the following format:
+		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
+
+	Format for specific type of groups:
+
+	* Default CTRL_MON group:
+		"//<domain_id>=<flags>"
+
+	* Non-default CTRL_MON group:
+		"<CTRL_MON group>//<domain_id>=<flags>"
+
+	* Child MON group of default CTRL_MON group:
+		"/<MON group>/<domain_id>=<flags>"
+
+	* Child MON group of non-default CTRL_MON group:
+		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
+
+	Flags can be one of the following:
+	::
+
+	 t  MBM total event is assigned.
+	 l  MBM local event is assigned.
+	 tl Both total and local MBM events are assigned.
+	 _  None of the MBM events are assigned.
+
+	Examples:
+	::
+
+	 # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
+	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
+	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
+
+	 # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	 non_default_ctrl_mon_grp//0=tl;1=tl;
+	 non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	 //0=tl;1=tl;
+	 /child_default_mon_grp/0=tl;1=tl;
+
+	There are four resctrl groups. All the groups have total and local MBM events
+	assigned on domain 0 and 1.
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 395d99984893..fa7c77935080 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1269,6 +1269,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 			r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
 			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
 			hw_res->mbm_cntr_assign_enabled = true;
+			resctrl_file_fflags_init("mbm_assign_control", RFTYPE_MON_INFO);
 		}
 	}
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index cf2e0ad0e4f4..cf92ceb0f05e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -970,6 +970,76 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static bool resctrl_mbm_event_assigned(struct rdtgroup *rdtg,
+				       struct rdt_mon_domain *d, u32 evtid)
+{
+	int index = MBM_EVENT_ARRAY_INDEX(evtid);
+	int cntr_id = rdtg->mon.cntr_id[index];
+
+	return cntr_id != MON_CNTR_UNSET && test_bit(cntr_id, d->mbm_cntr_map);
+}
+
+static char *rdtgroup_mon_state_to_str(struct rdtgroup *rdtgrp,
+				       struct rdt_mon_domain *d, char *str)
+{
+	char *tmp = str;
+
+	/* Query the total and local event flags for the domain */
+	if (resctrl_mbm_event_assigned(rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID))
+		*tmp++ = 't';
+
+	if (resctrl_mbm_event_assigned(rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID))
+		*tmp++ = 'l';
+
+	if (tmp == str)
+		*tmp++ = '_';
+
+	*tmp = '\0';
+	return str;
+}
+
+static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
+					    struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	struct rdt_mon_domain *dom;
+	struct rdtgroup *rdtg;
+	char str[10];
+
+	mutex_lock(&rdtgroup_mutex);
+	rdt_last_cmd_clear();
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+		mutex_unlock(&rdtgroup_mutex);
+		return -EINVAL;
+	}
+
+	list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+		struct rdtgroup *crg;
+
+		seq_printf(s, "%s//", rdtg->kn->name);
+
+		list_for_each_entry(dom, &r->mon_domains, hdr.list)
+			seq_printf(s, "%d=%s;", dom->hdr.id,
+				   rdtgroup_mon_state_to_str(rdtg, dom, str));
+		seq_putc(s, '\n');
+
+		list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+				    mon.crdtgrp_list) {
+			seq_printf(s, "%s/%s/", rdtg->kn->name, crg->kn->name);
+
+			list_for_each_entry(dom, &r->mon_domains, hdr.list)
+				seq_printf(s, "%d=%s;", dom->hdr.id,
+					   rdtgroup_mon_state_to_str(crg, dom, str));
+			seq_putc(s, '\n');
+		}
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+	return 0;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -2256,6 +2326,12 @@ static struct rftype res_common_files[] = {
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= rdtgroup_num_mbm_cntrs_show,
 	},
+	{
+		.name		= "mbm_assign_control",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_mbm_assign_control_show,
+	},
 	{
 		.name		= "cpus_list",
 		.mode		= 0644,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v8 25/25] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (23 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 24/25] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
@ 2024-10-09 17:39 ` Babu Moger
  2024-10-16  3:43   ` Reinette Chatre
  2024-10-16  3:05 ` [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
  25 siblings, 1 reply; 124+ messages in thread
From: Babu Moger @ 2024-10-09 17:39 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Introduce the interface to assign MBM events in mbm_cntr_assign mode.

Events can be enabled or disabled by writing to file
/sys/fs/resctrl/info/L3_MON/mbm_assign_control

Format is similar to the list format with addition of opcode for the
assignment operation.
 "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"

Format for specific type of groups:

 * Default CTRL_MON group:
         "//<domain_id><opcode><flags>"

 * Non-default CTRL_MON group:
         "<CTRL_MON group>//<domain_id><opcode><flags>"

 * Child MON group of default CTRL_MON group:
         "/<MON group>/<domain_id><opcode><flags>"

 * Child MON group of non-default CTRL_MON group:
         "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"

Domain_id '*' will apply the flags on all the domains.

Opcode can be one of the following:

 = Update the assignment to match the flags
 + Assign a new MBM event without impacting existing assignments.
 - Unassign a MBM event from currently assigned events.

Assignment flags can be one of the following:
 t  MBM total event
 l  MBM local event
 tl Both total and local MBM events
 _  None of the MBM events. Valid only with '=' opcode. This flag cannot
    be combined with other flags.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Moved unassign as the first action during the assign modification.
    Assign none "_" takes priority. Cannot be mixed with other flags.
    Updated the documentation and .rst file format. htmldoc looks ok.

v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
    Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
    Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
    Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
    Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
    Removed ABMC reference in FS code.
    Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
    Not sure if we need to change the behaviour here. Processed them sequencially right now.
    Users have the liberty to pass the flags. Restricting it might be a problem later.

v6: Added support assign all if domain id is '*'
    Fixed the allocation of counter id if it not assigned already.

v5: Interface name changed from mbm_assign_control to mbm_control.
    Fixed opcode and flags combination.
    '=_" is valid.
    "-_" amd "+_" is not valid.
    Minor message update.
    Renamed the function with prefix - rdtgroup_.
    Corrected few documentation mistakes.
    Rebase related changes after SNC support.

v4: Added domain specific assignments. Fixed the opcode parsing.

v3: New patch.
    Addresses the feedback to provide the global assignment interface.
    https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
---
 Documentation/arch/x86/resctrl.rst     | 115 +++++++++++-
 arch/x86/kernel/cpu/resctrl/internal.h |  10 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 233 ++++++++++++++++++++++++-
 3 files changed, 356 insertions(+), 2 deletions(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index b85d3bc3e301..77bb0b095127 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -336,7 +336,8 @@ with the following files:
 	 t  MBM total event is assigned.
 	 l  MBM local event is assigned.
 	 tl Both total and local MBM events are assigned.
-	 _  None of the MBM events are assigned.
+	 _  None of the MBM events are assigned. Only works with opcode '=' for write
+	    and cannot be combined with other flags.
 
 	Examples:
 	::
@@ -354,6 +355,118 @@ with the following files:
 	There are four resctrl groups. All the groups have total and local MBM events
 	assigned on domain 0 and 1.
 
+	Assignment state can be updated by writing to the interface.
+
+	Format is similar to the list format with addition of opcode for the
+	assignment operation.
+
+		"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+	Format for each type of groups:
+
+        * Default CTRL_MON group:
+                "//<domain_id><opcode><flags>"
+
+        * Non-default CTRL_MON group:
+                "<CTRL_MON group>//<domain_id><opcode><flags>"
+
+        * Child MON group of default CTRL_MON group:
+                "/<MON group>/<domain_id><opcode><flags>"
+
+        * Child MON group of non-default CTRL_MON group:
+                "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+	Domain_id '*' will apply the flags on all the domains.
+
+	Opcode can be one of the following:
+	::
+
+	 = Update the assignment to match the MBM event.
+	 + Assign a new MBM event without impacting existing assignments.
+	 - Unassign a MBM event from currently assigned events.
+
+	Examples:
+	Initial group status:
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	  //0=tl;1=tl;
+	  /child_default_mon_grp/0=tl;1=tl;
+
+	To update the default group to assign only total MBM event on domain 0:
+	::
+
+	  # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	Assignment status after the update:
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	  //0=t;1=tl;
+	  /child_default_mon_grp/0=tl;1=tl;
+
+	To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
+	::
+
+	  # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	Assignment status after the update:
+	::
+
+	  $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	  //0=t;1=tl;
+	  /child_default_mon_grp/0=tl;1=l;
+
+	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to unassign
+	both local and total MBM events on domain 1:
+	::
+
+	  # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
+			/sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	Assignment status after the update:
+	::
+
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+	  //0=t;1=tl;
+	  /child_default_mon_grp/0=tl;1=l;
+
+	To update the default group to add a local MBM event domain 0.
+	::
+
+	  # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	Assignment status after the update:
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+	  //0=tl;1=tl;
+	  /child_default_mon_grp/0=tl;1=l;
+
+	To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all the
+	MBM events on all the domains.
+	::
+
+	  # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	Assignment status after the update:
+	::
+
+	  #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=_;1=_;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+	  //0=tl;1=tl;
+	  /child_default_mon_grp/0=tl;1=l;
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a6f40d3115f4..e8d6a430dc4a 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -74,6 +74,16 @@
  */
 #define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
 
+/*
+ * Assignment flags for mbm_cntr_assign feature
+ */
+enum {
+	ASSIGN_NONE	= 0,
+	ASSIGN_TOTAL	= BIT(QOS_L3_MBM_TOTAL_EVENT_ID),
+	ASSIGN_LOCAL	= BIT(QOS_L3_MBM_LOCAL_EVENT_ID),
+	ASSIGN_INVALID,
+};
+
 /**
  * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
  *			        aren't marked nohz_full
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index cf92ceb0f05e..6095146e3ba4 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1040,6 +1040,236 @@ static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdtgroup_str_to_mon_state(char *flag)
+{
+	int i, mon_state = ASSIGN_NONE;
+
+	for (i = 0; i < strlen(flag); i++) {
+		switch (*(flag + i)) {
+		case 't':
+			mon_state |= ASSIGN_TOTAL;
+			break;
+		case 'l':
+			mon_state |= ASSIGN_LOCAL;
+			break;
+		case '_':
+			return ASSIGN_NONE;
+		default:
+			return ASSIGN_INVALID;
+		}
+	}
+
+	return mon_state;
+}
+
+static struct rdtgroup *rdtgroup_find_grp_by_name(enum rdt_group_type rtype,
+						  char *p_grp, char *c_grp)
+{
+	struct rdtgroup *rdtg, *crg;
+
+	if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
+		return &rdtgroup_default;
+	} else if (rtype == RDTCTRL_GROUP) {
+		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
+			if (!strcmp(p_grp, rdtg->kn->name))
+				return rdtg;
+	} else if (rtype == RDTMON_GROUP) {
+		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+			if (!strcmp(p_grp, rdtg->kn->name)) {
+				list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+						    mon.crdtgrp_list) {
+					if (!strcmp(c_grp, crg->kn->name))
+						return crg;
+				}
+			}
+		}
+	}
+
+	return NULL;
+}
+
+static int rdtgroup_process_flags(struct rdt_resource *r,
+				  enum rdt_group_type rtype,
+				  char *p_grp, char *c_grp, char *tok)
+{
+	int op, mon_state, assign_state, unassign_state;
+	char *dom_str, *id_str, *op_str;
+	struct rdt_mon_domain *d;
+	struct rdtgroup *rdtgrp;
+	unsigned long dom_id;
+	int ret, found = 0;
+
+	rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
+
+	if (!rdtgrp) {
+		rdt_last_cmd_puts("Not a valid resctrl group\n");
+		return -EINVAL;
+	}
+
+next:
+	if (!tok || tok[0] == '\0')
+		return 0;
+
+	/* Start processing the strings for each domain */
+	dom_str = strim(strsep(&tok, ";"));
+
+	op_str = strpbrk(dom_str, "=+-");
+
+	if (op_str) {
+		op = *op_str;
+	} else {
+		rdt_last_cmd_puts("Missing operation =, +, - character\n");
+		return -EINVAL;
+	}
+
+	id_str = strsep(&dom_str, "=+-");
+
+	/* Check for domain id '*' which means all domains */
+	if (id_str && *id_str == '*') {
+		d = NULL;
+		goto check_state;
+	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+		rdt_last_cmd_puts("Missing domain id\n");
+		return -EINVAL;
+	}
+
+	/* Verify if the dom_id is valid */
+	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		if (d->hdr.id == dom_id) {
+			found = 1;
+			break;
+		}
+	}
+
+	if (!found) {
+		rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
+		return -EINVAL;
+	}
+
+check_state:
+	mon_state = rdtgroup_str_to_mon_state(dom_str);
+
+	if (mon_state == ASSIGN_INVALID) {
+		rdt_last_cmd_puts("Invalid assign flag\n");
+		goto out_fail;
+	}
+
+	assign_state = 0;
+	unassign_state = 0;
+
+	switch (op) {
+	case '+':
+		if (mon_state == ASSIGN_NONE) {
+			rdt_last_cmd_puts("Invalid assign opcode\n");
+			goto out_fail;
+		}
+		assign_state = mon_state;
+		break;
+	case '-':
+		if (mon_state == ASSIGN_NONE) {
+			rdt_last_cmd_puts("Invalid assign opcode\n");
+			goto out_fail;
+		}
+		unassign_state = mon_state;
+		break;
+	case '=':
+		assign_state = mon_state;
+		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
+		break;
+	default:
+		break;
+	}
+
+	if (unassign_state & ASSIGN_TOTAL) {
+		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	if (unassign_state & ASSIGN_LOCAL) {
+		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	if (assign_state & ASSIGN_TOTAL) {
+		ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	if (assign_state & ASSIGN_LOCAL) {
+		ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	goto next;
+
+out_fail:
+
+	return -EINVAL;
+}
+
+static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of,
+						 char *buf, size_t nbytes, loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	char *token, *cmon_grp, *mon_grp;
+	enum rdt_group_type rtype;
+	int ret;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	buf[nbytes - 1] = '\0';
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+		mutex_unlock(&rdtgroup_mutex);
+		cpus_read_unlock();
+		return -EINVAL;
+	}
+
+	rdt_last_cmd_clear();
+
+	while ((token = strsep(&buf, "\n")) != NULL) {
+		if (strstr(token, "/")) {
+			/*
+			 * The write command follows the following format:
+			 * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
+			 * Extract the CTRL_MON group.
+			 */
+			cmon_grp = strsep(&token, "/");
+
+			/*
+			 * Extract the MON_GROUP.
+			 * strsep returns empty string for contiguous delimiters.
+			 * Empty mon_grp here means it is a RDTCTRL_GROUP.
+			 */
+			mon_grp = strsep(&token, "/");
+
+			if (*mon_grp == '\0')
+				rtype = RDTCTRL_GROUP;
+			else
+				rtype = RDTMON_GROUP;
+
+			ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token);
+			if (ret)
+				break;
+		}
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -2328,9 +2558,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_assign_control",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= rdtgroup_mbm_assign_control_show,
+		.write		= rdtgroup_mbm_assign_control_write,
 	},
 	{
 		.name		= "cpus_list",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode
  2024-10-09 17:39 ` [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
@ 2024-10-09 22:42   ` Tony Luck
  2024-10-10 14:54     ` Moger, Babu
  2024-10-16  3:12   ` Reinette Chatre
  1 sibling, 1 reply; 124+ messages in thread
From: Tony Luck @ 2024-10-09 22:42 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

On Wed, Oct 09, 2024 at 12:39:32PM -0500, Babu Moger wrote:
> +"mbm_assign_mode":
> +	Reports the list of monitoring modes supported. The enclosed brackets
> +	indicate which mode is enabled.
> +	::
> +
> +	  cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> +	  [mbm_cntr_assign]
> +	  default
> +
> +	"mbm_cntr_assign":
> +
> +	In mbm_cntr_assign mode user-space is able to specify which control
> +	or monitor groups in resctrl should have a counter assigned using the
> +	'mbm_assign_control' file. The number of counters available is described
> +	in the 'num_mbm_cntrs' file. Changing the mode may cause all counters on
> +	a resource to reset.
> +
> +	The mode is useful on platforms which support more control and monitor
> +	groups than hardware counters, meaning 'unassigned' control or monitor
> +	groups will report 'Unavailable' or count the traffic in an unpredictable
> +	way.
> +
> +	AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
> +	enable this mode by default so that counters remain assigned even when the
> +	corresponding RMID is not in use by any processor.
> +
> +	"default":
> +
> +	By default resctrl assumes each control and monitor group has a hardware
> +	counter. Hardware that does not support 'mbm_cntr_assign' mode will still
> +	allow more control or monitor groups than 'num_rmids' to be created. In

Should that be s/num_rmids/num_mbm_cntrs/ ?

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-09 17:39 ` [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
@ 2024-10-09 22:49   ` Tony Luck
  2024-10-10 15:12     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Tony Luck @ 2024-10-09 22:49 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

On Wed, Oct 09, 2024 at 12:39:33PM -0500, Babu Moger wrote:
> +"num_mbm_cntrs":
> +	The number of monitoring counters available for assignment when the
> +	architecture supports mbm_cntr_assign mode.

It's not obvious (to me) how these counters work. When I create
a group with both local and total monitoring enabled, does that
use up two counters (even though I only used up one RMID)?

Are the counters multi-purpose. E.g. if I disable local counting
on all groups, are the freed-up counters available for use to
count total bandwidth on some additional groups?

From the examples it looks like if there are free counters
available when user does mkdir, then they will be assigned
to the new rdtgroup. If only one counter is free, does it
get assigned to local or total?

Thanks

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode
  2024-10-09 22:42   ` Tony Luck
@ 2024-10-10 14:54     ` Moger, Babu
  2024-10-10 15:07       ` Luck, Tony
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-10 14:54 UTC (permalink / raw)
  To: Tony Luck
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Tony,

Thanks for reviewing the patches.

On 10/9/24 17:42, Tony Luck wrote:
> On Wed, Oct 09, 2024 at 12:39:32PM -0500, Babu Moger wrote:
>> +"mbm_assign_mode":
>> +	Reports the list of monitoring modes supported. The enclosed brackets
>> +	indicate which mode is enabled.
>> +	::
>> +
>> +	  cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> +	  [mbm_cntr_assign]
>> +	  default
>> +
>> +	"mbm_cntr_assign":
>> +
>> +	In mbm_cntr_assign mode user-space is able to specify which control
>> +	or monitor groups in resctrl should have a counter assigned using the
>> +	'mbm_assign_control' file. The number of counters available is described
>> +	in the 'num_mbm_cntrs' file. Changing the mode may cause all counters on
>> +	a resource to reset.
>> +
>> +	The mode is useful on platforms which support more control and monitor
>> +	groups than hardware counters, meaning 'unassigned' control or monitor
>> +	groups will report 'Unavailable' or count the traffic in an unpredictable
>> +	way.
>> +
>> +	AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
>> +	enable this mode by default so that counters remain assigned even when the
>> +	corresponding RMID is not in use by any processor.
>> +
>> +	"default":
>> +
>> +	By default resctrl assumes each control and monitor group has a hardware
>> +	counter. Hardware that does not support 'mbm_cntr_assign' mode will still
>> +	allow more control or monitor groups than 'num_rmids' to be created. In
> 
> Should that be s/num_rmids/num_mbm_cntrs/ ?

It is actually num_rmids here as in default mode, num_rmid_cntrs are not
available.
-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* RE: [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode
  2024-10-10 14:54     ` Moger, Babu
@ 2024-10-10 15:07       ` Luck, Tony
  2024-10-10 15:30         ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Luck, Tony @ 2024-10-10 15:07 UTC (permalink / raw)
  To: babu.moger@amd.com
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

> >> +  By default resctrl assumes each control and monitor group has a hardware
> >> +  counter. Hardware that does not support 'mbm_cntr_assign' mode will still
> >> +  allow more control or monitor groups than 'num_rmids' to be created. In
> >
> > Should that be s/num_rmids/num_mbm_cntrs/ ?
>
> It is actually num_rmids here as in default mode, num_rmid_cntrs are not
> available.

Babu,

The code isn't working that way for me. I built & booted. Since I'm on
an Intel machine without ABMC I'm in "default" mode. But I can't make
more monitor groups that num_rmids.

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-09 22:49   ` Tony Luck
@ 2024-10-10 15:12     ` Moger, Babu
  2024-10-10 15:58       ` Luck, Tony
  2024-10-14 16:25       ` Reinette Chatre
  0 siblings, 2 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-10 15:12 UTC (permalink / raw)
  To: Tony Luck
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Tony,

On 10/9/24 17:49, Tony Luck wrote:
> On Wed, Oct 09, 2024 at 12:39:33PM -0500, Babu Moger wrote:
>> +"num_mbm_cntrs":
>> +	The number of monitoring counters available for assignment when the
>> +	architecture supports mbm_cntr_assign mode.
> 
> It's not obvious (to me) how these counters work. When I create
> a group with both local and total monitoring enabled, does that
> use up two counters (even though I only used up one RMID)?

That is correct. One RMID can be associated with multiple h/w counters.

> 
> Are the counters multi-purpose. E.g. if I disable local counting
> on all groups, are the freed-up counters available for use to
> count total bandwidth on some additional groups?

Yes. That is correct.

With 32 counters you can enable both the events on up to 16 groups.

You can also enable only one event in up to 32 groups.

> 
>>From the examples it looks like if there are free counters
> available when user does mkdir, then they will be assigned
> to the new rdtgroup. If only one counter is free, does it
> get assigned to local or total?

Right now total event takes a priority.

All good points. How about this text:

"num_mbm_cntrs":
The number of monitoring counters available for assignment when the
architecture supports mbm_cntr_assign mode.

Resctrl subsystem provides the interface to count maximum of two memory
bandwidth events per group, from a combination of available total and
local events. Keeping the current interface, users can enable a maximum of
2 counters per group. User will also have the option to enable only one
counter to the group to maximize the number of groups monitored.


-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode
  2024-10-10 15:07       ` Luck, Tony
@ 2024-10-10 15:30         ` Moger, Babu
  2024-10-10 16:02           ` Luck, Tony
  2024-10-11 22:24           ` Reinette Chatre
  0 siblings, 2 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-10 15:30 UTC (permalink / raw)
  To: Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

Hi Tony,

On 10/10/24 10:07, Luck, Tony wrote:
>>>> +  By default resctrl assumes each control and monitor group has a hardware
>>>> +  counter. Hardware that does not support 'mbm_cntr_assign' mode will still
>>>> +  allow more control or monitor groups than 'num_rmids' to be created. In
>>>
>>> Should that be s/num_rmids/num_mbm_cntrs/ ?
>>
>> It is actually num_rmids here as in default mode, num_rmid_cntrs are not
>> available.
> 
> Babu,
> 
> The code isn't working that way for me. I built & booted. Since I'm on
> an Intel machine without ABMC I'm in "default" mode. But I can't make
> more monitor groups that num_rmids.
> 

That is correct. We will have to change the text. How about?

"default":
By default resctrl assumes each control and monitor group has a hardware
counter. Hardware that does not support 'mbm_cntr_assign' mode will still
allow to create control or monitor groups up to num_rmids supported. In
that case reading the mbm_total_bytes and mbm_local_bytes may report
'Unavailable' if there is no counter associated with that group.

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* RE: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-10 15:12     ` Moger, Babu
@ 2024-10-10 15:58       ` Luck, Tony
  2024-10-10 16:57         ` Moger, Babu
  2024-10-14 16:25       ` Reinette Chatre
  1 sibling, 1 reply; 124+ messages in thread
From: Luck, Tony @ 2024-10-10 15:58 UTC (permalink / raw)
  To: babu.moger@amd.com
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

> All good points. How about this text:
>
> "num_mbm_cntrs":
> The number of monitoring counters available for assignment when the
> architecture supports mbm_cntr_assign mode.
>
> Resctrl subsystem provides the interface to count maximum of two memory
> bandwidth events per group, from a combination of available total and
> local events. Keeping the current interface, users can enable a maximum of
> 2 counters per group. User will also have the option to enable only one
> counter to the group to maximize the number of groups monitored.

Much better. Looks OK to me.

New questions:

1) Should resctrl provide a file to tell the user how many free
counters are available? They can figure it out by counting all the 'l' and 't'
in "mbm_assign_control" and subtracting that from "num_mbm_cntrs".
But that seems complex.

2) Even more so because free counters might be different per socket
if the user did some "0=tl;1=_" assignments as in one of your examples.

Maybe a UI like:

$ cat /sys/fs/resctrl/info/L3_MON/free_mbm_cntrs
0=5;1=9

-Tony



^ permalink raw reply	[flat|nested] 124+ messages in thread

* RE: [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode
  2024-10-10 15:30         ` Moger, Babu
@ 2024-10-10 16:02           ` Luck, Tony
  2024-10-11 22:24           ` Reinette Chatre
  1 sibling, 0 replies; 124+ messages in thread
From: Luck, Tony @ 2024-10-10 16:02 UTC (permalink / raw)
  To: babu.moger@amd.com
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

> That is correct. We will have to change the text. How about?
>
> "default":
> By default resctrl assumes each control and monitor group has a hardware
> counter. Hardware that does not support 'mbm_cntr_assign' mode will still
> allow to create control or monitor groups up to num_rmids supported. In
> that case reading the mbm_total_bytes and mbm_local_bytes may report
> 'Unavailable' if there is no counter associated with that group.

Babu,

Looks good to me.

Thanks

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-10 15:58       ` Luck, Tony
@ 2024-10-10 16:57         ` Moger, Babu
  2024-10-10 17:08           ` Luck, Tony
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-10 16:57 UTC (permalink / raw)
  To: Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

Hi Tony,

On 10/10/24 10:58, Luck, Tony wrote:
>> All good points. How about this text:
>>
>> "num_mbm_cntrs":
>> The number of monitoring counters available for assignment when the
>> architecture supports mbm_cntr_assign mode.
>>
>> Resctrl subsystem provides the interface to count maximum of two memory
>> bandwidth events per group, from a combination of available total and
>> local events. Keeping the current interface, users can enable a maximum of
>> 2 counters per group. User will also have the option to enable only one
>> counter to the group to maximize the number of groups monitored.
> 
> Much better. Looks OK to me.

thanks

> 
> New questions:
> 
> 1) Should resctrl provide a file to tell the user how many free
> counters are available? They can figure it out by counting all the 'l' and 't'
> in "mbm_assign_control" and subtracting that from "num_mbm_cntrs".
> But that seems complex.

We have the information already in r->mon.mbm_cntr_free_map.

How about adding an extra text while printing num_mbm_cntrs?

$ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
  Total 32, Available 16

There are all global counters, we don't differentiate between sockets just
like number of CLOSIDs.


> 
> 2) Even more so because free counters might be different per socket
> if the user did some "0=tl;1=_" assignments as in one of your examples.
> 
> Maybe a UI like:
> 
> $ cat /sys/fs/resctrl/info/L3_MON/free_mbm_cntrs
> 0=5;1=9
> 
> -Tony
> 
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* RE: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-10 16:57         ` Moger, Babu
@ 2024-10-10 17:08           ` Luck, Tony
  2024-10-10 18:36             ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Luck, Tony @ 2024-10-10 17:08 UTC (permalink / raw)
  To: babu.moger@amd.com
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

Babu,

> We have the information already in r->mon.mbm_cntr_free_map.
>
> How about adding an extra text while printing num_mbm_cntrs?
>
> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>   Total 32, Available 16

Either that or:
Total 32
Available 16

which looks fractionally simpler to parse. But I don't have strong feelings.

> There are all global counters, we don't differentiate between sockets just
> like number of CLOSIDs.

Interesting. So there is no real benefit from "0=tl;1=_" ... you are using
up two counters, just not reporting them on socket 1.

Why have this complexity in mbm_assign_control syntax?

You could have just {grouppath}/{allocation}

where allocation is one of _, t, l, tl

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-10 17:08           ` Luck, Tony
@ 2024-10-10 18:36             ` Moger, Babu
  2024-10-10 18:57               ` Luck, Tony
  2024-10-14 16:59               ` Reinette Chatre
  0 siblings, 2 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-10 18:36 UTC (permalink / raw)
  To: Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

Hi Tony,

On 10/10/24 12:08, Luck, Tony wrote:
> Babu,
> 
>> We have the information already in r->mon.mbm_cntr_free_map.
>>
>> How about adding an extra text while printing num_mbm_cntrs?
>>
>> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>>   Total 32, Available 16
> 
> Either that or:
> Total 32
> Available 16
> 

Sure. Fine with me.

> which looks fractionally simpler to parse. But I don't have strong feelings.
> 
>> There are all global counters, we don't differentiate between sockets just
>> like number of CLOSIDs.
> 
> Interesting. So there is no real benefit from "0=tl;1=_" ... you are using
> up two counters, just not reporting them on socket 1.
> 
> Why have this complexity in mbm_assign_control syntax?

Lets take an example:
$ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
Total 32
Available 30

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=tl;1=tl;

Here default group has taken two counters.

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=_;1=tl;

Here default group has two counters.
Domain 0 does not have counters applied. So, you wont be able to read the
MBM values for domain 0.
Domain 1 has both the counters applied.

Domain level application is important.

This is similar to what we have with schemata. You can change the value in
each individual domain.
#cat schemata
    MB:0=2048;1=2048;2=2048;3=2048
    L3:0=ffff;1=ffff;2=ffff;3=ffff


> 
> You could have just {grouppath}/{allocation}
> 
> where allocation is one of _, t, l, tl
> 
> -Tony

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* RE: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-10 18:36             ` Moger, Babu
@ 2024-10-10 18:57               ` Luck, Tony
  2024-10-10 20:32                 ` Moger, Babu
  2024-10-14 16:59               ` Reinette Chatre
  1 sibling, 1 reply; 124+ messages in thread
From: Luck, Tony @ 2024-10-10 18:57 UTC (permalink / raw)
  To: babu.moger@amd.com
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

> > Why have this complexity in mbm_assign_control syntax?
>
> Lets take an example:
> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> Total 32
> Available 30
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> //0=tl;1=tl;
>
> Here default group has taken two counters.
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> //0=_;1=tl;
>
> Here default group has two counters.
> Domain 0 does not have counters applied. So, you wont be able to read the
> MBM values for domain 0.
> Domain 1 has both the counters applied.

Is there some benefit from doing this? You are still using the same
number of counters. You now can't read them from domain 0.

You said the counters are system-wide. Does that mean that in 
this case:

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=tl;1=tl;

there aren't separate counts from each of domain 0 and domain 1.
I.e. if I read both I'd see the same value (sum of traffic on both domains):

$ grep . /sys/fs/resctrl/mon_data/*/*total*
/sys/fs/resctrl /mon_data/mon_L3_00/mbm_total_bytes:260039467008
/sys/fs/resctrl /mon_data/mon_L3_01/mbm_total_bytes:260039467008

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-10 18:57               ` Luck, Tony
@ 2024-10-10 20:32                 ` Moger, Babu
  2024-10-11 17:44                   ` Tony Luck
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-10 20:32 UTC (permalink / raw)
  To: Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

Hi Tony,

On 10/10/24 13:57, Luck, Tony wrote:
>>> Why have this complexity in mbm_assign_control syntax?
>>
>> Lets take an example:
>> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>> Total 32
>> Available 30
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> //0=tl;1=tl;
>>
>> Here default group has taken two counters.
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> //0=_;1=tl;
>>
>> Here default group has two counters.
>> Domain 0 does not have counters applied. So, you wont be able to read the
>> MBM values for domain 0.
>> Domain 1 has both the counters applied.
> 
> Is there some benefit from doing this? You are still using the same
> number of counters. You now can't read them from domain 0.
> 
> You said the counters are system-wide. Does that mean that in 
> this case:

Counter are system wide. We also keep track if the counters is applied to
specific domain or not. We have two bitmaps to keep track of this.

There is a cost to applying counter to the domain(IPI needs to be sent to
the domain).


> 
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> //0=tl;1=tl;
> 
> there aren't separate counts from each of domain 0 and domain 1.

Yes. There is. Each domain has its own count. I am not sure about your config.

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
 //0=_;1=tl;

# grep . /sys/fs/resctrl/mon_data/*/*total*
/sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes:Unassigned
/sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes:22976

> I.e. if I read both I'd see the same value (sum of traffic on both domains):
> 
> $ grep . /sys/fs/resctrl/mon_data/*/*total*
> /sys/fs/resctrl /mon_data/mon_L3_00/mbm_total_bytes:260039467008
> /sys/fs/resctrl /mon_data/mon_L3_01/mbm_total_bytes:260039467008
> 
> -Tony

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-09 17:39 ` [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
@ 2024-10-11 17:17   ` Tony Luck
  2024-10-11 21:17     ` Moger, Babu
  2024-10-16  3:30   ` Reinette Chatre
  1 sibling, 1 reply; 124+ messages in thread
From: Tony Luck @ 2024-10-11 17:17 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

On Wed, Oct 09, 2024 at 12:39:44PM -0500, Babu Moger wrote:
> +/*
> + * Called when a new group is created. If `mbm_cntr_assign` mode is enabled,
> + * counters are automatically assigned. Each group requires two counters:
> + * one for the total event and one for the local event. Due to the limited
> + * number of counters, assignments may fail in some cases. However, it is
> + * not necessary to fail the group creation. Users have the option to
> + * modify the assignments after the group has been created.
> + */
> +static int rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +	int ret = 0;
> +
> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
> +		return 0;
> +
> +	if (is_mbm_total_enabled())
> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> +	if (!ret && is_mbm_local_enabled())
> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);

This overwrites the value from allocating the counter for total event.

> +
> +	return ret;

But none of the callers check the return. Indeed it is ok (and
expected) that counter allocation can fail.

Just make this a "void" function and delete the "ret" local variable.
> +}
> +
> +/*
> + * Called when a group is deleted. Counters are unassigned if it was in
> + * assigned state.
> + */
> +static int rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +	int ret = 0;
> +
> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
> +		return 0;
> +
> +	if (is_mbm_total_enabled())
> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> +	if (!ret && is_mbm_local_enabled())
> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
> +
> +	return ret;

Ditto. No caller checks. Make this a void function. Dig down the
call chain here. It looks like rdtgroup_unassign_cntr_event() can't
fail, so it should be a void function too. Ditto resctrl_arch_config_cntr()
> +}

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 20/25] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
  2024-10-09 17:39 ` [PATCH v8 20/25] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
@ 2024-10-11 17:23   ` Tony Luck
  2024-10-11 21:21     ` Moger, Babu
  2024-10-16  3:31   ` Reinette Chatre
  1 sibling, 1 reply; 124+ messages in thread
From: Tony Luck @ 2024-10-11 17:23 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

On Wed, Oct 09, 2024 at 12:39:45PM -0500, Babu Moger wrote:
> @@ -576,6 +576,15 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  	evtid = md.u.evtid;
>  	r = &rdt_resources_all[resid].r_resctrl;
>  
> +	if (resctrl_arch_mbm_cntr_assign_enabled(r) && evtid != QOS_L3_OCCUP_EVENT_ID) {

Better to write this as:

	if (resctrl_arch_mbm_cntr_assign_enabled(r) && is_mbm_event(evtid)) {

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-10 20:32                 ` Moger, Babu
@ 2024-10-11 17:44                   ` Tony Luck
  2024-10-11 20:49                     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Tony Luck @ 2024-10-11 17:44 UTC (permalink / raw)
  To: Moger, Babu
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

On Thu, Oct 10, 2024 at 03:32:08PM -0500, Moger, Babu wrote:
> > # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> > //0=tl;1=tl;
> > 
> > there aren't separate counts from each of domain 0 and domain 1.
> 
> Yes. There is. Each domain has its own count. I am not sure about your config.

I've been reading the code and see better now.

There are a bunch (32) of counters per domain.

But you have a system-wide allocator. So when making
a group you may allocate counters 2 and 3 for total
and local respectively. Then configure the local instance
of counter 2 on each domain (recording that in the per-domain
bitmap) for total bandwidth. Ditto for counter 3 instances
on each domain.

If the user updates the configuration to stop counting
on domain 1. Then the per-domain bitmap is updated to
show counters 2 and 3 are no longer in use on this domain.
But those counters aren't freed (because domain 0 is still
using them).

Is there some hardware limitation that would prevent
re-using domain 1 counters 2 & 3 for some other group (RMID)?

Or is this just a s/w implementation detail because
you have a system wide allocator for counters?

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 06/25] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-10-09 17:39 ` [PATCH v8 06/25] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2024-10-11 18:14   ` Tony Luck
  2024-10-11 20:53     ` Moger, Babu
  2024-10-16  3:07   ` Reinette Chatre
  1 sibling, 1 reply; 124+ messages in thread
From: Tony Luck @ 2024-10-11 18:14 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

On Wed, Oct 09, 2024 at 12:39:31PM -0500, Babu Moger wrote:
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 3ae84c3b8e6d..43c9dc473aba 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1195,6 +1195,7 @@
>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>  #define MSR_IA32_EVT_CFG_BASE		0xc0000400
> +#define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff

Nitpick. Most of the MSRs in this file are in numerical order (within
each functional grouping). So this belongs before MSR_IA32_EVT_CFG_BASE

Same in patch 14 which adds MSR_IA32_L3_QOS_ABMC_CFG

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-11 17:44                   ` Tony Luck
@ 2024-10-11 20:49                     ` Moger, Babu
  2024-10-11 21:36                       ` Tony Luck
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-11 20:49 UTC (permalink / raw)
  To: Tony Luck, Moger, Babu
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

Hi Tony,

On 10/11/2024 12:44 PM, Tony Luck wrote:
> On Thu, Oct 10, 2024 at 03:32:08PM -0500, Moger, Babu wrote:
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>> //0=tl;1=tl;
>>>
>>> there aren't separate counts from each of domain 0 and domain 1.
>>
>> Yes. There is. Each domain has its own count. I am not sure about your config.
> 
> I've been reading the code and see better now.
> 
> There are a bunch (32) of counters per domain.
> 
> But you have a system-wide allocator. So when making
> a group you may allocate counters 2 and 3 for total
> and local respectively. Then configure the local instance
> of counter 2 on each domain (recording that in the per-domain
> bitmap) for total bandwidth. Ditto for counter 3 instances
> on each domain.

Yes. That is correct.
> 
> If the user updates the configuration to stop counting
> on domain 1. Then the per-domain bitmap is updated to
> show counters 2 and 3 are no longer in use on this domain.
> But those counters aren't freed (because domain 0 is still
> using them).

Yes. Correct.


> 
> Is there some hardware limitation that would prevent
> re-using domain 1 counters 2 & 3 for some other group (RMID)?
> 
> Or is this just a s/w implementation detail because
> you have a system wide allocator for counters?
> 

There is no hardware limitation. It is how resctrl is designed.
In case of Intel(with two sockets, 16 CLOSIDs), You can only create 16 
groups. Each group will have two domains(domain 0 for socket 0 and 
domain 1 for socket 1).

# cat schemata
     MB:0=100;1=100
     L3:0=ffff;1=ffff;


We may have to think of addressing this sometime in the future.
-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 06/25] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-10-11 18:14   ` Tony Luck
@ 2024-10-11 20:53     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-11 20:53 UTC (permalink / raw)
  To: Tony Luck, Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Tony,

On 10/11/2024 1:14 PM, Tony Luck wrote:
> On Wed, Oct 09, 2024 at 12:39:31PM -0500, Babu Moger wrote:
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index 3ae84c3b8e6d..43c9dc473aba 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1195,6 +1195,7 @@
>>   #define MSR_IA32_MBA_BW_BASE		0xc0000200
>>   #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>>   #define MSR_IA32_EVT_CFG_BASE		0xc0000400
>> +#define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
> 
> Nitpick. Most of the MSRs in this file are in numerical order (within
> each functional grouping). So this belongs before MSR_IA32_EVT_CFG_BASE
> 
> Same in patch 14 which adds MSR_IA32_L3_QOS_ABMC_CFG

Yes. Will take care of this in next revision.

-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-11 17:17   ` Tony Luck
@ 2024-10-11 21:17     ` Moger, Babu
  2024-10-11 21:33       ` Luck, Tony
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-11 21:17 UTC (permalink / raw)
  To: Tony Luck, Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Tony,

On 10/11/2024 12:17 PM, Tony Luck wrote:
> On Wed, Oct 09, 2024 at 12:39:44PM -0500, Babu Moger wrote:
>> +/*
>> + * Called when a new group is created. If `mbm_cntr_assign` mode is enabled,
>> + * counters are automatically assigned. Each group requires two counters:
>> + * one for the total event and one for the local event. Due to the limited
>> + * number of counters, assignments may fail in some cases. However, it is
>> + * not necessary to fail the group creation. Users have the option to
>> + * modify the assignments after the group has been created.
>> + */
>> +static int rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>> +{
>> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +	int ret = 0;
>> +
>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>> +		return 0;
>> +
>> +	if (is_mbm_total_enabled())
>> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> +	if (!ret && is_mbm_local_enabled())
>> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
> 
> This overwrites the value from allocating the counter for total event.

Total event and local events have two different indexes.
Can you please elaborate?

> 
>> +
>> +	return ret;
> 
> But none of the callers check the return. Indeed it is ok (and
> expected) that counter allocation can fail.
> 
> Just make this a "void" function and delete the "ret" local variable.

Comment below.
>> +}
>> +
>> +/*
>> + * Called when a group is deleted. Counters are unassigned if it was in
>> + * assigned state.
>> + */
>> +static int rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
>> +{
>> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +	int ret = 0;
>> +
>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>> +		return 0;
>> +
>> +	if (is_mbm_total_enabled())
>> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> +	if (!ret && is_mbm_local_enabled())
>> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +
>> +	return ret;
> 
> Ditto. No caller checks. Make this a void function. Dig down the
> call chain here. It looks like rdtgroup_unassign_cntr_event() can't
> fail, so it should be a void function too. Ditto resctrl_arch_config_cntr()

It was started a void function. In this case all the call sequence 
return 0. There is a possibility that other architectures can return 
failure(in arch calls resctrl_arch_config_cntr()). Keeping that in mind 
we added the check to handle the return values. Hope that helps.
Thanks
-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 20/25] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
  2024-10-11 17:23   ` Tony Luck
@ 2024-10-11 21:21     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-11 21:21 UTC (permalink / raw)
  To: Tony Luck, Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Tony,

On 10/11/2024 12:23 PM, Tony Luck wrote:
> On Wed, Oct 09, 2024 at 12:39:45PM -0500, Babu Moger wrote:
>> @@ -576,6 +576,15 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>   	evtid = md.u.evtid;
>>   	r = &rdt_resources_all[resid].r_resctrl;
>>   
>> +	if (resctrl_arch_mbm_cntr_assign_enabled(r) && evtid != QOS_L3_OCCUP_EVENT_ID) {
> 
> Better to write this as:
> 
> 	if (resctrl_arch_mbm_cntr_assign_enabled(r) && is_mbm_event(evtid)) {
> 

Sure. will do.
Thanks
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* RE: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-11 21:17     ` Moger, Babu
@ 2024-10-11 21:33       ` Luck, Tony
  2024-10-14 15:43         ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Luck, Tony @ 2024-10-11 21:33 UTC (permalink / raw)
  To: babu.moger@amd.com
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

> >> +static int rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
> >> +{
> >> +  struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> >> +  int ret = 0;
> >> +
> >> +  if (!resctrl_arch_mbm_cntr_assign_enabled(r))
> >> +          return 0;
> >> +
> >> +  if (is_mbm_total_enabled())
> >> +          ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);

Consider that this call fails. "ret" indicates failure to allocate.

> >> +
> >> +  if (!ret && is_mbm_local_enabled())
> >> +          ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);

Now this call succeeds. The failure of the previous call is forgotten as "ret" is
overwritten with the success code.

> >
> > This overwrites the value from allocating the counter for total event.
>
> Total event and local events have two different indexes.
> Can you please elaborate?

See comments above.  If you want a return code you need

	int ret_local = 0, ret_total = 0;

	if (is_mbm_total_enabled())
		ret_total = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
	if (!ret && is_mbm_local_enabled())
		ret_local = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);


	return some_function of ret_local and ret_total;

Not sure if you want to say success only if both of these calls succeeded. Or maybe if either worked?

But it all seems complicated since callers don't have to take any different action depending on whether allocation of a counter succeeds or fails.

-Tony


	

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-11 20:49                     ` Moger, Babu
@ 2024-10-11 21:36                       ` Tony Luck
  2024-10-14 16:46                         ` Reinette Chatre
  0 siblings, 1 reply; 124+ messages in thread
From: Tony Luck @ 2024-10-11 21:36 UTC (permalink / raw)
  To: babu.moger
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

On Fri, Oct 11, 2024 at 03:49:48PM -0500, Moger, Babu wrote:
> > Is there some hardware limitation that would prevent
> > re-using domain 1 counters 2 & 3 for some other group (RMID)?
> >
> > Or is this just a s/w implementation detail because
> > you have a system wide allocator for counters?
> >
>
> There is no hardware limitation. It is how resctrl is designed.
> In case of Intel(with two sockets, 16 CLOSIDs), You can only create 16
> groups. Each group will have two domains(domain 0 for socket 0 and domain 1
> for socket 1).
>
> # cat schemata
>     MB:0=100;1=100
>     L3:0=ffff;1=ffff;
>
>
> We may have to think of addressing this sometime in the future.

In this example, the hardware would support using the instances
of counters 2 & 3 on socket 1 for a different group (RMID). But
your code doesn't alllow it because the instances of counters
2 & 3 are active on socket 0.

If you had a separate counter allocation pool for each domain
you would not have this limitation. When counters 2 & 3 are
freed on domain 1, they could be allocated to the domain 1
element of some other group.

Maybe that isn't an interesting use case, so not worth doing?

But if that is the goal, then there is no benefit in having
/sys/fs/resctrl/info/L3_MON/mbm_assign_control allow different
domains to choose different counter allocation policies.

E.g. in this example from Documentation:

/child_default_mon_grp/0=tl;1=l;

This group allocated two counters (because domain 0 is counting
both total and local). Domain 1 is only counting local, but
that means a counter on domain 1 is sitting idle. It can't
be used because the matching counter is active on domain 0.

I.e. the user who chose this simply gave up being able to
read total bandwidth on domain 1, but didn't get an extra
counter in exchange for this sacrifice. That doesn't seem
like a good deal.

I see two options for improvement:

1) Implement per-domain allocation of counters. Then a counter
freed in a domain becomes available for use in that domain
for other groups.

2) Go all-in on the global counter model and simplify the
syntax of mbm_assign_control to allocate the same counters
in all domains. That would simplify the parsing code.

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode
  2024-10-10 15:30         ` Moger, Babu
  2024-10-10 16:02           ` Luck, Tony
@ 2024-10-11 22:24           ` Reinette Chatre
  2024-10-14 15:16             ` Moger, Babu
  1 sibling, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-11 22:24 UTC (permalink / raw)
  To: babu.moger, Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com



On 10/10/24 8:30 AM, Moger, Babu wrote:
> On 10/10/24 10:07, Luck, Tony wrote:
>>>>> +  By default resctrl assumes each control and monitor group has a hardware
>>>>> +  counter. Hardware that does not support 'mbm_cntr_assign' mode will still
>>>>> +  allow more control or monitor groups than 'num_rmids' to be created. In
>>>>
>>>> Should that be s/num_rmids/num_mbm_cntrs/ ?
>>>
>>> It is actually num_rmids here as in default mode, num_rmid_cntrs are not
>>> available.
>>
>> Babu,
>>
>> The code isn't working that way for me. I built & booted. Since I'm on
>> an Intel machine without ABMC I'm in "default" mode. But I can't make
>> more monitor groups that num_rmids.
>>
> 
> That is correct. We will have to change the text. How about?
> 
> "default":
> By default resctrl assumes each control and monitor group has a hardware
> counter. Hardware that does not support 'mbm_cntr_assign' mode will still

I think this is independent from whether hardware supports 'mbm_cntr_assign'
mode since a user could enable 'default' mode on hardware that supports 
'mbm_cntr_assign'. This snippet is thus more about what is meant by 'default'
mode than what is supported by hardware.

The docs already contain:
	"num_rmids":
		...
		This is the upper bound for how many "CTRL_MON" + "MON"
		groups can be created.


Neither of the 'mbm_assign_mode' options change this meaning of 'num_rmids' (i.e.
no change in how many monitor groups can be created) so mentioning it in the
'default' portion but not in the 'mbm_cntr_assign' portion may create confusion.


Perhaps it can be simplified to:
	In default mode resctrl assumes each CTRL_MON and MON group has a
	hardware counter. Reading mbm_total_bytes or mbm_local_bytes may
	report 'Unavailable' if there is no counter associated with that
	group.


> allow to create control or monitor groups up to num_rmids supported. In
> that case reading the mbm_total_bytes and mbm_local_bytes may report
> 'Unavailable' if there is no counter associated with that group.
> 

Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode
  2024-10-11 22:24           ` Reinette Chatre
@ 2024-10-14 15:16             ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-14 15:16 UTC (permalink / raw)
  To: Reinette Chatre, Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Reinette,

On 10/11/24 17:24,  wrote:
> 
> 
> On 10/10/24 8:30 AM, Moger, Babu wrote:
>> On 10/10/24 10:07, Luck, Tony wrote:
>>>>>> +  By default resctrl assumes each control and monitor group has a hardware
>>>>>> +  counter. Hardware that does not support 'mbm_cntr_assign' mode will still
>>>>>> +  allow more control or monitor groups than 'num_rmids' to be created. In
>>>>>
>>>>> Should that be s/num_rmids/num_mbm_cntrs/ ?
>>>>
>>>> It is actually num_rmids here as in default mode, num_rmid_cntrs are not
>>>> available.
>>>
>>> Babu,
>>>
>>> The code isn't working that way for me. I built & booted. Since I'm on
>>> an Intel machine without ABMC I'm in "default" mode. But I can't make
>>> more monitor groups that num_rmids.
>>>
>>
>> That is correct. We will have to change the text. How about?
>>
>> "default":
>> By default resctrl assumes each control and monitor group has a hardware
>> counter. Hardware that does not support 'mbm_cntr_assign' mode will still
> 
> I think this is independent from whether hardware supports 'mbm_cntr_assign'
> mode since a user could enable 'default' mode on hardware that supports 
> 'mbm_cntr_assign'. This snippet is thus more about what is meant by 'default'
> mode than what is supported by hardware.
> 
> The docs already contain:
> 	"num_rmids":
> 		...
> 		This is the upper bound for how many "CTRL_MON" + "MON"
> 		groups can be created.
> 
> 
> Neither of the 'mbm_assign_mode' options change this meaning of 'num_rmids' (i.e.
> no change in how many monitor groups can be created) so mentioning it in the
> 'default' portion but not in the 'mbm_cntr_assign' portion may create confusion.
> 
> 
> Perhaps it can be simplified to:
> 	In default mode resctrl assumes each CTRL_MON and MON group has a
> 	hardware counter. Reading mbm_total_bytes or mbm_local_bytes may
> 	report 'Unavailable' if there is no counter associated with that
> 	group.

Sure.

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: RE: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-11 21:33       ` Luck, Tony
@ 2024-10-14 15:43         ` Moger, Babu
  2024-10-14 16:18           ` Luck, Tony
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-14 15:43 UTC (permalink / raw)
  To: Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

Hi Tony,

On 10/11/24 16:33, Luck, Tony wrote:
>>>> +static int rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>>>> +{
>>>> +  struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>>> +  int ret = 0;
>>>> +
>>>> +  if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>>>> +          return 0;
>>>> +
>>>> +  if (is_mbm_total_enabled())
>>>> +          ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
> 
> Consider that this call fails. "ret" indicates failure to allocate.

Look at this call

      if (is_mbm_total_enabled())
                ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL,
QOS_L3_MBM_TOTAL_EVENT_ID);


If this call fails, it will return immediately.

Lets say ret = 1; (1 if for failure. 0 for success)

> 
>>>> +
>>>> +  if (!ret && is_mbm_local_enabled())
>>>> +          ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
> 
> Now this call succeeds. The failure of the previous call is forgotten as "ret" is
> overwritten with the success code.

It will not make this call if the first call fails because of this check.

        if (!ret && is_mbm_local_enabled())
                ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL,
QOS_L3_MBM_LOCAL_EVENT_ID);

        return ret;

Here if (!1) evaluates to false.

Did I miss something?


> 
>>>
>>> This overwrites the value from allocating the counter for total event.
>>
>> Total event and local events have two different indexes.
>> Can you please elaborate?
> 
> See comments above.  If you want a return code you need
> 
> 	int ret_local = 0, ret_total = 0;
> 
> 	if (is_mbm_total_enabled())
> 		ret_total = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
> 	if (!ret && is_mbm_local_enabled())
> 		ret_local = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
> 
> 
> 	return some_function of ret_local and ret_total;
> 
> Not sure if you want to say success only if both of these calls succeeded. Or maybe if either worked?
> 
> But it all seems complicated since callers don't have to take any different action depending on whether allocation of a counter succeeds or fails.
> 
> -Tony
> 
> 
> 	

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* RE: RE: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-14 15:43         ` Moger, Babu
@ 2024-10-14 16:18           ` Luck, Tony
  2024-10-14 16:35             ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Luck, Tony @ 2024-10-14 16:18 UTC (permalink / raw)
  To: babu.moger@amd.com
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com

> >>>> +  if (!ret && is_mbm_local_enabled())
> >>>> +          ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
> >
> > Now this call succeeds. The failure of the previous call is forgotten as "ret" is
> > overwritten with the success code.
>
> It will not make this call if the first call fails because of this check.
>
>         if (!ret && is_mbm_local_enabled())
>                 ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL,
> QOS_L3_MBM_LOCAL_EVENT_ID);
>
>         return ret;
>
> Here if (!1) evaluates to false.
>
> Did I miss something?

You didn't.

I missed the check for ret in the local case.

It is still the case that callers don't care about the return value.

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-10 15:12     ` Moger, Babu
  2024-10-10 15:58       ` Luck, Tony
@ 2024-10-14 16:25       ` Reinette Chatre
  2024-10-14 17:46         ` Moger, Babu
  1 sibling, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-14 16:25 UTC (permalink / raw)
  To: babu.moger, Tony Luck
  Cc: corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86, hpa,
	paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu and Tony,

On 10/10/24 8:12 AM, Moger, Babu wrote:
> 
> All good points. How about this text:
> 
> "num_mbm_cntrs":
> The number of monitoring counters available for assignment when the
> architecture supports mbm_cntr_assign mode.
> 
> Resctrl subsystem provides the interface to count maximum of two memory

subsystem -> filesystem

> bandwidth events per group, from a combination of available total and

Is this "from a combination of ..." snippet intended to hint at BMEC?

> local events. Keeping the current interface, users can enable a maximum of

What is meant by "Keeping the current interface"? Which interface? What will
"current" mean when a user reads this documentation?

> 2 counters per group. User will also have the option to enable only one

"User will also have" is talking about the future. When will this be the case?

> counter to the group to maximize the number of groups monitored.
> 
> 

I think that we need to be careful when making this documentation so specific
to the ABMC implementation. We already know that "soft-ABMC" is coming and
Peter already shared [1] that with software assignment it will not be possible
to assign counters to individual events. 

The goal of this work is to create a generic interface and this is the documentation
for it. If this documentation is created to be specific to the first implementation
it will make it difficult to use this same interface to support other
implementations.

Reinette


[1] https://lore.kernel.org/all/CALPaoCi_TBZnULHQpYns+H+30jODZvyQpUHJRDHNwjQzajrD=A@mail.gmail.com/

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: RE: RE: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-14 16:18           ` Luck, Tony
@ 2024-10-14 16:35             ` Moger, Babu
  2024-10-15  2:39               ` Reinette Chatre
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-14 16:35 UTC (permalink / raw)
  To: Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, Chatre, Reinette, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, kim.phillips@amd.com,
	lukas.bulwahn@gmail.com, seanjc@google.com, jmattson@google.com,
	leitao@debian.org, jpoimboe@kernel.org, Edgecombe, Rick P,
	kirill.shutemov@linux.intel.com, Joseph, Jithu, Huang, Kai,
	kan.liang@linux.intel.com, daniel.sneddon@linux.intel.com,
	pbonzini@redhat.com, sandipan.das@amd.com,
	ilpo.jarvinen@linux.intel.com, peternewman@google.com,
	Wieczor-Retman, Maciej, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	james.morse@arm.com



On 12/31/69 18:00, Luck, Tony wrote:
>>>>>> +  if (!ret && is_mbm_local_enabled())
>>>>>> +          ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
>>>
>>> Now this call succeeds. The failure of the previous call is forgotten as "ret" is
>>> overwritten with the success code.
>>
>> It will not make this call if the first call fails because of this check.
>>
>>         if (!ret && is_mbm_local_enabled())
>>                 ret = rdtgroup_assign_cntr_event(r, rdtgrp, NULL,
>> QOS_L3_MBM_LOCAL_EVENT_ID);
>>
>>         return ret;
>>
>> Here if (!1) evaluates to false.
>>
>> Did I miss something?
> 
> You didn't.
> 
> I missed the check for ret in the local case.

That is fine,

> 
> It is still the case that callers don't care about the return value.

That is correct.

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-11 21:36                       ` Tony Luck
@ 2024-10-14 16:46                         ` Reinette Chatre
  2024-10-14 17:20                           ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-14 16:46 UTC (permalink / raw)
  To: Tony Luck, babu.moger
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Tony,

On 10/11/24 2:36 PM, Tony Luck wrote:
> On Fri, Oct 11, 2024 at 03:49:48PM -0500, Moger, Babu wrote:
> 
> I.e. the user who chose this simply gave up being able to
> read total bandwidth on domain 1, but didn't get an extra
> counter in exchange for this sacrifice. That doesn't seem
> like a good deal.

As Babu mentioned earlier, this seems equivalent to the existing
CLOSid management. For example, if a user assigns only CPUs
from one domain to a resource group, it does not free up the
CLOSID to create a new resource group dedicated to other domain(s).

Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-10 18:36             ` Moger, Babu
  2024-10-10 18:57               ` Luck, Tony
@ 2024-10-14 16:59               ` Reinette Chatre
  2024-10-14 19:23                 ` Moger, Babu
  1 sibling, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-14 16:59 UTC (permalink / raw)
  To: babu.moger, Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Tony and Babu,

On 10/10/24 11:36 AM, Moger, Babu wrote:
> Hi Tony,
> 
> On 10/10/24 12:08, Luck, Tony wrote:
>> Babu,
>>
>>> We have the information already in r->mon.mbm_cntr_free_map.
>>>
>>> How about adding an extra text while printing num_mbm_cntrs?
>>>
>>> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>>>   Total 32, Available 16
>>
>> Either that or:
>> Total 32
>> Available 16
>>
> 
> Sure. Fine with me.

I think separate files would be easier to parse and matches the existing resctrl
interface in this regard. How about "available_mbm_cntrs"?

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-14 16:46                         ` Reinette Chatre
@ 2024-10-14 17:20                           ` Moger, Babu
  2024-10-14 17:49                             ` Luck, Tony
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-14 17:20 UTC (permalink / raw)
  To: Reinette Chatre, Tony Luck
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com



On 10/14/24 11:46,  wrote:
> Hi Tony,
> 
> On 10/11/24 2:36 PM, Tony Luck wrote:
>> On Fri, Oct 11, 2024 at 03:49:48PM -0500, Moger, Babu wrote:
>>
>> I.e. the user who chose this simply gave up being able to
>> read total bandwidth on domain 1, but didn't get an extra
>> counter in exchange for this sacrifice. That doesn't seem
>> like a good deal.
> 
> As Babu mentioned earlier, this seems equivalent to the existing
> CLOSid management. For example, if a user assigns only CPUs
> from one domain to a resource group, it does not free up the
> CLOSID to create a new resource group dedicated to other domain(s).
> 

Thanks for the confirmation here.

I was wondering if this works differently on Intel. I was trying to figure
out on 2 socket intel system if we can create two separate resctrl groups
sharing the same CLOSID (one group using CLOSID 1 on socket 0 and another
group CLOSID 1 socket 1). No. We cannot do that.

Even though hardware supports separate allocation for each domain, resctrl
design does not support that.
-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-14 16:25       ` Reinette Chatre
@ 2024-10-14 17:46         ` Moger, Babu
  2024-10-14 18:30           ` Reinette Chatre
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-14 17:46 UTC (permalink / raw)
  To: Reinette Chatre, Tony Luck
  Cc: corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86, hpa,
	paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/14/24 11:25, Reinette Chatre wrote:
> Hi Babu and Tony,
> 
> On 10/10/24 8:12 AM, Moger, Babu wrote:
>>
>> All good points. How about this text:
>>
>> "num_mbm_cntrs":
>> The number of monitoring counters available for assignment when the
>> architecture supports mbm_cntr_assign mode.
>>
>> Resctrl subsystem provides the interface to count maximum of two memory
> 
> subsystem -> filesystem

Sure.
> 
>> bandwidth events per group, from a combination of available total and
> 
> Is this "from a combination of ..." snippet intended to hint at BMEC?

No. We support 2 MBM events right now. That is why I added combination of
total and local. I can remove that text.


> 
>> local events. Keeping the current interface, users can enable a maximum of
> 
> What is meant by "Keeping the current interface"? Which interface? What will
> "current" mean when a user reads this documentation?

I meant not to change any interface to support mbm_cntrl_assign feature.

> 
>> 2 counters per group. User will also have the option to enable only one
> 
> "User will also have" is talking about the future. When will this be the case?

Again.. will have change the text here.

> 
>> counter to the group to maximize the number of groups monitored.
>>
>>
> 
> I think that we need to be careful when making this documentation so specific
> to the ABMC implementation. We already know that "soft-ABMC" is coming and
> Peter already shared [1] that with software assignment it will not be possible
> to assign counters to individual events. 
> 
> The goal of this work is to create a generic interface and this is the documentation
> for it. If this documentation is created to be specific to the first implementation
> it will make it difficult to use this same interface to support other
> implementations.
> 

Agree.

How about this?


"num_mbm_cntrs":
The number of monitoring counters available for assignment when the
architecture supports mbm_cntr_assign mode.

The resctrl filesystem allows user track up to two memory bandwidth events
per group, using a mix of total and local events. Users can enable up to 2
counters per group. There's also an option to enable just one counter per
group, which allows monitoring more groups.


-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* RE: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-14 17:20                           ` Moger, Babu
@ 2024-10-14 17:49                             ` Luck, Tony
  2024-10-14 19:21                               ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Luck, Tony @ 2024-10-14 17:49 UTC (permalink / raw)
  To: babu.moger@amd.com, Chatre, Reinette
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

> >> I.e. the user who chose this simply gave up being able to
> >> read total bandwidth on domain 1, but didn't get an extra
> >> counter in exchange for this sacrifice. That doesn't seem
> >> like a good deal.
> >
> > As Babu mentioned earlier, this seems equivalent to the existing
> > CLOSid management. For example, if a user assigns only CPUs
> > from one domain to a resource group, it does not free up the
> > CLOSID to create a new resource group dedicated to other domain(s).

I hadn't considered the case where a user is assigning CPUs to resctrl
groups instead of assigning tasks. With that context this makes sense
to me now.  Thanks.


> Thanks for the confirmation here.
>
> I was wondering if this works differently on Intel. I was trying to figure
> out on 2 socket intel system if we can create two separate resctrl groups
> sharing the same CLOSID (one group using CLOSID 1 on socket 0 and another
> group CLOSID 1 socket 1). No. We cannot do that.
>
> Even though hardware supports separate allocation for each domain, resctrl
> design does not support that.

So CLOSIDs and counters are blanket assigned across all domains. I understand
that now.

Back to my question of why complicate code and resctrl files by providing a
mechanism to enable event counters differently per-domain.

"0=tl;1=_" requires allocation of the same counters as "0=tl;1=tl" or
"0=t;1=l"

What advantage does it have over skipping the per-domain list and
just providing a single value for all domains? You clearly expect this
will be a common user request since you implemented the "*" means
apply to all domains.

-Tony



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-14 17:46         ` Moger, Babu
@ 2024-10-14 18:30           ` Reinette Chatre
  2024-10-14 18:51             ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-14 18:30 UTC (permalink / raw)
  To: babu.moger, Tony Luck
  Cc: corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86, hpa,
	paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/14/24 10:46 AM, Moger, Babu wrote:
> On 10/14/24 11:25, Reinette Chatre wrote:
>> On 10/10/24 8:12 AM, Moger, Babu wrote:
>>>
>>> All good points. How about this text:
>>>
>>> "num_mbm_cntrs":
>>> The number of monitoring counters available for assignment when the
>>> architecture supports mbm_cntr_assign mode.
>>>
>>> Resctrl subsystem provides the interface to count maximum of two memory
>>
>> subsystem -> filesystem
> 
> Sure.
>>
>>> bandwidth events per group, from a combination of available total and
>>
>> Is this "from a combination of ..." snippet intended to hint at BMEC?
> 
> No. We support 2 MBM events right now. That is why I added combination of
> total and local. I can remove that text.
> 
>>
>>> local events. Keeping the current interface, users can enable a maximum of
>>
>> What is meant by "Keeping the current interface"? Which interface? What will
>> "current" mean when a user reads this documentation?
> 
> I meant not to change any interface to support mbm_cntrl_assign feature.
> 
>>
>>> 2 counters per group. User will also have the option to enable only one
>>
>> "User will also have" is talking about the future. When will this be the case?
> 
> Again.. will have change the text here.
> 
>>
>>> counter to the group to maximize the number of groups monitored.
>>>
>>>
>>
>> I think that we need to be careful when making this documentation so specific
>> to the ABMC implementation. We already know that "soft-ABMC" is coming and
>> Peter already shared [1] that with software assignment it will not be possible
>> to assign counters to individual events. 
>>
>> The goal of this work is to create a generic interface and this is the documentation
>> for it. If this documentation is created to be specific to the first implementation
>> it will make it difficult to use this same interface to support other
>> implementations.
>>
> 
> Agree.
> 
> How about this?
> 
> 
> "num_mbm_cntrs":
> The number of monitoring counters available for assignment when the
> architecture supports mbm_cntr_assign mode.
> 
> The resctrl filesystem allows user track up to two memory bandwidth events
> per group, using a mix of total and local events. Users can enable up to 2

"a mix of" remains unclear to me since there are only two options. I think we
can be specific here.

> counters per group. There's also an option to enable just one counter per
> group, which allows monitoring more groups.
> 

How about below for the second paragraph:

	The resctrl filesystem supports tracking up to two memory bandwidth
	events per monitoring group: mbm_total_bytes and/or mbm_local_bytes.
	Up to two counters can be assigned per monitoring group, one for each
	memory bandwidth event. More monitoring groups can be tracked by
	assigning one counter per monitoring group. However, doing so limits
	memory bandwidth tracking to a single memory bandwidth event per
	monitoring group.
 
Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-14 18:30           ` Reinette Chatre
@ 2024-10-14 18:51             ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-14 18:51 UTC (permalink / raw)
  To: Reinette Chatre, Tony Luck
  Cc: corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86, hpa,
	paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/14/24 13:30, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/14/24 10:46 AM, Moger, Babu wrote:
>> On 10/14/24 11:25, Reinette Chatre wrote:
>>> On 10/10/24 8:12 AM, Moger, Babu wrote:
>>>>
>>>> All good points. How about this text:
>>>>
>>>> "num_mbm_cntrs":
>>>> The number of monitoring counters available for assignment when the
>>>> architecture supports mbm_cntr_assign mode.
>>>>
>>>> Resctrl subsystem provides the interface to count maximum of two memory
>>>
>>> subsystem -> filesystem
>>
>> Sure.
>>>
>>>> bandwidth events per group, from a combination of available total and
>>>
>>> Is this "from a combination of ..." snippet intended to hint at BMEC?
>>
>> No. We support 2 MBM events right now. That is why I added combination of
>> total and local. I can remove that text.
>>
>>>
>>>> local events. Keeping the current interface, users can enable a maximum of
>>>
>>> What is meant by "Keeping the current interface"? Which interface? What will
>>> "current" mean when a user reads this documentation?
>>
>> I meant not to change any interface to support mbm_cntrl_assign feature.
>>
>>>
>>>> 2 counters per group. User will also have the option to enable only one
>>>
>>> "User will also have" is talking about the future. When will this be the case?
>>
>> Again.. will have change the text here.
>>
>>>
>>>> counter to the group to maximize the number of groups monitored.
>>>>
>>>>
>>>
>>> I think that we need to be careful when making this documentation so specific
>>> to the ABMC implementation. We already know that "soft-ABMC" is coming and
>>> Peter already shared [1] that with software assignment it will not be possible
>>> to assign counters to individual events. 
>>>
>>> The goal of this work is to create a generic interface and this is the documentation
>>> for it. If this documentation is created to be specific to the first implementation
>>> it will make it difficult to use this same interface to support other
>>> implementations.
>>>
>>
>> Agree.
>>
>> How about this?
>>
>>
>> "num_mbm_cntrs":
>> The number of monitoring counters available for assignment when the
>> architecture supports mbm_cntr_assign mode.
>>
>> The resctrl filesystem allows user track up to two memory bandwidth events
>> per group, using a mix of total and local events. Users can enable up to 2
> 
> "a mix of" remains unclear to me since there are only two options. I think we
> can be specific here.
> 
>> counters per group. There's also an option to enable just one counter per
>> group, which allows monitoring more groups.
>>
> 
> How about below for the second paragraph:
> 
> 	The resctrl filesystem supports tracking up to two memory bandwidth
> 	events per monitoring group: mbm_total_bytes and/or mbm_local_bytes.
> 	Up to two counters can be assigned per monitoring group, one for each
> 	memory bandwidth event. More monitoring groups can be tracked by
> 	assigning one counter per monitoring group. However, doing so limits
> 	memory bandwidth tracking to a single memory bandwidth event per
> 	monitoring group.
>  

Sure. Looks good.
-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-14 17:49                             ` Luck, Tony
@ 2024-10-14 19:21                               ` Moger, Babu
  2024-10-14 19:51                                 ` Luck, Tony
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-14 19:21 UTC (permalink / raw)
  To: Luck, Tony, Chatre, Reinette
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Tony,

On 10/14/24 12:49, Luck, Tony wrote:
>>>> I.e. the user who chose this simply gave up being able to
>>>> read total bandwidth on domain 1, but didn't get an extra
>>>> counter in exchange for this sacrifice. That doesn't seem
>>>> like a good deal.
>>>
>>> As Babu mentioned earlier, this seems equivalent to the existing
>>> CLOSid management. For example, if a user assigns only CPUs
>>> from one domain to a resource group, it does not free up the
>>> CLOSID to create a new resource group dedicated to other domain(s).
> 
> I hadn't considered the case where a user is assigning CPUs to resctrl
> groups instead of assigning tasks. With that context this makes sense
> to me now.  Thanks.
> 
> 
>> Thanks for the confirmation here.
>>
>> I was wondering if this works differently on Intel. I was trying to figure
>> out on 2 socket intel system if we can create two separate resctrl groups
>> sharing the same CLOSID (one group using CLOSID 1 on socket 0 and another
>> group CLOSID 1 socket 1). No. We cannot do that.
>>
>> Even though hardware supports separate allocation for each domain, resctrl
>> design does not support that.
> 
> So CLOSIDs and counters are blanket assigned across all domains. I understand
> that now.
> 
> Back to my question of why complicate code and resctrl files by providing a
> mechanism to enable event counters differently per-domain.
> 
> "0=tl;1=_" requires allocation of the same counters as "0=tl;1=tl" or
> "0=t;1=l"

Yes. That is correct.

> 
> What advantage does it have over skipping the per-domain list and
> just providing a single value for all domains? You clearly expect this
> will be a common user request since you implemented the "*" means
> apply to all domains.
> 

We started with a global assignment by applying assignment across all the
domains initially.

But we wanted give a generic approach which allows both the options(domain
specific assignment and global assignment with '*"). It is also matches
with other managements (RMID/CLOSID management) we are doing in resctrl
right now. Also, there is an extra IPI for each domain if user is only
interested in on domain.

Some of the discussions are here.
https://lore.kernel.org/lkml/f7dac996d87b4144e4c786178a7fd3d218eaebe8.1711674410.git.babu.moger@amd.com/#r

Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-14 16:59               ` Reinette Chatre
@ 2024-10-14 19:23                 ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-14 19:23 UTC (permalink / raw)
  To: Reinette Chatre, Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Reinette,

On 10/14/24 11:59,  wrote:
> Hi Tony and Babu,
> 
> On 10/10/24 11:36 AM, Moger, Babu wrote:
>> Hi Tony,
>>
>> On 10/10/24 12:08, Luck, Tony wrote:
>>> Babu,
>>>
>>>> We have the information already in r->mon.mbm_cntr_free_map.
>>>>
>>>> How about adding an extra text while printing num_mbm_cntrs?
>>>>
>>>> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>>>>   Total 32, Available 16
>>>
>>> Either that or:
>>> Total 32
>>> Available 16
>>>
>>
>> Sure. Fine with me.
> 
> I think separate files would be easier to parse and matches the existing resctrl
> interface in this regard. How about "available_mbm_cntrs"?

Sure.

Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* RE: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-14 19:21                               ` Moger, Babu
@ 2024-10-14 19:51                                 ` Luck, Tony
  2024-10-14 20:05                                   ` Reinette Chatre
  0 siblings, 1 reply; 124+ messages in thread
From: Luck, Tony @ 2024-10-14 19:51 UTC (permalink / raw)
  To: babu.moger@amd.com, Chatre, Reinette
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

> > What advantage does it have over skipping the per-domain list and
> > just providing a single value for all domains? You clearly expect this
> > will be a common user request since you implemented the "*" means
> > apply to all domains.
> >
>
> We started with a global assignment by applying assignment across all the
> domains initially.
>
> But we wanted give a generic approach which allows both the options(domain
> specific assignment and global assignment with '*"). It is also matches
> with other managements (RMID/CLOSID management) we are doing in resctrl
> right now. Also, there is an extra IPI for each domain if user is only
> interested in on domain.
>
> Some of the discussions are here.
> https://lore.kernel.org/lkml/f7dac996d87b4144e4c786178a7fd3d218eaebe8.1711674410.git.babu.moger@amd.com/#r

My summary of that:

Peter: Complex, don't need per-domain.
Reinette: Maybe some architecture might want per-domain.

Since you seem to want to keep the flexibility for a possible future
where per-domain is needed. The "available_mbm_cntrs" file
suggested in another thread would need to list available counters
on each domain to avoid ABI problems should that future arrive.

$ cat num_mbm_counters
32

$ cat available_mbm_cntrs
0=12;1=9

Current implementation would show same number for all domains.

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-14 19:51                                 ` Luck, Tony
@ 2024-10-14 20:05                                   ` Reinette Chatre
  2024-10-14 20:32                                     ` Moger, Babu
  2024-10-24 17:29                                     ` Moger, Babu
  0 siblings, 2 replies; 124+ messages in thread
From: Reinette Chatre @ 2024-10-14 20:05 UTC (permalink / raw)
  To: Luck, Tony, babu.moger@amd.com
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Tony,

On 10/14/24 12:51 PM, Luck, Tony wrote:
>>> What advantage does it have over skipping the per-domain list and
>>> just providing a single value for all domains? You clearly expect this
>>> will be a common user request since you implemented the "*" means
>>> apply to all domains.
>>>
>>
>> We started with a global assignment by applying assignment across all the
>> domains initially.
>>
>> But we wanted give a generic approach which allows both the options(domain
>> specific assignment and global assignment with '*"). It is also matches
>> with other managements (RMID/CLOSID management) we are doing in resctrl
>> right now. Also, there is an extra IPI for each domain if user is only
>> interested in on domain.
>>
>> Some of the discussions are here.
>> https://lore.kernel.org/lkml/f7dac996d87b4144e4c786178a7fd3d218eaebe8.1711674410.git.babu.moger@amd.com/#r
> 
> My summary of that:
> 
> Peter: Complex, don't need per-domain.
> Reinette: Maybe some architecture might want per-domain.

To be specific ... we already have an architecture that supports per-domain:
AMD's ABMC. When I considered the lifetime of user interfaces (forever?) while knowing
that ABMC does indeed support per-domain counter assignment it seems a good
precaution for the user interface to support that, even if the first
implementation does not.

There are two parts to this work: (a) the new user interface
and (b) support for ABMC. I believe that the user interface has to be
flexible to support all ABMC features that users may want to take advantage of,
even if the first implementation does not enable those features. In addition,
the user interface should support future usages that we know if, "soft-ABMC"
and MPAM.

I do not think that we should require all implementations to support everything
made possible by user interface though. As I mentioned in that thread [1] I do
think that the user _interface_ needs to be flexible by supporting domain level
counter assignment, but that it may be possible that the _implementation_ only
supports assignment to '*' domain values. 

I thus do not think we should simplify the syntax of mbm_assign_control,
but I also do not think we should require that all implementations support all that
the syntax makes possible. 
 
> Since you seem to want to keep the flexibility for a possible future
> where per-domain is needed. The "available_mbm_cntrs" file
> suggested in another thread would need to list available counters
> on each domain to avoid ABI problems should that future arrive.
> 
> $ cat num_mbm_counters
> 32
> 
> $ cat available_mbm_cntrs
> 0=12;1=9

Good point.

> 
> Current implementation would show same number for all domains.
> 

Reinette

[1] https://lore.kernel.org/all/c8a23c54-237c-4ebb-9c88-39606b9ae1ab@intel.com/



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-14 20:05                                   ` Reinette Chatre
@ 2024-10-14 20:32                                     ` Moger, Babu
  2024-10-24 17:29                                     ` Moger, Babu
  1 sibling, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-14 20:32 UTC (permalink / raw)
  To: Reinette Chatre, Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Reinette/Tony,

On 10/14/24 15:05,  wrote:
> Hi Tony,
> 
> On 10/14/24 12:51 PM, Luck, Tony wrote:
>>>> What advantage does it have over skipping the per-domain list and
>>>> just providing a single value for all domains? You clearly expect this
>>>> will be a common user request since you implemented the "*" means
>>>> apply to all domains.
>>>>
>>>
>>> We started with a global assignment by applying assignment across all the
>>> domains initially.
>>>
>>> But we wanted give a generic approach which allows both the options(domain
>>> specific assignment and global assignment with '*"). It is also matches
>>> with other managements (RMID/CLOSID management) we are doing in resctrl
>>> right now. Also, there is an extra IPI for each domain if user is only
>>> interested in on domain.
>>>
>>> Some of the discussions are here.
>>> https://lore.kernel.org/lkml/f7dac996d87b4144e4c786178a7fd3d218eaebe8.1711674410.git.babu.moger@amd.com/#r
>>
>> My summary of that:
>>
>> Peter: Complex, don't need per-domain.
>> Reinette: Maybe some architecture might want per-domain.
> 
> To be specific ... we already have an architecture that supports per-domain:
> AMD's ABMC. When I considered the lifetime of user interfaces (forever?) while knowing
> that ABMC does indeed support per-domain counter assignment it seems a good
> precaution for the user interface to support that, even if the first
> implementation does not.
> 
> There are two parts to this work: (a) the new user interface
> and (b) support for ABMC. I believe that the user interface has to be
> flexible to support all ABMC features that users may want to take advantage of,
> even if the first implementation does not enable those features. In addition,
> the user interface should support future usages that we know if, "soft-ABMC"
> and MPAM.
> 
> I do not think that we should require all implementations to support everything
> made possible by user interface though. As I mentioned in that thread [1] I do
> think that the user _interface_ needs to be flexible by supporting domain level
> counter assignment, but that it may be possible that the _implementation_ only
> supports assignment to '*' domain values. 
> 
> I thus do not think we should simplify the syntax of mbm_assign_control,
> but I also do not think we should require that all implementations support all that
> the syntax makes possible. 
>  
>> Since you seem to want to keep the flexibility for a possible future
>> where per-domain is needed. The "available_mbm_cntrs" file
>> suggested in another thread would need to list available counters
>> on each domain to avoid ABI problems should that future arrive.
>>
>> $ cat num_mbm_counters
>> 32
>>
>> $ cat available_mbm_cntrs
>> 0=12;1=9
> 
> Good point.

Ok. Will add it.
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-14 16:35             ` Moger, Babu
@ 2024-10-15  2:39               ` Reinette Chatre
  2024-10-15 15:43                 ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-15  2:39 UTC (permalink / raw)
  To: babu.moger, Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Babu,

On 10/14/24 9:35 AM, Moger, Babu wrote:
> On 12/31/69 18:00, Luck, Tony wrote:
 
>>
>> It is still the case that callers don't care about the return value.
> 
> That is correct.
> 

Are you planning to change this? I think Tony has a good point that since
assignment failures do not matter it unnecessarily complicates the code to
have rdtgroup_assign_cntrs() return failure.

I also think the internals of rdtgroup_assign_cntrs() deserve a closer look.
I assume that error handling within rdtgroup_assign_cntrs() was created with
ABMC in mind. When only considering ABMC then the only reason why
rdtgroup_assign_cntr_event() could fail is if the system ran out of counters
and then indeed it makes no sense to attempt another call to rdtgroup_assign_cntr_event().

Now that the resctrl fs/arch split is clear the implementation does indeed expose
another opportunity for failure ... if the arch callback, resctrl_arch_config_cntr()
fails. It could thus be possible for the first rdtgroup_assign_cntr_event() to fail
while the second succeeds. Earlier [1], Tony suggested to, within rdtgroup_assign_cntrs(),
remove the local ret variable and have it return void. This sounds good to me.
When doing so a function comment explaining the usage will be helpful.

I also think that rdtgroup_unassign_cntrs() deserves similar scrutiny. Even more
so since I do not think that the second rdtgroup_unassign_cntr_event()
should be prevented from running if the first rdtgroup_unassign_cntr_event() fails.

Reinette

[1] https://lore.kernel.org/all/ZwldvDBjEA3TSw2k@agluck-desk3.sc.intel.com/

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-15  2:39               ` Reinette Chatre
@ 2024-10-15 15:43                 ` Moger, Babu
  2024-10-15 16:57                   ` Luck, Tony
  2024-10-15 17:18                   ` Reinette Chatre
  0 siblings, 2 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-15 15:43 UTC (permalink / raw)
  To: Reinette Chatre, Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Reinette/Tony,

On 10/14/24 21:39,  wrote:
> Hi Babu,
> 
> On 10/14/24 9:35 AM, Moger, Babu wrote:
>> On 12/31/69 18:00, Luck, Tony wrote:
>  
>>>
>>> It is still the case that callers don't care about the return value.
>>
>> That is correct.
>>
> 
> Are you planning to change this? I think Tony has a good point that since
> assignment failures do not matter it unnecessarily complicates the code to
> have rdtgroup_assign_cntrs() return failure.
> 
> I also think the internals of rdtgroup_assign_cntrs() deserve a closer look.
> I assume that error handling within rdtgroup_assign_cntrs() was created with
> ABMC in mind. When only considering ABMC then the only reason why
> rdtgroup_assign_cntr_event() could fail is if the system ran out of counters
> and then indeed it makes no sense to attempt another call to rdtgroup_assign_cntr_event().
> 
> Now that the resctrl fs/arch split is clear the implementation does indeed expose
> another opportunity for failure ... if the arch callback, resctrl_arch_config_cntr()
> fails. It could thus be possible for the first rdtgroup_assign_cntr_event() to fail
> while the second succeeds. Earlier [1], Tony suggested to, within rdtgroup_assign_cntrs(),
> remove the local ret variable and have it return void. This sounds good to me.
> When doing so a function comment explaining the usage will be helpful.
> 
> I also think that rdtgroup_unassign_cntrs() deserves similar scrutiny. Even more
> so since I do not think that the second rdtgroup_unassign_cntr_event()
> should be prevented from running if the first rdtgroup_unassign_cntr_event() fails.


Sounds fine with me. Now it will look like this below.


static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
{
  struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

 if (!resctrl_arch_mbm_cntr_assign_enabled(r))
      return;

 if (is_mbm_total_enabled())
   rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);

 if (is_mbm_local_enabled())
   rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);

}

/*
 * Called when a group is deleted. Counters are unassigned if it was in
 * assigned state.
 */
static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
{
  struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

  if (!resctrl_arch_mbm_cntr_assign_enabled(r))
       return;

 if (is_mbm_total_enabled())
 rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);

 if (is_mbm_local_enabled())
 rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);

}


> 
> Reinette
> 
> [1] https://lore.kernel.org/all/ZwldvDBjEA3TSw2k@agluck-desk3.sc.intel.com/
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* RE: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-15 15:43                 ` Moger, Babu
@ 2024-10-15 16:57                   ` Luck, Tony
  2024-10-15 17:18                   ` Reinette Chatre
  1 sibling, 0 replies; 124+ messages in thread
From: Luck, Tony @ 2024-10-15 16:57 UTC (permalink / raw)
  To: babu.moger@amd.com, Chatre, Reinette
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

> Sounds fine with me. Now it will look like this below.
>
>
> static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
> {
>   struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>
>  if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>       return;
>
>  if (is_mbm_total_enabled())
>    rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
>
>  if (is_mbm_local_enabled())
>    rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
>
> }
>
> /*
>  * Called when a group is deleted. Counters are unassigned if it was in
>  * assigned state.
>  */
> static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
> {
>   struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>
>   if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>        return;
>
>  if (is_mbm_total_enabled())
>  rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
>
>  if (is_mbm_local_enabled())
>  rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
>
> }

Much cleaner. Thanks.

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-15 15:43                 ` Moger, Babu
  2024-10-15 16:57                   ` Luck, Tony
@ 2024-10-15 17:18                   ` Reinette Chatre
  2024-10-15 20:42                     ` Moger, Babu
  1 sibling, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-15 17:18 UTC (permalink / raw)
  To: babu.moger, Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Babu,

On 10/15/24 8:43 AM, Moger, Babu wrote:
> Hi Reinette/Tony,
> 
> On 10/14/24 21:39,  wrote:
>> Hi Babu,
>>
>> On 10/14/24 9:35 AM, Moger, Babu wrote:
>>> On 12/31/69 18:00, Luck, Tony wrote:
>>  
>>>>
>>>> It is still the case that callers don't care about the return value.
>>>
>>> That is correct.
>>>
>>
>> Are you planning to change this? I think Tony has a good point that since
>> assignment failures do not matter it unnecessarily complicates the code to
>> have rdtgroup_assign_cntrs() return failure.
>>
>> I also think the internals of rdtgroup_assign_cntrs() deserve a closer look.
>> I assume that error handling within rdtgroup_assign_cntrs() was created with
>> ABMC in mind. When only considering ABMC then the only reason why
>> rdtgroup_assign_cntr_event() could fail is if the system ran out of counters
>> and then indeed it makes no sense to attempt another call to rdtgroup_assign_cntr_event().
>>
>> Now that the resctrl fs/arch split is clear the implementation does indeed expose
>> another opportunity for failure ... if the arch callback, resctrl_arch_config_cntr()
>> fails. It could thus be possible for the first rdtgroup_assign_cntr_event() to fail
>> while the second succeeds. Earlier [1], Tony suggested to, within rdtgroup_assign_cntrs(),
>> remove the local ret variable and have it return void. This sounds good to me.
>> When doing so a function comment explaining the usage will be helpful.
>>
>> I also think that rdtgroup_unassign_cntrs() deserves similar scrutiny. Even more
>> so since I do not think that the second rdtgroup_unassign_cntr_event()
>> should be prevented from running if the first rdtgroup_unassign_cntr_event() fails.
> 
> 
> Sounds fine with me. Now it will look like this below.

Thank you for considering.

> 
> 

I assume that you will keep rdtgroup_assign_cntrs() function comment? I think
it may need some small changes to go with the function now returning void ...
for example, saying "Each group *requires* two counters" and then not failing when
two counters cannot be allocated seems suspect.

For example (please feel free to improve):

	Called when a new group is created. If "mbm_cntr_assign" mode is enabled,   
	counters are automatically assigned. Each group can accommodate two counters:      
	one for the total event and one for the local event. Assignments may fail
	due to the limited number of counters. However, it is not necessary to
	fail the group creation and thus no failure is returned. Users have the
	option to modify the counter assignments after the group has been created.   

> static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
> {
>   struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> 
>  if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>       return;
> 
>  if (is_mbm_total_enabled())
>    rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
> 
>  if (is_mbm_local_enabled())
>    rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
> 
> }
> 
> /*
>  * Called when a group is deleted. Counters are unassigned if it was in
>  * assigned state.
>  */
> static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
> {
>   struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> 
>   if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>        return;
> 
>  if (is_mbm_total_enabled())
>  rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
> 
>  if (is_mbm_local_enabled())
>  rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
> 
> }

Looks good to me, thank you.

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-15 17:18                   ` Reinette Chatre
@ 2024-10-15 20:42                     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-15 20:42 UTC (permalink / raw)
  To: Reinette Chatre, Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Reinette,

On 10/15/24 12:18,  wrote:
> Hi Babu,
> 
> On 10/15/24 8:43 AM, Moger, Babu wrote:
>> Hi Reinette/Tony,
>>
>> On 10/14/24 21:39,  wrote:
>>> Hi Babu,
>>>
>>> On 10/14/24 9:35 AM, Moger, Babu wrote:
>>>> On 12/31/69 18:00, Luck, Tony wrote:
>>>  
>>>>>
>>>>> It is still the case that callers don't care about the return value.
>>>>
>>>> That is correct.
>>>>
>>>
>>> Are you planning to change this? I think Tony has a good point that since
>>> assignment failures do not matter it unnecessarily complicates the code to
>>> have rdtgroup_assign_cntrs() return failure.
>>>
>>> I also think the internals of rdtgroup_assign_cntrs() deserve a closer look.
>>> I assume that error handling within rdtgroup_assign_cntrs() was created with
>>> ABMC in mind. When only considering ABMC then the only reason why
>>> rdtgroup_assign_cntr_event() could fail is if the system ran out of counters
>>> and then indeed it makes no sense to attempt another call to rdtgroup_assign_cntr_event().
>>>
>>> Now that the resctrl fs/arch split is clear the implementation does indeed expose
>>> another opportunity for failure ... if the arch callback, resctrl_arch_config_cntr()
>>> fails. It could thus be possible for the first rdtgroup_assign_cntr_event() to fail
>>> while the second succeeds. Earlier [1], Tony suggested to, within rdtgroup_assign_cntrs(),
>>> remove the local ret variable and have it return void. This sounds good to me.
>>> When doing so a function comment explaining the usage will be helpful.
>>>
>>> I also think that rdtgroup_unassign_cntrs() deserves similar scrutiny. Even more
>>> so since I do not think that the second rdtgroup_unassign_cntr_event()
>>> should be prevented from running if the first rdtgroup_unassign_cntr_event() fails.
>>
>>
>> Sounds fine with me. Now it will look like this below.
> 
> Thank you for considering.
> 
>>
>>
> 
> I assume that you will keep rdtgroup_assign_cntrs() function comment? I think
> it may need some small changes to go with the function now returning void ...
> for example, saying "Each group *requires* two counters" and then not failing when
> two counters cannot be allocated seems suspect.
> 
> For example (please feel free to improve):
> 
> 	Called when a new group is created. If "mbm_cntr_assign" mode is enabled,   
> 	counters are automatically assigned. Each group can accommodate two counters:      
> 	one for the total event and one for the local event. Assignments may fail
> 	due to the limited number of counters. However, it is not necessary to
> 	fail the group creation and thus no failure is returned. Users have the
> 	option to modify the counter assignments after the group has been created.   
> 

Looks good. Thanks

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (24 preceding siblings ...)
  2024-10-09 17:39 ` [PATCH v8 25/25] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
@ 2024-10-16  3:05 ` Reinette Chatre
  2024-10-21 17:09   ` Moger, Babu
  25 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:05 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> 
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
> 
> Series is written such that it is easier to support other assignable
> features supported from different vendors.
> 
> The feature details are documented in the  APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> 
> The patches are based on top of commit
> 5b0c5f05fb2fe (tip/master) Merge branch into tip/master: 'x86/splitlock'
> 
> # Introduction
> 
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that
> RMIDs currently assigned to a processor will be tracked by hardware.
> The counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, there can be only limited number of
> groups that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups
> are being tracked for certain point of time. Users do not have the option
> to monitor a group or set of groups for certain period of time without
> worrying about RMID being reset in between.

"worrying about RMID being reset in between" -> "worrying about counter being
reset in between"? 

>     
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned.  The assigned RMID will be tracked by the hardware until the user
> unassigns it manually. There is no need to worry about counters being reset
> during this period. Additionally, the user can specify a bitmask identifying
> the specific bandwidth types from the given source to track with the counter.
> 
> Without ABMC enabled, monitoring will work in current 'default' mode without
> assignment option.
> 

Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 02/25] x86/resctrl: Add ABMC feature in the command line options
  2024-10-09 17:39 ` [PATCH v8 02/25] x86/resctrl: Add ABMC feature in the command line options Babu Moger
@ 2024-10-16  3:06   ` Reinette Chatre
  0 siblings, 0 replies; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:06 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> Add the command line option to enable or disable exposing the ABMC
> (Assignable Bandwidth Monitoring Counters) hardware feature to resctrl.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 04/25] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2024-10-09 17:39 ` [PATCH v8 04/25] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2024-10-16  3:06   ` Reinette Chatre
  0 siblings, 0 replies; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:06 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
> Bits Description
> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
>      Monitoring Counter ID + 1
> 
> The feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> 
> Detect the feature and number of assignable monitoring counters supported.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 06/25] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-10-09 17:39 ` [PATCH v8 06/25] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
  2024-10-11 18:14   ` Tony Luck
@ 2024-10-16  3:07   ` Reinette Chatre
  1 sibling, 0 replies; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:07 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> Add the functionality to enable/disable AMD ABMC feature.
> 
> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
> L3_QOS_EXT_CFG. When the state of ABMC is changed, the MSR needs
> to be updated on all the logical processors in the QOS Domain.
> 
> Hardware counters will reset when ABMC state is changed.
> 
> The ABMC feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

With the MSRs ordered numerically per Tony's suggestion:
|Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode
  2024-10-09 17:39 ` [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
  2024-10-09 22:42   ` Tony Luck
@ 2024-10-16  3:12   ` Reinette Chatre
  2024-10-16 15:57     ` Moger, Babu
  1 sibling, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:12 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> Introduce the interface file "mbm_assign_mode" to list monitor modes
> supported.
> 
> The "mbm_cntr_assign" mode provides the option to assign a counter to
> an RMID, event pair and monitor the bandwidth as long as it is assigned.
> 
> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
> Bandwidth Monitoring Counters) hardware feature and is enabled by default.
> 
> The "default" mode is the existing monitoring mode that works without the
> explicit counter assignment, instead relying on dynamic counter assignment
> by hardware that may result in hardware not dedicating a counter resulting
> in monitoring data reads returning "Unavailable".
> 
> Provide an interface to display the monitor mode on the system.
> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_cntr_assign]
> default
> 
> Switching the mbm_assign_mode will reset all the MBM counters of all
> resctrl groups.

Please note that this now contradicts the documentation. Perhaps this sentence
can just be dropped since there is the documentation within the patch.	


> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...

> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 30586728a4cd..e4a7d6e815f6 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -257,6 +257,40 @@ with the following files:
>  	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>  	    0=0x30;1=0x30;3=0x15;4=0x15
>  
> +"mbm_assign_mode":
> +	Reports the list of monitoring modes supported. The enclosed brackets
> +	indicate which mode is enabled.
> +	::
> +
> +	  cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> +	  [mbm_cntr_assign]
> +	  default
> +
> +	"mbm_cntr_assign":
> +
> +	In mbm_cntr_assign mode user-space is able to specify which control
> +	or monitor groups in resctrl should have a counter assigned using the

Counters cannot be assigned to control groups. How about replacing all instances
of "control and monitor groups" with "CTRL_MON and MON groups", similarly
"control or monitor groups" with "CTRL_MON or MON groups".

> +	'mbm_assign_control' file. The number of counters available is described

Looking at the rest of the doc it seems that the custom is actually to place
filenames in double quotes, like "mbm_assign_control".

> +	in the 'num_mbm_cntrs' file. Changing the mode may cause all counters on
> +	a resource to reset.
> +
> +	The mode is useful on platforms which support more control and monitor
> +	groups than hardware counters, meaning 'unassigned' control or monitor
> +	groups will report 'Unavailable' or count the traffic in an unpredictable
> +	way.

Note two more instances of "control groups" above.

Please note that the above description implies that counter assignment is per-group. For
example, "specify which control	or monitor groups in resctrl should have a counter
assigned" and "useful on platforms which support more control and monitor groups
than hardware counters". This needs to be reworked to reflect that counters
are assigned to events.

> +
> +	AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
> +	enable this mode by default so that counters remain assigned even when the
> +	corresponding RMID is not in use by any processor.

I assume this should remain RMID since this specifically talks about an x86 system?

> +
> +	"default":
> +
> +	By default resctrl assumes each control and monitor group has a hardware
> +	counter. Hardware that does not support 'mbm_cntr_assign' mode will still
> +	allow more control or monitor groups than 'num_rmids' to be created. In
> +	that case reading the mbm_total_bytes and mbm_local_bytes may report
> +	'Unavailable' if there is no counter associated with that group.
> +

I reconsidered my earlier suggestion and I believe it needs a correction since
counter assignment is not per group:

	In default mode resctrl assumes there is a hardware counter for each
	event within every CTRL_MON and MON group. Reading mbm_total_bytes or
	mbm_local_bytes may report 'Unavailable' if there is no counter associated
	with that event.

Please feel free to improve.

>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
>  		bytes) at which a previously used LLC_occupancy

The code change looks good to me.

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 09/25] x86/resctrl: Add __init attribute to dom_data_init()
  2024-10-09 17:39 ` [PATCH v8 09/25] x86/resctrl: Add __init attribute to dom_data_init() Babu Moger
@ 2024-10-16  3:13   ` Reinette Chatre
  2024-10-16 17:32     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:13 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> dom_data_init() is only called during the __init sequence.
> Add __init attribute like the rest of call sequence.
> 
> While at it, pass 'struct rdt_resource' to dom_data_init() and
> dom_data_exit() which will be used for mbm counter __init and__exit
> call sequence.

This patch needs to be split. Please move fixes to beginning of series and
move the addition of the parameter to the patch where it is first used/needed.

> 
> Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()")

For this change I think the following Fixes tag would be more accurate:
Fixes: 6a445edce657 ("x86/intel_rdt/cqm: Add RDT monitoring initialization")

I think for a complete fix of the above commit it also needs to add __init
storage class to l3_mon_evt_init().

The __init storage class is also missing from rdt_get_mon_l3_config() ...
fixing that would indeed need the Fixes tag below:
Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()"

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 10/25] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters
  2024-10-09 17:39 ` [PATCH v8 10/25] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters Babu Moger
@ 2024-10-16  3:14   ` Reinette Chatre
  2024-10-17 16:55     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:14 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> Hardware provides a set of counters when mbm_assign_mode is supported.
> These counters are assigned to the MBM monitoring events of a MON group
> that needs to be tracked. The kernel must manage and track the available
> counters.
> 
> Introduce mbm_cntr_free_map bitmap to track available counters and set
> of routines to allocate and free the counters. Move dom_data_init() after
> mbm_cntr_assign detection.

Regarding "Move dom_data_init() after mbm_cntr_assign detection." - this is
clear from the patch, please use changelog to explain *why*.

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---


> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 43 +++++++++++++++++++++++---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++++++
>  include/linux/resctrl.h                |  2 ++
>  4 files changed, 62 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 92eae4672312..99f9103a35ba 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -654,6 +654,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free);
>  void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>  void __init resctrl_file_fflags_init(const char *config,
>  				     unsigned long fflags);
> +int mbm_cntr_alloc(struct rdt_resource *r);
> +void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id);
>  void rdt_staged_configs_clear(void);
>  bool closid_allocated(unsigned int closid);
>  int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 66b06574f660..5c2a28565747 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -983,6 +983,27 @@ void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_
>  		schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
>  }
>  
> +/*
> + * Counter bitmap for tracking the available counters.
> + * 'mbm_cntr_assign' mode provides set of hardware counters for assigning
> + * RMID, event pair. Each RMID and event pair takes one hardware counter.
> + */

"counters for assigning RMID, event pair" sounds strange and it seems like the same
thing is mentioned twice.
How about:
	Bitmap tracking the available hardware counters when operating in
	"mbm_cntr_assign" mode. A hardware counter can be assigned to a
	RMID, event pair.

> +static __init unsigned long *mbm_cntrs_init(struct rdt_resource *r)
> +{
> +	r->mon.mbm_cntr_free_map = bitmap_zalloc(r->mon.num_mbm_cntrs,
> +						 GFP_KERNEL);
> +	if (r->mon.mbm_cntr_free_map)
> +		bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
> +
> +	return r->mon.mbm_cntr_free_map;
> +}
> +
> +static  __exit void mbm_cntrs_exit(struct rdt_resource *r)
> +{
> +	bitmap_free(r->mon.mbm_cntr_free_map);
> +	r->mon.mbm_cntr_free_map = NULL;
> +}
> +
>  static __init int dom_data_init(struct rdt_resource *r)
>  {
>  	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> @@ -1020,6 +1041,17 @@ static __init int dom_data_init(struct rdt_resource *r)
>  		goto out_unlock;
>  	}
>  
> +	if (r->mon.mbm_cntr_assignable && !mbm_cntrs_init(r)) {
> +		if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
> +			kfree(closid_num_dirty_rmid);
> +			closid_num_dirty_rmid = NULL;
> +		}
> +		kfree(rmid_ptrs);
> +		rmid_ptrs = NULL;
> +		err = -ENOMEM;
> +		goto out_unlock;
> +	}
> +
>  	for (i = 0; i < idx_limit; i++) {
>  		entry = &rmid_ptrs[i];
>  		INIT_LIST_HEAD(&entry->list);
> @@ -1056,6 +1088,9 @@ static void __exit dom_data_exit(struct rdt_resource *r)
>  	kfree(rmid_ptrs);
>  	rmid_ptrs = NULL;
>  
> +	if (r->mon.mbm_cntr_assignable)
> +		mbm_cntrs_exit(r);
> +
>  	mutex_unlock(&rdtgroup_mutex);
>  }
>  
> @@ -1210,10 +1245,6 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  	 */
>  	resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);
>  
> -	ret = dom_data_init(r);
> -	if (ret)
> -		return ret;
> -
>  	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
>  		u32 eax, ebx, ecx, edx;
>  
> @@ -1240,6 +1271,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  		}
>  	}
>  
> +	ret = dom_data_init(r);
> +	if (ret)
> +		return ret;
> +
>  	l3_mon_evt_init(r);
>  
>  	r->mon_capable = true;
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index c48b5450e6c2..8ffebd203c31 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -185,6 +185,25 @@ bool closid_allocated(unsigned int closid)
>  	return !test_bit(closid, &closid_free_map);
>  }
>  
> +int mbm_cntr_alloc(struct rdt_resource *r)
> +{
> +	int cntr_id;
> +
> +	cntr_id = find_first_bit(r->mon.mbm_cntr_free_map,
> +				 r->mon.num_mbm_cntrs);
> +	if (cntr_id >= r->mon.num_mbm_cntrs)
> +		return -ENOSPC;
> +
> +	__clear_bit(cntr_id, r->mon.mbm_cntr_free_map);
> +
> +	return cntr_id;
> +}
> +
> +void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id)
> +{
> +	__set_bit(cntr_id, r->mon.mbm_cntr_free_map);
> +}
> +
>  /**
>   * rdtgroup_mode_by_closid - Return mode of resource group with closid
>   * @closid: closid if the resource group
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index f11d6fdfd977..5a4d6adec974 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -187,12 +187,14 @@ enum resctrl_scope {
>   * @num_rmid:		Number of RMIDs available
>   * @num_mbm_cntrs:	Number of assignable monitoring counters
>   * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
> + * @mbm_cntr_free_map:	bitmap of free MBM counters
>   * @evt_list:		List of monitoring events
>   */

Please follow custom of existing doc and have description start with capital letter.

>  struct resctrl_mon {
>  	int			num_rmid;
>  	int			num_mbm_cntrs;
>  	bool			mbm_cntr_assignable;
> +	unsigned long		*mbm_cntr_free_map;
>  	struct list_head	evt_list;
>  };
>  

Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 11/25] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain
  2024-10-09 17:39 ` [PATCH v8 11/25] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain Babu Moger
@ 2024-10-16  3:15   ` Reinette Chatre
  0 siblings, 0 replies; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:15 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> If the BMEC (Bandwidth Monitoring Event Configuration) feature is
> supported, the bandwidth events can be configured to track specific
> events. The event configuration is domain specific. ABMC (Assignable
> Bandwidth Monitoring Counters) feature needs event configuration
> information to assign a hardware counter to an RMID. Event configurations
> are not stored in resctrl but instead always read from or written to
> hardware directly when prompted by user space.
> 
> Read the event configuration from the hardware during the domain
> initialization. Save the configuration value in struct rdt_hw_mon_domain,
> so it can be used for counter assignment.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 12/25] x86/resctrl: Remove MSR reading of event configuration value
  2024-10-09 17:39 ` [PATCH v8 12/25] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
@ 2024-10-16  3:16   ` Reinette Chatre
  2024-10-17 17:59     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:16 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> The event configuration is domain specific and initialized during domain
> initialization. The values are stored in struct rdt_hw_mon_domain.
> 
> It is not required to read the configuration register every time user asks
> for it. Use the value stored in struct rdt_hw_mon_domain instead.
> 
> Introduce resctrl_arch_mon_event_config_get() and
> resctrl_arch_mon_event_config_set() to get/set architecture domain specific
> mbm_total_cfg/mbm_local_cfg values.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...


> +void resctrl_arch_mon_event_config_set(void *info)
> +{
> +	struct mon_config_info *mon_info = info;
> +	struct rdt_hw_mon_domain *hw_dom;
> +	unsigned int index;
> +
> +	index = mon_event_config_index_get(mon_info->evtid);
> +	if (index == INVALID_CONFIG_INDEX)
> +		return;
> +
> +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> +
> +	hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
> +
> +	switch (mon_info->evtid) {
> +	case QOS_L3_OCCUP_EVENT_ID:
> +		break;

This check does no harm but I do not think it is necessary since earlier
mon_event_config_index_get() would return INVALID_CONFIG_INDEX if the
evtid is QOS_L3_OCCUP_EVENT_ID.

> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		hw_dom->mbm_total_cfg = mon_info->mon_config;
> +		break;
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		hw_dom->mbm_local_cfg =  mon_info->mon_config;

nit: unnecessary space

> +		break;
> +	}
> +}
> +

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 13/25] x86/resctrl: Introduce mbm_cntr_map to track assignable counters at domain
  2024-10-09 17:39 ` [PATCH v8 13/25] x86/resctrl: Introduce mbm_cntr_map to track assignable counters at domain Babu Moger
@ 2024-10-16  3:19   ` Reinette Chatre
  0 siblings, 0 replies; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:19 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> The MBM counters are allocated globally and assigned to an RMID, event pair
> in a resctrl group. It is tracked by mbm_cntr_free_map. Counters are
> assigned to the domain based on the user input. It needs to be tracked
> at domain level also.
> 
> Add the mbm_cntr_map bitmap in struct rdt_mon_domain to keep track of
> assignment at domain level. The global counter at mbm_cntr_free_map can
> be released when assignment at all the domains are cleared.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 14/25] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-10-09 17:39 ` [PATCH v8 14/25] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2024-10-16  3:21   ` Reinette Chatre
  2024-10-17 18:52     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:21 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as the
> counter is assigned. The bandwidth events will be tracked by the hardware
> until the user changes the configuration. Each resctrl group can configure
> maximum two counters, one for total event and one for local event.
> 
> The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh).
> Configuration is done by setting the counter id, bandwidth source (RMID)
> and bandwidth configuration supported by BMEC (Bandwidth Monitoring Event
> Configuration).
> 
> Attempts to read or write the MSR when ABMC is not enabled will result
> in a #GP(0) exception.
> 
> Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG
> (0xC000_03FDh):
> =========================================================================
> Bits 	Mnemonic	Description			Access Reset
> 							Type   Value
> =========================================================================
> 63 	CfgEn 		Configuration Enable 		R/W 	0
> 
> 62 	CtrEn 		Enable/disable counting		R/W 	0
> 
> 61:53 	– 		Reserved 			MBZ 	0
> 
> 52:48 	CtrID 		Counter Identifier		R/W	0
> 
> 47 	IsCOS		BwSrc field is a CLOSID		R/W	0
> 			(not an RMID)
> 
> 46:44 	–		Reserved			MBZ	0
> 
> 43:32	BwSrc		Bandwidth Source		R/W	0
> 			(RMID or CLOSID)
> 
> 31:0	BwType		Bandwidth configuration		R/W	0
> 			to track for this counter
> ==========================================================================
> 
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...

> ---
>  arch/x86/include/asm/msr-index.h       |  1 +
>  arch/x86/kernel/cpu/resctrl/internal.h | 33 ++++++++++++++++++++++++++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 43c9dc473aba..2c281c977342 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1196,6 +1196,7 @@
>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>  #define MSR_IA32_EVT_CFG_BASE		0xc0000400
>  #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
> +#define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
>  

As Tony mentioned, also please correct order of this MSR.

>  /* AMD-V MSRs */
>  #define MSR_VM_CR                       0xc0010114
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 86e3e188c119..de397468b945 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -602,6 +602,39 @@ union cpuid_0x10_x_edx {
>  	unsigned int full;
>  };
>  
> +/*
> + * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.

"ABMC counters are configured by writing to L3_QOS_ABMC_CFG."

> + * Reading L3_QOS_ABMC_DSC returns the configuration of the counter id
> + * specified in L3_QOS_ABMC_CFG.cntr_id.

First and only mention/use of L3_QOS_ABMC_DSC in this series. If this register
is not used then references to it can be removed.

> + * @bw_type		: Bandwidth configuration(supported by BMEC)

"configuration(supported" -> "configuration (supported" 

> + *			  tracked by the @cntr_id.
> + * @bw_src		: Bandwidth source (RMID or CLOSID).
> + * @reserved1		: Reserved.
> + * @is_clos		: @bw_src field is a CLOSID (not an RMID).
> + * @cntr_id		: Counter identifier.
> + * @reserved		: Reserved.
> + * @cntr_en		: Counting enable bit.
> + * @cfg_en		: Configuration enable bit.
> + *
> + * Configuration and counting:
> + * cfg_en=0,            : No configuration changes applied.

Can this be expanded? (sidenote: It is taking a long time to get clarity on how
to interact with hardware. These incremental cryptic fragments make it difficult
to know how to interact with the hardware.)

For example, "No configuration changes applied. Counter can be configured across
multiple writes to MSR while @cfg_en=0. Configuration applied when @cfg_en=1."

> + * cfg_en=1, cntr_en=0  : Configure cntr_id and but no counting the events.

hmmm ... still the same (""but no counting the events") strange language I
highlighted in V7 ...

I think it will make things easier to understand if similar language is used
between the descriptions of the different fields.

"Apply @cntr_id configuration but do not count events." 
 
> + * cfg_en=1, cntr_en=1  : Configure cntr_id and start counting the events.

"Apply @cntr_id configuration and start counting events." 

Can it be added here which of these settings (or combination of settings) result
in counters being reset?

> + */
> +union l3_qos_abmc_cfg {
> +	struct {
> +		unsigned long bw_type  :32,
> +			      bw_src   :12,
> +			      reserved1: 3,
> +			      is_clos  : 1,
> +			      cntr_id  : 5,
> +			      reserved : 9,
> +			      cntr_en  : 1,
> +			      cfg_en   : 1;
> +	} split;
> +	unsigned long full;
> +};
> +
>  void rdt_last_cmd_clear(void);
>  void rdt_last_cmd_puts(const char *s);
>  __printf(1, 2)

Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 15/25] x86/resctrl: Introduce cntr_id in mongroup for assignments
  2024-10-09 17:39 ` [PATCH v8 15/25] x86/resctrl: Introduce cntr_id in mongroup for assignments Babu Moger
@ 2024-10-16  3:22   ` Reinette Chatre
  2024-10-17 19:19     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:22 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> mbm_cntr_assign feature provides an option to the user to assign a counter
> to an RMID, event pair and monitor the bandwidth as long as the counter is
> assigned. There can be two counters per monitor group, one for MBM total
> event and another for MBM local event.
> 
> Introduce cntr_id to manage the assignments.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v8: Minor commit message update.
> 
> v7: Minor comment update for cntr_id.
> 
> v6: New patch.
>     Separated FS and arch bits.
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h | 7 +++++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index de397468b945..58298db9034f 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -62,6 +62,11 @@
>  /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
>  #define ABMC_ENABLE_BIT			0
>  
> +/* Maximum assignable counters per resctrl group */
> +#define MAX_CNTRS			2
> +
> +#define MON_CNTR_UNSET			U32_MAX
> +
>  /**
>   * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>   *			        aren't marked nohz_full
> @@ -231,12 +236,14 @@ enum rdtgrp_mode {
>   * @parent:			parent rdtgrp
>   * @crdtgrp_list:		child rdtgroup node list
>   * @rmid:			rmid for this rdtgroup
> + * @cntr_id:			IDs of hardware counters assigned to monitor group
>   */
>  struct mongroup {
>  	struct kernfs_node	*mon_data_kn;
>  	struct rdtgroup		*parent;
>  	struct list_head	crdtgrp_list;
>  	u32			rmid;
> +	u32			cntr_id[MAX_CNTRS];
>  };
>  
>  /**
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 610eae64b13a..03b670b95c49 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -3530,6 +3530,9 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
>  	}
>  	rdtgrp->mon.rmid = ret;
>  
> +	rdtgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
> +	rdtgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
> +
>  	ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
>  	if (ret) {
>  		rdt_last_cmd_puts("kernfs subdir error\n");
> @@ -4084,6 +4087,9 @@ static void __init rdtgroup_setup_default(void)
>  	rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
>  	rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
>  	rdtgroup_default.type = RDTCTRL_GROUP;
> +	rdtgroup_default.mon.cntr_id[0] = MON_CNTR_UNSET;
> +	rdtgroup_default.mon.cntr_id[1] = MON_CNTR_UNSET;
> +

Could these magic constants be avoided by introducing MBM_EVENT_ARRAY_INDEX here
and using it for the array index instead of "0" and "1"?

>  	INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
>  
>  	list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);

Reinette



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 16/25] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2024-10-09 17:39 ` [PATCH v8 16/25] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
@ 2024-10-16  3:23   ` Reinette Chatre
  2024-10-17 22:44     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:23 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned. The assigned RMID will be tracked by the hardware until the user
> unassigns it manually.
> 
> Counters are configured by writing to L3_QOS_ABMC_CFG MSR and
> specifying the counter id, bandwidth source, and bandwidth types.
> 
> Provide the interface to assign the counter ids to RMID.
> 
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>     Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>     Monitoring (ABMC).
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...

> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |  3 ++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 45 ++++++++++++++++++++++++++
>  2 files changed, 48 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 58298db9034f..6d4df0490186 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -705,6 +705,9 @@ int mbm_cntr_alloc(struct rdt_resource *r);
>  void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id);
>  void arch_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
>  unsigned int mon_event_config_index_get(u32 evtid);
> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> +			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
> +			     u32 cntr_id, bool assign);
>  void rdt_staged_configs_clear(void);
>  bool closid_allocated(unsigned int closid);
>  int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 03b670b95c49..4ab1a18010c9 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1853,6 +1853,51 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>  	return ret ?: nbytes;
>  }
>  
> +static void resctrl_abmc_config_one_amd(void *info)
> +{
> +	u64 *msrval = info;
> +
> +	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
> +}
> +
> +/*
> + * Send an IPI to the domain to assign the counter to RMID, event pair.
> + */
> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> +			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
> +			     u32 cntr_id, bool assign)
> +{
> +	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
> +	struct arch_mbm_state *arch_mbm;
> +
> +	abmc_cfg.split.cfg_en = 1;
> +	abmc_cfg.split.cntr_en = assign ? 1 : 0;
> +	abmc_cfg.split.cntr_id = cntr_id;
> +	abmc_cfg.split.bw_src = rmid;
> +
> +	/* Update the event configuration from the domain */
> +	if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
> +		abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
> +		arch_mbm = &hw_dom->arch_mbm_total[rmid];
> +	} else {
> +		abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
> +		arch_mbm = &hw_dom->arch_mbm_local[rmid];
> +	}
> +
> +	smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd,
> +			      &abmc_cfg, 1);
> +
> +	/*
> +	 * Reset the architectural state so that reading of hardware
> +	 * counter is not considered as an overflow in next update.
> +	 */
> +	if (arch_mbm)
> +		memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
> +

More on this later, but I do believe later code can be simplified if
reset of architectural state is done by caller. This function should
focus on just configuring the counter.

> +	return 0;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{

Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 17/25] x86/resctrl: Add the interface to assign/update counter assignment
  2024-10-09 17:39 ` [PATCH v8 17/25] x86/resctrl: Add the interface to assign/update counter assignment Babu Moger
@ 2024-10-16  3:25   ` Reinette Chatre
  2024-10-17 22:56     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:25 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> The mbm_cntr_assign mode offers several hardware counters that can be
> assigned to an RMID-event pair and monitor the bandwidth as long as it

repeated nit (to be consistent): RMID, event 

> is assigned.
> 
> Counters are managed at two levels. The global assignment is tracked
> using the mbm_cntr_free_map field in the struct resctrl_mon, while
> domain-specific assignments are tracked using the mbm_cntr_map field
> in the struct rdt_mon_domain. Allocation begins at the global level
> and is then applied individually to each domain.
> 
> Introduce an interface to allocate these counters and update the
> corresponding domains accordingly.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v8: Renamed rdtgroup_assign_cntr() to rdtgroup_assign_cntr_event().
>     Added the code to return the error if rdtgroup_assign_cntr_event fails.
>     Moved definition of MBM_EVENT_ARRAY_INDEX to resctrl/internal.h.
>     Updated typo in the comments.
> 
> v7: New patch. Moved all the FS code here.
>     Merged rdtgroup_assign_cntr and rdtgroup_alloc_cntr.
>     Adde new #define MBM_EVENT_ARRAY_INDEX.
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |  9 +++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 47 ++++++++++++++++++++++++++
>  2 files changed, 56 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 6d4df0490186..900e18aea2c4 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -67,6 +67,13 @@
>  
>  #define MON_CNTR_UNSET			U32_MAX
>  
> +/*
> + * Get the counter index for the assignable counter
> + * 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
> + * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
> + */
> +#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
> +

This can be moved to patch that introduces and initializes the array and used there.

>  /**
>   * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>   *			        aren't marked nohz_full
> @@ -708,6 +715,8 @@ unsigned int mon_event_config_index_get(u32 evtid);
>  int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>  			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
>  			     u32 cntr_id, bool assign);
> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> +			       struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>  void rdt_staged_configs_clear(void);
>  bool closid_allocated(unsigned int closid);
>  int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 4ab1a18010c9..e4f628e6fe65 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1898,6 +1898,53 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>  	return 0;
>  }
>  
> +/*
> + * Assign a hardware counter to the group.

hmmm ... counters are not assigned to groups. How about:
"Assign a hardware counter to event @evtid of group @rdtgrp"?

> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
> + * else the counter will be allocated to specific domain.

"will be allocated to" -> "will be assigned to"?

> + */
> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> +			       struct rdt_mon_domain *d, enum resctrl_event_id evtid)
> +{
> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
> +	int cntr_id = rdtgrp->mon.cntr_id[index];
> +	int ret;
> +
> +	/*
> +	 * Allocate a new counter id to the event if the counter is not
> +	 * assigned already.
> +	 */
> +	if (cntr_id == MON_CNTR_UNSET) {
> +		cntr_id = mbm_cntr_alloc(r);
> +		if (cntr_id < 0) {
> +			rdt_last_cmd_puts("Out of MBM assignable counters\n");
> +			return -ENOSPC;
> +		}
> +		rdtgrp->mon.cntr_id[index] = cntr_id;
> +	}
> +
> +	if (!d) {
> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +			ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
> +						       rdtgrp->closid, cntr_id, true);
> +			if (ret)
> +				goto out_done_assign;
> +
> +			set_bit(cntr_id, d->mbm_cntr_map);

The code pattern above is repeated four times in this work, twice in
rdtgroup_assign_cntr_event() and twice in rdtgroup_unassign_cntr_event(). This
duplication should be avoided. It can be done in a function that also resets
the architectural state.

> +		}
> +	} else {
> +		ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
> +					       rdtgrp->closid, cntr_id, true);
> +		if (ret)
> +			goto out_done_assign;
> +
> +		set_bit(cntr_id, d->mbm_cntr_map);
> +	}
> +
> +out_done_assign:

Should a newly allocated counter not be freed if it could not be configured?

> +	return ret;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{

Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 18/25] x86/resctrl: Add the interface to unassign a MBM counter
  2024-10-09 17:39 ` [PATCH v8 18/25] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
@ 2024-10-16  3:29   ` Reinette Chatre
  2024-10-17 23:11     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:29 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> The mbm_cntr_assign mode provides a limited number of hardware counters
> that can be assigned to an RMID-event pair to monitor bandwidth while
> assigned. If all counters are in use, the kernel will show an error
> message: "Out of MBM assignable counters" when a new assignment is
> requested. To make space for a new assignment, users must unassign an
> already assigned counter.
> 
> Introduce an interface that allows for the unassignment of counter IDs
> from both the group and the domain. Additionally, ensure that the global
> counter is released if it is no longer assigned to any domains.

Needs imperative tone ... "Release the global counter ..."

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...

> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |  2 +
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 56 ++++++++++++++++++++++++++
>  2 files changed, 58 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 900e18aea2c4..6f388d20fb22 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -717,6 +717,8 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>  			     u32 cntr_id, bool assign);
>  int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>  			       struct rdt_mon_domain *d, enum resctrl_event_id evtid);
> +int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> +				 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>  void rdt_staged_configs_clear(void);
>  bool closid_allocated(unsigned int closid);
>  int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e4f628e6fe65..791258adcbda 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1945,6 +1945,62 @@ int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>  	return ret;
>  }
>  
> +static bool mbm_cntr_assigned_to_domain(struct rdt_resource *r, u32 cntr_id)
> +{
> +	struct rdt_mon_domain *d;
> +
> +	list_for_each_entry(d, &r->mon_domains, hdr.list)
> +		if (test_bit(cntr_id, d->mbm_cntr_map))
> +			return 1;
> +
> +	return 0;
> +}
> +
> +/*
> + * Unassign a hardware counter from the domain and the group.

Not sure ... maybe "Unassign a hardware counter associated with @evtid from
the domain and the group."?

> + * Counter will be unassigned in all the domains if rdt_mon_domain is NULL

Please use imperative tone: "Unassign the counter from all the domains ...."

> + * else the counter will be assigned to specific domain.

copy&paste error?
"assigned to specific domain" -> "unassign from specific domain"?

> + * Global counter will be freed once it is unassigned from all the domains.

Needs imperative tone.

> + */
> +int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> +				 struct rdt_mon_domain *d, enum resctrl_event_id evtid)
> +{
> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
> +	int cntr_id = rdtgrp->mon.cntr_id[index];
> +	int ret;
> +
> +	/* Return early if the counter is unassigned already */
> +	if (cntr_id == MON_CNTR_UNSET)
> +		return 0;
> +
> +	if (!d) {
> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +			ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
> +						       rdtgrp->closid, cntr_id, false);
> +			if (ret)
> +				goto out_done_unassign;
> +
> +			clear_bit(cntr_id, d->mbm_cntr_map);
> +		}
> +	} else {
> +		ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
> +					       rdtgrp->closid, cntr_id, false);
> +		if (ret)
> +			goto out_done_unassign;
> +
> +		clear_bit(cntr_id, d->mbm_cntr_map);

Please see comment to previous patch about the duplicate snippets. Snippets can be
replaced with single function that also resets architectural state.

> +	}
> +
> +	/* Update the counter bitmap */

What is the update?

> +	if (!mbm_cntr_assigned_to_domain(r, cntr_id)) {
> +		mbm_cntr_free(r, cntr_id);
> +		rdtgrp->mon.cntr_id[index] = MON_CNTR_UNSET;
> +	}
> +
> +out_done_unassign:
> +	return ret;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{


Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-09 17:39 ` [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
  2024-10-11 17:17   ` Tony Luck
@ 2024-10-16  3:30   ` Reinette Chatre
  2024-10-18 14:22     ` Moger, Babu
  1 sibling, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:30 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> Assign/unassign counters on resctrl group creation/deletion. Two counters
> are required per group, one for MBM total event and one for MBM local
> event.
> 
> There are a limited number of counters available for assignment. If these
> counters are exhausted, the kernel will display the error message: "Out of
> MBM assignable counters". However, it is not necessary to fail the
> creation of a group due to assignment failures. Users have the flexibility
> to modify the assignments at a later time.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...

> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 64 ++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 791258adcbda..cb2c60c0319e 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c

...

>  static int rdt_get_tree(struct fs_context *fc)
>  {
>  	struct rdt_fs_context *ctx = rdt_fc2context(fc);
> @@ -2934,6 +2980,8 @@ static int rdt_get_tree(struct fs_context *fc)
>  		if (ret < 0)
>  			goto out_mongrp;
>  		rdtgroup_default.mon.mon_data_kn = kn_mondata;
> +
> +		rdtgroup_assign_cntrs(&rdtgroup_default);
>  	}
>  
>  	ret = rdt_pseudo_lock_init();
> @@ -2964,6 +3012,7 @@ static int rdt_get_tree(struct fs_context *fc)
>  out_psl:
>  	rdt_pseudo_lock_release();
>  out_mondata:
> +	rdtgroup_unassign_cntrs(&rdtgroup_default);
>  	if (resctrl_arch_mon_capable())
>  		kernfs_remove(kn_mondata);

I think I mentioned this before ... this addition belongs within the
"if (resctrl_arch_mon_capable())" to be symmetrical with where it was called from.

>  out_mongrp:
> @@ -3144,6 +3193,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
>  
>  	head = &rdtgrp->mon.crdtgrp_list;
>  	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
> +		rdtgroup_unassign_cntrs(sentry);
>  		free_rmid(sentry->closid, sentry->mon.rmid);
>  		list_del(&sentry->mon.crdtgrp_list);
>  
> @@ -3184,6 +3234,8 @@ static void rmdir_all_sub(void)
>  		cpumask_or(&rdtgroup_default.cpu_mask,
>  			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
>  
> +		rdtgroup_unassign_cntrs(rdtgrp);
> +
>  		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>  
>  		kernfs_remove(rdtgrp->kn);
> @@ -3223,6 +3275,8 @@ static void rdt_kill_sb(struct super_block *sb)
>  		resctrl_arch_disable_alloc();
>  	if (resctrl_arch_mon_capable())
>  		resctrl_arch_disable_mon();
> +
> +	rdtgroup_unassign_cntrs(&rdtgroup_default);

Unassigning counters after monitoring is completely disabled seems late. I
think this can be moved earlier to be right after the counters of all the
other groups are unassigned.

>  	resctrl_mounted = false;
>  	kernfs_kill_sb(sb);
>  	mutex_unlock(&rdtgroup_mutex);
> @@ -3814,6 +3868,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
>  		goto out_unlock;
>  	}
>  
> +	rdtgroup_assign_cntrs(rdtgrp);
> +
>  	kernfs_activate(rdtgrp->kn);
>  
>  	/*
> @@ -3858,6 +3914,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>  	if (ret)
>  		goto out_closid_free;
>  
> +	rdtgroup_assign_cntrs(rdtgrp);
> +
>  	kernfs_activate(rdtgrp->kn);
>  
>  	ret = rdtgroup_init_alloc(rdtgrp);
> @@ -3883,6 +3941,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>  out_del_list:
>  	list_del(&rdtgrp->rdtgroup_list);
>  out_rmid_free:
> +	rdtgroup_unassign_cntrs(rdtgrp);
>  	mkdir_rdt_prepare_rmid_free(rdtgrp);
>  out_closid_free:
>  	closid_free(closid);
> @@ -3953,6 +4012,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>  	update_closid_rmid(tmpmask, NULL);
>  
>  	rdtgrp->flags = RDT_DELETED;
> +
> +	rdtgroup_unassign_cntrs(rdtgrp);
> +
>  	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>  
>  	/*
> @@ -3999,6 +4061,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>  	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
>  	update_closid_rmid(tmpmask, NULL);
>  
> +	rdtgroup_unassign_cntrs(rdtgrp);
> +
>  	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>  	closid_free(rdtgrp->closid);
>  

Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 20/25] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
  2024-10-09 17:39 ` [PATCH v8 20/25] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
  2024-10-11 17:23   ` Tony Luck
@ 2024-10-16  3:31   ` Reinette Chatre
  2024-10-18 14:31     ` Moger, Babu
  1 sibling, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:31 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> In mbm_cntr_assign mode, the hardware counter should be assigned to read
> the MBM events.
> 
> Report "Unassigned" in case the user attempts to read the events without
> assigning the counter.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v8: Used MBM_EVENT_ARRAY_INDEX to get the index for the MBM event.
>     Documentation update to make the text generic.
> 
> v7: Moved the documentation under "mon_data".
>     Updated the text little bit.
> 
> v6: Added more explaination in the resctrl.rst
>     Added checks to detect "Unassigned" before reading RMID.
> 
> v5: New patch.
> ---
>  Documentation/arch/x86/resctrl.rst        | 10 ++++++++++
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 ++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 1b5c05a35793..99ee9c87952b 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -419,6 +419,16 @@ When monitoring is enabled all MON groups will also contain:
>  	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>  	where "YY" is the node number.
>  
> +	When supported the 'mbm_cntr_assign' mode allows users to assign a
> +	counter to mon_hw_id, event pair enabling bandwidth monitoring for
> +	as long as the counter remains assigned. The hardware will continue
> +	tracking the assigned mon_hw_id until the user manually unassigns
> +	it, ensuring that counters are not reset during this period. With
> +	a limited number of counters, the system may run out of assignable
> +	counters at some point. In that case, MBM event counters will return

nit: "at some point" can be dropped for clarity.

> +	"Unassigned" when the event is read. Users must manually assign a
> +	counter to read the events.
> +
>  "mon_hw_id":
>  	Available only with debug option. The identifier used by hardware
>  	for the monitor group. On x86 this is the RMID.
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 50fa1fe9a073..5a9d15b2c319 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -562,7 +562,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  	struct rdtgroup *rdtgrp;
>  	struct rdt_resource *r;
>  	union mon_data_bits md;
> -	int ret = 0;
> +	int ret = 0, index;
>  
>  	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>  	if (!rdtgrp) {
> @@ -576,6 +576,15 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  	evtid = md.u.evtid;
>  	r = &rdt_resources_all[resid].r_resctrl;
>  
> +	if (resctrl_arch_mbm_cntr_assign_enabled(r) && evtid != QOS_L3_OCCUP_EVENT_ID) {
> +		index = MBM_EVENT_ARRAY_INDEX(evtid);
> +		if (index != INVALID_CONFIG_INDEX &&
> +		    rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET) {
> +			rr.err = -ENOENT;
> +			goto checkresult;
> +		}
> +	}
> +
>  	if (md.u.sum) {
>  		/*
>  		 * This file requires summing across all domains that share
> @@ -613,6 +622,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  		seq_puts(m, "Error\n");
>  	else if (rr.err == -EINVAL)
>  		seq_puts(m, "Unavailable\n");
> +	else if (rr.err == -ENOENT)
> +		seq_puts(m, "Unassigned\n");
>  	else
>  		seq_printf(m, "%llu\n", rr.val);
>  

Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 21/25] x86/resctrl: Introduce the interface to switch between monitor modes
  2024-10-09 17:39 ` [PATCH v8 21/25] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
@ 2024-10-16  3:36   ` Reinette Chatre
  2024-10-18 15:13     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:36 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> Introduce interface to switch between mbm_cntr_assign and default modes.
> 
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_cntr_assign]
> default
> 
> To enable the "mbm_cntr_assign" mode:
> $ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 
> To enable the default monitoring mode:
> $ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 
> MBM event counters will reset when mbm_assign_mode is changed.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...

> ---
>  Documentation/arch/x86/resctrl.rst     | 15 ++++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 75 +++++++++++++++++++++++++-
>  2 files changed, 89 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 99ee9c87952b..d9574078f735 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -291,6 +291,21 @@ with the following files:
>  	that case reading the mbm_total_bytes and mbm_local_bytes may report
>  	'Unavailable' if there is no counter associated with that group.
>  
> +	* To enable "mbm_cntr_assign" mode:
> +	  ::
> +
> +	    # echo  "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode

extra spaces

> +
> +	* To enable default monitoring mode:
> +	  ::
> +
> +	    # echo  "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode

extra spaces

> +
> +	The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated counters

I did ask you not to copy the text verbatim
https://lore.kernel.org/all/b38c93bf-4650-45d1-9aca-8b4c4d425886@intel.com/

> +	may reset when the mode is changed. Moving to mbm_cntr_assign mode will
> +	require users to assign the counters to the events. Otherwise, the MBM

"will require" -> "require"

> +	event counters will return "Unassigned" when read.
> +
>  "num_mbm_cntrs":
>  	The number of monitoring counters available for assignment when the
>  	architecture supports mbm_cntr_assign mode.
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index cb2c60c0319e..88eda3cf5c82 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -888,6 +888,78 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static void mbm_cntr_reset(struct rdt_resource *r)
> +{
> +	struct rdtgroup *prgrp, *crgrp;
> +	struct rdt_mon_domain *dom;
> +
> +	/*
> +	 * Hardware counters will reset after switching the monitor mode.
> +	 * Reset the architectural state so that reading of hardware
> +	 * counter is not considered as an overflow in the next update.
> +	 * Also reset the domain counter bitmap.
> +	 */
> +	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
> +		bitmap_zero(dom->mbm_cntr_map, r->mon.num_mbm_cntrs);
> +		resctrl_arch_reset_rmid_all(r, dom);
> +	}
> +
> +	/* Reset global MBM counter map */
> +	bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
> +
> +	/* Reset the cntr_id's for all the monitor groups */
> +	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> +		prgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
> +		prgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
> +		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list,
> +				    mon.crdtgrp_list) {
> +			crgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
> +			crgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
> +		}

Please use MBM_EVENT_ARRAY_INDEX

> +	}
> +}
> +
> +static ssize_t rdtgroup_mbm_assign_mode_write(struct kernfs_open_file *of,
> +					      char *buf, size_t nbytes, loff_t off)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	int ret = 0;
> +	bool enable;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +
> +	buf[nbytes - 1] = '\0';
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +
> +	if (!strcmp(buf, "default")) {
> +		enable = 0;
> +	} else if (!strcmp(buf, "mbm_cntr_assign")) {
> +		enable = 1;
> +	} else {
> +		ret = -EINVAL;
> +		rdt_last_cmd_puts("Unsupported assign mode\n");
> +		goto write_exit;
> +	}

Please keep two things in mind:
* this file is always accessible, whether platform supports assignable
  counters or not.
* this is resctrl fs code.

So, considering above, how should user interpret the "Unsupported assign mode"?
Shouldn't it also return this error if a user attempts to enable
"mbm_cntr_assign" on a platform that does not support this mode?

> +
> +	if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {

resctrl_arch_mbm_cntr_assign_enabled() returns true if mbm_cntr_assign
mode is enabled, but when it returns false it could mean different things:
platform supports mbm_cntr_assign mode, but it is disabled, or platform
does not support mbm_cntr_assign mode.

resctrl fs should not rely on all archs to duplicate the all the checking done
in resctrl_arch_mbm_cntr_assign_set(). It should never ask arch to enable a mode
that it knows the platform is not capable of.

> +		ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
> +		if (!ret)
> +			mbm_cntr_reset(r);
> +	}
> +
> +write_exit:
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>  static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>  				       struct seq_file *s, void *v)
>  {
> @@ -2115,9 +2187,10 @@ static struct rftype res_common_files[] = {
>  	},
>  	{
>  		.name		= "mbm_assign_mode",
> -		.mode		= 0444,
> +		.mode		= 0644,
>  		.kf_ops		= &rdtgroup_kf_single_ops,
>  		.seq_show	= rdtgroup_mbm_assign_mode_show,
> +		.write		= rdtgroup_mbm_assign_mode_write,
>  		.fflags		= RFTYPE_MON_INFO,
>  	},
>  	{

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 23/25] x86/resctrl: Update assignments on event configuration changes
  2024-10-09 17:39 ` [PATCH v8 23/25] x86/resctrl: Update assignments on event configuration changes Babu Moger
@ 2024-10-16  3:40   ` Reinette Chatre
  2024-10-18 15:50     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:40 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> Users can modify the configuration of assignable events. Whenever the
> event configuration is updated, MBM assignments must be revised across
> all monitor groups within the impacted domains.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...

> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 49 ++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index f890d294e002..cf2e0ad0e4f4 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1669,6 +1669,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
>  }
>  
>  struct mon_config_info {
> +	struct rdt_resource *r;
>  	struct rdt_mon_domain *d;
>  	u32 evtid;
>  	u32 mon_config;
> @@ -1694,11 +1695,46 @@ u32 resctrl_arch_mon_event_config_get(struct rdt_mon_domain *d,
>  	return INVALID_CONFIG_VALUE;
>  }
>  
> +static void mbm_cntr_event_update(int cntr_id, unsigned int index, u32 val)
> +{
> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
> +	struct rdtgroup *prgrp, *crgrp;
> +	int update = 0;
> +
> +	/* Check if the cntr_id is associated to the event type updated */
> +	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> +		if (prgrp->mon.cntr_id[index] == cntr_id) {
> +			abmc_cfg.split.bw_src = prgrp->mon.rmid;
> +			update = 1;
> +			goto out_update;
> +		}
> +		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) {
> +			if (crgrp->mon.cntr_id[index] == cntr_id) {
> +				abmc_cfg.split.bw_src = crgrp->mon.rmid;
> +				update = 1;
> +				goto out_update;
> +			}
> +		}

This code looks like it is better suited for resctrl fs. Note that
after the arch fs split struct rdtgroup is private to resctrl fs.

> +	}
> +
> +out_update:
> +	if (update) {
> +		abmc_cfg.split.cfg_en = 1;
> +		abmc_cfg.split.cntr_en = 1;
> +		abmc_cfg.split.cntr_id = cntr_id;
> +		abmc_cfg.split.bw_type = val;
> +		wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg.full);
> +	}
> +}
> +
>  void resctrl_arch_mon_event_config_set(void *info)
>  {
>  	struct mon_config_info *mon_info = info;
> +	struct rdt_mon_domain *d = mon_info->d;
> +	struct rdt_resource *r = mon_info->r;
>  	struct rdt_hw_mon_domain *hw_dom;
>  	unsigned int index;
> +	int cntr_id;
>  
>  	index = mon_event_config_index_get(mon_info->evtid);
>  	if (index == INVALID_CONFIG_INDEX)
> @@ -1718,6 +1754,18 @@ void resctrl_arch_mon_event_config_set(void *info)
>  		hw_dom->mbm_local_cfg =  mon_info->mon_config;
>  		break;
>  	}
> +
> +	/*
> +	 * Update the assignment if the domain has the cntr_id's assigned
> +	 * to event type updated.
> +	 */
> +	if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
> +		for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
> +			if (test_bit(cntr_id, d->mbm_cntr_map))
> +				mbm_cntr_event_update(cntr_id, index,
> +						      mon_info->mon_config);
> +		}
> +	}
>  }
>  
>  /**
> @@ -1805,6 +1853,7 @@ static void mbm_config_write_domain(struct rdt_resource *r,
>  	mon_info.d = d;
>  	mon_info.evtid = evtid;
>  	mon_info.mon_config = val;
> +	mon_info.r = r;
>  
>  	/*
>  	 * Update MSR_IA32_EVT_CFG_BASE MSR on one of the CPUs in the

If I understand correctly, mbm_config_write_domain() paints itself into a corner by
calling arch code via IPI. As seen above it needs resctrl help to get all the information
and doing so from the arch helper is not appropriate.

How about calling a resctrl fs helper via IPI instead? For example:

resctrl_mon_event_config_set() {

	resctrl_arch_mon_event_config_set();

	if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
		for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
			if (test_bit(cntr_id, d->mbm_cntr_map)) {
				/* determine rmid */
				resctrl_arch_config_cntr()
			}
		}
	}
}


mbm_config_write_domain() {

	...
	smp_call_function_any(&d->hdr.cpu_mask, resctrl_mon_event_config_set, ...)
	...

}

By removing reset of arch state from resctrl_arch_config_cntr() this works well with the
resctrl_arch_reset_rmid_all() that is done from mbm_config_write_domain().
Even though resctrl_arch_config_cntr() contains a smp_call_function_any() it should
already be running on CPU in mask and thus should just run on local CPU.

Reinette



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 24/25] x86/resctrl: Introduce interface to list assignment states of all the groups
  2024-10-09 17:39 ` [PATCH v8 24/25] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
@ 2024-10-16  3:40   ` Reinette Chatre
  2024-10-21 14:56     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:40 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> Provide the interface to list the assignment states of all the resctrl
> groups in mbm_cntr_assign mode.
> 
> Example:
> $mount -t resctrl resctrl /sys/fs/resctrl/
> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> //0=tl;1=tl;
> 
> List follows the following format:
> 
> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> 
> Format for specific type of groups:
> 
> - Default CTRL_MON group:
>   "//<domain_id>=<flags>"
> 
> - Non-default CTRL_MON group:
>   "<CTRL_MON group>//<domain_id>=<flags>"
> 
> - Child MON group of default CTRL_MON group:
>   "/<MON group>/<domain_id>=<flags>"
> 
> - Child MON group of non-default CTRL_MON group:
>   "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> 
> Flags can be one of the following:
> t  MBM total event is enabled
> l  MBM local event is enabled
> tl Both total and local MBM events are enabled
> _  None of the MBM events are enabled
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v8: Moved resctrl_mbm_event_assigned() in here as it is first used here.
>     Moved rdt_last_cmd_clear() before making any call.
>     Updated the commit log.
>     Corrected the doc format.
> 
> v7: Renamed the interface name from 'mbm_control' to 'mbm_assign_control'
>     to match 'mbm_assign_mode'.
>     Removed Arch references from FS code.
>     Added rdt_last_cmd_clear() before the command processing.
>     Added rdtgroup_mutex before all the calls.
>     Removed references of ABMC from FS code.
> 
> v6: The domain specific assignment can be determined looking at mbm_cntr_map.
>     Removed rdtgroup_abmc_dom_cfg() and rdtgroup_abmc_dom_state().
>     Removed the switch statement for the domain_state detection.
>     Determined the flags incremently.
>     Removed special handling of default group while printing..
> 
> v5: Replaced "assignment flags" with "flags".
>     Changes related to mon structure.
>     Changes related renaming the interface from mbm_assign_control to
>     mbm_control.
> 
> v4: Added functionality to query domain specific assigment in.
>     rdtgroup_abmc_dom_state().
> 
> v3: New patch.
>     Addresses the feedback to provide the global assignment interface.
>     https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
> ---
>  Documentation/arch/x86/resctrl.rst     | 44 +++++++++++++++
>  arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 76 ++++++++++++++++++++++++++
>  3 files changed, 121 insertions(+)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index d9574078f735..b85d3bc3e301 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -310,6 +310,50 @@ with the following files:
>  	The number of monitoring counters available for assignment when the
>  	architecture supports mbm_cntr_assign mode.
>  
> +"mbm_assign_control":
> +	Reports the resctrl group and monitor status of each group.
> +
> +	List follows the following format:
> +		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> +
> +	Format for specific type of groups:
> +
> +	* Default CTRL_MON group:
> +		"//<domain_id>=<flags>"
> +
> +	* Non-default CTRL_MON group:
> +		"<CTRL_MON group>//<domain_id>=<flags>"
> +
> +	* Child MON group of default CTRL_MON group:
> +		"/<MON group>/<domain_id>=<flags>"
> +
> +	* Child MON group of non-default CTRL_MON group:
> +		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> +
> +	Flags can be one of the following:
> +	::
> +
> +	 t  MBM total event is assigned.
> +	 l  MBM local event is assigned.
> +	 tl Both total and local MBM events are assigned.
> +	 _  None of the MBM events are assigned.
> +
> +	Examples:
> +	::
> +
> +	 # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
> +	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
> +	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
> +
> +	 # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	 non_default_ctrl_mon_grp//0=tl;1=tl;
> +	 non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	 //0=tl;1=tl;
> +	 /child_default_mon_grp/0=tl;1=tl;
> +
> +	There are four resctrl groups. All the groups have total and local MBM events
> +	assigned on domain 0 and 1.
> +
>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
>  		bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 395d99984893..fa7c77935080 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1269,6 +1269,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  			r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>  			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
>  			hw_res->mbm_cntr_assign_enabled = true;
> +			resctrl_file_fflags_init("mbm_assign_control", RFTYPE_MON_INFO);
>  		}
>  	}
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index cf2e0ad0e4f4..cf92ceb0f05e 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -970,6 +970,76 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static bool resctrl_mbm_event_assigned(struct rdtgroup *rdtg,
> +				       struct rdt_mon_domain *d, u32 evtid)

u32 -> enum resctrl_event_id ?

> +{
> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
> +	int cntr_id = rdtg->mon.cntr_id[index];
> +
> +	return cntr_id != MON_CNTR_UNSET && test_bit(cntr_id, d->mbm_cntr_map);
> +}
> +
> +static char *rdtgroup_mon_state_to_str(struct rdtgroup *rdtgrp,
> +				       struct rdt_mon_domain *d, char *str)
> +{
> +	char *tmp = str;
> +
> +	/* Query the total and local event flags for the domain */
> +	if (resctrl_mbm_event_assigned(rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID))
> +		*tmp++ = 't';
> +
> +	if (resctrl_mbm_event_assigned(rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID))
> +		*tmp++ = 'l';
> +
> +	if (tmp == str)
> +		*tmp++ = '_';
> +
> +	*tmp = '\0';
> +	return str;
> +}
> +

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 25/25] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-09 17:39 ` [PATCH v8 25/25] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
@ 2024-10-16  3:43   ` Reinette Chatre
  2024-10-21 17:04     ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16  3:43 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> Introduce the interface to assign MBM events in mbm_cntr_assign mode.
> 
> Events can be enabled or disabled by writing to file
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
> Format is similar to the list format with addition of opcode for the
> assignment operation.
>  "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> 
> Format for specific type of groups:
> 
>  * Default CTRL_MON group:
>          "//<domain_id><opcode><flags>"
> 
>  * Non-default CTRL_MON group:
>          "<CTRL_MON group>//<domain_id><opcode><flags>"
> 
>  * Child MON group of default CTRL_MON group:
>          "/<MON group>/<domain_id><opcode><flags>"
> 
>  * Child MON group of non-default CTRL_MON group:
>          "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> 
> Domain_id '*' will apply the flags on all the domains.
> 
> Opcode can be one of the following:
> 
>  = Update the assignment to match the flags
>  + Assign a new MBM event without impacting existing assignments.
>  - Unassign a MBM event from currently assigned events.
> 
> Assignment flags can be one of the following:
>  t  MBM total event
>  l  MBM local event
>  tl Both total and local MBM events
>  _  None of the MBM events. Valid only with '=' opcode. This flag cannot
>     be combined with other flags.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v8: Moved unassign as the first action during the assign modification.
>     Assign none "_" takes priority. Cannot be mixed with other flags.
>     Updated the documentation and .rst file format. htmldoc looks ok.
> 
> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>     Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>     Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>     Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>     Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>     Removed ABMC reference in FS code.
>     Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>     Not sure if we need to change the behaviour here. Processed them sequencially right now.
>     Users have the liberty to pass the flags. Restricting it might be a problem later.
> 
> v6: Added support assign all if domain id is '*'
>     Fixed the allocation of counter id if it not assigned already.
> 
> v5: Interface name changed from mbm_assign_control to mbm_control.
>     Fixed opcode and flags combination.
>     '=_" is valid.
>     "-_" amd "+_" is not valid.
>     Minor message update.
>     Renamed the function with prefix - rdtgroup_.
>     Corrected few documentation mistakes.
>     Rebase related changes after SNC support.
> 
> v4: Added domain specific assignments. Fixed the opcode parsing.
> 
> v3: New patch.
>     Addresses the feedback to provide the global assignment interface.
>     https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
> ---
>  Documentation/arch/x86/resctrl.rst     | 115 +++++++++++-
>  arch/x86/kernel/cpu/resctrl/internal.h |  10 ++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 233 ++++++++++++++++++++++++-
>  3 files changed, 356 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index b85d3bc3e301..77bb0b095127 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -336,7 +336,8 @@ with the following files:
>  	 t  MBM total event is assigned.
>  	 l  MBM local event is assigned.
>  	 tl Both total and local MBM events are assigned.
> -	 _  None of the MBM events are assigned.
> +	 _  None of the MBM events are assigned. Only works with opcode '=' for write
> +	    and cannot be combined with other flags.
>  
>  	Examples:
>  	::
> @@ -354,6 +355,118 @@ with the following files:
>  	There are four resctrl groups. All the groups have total and local MBM events
>  	assigned on domain 0 and 1.
>  
> +	Assignment state can be updated by writing to the interface.
> +
> +	Format is similar to the list format with addition of opcode for the
> +	assignment operation.
> +
> +		"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> +
> +	Format for each type of groups:
> +
> +        * Default CTRL_MON group:
> +                "//<domain_id><opcode><flags>"
> +
> +        * Non-default CTRL_MON group:
> +                "<CTRL_MON group>//<domain_id><opcode><flags>"
> +
> +        * Child MON group of default CTRL_MON group:
> +                "/<MON group>/<domain_id><opcode><flags>"
> +
> +        * Child MON group of non-default CTRL_MON group:
> +                "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> +
> +	Domain_id '*' will apply the flags on all the domains.
> +
> +	Opcode can be one of the following:
> +	::
> +
> +	 = Update the assignment to match the MBM event.
> +	 + Assign a new MBM event without impacting existing assignments.
> +	 - Unassign a MBM event from currently assigned events.
> +
> +	Examples:
> +	Initial group status:
> +	::
> +
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	  //0=tl;1=tl;
> +	  /child_default_mon_grp/0=tl;1=tl;
> +
> +	To update the default group to assign only total MBM event on domain 0:
> +	::
> +
> +	  # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	Assignment status after the update:
> +	::
> +
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	  //0=t;1=tl;
> +	  /child_default_mon_grp/0=tl;1=tl;
> +
> +	To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
> +	::
> +
> +	  # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	Assignment status after the update:
> +	::
> +
> +	  $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control

Please be consistent by always using "# cat", not sometimes "$ cat" as above.

> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	  //0=t;1=tl;
> +	  /child_default_mon_grp/0=tl;1=l;
> +
> +	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to unassign
> +	both local and total MBM events on domain 1:
> +	::
> +
> +	  # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
> +			/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	Assignment status after the update:
> +	::
> +

Missing "# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control"

> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> +	  //0=t;1=tl;
> +	  /child_default_mon_grp/0=tl;1=l;
> +
> +	To update the default group to add a local MBM event domain 0.
> +	::
> +
> +	  # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	Assignment status after the update:
> +	::
> +
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> +	  //0=tl;1=tl;
> +	  /child_default_mon_grp/0=tl;1=l;
> +
> +	To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all the
> +	MBM events on all the domains.
> +	::
> +
> +	  # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	Assignment status after the update:
> +	::
> +
> +	  #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control

Please be consistent with spacing "# cat" vs "#cat". This is very noticeable when
viewing the formatted docs.

> +	  non_default_ctrl_mon_grp//0=_;1=_;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> +	  //0=tl;1=tl;
> +	  /child_default_mon_grp/0=tl;1=l;
> +
>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
>  		bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index a6f40d3115f4..e8d6a430dc4a 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -74,6 +74,16 @@
>   */
>  #define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
>  
> +/*
> + * Assignment flags for mbm_cntr_assign feature
> + */
> +enum {
> +	ASSIGN_NONE	= 0,
> +	ASSIGN_TOTAL	= BIT(QOS_L3_MBM_TOTAL_EVENT_ID),
> +	ASSIGN_LOCAL	= BIT(QOS_L3_MBM_LOCAL_EVENT_ID),
> +	ASSIGN_INVALID,
> +};
> +
>  /**
>   * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>   *			        aren't marked nohz_full
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index cf92ceb0f05e..6095146e3ba4 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1040,6 +1040,236 @@ static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static int rdtgroup_str_to_mon_state(char *flag)
> +{
> +	int i, mon_state = ASSIGN_NONE;
> +
> +	for (i = 0; i < strlen(flag); i++) {
> +		switch (*(flag + i)) {
> +		case 't':
> +			mon_state |= ASSIGN_TOTAL;
> +			break;
> +		case 'l':
> +			mon_state |= ASSIGN_LOCAL;
> +			break;
> +		case '_':
> +			return ASSIGN_NONE;
> +		default:
> +			return ASSIGN_INVALID;
> +		}
> +	}
> +
> +	return mon_state;
> +}
> +
> +static struct rdtgroup *rdtgroup_find_grp_by_name(enum rdt_group_type rtype,
> +						  char *p_grp, char *c_grp)
> +{
> +	struct rdtgroup *rdtg, *crg;
> +
> +	if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
> +		return &rdtgroup_default;
> +	} else if (rtype == RDTCTRL_GROUP) {
> +		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
> +			if (!strcmp(p_grp, rdtg->kn->name))
> +				return rdtg;
> +	} else if (rtype == RDTMON_GROUP) {
> +		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
> +			if (!strcmp(p_grp, rdtg->kn->name)) {
> +				list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
> +						    mon.crdtgrp_list) {
> +					if (!strcmp(c_grp, crg->kn->name))
> +						return crg;
> +				}
> +			}
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static int rdtgroup_process_flags(struct rdt_resource *r,
> +				  enum rdt_group_type rtype,
> +				  char *p_grp, char *c_grp, char *tok)
> +{
> +	int op, mon_state, assign_state, unassign_state;
> +	char *dom_str, *id_str, *op_str;
> +	struct rdt_mon_domain *d;
> +	struct rdtgroup *rdtgrp;
> +	unsigned long dom_id;
> +	int ret, found = 0;
> +
> +	rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
> +
> +	if (!rdtgrp) {
> +		rdt_last_cmd_puts("Not a valid resctrl group\n");
> +		return -EINVAL;
> +	}
> +
> +next:
> +	if (!tok || tok[0] == '\0')
> +		return 0;
> +
> +	/* Start processing the strings for each domain */
> +	dom_str = strim(strsep(&tok, ";"));
> +
> +	op_str = strpbrk(dom_str, "=+-");
> +
> +	if (op_str) {
> +		op = *op_str;
> +	} else {
> +		rdt_last_cmd_puts("Missing operation =, +, - character\n");
> +		return -EINVAL;
> +	}
> +
> +	id_str = strsep(&dom_str, "=+-");
> +
> +	/* Check for domain id '*' which means all domains */
> +	if (id_str && *id_str == '*') {
> +		d = NULL;
> +		goto check_state;
> +	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
> +		rdt_last_cmd_puts("Missing domain id\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Verify if the dom_id is valid */
> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +		if (d->hdr.id == dom_id) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +
> +	if (!found) {
> +		rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
> +		return -EINVAL;
> +	}
> +
> +check_state:
> +	mon_state = rdtgroup_str_to_mon_state(dom_str);
> +
> +	if (mon_state == ASSIGN_INVALID) {
> +		rdt_last_cmd_puts("Invalid assign flag\n");
> +		goto out_fail;
> +	}
> +
> +	assign_state = 0;
> +	unassign_state = 0;
> +
> +	switch (op) {
> +	case '+':
> +		if (mon_state == ASSIGN_NONE) {
> +			rdt_last_cmd_puts("Invalid assign opcode\n");
> +			goto out_fail;
> +		}
> +		assign_state = mon_state;
> +		break;
> +	case '-':
> +		if (mon_state == ASSIGN_NONE) {
> +			rdt_last_cmd_puts("Invalid assign opcode\n");
> +			goto out_fail;
> +		}
> +		unassign_state = mon_state;
> +		break;
> +	case '=':
> +		assign_state = mon_state;
> +		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (unassign_state & ASSIGN_TOTAL) {
> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
> +		if (ret)
> +			goto out_fail;
> +	}
> +
> +	if (unassign_state & ASSIGN_LOCAL) {
> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
> +		if (ret)
> +			goto out_fail;
> +	}
> +
> +	if (assign_state & ASSIGN_TOTAL) {
> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
> +		if (ret)
> +			goto out_fail;
> +	}
> +
> +	if (assign_state & ASSIGN_LOCAL) {
> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
> +		if (ret)
> +			goto out_fail;
> +	}
> +
> +	goto next;
> +
> +out_fail:

Is it possible to print a message to the command status to give some details about which
request failed? I am wondering about a scenario where a user changes multiple domains of
multiple groups, since the operation does not undo changes, it will fail without information
to user space about which setting triggered the failure and which settings succeeded.
This is similar to what is done when user attempts to move several tasks ... the error will
indicate which task triggered failure so that user space knows what completed successfully.

> +
> +	return -EINVAL;
> +}
> +
> +static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of,
> +						 char *buf, size_t nbytes, loff_t off)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	char *token, *cmon_grp, *mon_grp;
> +	enum rdt_group_type rtype;
> +	int ret;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +
> +	buf[nbytes - 1] = '\0';
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> +		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");

Writing to last_cmd_status_buf here ...

> +		mutex_unlock(&rdtgroup_mutex);
> +		cpus_read_unlock();
> +		return -EINVAL;
> +	}
> +
> +	rdt_last_cmd_clear();

... but initializing buffer here. 
Sidenote: This was an issue before. If you receive comments about
items in patches, please do check if those comments apply to other patches also.

> +
> +	while ((token = strsep(&buf, "\n")) != NULL) {
> +		if (strstr(token, "/")) {
> +			/*
> +			 * The write command follows the following format:
> +			 * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
> +			 * Extract the CTRL_MON group.
> +			 */
> +			cmon_grp = strsep(&token, "/");
> +
> +			/*
> +			 * Extract the MON_GROUP.
> +			 * strsep returns empty string for contiguous delimiters.
> +			 * Empty mon_grp here means it is a RDTCTRL_GROUP.
> +			 */
> +			mon_grp = strsep(&token, "/");
> +
> +			if (*mon_grp == '\0')
> +				rtype = RDTCTRL_GROUP;
> +			else
> +				rtype = RDTMON_GROUP;
> +
> +			ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token);
> +			if (ret)
> +				break;
> +		}
> +	}
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>  #ifdef CONFIG_PROC_CPU_RESCTRL
>  
>  /*
> @@ -2328,9 +2558,10 @@ static struct rftype res_common_files[] = {
>  	},
>  	{
>  		.name		= "mbm_assign_control",
> -		.mode		= 0444,
> +		.mode		= 0644,
>  		.kf_ops		= &rdtgroup_kf_single_ops,
>  		.seq_show	= rdtgroup_mbm_assign_control_show,
> +		.write		= rdtgroup_mbm_assign_control_write,
>  	},
>  	{
>  		.name		= "cpus_list",

On a high level this looks ok but this code needs to be more robust. This will parse
data from user space that may include all kinds of input ... think malicious user or
a buggy script. I am not able to test this code but I tried to work through what will
happen under some wrong input and found some issues. For example, if user space provides
input like '//\n' then rdtgroup_process_flags() will be called with token == NULL. This will
result in rdtgroup_process_flags() returning "success", but fortunately do nothing, for
this invalid input. A more severe example is with input like '//0=\n', from what I can tell
this will result in rdtgroup_str_to_mon_state() called with dom_str==NULL that will treat
this as ASSIGN_NONE and proceed as if user provided '//0=_'.
This was just some scenarios with basic input that could be typos, no real stress tests.
I stopped here though since I believe it is already clear this needs to be more robust.
Please do test this interface by exercising it with invalid input and corner cases.

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode
  2024-10-16  3:12   ` Reinette Chatre
@ 2024-10-16 15:57     ` Moger, Babu
  2024-10-16 16:25       ` Reinette Chatre
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-16 15:57 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette

On 10/15/24 22:12, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> Introduce the interface file "mbm_assign_mode" to list monitor modes
>> supported.
>>
>> The "mbm_cntr_assign" mode provides the option to assign a counter to
>> an RMID, event pair and monitor the bandwidth as long as it is assigned.
>>
>> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
>> Bandwidth Monitoring Counters) hardware feature and is enabled by default.
>>
>> The "default" mode is the existing monitoring mode that works without the
>> explicit counter assignment, instead relying on dynamic counter assignment
>> by hardware that may result in hardware not dedicating a counter resulting
>> in monitoring data reads returning "Unavailable".
>>
>> Provide an interface to display the monitor mode on the system.
>> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> [mbm_cntr_assign]
>> default
>>
>> Switching the mbm_assign_mode will reset all the MBM counters of all
>> resctrl groups.
> 
> Please note that this now contradicts the documentation. Perhaps this sentence
> can just be dropped since there is the documentation within the patch.	

Sure. Will drop it.

> 
> 
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
> 
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 30586728a4cd..e4a7d6e815f6 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -257,6 +257,40 @@ with the following files:
>>  	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>>  	    0=0x30;1=0x30;3=0x15;4=0x15
>>  
>> +"mbm_assign_mode":
>> +	Reports the list of monitoring modes supported. The enclosed brackets
>> +	indicate which mode is enabled.
>> +	::
>> +
>> +	  cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> +	  [mbm_cntr_assign]
>> +	  default
>> +
>> +	"mbm_cntr_assign":
>> +
>> +	In mbm_cntr_assign mode user-space is able to specify which control
>> +	or monitor groups in resctrl should have a counter assigned using the
> 
> Counters cannot be assigned to control groups. How about replacing all instances
> of "control and monitor groups" with "CTRL_MON and MON groups", similarly
> "control or monitor groups" with "CTRL_MON or MON groups".

Ok.

> 
>> +	'mbm_assign_control' file. The number of counters available is described
> 
> Looking at the rest of the doc it seems that the custom is actually to place
> filenames in double quotes, like "mbm_assign_control".

Sure.

> 
>> +	in the 'num_mbm_cntrs' file. Changing the mode may cause all counters on
>> +	a resource to reset.
>> +
>> +	The mode is useful on platforms which support more control and monitor
>> +	groups than hardware counters, meaning 'unassigned' control or monitor
>> +	groups will report 'Unavailable' or count the traffic in an unpredictable
>> +	way.
> 
> Note two more instances of "control groups" above.
> 
> Please note that the above description implies that counter assignment is per-group. For
> example, "specify which control	or monitor groups in resctrl should have a counter
> assigned" and "useful on platforms which support more control and monitor groups
> than hardware counters". This needs to be reworked to reflect that counters
> are assigned to events.

How about this?

The mode is useful on platforms which support more CTRL_MON and MON groups
than the hardware counters, meaning 'unassigned' events on CTRL_MON or MON
groups will report 'Unavailable' or count the traffic in an unpredictable
way.


> 
>> +
>> +	AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
>> +	enable this mode by default so that counters remain assigned even when the
>> +	corresponding RMID is not in use by any processor.
> 
> I assume this should remain RMID since this specifically talks about an x86 system?

This was a suggestion from James. Let me know if you want me to change.

> 
>> +
>> +	"default":
>> +
>> +	By default resctrl assumes each control and monitor group has a hardware
>> +	counter. Hardware that does not support 'mbm_cntr_assign' mode will still
>> +	allow more control or monitor groups than 'num_rmids' to be created. In
>> +	that case reading the mbm_total_bytes and mbm_local_bytes may report
>> +	'Unavailable' if there is no counter associated with that group.
>> +
> 
> I reconsidered my earlier suggestion and I believe it needs a correction since
> counter assignment is not per group:
> 
> 	In default mode resctrl assumes there is a hardware counter for each
> 	event within every CTRL_MON and MON group. Reading mbm_total_bytes or
> 	mbm_local_bytes may report 'Unavailable' if there is no counter associated
> 	with that event.
> 
> Please feel free to improve.

Looks good.

> 
>>  "max_threshold_occupancy":
>>  		Read/write file provides the largest value (in
>>  		bytes) at which a previously used LLC_occupancy
> 
> The code change looks good to me.

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode
  2024-10-16 15:57     ` Moger, Babu
@ 2024-10-16 16:25       ` Reinette Chatre
  0 siblings, 0 replies; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16 16:25 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/16/24 8:57 AM, Moger, Babu wrote:
> On 10/15/24 22:12, Reinette Chatre wrote:
>> On 10/9/24 10:39 AM, Babu Moger wrote:

>>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>>> index 30586728a4cd..e4a7d6e815f6 100644
>>> --- a/Documentation/arch/x86/resctrl.rst
>>> +++ b/Documentation/arch/x86/resctrl.rst
>>> @@ -257,6 +257,40 @@ with the following files:
>>>  	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>>>  	    0=0x30;1=0x30;3=0x15;4=0x15
>>>  
>>> +"mbm_assign_mode":
>>> +	Reports the list of monitoring modes supported. The enclosed brackets
>>> +	indicate which mode is enabled.
>>> +	::
>>> +
>>> +	  cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>> +	  [mbm_cntr_assign]
>>> +	  default
>>> +
>>> +	"mbm_cntr_assign":
>>> +
>>> +	In mbm_cntr_assign mode user-space is able to specify which control
>>> +	or monitor groups in resctrl should have a counter assigned using the
>>
>> Counters cannot be assigned to control groups. How about replacing all instances
>> of "control and monitor groups" with "CTRL_MON and MON groups", similarly
>> "control or monitor groups" with "CTRL_MON or MON groups".
> 
> Ok.
> 
>>
>>> +	'mbm_assign_control' file. The number of counters available is described
>>
>> Looking at the rest of the doc it seems that the custom is actually to place
>> filenames in double quotes, like "mbm_assign_control".
> 
> Sure.
> 
>>
>>> +	in the 'num_mbm_cntrs' file. Changing the mode may cause all counters on
>>> +	a resource to reset.
>>> +
>>> +	The mode is useful on platforms which support more control and monitor
>>> +	groups than hardware counters, meaning 'unassigned' control or monitor
>>> +	groups will report 'Unavailable' or count the traffic in an unpredictable
>>> +	way.
>>
>> Note two more instances of "control groups" above.
>>
>> Please note that the above description implies that counter assignment is per-group. For
>> example, "specify which control	or monitor groups in resctrl should have a counter
>> assigned" and "useful on platforms which support more control and monitor groups
>> than hardware counters". This needs to be reworked to reflect that counters
>> are assigned to events.
> 
> How about this?
> 
> The mode is useful on platforms which support more CTRL_MON and MON groups
> than the hardware counters, meaning 'unassigned' events on CTRL_MON or MON
> groups will report 'Unavailable' or count the traffic in an unpredictable
> way.

This rewrites the second paragraph of the section about "mbm_cntr_assign". It is
not clear to me how this section will end up looking since the first paragraph still
seems to refer to counters being assigned to groups ("specify which control or monitor
groups in resctrl should have a counter assigned") while the later addition
to this section by "x86/resctrl: Introduce the interface to switch between monitor
modes" starts by specifying how counters are assigned to the MBM events ("The MBM
events (mbm_total_bytes and/or mbm_local_bytes) associated counters").

>>> +
>>> +	AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
>>> +	enable this mode by default so that counters remain assigned even when the
>>> +	corresponding RMID is not in use by any processor.
>>
>> I assume this should remain RMID since this specifically talks about an x86 system?
> 
> This was a suggestion from James. Let me know if you want me to change.

I can proceed to assume this is a paragraph intended to be x86 specific. No need to change.

Reinette



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 09/25] x86/resctrl: Add __init attribute to dom_data_init()
  2024-10-16  3:13   ` Reinette Chatre
@ 2024-10-16 17:32     ` Moger, Babu
  2024-10-16 18:55       ` Reinette Chatre
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-16 17:32 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse



On 10/15/24 22:13, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> dom_data_init() is only called during the __init sequence.
>> Add __init attribute like the rest of call sequence.
>>
>> While at it, pass 'struct rdt_resource' to dom_data_init() and
>> dom_data_exit() which will be used for mbm counter __init and__exit
>> call sequence.
> 
> This patch needs to be split. Please move fixes to beginning of series and
> move the addition of the parameter to the patch where it is first used/needed.

Sure. Will move the fixes to the beginning.

> 
>>
>> Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()")
> 
> For this change I think the following Fixes tag would be more accurate:
> Fixes: 6a445edce657 ("x86/intel_rdt/cqm: Add RDT monitoring initialization")
> 
> I think for a complete fix of the above commit it also needs to add __init
> storage class to l3_mon_evt_init().

Yes. Sure.

> 
> The __init storage class is also missing from rdt_get_mon_l3_config() ...

1 internal.h _int rdt_get_mon_l3_config(struct rdt_resource *r);
2 monitor.c  int __init rdt_get_mon_l3_config(struct rdt_resource *r)

rdt_get_mon_l3_config() has __init attribute already. But prototype in
internal.h does not add the '__init'. Looks like that is ok.


> fixing that would indeed need the Fixes tag below:
> Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()"

How about addressing both dom_data_init() and l3_mon_evt_init() in a
single patch and adding 2 fixes flags?

Fixes: 6a445edce657 ("x86/intel_rdt/cqm: Add RDT monitoring initialization")
Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to
rdt_get_mon_l3_config()")
-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 09/25] x86/resctrl: Add __init attribute to dom_data_init()
  2024-10-16 17:32     ` Moger, Babu
@ 2024-10-16 18:55       ` Reinette Chatre
  2024-10-16 20:18         ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-16 18:55 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/16/24 10:32 AM, Moger, Babu wrote:
> 
> 
> On 10/15/24 22:13, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 10/9/24 10:39 AM, Babu Moger wrote:
>>> dom_data_init() is only called during the __init sequence.
>>> Add __init attribute like the rest of call sequence.
>>>
>>> While at it, pass 'struct rdt_resource' to dom_data_init() and
>>> dom_data_exit() which will be used for mbm counter __init and__exit
>>> call sequence.
>>
>> This patch needs to be split. Please move fixes to beginning of series and
>> move the addition of the parameter to the patch where it is first used/needed.
> 
> Sure. Will move the fixes to the beginning.
> 
>>
>>>
>>> Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()")
>>
>> For this change I think the following Fixes tag would be more accurate:
>> Fixes: 6a445edce657 ("x86/intel_rdt/cqm: Add RDT monitoring initialization")
>>
>> I think for a complete fix of the above commit it also needs to add __init
>> storage class to l3_mon_evt_init().
> 
> Yes. Sure.
> 
>>
>> The __init storage class is also missing from rdt_get_mon_l3_config() ...
> 
> 1 internal.h _int rdt_get_mon_l3_config(struct rdt_resource *r);
> 2 monitor.c  int __init rdt_get_mon_l3_config(struct rdt_resource *r)
> 
> rdt_get_mon_l3_config() has __init attribute already. But prototype in
> internal.h does not add the '__init'. Looks like that is ok.

I also think it may technically be ok since as far as I understand attributes
the attributes will be merged. Even so, doing so does not match the current
style where the storage class of declaration and definition are the same. See
for example the partner function rdt_put_mon_l3_config().

> 
> 
>> fixing that would indeed need the Fixes tag below:
>> Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()"
> 
> How about addressing both dom_data_init() and l3_mon_evt_init() in a
> single patch and adding 2 fixes flags?

... and add __init to declaration of rdt_get_mon_l3_config() ?

> 
> Fixes: 6a445edce657 ("x86/intel_rdt/cqm: Add RDT monitoring initialization")
> Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to
> rdt_get_mon_l3_config()")

Reinette

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 09/25] x86/resctrl: Add __init attribute to dom_data_init()
  2024-10-16 18:55       ` Reinette Chatre
@ 2024-10-16 20:18         ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-16 20:18 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	kirill.shutemov, jithu.joseph, kai.huang, kan.liang,
	daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/16/24 13:55, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/16/24 10:32 AM, Moger, Babu wrote:
>>
>>
>> On 10/15/24 22:13, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 10/9/24 10:39 AM, Babu Moger wrote:
>>>> dom_data_init() is only called during the __init sequence.
>>>> Add __init attribute like the rest of call sequence.
>>>>
>>>> While at it, pass 'struct rdt_resource' to dom_data_init() and
>>>> dom_data_exit() which will be used for mbm counter __init and__exit
>>>> call sequence.
>>>
>>> This patch needs to be split. Please move fixes to beginning of series and
>>> move the addition of the parameter to the patch where it is first used/needed.
>>
>> Sure. Will move the fixes to the beginning.
>>
>>>
>>>>
>>>> Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()")
>>>
>>> For this change I think the following Fixes tag would be more accurate:
>>> Fixes: 6a445edce657 ("x86/intel_rdt/cqm: Add RDT monitoring initialization")
>>>
>>> I think for a complete fix of the above commit it also needs to add __init
>>> storage class to l3_mon_evt_init().
>>
>> Yes. Sure.
>>
>>>
>>> The __init storage class is also missing from rdt_get_mon_l3_config() ...
>>
>> 1 internal.h _int rdt_get_mon_l3_config(struct rdt_resource *r);
>> 2 monitor.c  int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>
>> rdt_get_mon_l3_config() has __init attribute already. But prototype in
>> internal.h does not add the '__init'. Looks like that is ok.
> 
> I also think it may technically be ok since as far as I understand attributes
> the attributes will be merged. Even so, doing so does not match the current
> style where the storage class of declaration and definition are the same. See
> for example the partner function rdt_put_mon_l3_config().

Sure.

> 
>>
>>
>>> fixing that would indeed need the Fixes tag below:
>>> Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()"
>>
>> How about addressing both dom_data_init() and l3_mon_evt_init() in a
>> single patch and adding 2 fixes flags?
> 
> ... and add __init to declaration of rdt_get_mon_l3_config() ?

Sure. Will do.

> 
>>
>> Fixes: 6a445edce657 ("x86/intel_rdt/cqm: Add RDT monitoring initialization")
>> Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to
>> rdt_get_mon_l3_config()")
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 10/25] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters
  2024-10-16  3:14   ` Reinette Chatre
@ 2024-10-17 16:55     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-17 16:55 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	kirill.shutemov, jithu.joseph, kai.huang, kan.liang,
	daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/24 22:14, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> Hardware provides a set of counters when mbm_assign_mode is supported.
>> These counters are assigned to the MBM monitoring events of a MON group
>> that needs to be tracked. The kernel must manage and track the available
>> counters.
>>
>> Introduce mbm_cntr_free_map bitmap to track available counters and set
>> of routines to allocate and free the counters. Move dom_data_init() after
>> mbm_cntr_assign detection.
> 
> Regarding "Move dom_data_init() after mbm_cntr_assign detection." - this is
> clear from the patch, please use changelog to explain *why*.

Will change it to.

dom_data_init() requires mbm_cntr_assign state to initialize
mbm_cntr_free_map bitmap. Move dom_data_init() after mbm_cntr_assign
detection.

> 
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> 
>> ---
>>  arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
>>  arch/x86/kernel/cpu/resctrl/monitor.c  | 43 +++++++++++++++++++++++---
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++++++
>>  include/linux/resctrl.h                |  2 ++
>>  4 files changed, 62 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 92eae4672312..99f9103a35ba 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -654,6 +654,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free);
>>  void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>>  void __init resctrl_file_fflags_init(const char *config,
>>  				     unsigned long fflags);
>> +int mbm_cntr_alloc(struct rdt_resource *r);
>> +void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id);
>>  void rdt_staged_configs_clear(void);
>>  bool closid_allocated(unsigned int closid);
>>  int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 66b06574f660..5c2a28565747 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -983,6 +983,27 @@ void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_
>>  		schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
>>  }
>>  
>> +/*
>> + * Counter bitmap for tracking the available counters.
>> + * 'mbm_cntr_assign' mode provides set of hardware counters for assigning
>> + * RMID, event pair. Each RMID and event pair takes one hardware counter.
>> + */
> 
> "counters for assigning RMID, event pair" sounds strange and it seems like the same
> thing is mentioned twice.
> How about:
> 	Bitmap tracking the available hardware counters when operating in
> 	"mbm_cntr_assign" mode. A hardware counter can be assigned to a
> 	RMID, event pair.


Sure.

> 
>> +static __init unsigned long *mbm_cntrs_init(struct rdt_resource *r)
>> +{
>> +	r->mon.mbm_cntr_free_map = bitmap_zalloc(r->mon.num_mbm_cntrs,
>> +						 GFP_KERNEL);
>> +	if (r->mon.mbm_cntr_free_map)
>> +		bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
>> +
>> +	return r->mon.mbm_cntr_free_map;
>> +}
>> +
>> +static  __exit void mbm_cntrs_exit(struct rdt_resource *r)
>> +{
>> +	bitmap_free(r->mon.mbm_cntr_free_map);
>> +	r->mon.mbm_cntr_free_map = NULL;
>> +}
>> +
>>  static __init int dom_data_init(struct rdt_resource *r)
>>  {
>>  	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
>> @@ -1020,6 +1041,17 @@ static __init int dom_data_init(struct rdt_resource *r)
>>  		goto out_unlock;
>>  	}
>>  
>> +	if (r->mon.mbm_cntr_assignable && !mbm_cntrs_init(r)) {
>> +		if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
>> +			kfree(closid_num_dirty_rmid);
>> +			closid_num_dirty_rmid = NULL;
>> +		}
>> +		kfree(rmid_ptrs);
>> +		rmid_ptrs = NULL;
>> +		err = -ENOMEM;
>> +		goto out_unlock;
>> +	}
>> +
>>  	for (i = 0; i < idx_limit; i++) {
>>  		entry = &rmid_ptrs[i];
>>  		INIT_LIST_HEAD(&entry->list);
>> @@ -1056,6 +1088,9 @@ static void __exit dom_data_exit(struct rdt_resource *r)
>>  	kfree(rmid_ptrs);
>>  	rmid_ptrs = NULL;
>>  
>> +	if (r->mon.mbm_cntr_assignable)
>> +		mbm_cntrs_exit(r);
>> +
>>  	mutex_unlock(&rdtgroup_mutex);
>>  }
>>  
>> @@ -1210,10 +1245,6 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>  	 */
>>  	resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);
>>  
>> -	ret = dom_data_init(r);
>> -	if (ret)
>> -		return ret;
>> -
>>  	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
>>  		u32 eax, ebx, ecx, edx;
>>  
>> @@ -1240,6 +1271,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>  		}
>>  	}
>>  
>> +	ret = dom_data_init(r);
>> +	if (ret)
>> +		return ret;
>> +
>>  	l3_mon_evt_init(r);
>>  
>>  	r->mon_capable = true;
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index c48b5450e6c2..8ffebd203c31 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -185,6 +185,25 @@ bool closid_allocated(unsigned int closid)
>>  	return !test_bit(closid, &closid_free_map);
>>  }
>>  
>> +int mbm_cntr_alloc(struct rdt_resource *r)
>> +{
>> +	int cntr_id;
>> +
>> +	cntr_id = find_first_bit(r->mon.mbm_cntr_free_map,
>> +				 r->mon.num_mbm_cntrs);
>> +	if (cntr_id >= r->mon.num_mbm_cntrs)
>> +		return -ENOSPC;
>> +
>> +	__clear_bit(cntr_id, r->mon.mbm_cntr_free_map);
>> +
>> +	return cntr_id;
>> +}
>> +
>> +void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id)
>> +{
>> +	__set_bit(cntr_id, r->mon.mbm_cntr_free_map);
>> +}
>> +
>>  /**
>>   * rdtgroup_mode_by_closid - Return mode of resource group with closid
>>   * @closid: closid if the resource group
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index f11d6fdfd977..5a4d6adec974 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -187,12 +187,14 @@ enum resctrl_scope {
>>   * @num_rmid:		Number of RMIDs available
>>   * @num_mbm_cntrs:	Number of assignable monitoring counters
>>   * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
>> + * @mbm_cntr_free_map:	bitmap of free MBM counters
>>   * @evt_list:		List of monitoring events
>>   */
> 
> Please follow custom of existing doc and have description start with capital letter.

Sure.

> 
>>  struct resctrl_mon {
>>  	int			num_rmid;
>>  	int			num_mbm_cntrs;
>>  	bool			mbm_cntr_assignable;
>> +	unsigned long		*mbm_cntr_free_map;
>>  	struct list_head	evt_list;
>>  };
>>  
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 12/25] x86/resctrl: Remove MSR reading of event configuration value
  2024-10-16  3:16   ` Reinette Chatre
@ 2024-10-17 17:59     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-17 17:59 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/24 22:16, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> The event configuration is domain specific and initialized during domain
>> initialization. The values are stored in struct rdt_hw_mon_domain.
>>
>> It is not required to read the configuration register every time user asks
>> for it. Use the value stored in struct rdt_hw_mon_domain instead.
>>
>> Introduce resctrl_arch_mon_event_config_get() and
>> resctrl_arch_mon_event_config_set() to get/set architecture domain specific
>> mbm_total_cfg/mbm_local_cfg values.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> ...
> 
> 
>> +void resctrl_arch_mon_event_config_set(void *info)
>> +{
>> +	struct mon_config_info *mon_info = info;
>> +	struct rdt_hw_mon_domain *hw_dom;
>> +	unsigned int index;
>> +
>> +	index = mon_event_config_index_get(mon_info->evtid);
>> +	if (index == INVALID_CONFIG_INDEX)
>> +		return;
>> +
>> +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
>> +
>> +	hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
>> +
>> +	switch (mon_info->evtid) {
>> +	case QOS_L3_OCCUP_EVENT_ID:
>> +		break;
> 
> This check does no harm but I do not think it is necessary since earlier
> mon_event_config_index_get() would return INVALID_CONFIG_INDEX if the
> evtid is QOS_L3_OCCUP_EVENT_ID.

Sure. Will remove it.

> 
>> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
>> +		hw_dom->mbm_total_cfg = mon_info->mon_config;
>> +		break;
>> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
>> +		hw_dom->mbm_local_cfg =  mon_info->mon_config;
> 
> nit: unnecessary space

Sure.

> 
>> +		break;
>> +	}
>> +}
>> +
> 
> Reinette
> 
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 14/25] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-10-16  3:21   ` Reinette Chatre
@ 2024-10-17 18:52     ` Moger, Babu
  2024-10-17 21:13       ` Reinette Chatre
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-17 18:52 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	kirill.shutemov, jithu.joseph, kai.huang, kan.liang,
	daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/24 22:21, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as the
>> counter is assigned. The bandwidth events will be tracked by the hardware
>> until the user changes the configuration. Each resctrl group can configure
>> maximum two counters, one for total event and one for local event.
>>
>> The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh).
>> Configuration is done by setting the counter id, bandwidth source (RMID)
>> and bandwidth configuration supported by BMEC (Bandwidth Monitoring Event
>> Configuration).
>>
>> Attempts to read or write the MSR when ABMC is not enabled will result
>> in a #GP(0) exception.
>>
>> Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG
>> (0xC000_03FDh):
>> =========================================================================
>> Bits 	Mnemonic	Description			Access Reset
>> 							Type   Value
>> =========================================================================
>> 63 	CfgEn 		Configuration Enable 		R/W 	0
>>
>> 62 	CtrEn 		Enable/disable counting		R/W 	0
>>
>> 61:53 	– 		Reserved 			MBZ 	0
>>
>> 52:48 	CtrID 		Counter Identifier		R/W	0
>>
>> 47 	IsCOS		BwSrc field is a CLOSID		R/W	0
>> 			(not an RMID)
>>
>> 46:44 	–		Reserved			MBZ	0
>>
>> 43:32	BwSrc		Bandwidth Source		R/W	0
>> 			(RMID or CLOSID)
>>
>> 31:0	BwType		Bandwidth configuration		R/W	0
>> 			to track for this counter
>> ==========================================================================
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
> 
>> ---
>>  arch/x86/include/asm/msr-index.h       |  1 +
>>  arch/x86/kernel/cpu/resctrl/internal.h | 33 ++++++++++++++++++++++++++
>>  2 files changed, 34 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index 43c9dc473aba..2c281c977342 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1196,6 +1196,7 @@
>>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>>  #define MSR_IA32_EVT_CFG_BASE		0xc0000400
>>  #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
>> +#define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
>>  
> 
> As Tony mentioned, also please correct order of this MSR.

Sure.

> 
>>  /* AMD-V MSRs */
>>  #define MSR_VM_CR                       0xc0010114
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 86e3e188c119..de397468b945 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -602,6 +602,39 @@ union cpuid_0x10_x_edx {
>>  	unsigned int full;
>>  };
>>  
>> +/*
>> + * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
> 
> "ABMC counters are configured by writing to L3_QOS_ABMC_CFG."

Sure.

> 
>> + * Reading L3_QOS_ABMC_DSC returns the configuration of the counter id
>> + * specified in L3_QOS_ABMC_CFG.cntr_id.
> 
> First and only mention/use of L3_QOS_ABMC_DSC in this series. If this register
> is not used then references to it can be removed.

Sure.

> 
>> + * @bw_type		: Bandwidth configuration(supported by BMEC)
> 
> "configuration(supported" -> "configuration (supported" 

Sure.

> 
>> + *			  tracked by the @cntr_id.
>> + * @bw_src		: Bandwidth source (RMID or CLOSID).
>> + * @reserved1		: Reserved.
>> + * @is_clos		: @bw_src field is a CLOSID (not an RMID).
>> + * @cntr_id		: Counter identifier.
>> + * @reserved		: Reserved.
>> + * @cntr_en		: Counting enable bit.
>> + * @cfg_en		: Configuration enable bit.
>> + *
>> + * Configuration and counting:
>> + * cfg_en=0,            : No configuration changes applied.
> 
> Can this be expanded? (sidenote: It is taking a long time to get clarity on how
> to interact with hardware. These incremental cryptic fragments make it difficult
> to know how to interact with the hardware.)
> 
> For example, "No configuration changes applied. Counter can be configured across
> multiple writes to MSR while @cfg_en=0. Configuration applied when @cfg_en=1."
> 
>> + * cfg_en=1, cntr_en=0  : Configure cntr_id and but no counting the events.
> 
> hmmm ... still the same (""but no counting the events") strange language I
> highlighted in V7 ...
> 
> I think it will make things easier to understand if similar language is used
> between the descriptions of the different fields.
> 
> "Apply @cntr_id configuration but do not count events." 
>  
>> + * cfg_en=1, cntr_en=1  : Configure cntr_id and start counting the events.
> 
> "Apply @cntr_id configuration and start counting events." 
> 
> Can it be added here which of these settings (or combination of settings) result
> in counters being reset?

Any change in the configuration will reset the counters.

Little bit lost here. Let me summarize. How about this?

Configuration and counting:
Counter can be configured across multiple writes to MSR. Configuration
is applied only when @cfg_en = 1. The event counters will reset when any
of the configuration is changed.
cfg_en = 1, cntr_en = 0 : Apply @cntr_id configuration but do not count
events.
cfg_en = 1, cntr_en = 1 : Apply @cntr_id configuration and start counting
events.


> 
>> + */
>> +union l3_qos_abmc_cfg {
>> +	struct {
>> +		unsigned long bw_type  :32,
>> +			      bw_src   :12,
>> +			      reserved1: 3,
>> +			      is_clos  : 1,
>> +			      cntr_id  : 5,
>> +			      reserved : 9,
>> +			      cntr_en  : 1,
>> +			      cfg_en   : 1;
>> +	} split;
>> +	unsigned long full;
>> +};
>> +
>>  void rdt_last_cmd_clear(void);
>>  void rdt_last_cmd_puts(const char *s);
>>  __printf(1, 2)
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 15/25] x86/resctrl: Introduce cntr_id in mongroup for assignments
  2024-10-16  3:22   ` Reinette Chatre
@ 2024-10-17 19:19     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-17 19:19 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	kirill.shutemov, jithu.joseph, kai.huang, kan.liang,
	daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/24 22:22, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> mbm_cntr_assign feature provides an option to the user to assign a counter
>> to an RMID, event pair and monitor the bandwidth as long as the counter is
>> assigned. There can be two counters per monitor group, one for MBM total
>> event and another for MBM local event.
>>
>> Introduce cntr_id to manage the assignments.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v8: Minor commit message update.
>>
>> v7: Minor comment update for cntr_id.
>>
>> v6: New patch.
>>     Separated FS and arch bits.
>> ---
>>  arch/x86/kernel/cpu/resctrl/internal.h | 7 +++++++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++++++
>>  2 files changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index de397468b945..58298db9034f 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -62,6 +62,11 @@
>>  /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
>>  #define ABMC_ENABLE_BIT			0
>>  
>> +/* Maximum assignable counters per resctrl group */
>> +#define MAX_CNTRS			2
>> +
>> +#define MON_CNTR_UNSET			U32_MAX
>> +
>>  /**
>>   * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>>   *			        aren't marked nohz_full
>> @@ -231,12 +236,14 @@ enum rdtgrp_mode {
>>   * @parent:			parent rdtgrp
>>   * @crdtgrp_list:		child rdtgroup node list
>>   * @rmid:			rmid for this rdtgroup
>> + * @cntr_id:			IDs of hardware counters assigned to monitor group
>>   */
>>  struct mongroup {
>>  	struct kernfs_node	*mon_data_kn;
>>  	struct rdtgroup		*parent;
>>  	struct list_head	crdtgrp_list;
>>  	u32			rmid;
>> +	u32			cntr_id[MAX_CNTRS];
>>  };
>>  
>>  /**
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 610eae64b13a..03b670b95c49 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -3530,6 +3530,9 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
>>  	}
>>  	rdtgrp->mon.rmid = ret;
>>  
>> +	rdtgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
>> +	rdtgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
>> +
>>  	ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
>>  	if (ret) {
>>  		rdt_last_cmd_puts("kernfs subdir error\n");
>> @@ -4084,6 +4087,9 @@ static void __init rdtgroup_setup_default(void)
>>  	rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
>>  	rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
>>  	rdtgroup_default.type = RDTCTRL_GROUP;
>> +	rdtgroup_default.mon.cntr_id[0] = MON_CNTR_UNSET;
>> +	rdtgroup_default.mon.cntr_id[1] = MON_CNTR_UNSET;
>> +
> 
> Could these magic constants be avoided by introducing MBM_EVENT_ARRAY_INDEX here
> and using it for the array index instead of "0" and "1"?

Sure. Will do.

> 
>>  	INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
>>  
>>  	list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
> 
> Reinette
> 
> 
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 14/25] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-10-17 18:52     ` Moger, Babu
@ 2024-10-17 21:13       ` Reinette Chatre
  2024-10-17 23:02         ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-17 21:13 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	kirill.shutemov, jithu.joseph, kai.huang, kan.liang,
	daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/17/24 11:52 AM, Moger, Babu wrote:
> On 10/15/24 22:21, Reinette Chatre wrote:
>> On 10/9/24 10:39 AM, Babu Moger wrote:

>>> + *			  tracked by the @cntr_id.
>>> + * @bw_src		: Bandwidth source (RMID or CLOSID).
>>> + * @reserved1		: Reserved.
>>> + * @is_clos		: @bw_src field is a CLOSID (not an RMID).
>>> + * @cntr_id		: Counter identifier.
>>> + * @reserved		: Reserved.
>>> + * @cntr_en		: Counting enable bit.
>>> + * @cfg_en		: Configuration enable bit.
>>> + *
>>> + * Configuration and counting:
>>> + * cfg_en=0,            : No configuration changes applied.
>>
>> Can this be expanded? (sidenote: It is taking a long time to get clarity on how
>> to interact with hardware. These incremental cryptic fragments make it difficult
>> to know how to interact with the hardware.)
>>
>> For example, "No configuration changes applied. Counter can be configured across
>> multiple writes to MSR while @cfg_en=0. Configuration applied when @cfg_en=1."
>>
>>> + * cfg_en=1, cntr_en=0  : Configure cntr_id and but no counting the events.
>>
>> hmmm ... still the same (""but no counting the events") strange language I
>> highlighted in V7 ...
>>
>> I think it will make things easier to understand if similar language is used
>> between the descriptions of the different fields.
>>
>> "Apply @cntr_id configuration but do not count events." 
>>  
>>> + * cfg_en=1, cntr_en=1  : Configure cntr_id and start counting the events.
>>
>> "Apply @cntr_id configuration and start counting events." 
>>
>> Can it be added here which of these settings (or combination of settings) result
>> in counters being reset?
> 
> Any change in the configuration will reset the counters.
> 
> Little bit lost here. Let me summarize. How about this?
> 
> Configuration and counting:
> Counter can be configured across multiple writes to MSR. Configuration
> is applied only when @cfg_en = 1. The event counters will reset when any
> of the configuration is changed.

Is is not clear to me what is meant with "when any of the configuration is changed".
Are event counters reset with every write to the MSR, whether @cfg_en is set or
not? If counters are only reset when @cfg_en is set then I think it should read
"Counter @cntr_id is reset when the configuration is applied." Note this is
also made specific to be related to *just* the counter being configured, not all
event counters that "The event counters will reset ..." implies.

> cfg_en = 1, cntr_en = 0 : Apply @cntr_id configuration but do not count
> events.
> cfg_en = 1, cntr_en = 1 : Apply @cntr_id configuration and start counting
> events.
> 

Please use the @ prefix when referring to union members.

Thank you

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 16/25] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  2024-10-16  3:23   ` Reinette Chatre
@ 2024-10-17 22:44     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-17 22:44 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/2024 10:23 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>> assigned. The assigned RMID will be tracked by the hardware until the user
>> unassigns it manually.
>>
>> Counters are configured by writing to L3_QOS_ABMC_CFG MSR and
>> specifying the counter id, bandwidth source, and bandwidth types.
>>
>> Provide the interface to assign the counter ids to RMID.
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>      Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>      Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
> 
>> ---
>>   arch/x86/kernel/cpu/resctrl/internal.h |  3 ++
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 45 ++++++++++++++++++++++++++
>>   2 files changed, 48 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 58298db9034f..6d4df0490186 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -705,6 +705,9 @@ int mbm_cntr_alloc(struct rdt_resource *r);
>>   void mbm_cntr_free(struct rdt_resource *r, u32 cntr_id);
>>   void arch_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
>>   unsigned int mon_event_config_index_get(u32 evtid);
>> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> +			     u32 cntr_id, bool assign);
>>   void rdt_staged_configs_clear(void);
>>   bool closid_allocated(unsigned int closid);
>>   int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 03b670b95c49..4ab1a18010c9 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1853,6 +1853,51 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>>   	return ret ?: nbytes;
>>   }
>>   
>> +static void resctrl_abmc_config_one_amd(void *info)
>> +{
>> +	u64 *msrval = info;
>> +
>> +	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
>> +}
>> +
>> +/*
>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>> + */
>> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> +			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> +			     u32 cntr_id, bool assign)
>> +{
>> +	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
>> +	struct arch_mbm_state *arch_mbm;
>> +
>> +	abmc_cfg.split.cfg_en = 1;
>> +	abmc_cfg.split.cntr_en = assign ? 1 : 0;
>> +	abmc_cfg.split.cntr_id = cntr_id;
>> +	abmc_cfg.split.bw_src = rmid;
>> +
>> +	/* Update the event configuration from the domain */
>> +	if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
>> +		abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
>> +		arch_mbm = &hw_dom->arch_mbm_total[rmid];
>> +	} else {
>> +		abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
>> +		arch_mbm = &hw_dom->arch_mbm_local[rmid];
>> +	}
>> +
>> +	smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd,
>> +			      &abmc_cfg, 1);
>> +
>> +	/*
>> +	 * Reset the architectural state so that reading of hardware
>> +	 * counter is not considered as an overflow in next update.
>> +	 */
>> +	if (arch_mbm)
>> +		memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
>> +
> 
> More on this later, but I do believe later code can be simplified if
> reset of architectural state is done by caller. This function should
> focus on just configuring the counter.

Yes. It can be done by the caller. Need to introduce arch handler to do 
this as it accesses the arch state.

> 
>> +	return 0;
>> +}
>> +
>>   /* rdtgroup information files for one cache resource. */
>>   static struct rftype res_common_files[] = {
>>   	{
> 
> Reinette
> 

-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 17/25] x86/resctrl: Add the interface to assign/update counter assignment
  2024-10-16  3:25   ` Reinette Chatre
@ 2024-10-17 22:56     ` Moger, Babu
  2024-10-18 15:59       ` Reinette Chatre
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-17 22:56 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/2024 10:25 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> The mbm_cntr_assign mode offers several hardware counters that can be
>> assigned to an RMID-event pair and monitor the bandwidth as long as it
> 
> repeated nit (to be consistent): RMID, event

Sure.

> 
>> is assigned.
>>
>> Counters are managed at two levels. The global assignment is tracked
>> using the mbm_cntr_free_map field in the struct resctrl_mon, while
>> domain-specific assignments are tracked using the mbm_cntr_map field
>> in the struct rdt_mon_domain. Allocation begins at the global level
>> and is then applied individually to each domain.
>>
>> Introduce an interface to allocate these counters and update the
>> corresponding domains accordingly.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v8: Renamed rdtgroup_assign_cntr() to rdtgroup_assign_cntr_event().
>>      Added the code to return the error if rdtgroup_assign_cntr_event fails.
>>      Moved definition of MBM_EVENT_ARRAY_INDEX to resctrl/internal.h.
>>      Updated typo in the comments.
>>
>> v7: New patch. Moved all the FS code here.
>>      Merged rdtgroup_assign_cntr and rdtgroup_alloc_cntr.
>>      Adde new #define MBM_EVENT_ARRAY_INDEX.
>> ---
>>   arch/x86/kernel/cpu/resctrl/internal.h |  9 +++++
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 47 ++++++++++++++++++++++++++
>>   2 files changed, 56 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 6d4df0490186..900e18aea2c4 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -67,6 +67,13 @@
>>   
>>   #define MON_CNTR_UNSET			U32_MAX
>>   
>> +/*
>> + * Get the counter index for the assignable counter
>> + * 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
>> + * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
>> + */
>> +#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
>> +
> 
> This can be moved to patch that introduces and initializes the array and used there.

Sure.

> 
>>   /**
>>    * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>>    *			        aren't marked nohz_full
>> @@ -708,6 +715,8 @@ unsigned int mon_event_config_index_get(u32 evtid);
>>   int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>   			     enum resctrl_event_id evtid, u32 rmid, u32 closid,
>>   			     u32 cntr_id, bool assign);
>> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> +			       struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>>   void rdt_staged_configs_clear(void);
>>   bool closid_allocated(unsigned int closid);
>>   int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 4ab1a18010c9..e4f628e6fe65 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1898,6 +1898,53 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>   	return 0;
>>   }
>>   
>> +/*
>> + * Assign a hardware counter to the group.
> 
> hmmm ... counters are not assigned to groups. How about:
> "Assign a hardware counter to event @evtid of group @rdtgrp"?

Sure.

> 
>> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
>> + * else the counter will be allocated to specific domain.
> 
> "will be allocated to" -> "will be assigned to"?

Sure.

> 
>> + */
>> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> +			       struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>> +{
>> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
>> +	int cntr_id = rdtgrp->mon.cntr_id[index];
>> +	int ret;
>> +
>> +	/*
>> +	 * Allocate a new counter id to the event if the counter is not
>> +	 * assigned already.
>> +	 */
>> +	if (cntr_id == MON_CNTR_UNSET) {
>> +		cntr_id = mbm_cntr_alloc(r);
>> +		if (cntr_id < 0) {
>> +			rdt_last_cmd_puts("Out of MBM assignable counters\n");
>> +			return -ENOSPC;
>> +		}
>> +		rdtgrp->mon.cntr_id[index] = cntr_id;
>> +	}
>> +
>> +	if (!d) {
>> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +			ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>> +						       rdtgrp->closid, cntr_id, true);
>> +			if (ret)
>> +				goto out_done_assign;
>> +
>> +			set_bit(cntr_id, d->mbm_cntr_map);
> 
> The code pattern above is repeated four times in this work, twice in
> rdtgroup_assign_cntr_event() and twice in rdtgroup_unassign_cntr_event(). This
> duplication should be avoided. It can be done in a function that also resets
> the architectural state.

Are you suggesting to combine rdtgroup_assign_cntr_event() and 
rdtgroup_unassign_cntr_event()?

It can be done. We need a flag to tell if it is a assign or unassign.


> 
>> +		}
>> +	} else {
>> +		ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>> +					       rdtgrp->closid, cntr_id, true);
>> +		if (ret)
>> +			goto out_done_assign;
>> +
>> +		set_bit(cntr_id, d->mbm_cntr_map);
>> +	}
>> +
>> +out_done_assign:
> 
> Should a newly allocated counter not be freed if it could not be configured?

Yes.

> 
>> +	return ret;
>> +}
>> +
>>   /* rdtgroup information files for one cache resource. */
>>   static struct rftype res_common_files[] = {
>>   	{
> 
> Reinette
> 

-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 14/25] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-10-17 21:13       ` Reinette Chatre
@ 2024-10-17 23:02         ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-17 23:02 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	kirill.shutemov, jithu.joseph, kai.huang, kan.liang,
	daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/17/2024 4:13 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/17/24 11:52 AM, Moger, Babu wrote:
>> On 10/15/24 22:21, Reinette Chatre wrote:
>>> On 10/9/24 10:39 AM, Babu Moger wrote:
> 
>>>> + *			  tracked by the @cntr_id.
>>>> + * @bw_src		: Bandwidth source (RMID or CLOSID).
>>>> + * @reserved1		: Reserved.
>>>> + * @is_clos		: @bw_src field is a CLOSID (not an RMID).
>>>> + * @cntr_id		: Counter identifier.
>>>> + * @reserved		: Reserved.
>>>> + * @cntr_en		: Counting enable bit.
>>>> + * @cfg_en		: Configuration enable bit.
>>>> + *
>>>> + * Configuration and counting:
>>>> + * cfg_en=0,            : No configuration changes applied.
>>>
>>> Can this be expanded? (sidenote: It is taking a long time to get clarity on how
>>> to interact with hardware. These incremental cryptic fragments make it difficult
>>> to know how to interact with the hardware.)
>>>
>>> For example, "No configuration changes applied. Counter can be configured across
>>> multiple writes to MSR while @cfg_en=0. Configuration applied when @cfg_en=1."
>>>
>>>> + * cfg_en=1, cntr_en=0  : Configure cntr_id and but no counting the events.
>>>
>>> hmmm ... still the same (""but no counting the events") strange language I
>>> highlighted in V7 ...
>>>
>>> I think it will make things easier to understand if similar language is used
>>> between the descriptions of the different fields.
>>>
>>> "Apply @cntr_id configuration but do not count events."
>>>   
>>>> + * cfg_en=1, cntr_en=1  : Configure cntr_id and start counting the events.
>>>
>>> "Apply @cntr_id configuration and start counting events."
>>>
>>> Can it be added here which of these settings (or combination of settings) result
>>> in counters being reset?
>>
>> Any change in the configuration will reset the counters.
>>
>> Little bit lost here. Let me summarize. How about this?
>>
>> Configuration and counting:
>> Counter can be configured across multiple writes to MSR. Configuration
>> is applied only when @cfg_en = 1. The event counters will reset when any
>> of the configuration is changed.
> 
> Is is not clear to me what is meant with "when any of the configuration is changed".
> Are event counters reset with every write to the MSR, whether @cfg_en is set or
> not? If counters are only reset when @cfg_en is set then I think it should read
> "Counter @cntr_id is reset when the configuration is applied." Note this is

Sure.

> also made specific to be related to *just* the counter being configured, not all
> event counters that "The event counters will reset ..." implies.
> 
>> cfg_en = 1, cntr_en = 0 : Apply @cntr_id configuration but do not count
>> events.
>> cfg_en = 1, cntr_en = 1 : Apply @cntr_id configuration and start counting
>> events.
>>
> 
> Please use the @ prefix when referring to union members.

Ok. Sure.

> 
> Thank you
> 
> Reinette
> 
> 

Thanks
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 18/25] x86/resctrl: Add the interface to unassign a MBM counter
  2024-10-16  3:29   ` Reinette Chatre
@ 2024-10-17 23:11     ` Moger, Babu
  2024-10-18 16:06       ` Reinette Chatre
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-17 23:11 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/2024 10:29 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> The mbm_cntr_assign mode provides a limited number of hardware counters
>> that can be assigned to an RMID-event pair to monitor bandwidth while
>> assigned. If all counters are in use, the kernel will show an error
>> message: "Out of MBM assignable counters" when a new assignment is
>> requested. To make space for a new assignment, users must unassign an
>> already assigned counter.
>>
>> Introduce an interface that allows for the unassignment of counter IDs
>> from both the group and the domain. Additionally, ensure that the global
>> counter is released if it is no longer assigned to any domains.
> 
> Needs imperative tone ... "Release the global counter ..."

sure.

> 
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> ...
> 
>> ---
>>   arch/x86/kernel/cpu/resctrl/internal.h |  2 +
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 56 ++++++++++++++++++++++++++
>>   2 files changed, 58 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 900e18aea2c4..6f388d20fb22 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -717,6 +717,8 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>   			     u32 cntr_id, bool assign);
>>   int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>   			       struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>> +int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> +				 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>>   void rdt_staged_configs_clear(void);
>>   bool closid_allocated(unsigned int closid);
>>   int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index e4f628e6fe65..791258adcbda 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1945,6 +1945,62 @@ int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>   	return ret;
>>   }
>>   
>> +static bool mbm_cntr_assigned_to_domain(struct rdt_resource *r, u32 cntr_id)
>> +{
>> +	struct rdt_mon_domain *d;
>> +
>> +	list_for_each_entry(d, &r->mon_domains, hdr.list)
>> +		if (test_bit(cntr_id, d->mbm_cntr_map))
>> +			return 1;
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Unassign a hardware counter from the domain and the group.
> 
> Not sure ... maybe "Unassign a hardware counter associated with @evtid from
> the domain and the group."?

ok.

> 
>> + * Counter will be unassigned in all the domains if rdt_mon_domain is NULL
> 
> Please use imperative tone: "Unassign the counter from all the domains ...."

ok.

> 
>> + * else the counter will be assigned to specific domain.
> 
> copy&paste error?
> "assigned to specific domain" -> "unassign from specific domain"?

ok.

> 
>> + * Global counter will be freed once it is unassigned from all the domains.
> 
> Needs imperative tone.
> 
>> + */
>> +int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> +				 struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>> +{
>> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
>> +	int cntr_id = rdtgrp->mon.cntr_id[index];
>> +	int ret;
>> +
>> +	/* Return early if the counter is unassigned already */
>> +	if (cntr_id == MON_CNTR_UNSET)
>> +		return 0;
>> +
>> +	if (!d) {
>> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +			ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>> +						       rdtgrp->closid, cntr_id, false);
>> +			if (ret)
>> +				goto out_done_unassign;
>> +
>> +			clear_bit(cntr_id, d->mbm_cntr_map);
>> +		}
>> +	} else {
>> +		ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>> +					       rdtgrp->closid, cntr_id, false);
>> +		if (ret)
>> +			goto out_done_unassign;
>> +
>> +		clear_bit(cntr_id, d->mbm_cntr_map);
> 
> Please see comment to previous patch about the duplicate snippets. Snippets can be
> replaced with single function that also resets architectural state.

Sure.

will combine rdtgroup_assign_cntr_event() and
rdtgroup_unassign_cntr_event().

I need to rename the function. How about resctrl_configure_cntr_event()?


> 
>> +	}
>> +
>> +	/* Update the counter bitmap */
> 
> What is the update?

Clear the counter bitmap.

> 
>> +	if (!mbm_cntr_assigned_to_domain(r, cntr_id)) {
>> +		mbm_cntr_free(r, cntr_id);
>> +		rdtgrp->mon.cntr_id[index] = MON_CNTR_UNSET;
>> +	}
>> +
>> +out_done_unassign:
>> +	return ret;
>> +}
>> +
>>   /* rdtgroup information files for one cache resource. */
>>   static struct rftype res_common_files[] = {
>>   	{
> 
> 
> Reinette
> 

-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
  2024-10-16  3:30   ` Reinette Chatre
@ 2024-10-18 14:22     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-18 14:22 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/2024 10:30 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> Assign/unassign counters on resctrl group creation/deletion. Two counters
>> are required per group, one for MBM total event and one for MBM local
>> event.
>>
>> There are a limited number of counters available for assignment. If these
>> counters are exhausted, the kernel will display the error message: "Out of
>> MBM assignable counters". However, it is not necessary to fail the
>> creation of a group due to assignment failures. Users have the flexibility
>> to modify the assignments at a later time.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> ...
> 
>> ---
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 64 ++++++++++++++++++++++++++
>>   1 file changed, 64 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 791258adcbda..cb2c60c0319e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> 
> ...
> 
>>   static int rdt_get_tree(struct fs_context *fc)
>>   {
>>   	struct rdt_fs_context *ctx = rdt_fc2context(fc);
>> @@ -2934,6 +2980,8 @@ static int rdt_get_tree(struct fs_context *fc)
>>   		if (ret < 0)
>>   			goto out_mongrp;
>>   		rdtgroup_default.mon.mon_data_kn = kn_mondata;
>> +
>> +		rdtgroup_assign_cntrs(&rdtgroup_default);
>>   	}
>>   
>>   	ret = rdt_pseudo_lock_init();
>> @@ -2964,6 +3012,7 @@ static int rdt_get_tree(struct fs_context *fc)
>>   out_psl:
>>   	rdt_pseudo_lock_release();
>>   out_mondata:
>> +	rdtgroup_unassign_cntrs(&rdtgroup_default);
>>   	if (resctrl_arch_mon_capable())
>>   		kernfs_remove(kn_mondata);
> 
> I think I mentioned this before ... this addition belongs within the
> "if (resctrl_arch_mon_capable())" to be symmetrical with where it was called from.

Sure.

> 
>>   out_mongrp:
>> @@ -3144,6 +3193,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
>>   
>>   	head = &rdtgrp->mon.crdtgrp_list;
>>   	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
>> +		rdtgroup_unassign_cntrs(sentry);
>>   		free_rmid(sentry->closid, sentry->mon.rmid);
>>   		list_del(&sentry->mon.crdtgrp_list);
>>   
>> @@ -3184,6 +3234,8 @@ static void rmdir_all_sub(void)
>>   		cpumask_or(&rdtgroup_default.cpu_mask,
>>   			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
>>   
>> +		rdtgroup_unassign_cntrs(rdtgrp);
>> +
>>   		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>   
>>   		kernfs_remove(rdtgrp->kn);
>> @@ -3223,6 +3275,8 @@ static void rdt_kill_sb(struct super_block *sb)
>>   		resctrl_arch_disable_alloc();
>>   	if (resctrl_arch_mon_capable())
>>   		resctrl_arch_disable_mon();
>> +
>> +	rdtgroup_unassign_cntrs(&rdtgroup_default);
> 
> Unassigning counters after monitoring is completely disabled seems late. I
> think this can be moved earlier to be right after the counters of all the
> other groups are unassigned.

Sure. Right after rmdir_all_sub().

> 
>>   	resctrl_mounted = false;
>>   	kernfs_kill_sb(sb);
>>   	mutex_unlock(&rdtgroup_mutex);
>> @@ -3814,6 +3868,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
>>   		goto out_unlock;
>>   	}
>>   
>> +	rdtgroup_assign_cntrs(rdtgrp);
>> +
>>   	kernfs_activate(rdtgrp->kn);
>>   
>>   	/*
>> @@ -3858,6 +3914,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>>   	if (ret)
>>   		goto out_closid_free;
>>   
>> +	rdtgroup_assign_cntrs(rdtgrp);
>> +
>>   	kernfs_activate(rdtgrp->kn);
>>   
>>   	ret = rdtgroup_init_alloc(rdtgrp);
>> @@ -3883,6 +3941,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>>   out_del_list:
>>   	list_del(&rdtgrp->rdtgroup_list);
>>   out_rmid_free:
>> +	rdtgroup_unassign_cntrs(rdtgrp);
>>   	mkdir_rdt_prepare_rmid_free(rdtgrp);
>>   out_closid_free:
>>   	closid_free(closid);
>> @@ -3953,6 +4012,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>>   	update_closid_rmid(tmpmask, NULL);
>>   
>>   	rdtgrp->flags = RDT_DELETED;
>> +
>> +	rdtgroup_unassign_cntrs(rdtgrp);
>> +
>>   	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>   
>>   	/*
>> @@ -3999,6 +4061,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>>   	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
>>   	update_closid_rmid(tmpmask, NULL);
>>   
>> +	rdtgroup_unassign_cntrs(rdtgrp);
>> +
>>   	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>   	closid_free(rdtgrp->closid);
>>   
> 
> Reinette
> 

Thanks
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 20/25] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
  2024-10-16  3:31   ` Reinette Chatre
@ 2024-10-18 14:31     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-18 14:31 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/2024 10:31 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> In mbm_cntr_assign mode, the hardware counter should be assigned to read
>> the MBM events.
>>
>> Report "Unassigned" in case the user attempts to read the events without
>> assigning the counter.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v8: Used MBM_EVENT_ARRAY_INDEX to get the index for the MBM event.
>>      Documentation update to make the text generic.
>>
>> v7: Moved the documentation under "mon_data".
>>      Updated the text little bit.
>>
>> v6: Added more explaination in the resctrl.rst
>>      Added checks to detect "Unassigned" before reading RMID.
>>
>> v5: New patch.
>> ---
>>   Documentation/arch/x86/resctrl.rst        | 10 ++++++++++
>>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 ++++++++++++-
>>   2 files changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 1b5c05a35793..99ee9c87952b 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -419,6 +419,16 @@ When monitoring is enabled all MON groups will also contain:
>>   	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>>   	where "YY" is the node number.
>>   
>> +	When supported the 'mbm_cntr_assign' mode allows users to assign a
>> +	counter to mon_hw_id, event pair enabling bandwidth monitoring for
>> +	as long as the counter remains assigned. The hardware will continue
>> +	tracking the assigned mon_hw_id until the user manually unassigns
>> +	it, ensuring that counters are not reset during this period. With
>> +	a limited number of counters, the system may run out of assignable
>> +	counters at some point. In that case, MBM event counters will return
> 
> nit: "at some point" can be dropped for clarity.

Sure.

> 
>> +	"Unassigned" when the event is read. Users must manually assign a
>> +	counter to read the events.
>> +
>>   "mon_hw_id":
>>   	Available only with debug option. The identifier used by hardware
>>   	for the monitor group. On x86 this is the RMID.
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 50fa1fe9a073..5a9d15b2c319 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -562,7 +562,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>   	struct rdtgroup *rdtgrp;
>>   	struct rdt_resource *r;
>>   	union mon_data_bits md;
>> -	int ret = 0;
>> +	int ret = 0, index;
>>   
>>   	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>>   	if (!rdtgrp) {
>> @@ -576,6 +576,15 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>   	evtid = md.u.evtid;
>>   	r = &rdt_resources_all[resid].r_resctrl;
>>   
>> +	if (resctrl_arch_mbm_cntr_assign_enabled(r) && evtid != QOS_L3_OCCUP_EVENT_ID) {
>> +		index = MBM_EVENT_ARRAY_INDEX(evtid);
>> +		if (index != INVALID_CONFIG_INDEX &&
>> +		    rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET) {
>> +			rr.err = -ENOENT;
>> +			goto checkresult;
>> +		}
>> +	}
>> +
>>   	if (md.u.sum) {
>>   		/*
>>   		 * This file requires summing across all domains that share
>> @@ -613,6 +622,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>   		seq_puts(m, "Error\n");
>>   	else if (rr.err == -EINVAL)
>>   		seq_puts(m, "Unavailable\n");
>> +	else if (rr.err == -ENOENT)
>> +		seq_puts(m, "Unassigned\n");
>>   	else
>>   		seq_printf(m, "%llu\n", rr.val);
>>   
> 
> Reinette
> 

Thanks
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 21/25] x86/resctrl: Introduce the interface to switch between monitor modes
  2024-10-16  3:36   ` Reinette Chatre
@ 2024-10-18 15:13     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-18 15:13 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/2024 10:36 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> Introduce interface to switch between mbm_cntr_assign and default modes.
>>
>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> [mbm_cntr_assign]
>> default
>>
>> To enable the "mbm_cntr_assign" mode:
>> $ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>
>> To enable the default monitoring mode:
>> $ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>
>> MBM event counters will reset when mbm_assign_mode is changed.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
> 
>> ---
>>   Documentation/arch/x86/resctrl.rst     | 15 ++++++
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 75 +++++++++++++++++++++++++-
>>   2 files changed, 89 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 99ee9c87952b..d9574078f735 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -291,6 +291,21 @@ with the following files:
>>   	that case reading the mbm_total_bytes and mbm_local_bytes may report
>>   	'Unavailable' if there is no counter associated with that group.
>>   
>> +	* To enable "mbm_cntr_assign" mode:
>> +	  ::
>> +
>> +	    # echo  "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 
> extra spaces

Sure.

> 
>> +
>> +	* To enable default monitoring mode:
>> +	  ::
>> +
>> +	    # echo  "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 
> extra spaces

Sure


>> +
>> +	The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated counters
> 
> I did ask you not to copy the text verbatim
> https://lore.kernel.org/all/b38c93bf-4650-45d1-9aca-8b4c4d425886@intel.com/

Outch Yea.

The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated with
counters may reset when mbm_assign_mode is changed.

> 
>> +	may reset when the mode is changed. Moving to mbm_cntr_assign mode will
>> +	require users to assign the counters to the events. Otherwise, the MBM
> 
> "will require" -> "require"
> 
>> +	event counters will return "Unassigned" when read.
>> +
>>   "num_mbm_cntrs":
>>   	The number of monitoring counters available for assignment when the
>>   	architecture supports mbm_cntr_assign mode.
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index cb2c60c0319e..88eda3cf5c82 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -888,6 +888,78 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>>   	return 0;
>>   }
>>   
>> +static void mbm_cntr_reset(struct rdt_resource *r)
>> +{
>> +	struct rdtgroup *prgrp, *crgrp;
>> +	struct rdt_mon_domain *dom;
>> +
>> +	/*
>> +	 * Hardware counters will reset after switching the monitor mode.
>> +	 * Reset the architectural state so that reading of hardware
>> +	 * counter is not considered as an overflow in the next update.
>> +	 * Also reset the domain counter bitmap.
>> +	 */
>> +	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>> +		bitmap_zero(dom->mbm_cntr_map, r->mon.num_mbm_cntrs);
>> +		resctrl_arch_reset_rmid_all(r, dom);
>> +	}
>> +
>> +	/* Reset global MBM counter map */
>> +	bitmap_fill(r->mon.mbm_cntr_free_map, r->mon.num_mbm_cntrs);
>> +
>> +	/* Reset the cntr_id's for all the monitor groups */
>> +	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>> +		prgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
>> +		prgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
>> +		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list,
>> +				    mon.crdtgrp_list) {
>> +			crgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
>> +			crgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
>> +		}
> 
> Please use MBM_EVENT_ARRAY_INDEX

Sure.

> 
>> +	}
>> +}
>> +
>> +static ssize_t rdtgroup_mbm_assign_mode_write(struct kernfs_open_file *of,
>> +					      char *buf, size_t nbytes, loff_t off)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +	int ret = 0;
>> +	bool enable;
>> +
>> +	/* Valid input requires a trailing newline */
>> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> +		return -EINVAL;
>> +
>> +	buf[nbytes - 1] = '\0';
>> +
>> +	cpus_read_lock();
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	if (!strcmp(buf, "default")) {
>> +		enable = 0;
>> +	} else if (!strcmp(buf, "mbm_cntr_assign")) {
>> +		enable = 1;
>> +	} else {
>> +		ret = -EINVAL;
>> +		rdt_last_cmd_puts("Unsupported assign mode\n");
>> +		goto write_exit;
>> +	}
> 
> Please keep two things in mind:
> * this file is always accessible, whether platform supports assignable
>    counters or not.
> * this is resctrl fs code.
> 
> So, considering above, how should user interpret the "Unsupported assign mode"?
> Shouldn't it also return this error if a user attempts to enable
> "mbm_cntr_assign" on a platform that does not support this mode?
> 
>> +
>> +	if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
> 
> resctrl_arch_mbm_cntr_assign_enabled() returns true if mbm_cntr_assign
> mode is enabled, but when it returns false it could mean different things:
> platform supports mbm_cntr_assign mode, but it is disabled, or platform
> does not support mbm_cntr_assign mode.
> 
> resctrl fs should not rely on all archs to duplicate the all the checking done
> in resctrl_arch_mbm_cntr_assign_set(). It should never ask arch to enable a mode
> that it knows the platform is not capable of.

Agree. This should take care of it. Good catch.


  if (!strcmp(buf, "default")) {
             enable = 0;
  } else if (!strcmp(buf, "mbm_cntr_assign")) {
          if (r->mon.mbm_cntr_assignable) {
                  enable = 1;
           } else {
            ret = -EINVAL;
            rdt_last_cmd_puts("mbm_cntr_assign mode is not supported\n");
            goto write_exit;
           }
  } else {
         ret = -EINVAL;
         rdt_last_cmd_puts("Unsupported assign mode\n");
         goto write_exit;
  }


> 
>> +		ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
>> +		if (!ret)
>> +			mbm_cntr_reset(r);
>> +	}
>> +
>> +write_exit:
>> +	mutex_unlock(&rdtgroup_mutex);
>> +	cpus_read_unlock();
>> +
>> +	return ret ?: nbytes;
>> +}
>> +
>>   static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>>   				       struct seq_file *s, void *v)
>>   {
>> @@ -2115,9 +2187,10 @@ static struct rftype res_common_files[] = {
>>   	},
>>   	{
>>   		.name		= "mbm_assign_mode",
>> -		.mode		= 0444,
>> +		.mode		= 0644,
>>   		.kf_ops		= &rdtgroup_kf_single_ops,
>>   		.seq_show	= rdtgroup_mbm_assign_mode_show,
>> +		.write		= rdtgroup_mbm_assign_mode_write,
>>   		.fflags		= RFTYPE_MON_INFO,
>>   	},
>>   	{
> 
> Reinette
> 
> 

-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 23/25] x86/resctrl: Update assignments on event configuration changes
  2024-10-16  3:40   ` Reinette Chatre
@ 2024-10-18 15:50     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-18 15:50 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	kirill.shutemov, jithu.joseph, kai.huang, kan.liang,
	daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/2024 10:40 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> Users can modify the configuration of assignable events. Whenever the
>> event configuration is updated, MBM assignments must be revised across
>> all monitor groups within the impacted domains.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> ...
> 
>> ---
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 49 ++++++++++++++++++++++++++
>>   1 file changed, 49 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index f890d294e002..cf2e0ad0e4f4 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1669,6 +1669,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
>>   }
>>   
>>   struct mon_config_info {
>> +	struct rdt_resource *r;
>>   	struct rdt_mon_domain *d;
>>   	u32 evtid;
>>   	u32 mon_config;
>> @@ -1694,11 +1695,46 @@ u32 resctrl_arch_mon_event_config_get(struct rdt_mon_domain *d,
>>   	return INVALID_CONFIG_VALUE;
>>   }
>>   
>> +static void mbm_cntr_event_update(int cntr_id, unsigned int index, u32 val)
>> +{
>> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
>> +	struct rdtgroup *prgrp, *crgrp;
>> +	int update = 0;
>> +
>> +	/* Check if the cntr_id is associated to the event type updated */
>> +	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>> +		if (prgrp->mon.cntr_id[index] == cntr_id) {
>> +			abmc_cfg.split.bw_src = prgrp->mon.rmid;
>> +			update = 1;
>> +			goto out_update;
>> +		}
>> +		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) {
>> +			if (crgrp->mon.cntr_id[index] == cntr_id) {
>> +				abmc_cfg.split.bw_src = crgrp->mon.rmid;
>> +				update = 1;
>> +				goto out_update;
>> +			}
>> +		}
> 
> This code looks like it is better suited for resctrl fs. Note that
> after the arch fs split struct rdtgroup is private to resctrl fs.

ok

> 
>> +	}
>> +
>> +out_update:
>> +	if (update) {
>> +		abmc_cfg.split.cfg_en = 1;
>> +		abmc_cfg.split.cntr_en = 1;
>> +		abmc_cfg.split.cntr_id = cntr_id;
>> +		abmc_cfg.split.bw_type = val;
>> +		wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg.full);
>> +	}
>> +}
>> +
>>   void resctrl_arch_mon_event_config_set(void *info)
>>   {
>>   	struct mon_config_info *mon_info = info;
>> +	struct rdt_mon_domain *d = mon_info->d;
>> +	struct rdt_resource *r = mon_info->r;
>>   	struct rdt_hw_mon_domain *hw_dom;
>>   	unsigned int index;
>> +	int cntr_id;
>>   
>>   	index = mon_event_config_index_get(mon_info->evtid);
>>   	if (index == INVALID_CONFIG_INDEX)
>> @@ -1718,6 +1754,18 @@ void resctrl_arch_mon_event_config_set(void *info)
>>   		hw_dom->mbm_local_cfg =  mon_info->mon_config;
>>   		break;
>>   	}
>> +
>> +	/*
>> +	 * Update the assignment if the domain has the cntr_id's assigned
>> +	 * to event type updated.
>> +	 */
>> +	if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> +		for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
>> +			if (test_bit(cntr_id, d->mbm_cntr_map))
>> +				mbm_cntr_event_update(cntr_id, index,
>> +						      mon_info->mon_config);
>> +		}
>> +	}
>>   }
>>   
>>   /**
>> @@ -1805,6 +1853,7 @@ static void mbm_config_write_domain(struct rdt_resource *r,
>>   	mon_info.d = d;
>>   	mon_info.evtid = evtid;
>>   	mon_info.mon_config = val;
>> +	mon_info.r = r;
>>   
>>   	/*
>>   	 * Update MSR_IA32_EVT_CFG_BASE MSR on one of the CPUs in the
> 
> If I understand correctly, mbm_config_write_domain() paints itself into a corner by
> calling arch code via IPI. As seen above it needs resctrl help to get all the information
> and doing so from the arch helper is not appropriate.
> 
> How about calling a resctrl fs helper via IPI instead? For example:
> 
> resctrl_mon_event_config_set() {
> 
> 	resctrl_arch_mon_event_config_set();
> 
> 	if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
> 		for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
> 			if (test_bit(cntr_id, d->mbm_cntr_map)) {
> 				/* determine rmid */
> 				resctrl_arch_config_cntr()

The call resctrl_arch_config_cntr() requires both RMID and CLOSID. So, 
we will have to find the rdtgroup here (not just RMID, we need CLOSID also).

Yea. I think it can be done. Let me try.


> 			}
> 		}
> 	}
> }
> 
> 
> mbm_config_write_domain() {
> 
> 	...
> 	smp_call_function_any(&d->hdr.cpu_mask, resctrl_mon_event_config_set, ...)
> 	...
> 
> }
> 
> By removing reset of arch state from resctrl_arch_config_cntr() this works well with the
> resctrl_arch_reset_rmid_all() that is done from mbm_config_write_domain().
> Even though resctrl_arch_config_cntr() contains a smp_call_function_any() it should
> already be running on CPU in mask and thus should just run on local CPU.

Ok.
-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 17/25] x86/resctrl: Add the interface to assign/update counter assignment
  2024-10-17 22:56     ` Moger, Babu
@ 2024-10-18 15:59       ` Reinette Chatre
  2024-10-21 14:40         ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-18 15:59 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/17/24 3:56 PM, Moger, Babu wrote:
> On 10/15/2024 10:25 PM, Reinette Chatre wrote:
>> On 10/9/24 10:39 AM, Babu Moger wrote:

>>> + */
>>> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>> +                   struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>>> +{
>>> +    int index = MBM_EVENT_ARRAY_INDEX(evtid);
>>> +    int cntr_id = rdtgrp->mon.cntr_id[index];
>>> +    int ret;
>>> +
>>> +    /*
>>> +     * Allocate a new counter id to the event if the counter is not
>>> +     * assigned already.
>>> +     */
>>> +    if (cntr_id == MON_CNTR_UNSET) {
>>> +        cntr_id = mbm_cntr_alloc(r);
>>> +        if (cntr_id < 0) {
>>> +            rdt_last_cmd_puts("Out of MBM assignable counters\n");
>>> +            return -ENOSPC;
>>> +        }
>>> +        rdtgrp->mon.cntr_id[index] = cntr_id;
>>> +    }
>>> +
>>> +    if (!d) {
>>> +        list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>> +            ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>>> +                               rdtgrp->closid, cntr_id, true);
>>> +            if (ret)
>>> +                goto out_done_assign;
>>> +
>>> +            set_bit(cntr_id, d->mbm_cntr_map);
>>
>> The code pattern above is repeated four times in this work, twice in
>> rdtgroup_assign_cntr_event() and twice in rdtgroup_unassign_cntr_event(). This
>> duplication should be avoided. It can be done in a function that also resets
>> the architectural state.
> 
> Are you suggesting to combine rdtgroup_assign_cntr_event() and rdtgroup_unassign_cntr_event()?

No. My comment was about the following pattern that is repeated four times:
	...
	ret = resctrl_arch_config_cntr(...)
	if (ret)
		...
	set_bit()/clear_bit()
	...

> It can be done. We need a flag to tell if it is a assign or unassign.

There is already a flag that is used by resctrl_arch_config_cntr(), the same parameters
as resctrl_arch_config_cntr() can be used for a wrapper that just calls
resctrl_arch_config_cntr() directly and uses that same flag to
select between set_bit() and clear_bit(). This wrapper can then also include
the reset of architectural state.

Also, I do not think we need atomic bitops here so these can be __set_bit()
and __clear_bit() that also matches how bits of mbm_cntr_free_map are managed
earlier in series.

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 18/25] x86/resctrl: Add the interface to unassign a MBM counter
  2024-10-17 23:11     ` Moger, Babu
@ 2024-10-18 16:06       ` Reinette Chatre
  0 siblings, 0 replies; 124+ messages in thread
From: Reinette Chatre @ 2024-10-18 16:06 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/17/24 4:11 PM, Moger, Babu wrote:
> On 10/15/2024 10:29 PM, Reinette Chatre wrote:
>> On 10/9/24 10:39 AM, Babu Moger wrote:

>>> +    if (!d) {
>>> +        list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>> +            ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>>> +                               rdtgrp->closid, cntr_id, false);
>>> +            if (ret)
>>> +                goto out_done_unassign;
>>> +
>>> +            clear_bit(cntr_id, d->mbm_cntr_map);
>>> +        }
>>> +    } else {
>>> +        ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>>> +                           rdtgrp->closid, cntr_id, false);
>>> +        if (ret)
>>> +            goto out_done_unassign;
>>> +
>>> +        clear_bit(cntr_id, d->mbm_cntr_map);
>>
>> Please see comment to previous patch about the duplicate snippets. Snippets can be
>> replaced with single function that also resets architectural state.
> 
> Sure.
> 
> will combine rdtgroup_assign_cntr_event() and
> rdtgroup_unassign_cntr_event().

That is not what I suggested. I attempted to clarify in response to patch
with original feedback:
https://lore.kernel.org/all/c36e0c76-1666-4a31-984e-1ee6aed2e414@intel.com/

> 
> I need to rename the function. How about resctrl_configure_cntr_event()?
> 
> 
>>
>>> +    }
>>> +
>>> +    /* Update the counter bitmap */
>>
>> What is the update?
> 
> Clear the counter bitmap.

Could you please update the comment to be more specific? What is
written can be seen from code.

Reinette



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 17/25] x86/resctrl: Add the interface to assign/update counter assignment
  2024-10-18 15:59       ` Reinette Chatre
@ 2024-10-21 14:40         ` Moger, Babu
  2024-10-21 15:31           ` Reinette Chatre
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-21 14:40 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/18/24 10:59, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/17/24 3:56 PM, Moger, Babu wrote:
>> On 10/15/2024 10:25 PM, Reinette Chatre wrote:
>>> On 10/9/24 10:39 AM, Babu Moger wrote:
> 
>>>> + */
>>>> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>>> +                   struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>>>> +{
>>>> +    int index = MBM_EVENT_ARRAY_INDEX(evtid);
>>>> +    int cntr_id = rdtgrp->mon.cntr_id[index];
>>>> +    int ret;
>>>> +
>>>> +    /*
>>>> +     * Allocate a new counter id to the event if the counter is not
>>>> +     * assigned already.
>>>> +     */
>>>> +    if (cntr_id == MON_CNTR_UNSET) {
>>>> +        cntr_id = mbm_cntr_alloc(r);
>>>> +        if (cntr_id < 0) {
>>>> +            rdt_last_cmd_puts("Out of MBM assignable counters\n");
>>>> +            return -ENOSPC;
>>>> +        }
>>>> +        rdtgrp->mon.cntr_id[index] = cntr_id;
>>>> +    }
>>>> +
>>>> +    if (!d) {
>>>> +        list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>>> +            ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>>>> +                               rdtgrp->closid, cntr_id, true);
>>>> +            if (ret)
>>>> +                goto out_done_assign;
>>>> +
>>>> +            set_bit(cntr_id, d->mbm_cntr_map);
>>>
>>> The code pattern above is repeated four times in this work, twice in
>>> rdtgroup_assign_cntr_event() and twice in rdtgroup_unassign_cntr_event(). This
>>> duplication should be avoided. It can be done in a function that also resets
>>> the architectural state.
>>
>> Are you suggesting to combine rdtgroup_assign_cntr_event() and rdtgroup_unassign_cntr_event()?
> 
> No. My comment was about the following pattern that is repeated four times:
> 	...
> 	ret = resctrl_arch_config_cntr(...)
> 	if (ret)
> 		...
> 	set_bit()/clear_bit()
> 	...
> 

ok.


>> It can be done. We need a flag to tell if it is a assign or unassign.
> 
> There is already a flag that is used by resctrl_arch_config_cntr(), the same parameters
> as resctrl_arch_config_cntr() can be used for a wrapper that just calls
> resctrl_arch_config_cntr() directly and uses that same flag to
> select between set_bit() and clear_bit(). This wrapper can then also include
> the reset of architectural state.

ok. Got it, It will look like this.


+/*
+ * Wrapper to configure the counter in a domain.
+ */
+static int rdtgroup_config_cntr(struct rdt_resource *r,struct
rdt_mon_domain *d,
+                               enum resctrl_event_id evtid, u32 rmid, u32
closid,
+                               u32 cntr_id, bool assign)
+{
+       int ret;
+
+       ret = resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id,
assign);
+       if (ret)
+               return ret;
+
+       if (assign)
+               __set_bit(cntr_id, d->mbm_cntr_map);
+       else
+               __clear_bit(cntr_id, d->mbm_cntr_map);
+
+       /*
+        * Reset the architectural state so that reading of hardware
+        * counter is not considered as an overflow in next update.
+        */
+       resctrl_arch_reset_rmid(r, d, closid, rmid, evtid);
+
+       return ret;
+}
+


> 
> Also, I do not think we need atomic bitops here so these can be __set_bit()
> and __clear_bit() that also matches how bits of mbm_cntr_free_map are managed
> earlier in series.
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 24/25] x86/resctrl: Introduce interface to list assignment states of all the groups
  2024-10-16  3:40   ` Reinette Chatre
@ 2024-10-21 14:56     ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-21 14:56 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/24 22:40, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> Provide the interface to list the assignment states of all the resctrl
>> groups in mbm_cntr_assign mode.
>>
>> Example:
>> $mount -t resctrl resctrl /sys/fs/resctrl/
>> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> //0=tl;1=tl;
>>
>> List follows the following format:
>>
>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Format for specific type of groups:
>>
>> - Default CTRL_MON group:
>>   "//<domain_id>=<flags>"
>>
>> - Non-default CTRL_MON group:
>>   "<CTRL_MON group>//<domain_id>=<flags>"
>>
>> - Child MON group of default CTRL_MON group:
>>   "/<MON group>/<domain_id>=<flags>"
>>
>> - Child MON group of non-default CTRL_MON group:
>>   "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Flags can be one of the following:
>> t  MBM total event is enabled
>> l  MBM local event is enabled
>> tl Both total and local MBM events are enabled
>> _  None of the MBM events are enabled
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v8: Moved resctrl_mbm_event_assigned() in here as it is first used here.
>>     Moved rdt_last_cmd_clear() before making any call.
>>     Updated the commit log.
>>     Corrected the doc format.
>>
>> v7: Renamed the interface name from 'mbm_control' to 'mbm_assign_control'
>>     to match 'mbm_assign_mode'.
>>     Removed Arch references from FS code.
>>     Added rdt_last_cmd_clear() before the command processing.
>>     Added rdtgroup_mutex before all the calls.
>>     Removed references of ABMC from FS code.
>>
>> v6: The domain specific assignment can be determined looking at mbm_cntr_map.
>>     Removed rdtgroup_abmc_dom_cfg() and rdtgroup_abmc_dom_state().
>>     Removed the switch statement for the domain_state detection.
>>     Determined the flags incremently.
>>     Removed special handling of default group while printing..
>>
>> v5: Replaced "assignment flags" with "flags".
>>     Changes related to mon structure.
>>     Changes related renaming the interface from mbm_assign_control to
>>     mbm_control.
>>
>> v4: Added functionality to query domain specific assigment in.
>>     rdtgroup_abmc_dom_state().
>>
>> v3: New patch.
>>     Addresses the feedback to provide the global assignment interface.
>>     https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
>> ---
>>  Documentation/arch/x86/resctrl.rst     | 44 +++++++++++++++
>>  arch/x86/kernel/cpu/resctrl/monitor.c  |  1 +
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 76 ++++++++++++++++++++++++++
>>  3 files changed, 121 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index d9574078f735..b85d3bc3e301 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -310,6 +310,50 @@ with the following files:
>>  	The number of monitoring counters available for assignment when the
>>  	architecture supports mbm_cntr_assign mode.
>>  
>> +"mbm_assign_control":
>> +	Reports the resctrl group and monitor status of each group.
>> +
>> +	List follows the following format:
>> +		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>> +
>> +	Format for specific type of groups:
>> +
>> +	* Default CTRL_MON group:
>> +		"//<domain_id>=<flags>"
>> +
>> +	* Non-default CTRL_MON group:
>> +		"<CTRL_MON group>//<domain_id>=<flags>"
>> +
>> +	* Child MON group of default CTRL_MON group:
>> +		"/<MON group>/<domain_id>=<flags>"
>> +
>> +	* Child MON group of non-default CTRL_MON group:
>> +		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>> +
>> +	Flags can be one of the following:
>> +	::
>> +
>> +	 t  MBM total event is assigned.
>> +	 l  MBM local event is assigned.
>> +	 tl Both total and local MBM events are assigned.
>> +	 _  None of the MBM events are assigned.
>> +
>> +	Examples:
>> +	::
>> +
>> +	 # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>> +	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>> +	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>> +
>> +	 # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	 non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	 non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +	 //0=tl;1=tl;
>> +	 /child_default_mon_grp/0=tl;1=tl;
>> +
>> +	There are four resctrl groups. All the groups have total and local MBM events
>> +	assigned on domain 0 and 1.
>> +
>>  "max_threshold_occupancy":
>>  		Read/write file provides the largest value (in
>>  		bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 395d99984893..fa7c77935080 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1269,6 +1269,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>  			r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>>  			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
>>  			hw_res->mbm_cntr_assign_enabled = true;
>> +			resctrl_file_fflags_init("mbm_assign_control", RFTYPE_MON_INFO);
>>  		}
>>  	}
>>  
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index cf2e0ad0e4f4..cf92ceb0f05e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -970,6 +970,76 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>>  	return 0;
>>  }
>>  
>> +static bool resctrl_mbm_event_assigned(struct rdtgroup *rdtg,
>> +				       struct rdt_mon_domain *d, u32 evtid)
> 
> u32 -> enum resctrl_event_id ?


Sure.

> 
>> +{
>> +	int index = MBM_EVENT_ARRAY_INDEX(evtid);
>> +	int cntr_id = rdtg->mon.cntr_id[index];
>> +
>> +	return cntr_id != MON_CNTR_UNSET && test_bit(cntr_id, d->mbm_cntr_map);
>> +}
>> +
>> +static char *rdtgroup_mon_state_to_str(struct rdtgroup *rdtgrp,
>> +				       struct rdt_mon_domain *d, char *str)
>> +{
>> +	char *tmp = str;
>> +
>> +	/* Query the total and local event flags for the domain */
>> +	if (resctrl_mbm_event_assigned(rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID))
>> +		*tmp++ = 't';
>> +
>> +	if (resctrl_mbm_event_assigned(rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID))
>> +		*tmp++ = 'l';
>> +
>> +	if (tmp == str)
>> +		*tmp++ = '_';
>> +
>> +	*tmp = '\0';
>> +	return str;
>> +}
>> +
> 
> Reinette
> 
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 17/25] x86/resctrl: Add the interface to assign/update counter assignment
  2024-10-21 14:40         ` Moger, Babu
@ 2024-10-21 15:31           ` Reinette Chatre
  2024-10-22  1:15             ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-21 15:31 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/21/24 7:40 AM, Moger, Babu wrote:
> On 10/18/24 10:59, Reinette Chatre wrote:
>> On 10/17/24 3:56 PM, Moger, Babu wrote:
>>> On 10/15/2024 10:25 PM, Reinette Chatre wrote:
>>>> On 10/9/24 10:39 AM, Babu Moger wrote:
>>
>>>>> + */
>>>>> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>>>> +                   struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>>>>> +{
>>>>> +    int index = MBM_EVENT_ARRAY_INDEX(evtid);
>>>>> +    int cntr_id = rdtgrp->mon.cntr_id[index];
>>>>> +    int ret;
>>>>> +
>>>>> +    /*
>>>>> +     * Allocate a new counter id to the event if the counter is not
>>>>> +     * assigned already.
>>>>> +     */
>>>>> +    if (cntr_id == MON_CNTR_UNSET) {
>>>>> +        cntr_id = mbm_cntr_alloc(r);
>>>>> +        if (cntr_id < 0) {
>>>>> +            rdt_last_cmd_puts("Out of MBM assignable counters\n");
>>>>> +            return -ENOSPC;
>>>>> +        }
>>>>> +        rdtgrp->mon.cntr_id[index] = cntr_id;
>>>>> +    }
>>>>> +
>>>>> +    if (!d) {
>>>>> +        list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>>>> +            ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>>>>> +                               rdtgrp->closid, cntr_id, true);
>>>>> +            if (ret)
>>>>> +                goto out_done_assign;
>>>>> +
>>>>> +            set_bit(cntr_id, d->mbm_cntr_map);
>>>>
>>>> The code pattern above is repeated four times in this work, twice in
>>>> rdtgroup_assign_cntr_event() and twice in rdtgroup_unassign_cntr_event(). This
>>>> duplication should be avoided. It can be done in a function that also resets
>>>> the architectural state.
>>>
>>> Are you suggesting to combine rdtgroup_assign_cntr_event() and rdtgroup_unassign_cntr_event()?
>>
>> No. My comment was about the following pattern that is repeated four times:
>> 	...
>> 	ret = resctrl_arch_config_cntr(...)
>> 	if (ret)
>> 		...
>> 	set_bit()/clear_bit()
>> 	...
>>
> 
> ok.
> 
> 
>>> It can be done. We need a flag to tell if it is a assign or unassign.
>>
>> There is already a flag that is used by resctrl_arch_config_cntr(), the same parameters
>> as resctrl_arch_config_cntr() can be used for a wrapper that just calls
>> resctrl_arch_config_cntr() directly and uses that same flag to
>> select between set_bit() and clear_bit(). This wrapper can then also include
>> the reset of architectural state.
> 
> ok. Got it, It will look like this.
> 
> 
> +/*
> + * Wrapper to configure the counter in a domain.
> + */

Please replace comment with a description of what the function does.

> +static int rdtgroup_config_cntr(struct rdt_resource *r,struct

While it keeps being a challenge to get naming right I do think this
can start by replacing "rdtgroup" with "resctrl" (specifically,
"rdtgroup_config_cntr() -> resctrl_config_cntr()") because, as seen
with the parameters passed, this has nothing to do with rdtgroup.

> rdt_mon_domain *d,
> +                               enum resctrl_event_id evtid, u32 rmid, u32
> closid,
> +                               u32 cntr_id, bool assign)
> +{
> +       int ret;
> +
> +       ret = resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id,
> assign);
> +       if (ret)
> +               return ret;
> +
> +       if (assign)
> +               __set_bit(cntr_id, d->mbm_cntr_map);
> +       else
> +               __clear_bit(cntr_id, d->mbm_cntr_map);
> +
> +       /*
> +        * Reset the architectural state so that reading of hardware
> +        * counter is not considered as an overflow in next update.
> +        */
> +       resctrl_arch_reset_rmid(r, d, closid, rmid, evtid);
> +
> +       return ret;
> +}
> +

Yes, this looks good. Thank you.

Reinette


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 25/25] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-16  3:43   ` Reinette Chatre
@ 2024-10-21 17:04     ` Moger, Babu
  2024-10-21 17:20       ` Reinette Chatre
  0 siblings, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-21 17:04 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/15/24 22:43, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>> Introduce the interface to assign MBM events in mbm_cntr_assign mode.
>>
>> Events can be enabled or disabled by writing to file
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>> Format is similar to the list format with addition of opcode for the
>> assignment operation.
>>  "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>> Format for specific type of groups:
>>
>>  * Default CTRL_MON group:
>>          "//<domain_id><opcode><flags>"
>>
>>  * Non-default CTRL_MON group:
>>          "<CTRL_MON group>//<domain_id><opcode><flags>"
>>
>>  * Child MON group of default CTRL_MON group:
>>          "/<MON group>/<domain_id><opcode><flags>"
>>
>>  * Child MON group of non-default CTRL_MON group:
>>          "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>> Domain_id '*' will apply the flags on all the domains.
>>
>> Opcode can be one of the following:
>>
>>  = Update the assignment to match the flags
>>  + Assign a new MBM event without impacting existing assignments.
>>  - Unassign a MBM event from currently assigned events.
>>
>> Assignment flags can be one of the following:
>>  t  MBM total event
>>  l  MBM local event
>>  tl Both total and local MBM events
>>  _  None of the MBM events. Valid only with '=' opcode. This flag cannot
>>     be combined with other flags.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v8: Moved unassign as the first action during the assign modification.
>>     Assign none "_" takes priority. Cannot be mixed with other flags.
>>     Updated the documentation and .rst file format. htmldoc looks ok.
>>
>> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>>     Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>>     Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>>     Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>>     Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>>     Removed ABMC reference in FS code.
>>     Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>>     Not sure if we need to change the behaviour here. Processed them sequencially right now.
>>     Users have the liberty to pass the flags. Restricting it might be a problem later.
>>
>> v6: Added support assign all if domain id is '*'
>>     Fixed the allocation of counter id if it not assigned already.
>>
>> v5: Interface name changed from mbm_assign_control to mbm_control.
>>     Fixed opcode and flags combination.
>>     '=_" is valid.
>>     "-_" amd "+_" is not valid.
>>     Minor message update.
>>     Renamed the function with prefix - rdtgroup_.
>>     Corrected few documentation mistakes.
>>     Rebase related changes after SNC support.
>>
>> v4: Added domain specific assignments. Fixed the opcode parsing.
>>
>> v3: New patch.
>>     Addresses the feedback to provide the global assignment interface.
>>     https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
>> ---
>>  Documentation/arch/x86/resctrl.rst     | 115 +++++++++++-
>>  arch/x86/kernel/cpu/resctrl/internal.h |  10 ++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 233 ++++++++++++++++++++++++-
>>  3 files changed, 356 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index b85d3bc3e301..77bb0b095127 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -336,7 +336,8 @@ with the following files:
>>  	 t  MBM total event is assigned.
>>  	 l  MBM local event is assigned.
>>  	 tl Both total and local MBM events are assigned.
>> -	 _  None of the MBM events are assigned.
>> +	 _  None of the MBM events are assigned. Only works with opcode '=' for write
>> +	    and cannot be combined with other flags.
>>  
>>  	Examples:
>>  	::
>> @@ -354,6 +355,118 @@ with the following files:
>>  	There are four resctrl groups. All the groups have total and local MBM events
>>  	assigned on domain 0 and 1.
>>  
>> +	Assignment state can be updated by writing to the interface.
>> +
>> +	Format is similar to the list format with addition of opcode for the
>> +	assignment operation.
>> +
>> +		"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>> +
>> +	Format for each type of groups:
>> +
>> +        * Default CTRL_MON group:
>> +                "//<domain_id><opcode><flags>"
>> +
>> +        * Non-default CTRL_MON group:
>> +                "<CTRL_MON group>//<domain_id><opcode><flags>"
>> +
>> +        * Child MON group of default CTRL_MON group:
>> +                "/<MON group>/<domain_id><opcode><flags>"
>> +
>> +        * Child MON group of non-default CTRL_MON group:
>> +                "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>> +
>> +	Domain_id '*' will apply the flags on all the domains.
>> +
>> +	Opcode can be one of the following:
>> +	::
>> +
>> +	 = Update the assignment to match the MBM event.
>> +	 + Assign a new MBM event without impacting existing assignments.
>> +	 - Unassign a MBM event from currently assigned events.
>> +
>> +	Examples:
>> +	Initial group status:
>> +	::
>> +
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +	  //0=tl;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=tl;
>> +
>> +	To update the default group to assign only total MBM event on domain 0:
>> +	::
>> +
>> +	  # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	Assignment status after the update:
>> +	::
>> +
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +	  //0=t;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=tl;
>> +
>> +	To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
>> +	::
>> +
>> +	  # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	Assignment status after the update:
>> +	::
>> +
>> +	  $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
> Please be consistent by always using "# cat", not sometimes "$ cat" as above.

Sure.

> 
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +	  //0=t;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=l;
>> +
>> +	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to unassign
>> +	both local and total MBM events on domain 1:
>> +	::
>> +
>> +	  # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
>> +			/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	Assignment status after the update:
>> +	::
>> +
> 
> Missing "# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control"

Sure.

> 
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> +	  //0=t;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=l;
>> +
>> +	To update the default group to add a local MBM event domain 0.
>> +	::
>> +
>> +	  # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	Assignment status after the update:
>> +	::
>> +
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> +	  //0=tl;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=l;
>> +
>> +	To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all the
>> +	MBM events on all the domains.
>> +	::
>> +
>> +	  # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	Assignment status after the update:
>> +	::
>> +
>> +	  #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
> Please be consistent with spacing "# cat" vs "#cat". This is very noticeable when
> viewing the formatted docs.

Sure.

> 
>> +	  non_default_ctrl_mon_grp//0=_;1=_;
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> +	  //0=tl;1=tl;
>> +	  /child_default_mon_grp/0=tl;1=l;
>> +
>>  "max_threshold_occupancy":
>>  		Read/write file provides the largest value (in
>>  		bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index a6f40d3115f4..e8d6a430dc4a 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -74,6 +74,16 @@
>>   */
>>  #define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
>>  
>> +/*
>> + * Assignment flags for mbm_cntr_assign feature
>> + */
>> +enum {
>> +	ASSIGN_NONE	= 0,
>> +	ASSIGN_TOTAL	= BIT(QOS_L3_MBM_TOTAL_EVENT_ID),
>> +	ASSIGN_LOCAL	= BIT(QOS_L3_MBM_LOCAL_EVENT_ID),
>> +	ASSIGN_INVALID,
>> +};
>> +
>>  /**
>>   * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>>   *			        aren't marked nohz_full
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index cf92ceb0f05e..6095146e3ba4 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1040,6 +1040,236 @@ static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
>>  	return 0;
>>  }
>>  
>> +static int rdtgroup_str_to_mon_state(char *flag)
>> +{
>> +	int i, mon_state = ASSIGN_NONE;
>> +
>> +	for (i = 0; i < strlen(flag); i++) {
>> +		switch (*(flag + i)) {
>> +		case 't':
>> +			mon_state |= ASSIGN_TOTAL;
>> +			break;
>> +		case 'l':
>> +			mon_state |= ASSIGN_LOCAL;
>> +			break;
>> +		case '_':
>> +			return ASSIGN_NONE;
>> +		default:
>> +			return ASSIGN_INVALID;
>> +		}
>> +	}
>> +
>> +	return mon_state;
>> +}
>> +
>> +static struct rdtgroup *rdtgroup_find_grp_by_name(enum rdt_group_type rtype,
>> +						  char *p_grp, char *c_grp)
>> +{
>> +	struct rdtgroup *rdtg, *crg;
>> +
>> +	if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
>> +		return &rdtgroup_default;
>> +	} else if (rtype == RDTCTRL_GROUP) {
>> +		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
>> +			if (!strcmp(p_grp, rdtg->kn->name))
>> +				return rdtg;
>> +	} else if (rtype == RDTMON_GROUP) {
>> +		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>> +			if (!strcmp(p_grp, rdtg->kn->name)) {
>> +				list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
>> +						    mon.crdtgrp_list) {
>> +					if (!strcmp(c_grp, crg->kn->name))
>> +						return crg;
>> +				}
>> +			}
>> +		}
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static int rdtgroup_process_flags(struct rdt_resource *r,
>> +				  enum rdt_group_type rtype,
>> +				  char *p_grp, char *c_grp, char *tok)
>> +{
>> +	int op, mon_state, assign_state, unassign_state;
>> +	char *dom_str, *id_str, *op_str;
>> +	struct rdt_mon_domain *d;
>> +	struct rdtgroup *rdtgrp;
>> +	unsigned long dom_id;
>> +	int ret, found = 0;
>> +
>> +	rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
>> +
>> +	if (!rdtgrp) {
>> +		rdt_last_cmd_puts("Not a valid resctrl group\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +next:
>> +	if (!tok || tok[0] == '\0')
>> +		return 0;
>> +
>> +	/* Start processing the strings for each domain */
>> +	dom_str = strim(strsep(&tok, ";"));
>> +
>> +	op_str = strpbrk(dom_str, "=+-");
>> +
>> +	if (op_str) {
>> +		op = *op_str;
>> +	} else {
>> +		rdt_last_cmd_puts("Missing operation =, +, - character\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	id_str = strsep(&dom_str, "=+-");
>> +
>> +	/* Check for domain id '*' which means all domains */
>> +	if (id_str && *id_str == '*') {
>> +		d = NULL;
>> +		goto check_state;
>> +	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>> +		rdt_last_cmd_puts("Missing domain id\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Verify if the dom_id is valid */
>> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +		if (d->hdr.id == dom_id) {
>> +			found = 1;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!found) {
>> +		rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
>> +		return -EINVAL;
>> +	}
>> +
>> +check_state:
>> +	mon_state = rdtgroup_str_to_mon_state(dom_str);
>> +
>> +	if (mon_state == ASSIGN_INVALID) {
>> +		rdt_last_cmd_puts("Invalid assign flag\n");
>> +		goto out_fail;
>> +	}
>> +
>> +	assign_state = 0;
>> +	unassign_state = 0;
>> +
>> +	switch (op) {
>> +	case '+':
>> +		if (mon_state == ASSIGN_NONE) {
>> +			rdt_last_cmd_puts("Invalid assign opcode\n");
>> +			goto out_fail;
>> +		}
>> +		assign_state = mon_state;
>> +		break;
>> +	case '-':
>> +		if (mon_state == ASSIGN_NONE) {
>> +			rdt_last_cmd_puts("Invalid assign opcode\n");
>> +			goto out_fail;
>> +		}
>> +		unassign_state = mon_state;
>> +		break;
>> +	case '=':
>> +		assign_state = mon_state;
>> +		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	if (unassign_state & ASSIGN_TOTAL) {
>> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +		if (ret)
>> +			goto out_fail;
>> +	}
>> +
>> +	if (unassign_state & ASSIGN_LOCAL) {
>> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +		if (ret)
>> +			goto out_fail;
>> +	}
>> +
>> +	if (assign_state & ASSIGN_TOTAL) {
>> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +		if (ret)
>> +			goto out_fail;
>> +	}
>> +
>> +	if (assign_state & ASSIGN_LOCAL) {
>> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +		if (ret)
>> +			goto out_fail;
>> +	}
>> +
>> +	goto next;
>> +
>> +out_fail:
> 
> Is it possible to print a message to the command status to give some details about which
> request failed? I am wondering about a scenario where a user changes multiple domains of
> multiple groups, since the operation does not undo changes, it will fail without information
> to user space about which setting triggered the failure and which settings succeeded.
> This is similar to what is done when user attempts to move several tasks ... the error will
> indicate which task triggered failure so that user space knows what completed successfully.

Will add something like this on failure.

rdt_last_cmd_printf("Total event assign failed on domain %d\n", dom_id);


> 
>> +
>> +	return -EINVAL;
>> +}
>> +
>> +static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of,
>> +						 char *buf, size_t nbytes, loff_t off)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +	char *token, *cmon_grp, *mon_grp;
>> +	enum rdt_group_type rtype;
>> +	int ret;
>> +
>> +	/* Valid input requires a trailing newline */
>> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> +		return -EINVAL;
>> +
>> +	buf[nbytes - 1] = '\0';
>> +
>> +	cpus_read_lock();
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> +		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
> 
> Writing to last_cmd_status_buf here ...

Sure.

> 
>> +		mutex_unlock(&rdtgroup_mutex);
>> +		cpus_read_unlock();
>> +		return -EINVAL;
>> +	}
>> +
>> +	rdt_last_cmd_clear();
> 
> ... but initializing buffer here. 
> Sidenote: This was an issue before. If you receive comments about
> items in patches, please do check if those comments apply to other patches also.

Missed it.

> 
>> +
>> +	while ((token = strsep(&buf, "\n")) != NULL) {
>> +		if (strstr(token, "/")) {
>> +			/*
>> +			 * The write command follows the following format:
>> +			 * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
>> +			 * Extract the CTRL_MON group.
>> +			 */
>> +			cmon_grp = strsep(&token, "/");
>> +
>> +			/*
>> +			 * Extract the MON_GROUP.
>> +			 * strsep returns empty string for contiguous delimiters.
>> +			 * Empty mon_grp here means it is a RDTCTRL_GROUP.
>> +			 */
>> +			mon_grp = strsep(&token, "/");
>> +
>> +			if (*mon_grp == '\0')
>> +				rtype = RDTCTRL_GROUP;
>> +			else
>> +				rtype = RDTMON_GROUP;
>> +
>> +			ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token);
>> +			if (ret)
>> +				break;
>> +		}
>> +	}
>> +
>> +	mutex_unlock(&rdtgroup_mutex);
>> +	cpus_read_unlock();
>> +
>> +	return ret ?: nbytes;
>> +}
>> +
>>  #ifdef CONFIG_PROC_CPU_RESCTRL
>>  
>>  /*
>> @@ -2328,9 +2558,10 @@ static struct rftype res_common_files[] = {
>>  	},
>>  	{
>>  		.name		= "mbm_assign_control",
>> -		.mode		= 0444,
>> +		.mode		= 0644,
>>  		.kf_ops		= &rdtgroup_kf_single_ops,
>>  		.seq_show	= rdtgroup_mbm_assign_control_show,
>> +		.write		= rdtgroup_mbm_assign_control_write,
>>  	},
>>  	{
>>  		.name		= "cpus_list",
> 
> On a high level this looks ok but this code needs to be more robust. This will parse
> data from user space that may include all kinds of input ... think malicious user or
> a buggy script. I am not able to test this code but I tried to work through what will
> happen under some wrong input and found some issues. For example, if user space provides
> input like '//\n' then rdtgroup_process_flags() will be called with token == NULL. This will
> result in rdtgroup_process_flags() returning "success", but fortunately do nothing, for
> this invalid input. A more severe example is with input like '//0=\n', from what I can tell
> this will result in rdtgroup_str_to_mon_state() called with dom_str==NULL that will treat
> this as ASSIGN_NONE and proceed as if user provided '//0=_'.
> This was just some scenarios with basic input that could be typos, no real stress tests.
> I stopped here though since I believe it is already clear this needs to be more robust.
> Please do test this interface by exercising it with invalid input and corner cases.

Agree.

But, tested the cases you mentioned above. It seems to handle as expected.

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;

#echo '//\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
bash: echo: write error: Invalid argument

# cat /sys/fs/resctrl/info/last_cmd_status
Missing operation =, +, - character


#echo '//0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
bash: echo: write error: Invalid argument

#cat /sys/fs/resctrl/info/last_cmd_status
Invalid assign flag

#echo '/0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
bash: echo: write error: Invalid argument
# cat /sys/fs/resctrl/info/last_cmd_status
Not a valid resctrl group


The assign state did not change.
#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;

Sure. will test some more combinations to be sure.
-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-10-16  3:05 ` [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
@ 2024-10-21 17:09   ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-21 17:09 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

Thanks again for quick turn around for this series. Will start working on
v9.

On 10/15/24 22:05, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/9/24 10:39 AM, Babu Moger wrote:
>>
>> This series adds the support for Assignable Bandwidth Monitoring Counters
>> (ABMC). It is also called QoS RMID Pinning feature
>>
>> Series is written such that it is easier to support other assignable
>> features supported from different vendors.
>>
>> The feature details are documented in the  APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> 5b0c5f05fb2fe (tip/master) Merge branch into tip/master: 'x86/splitlock'
>>
>> # Introduction
>>
>> Users can create as many monitor groups as RMIDs supported by the hardware.
>> However, bandwidth monitoring feature on AMD system only guarantees that
>> RMIDs currently assigned to a processor will be tracked by hardware.
>> The counters of any other RMIDs which are no longer being tracked will be
>> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
>> that are not tracked by hardware. So, there can be only limited number of
>> groups that can give guaranteed monitoring numbers. With ever changing
>> configurations there is no way to definitely know which of these groups
>> are being tracked for certain point of time. Users do not have the option
>> to monitor a group or set of groups for certain period of time without
>> worrying about RMID being reset in between.
> 
> "worrying about RMID being reset in between" -> "worrying about counter being
> reset in between"? 

Sure,

> 
>>     
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>> assigned.  The assigned RMID will be tracked by the hardware until the user
>> unassigns it manually. There is no need to worry about counters being reset
>> during this period. Additionally, the user can specify a bitmask identifying
>> the specific bandwidth types from the given source to track with the counter.
>>
>> Without ABMC enabled, monitoring will work in current 'default' mode without
>> assignment option.
>>
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 25/25] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-21 17:04     ` Moger, Babu
@ 2024-10-21 17:20       ` Reinette Chatre
  2024-10-22  1:12         ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Reinette Chatre @ 2024-10-21 17:20 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 10/21/24 10:04 AM, Moger, Babu wrote:
> On 10/15/24 22:43, Reinette Chatre wrote:
>> On 10/9/24 10:39 AM, Babu Moger wrote:


>>> +static int rdtgroup_process_flags(struct rdt_resource *r,
>>> +				  enum rdt_group_type rtype,
>>> +				  char *p_grp, char *c_grp, char *tok)
>>> +{
>>> +	int op, mon_state, assign_state, unassign_state;
>>> +	char *dom_str, *id_str, *op_str;
>>> +	struct rdt_mon_domain *d;
>>> +	struct rdtgroup *rdtgrp;
>>> +	unsigned long dom_id;
>>> +	int ret, found = 0;
>>> +
>>> +	rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
>>> +
>>> +	if (!rdtgrp) {
>>> +		rdt_last_cmd_puts("Not a valid resctrl group\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +next:
>>> +	if (!tok || tok[0] == '\0')
>>> +		return 0;
>>> +
>>> +	/* Start processing the strings for each domain */
>>> +	dom_str = strim(strsep(&tok, ";"));
>>> +
>>> +	op_str = strpbrk(dom_str, "=+-");
>>> +
>>> +	if (op_str) {
>>> +		op = *op_str;
>>> +	} else {
>>> +		rdt_last_cmd_puts("Missing operation =, +, - character\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	id_str = strsep(&dom_str, "=+-");
>>> +
>>> +	/* Check for domain id '*' which means all domains */
>>> +	if (id_str && *id_str == '*') {
>>> +		d = NULL;
>>> +		goto check_state;
>>> +	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>>> +		rdt_last_cmd_puts("Missing domain id\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	/* Verify if the dom_id is valid */
>>> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>> +		if (d->hdr.id == dom_id) {
>>> +			found = 1;
>>> +			break;
>>> +		}
>>> +	}
>>> +
>>> +	if (!found) {
>>> +		rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +check_state:
>>> +	mon_state = rdtgroup_str_to_mon_state(dom_str);
>>> +
>>> +	if (mon_state == ASSIGN_INVALID) {
>>> +		rdt_last_cmd_puts("Invalid assign flag\n");
>>> +		goto out_fail;
>>> +	}
>>> +
>>> +	assign_state = 0;
>>> +	unassign_state = 0;
>>> +
>>> +	switch (op) {
>>> +	case '+':
>>> +		if (mon_state == ASSIGN_NONE) {
>>> +			rdt_last_cmd_puts("Invalid assign opcode\n");
>>> +			goto out_fail;
>>> +		}
>>> +		assign_state = mon_state;
>>> +		break;
>>> +	case '-':
>>> +		if (mon_state == ASSIGN_NONE) {
>>> +			rdt_last_cmd_puts("Invalid assign opcode\n");
>>> +			goto out_fail;
>>> +		}
>>> +		unassign_state = mon_state;
>>> +		break;
>>> +	case '=':
>>> +		assign_state = mon_state;
>>> +		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
>>> +		break;
>>> +	default:
>>> +		break;
>>> +	}
>>> +
>>> +	if (unassign_state & ASSIGN_TOTAL) {
>>> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
>>> +		if (ret)
>>> +			goto out_fail;
>>> +	}
>>> +
>>> +	if (unassign_state & ASSIGN_LOCAL) {
>>> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
>>> +		if (ret)
>>> +			goto out_fail;
>>> +	}
>>> +
>>> +	if (assign_state & ASSIGN_TOTAL) {
>>> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
>>> +		if (ret)
>>> +			goto out_fail;
>>> +	}
>>> +
>>> +	if (assign_state & ASSIGN_LOCAL) {
>>> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
>>> +		if (ret)
>>> +			goto out_fail;
>>> +	}
>>> +
>>> +	goto next;
>>> +
>>> +out_fail:
>>
>> Is it possible to print a message to the command status to give some details about which
>> request failed? I am wondering about a scenario where a user changes multiple domains of
>> multiple groups, since the operation does not undo changes, it will fail without information
>> to user space about which setting triggered the failure and which settings succeeded.
>> This is similar to what is done when user attempts to move several tasks ... the error will
>> indicate which task triggered failure so that user space knows what completed successfully.
> 
> Will add something like this on failure.
> 
> rdt_last_cmd_printf("Total event assign failed on domain %d\n", dom_id);

The user may provide changes for several groups in a single write.
Could the CTRL_MON and MON group names also be printed? It is not clear
to me if it will be easier to print the flags the user provides or verbose text
that the flags translate to, that is "t" vs "Total event".

>>> +
>>> +	return -EINVAL;
>>> +}
>>> +
>>> +static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of,
>>> +						 char *buf, size_t nbytes, loff_t off)
>>> +{
>>> +	struct rdt_resource *r = of->kn->parent->priv;
>>> +	char *token, *cmon_grp, *mon_grp;
>>> +	enum rdt_group_type rtype;
>>> +	int ret;
>>> +
>>> +	/* Valid input requires a trailing newline */
>>> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
>>> +		return -EINVAL;
>>> +
>>> +	buf[nbytes - 1] = '\0';
>>> +
>>> +	cpus_read_lock();
>>> +	mutex_lock(&rdtgroup_mutex);
>>> +
>>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>>> +		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>>
>> Writing to last_cmd_status_buf here ...
> 
> Sure.
> 
>>
>>> +		mutex_unlock(&rdtgroup_mutex);
>>> +		cpus_read_unlock();
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	rdt_last_cmd_clear();
>>
>> ... but initializing buffer here. 
>> Sidenote: This was an issue before. If you receive comments about
>> items in patches, please do check if those comments apply to other patches also.
> 
> Missed it.
> 
>>
>>> +
>>> +	while ((token = strsep(&buf, "\n")) != NULL) {
>>> +		if (strstr(token, "/")) {

What is the purpose of this strstr() call?

>>> +			/*
>>> +			 * The write command follows the following format:
>>> +			 * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
>>> +			 * Extract the CTRL_MON group.
>>> +			 */
>>> +			cmon_grp = strsep(&token, "/");
>>> +
>>> +			/*
>>> +			 * Extract the MON_GROUP.
>>> +			 * strsep returns empty string for contiguous delimiters.
>>> +			 * Empty mon_grp here means it is a RDTCTRL_GROUP.
>>> +			 */
>>> +			mon_grp = strsep(&token, "/");
>>> +
>>> +			if (*mon_grp == '\0')
>>> +				rtype = RDTCTRL_GROUP;
>>> +			else
>>> +				rtype = RDTMON_GROUP;
>>> +
>>> +			ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token);
>>> +			if (ret)
>>> +				break;
>>> +		}
>>> +	}
>>> +
>>> +	mutex_unlock(&rdtgroup_mutex);
>>> +	cpus_read_unlock();
>>> +
>>> +	return ret ?: nbytes;
>>> +}
>>> +
>>>  #ifdef CONFIG_PROC_CPU_RESCTRL
>>>  
>>>  /*
>>> @@ -2328,9 +2558,10 @@ static struct rftype res_common_files[] = {
>>>  	},
>>>  	{
>>>  		.name		= "mbm_assign_control",
>>> -		.mode		= 0444,
>>> +		.mode		= 0644,
>>>  		.kf_ops		= &rdtgroup_kf_single_ops,
>>>  		.seq_show	= rdtgroup_mbm_assign_control_show,
>>> +		.write		= rdtgroup_mbm_assign_control_write,
>>>  	},
>>>  	{
>>>  		.name		= "cpus_list",
>>
>> On a high level this looks ok but this code needs to be more robust. This will parse
>> data from user space that may include all kinds of input ... think malicious user or
>> a buggy script. I am not able to test this code but I tried to work through what will
>> happen under some wrong input and found some issues. For example, if user space provides
>> input like '//\n' then rdtgroup_process_flags() will be called with token == NULL. This will
>> result in rdtgroup_process_flags() returning "success", but fortunately do nothing, for
>> this invalid input. A more severe example is with input like '//0=\n', from what I can tell
>> this will result in rdtgroup_str_to_mon_state() called with dom_str==NULL that will treat
>> this as ASSIGN_NONE and proceed as if user provided '//0=_'.
>> This was just some scenarios with basic input that could be typos, no real stress tests.
>> I stopped here though since I believe it is already clear this needs to be more robust.
>> Please do test this interface by exercising it with invalid input and corner cases.
> 
> Agree.
> 
> But, tested the cases you mentioned above. It seems to handle as expected.
> 
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;
> 
> #echo '//\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> bash: echo: write error: Invalid argument
> 
> # cat /sys/fs/resctrl/info/last_cmd_status
> Missing operation =, +, - character
> 
> 
> #echo '//0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> bash: echo: write error: Invalid argument
> 
> #cat /sys/fs/resctrl/info/last_cmd_status
> Invalid assign flag
> 
> #echo '/0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> bash: echo: write error: Invalid argument
> # cat /sys/fs/resctrl/info/last_cmd_status
> Not a valid resctrl group
> 
> 
> The assign state did not change.
> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;
> 
> Sure. will test some more combinations to be sure.

hmmm ... these are not quite the examples I shared since from what I can
tell it adds a second \n that impacts the processing of string.

Could you please try:
# echo '//' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
and 
# echo '//0=' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

Reinette



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 25/25] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-10-21 17:20       ` Reinette Chatre
@ 2024-10-22  1:12         ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-22  1:12 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/21/2024 12:20 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/21/24 10:04 AM, Moger, Babu wrote:
>> On 10/15/24 22:43, Reinette Chatre wrote:
>>> On 10/9/24 10:39 AM, Babu Moger wrote:
> 
> 
>>>> +static int rdtgroup_process_flags(struct rdt_resource *r,
>>>> +				  enum rdt_group_type rtype,
>>>> +				  char *p_grp, char *c_grp, char *tok)
>>>> +{
>>>> +	int op, mon_state, assign_state, unassign_state;
>>>> +	char *dom_str, *id_str, *op_str;
>>>> +	struct rdt_mon_domain *d;
>>>> +	struct rdtgroup *rdtgrp;
>>>> +	unsigned long dom_id;
>>>> +	int ret, found = 0;
>>>> +
>>>> +	rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
>>>> +
>>>> +	if (!rdtgrp) {
>>>> +		rdt_last_cmd_puts("Not a valid resctrl group\n");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +next:
>>>> +	if (!tok || tok[0] == '\0')
>>>> +		return 0;
>>>> +
>>>> +	/* Start processing the strings for each domain */
>>>> +	dom_str = strim(strsep(&tok, ";"));
>>>> +
>>>> +	op_str = strpbrk(dom_str, "=+-");
>>>> +
>>>> +	if (op_str) {
>>>> +		op = *op_str;
>>>> +	} else {
>>>> +		rdt_last_cmd_puts("Missing operation =, +, - character\n");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	id_str = strsep(&dom_str, "=+-");
>>>> +
>>>> +	/* Check for domain id '*' which means all domains */
>>>> +	if (id_str && *id_str == '*') {
>>>> +		d = NULL;
>>>> +		goto check_state;
>>>> +	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>>>> +		rdt_last_cmd_puts("Missing domain id\n");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	/* Verify if the dom_id is valid */
>>>> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>>> +		if (d->hdr.id == dom_id) {
>>>> +			found = 1;
>>>> +			break;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	if (!found) {
>>>> +		rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +check_state:
>>>> +	mon_state = rdtgroup_str_to_mon_state(dom_str);
>>>> +
>>>> +	if (mon_state == ASSIGN_INVALID) {
>>>> +		rdt_last_cmd_puts("Invalid assign flag\n");
>>>> +		goto out_fail;
>>>> +	}
>>>> +
>>>> +	assign_state = 0;
>>>> +	unassign_state = 0;
>>>> +
>>>> +	switch (op) {
>>>> +	case '+':
>>>> +		if (mon_state == ASSIGN_NONE) {
>>>> +			rdt_last_cmd_puts("Invalid assign opcode\n");
>>>> +			goto out_fail;
>>>> +		}
>>>> +		assign_state = mon_state;
>>>> +		break;
>>>> +	case '-':
>>>> +		if (mon_state == ASSIGN_NONE) {
>>>> +			rdt_last_cmd_puts("Invalid assign opcode\n");
>>>> +			goto out_fail;
>>>> +		}
>>>> +		unassign_state = mon_state;
>>>> +		break;
>>>> +	case '=':
>>>> +		assign_state = mon_state;
>>>> +		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
>>>> +		break;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>> +
>>>> +	if (unassign_state & ASSIGN_TOTAL) {
>>>> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
>>>> +		if (ret)
>>>> +			goto out_fail;
>>>> +	}
>>>> +
>>>> +	if (unassign_state & ASSIGN_LOCAL) {
>>>> +		ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
>>>> +		if (ret)
>>>> +			goto out_fail;
>>>> +	}
>>>> +
>>>> +	if (assign_state & ASSIGN_TOTAL) {
>>>> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
>>>> +		if (ret)
>>>> +			goto out_fail;
>>>> +	}
>>>> +
>>>> +	if (assign_state & ASSIGN_LOCAL) {
>>>> +		ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
>>>> +		if (ret)
>>>> +			goto out_fail;
>>>> +	}
>>>> +
>>>> +	goto next;
>>>> +
>>>> +out_fail:
>>>
>>> Is it possible to print a message to the command status to give some details about which
>>> request failed? I am wondering about a scenario where a user changes multiple domains of
>>> multiple groups, since the operation does not undo changes, it will fail without information
>>> to user space about which setting triggered the failure and which settings succeeded.
>>> This is similar to what is done when user attempts to move several tasks ... the error will
>>> indicate which task triggered failure so that user space knows what completed successfully.
>>
>> Will add something like this on failure.
>>
>> rdt_last_cmd_printf("Total event assign failed on domain %d\n", dom_id);
> 
> The user may provide changes for several groups in a single write.
> Could the CTRL_MON and MON group names also be printed? It is not clear
> to me if it will be easier to print the flags the user provides or verbose text
> that the flags translate to, that is "t" vs "Total event".

Yes. We can print generic messages with group names and flags

"Assignment operation +-= failed on resctrl group ABC with flags = lt"

> 
>>>> +
>>>> +	return -EINVAL;
>>>> +}
>>>> +
>>>> +static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of,
>>>> +						 char *buf, size_t nbytes, loff_t off)
>>>> +{
>>>> +	struct rdt_resource *r = of->kn->parent->priv;
>>>> +	char *token, *cmon_grp, *mon_grp;
>>>> +	enum rdt_group_type rtype;
>>>> +	int ret;
>>>> +
>>>> +	/* Valid input requires a trailing newline */
>>>> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
>>>> +		return -EINVAL;
>>>> +
>>>> +	buf[nbytes - 1] = '\0';
>>>> +
>>>> +	cpus_read_lock();
>>>> +	mutex_lock(&rdtgroup_mutex);
>>>> +
>>>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>>>> +		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>>>
>>> Writing to last_cmd_status_buf here ...
>>
>> Sure.
>>
>>>
>>>> +		mutex_unlock(&rdtgroup_mutex);
>>>> +		cpus_read_unlock();
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	rdt_last_cmd_clear();
>>>
>>> ... but initializing buffer here.
>>> Sidenote: This was an issue before. If you receive comments about
>>> items in patches, please do check if those comments apply to other patches also.
>>
>> Missed it.
>>
>>>
>>>> +
>>>> +	while ((token = strsep(&buf, "\n")) != NULL) {
>>>> +		if (strstr(token, "/")) {
> 
> What is the purpose of this strstr() call?

This is a carry over for v6.  Not required. Will remove.

> 
>>>> +			/*
>>>> +			 * The write command follows the following format:
>>>> +			 * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
>>>> +			 * Extract the CTRL_MON group.
>>>> +			 */
>>>> +			cmon_grp = strsep(&token, "/");
>>>> +
>>>> +			/*
>>>> +			 * Extract the MON_GROUP.
>>>> +			 * strsep returns empty string for contiguous delimiters.
>>>> +			 * Empty mon_grp here means it is a RDTCTRL_GROUP.
>>>> +			 */
>>>> +			mon_grp = strsep(&token, "/");
>>>> +
>>>> +			if (*mon_grp == '\0')
>>>> +				rtype = RDTCTRL_GROUP;
>>>> +			else
>>>> +				rtype = RDTMON_GROUP;
>>>> +
>>>> +			ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token);
>>>> +			if (ret)
>>>> +				break;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	mutex_unlock(&rdtgroup_mutex);
>>>> +	cpus_read_unlock();
>>>> +
>>>> +	return ret ?: nbytes;
>>>> +}
>>>> +
>>>>   #ifdef CONFIG_PROC_CPU_RESCTRL
>>>>   
>>>>   /*
>>>> @@ -2328,9 +2558,10 @@ static struct rftype res_common_files[] = {
>>>>   	},
>>>>   	{
>>>>   		.name		= "mbm_assign_control",
>>>> -		.mode		= 0444,
>>>> +		.mode		= 0644,
>>>>   		.kf_ops		= &rdtgroup_kf_single_ops,
>>>>   		.seq_show	= rdtgroup_mbm_assign_control_show,
>>>> +		.write		= rdtgroup_mbm_assign_control_write,
>>>>   	},
>>>>   	{
>>>>   		.name		= "cpus_list",
>>>
>>> On a high level this looks ok but this code needs to be more robust. This will parse
>>> data from user space that may include all kinds of input ... think malicious user or
>>> a buggy script. I am not able to test this code but I tried to work through what will
>>> happen under some wrong input and found some issues. For example, if user space provides
>>> input like '//\n' then rdtgroup_process_flags() will be called with token == NULL. This will
>>> result in rdtgroup_process_flags() returning "success", but fortunately do nothing, for
>>> this invalid input. A more severe example is with input like '//0=\n', from what I can tell
>>> this will result in rdtgroup_str_to_mon_state() called with dom_str==NULL that will treat
>>> this as ASSIGN_NONE and proceed as if user provided '//0=_'.
>>> This was just some scenarios with basic input that could be typos, no real stress tests.
>>> I stopped here though since I believe it is already clear this needs to be more robust.
>>> Please do test this interface by exercising it with invalid input and corner cases.
>>
>> Agree.
>>
>> But, tested the cases you mentioned above. It seems to handle as expected.
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;
>>
>> #echo '//\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> bash: echo: write error: Invalid argument
>>
>> # cat /sys/fs/resctrl/info/last_cmd_status
>> Missing operation =, +, - character
>>
>>
>> #echo '//0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> bash: echo: write error: Invalid argument
>>
>> #cat /sys/fs/resctrl/info/last_cmd_status
>> Invalid assign flag
>>
>> #echo '/0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> bash: echo: write error: Invalid argument
>> # cat /sys/fs/resctrl/info/last_cmd_status
>> Not a valid resctrl group
>>
>>
>> The assign state did not change.
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;
>>
>> Sure. will test some more combinations to be sure.
> 
> hmmm ... these are not quite the examples I shared since from what I can
> tell it adds a second \n that impacts the processing of string.
> 
> Could you please try:
> # echo '//' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> and
> # echo '//0=' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 

Yes. You are right. Above cases does not work as expected.

This should fix. Will test more.

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6095146e3ba4..cccce991d2d0 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1044,6 +1044,9 @@ static int rdtgroup_str_to_mon_state(char *flag)
  {
         int i, mon_state = ASSIGN_NONE;

+       if (!strlen(flag))
+               return ASSIGN_INVALID;
+
         for (i = 0; i < strlen(flag); i++) {
                 switch (*(flag + i)) {
                 case 't':


Thanks
- Babu Moger

^ permalink raw reply related	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 17/25] x86/resctrl: Add the interface to assign/update counter assignment
  2024-10-21 15:31           ` Reinette Chatre
@ 2024-10-22  1:15             ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-22  1:15 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 10/21/2024 10:31 AM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/21/24 7:40 AM, Moger, Babu wrote:
>> On 10/18/24 10:59, Reinette Chatre wrote:
>>> On 10/17/24 3:56 PM, Moger, Babu wrote:
>>>> On 10/15/2024 10:25 PM, Reinette Chatre wrote:
>>>>> On 10/9/24 10:39 AM, Babu Moger wrote:
>>>
>>>>>> + */
>>>>>> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>>>>>> +                   struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>>>>>> +{
>>>>>> +    int index = MBM_EVENT_ARRAY_INDEX(evtid);
>>>>>> +    int cntr_id = rdtgrp->mon.cntr_id[index];
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Allocate a new counter id to the event if the counter is not
>>>>>> +     * assigned already.
>>>>>> +     */
>>>>>> +    if (cntr_id == MON_CNTR_UNSET) {
>>>>>> +        cntr_id = mbm_cntr_alloc(r);
>>>>>> +        if (cntr_id < 0) {
>>>>>> +            rdt_last_cmd_puts("Out of MBM assignable counters\n");
>>>>>> +            return -ENOSPC;
>>>>>> +        }
>>>>>> +        rdtgrp->mon.cntr_id[index] = cntr_id;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!d) {
>>>>>> +        list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>>>>> +            ret = resctrl_arch_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>>>>>> +                               rdtgrp->closid, cntr_id, true);
>>>>>> +            if (ret)
>>>>>> +                goto out_done_assign;
>>>>>> +
>>>>>> +            set_bit(cntr_id, d->mbm_cntr_map);
>>>>>
>>>>> The code pattern above is repeated four times in this work, twice in
>>>>> rdtgroup_assign_cntr_event() and twice in rdtgroup_unassign_cntr_event(). This
>>>>> duplication should be avoided. It can be done in a function that also resets
>>>>> the architectural state.
>>>>
>>>> Are you suggesting to combine rdtgroup_assign_cntr_event() and rdtgroup_unassign_cntr_event()?
>>>
>>> No. My comment was about the following pattern that is repeated four times:
>>> 	...
>>> 	ret = resctrl_arch_config_cntr(...)
>>> 	if (ret)
>>> 		...
>>> 	set_bit()/clear_bit()
>>> 	...
>>>
>>
>> ok.
>>
>>
>>>> It can be done. We need a flag to tell if it is a assign or unassign.
>>>
>>> There is already a flag that is used by resctrl_arch_config_cntr(), the same parameters
>>> as resctrl_arch_config_cntr() can be used for a wrapper that just calls
>>> resctrl_arch_config_cntr() directly and uses that same flag to
>>> select between set_bit() and clear_bit(). This wrapper can then also include
>>> the reset of architectural state.
>>
>> ok. Got it, It will look like this.
>>
>>
>> +/*
>> + * Wrapper to configure the counter in a domain.
>> + */
> 
> Please replace comment with a description of what the function does.

sure.

> 
>> +static int rdtgroup_config_cntr(struct rdt_resource *r,struct
> 
> While it keeps being a challenge to get naming right I do think this
> can start by replacing "rdtgroup" with "resctrl" (specifically,
> "rdtgroup_config_cntr() -> resctrl_config_cntr()") because, as seen
> with the parameters passed, this has nothing to do with rdtgroup.

Sure.

> 
>> rdt_mon_domain *d,
>> +                               enum resctrl_event_id evtid, u32 rmid, u32
>> closid,
>> +                               u32 cntr_id, bool assign)
>> +{
>> +       int ret;
>> +
>> +       ret = resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id,
>> assign);
>> +       if (ret)
>> +               return ret;
>> +
>> +       if (assign)
>> +               __set_bit(cntr_id, d->mbm_cntr_map);
>> +       else
>> +               __clear_bit(cntr_id, d->mbm_cntr_map);
>> +
>> +       /*
>> +        * Reset the architectural state so that reading of hardware
>> +        * counter is not considered as an overflow in next update.
>> +        */
>> +       resctrl_arch_reset_rmid(r, d, closid, rmid, evtid);
>> +
>> +       return ret;
>> +}
>> +
> 
> Yes, this looks good. Thank you.
> 

Thanks-
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-14 20:05                                   ` Reinette Chatre
  2024-10-14 20:32                                     ` Moger, Babu
@ 2024-10-24 17:29                                     ` Moger, Babu
  2024-10-24 17:37                                       ` Luck, Tony
  1 sibling, 1 reply; 124+ messages in thread
From: Moger, Babu @ 2024-10-24 17:29 UTC (permalink / raw)
  To: Reinette Chatre, Luck, Tony
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Reinette/Tony,

On 10/14/24 15:05,  wrote:
> Hi Tony,
> 
> On 10/14/24 12:51 PM, Luck, Tony wrote:
>>>> What advantage does it have over skipping the per-domain list and
>>>> just providing a single value for all domains? You clearly expect this
>>>> will be a common user request since you implemented the "*" means
>>>> apply to all domains.
>>>>
>>>
>>> We started with a global assignment by applying assignment across all the
>>> domains initially.
>>>
>>> But we wanted give a generic approach which allows both the options(domain
>>> specific assignment and global assignment with '*"). It is also matches
>>> with other managements (RMID/CLOSID management) we are doing in resctrl
>>> right now. Also, there is an extra IPI for each domain if user is only
>>> interested in on domain.
>>>
>>> Some of the discussions are here.
>>> https://lore.kernel.org/lkml/f7dac996d87b4144e4c786178a7fd3d218eaebe8.1711674410.git.babu.moger@amd.com/#r
>>
>> My summary of that:
>>
>> Peter: Complex, don't need per-domain.
>> Reinette: Maybe some architecture might want per-domain.
> 
> To be specific ... we already have an architecture that supports per-domain:
> AMD's ABMC. When I considered the lifetime of user interfaces (forever?) while knowing
> that ABMC does indeed support per-domain counter assignment it seems a good
> precaution for the user interface to support that, even if the first
> implementation does not.
> 
> There are two parts to this work: (a) the new user interface
> and (b) support for ABMC. I believe that the user interface has to be
> flexible to support all ABMC features that users may want to take advantage of,
> even if the first implementation does not enable those features. In addition,
> the user interface should support future usages that we know if, "soft-ABMC"
> and MPAM.
> 
> I do not think that we should require all implementations to support everything
> made possible by user interface though. As I mentioned in that thread [1] I do
> think that the user _interface_ needs to be flexible by supporting domain level
> counter assignment, but that it may be possible that the _implementation_ only
> supports assignment to '*' domain values. 
> 
> I thus do not think we should simplify the syntax of mbm_assign_control,
> but I also do not think we should require that all implementations support all that
> the syntax makes possible. 
>  
>> Since you seem to want to keep the flexibility for a possible future
>> where per-domain is needed. The "available_mbm_cntrs" file
>> suggested in another thread would need to list available counters
>> on each domain to avoid ABI problems should that future arrive.
>>
>> $ cat num_mbm_counters
>> 32
>>
>> $ cat available_mbm_cntrs
>> 0=12;1=9
> 
> Good point.
> 

Working on this now. Wanted to confirm if we really need domain specific
information?

To me, it does not seem necessary for the user. User cannot make any
decisions based on this information.

All user wants to know is if there are global counters available.

$ cat num_mbm_counters
32

$ cat available_mbm_cntrs
15

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

* RE: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-24 17:29                                     ` Moger, Babu
@ 2024-10-24 17:37                                       ` Luck, Tony
  2024-10-25 20:31                                         ` Moger, Babu
  0 siblings, 1 reply; 124+ messages in thread
From: Luck, Tony @ 2024-10-24 17:37 UTC (permalink / raw)
  To: babu.moger@amd.com, Chatre, Reinette
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

> >> Since you seem to want to keep the flexibility for a possible future
> >> where per-domain is needed. The "available_mbm_cntrs" file
> >> suggested in another thread would need to list available counters
> >> on each domain to avoid ABI problems should that future arrive.
> >>
> >> $ cat num_mbm_counters
> >> 32
> >>
> >> $ cat available_mbm_cntrs
> >> 0=12;1=9
> >
> > Good point.
> >
>
> Working on this now. Wanted to confirm if we really need domain specific
> information?
>
> To me, it does not seem necessary for the user. User cannot make any
> decisions based on this information.
>
> All user wants to know is if there are global counters available.
>
> $ cat num_mbm_counters
> 32
>
> $ cat available_mbm_cntrs
> 15

This approach paints resctrl into an ABI corner where it can't later
update resctrl to track counters per-domain. Maybe you'll never want to do that,
but some other architecture might want to have that flexibility.

-Tony

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: RE: [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-10-24 17:37                                       ` Luck, Tony
@ 2024-10-25 20:31                                         ` Moger, Babu
  0 siblings, 0 replies; 124+ messages in thread
From: Moger, Babu @ 2024-10-25 20:31 UTC (permalink / raw)
  To: Luck, Tony, babu.moger@amd.com, Chatre, Reinette
  Cc: corbet@lwn.net, Yu, Fenghua, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, paulmck@kernel.org, rdunlap@infradead.org,
	tj@kernel.org, peterz@infradead.org, yanjiewtw@gmail.com,
	kim.phillips@amd.com, lukas.bulwahn@gmail.com, seanjc@google.com,
	jmattson@google.com, leitao@debian.org, jpoimboe@kernel.org,
	Edgecombe, Rick P, kirill.shutemov@linux.intel.com, Joseph, Jithu,
	Huang, Kai, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	sandipan.das@amd.com, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, Wieczor-Retman, Maciej,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eranian, Stephane, james.morse@arm.com

Hi Tony,

On 10/24/2024 12:37 PM, Luck, Tony wrote:
>>>> Since you seem to want to keep the flexibility for a possible future
>>>> where per-domain is needed. The "available_mbm_cntrs" file
>>>> suggested in another thread would need to list available counters
>>>> on each domain to avoid ABI problems should that future arrive.
>>>>
>>>> $ cat num_mbm_counters
>>>> 32
>>>>
>>>> $ cat available_mbm_cntrs
>>>> 0=12;1=9
>>>
>>> Good point.
>>>
>>
>> Working on this now. Wanted to confirm if we really need domain specific
>> information?
>>
>> To me, it does not seem necessary for the user. User cannot make any
>> decisions based on this information.
>>
>> All user wants to know is if there are global counters available.
>>
>> $ cat num_mbm_counters
>> 32
>>
>> $ cat available_mbm_cntrs
>> 15
> 
> This approach paints resctrl into an ABI corner where it can't later
> update resctrl to track counters per-domain. Maybe you'll never want to do that,
> but some other architecture might want to have that flexibility.

Ok. Fine. Lets keep the per-domain counters.

-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 124+ messages in thread

end of thread, other threads:[~2024-10-25 20:31 UTC | newest]

Thread overview: 124+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-09 17:39 [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2024-10-09 17:39 ` [PATCH v8 01/25] x86/cpufeatures: Add support for " Babu Moger
2024-10-09 17:39 ` [PATCH v8 02/25] x86/resctrl: Add ABMC feature in the command line options Babu Moger
2024-10-16  3:06   ` Reinette Chatre
2024-10-09 17:39 ` [PATCH v8 03/25] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
2024-10-09 17:39 ` [PATCH v8 04/25] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
2024-10-16  3:06   ` Reinette Chatre
2024-10-09 17:39 ` [PATCH v8 05/25] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
2024-10-09 17:39 ` [PATCH v8 06/25] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
2024-10-11 18:14   ` Tony Luck
2024-10-11 20:53     ` Moger, Babu
2024-10-16  3:07   ` Reinette Chatre
2024-10-09 17:39 ` [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
2024-10-09 22:42   ` Tony Luck
2024-10-10 14:54     ` Moger, Babu
2024-10-10 15:07       ` Luck, Tony
2024-10-10 15:30         ` Moger, Babu
2024-10-10 16:02           ` Luck, Tony
2024-10-11 22:24           ` Reinette Chatre
2024-10-14 15:16             ` Moger, Babu
2024-10-16  3:12   ` Reinette Chatre
2024-10-16 15:57     ` Moger, Babu
2024-10-16 16:25       ` Reinette Chatre
2024-10-09 17:39 ` [PATCH v8 08/25] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
2024-10-09 22:49   ` Tony Luck
2024-10-10 15:12     ` Moger, Babu
2024-10-10 15:58       ` Luck, Tony
2024-10-10 16:57         ` Moger, Babu
2024-10-10 17:08           ` Luck, Tony
2024-10-10 18:36             ` Moger, Babu
2024-10-10 18:57               ` Luck, Tony
2024-10-10 20:32                 ` Moger, Babu
2024-10-11 17:44                   ` Tony Luck
2024-10-11 20:49                     ` Moger, Babu
2024-10-11 21:36                       ` Tony Luck
2024-10-14 16:46                         ` Reinette Chatre
2024-10-14 17:20                           ` Moger, Babu
2024-10-14 17:49                             ` Luck, Tony
2024-10-14 19:21                               ` Moger, Babu
2024-10-14 19:51                                 ` Luck, Tony
2024-10-14 20:05                                   ` Reinette Chatre
2024-10-14 20:32                                     ` Moger, Babu
2024-10-24 17:29                                     ` Moger, Babu
2024-10-24 17:37                                       ` Luck, Tony
2024-10-25 20:31                                         ` Moger, Babu
2024-10-14 16:59               ` Reinette Chatre
2024-10-14 19:23                 ` Moger, Babu
2024-10-14 16:25       ` Reinette Chatre
2024-10-14 17:46         ` Moger, Babu
2024-10-14 18:30           ` Reinette Chatre
2024-10-14 18:51             ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 09/25] x86/resctrl: Add __init attribute to dom_data_init() Babu Moger
2024-10-16  3:13   ` Reinette Chatre
2024-10-16 17:32     ` Moger, Babu
2024-10-16 18:55       ` Reinette Chatre
2024-10-16 20:18         ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 10/25] x86/resctrl: Introduce bitmap mbm_cntr_free_map to track assignable counters Babu Moger
2024-10-16  3:14   ` Reinette Chatre
2024-10-17 16:55     ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 11/25] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain Babu Moger
2024-10-16  3:15   ` Reinette Chatre
2024-10-09 17:39 ` [PATCH v8 12/25] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
2024-10-16  3:16   ` Reinette Chatre
2024-10-17 17:59     ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 13/25] x86/resctrl: Introduce mbm_cntr_map to track assignable counters at domain Babu Moger
2024-10-16  3:19   ` Reinette Chatre
2024-10-09 17:39 ` [PATCH v8 14/25] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
2024-10-16  3:21   ` Reinette Chatre
2024-10-17 18:52     ` Moger, Babu
2024-10-17 21:13       ` Reinette Chatre
2024-10-17 23:02         ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 15/25] x86/resctrl: Introduce cntr_id in mongroup for assignments Babu Moger
2024-10-16  3:22   ` Reinette Chatre
2024-10-17 19:19     ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 16/25] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
2024-10-16  3:23   ` Reinette Chatre
2024-10-17 22:44     ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 17/25] x86/resctrl: Add the interface to assign/update counter assignment Babu Moger
2024-10-16  3:25   ` Reinette Chatre
2024-10-17 22:56     ` Moger, Babu
2024-10-18 15:59       ` Reinette Chatre
2024-10-21 14:40         ` Moger, Babu
2024-10-21 15:31           ` Reinette Chatre
2024-10-22  1:15             ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 18/25] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
2024-10-16  3:29   ` Reinette Chatre
2024-10-17 23:11     ` Moger, Babu
2024-10-18 16:06       ` Reinette Chatre
2024-10-09 17:39 ` [PATCH v8 19/25] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
2024-10-11 17:17   ` Tony Luck
2024-10-11 21:17     ` Moger, Babu
2024-10-11 21:33       ` Luck, Tony
2024-10-14 15:43         ` Moger, Babu
2024-10-14 16:18           ` Luck, Tony
2024-10-14 16:35             ` Moger, Babu
2024-10-15  2:39               ` Reinette Chatre
2024-10-15 15:43                 ` Moger, Babu
2024-10-15 16:57                   ` Luck, Tony
2024-10-15 17:18                   ` Reinette Chatre
2024-10-15 20:42                     ` Moger, Babu
2024-10-16  3:30   ` Reinette Chatre
2024-10-18 14:22     ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 20/25] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
2024-10-11 17:23   ` Tony Luck
2024-10-11 21:21     ` Moger, Babu
2024-10-16  3:31   ` Reinette Chatre
2024-10-18 14:31     ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 21/25] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
2024-10-16  3:36   ` Reinette Chatre
2024-10-18 15:13     ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 22/25] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
2024-10-09 17:39 ` [PATCH v8 23/25] x86/resctrl: Update assignments on event configuration changes Babu Moger
2024-10-16  3:40   ` Reinette Chatre
2024-10-18 15:50     ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 24/25] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
2024-10-16  3:40   ` Reinette Chatre
2024-10-21 14:56     ` Moger, Babu
2024-10-09 17:39 ` [PATCH v8 25/25] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
2024-10-16  3:43   ` Reinette Chatre
2024-10-21 17:04     ` Moger, Babu
2024-10-21 17:20       ` Reinette Chatre
2024-10-22  1:12         ` Moger, Babu
2024-10-16  3:05 ` [PATCH v8 00/25] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
2024-10-21 17:09   ` Moger, Babu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).