* [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
@ 2024-07-03 21:48 Babu Moger
  2024-07-03 21:48 ` [PATCH v5 01/20] x86/cpufeatures: Add support for " Babu Moger
                   ` (20 more replies)
  0 siblings, 21 replies; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse


This series adds support for Assignable Bandwidth Monitoring Counters
(ABMC). The feature is also called QoS RMID Pinning.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on top of
commit fbfd4bcefc65 ("Merge branch into tip/master: 'x86/vmware'"),
which includes Tony's SNC support.

# Introduction

Users can create as many monitor groups as the hardware supports RMIDs.
However, the bandwidth monitoring feature on AMD systems only guarantees
that RMIDs currently assigned to a processor will be tracked by hardware.
The counters of any other RMIDs which are no longer being tracked will be
reset to zero. The MBM event counters return "Unavailable" for the RMIDs
that are not tracked by hardware. So, only a limited number of groups can
give guaranteed monitoring numbers. With ever-changing configurations
there is no way to know definitively which of these groups are being
tracked at a given point in time. Users do not have the option to monitor
a group or set of groups for a certain period of time without worrying
about the RMID being reset in between.
    
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
The assigned RMID will be tracked by the hardware until the user unassigns
it manually. There is no need to worry about counters being reset during
this period. Additionally, the user can specify a bitmask identifying the
specific bandwidth types from the given source to track with the counter.

Without ABMC enabled, monitoring works in the current mode, without the
assignment option.

# Linux Implementation

The Linux resctrl subsystem provides an interface to count a maximum of
two memory bandwidth events per group, from a combination of the available
total and local events. Keeping the current interface, users can enable a
maximum of 2 ABMC counters per group. Users also have the option to enable
only one counter for the group. If the system runs out of assignable ABMC
counters, the kernel will display an error. Users need to disable an
already enabled counter to make space for new assignments.
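
The allocation behaviour described above can be modelled in a few lines. This is only a user-space sketch, assuming a fixed pool of counters handed out lowest-index-first from a free bitmap; the names `CounterPool`, `assign` and `unassign` are hypothetical and this is not the kernel code (the series tracks free counters in `mbm_cntrs_free_map`).

```python
class CounterPool:
    """Toy model of a fixed pool of assignable counters (hypothetical names)."""

    def __init__(self, num_cntrs):
        self.num_cntrs = num_cntrs
        # Bit i set means counter i is free.
        self.free_map = (1 << num_cntrs) - 1

    def assign(self):
        """Return the lowest free counter id, or raise if none is left."""
        if self.free_map == 0:
            raise RuntimeError("out of assignable counters")
        lowest = self.free_map & -self.free_map  # isolate the lowest set bit
        self.free_map &= ~lowest
        return lowest.bit_length() - 1

    def unassign(self, cntr_id):
        """Return a counter to the pool so it can be assigned again."""
        self.free_map |= 1 << cntr_id
```

Once the pool is exhausted, `assign()` fails until some counter is released with `unassign()`, which mirrors the "disable an already enabled counter" step in the text.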


# Examples

a. Check if ABMC support is available
	# mount -t resctrl resctrl /sys/fs/resctrl/

	# cat /sys/fs/resctrl/info/L3_MON/mbm_mode
	[abmc]
	legacy

	The Linux kernel detected the ABMC feature and it is enabled.

b. Check how many ABMC counters are available.

	# cat /sys/fs/resctrl/info/L3_MON/num_cntrs
	32

c. Create a few resctrl groups.

	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp


d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_control
   to list and modify the groups' monitoring states. The file provides a single
   place to list the monitoring states of all the resctrl groups. It makes it
   easier for user space to learn how the counters are used without needing to
   traverse all the groups, thus reducing the number of filesystem calls.

	The list uses the following format:

	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

	Format for specific type of groups:

	* Default CTRL_MON group:
	        "//<domain_id>=<flags>"

	* Non-default CTRL_MON group:
	        "<CTRL_MON group>//<domain_id>=<flags>"

	* Child MON group of default CTRL_MON group:
	        "/<MON group>/<domain_id>=<flags>"

	* Child MON group of non-default CTRL_MON group:
	        "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

	Flags can be one of the following:

	t  MBM total event is enabled.
	l  MBM local event is enabled.
	tl Both total and local MBM events are enabled.
	_  None of the MBM events are enabled.
	Examples:

	# cat /sys/fs/resctrl/info/L3_MON/mbm_control 
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=tl;
	
	There are four groups and all the groups have local and total
	event enabled on domain 0 and 1.

	=tl means both total and local events are enabled.

	"//" - This is a default CTRL_MON group

	"non_default_ctrl_mon_grp//" - This is non-default CTRL_MON group

	"/child_default_mon_grp/" - This is a child MON group of the default group

	"non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
	MON group of the non-default group
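
For user space that wants to consume this listing, a minimal parsing sketch follows. It assumes the line format exactly as shown above (two "/" separators, then ";"-separated `<domain_id>=<flags>` items with a trailing ";"). The function name `parse_mbm_control_line` is hypothetical and not part of the series.

```python
def parse_mbm_control_line(line):
    """Split one mbm_control line into (ctrl_group, mon_group, {domain_id: flags}).

    An empty ctrl_group means the default CTRL_MON group; an empty
    mon_group means the line describes the CTRL_MON group itself.
    """
    # Group names are directory names, so they cannot contain "/".
    ctrl_group, mon_group, assignments = line.strip().split("/", 2)
    domains = {}
    for item in assignments.split(";"):
        if not item:  # skip the empty piece after the trailing ";"
            continue
        domain_id, flags = item.split("=", 1)
        domains[int(domain_id)] = flags
    return ctrl_group, mon_group, domains


if __name__ == "__main__":
    print(parse_mbm_control_line("/child_default_mon_grp/0=tl;1=tl;"))
```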

e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_control.

	The write format is similar to the above list format with addition of
	op-code for the assignment operation.
	
	* Default CTRL_MON group:
	        "//<domain_id><op-code><flags>"
	
	* Non-default CTRL_MON group:
	        "<CTRL_MON group>//<domain_id><op-code><flags>"
	
	* Child MON group of default CTRL_MON group:
	        "/<MON group>/<domain_id><op-code><flags>"
	
	* Child MON group of non-default CTRL_MON group:
	        "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
	
	Op-code can be one of the following:
	
	= Update the assignment to match the flags.
	+ Assign the listed events, keeping existing assignments.
	- Unassign the listed events.

	Flags can be one of the following:

        t  MBM total event.
        l  MBM local event.
        tl Both total and local MBM events.
        _  None of the MBM events. Only works with '=' op-code.
	
	Initial group status:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=tl;

	To update the default group to enable only total event on domain 0:
	# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=tl;

	To update the MON group child_default_mon_grp to remove total event on domain 1:
	# echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=l;

	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
	remove both local and total events on domain 1:
	# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > \
	       /sys/fs/resctrl/info/L3_MON/mbm_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
	//0=t;1=tl;
	/child_default_mon_grp/0=tl;1=l;

	To update the default group to add the local event on domain 0:
	# echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
	non_default_ctrl_mon_grp//0=tl;1=tl;
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
	//0=tl;1=tl;
	/child_default_mon_grp/0=tl;1=l;
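
The effect of the three op-codes on a domain's flags, as seen in the examples above, can be summarised with a small model. This is a sketch of the observable behaviour only, not the kernel's parser; `apply_op` is a hypothetical helper name, and the rule that "_" is accepted only with "=" is taken from the flag description above.

```python
def apply_op(current, op, flags):
    """Return the new flags string after applying <op><flags> to `current`.

    `current` and `flags` use the listing notation: "t", "l", "tl" or "_".
    """
    for f in (current, flags):
        if f != "_" and (not f or set(f) - set("tl")):
            raise ValueError("invalid flags: %r" % f)
    if flags == "_" and op != "=":
        raise ValueError("'_' is only valid with the '=' op-code")

    events = set() if current == "_" else set(current)
    requested = set() if flags == "_" else set(flags)

    if op == "=":
        events = requested            # replace the whole assignment
    elif op == "+":
        events |= requested           # add events, keep existing ones
    elif op == "-":
        events -= requested           # remove only the listed events
    else:
        raise ValueError("unknown op-code: %r" % op)

    # Emit in the same order the listing uses, "_" when nothing is assigned.
    return "".join(f for f in "tl" if f in events) or "_"
```

The four updates shown in the examples correspond to `apply_op("tl", "=", "t")`, `apply_op("tl", "-", "t")`, `apply_op("tl", "=", "_")` and `apply_op("t", "+", "l")`.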


f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
   There is no change in reading the events with ABMC. If the event is unassigned
   when reading, then the read will come back as "Unassigned".
	
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	779247936
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
	765207488
	
g. Users will have the option to go back to the legacy mbm_mode if required.
   This can be done using the following command. Note that switching the
   mbm_mode will reset all the MBM counters of all resctrl groups.

	# echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
	# cat /sys/fs/resctrl/info/L3_MON/mbm_mode
	abmc
	[legacy]
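
A script can find the active mode by looking for the bracketed entry. The sketch below assumes the mbm_mode output format shown in the examples (one mode per line, with the active one in square brackets); `active_mbm_mode` is a hypothetical helper name.

```python
def active_mbm_mode(text):
    """Return the mode shown in [brackets] in mbm_mode output, or None."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            return line[1:-1]
    return None


if __name__ == "__main__":
    # Path taken from the examples above; requires a mounted resctrl.
    with open("/sys/fs/resctrl/info/L3_MON/mbm_mode") as f:
        print(active_mbm_mode(f.read()))
```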

h. Check the bandwidth configuration for the group. Note that bandwidth
   configuration has a domain scope. The total event defaults to 0x7F (to
   count all the events) and the local event defaults to 0x15 (to count all
   the local NUMA events). The event bitmap decoding is available at
   https://www.kernel.org/doc/Documentation/x86/resctrl.rst
   in section "mbm_total_bytes_config", "mbm_local_bytes_config":
	
	# cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	0=0x7f;1=0x7f
	
	# cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
	0=0x15;1=0x15
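
To make the bitmap values easier to read, the sketch below decodes a config value into its event types. The bit descriptions are reproduced from memory of the resctrl.rst table referenced above and should be verified against that document; `EVENT_BITS`, `decode_event_config` and `parse_config_line` are hypothetical names.

```python
# Bit meanings as recalled from resctrl.rst (an assumption; check the doc).
EVENT_BITS = {
    0: "Reads to memory in the local NUMA domain",
    1: "Reads to memory in the non-local NUMA domain",
    2: "Non-temporal writes to local NUMA domain",
    3: "Non-temporal writes to non-local NUMA domain",
    4: "Reads to slow memory in the local NUMA domain",
    5: "Reads to slow memory in the non-local NUMA domain",
    6: "Dirty Victims from the QOS domain to all types of memory",
}


def decode_event_config(value):
    """Return [(bit, description)] for every bit set in `value`."""
    return [(bit, desc) for bit, desc in sorted(EVENT_BITS.items())
            if value & (1 << bit)]


def parse_config_line(line):
    """Parse '0=0x7f;1=0x7f' into {domain_id: bitmap}."""
    result = {}
    for item in line.strip().split(";"):
        if item:
            domain_id, value = item.split("=", 1)
            result[int(domain_id)] = int(value, 16)
    return result


if __name__ == "__main__":
    for bit, desc in decode_event_config(0x15):
        print(bit, desc)
```

Under this table, 0x15 selects bits 0, 2 and 4 (the local NUMA events) and 0x7f selects all seven bits, which matches the defaults described in the text.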
	
j. Change the bandwidth source for domain 0 for the total event to count only reads.
   Note that this change affects the total event on domain 0.
	
	# echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	# cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	0=0x33;1=0x7F
	
k. Now read the total event again. The first read will come back with "Unavailable"
   status. The subsequent read of mbm_total_bytes will display only the read events.
	
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	Unavailable
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	314101
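
Since an event file may return either a byte count or a status string, a reader has to handle both. The sketch below covers only the two status strings that appear in this cover letter ("Unassigned" and "Unavailable"); `parse_mbm_read` is a hypothetical helper name.

```python
STATUS_STRINGS = ("Unassigned", "Unavailable")


def parse_mbm_read(text):
    """Return (byte_count, None) for a number, or (None, status) for a status."""
    value = text.strip()
    if value in STATUS_STRINGS:
        return None, value
    return int(value), None
```

A monitoring loop could, for example, retry once on "Unavailable" (the first read after a configuration change) and treat "Unassigned" as a prompt to assign a counter first.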
	
l. Unmount resctrl

	# umount /sys/fs/resctrl/

---
v5:
  Rebase changes (because of SNC support)

  Interface changes.
   /sys/fs/resctrl/mbm_assign to /sys/fs/resctrl/mbm_mode.
   /sys/fs/resctrl/mbm_assign_control to /sys/fs/resctrl/mbm_control.

  Added a few arch-specific routines:
  resctrl_arch_get_abmc_enabled.
  resctrl_arch_abmc_enable.
  resctrl_arch_abmc_disable.

  A few renames:
   num_cntrs_free_map -> mbm_cntrs_free_map
   num_cntrs_init -> mbm_cntrs_init
   arch_domain_mbm_evt_config -> resctrl_arch_mbm_evt_config

  Introduced resctrl_arch_event_config_get() and
    resctrl_arch_event_config_set() to update event configuration.

  Removed the mon_state field from mongroup. Added MON_CNTR_UNSET to initialize counters.

  Renamed ctr_id to cntr_id for the hardware counter.
 
  Report "Unassigned" in case the user attempts to read the events without assigning the counter.
  
  ABMC is enabled during boot. It can be enabled or disabled later.

  Fixed opcode and flags combination.
    "=_" is valid.
    "-_" and "+_" are not valid.

 Addressed all the comments as far as I know. If I missed something, it is not intentional.

v4: 
  Main change is domain specific event assignment.
  Kept the ABMC feature as a default.
  Dynamic switching between ABMC and mbm_legacy is still allowed.
  We are still not clear about the mount option.
  Moved the monitoring related data from rdt_resource into the resctrl_mon structure.
  Fixed the display of legacy and ABMC mode.
  Used bitmap APIs when possible.
  Removed event configuration read from MSRs. We can use the
  internal saved data. (patch 12)
  Added more comments about L3_QOS_ABMC_CFG MSR.
  Added IPIs to read the assignment status for each domain (patch 18 and 19)
  More details in each patch.

v3:
   This series adds the support for global assignment mode discussed in
   the thread. https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
   Removed the individual assignment mode and included the global assignment interface.
   Added following interface files.
   a. /sys/fs/resctrl/info/L3_MON/mbm_assign
      Used for displaying the current assignment mode and switch between
      ABMC and legacy mode.
   b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
      Used for listing the groups' assignment mode and modifying the assignment states.
   c. Most of the changes are related to the new interface.
   d. Addressed the comments from Reinette, James and Peter.
   e. Hope I have addressed most of the major feedback discussed. If I missed
      something then it is not intentional. Please feel free to comment.
   f. Sending this as an RFC as per Reinette's comment. So, this is still open
      for discussion.

v2:
   a. Major change is the way ABMC is enabled. Earlier, user needed to remount
      with -o abmc to enable ABMC feature. Removed that option now.
      Now users can enable ABMC by "$echo 1 to /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
     
   b. Added new word 21 to x86/cpufeatures.h.

   c. Display unsupported if user attempts to read the events when ABMC is enabled
      and event is not assigned.

   d. Display monitor_state as "Unsupported" when ABMC is disabled.
  
   e. Text updates and rebase to latest tip tree (as of Jan 18).
 
   f. This series is still work in progress. I am yet to hear from ARM developers. 

v4:
  https://lore.kernel.org/lkml/cover.1716552602.git.babu.moger@amd.com/

v3:
 https://lore.kernel.org/lkml/cover.1711674410.git.babu.moger@amd.com/  

v2:
  https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/

v1:
   https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/



Babu Moger (20):
  x86/cpufeatures: Add support for Assignable Bandwidth Monitoring
    Counters (ABMC)
  x86/resctrl: Add ABMC feature in the command line options
  x86/resctrl: Consolidate monitoring related data from rdt_resource
  x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
  x86/resctrl: Add support to enable/disable AMD ABMC feature
  x86/resctrl: Introduce the interface to display monitor mode
  x86/resctrl: Introduce interface to display number of monitoring
    counters
  x86/resctrl: Initialize monitor counters bitmap
  x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
  x86/resctrl: Remove MSR reading of event configuration value
  x86/resctrl: Add data structures and definitions for ABMC assignment
  x86/resctrl: Add the interface to assign hardware counter
  x86/resctrl: Add the interface to unassign hardware counter
  x86/resctrl: Assign/unassign counters by default when ABMC is enabled
  x86/resctrl: Report "Unassigned" for MBM events in ABMC mode
  x86/resctrl: Introduce the interface switch between monitor modes
  x86/resctrl: Enable AMD ABMC feature by default when supported
  x86/resctrl: Introduce interface to list monitor states of all the
    groups
  x86/resctrl: Introduce interface to modify assignment states of the
    groups

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/arch/x86/resctrl.rst            | 181 ++++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   3 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
 arch/x86/kernel/cpu/resctrl/core.c            |  12 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |  19 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  69 +-
 arch/x86/kernel/cpu/resctrl/monitor.c         |  63 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 930 ++++++++++++++++--
 arch/x86/kernel/cpu/scattered.c               |   1 +
 include/linux/resctrl.h                       |  21 +-
 12 files changed, 1218 insertions(+), 87 deletions(-)

-- 
2.34.1



* [PATCH v5 01/20] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 21:55   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 02/20] x86/resctrl: Add ABMC feature in the command line options Babu Moger
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Users can create as many monitor groups as the hardware supports RMIDs.
However, the bandwidth monitoring feature on AMD systems only guarantees
that RMIDs currently assigned to a processor will be tracked by hardware.
The counters of any other RMIDs which are no longer being tracked will be
reset to zero. The MBM event counters return "Unavailable" for the RMIDs
that are not tracked by hardware. So, only a limited number of groups can
give guaranteed monitoring numbers. With ever-changing configurations
there is no way to know definitively which of these groups are being
tracked at a given point in time. Users do not have the option to monitor
a group or set of groups for a certain period of time without worrying
about the RMID being reset in between.

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
The assigned RMID will be tracked by the hardware until the user unassigns
it manually. There is no need to worry about counters being reset during
this period. Additionally, the user can specify a bitmask identifying the
specific bandwidth types from the given source to track with the counter.

Without ABMC enabled, monitoring works in the current mode, without the
assignment option.

The Linux resctrl subsystem provides an interface to count a maximum of
two memory bandwidth events per group, from a combination of the available
total and local events. Keeping the current interface, users can enable a
maximum of 2 ABMC counters per group. Users also have the option to enable
only one counter for the group. If the system runs out of assignable ABMC
counters, the kernel will display an error. Users need to disable an
already enabled counter to make space for new assignments.

The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
Bits Description
5    ABMC (Assignable Bandwidth Monitoring Counters)
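
As an illustration of the detection rule, the sketch below tests bit 5 of an EBX value obtained from CPUID function 0x80000020, subleaf 0. Python cannot execute CPUID itself, so the register value is assumed to be supplied by the caller; `has_abmc` and the constant names are hypothetical.

```python
ABMC_CPUID_LEAF = 0x80000020  # CPUID function named in the text above
ABMC_EBX_BIT = 5              # bit position of the ABMC flag in EBX


def has_abmc(ebx):
    """Return True if the ABMC bit is set in the given EBX value."""
    return bool((ebx >> ABMC_EBX_BIT) & 1)
```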

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Note: Checkpatch checks/warnings are ignored to maintain coding style.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: Minor rebase change and subject line update.

v4: Changes because of rebase. Feature word 21 has few more additions now.
    Changed the text to "tracked by hardware" instead of active.

v3: Change because of rebase. Actual patch did not change.

v2: Added dependency on X86_FEATURE_BMEC.
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 3 +++
 arch/x86/kernel/cpu/scattered.c    | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6007462e03d6..d7e1764cbab7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -471,6 +471,7 @@
 #define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* BHI_DIS_S HW control available */
 #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* BHI_DIS_S HW control enabled */
 #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_ABMC		(21*32+ 5) /* "" Assignable Bandwidth Monitoring Counters */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index b7d9f530ae16..5227a6232e9e 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -70,6 +70,9 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
 	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
+	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_TOTAL   },
+	{ X86_FEATURE_ABMC,			X86_FEATURE_CQM_MBM_LOCAL   },
+	{ X86_FEATURE_ABMC,			X86_FEATURE_BMEC      },
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index af5aa2c754c2..411b18c962bb 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -48,6 +48,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
+	{ X86_FEATURE_ABMC,		CPUID_EBX,  5, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_PMC_FREEZE,	CPUID_EAX,  2, 0x80000022, 0 },
-- 
2.34.1



* [PATCH v5 02/20] x86/resctrl: Add ABMC feature in the command line options
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
  2024-07-03 21:48 ` [PATCH v5 01/20] x86/cpufeatures: Add support for " Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-03 21:48 ` [PATCH v5 03/20] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Add the command line option to enable or disable the new resctrl feature
ABMC (Assignable Bandwidth Monitoring Counters).

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: No changes

v4: No changes

v3: No changes

v2: No changes
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 Documentation/arch/x86/resctrl.rst              | 1 +
 arch/x86/kernel/cpu/resctrl/core.c              | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2d6eacea85bd..291d9c47f74d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5597,7 +5597,7 @@
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba, smba, bmec.
+			mba, smba, bmec, abmc.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
 
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a824affd741d..30586728a4cd 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -26,6 +26,7 @@ MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
 MBA (Memory Bandwidth Allocation)		"mba"
 SMBA (Slow Memory Bandwidth Allocation)         ""
 BMEC (Bandwidth Monitoring Event Configuration) ""
+ABMC (Assignable Bandwidth Monitoring Counters) ""
 ===============================================	================================
 
 Historically, new features were made visible by default in /proc/cpuinfo. This
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1930fce9dfe9..9417d8bb7029 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -801,6 +801,7 @@ enum {
 	RDT_FLAG_MBA,
 	RDT_FLAG_SMBA,
 	RDT_FLAG_BMEC,
+	RDT_FLAG_ABMC,
 };
 
 #define RDT_OPT(idx, n, f)	\
@@ -826,6 +827,7 @@ static struct rdt_options rdt_options[]  __initdata = {
 	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
 	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
 	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
+	RDT_OPT(RDT_FLAG_ABMC,	    "abmc",	X86_FEATURE_ABMC),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
 
-- 
2.34.1



* [PATCH v5 03/20] x86/resctrl: Consolidate monitoring related data from rdt_resource
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
  2024-07-03 21:48 ` [PATCH v5 01/20] x86/cpufeatures: Add support for " Babu Moger
  2024-07-03 21:48 ` [PATCH v5 02/20] x86/resctrl: Add ABMC feature in the command line options Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 21:57   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 04/20] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The cache allocation and memory bandwidth allocation feature properties
are consolidated into cache and membw structures respectively. In
preparation for more monitoring properties that will clobber the existing
resource struct more, re-organize the monitoring specific properties into
a separate structure.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: Commit message update.
    Also changes related to data structure updates due to SNC support.

v4: New patch.
---
 arch/x86/kernel/cpu/resctrl/core.c     |  2 +-
 arch/x86/kernel/cpu/resctrl/monitor.c  | 18 +++++++++---------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  8 ++++----
 include/linux/resctrl.h                | 13 +++++++++++--
 4 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 9417d8bb7029..4a2d0955ccdc 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -617,7 +617,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 
 	arch_mon_domain_online(r, d);
 
-	if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
 		mon_domain_free(hw_dom);
 		return;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 851b561850e0..795fe91a8feb 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -222,7 +222,7 @@ static int logical_rmid_to_physical_rmid(int cpu, int lrmid)
 	if (snc_nodes_per_l3_cache == 1)
 		return lrmid;
 
-	return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
+	return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->mon.num_rmid;
 }
 
 static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
@@ -297,11 +297,11 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
 
 	if (is_mbm_total_enabled())
 		memset(hw_dom->arch_mbm_total, 0,
-		       sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
+		       sizeof(*hw_dom->arch_mbm_total) * r->mon.num_rmid);
 
 	if (is_mbm_local_enabled())
 		memset(hw_dom->arch_mbm_local, 0,
-		       sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+		       sizeof(*hw_dom->arch_mbm_local) * r->mon.num_rmid);
 }
 
 static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
@@ -1083,14 +1083,14 @@ static struct mon_evt mbm_local_event = {
  */
 static void l3_mon_evt_init(struct rdt_resource *r)
 {
-	INIT_LIST_HEAD(&r->evt_list);
+	INIT_LIST_HEAD(&r->mon.evt_list);
 
 	if (is_llc_occupancy_enabled())
-		list_add_tail(&llc_occupancy_event.list, &r->evt_list);
+		list_add_tail(&llc_occupancy_event.list, &r->mon.evt_list);
 	if (is_mbm_total_enabled())
-		list_add_tail(&mbm_total_event.list, &r->evt_list);
+		list_add_tail(&mbm_total_event.list, &r->mon.evt_list);
 	if (is_mbm_local_enabled())
-		list_add_tail(&mbm_local_event.list, &r->evt_list);
+		list_add_tail(&mbm_local_event.list, &r->mon.evt_list);
 }
 
 /*
@@ -1186,7 +1186,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 
 	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
 	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
-	r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
+	r->mon.num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
 	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
 
 	if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
@@ -1201,7 +1201,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	 *
 	 * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
 	 */
-	threshold = resctrl_rmid_realloc_limit / r->num_rmid;
+	threshold = resctrl_rmid_realloc_limit / r->mon.num_rmid;
 
 	/*
 	 * Because num_rmid may not be a power of two, round the value
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d7163b764c62..f9f3b5db1987 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1097,7 +1097,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
 {
 	struct rdt_resource *r = of->kn->parent->priv;
 
-	seq_printf(seq, "%d\n", r->num_rmid);
+	seq_printf(seq, "%d\n", r->mon.num_rmid);
 
 	return 0;
 }
@@ -1108,7 +1108,7 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 	struct rdt_resource *r = of->kn->parent->priv;
 	struct mon_evt *mevt;
 
-	list_for_each_entry(mevt, &r->evt_list, list) {
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
 		seq_printf(seq, "%s\n", mevt->name);
 		if (mevt->configurable)
 			seq_printf(seq, "%s_config\n", mevt->name);
@@ -3057,13 +3057,13 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
 	struct mon_evt *mevt;
 	int ret;
 
-	if (WARN_ON(list_empty(&r->evt_list)))
+	if (WARN_ON(list_empty(&r->mon.evt_list)))
 		return -EPERM;
 
 	priv.u.rid = r->rid;
 	priv.u.domid = do_sum ? d->ci->id : d->hdr.id;
 	priv.u.sum = do_sum;
-	list_for_each_entry(mevt, &r->evt_list, list) {
+	list_for_each_entry(mevt, &r->mon.evt_list, list) {
 		priv.u.evtid = mevt->evtid;
 		ret = mon_addfile(kn, mevt->name, priv.priv);
 		if (ret)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index b0875b99e811..e43fc5bb5a3a 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -182,6 +182,16 @@ enum resctrl_scope {
 	RESCTRL_L3_NODE,
 };
 
+/**
+ * struct resctrl_mon - Monitoring related data
+ * @num_rmid:		Number of RMIDs available
+ * @evt_list:		List of monitoring events
+ */
+struct resctrl_mon {
+	int			num_rmid;
+	struct list_head	evt_list;
+};
+
 /**
  * struct rdt_resource - attributes of a resctrl resource
  * @rid:		The index of the resource
@@ -207,11 +217,11 @@ struct rdt_resource {
 	int			rid;
 	bool			alloc_capable;
 	bool			mon_capable;
-	int			num_rmid;
 	enum resctrl_scope	ctrl_scope;
 	enum resctrl_scope	mon_scope;
 	struct resctrl_cache	cache;
 	struct resctrl_membw	membw;
+	struct resctrl_mon	mon;
 	struct list_head	ctrl_domains;
 	struct list_head	mon_domains;
 	char			*name;
@@ -221,7 +231,6 @@ struct rdt_resource {
 	int			(*parse_ctrlval)(struct rdt_parse_data *data,
 						 struct resctrl_schema *s,
 						 struct rdt_ctrl_domain *d);
-	struct list_head	evt_list;
 	unsigned long		fflags;
 	bool			cdp_capable;
 };
-- 
2.34.1



* [PATCH v5 04/20] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (2 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 03/20] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:04   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 05/20] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.

Bits   Field      Description
15:0   MAX_ABMC   Maximum Supported Assignable Bandwidth
                  Monitoring Counter ID + 1

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
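
For readers following along, the decode done in the hunk below can be
sketched as a small user-space helper. This is only an illustration: the
names abmc_num_cntrs_from_ebx and ABMC_MAX_CNTRS are hypothetical and not
part of the patch. The helper takes a raw EBX value as input rather than
executing CPUID, and the cap of 64 simply mirrors the limit used in the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; mirrors the 64-counter cap applied in the patch. */
#define ABMC_MAX_CNTRS 64u

/*
 * Decode the number of ABMC counters from a raw EBX value of CPUID
 * leaf 0x80000020, subleaf 5. Bits 15:0 carry the field; as in the
 * patch, one is added and the result is clamped to the cap.
 */
static unsigned int abmc_num_cntrs_from_ebx(uint32_t ebx)
{
	unsigned int n = (ebx & 0xFFFFu) + 1u;

	if (n > ABMC_MAX_CNTRS)
		n = ABMC_MAX_CNTRS;

	/* The result is always within [1, ABMC_MAX_CNTRS]. */
	assert(n >= 1 && n <= ABMC_MAX_CNTRS);
	return n;
}
```

Bits 31:16 of EBX are ignored by the mask, so a caller can pass the
register value unmodified.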

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: Renamed num_cntrs to num_mbm_cntrs.
    Moved abmc_capable to resctrl_mon.

v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We don't
    need to separate this as arch code.

v3: Removed changes related to mon_features.
    Moved rdt_cpu_has to core.c and added new function resctrl_arch_has_abmc.
    Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
    rdt_resource. (James)

v2: Changed the field name to mbm_assign_capable from abmc_capable.
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 12 ++++++++++++
 include/linux/resctrl.h               |  4 ++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 795fe91a8feb..87d40f149ebc 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1229,6 +1229,18 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 			mbm_local_event.configurable = true;
 			mbm_config_rftype_init("mbm_local_bytes_config");
 		}
+
+		if (rdt_cpu_has(X86_FEATURE_ABMC)) {
+			r->mon.abmc_capable = true;
+			/*
+			 * Query CPUID_Fn80000020_EBX_x05 for number of
+			 * ABMC counters
+			 */
+			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
+			r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
+			if (WARN_ON(r->mon.num_mbm_cntrs > 64))
+				r->mon.num_mbm_cntrs = 64;
+		}
 	}
 
 	l3_mon_evt_init(r);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index e43fc5bb5a3a..62f0f002ef41 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -185,10 +185,14 @@ enum resctrl_scope {
 /**
  * struct resctrl_mon - Monitoring related data
  * @num_rmid:		Number of RMIDs available
+ * @num_mbm_cntrs:	Number of monitoring counters
+ * @abmc_capable:	Is system capable of supporting monitor assignment?
  * @evt_list:		List of monitoring events
  */
 struct resctrl_mon {
 	int			num_rmid;
+	int			num_mbm_cntrs;
+	bool			abmc_capable;
 	struct list_head	evt_list;
 };
 
-- 
2.34.1



* [PATCH v5 05/20] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (3 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 04/20] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:04   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
fflags for resctrl files.

Adding new files will involve adding another function to initialize
the fflags. This can be simplified by adding a new function
resctrl_file_fflags_init() and passing the file name and flags
to be initialized.

Consolidate fflags initialization into resctrl_file_fflags_init() and
remove thread_throttle_mode_init() and mbm_config_rftype_init().
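
The consolidation can be pictured with a stand-alone sketch of the
lookup-then-set pattern. Everything prefixed with demo_ or DEMO_ is
hypothetical and exists only for illustration; in particular the flag
values are made up and do not match the kernel's RFTYPE_* bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative flag values only; the kernel's RFTYPE_* bits differ. */
#define DEMO_RFTYPE_INFO	(1UL << 0)
#define DEMO_RFTYPE_MON		(1UL << 1)
#define DEMO_RFTYPE_CTRL	(1UL << 2)

struct demo_rftype {
	const char	*name;
	unsigned long	fflags;
};

/* Files start with no flags, i.e. they are not exposed yet. */
static struct demo_rftype demo_files[] = {
	{ "thread_throttle_mode",	0 },
	{ "mbm_total_bytes_config",	0 },
	{ "mbm_local_bytes_config",	0 },
};

static struct demo_rftype *demo_get_rftype_by_name(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_files) / sizeof(demo_files[0]); i++) {
		if (!strcmp(demo_files[i].name, name))
			return &demo_files[i];
	}

	return NULL;
}

/* One helper replaces the per-file init functions; unknown names are ignored. */
static void demo_file_fflags_init(const char *name, unsigned long fflags)
{
	struct demo_rftype *rft;

	assert(name != NULL);

	rft = demo_get_rftype_by_name(name);
	if (rft)
		rft->fflags = fflags;
}
```

With a single helper, exposing a new file only needs one more call with
the file name and the desired flags, not a new init function.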

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: Commit message update.

v4: Commit message update.

v3: New patch to display ABMC capability.
---
 arch/x86/kernel/cpu/resctrl/core.c     |  4 +++-
 arch/x86/kernel/cpu/resctrl/internal.h |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c  |  6 ++++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 +++-------------
 4 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 4a2d0955ccdc..ff5cb693b396 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -226,7 +226,9 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
 		r->membw.throttle_mode = THREAD_THROTTLE_PER_THREAD;
 	else
 		r->membw.throttle_mode = THREAD_THROTTLE_MAX;
-	thread_throttle_mode_init();
+
+	resctrl_file_fflags_init("thread_throttle_mode",
+				 RFTYPE_CTRL_INFO | RFTYPE_RES_MB);
 
 	r->alloc_capable = true;
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 955999aecfca..2bd207624eec 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -647,8 +647,8 @@ void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_mon_domain *d);
 void __check_limbo(struct rdt_mon_domain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
-void __init thread_throttle_mode_init(void);
-void __init mbm_config_rftype_init(const char *config);
+void __init resctrl_file_fflags_init(const char *config,
+				     unsigned long fflags);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 87d40f149ebc..12793762ca24 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1223,11 +1223,13 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 
 		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
 			mbm_total_event.configurable = true;
-			mbm_config_rftype_init("mbm_total_bytes_config");
+			resctrl_file_fflags_init("mbm_total_bytes_config",
+						 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 		}
 		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
 			mbm_local_event.configurable = true;
-			mbm_config_rftype_init("mbm_local_bytes_config");
+			resctrl_file_fflags_init("mbm_local_bytes_config",
+						 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 		}
 
 		if (rdt_cpu_has(X86_FEATURE_ABMC)) {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f9f3b5db1987..7e76f8d839fc 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2020,24 +2020,14 @@ static struct rftype *rdtgroup_get_rftype_by_name(const char *name)
 	return NULL;
 }
 
-void __init thread_throttle_mode_init(void)
-{
-	struct rftype *rft;
-
-	rft = rdtgroup_get_rftype_by_name("thread_throttle_mode");
-	if (!rft)
-		return;
-
-	rft->fflags = RFTYPE_CTRL_INFO | RFTYPE_RES_MB;
-}
-
-void __init mbm_config_rftype_init(const char *config)
+void __init resctrl_file_fflags_init(const char *config,
+				     unsigned long fflags)
 {
 	struct rftype *rft;
 
 	rft = rdtgroup_get_rftype_by_name(config);
 	if (rft)
-		rft->fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE;
+		rft->fflags = fflags;
 }
 
 /**
-- 
2.34.1



* [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (4 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 05/20] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:05   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 07/20] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Add the functionality to enable/disable the AMD ABMC feature.

The AMD ABMC feature is enabled by setting the enable bit (bit 0) in the
L3_QOS_EXT_CFG MSR. When the state of ABMC is changed, the MSR needs
to be updated on all the logical processors in the QOS Domain.

Hardware counters will reset when the ABMC state is changed. Reset the
architectural state so that the next read of a hardware counter is not
treated as an overflow.

The ABMC feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
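
The read-modify-write applied to the MSR value can be sketched in user
space as below. This is a model only: demo_abmc_update_cfg and
DEMO_ABMC_ENABLE are hypothetical names, and the function works on a
plain 64-bit value instead of issuing rdmsrl()/wrmsrl().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of L3_QOS_EXT_CFG enables ABMC (hypothetical macro name). */
#define DEMO_ABMC_ENABLE	(UINT64_C(1) << 0)

/*
 * Return the new register value for the requested ABMC state while
 * preserving every other bit of the current value.
 */
static uint64_t demo_abmc_update_cfg(uint64_t msrval, bool enable)
{
	uint64_t newval;

	if (enable)
		newval = msrval | DEMO_ABMC_ENABLE;
	else
		newval = msrval & ~DEMO_ABMC_ENABLE;

	/* Only the enable bit may differ between old and new value. */
	assert(((newval ^ msrval) & ~DEMO_ABMC_ENABLE) == 0);
	return newval;
}
```

Reading the register first, rather than writing a constant, keeps any
other bits in the MSR untouched.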

Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
    Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
    Introduced resctrl_arch_get_abmc_enabled to get abmc state from
    non-arch code.
    Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
    Modified commit log to make it clear about AMD ABMC feature.

v3: No changes.

v2: Few text changes in commit message.
---
 arch/x86/include/asm/msr-index.h       |  1 +
 arch/x86/kernel/cpu/resctrl/internal.h | 13 +++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 66 ++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 01342963011e..263b2d9d00ed 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1174,6 +1174,7 @@
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
 #define MSR_IA32_EVT_CFG_BASE		0xc0000400
+#define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
 
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 2bd207624eec..0ce9797f80fe 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -97,6 +97,9 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
 	return cpu;
 }
 
+/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
+#define ABMC_ENABLE			BIT(0)
+
 struct rdt_fs_context {
 	struct kernfs_fs_context	kfc;
 	bool				enable_cdpl2;
@@ -477,6 +480,7 @@ struct rdt_parse_data {
  * @mbm_cfg_mask:	Bandwidth sources that can be tracked when Bandwidth
  *			Monitoring Event Configuration (BMEC) is supported.
  * @cdp_enabled:	CDP state of this resource
+ * @abmc_enabled:	ABMC feature is enabled
  *
  * Members of this structure are either private to the architecture
  * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
@@ -491,6 +495,7 @@ struct rdt_hw_resource {
 	unsigned int		mbm_width;
 	unsigned int		mbm_cfg_mask;
 	bool			cdp_enabled;
+	bool			abmc_enabled;
 };
 
 static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
@@ -536,6 +541,14 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
 
 void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d);
 
+static inline bool resctrl_arch_get_abmc_enabled(void)
+{
+	return rdt_resources_all[RDT_RESOURCE_L3].abmc_enabled;
+}
+
+int resctrl_arch_abmc_enable(void);
+void resctrl_arch_abmc_disable(void);
+
 /*
  * To return the common struct rdt_resource, which is contained in struct
  * rdt_hw_resource, walk the resctrl member of struct rdt_hw_resource.
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 7e76f8d839fc..471fc0dbd7c3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2402,6 +2402,72 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
 	return 0;
 }
 
+/*
+ * Update the L3_QOS_EXT_CFG MSR on the CPU executing this callback.
+ */
+static void resctrl_abmc_set_one_amd(void *arg)
+{
+	bool *enable = arg;
+	u64 msrval;
+
+	rdmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
+
+	if (*enable)
+		msrval |= ABMC_ENABLE;
+	else
+		msrval &= ~ABMC_ENABLE;
+
+	wrmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
+}
+
+static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
+{
+	struct rdt_mon_domain *d;
+
+	/*
+	 * Hardware counters will reset after switching the monitor mode.
+	 * Reset the architectural state so that reading of hardware
+	 * counter is not considered as an overflow in the next update.
+	 */
+	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		on_each_cpu_mask(&d->hdr.cpu_mask,
+				 resctrl_abmc_set_one_amd, &enable, 1);
+		resctrl_arch_reset_rmid_all(r, d);
+	}
+
+	return 0;
+}
+
+int resctrl_arch_abmc_enable(void)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+	int ret = 0;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	if (r->mon.abmc_capable && !hw_res->abmc_enabled) {
+		ret = _resctrl_abmc_enable(r, true);
+		if (!ret)
+			hw_res->abmc_enabled = true;
+	}
+
+	return ret;
+}
+
+void resctrl_arch_abmc_disable(void)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	if (hw_res->abmc_enabled) {
+		_resctrl_abmc_enable(r, false);
+		hw_res->abmc_enabled = false;
+	}
+}
+
 /*
  * We don't allow rdtgroup directories to be created anywhere
  * except the root directory. Thus when looking for the rdtgroup
-- 
2.34.1



* [PATCH v5 07/20] x86/resctrl: Introduce the interface to display monitor mode
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (5 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:06   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 08/20] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
ABMC mode is enabled by default when supported. The system can be in only
one mode at a time (legacy monitor mode or ABMC mode).

Provide an interface to display the monitor mode on the system.
    $ cat /sys/fs/resctrl/info/L3_MON/mbm_mode
    [abmc]
    legacy

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: Changed interface name to mbm_mode.
    It will be always available even if ABMC feature is not supported.
    Added description in resctrl.rst about ABMC mode.
    Fixed display of abmc and legacy to be consistent.

v4: Fixed the checks for legacy and abmc mode. Default is ABMC.

v3: New patch to display ABMC capability.
---
 Documentation/arch/x86/resctrl.rst     | 30 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c  |  2 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 26 ++++++++++++++++++++++
 3 files changed, 58 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 30586728a4cd..108e494fd7cc 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -257,6 +257,36 @@ with the following files:
 	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
 	    0=0x30;1=0x30;3=0x15;4=0x15
 
+"mbm_mode":
+	Reports the list of monitoring modes supported. The mode enclosed
+	in brackets is the one currently enabled.
+	::
+
+	  cat /sys/fs/resctrl/info/L3_MON/mbm_mode
+	  [abmc]
+	  legacy
+
+	The bandwidth monitoring feature on AMD systems only guarantees
+	that RMIDs currently assigned to a processor will be tracked by
+	hardware. The counters of any other RMIDs which are no longer
+	being tracked will be reset to zero. The MBM event counters return
+	"Unavailable" for the RMIDs that are not tracked by hardware. So,
+	only a limited number of groups can give guaranteed monitoring
+	numbers. With ever-changing configurations there is no way to
+	know definitively which of these groups are being tracked at a
+	given point in time. Users do not have the option to monitor a
+	group or set of groups for a certain period of time without
+	worrying about the RMID being reset in between.
+
+	The ABMC feature provides an option to the user to assign a
+	hardware counter to an RMID and monitor the bandwidth as long as
+	it is assigned. The assigned RMID will be tracked by the hardware
+	until the user unassigns it manually. There is no need to worry
+	about counters being reset during this period.
+
+	Without ABMC enabled, monitoring will work in "legacy" mode
+	without assignment option.
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 12793762ca24..6c4cb36b4b50 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1245,6 +1245,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 		}
 	}
 
+	resctrl_file_fflags_init("mbm_mode", RFTYPE_MON_INFO);
+
 	l3_mon_evt_init(r);
 
 	r->mon_capable = true;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 471fc0dbd7c3..3988d7b86817 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -845,6 +845,26 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+static int rdtgroup_mbm_mode_show(struct kernfs_open_file *of,
+				  struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	if (r->mon.abmc_capable) {
+		if (resctrl_arch_get_abmc_enabled()) {
+			seq_puts(s, "[abmc]\n");
+			seq_puts(s, "legacy\n");
+		} else {
+			seq_puts(s, "abmc\n");
+			seq_puts(s, "[legacy]\n");
+		}
+	} else {
+		seq_puts(s, "[legacy]\n");
+	}
+
+	return 0;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -1901,6 +1921,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= mbm_local_bytes_config_show,
 		.write		= mbm_local_bytes_config_write,
 	},
+	{
+		.name		= "mbm_mode",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_mbm_mode_show,
+	},
 	{
 		.name		= "cpus",
 		.mode		= 0644,
-- 
2.34.1



* [PATCH v5 08/20] x86/resctrl: Introduce interface to display number of monitoring counters
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (6 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 07/20] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-03 21:48 ` [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap Babu Moger
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as the counter is
assigned. The number of assignments depends on the number of monitoring
counters available.

Provide the interface to display the number of monitoring counters
supported.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: Changed the display name from num_cntrs to num_mbm_cntrs.
    Updated the commit message.
    Moved the patch after mbm_mode is introduced.

v4: Changed the counter name to num_cntrs. And few text changes.

v3: Changed the field name to mbm_assign_cntrs.

v2: Changed the field name to mbm_assignable_counters from abmc_counters.
---
 Documentation/arch/x86/resctrl.rst     |  3 +++
 arch/x86/kernel/cpu/resctrl/monitor.c  |  2 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 108e494fd7cc..4907d0758118 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -287,6 +287,9 @@ with the following files:
 	Without ABMC enabled, monitoring will work in "legacy" mode
 	without assignment option.
 
+"num_mbm_cntrs":
+	The number of monitoring counters available for assignment.
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 6c4cb36b4b50..7a93a6d2b2de 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1242,6 +1242,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 			r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
 			if (WARN_ON(r->mon.num_mbm_cntrs > 64))
 				r->mon.num_mbm_cntrs = 64;
+
+			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
 		}
 	}
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 3988d7b86817..4f47f52e01c2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -865,6 +865,16 @@ static int rdtgroup_mbm_mode_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
+				       struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(s, "%d\n", r->mon.num_mbm_cntrs);
+
+	return 0;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -1935,6 +1945,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdtgroup_cpus_show,
 		.fflags		= RFTYPE_BASE,
 	},
+	{
+		.name		= "num_mbm_cntrs",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_num_mbm_cntrs_show,
+	},
 	{
 		.name		= "cpus_list",
 		.mode		= 0644,
-- 
2.34.1



* [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (7 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 08/20] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:07   ` Reinette Chatre
  2024-07-26 22:48   ` Peter Newman
  2024-07-03 21:48 ` [PATCH v5 10/20] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg Babu Moger
                   ` (11 subsequent siblings)
  20 siblings, 2 replies; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hardware provides a set of counters when the ABMC feature is supported.
These counters are used for enabling the events in a resctrl group when
the feature is enabled.

Introduce the mbm_cntrs_free_map bitmap to track which counters are free.
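
How such a free map is typically used can be sketched in user space as
below. Only the initialization corresponds to this patch; the alloc and
free helpers are hypothetical and shown purely to illustrate the
bookkeeping. All demo_ names are assumptions, and a uint64_t stands in
for the kernel bitmap, which assumes at most 64 counters.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_CNTRS	64u

static uint64_t demo_free_map;		/* a set bit means the counter is free */
static unsigned int demo_free_map_len;	/* number of valid bits in the map */

/* Mark all num_cntrs counters as free, like mbm_cntrs_init() does. */
static void demo_cntrs_init(unsigned int num_cntrs)
{
	assert(num_cntrs >= 1 && num_cntrs <= DEMO_MAX_CNTRS);

	/* Avoid an undefined 64-bit shift when all 64 counters exist. */
	if (num_cntrs == DEMO_MAX_CNTRS)
		demo_free_map = UINT64_MAX;
	else
		demo_free_map = (UINT64_C(1) << num_cntrs) - 1;

	demo_free_map_len = num_cntrs;
}

/* Hypothetical helper: take the lowest free counter, or return -1. */
static int demo_cntr_alloc(void)
{
	unsigned int id;

	for (id = 0; id < demo_free_map_len; id++) {
		if (demo_free_map & (UINT64_C(1) << id)) {
			demo_free_map &= ~(UINT64_C(1) << id);
			return (int)id;
		}
	}

	return -1;
}

/* Hypothetical helper: return a counter to the free pool. */
static void demo_cntr_free(unsigned int id)
{
	assert(id < demo_free_map_len);
	demo_free_map |= UINT64_C(1) << id;
}
```

Calling the init helper again marks every counter free, which is the
effect the patch relies on when the monitor mode is switched.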

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5:
  Updated the comments and commit log.
  Few renames
   num_cntrs_free_map -> mbm_cntrs_free_map
   num_cntrs_init -> mbm_cntrs_init
   Added initialization in rdt_get_tree because the default ABMC
   enablement happens during the init.

v4: Changed the name to num_cntrs where applicable.
    Used bitmap apis.
    Added more comments for the globals.

v3: Changed the bitmap name to assign_cntrs_free_map. Removed abmc
    from the name.

v2: Changed the bitmap name to assignable_counter_free_map from
    abmc_counter_free_map.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 29 ++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 4f47f52e01c2..b3d3fa048f15 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -185,6 +185,23 @@ bool closid_allocated(unsigned int closid)
 	return !test_bit(closid, &closid_free_map);
 }
 
+/*
+ * Counter bitmap and its length for tracking available counters.
+ * ABMC feature provides set of hardware counters for enabling events.
+ * Each event takes one hardware counter. Kernel needs to keep track
+ * of number of available counters.
+ */
+static unsigned long mbm_cntrs_free_map;
+static unsigned int mbm_cntrs_free_map_len;
+
+static void mbm_cntrs_init(void)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+	bitmap_fill(&mbm_cntrs_free_map, r->mon.num_mbm_cntrs);
+	mbm_cntrs_free_map_len = r->mon.num_mbm_cntrs;
+}
+
 /**
  * rdtgroup_mode_by_closid - Return mode of resource group with closid
  * @closid: closid if the resource group
@@ -2466,6 +2483,12 @@ static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
 {
 	struct rdt_mon_domain *d;
 
+	/*
+	 * Clear all the previous assignments while switching the monitor
+	 * mode.
+	 */
+	mbm_cntrs_init();
+
 	/*
 	 * Hardware counters will reset after switching the monitor mode.
 	 * Reset the architectural state so that reading of hardware
@@ -2724,10 +2747,10 @@ static void schemata_list_destroy(void)
 
 static int rdt_get_tree(struct fs_context *fc)
 {
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
 	unsigned long flags = RFTYPE_CTRL_BASE;
 	struct rdt_mon_domain *dom;
-	struct rdt_resource *r;
 	int ret;
 
 	cpus_read_lock();
@@ -2756,6 +2779,9 @@ static int rdt_get_tree(struct fs_context *fc)
 
 	closid_init();
 
+	if (r->mon.abmc_capable)
+		mbm_cntrs_init();
+
 	if (resctrl_arch_mon_capable())
 		flags |= RFTYPE_MON;
 
@@ -2800,7 +2826,6 @@ static int rdt_get_tree(struct fs_context *fc)
 		resctrl_mounted = true;
 
 	if (is_mbm_enabled()) {
-		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 		list_for_each_entry(dom, &r->mon_domains, hdr.list)
 			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL,
 						   RESCTRL_PICK_ANY_CPU);
-- 
2.34.1



* [PATCH v5 10/20] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (8 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:08   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 11/20] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

If the BMEC (Bandwidth Monitoring Event Configuration) feature is
supported, the bandwidth events can be configured to track specific
events. The event configuration is domain specific. The ABMC (Assignable
Bandwidth Monitoring Counters) feature needs the event configuration
information to assign a hardware counter to an RMID. Event configurations
are not stored in resctrl but instead always read from or written to
hardware directly when prompted by user space.

Read the event configuration from the hardware during domain
initialization. Save the configuration in struct rdt_hw_mon_domain
so it can be used for counter assignment.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: Exported mon_event_config_index_get.
    Renamed arch_domain_mbm_evt_config to resctrl_arch_mbm_evt_config.

v4: Read the configuration information from the hardware to initialize.
    Added few commit messages.
    Fixed the tab spaces.

v3: Minor changes related to rebase in mbm_config_write_domain.

v2: No changes.
---
 arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
 arch/x86/kernel/cpu/resctrl/internal.h |  6 ++++++
 arch/x86/kernel/cpu/resctrl/monitor.c  | 22 ++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  2 +-
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index ff5cb693b396..6265ef8b610f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -619,6 +619,8 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 
 	arch_mon_domain_online(r, d);
 
+	resctrl_arch_mbm_evt_config(hw_dom);
+
 	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
 		mon_domain_free(hw_dom);
 		return;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 0ce9797f80fe..4cb1a5d014a3 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -401,6 +401,8 @@ struct rdt_hw_ctrl_domain {
  * @d_resctrl:	Properties exposed to the resctrl file system
  * @arch_mbm_total:	arch private state for MBM total bandwidth
  * @arch_mbm_local:	arch private state for MBM local bandwidth
+ * @mbm_total_cfg:	MBM total bandwidth configuration
+ * @mbm_local_cfg:	MBM local bandwidth configuration
  *
  * Members of this structure are accessed via helpers that provide abstraction.
  */
@@ -408,6 +410,8 @@ struct rdt_hw_mon_domain {
 	struct rdt_mon_domain		d_resctrl;
 	struct arch_mbm_state		*arch_mbm_total;
 	struct arch_mbm_state		*arch_mbm_local;
+	u32				mbm_total_cfg;
+	u32				mbm_local_cfg;
 };
 
 static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
@@ -662,6 +666,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void __init resctrl_file_fflags_init(const char *config,
 				     unsigned long fflags);
+void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
+unsigned int mon_event_config_index_get(u32 evtid);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 7a93a6d2b2de..b96b0a8bd7d3 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1256,6 +1256,28 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	return 0;
 }
 
+void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom)
+{
+	unsigned int index;
+	u64 msrval;
+
+	/*
+	 * Read the configuration registers QOS_EVT_CFG_n, where <n> is
+	 * the BMEC event number (EvtID).
+	 */
+	if (mbm_total_event.configurable) {
+		index = mon_event_config_index_get(QOS_L3_MBM_TOTAL_EVENT_ID);
+		rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
+		hw_dom->mbm_total_cfg = msrval & MAX_EVT_CONFIG_BITS;
+	}
+
+	if (mbm_local_event.configurable) {
+		index = mon_event_config_index_get(QOS_L3_MBM_LOCAL_EVENT_ID);
+		rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
+		hw_dom->mbm_local_cfg = msrval & MAX_EVT_CONFIG_BITS;
+	}
+}
+
 void __exit rdt_put_mon_l3_config(void)
 {
 	dom_data_exit();
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b3d3fa048f15..b2b751741dd8 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1606,7 +1606,7 @@ struct mon_config_info {
  *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
  *         INVALID_CONFIG_INDEX for invalid evtid
  */
-static inline unsigned int mon_event_config_index_get(u32 evtid)
+unsigned int mon_event_config_index_get(u32 evtid)
 {
 	switch (evtid) {
 	case QOS_L3_MBM_TOTAL_EVENT_ID:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v5 11/20] x86/resctrl: Remove MSR reading of event configuration value
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (9 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 10/20] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:10   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 12/20] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The event configuration is domain specific and is initialized during
domain initialization. There is no need to read the configuration register
every time the user asks for it. Use the value stored in rdt_hw_mon_domain
instead. Also update the stored value when the user writes a new one.

Introduce resctrl_arch_event_config_get() and
resctrl_arch_event_config_set() to get/set architecture domain specific
mbm_total_cfg/mbm_local_cfg values.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: Introduced resctrl_arch_event_config_get() and
    resctrl_arch_event_config_set() based on our discussion.
    https://lore.kernel.org/lkml/68e861f9-245d-4496-a72e-46fc57d19c62@amd.com/

v4: New patch.
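
As an illustration of the caching described in the commit message above,
here is a minimal user-space sketch. It is not the kernel code: the
struct, the demo_* helpers and the 0x7f mask are hypothetical stand-ins
(the mask is assumed to mirror MAX_EVT_CONFIG_BITS), and a plain struct
field simulates the per-domain configuration register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for MAX_EVT_CONFIG_BITS (valid configuration bits). */
#define DEMO_EVT_CFG_VALID_BITS	0x7fu

/* Hypothetical model of one monitoring domain. */
struct demo_domain {
	uint64_t hw_evt_cfg;	/* simulated QOS_EVT_CFG_n register */
	uint32_t cached_cfg;	/* software copy, like mbm_total_cfg */
	unsigned int hw_reads;	/* number of simulated register reads */
	unsigned int hw_writes;	/* number of simulated register writes */
};

/* Read the simulated register once, at domain initialization. */
static void demo_domain_init(struct demo_domain *d, uint64_t hw_val)
{
	d->hw_evt_cfg = hw_val;
	d->hw_reads = 1;
	d->hw_writes = 0;
	d->cached_cfg = (uint32_t)(d->hw_evt_cfg & DEMO_EVT_CFG_VALID_BITS);
}

/* Later reads are served from the cached copy, not from hardware. */
static uint32_t demo_config_get(const struct demo_domain *d)
{
	return d->cached_cfg;
}

/*
 * Write through to the simulated register and refresh the cache.
 * Returns true if the register was written, false if the requested
 * value already matched the cached one.
 */
static bool demo_config_set(struct demo_domain *d, uint32_t val)
{
	if (demo_config_get(d) == val)
		return false;

	d->hw_evt_cfg = val;
	d->hw_writes++;
	d->cached_cfg = val;
	return true;
}
```

In this model, showing the configuration never touches the register
after initialization, and a write that repeats the current value is
skipped entirely.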
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 112 +++++++++++++++----------
 include/linux/resctrl.h                |   4 +
 2 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b2b751741dd8..91c5d45ac367 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1591,10 +1591,59 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 }
 
 struct mon_config_info {
+	struct rdt_mon_domain *d;
 	u32 evtid;
 	u32 mon_config;
 };
 
+#define INVALID_CONFIG_VALUE   UINT_MAX
+
+unsigned int resctrl_arch_event_config_get(struct rdt_mon_domain *d,
+					   enum resctrl_event_id eventid)
+{
+	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+
+	switch (eventid) {
+	case QOS_L3_OCCUP_EVENT_ID:
+		break;
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		return hw_dom->mbm_total_cfg;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		return hw_dom->mbm_local_cfg;
+	}
+
+	/* Never expect to get here */
+	WARN_ON_ONCE(1);
+
+	return INVALID_CONFIG_VALUE;
+}
+
+void resctrl_arch_event_config_set(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	struct rdt_hw_mon_domain *hw_dom;
+	unsigned int index;
+
+	index = mon_event_config_index_get(mon_info->evtid);
+	if (index == INVALID_CONFIG_VALUE) {
+		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
+		return;
+	}
+	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
+
+	hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
+
+	switch (mon_info->evtid) {
+	case QOS_L3_OCCUP_EVENT_ID:
+		break;
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		hw_dom->mbm_total_cfg = mon_info->mon_config;
+		break;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		hw_dom->mbm_local_cfg =  mon_info->mon_config;
+	}
+}
+
 #define INVALID_CONFIG_INDEX   UINT_MAX
 
 /**
@@ -1619,33 +1668,11 @@ unsigned int mon_event_config_index_get(u32 evtid)
 	}
 }
 
-static void mon_event_config_read(void *info)
-{
-	struct mon_config_info *mon_info = info;
-	unsigned int index;
-	u64 msrval;
-
-	index = mon_event_config_index_get(mon_info->evtid);
-	if (index == INVALID_CONFIG_INDEX) {
-		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
-		return;
-	}
-	rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
-
-	/* Report only the valid event configuration bits */
-	mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
-}
-
-static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
-{
-	smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
-}
-
 static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
 {
-	struct mon_config_info mon_info = {0};
 	struct rdt_mon_domain *dom;
 	bool sep = false;
+	int val;
 
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
@@ -1654,11 +1681,13 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
 		if (sep)
 			seq_puts(s, ";");
 
-		memset(&mon_info, 0, sizeof(struct mon_config_info));
-		mon_info.evtid = evtid;
-		mondata_config_read(dom, &mon_info);
+		val = resctrl_arch_event_config_get(dom, evtid);
+		if (val == INVALID_CONFIG_VALUE) {
+			rdt_last_cmd_puts("Invalid event configuration\n");
+			break;
+		}
 
-		seq_printf(s, "%d=0x%02x", dom->hdr.id, mon_info.mon_config);
+		seq_printf(s, "%d=0x%02x", dom->hdr.id, val);
 		sep = true;
 	}
 	seq_puts(s, "\n");
@@ -1689,33 +1718,27 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
 	return 0;
 }
 
-static void mon_event_config_write(void *info)
-{
-	struct mon_config_info *mon_info = info;
-	unsigned int index;
-
-	index = mon_event_config_index_get(mon_info->evtid);
-	if (index == INVALID_CONFIG_INDEX) {
-		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
-		return;
-	}
-	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
-}
 
 static void mbm_config_write_domain(struct rdt_resource *r,
 				    struct rdt_mon_domain *d, u32 evtid, u32 val)
 {
 	struct mon_config_info mon_info = {0};
+	int config_val;
 
 	/*
-	 * Read the current config value first. If both are the same then
+	 * Check the current config value first. If both are the same then
 	 * no need to write it again.
 	 */
-	mon_info.evtid = evtid;
-	mondata_config_read(d, &mon_info);
-	if (mon_info.mon_config == val)
+	config_val = resctrl_arch_event_config_get(d, evtid);
+	if (config_val == INVALID_CONFIG_VALUE) {
+		rdt_last_cmd_puts("Invalid event configuration\n");
+		return;
+	}
+	if (config_val == val)
 		return;
 
+	mon_info.d = d;
+	mon_info.evtid = evtid;
 	mon_info.mon_config = val;
 
 	/*
@@ -1724,7 +1747,8 @@ static void mbm_config_write_domain(struct rdt_resource *r,
 	 * are scoped at the domain level. Writing any of these MSRs
 	 * on one CPU is observed by all the CPUs in the domain.
 	 */
-	smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_write,
+	smp_call_function_any(&d->hdr.cpu_mask,
+			      resctrl_arch_event_config_set,
 			      &mon_info, 1);
 
 	/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 62f0f002ef41..f017258ebf85 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -352,6 +352,10 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
  */
 void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
 
+void resctrl_arch_event_config_set(void *info);
+unsigned int resctrl_arch_event_config_get(struct rdt_mon_domain *d,
+					   enum resctrl_event_id eventid);
+
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v5 12/20] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (10 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 11/20] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:13   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 13/20] x86/resctrl: Add the interface to assign hardware counter Babu Moger
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The ABMC feature lets the user assign a hardware counter to an RMID and
monitor the bandwidth as long as the counter is assigned. The bandwidth
events will be tracked by the hardware until the user changes the
configuration. Each resctrl group can configure a maximum of two
counters, one for the total event and one for the local event.

The counters are configured by writing to MSR L3_QOS_ABMC_CFG.
Configuration is done by setting the counter id, bandwidth source (RMID)
and bandwidth configuration supported by BMEC (Bandwidth Monitoring Event
Configuration). Reading L3_QOS_ABMC_DSC returns the configuration of the
counter id specified in L3_QOS_ABMC_CFG.

Attempts to read or write these MSRs when ABMC is not enabled will result
in a #GP(0) exception.

Introduce data structures and definitions for ABMC assignments.

MSR L3_QOS_ABMC_CFG (C000_03FDh) and L3_QOS_ABMC_DSC (C000_03FEh)
details.
==========================================================================
Bits    Mnemonic  Description                        Access  Reset
                                                     Type    Value
==========================================================================
63      CfgEn     Configuration Enable               R/W     0

62      CtrEn     Enable/disable Tracking            R/W     0

61:53   –         Reserved                           MBZ     0

52:48   CtrID     Counter Identifier                 R/W     0

47      IsCOS     BwSrc field is a CLOSID            R/W     0
                  (not an RMID)

46:44   –         Reserved                           MBZ     0

43:32   BwSrc     Bandwidth Source                   R/W     0
                  (RMID or CLOSID)

31:0    BwType    Bandwidth configuration            R/W     0
                  to track for this counter
==========================================================================

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v5: Moved assignment flags here (patch 10/19 of v4).
    Added MON_CNTR_UNSET definition to initialize cntr_id's.
    More details in commit log.
    Renamed few fields in l3_qos_abmc_cfg for readability.

v4: Added more descriptions.
    Changed the name abmc_ctr_id to ctr_id.
    Added L3_QOS_ABMC_DSC. Used for reading the configuration.

v3: No changes.

v2: No changes.
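
For readers who want to check the bit layout in the table above, the
following user-space sketch packs the fields with explicit shifts and
masks instead of a bitfield union. The macro and function names are
hypothetical and exist only for this illustration; the bit positions are
taken from the table. It handles only the RMID case, so IsCOS is left
at 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field positions from the L3_QOS_ABMC_CFG table (names are made up). */
#define DEMO_ABMC_CFG_EN_SHIFT	63		/* CfgEn, bit 63 */
#define DEMO_ABMC_CTR_EN_SHIFT	62		/* CtrEn, bit 62 */
#define DEMO_ABMC_CTR_ID_SHIFT	48		/* CtrID, bits 52:48 */
#define DEMO_ABMC_CTR_ID_MASK	0x1fULL
#define DEMO_ABMC_BW_SRC_SHIFT	32		/* BwSrc, bits 43:32 */
#define DEMO_ABMC_BW_SRC_MASK	0xfffULL
#define DEMO_ABMC_BW_TYPE_MASK	0xffffffffULL	/* BwType, bits 31:0 */

/*
 * Build the 64-bit register value for an RMID-based assignment.
 * Out-of-range inputs are truncated to the width of their field.
 */
static uint64_t demo_abmc_cfg_pack(uint32_t cntr_id, uint32_t rmid,
				   uint32_t bw_type, bool enable)
{
	uint64_t val = 0;

	val |= 1ULL << DEMO_ABMC_CFG_EN_SHIFT;
	val |= (uint64_t)(enable ? 1 : 0) << DEMO_ABMC_CTR_EN_SHIFT;
	val |= ((uint64_t)cntr_id & DEMO_ABMC_CTR_ID_MASK) << DEMO_ABMC_CTR_ID_SHIFT;
	val |= ((uint64_t)rmid & DEMO_ABMC_BW_SRC_MASK) << DEMO_ABMC_BW_SRC_SHIFT;
	val |= (uint64_t)bw_type & DEMO_ABMC_BW_TYPE_MASK;

	return val;
}
```

For example, counter 3, RMID 0x12 and bandwidth type 0x7f with tracking
enabled pack to 0xC00300120000007F; with tracking disabled only bit 62
changes.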
---
 arch/x86/include/asm/msr-index.h       |  2 ++
 arch/x86/kernel/cpu/resctrl/internal.h | 40 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 18 ++++++++++++
 3 files changed, 60 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 263b2d9d00ed..5e44ff91f459 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1175,6 +1175,8 @@
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
 #define MSR_IA32_EVT_CFG_BASE		0xc0000400
 #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
+#define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
+#define MSR_IA32_L3_QOS_ABMC_DSC	0xc00003fe
 
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4cb1a5d014a3..6925c947682d 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -100,6 +100,18 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
 /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
 #define ABMC_ENABLE			BIT(0)
 
+/*
+ * Assignment flags for ABMC feature
+ */
+#define ASSIGN_NONE	0
+#define ASSIGN_TOTAL	BIT(QOS_L3_MBM_TOTAL_EVENT_ID)
+#define ASSIGN_LOCAL	BIT(QOS_L3_MBM_LOCAL_EVENT_ID)
+
+#define MON_CNTR_UNSET	U32_MAX
+
+/* Maximum assignable counters per resctrl group */
+#define MAX_CNTRS	2
+
 struct rdt_fs_context {
 	struct kernfs_fs_context	kfc;
 	bool				enable_cdpl2;
@@ -228,12 +240,14 @@ enum rdtgrp_mode {
  * @parent:			parent rdtgrp
  * @crdtgrp_list:		child rdtgroup node list
  * @rmid:			rmid for this rdtgroup
+ * @cntr_id:			ABMC counter ids assigned to this group
  */
 struct mongroup {
 	struct kernfs_node	*mon_data_kn;
 	struct rdtgroup		*parent;
 	struct list_head	crdtgrp_list;
 	u32			rmid;
+	u32			cntr_id[MAX_CNTRS];
 };
 
 /**
@@ -607,6 +621,32 @@ union cpuid_0x10_x_edx {
 	unsigned int full;
 };
 
+/*
+ * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
+ * @bw_type		: Bandwidth configuration(supported by BMEC)
+ *			  to track this counter id.
+ * @bw_src		: Bandwidth Source (RMID or CLOSID).
+ * @reserved1		: Reserved.
+ * @is_clos		: BwSrc field is a CLOSID (not an RMID).
+ * @cntr_id		: Counter Identifier.
+ * @reserved		: Reserved.
+ * @cntr_en		: Tracking Enable bit.
+ * @cfg_en		: Configuration Enable bit.
+ */
+union l3_qos_abmc_cfg {
+	struct {
+		unsigned long	bw_type	:32,
+				bw_src	:12,
+				reserved1: 3,
+				is_clos	: 1,
+				cntr_id	: 5,
+				reserved : 9,
+				cntr_en	: 1,
+				cfg_en	: 1;
+	} split;
+	unsigned long full;
+};
+
 void rdt_last_cmd_clear(void);
 void rdt_last_cmd_puts(const char *s);
 __printf(1, 2)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 91c5d45ac367..d2663f1345b7 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2505,6 +2505,7 @@ static void resctrl_abmc_set_one_amd(void *arg)
 
 static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
 {
+	struct rdtgroup *prgrp, *crgrp;
 	struct rdt_mon_domain *d;
 
 	/*
@@ -2513,6 +2514,17 @@ static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
 	 */
 	mbm_cntrs_init();
 
+	/* Reset the cntr_id's for all the monitor groups */
+	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
+		prgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+		prgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list,
+				    mon.crdtgrp_list) {
+			crgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+			crgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+		}
+	}
+
 	/*
 	 * Hardware counters will reset after switching the monitor mode.
 	 * Reset the architectural state so that reading of hardware
@@ -3573,6 +3585,8 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
 		return ret;
 	}
 	rdtgrp->mon.rmid = ret;
+	rdtgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+	rdtgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
 
 	ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
 	if (ret) {
@@ -4128,6 +4142,10 @@ static void __init rdtgroup_setup_default(void)
 	rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
 	rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
 	rdtgroup_default.type = RDTCTRL_GROUP;
+
+	rdtgroup_default.mon.cntr_id[0] = MON_CNTR_UNSET;
+	rdtgroup_default.mon.cntr_id[1] = MON_CNTR_UNSET;
+
 	INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
 
 	list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v5 13/20] x86/resctrl: Add the interface to assign hardware counter
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (11 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 12/20] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:09   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 14/20] x86/resctrl: Add the interface to unassign " Babu Moger
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
The assigned RMID will be tracked by the hardware until the user unassigns
it manually.

Individual counters are configured by writing to the L3_QOS_ABMC_CFG MSR
and specifying the counter id, bandwidth source, and bandwidth types.

Provide the interface to assign a counter id to an RMID.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
    Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
    Monitoring (ABMC).

Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v5: Few name changes to match cntr_id.
    Changed the function names to
    rdtgroup_assign_cntr
    resctrl_arch_assign_cntr
    More comments on commit log.
    Added function summary.

v4: Commit message update.
    Used bitmap APIs where applicable.
    Changed the interfaces considering MPAM(arm).
    Added domain specific assignment.

v3: Removed the static from the prototype of rdtgroup_assign_abmc.
    The function is not called directly from user anymore. These
    changes are related to global assignment interface.

v2: Minor text changes in commit message.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  3 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 96 ++++++++++++++++++++++++++
 2 files changed, 99 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 6925c947682d..66460375056c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -708,6 +708,9 @@ void __init resctrl_file_fflags_init(const char *config,
 				     unsigned long fflags);
 void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
 unsigned int mon_event_config_index_get(u32 evtid);
+int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, u32 rmid,
+			     u32 cntr_id, u32 closid, bool enable);
+int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, u32 evtid);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d2663f1345b7..44f6eff42c30 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -202,6 +202,19 @@ static void mbm_cntrs_init(void)
 	mbm_cntrs_free_map_len = r->mon.num_mbm_cntrs;
 }
 
+static int mbm_cntr_alloc(void)
+{
+	u32 cntr_id = find_first_bit(&mbm_cntrs_free_map,
+				     mbm_cntrs_free_map_len);
+
+	if (cntr_id >= mbm_cntrs_free_map_len)
+		return -ENOSPC;
+
+	__clear_bit(cntr_id, &mbm_cntrs_free_map);
+
+	return cntr_id;
+}
+
 /**
  * rdtgroup_mode_by_closid - Return mode of resource group with closid
  * @closid: closid if the resource group
@@ -1860,6 +1873,89 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static void rdtgroup_abmc_cfg(void *info)
+{
+	u64 *msrval = info;
+
+	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
+}
+
+/*
+ * Send an IPI to the domain to assign the counter id to RMID.
+ */
+int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, u32 rmid,
+			     u32 cntr_id, u32 closid, bool enable)
+{
+	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	union l3_qos_abmc_cfg abmc_cfg = { 0 };
+	struct arch_mbm_state *arch_mbm;
+
+	abmc_cfg.split.cfg_en = 1;
+	abmc_cfg.split.cntr_en = enable ? 1 : 0;
+	abmc_cfg.split.cntr_id = cntr_id;
+	abmc_cfg.split.bw_src = rmid;
+
+	/* Update the event configuration from the domain */
+	if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
+		abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
+		arch_mbm = &hw_dom->arch_mbm_total[rmid];
+	} else {
+		abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
+		arch_mbm = &hw_dom->arch_mbm_local[rmid];
+	}
+
+	smp_call_function_any(&d->hdr.cpu_mask, rdtgroup_abmc_cfg, &abmc_cfg, 1);
+
+	/*
+	 * Reset the architectural state so that reading of hardware
+	 * counter is not considered as an overflow in next update.
+	 */
+	if (arch_mbm)
+		memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
+
+	return 0;
+}
+
+/*
+ * Assign a hardware counter id to the group. Allocate a new counter id
+ * if the event is unassigned.
+ */
+int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, u32 evtid)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	int cntr_id = 0, index;
+	struct rdt_mon_domain *d;
+
+	index = mon_event_config_index_get(evtid);
+	if (index == INVALID_CONFIG_INDEX) {
+		rdt_last_cmd_puts("Invalid event id\n");
+		return -EINVAL;
+	}
+
+	/* Nothing to do if event has been assigned already */
+	if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET) {
+		rdt_last_cmd_puts("ABMC counter is assigned already\n");
+		return 0;
+	}
+
+	/*
+	 * Allocate a new counter id and update domains
+	 */
+	cntr_id = mbm_cntr_alloc();
+	if (cntr_id < 0) {
+		rdt_last_cmd_puts("Out of ABMC counters\n");
+		return -ENOSPC;
+	}
+
+	rdtgrp->mon.cntr_id[index] = cntr_id;
+
+	list_for_each_entry(d, &r->mon_domains, hdr.list)
+		resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
+					 cntr_id, rdtgrp->closid, 1);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v5 14/20] x86/resctrl: Add the interface to unassign hardware counter
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (12 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 13/20] x86/resctrl: Add the interface to assign hardware counter Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-03 21:48 ` [PATCH v5 15/20] x86/resctrl: Assign/unassign counters by default when ABMC is enabled Babu Moger
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
The assigned RMID will be tracked by the hardware until the user unassigns
it manually.

Hardware provides only a limited number of counters. If the system runs
out of assignable counters, the kernel will display an error when a new
assignment is requested. Users need to unassign an already assigned
counter to make space for a new assignment.

Provide the interface to unassign the counter ids from the group.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
    Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
    Monitoring (ABMC).

Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v5: Few name changes to match cntr_id.
    Changed the function names to
    rdtgroup_unassign_cntr
    More comments on commit log.

v4: Added domain specific unassign feature.
    Few name changes.

v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
    The function is not called directly from user anymore. These
    changes are related to global assignment interface.

v2: No changes.
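
The alloc/free cycle described in the commit message above can be
modelled in user space as follows. This is only a sketch under stated
assumptions: the demo_* names are hypothetical, the free map is a single
32-bit word rather than a kernel bitmap, and the number of counters is
passed in by the caller instead of being read from hardware.

```c
#include <errno.h>
#include <stdint.h>

static uint32_t demo_free_map;		/* a set bit means the counter is free */
static unsigned int demo_map_len;	/* number of usable counters (max 32) */

/* Mark the first num_cntrs counters as free. */
static void demo_cntrs_init(unsigned int num_cntrs)
{
	demo_map_len = num_cntrs > 32 ? 32 : num_cntrs;
	demo_free_map = demo_map_len == 32 ? UINT32_MAX
					   : (1u << demo_map_len) - 1;
}

/* Hand out the lowest free counter id, or -ENOSPC when none is left. */
static int demo_cntr_alloc(void)
{
	unsigned int id;

	for (id = 0; id < demo_map_len; id++) {
		if (demo_free_map & (1u << id)) {
			demo_free_map &= ~(1u << id);
			return (int)id;
		}
	}

	return -ENOSPC;
}

/* Return a counter id to the free map; out-of-range ids are ignored. */
static void demo_cntr_free(unsigned int id)
{
	if (id < demo_map_len)
		demo_free_map |= 1u << id;
}
```

With two counters, the third allocation fails with -ENOSPC until one of
the earlier ids is freed, which is the situation where the user has to
unassign a counter before a new assignment can succeed.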
---
 arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 33 ++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 66460375056c..beb005775fe4 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -711,6 +711,8 @@ unsigned int mon_event_config_index_get(u32 evtid);
 int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, u32 rmid,
 			     u32 cntr_id, u32 closid, bool enable);
 int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, u32 evtid);
+int rdtgroup_unassign_cntr(struct rdtgroup *rdtgrp, u32 evtid);
+void mbm_cntr_free(u32 cntr_id);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 44f6eff42c30..ffde30b36c1a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -215,6 +215,11 @@ static int mbm_cntr_alloc(void)
 	return cntr_id;
 }
 
+void mbm_cntr_free(u32 cntr_id)
+{
+	__set_bit(cntr_id, &mbm_cntrs_free_map);
+}
+
 /**
  * rdtgroup_mode_by_closid - Return mode of resource group with closid
  * @closid: closid if the resource group
@@ -1956,6 +1961,34 @@ int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, u32 evtid)
 	return 0;
 }
 
+/* Unassign a hardware counter id from the group. */
+int rdtgroup_unassign_cntr(struct rdtgroup *rdtgrp, u32 evtid)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	struct rdt_mon_domain *d;
+	int index;
+
+	index = mon_event_config_index_get(evtid);
+	if (index == INVALID_CONFIG_INDEX) {
+		pr_warn_once("Invalid event id %d\n", evtid);
+		return -EINVAL;
+	}
+
+	if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET) {
+		list_for_each_entry(d, &r->mon_domains, hdr.list)
+			resctrl_arch_assign_cntr(d, evtid,
+						 rdtgrp->mon.rmid,
+						 rdtgrp->mon.cntr_id[index],
+						 rdtgrp->closid, 0);
+
+		/* Update the counter bitmap */
+		mbm_cntr_free(rdtgrp->mon.cntr_id[index]);
+		rdtgrp->mon.cntr_id[index] = MON_CNTR_UNSET;
+	}
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v5 15/20] x86/resctrl: Assign/unassign counters by default when ABMC is enabled
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (13 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 14/20] x86/resctrl: Add the interface to unassign " Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:10   ` Reinette Chatre
  2024-07-26 23:22   ` Peter Newman
  2024-07-03 21:48 ` [PATCH v5 16/20] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode Babu Moger
                   ` (5 subsequent siblings)
  20 siblings, 2 replies; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Assign counters when a resctrl group is created and unassign them when
the group is deleted. If the counters are exhausted, report the failure
and continue: an assignment failure does not need to fail group
creation, because users can modify the assignments later.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: Removed the code to enable/disable ABMC during the mount.
    That will be another patch.
    Added arch callers to get the arch specific data.
    Renamed functions to match the other ABMC functions.
    Added code comments for assignment failures.

v4: Few name changes based on the upstream discussion.
    Commit message update.

v3: This is a new patch. Patch addresses the upstream comment to enable
    ABMC feature by default if the feature is available.
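
For readers following the thread, the best-effort policy in the commit
message can be modelled in user space. This is a minimal sketch, not the
kernel code: the four-counter pool, CNTR_UNSET, struct group and the
cntr_*/group_* helpers are all hypothetical names chosen for the
illustration. Like rdtgroup_assign_cntrs() in this patch, it stops at the
first failed assignment and leaves it to the caller to carry on.

```c
#include <stdint.h>

#define NUM_CNTRS   4           /* assumed tiny pool, for illustration only */
#define CNTR_UNSET  (-1)

/* One bit per counter; a set bit means the counter is free. */
static uint32_t free_map = (1u << NUM_CNTRS) - 1;

/* Take the lowest free counter, or return CNTR_UNSET if none is left. */
static int cntr_alloc(void)
{
	for (int id = 0; id < NUM_CNTRS; id++) {
		if (free_map & (1u << id)) {
			free_map &= ~(1u << id);
			return id;
		}
	}
	return CNTR_UNSET;
}

/* Return a counter to the pool. */
static void cntr_free(int id)
{
	free_map |= 1u << id;
}

struct group {
	int cntr_id[2];         /* [0] = total event, [1] = local event */
};

/* Returns 0 on success, -1 once the pool runs dry (stops at first failure). */
static int group_assign(struct group *g)
{
	g->cntr_id[0] = CNTR_UNSET;
	g->cntr_id[1] = CNTR_UNSET;

	for (int i = 0; i < 2; i++) {
		g->cntr_id[i] = cntr_alloc();
		if (g->cntr_id[i] == CNTR_UNSET)
			return -1;
	}
	return 0;
}

/* Release whatever the group holds; safe after a partial assignment. */
static void group_unassign(struct group *g)
{
	for (int i = 0; i < 2; i++) {
		if (g->cntr_id[i] != CNTR_UNSET) {
			cntr_free(g->cntr_id[i]);
			g->cntr_id[i] = CNTR_UNSET;
		}
	}
}
```

A caller that mirrors the mkdir paths would ignore a -1 from
group_assign(), keep the group, and only record the failure for the user.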
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 78 ++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ffde30b36c1a..475a0c7b2a25 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2910,6 +2910,46 @@ static void schemata_list_destroy(void)
 	}
 }
 
+/*
+ * Called when new group is created. Assign the counters if ABMC is
+ * already enabled. Two counters are required per group, one for total
+ * event and one for local event. With limited number of counters,
+ * the assignments can fail in some cases. But, it is not required to
+ * fail the group creation. Users have the option to modify the
+ * assignments after the group creation.
+ */
+static int rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
+{
+	int ret = 0;
+
+	if (!resctrl_arch_get_abmc_enabled())
+		return 0;
+
+	if (is_mbm_total_enabled())
+		ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	if (!ret && is_mbm_local_enabled())
+		ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	return ret;
+}
+
+static int rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
+{
+	int ret = 0;
+
+	if (!resctrl_arch_get_abmc_enabled())
+		return 0;
+
+	if (is_mbm_total_enabled())
+		ret = rdtgroup_unassign_cntr(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	if (!ret && is_mbm_local_enabled())
+		ret = rdtgroup_unassign_cntr(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	return ret;
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
@@ -2972,6 +3012,16 @@ static int rdt_get_tree(struct fs_context *fc)
 		if (ret < 0)
 			goto out_mongrp;
 		rdtgroup_default.mon.mon_data_kn = kn_mondata;
+
+		/*
+		 * Assign the counters if ABMC is already enabled.
+		 * With limited number of counters, the assignments can
+		 * fail in some cases. But, it is not required to fail
+		 * the group creation. Users have the option to modify
+		 * the assignments after the group creation.
+		 */
+		if (rdtgroup_assign_cntrs(&rdtgroup_default) < 0)
+			rdt_last_cmd_puts("Monitor assignment failed\n");
 	}
 
 	ret = rdt_pseudo_lock_init();
@@ -3246,6 +3296,8 @@ static void rdt_kill_sb(struct super_block *sb)
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
 
+	rdtgroup_unassign_cntrs(&rdtgroup_default);
+
 	rdt_disable_ctx();
 
 	/*Put everything back to default values. */
@@ -3850,6 +3902,16 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
 		goto out_unlock;
 	}
 
+	/*
+	 * Assign the counters if ABMC is already enabled.
+	 * With the limited number of counters, there can be cases
+	 * where only one assignment succeeds. It is not required to fail
+	 * here in that case. Users have the option to modify the
+	 * assignments later.
+	 */
+	if (rdtgroup_assign_cntrs(rdtgrp) < 0)
+		rdt_last_cmd_puts("Monitor assignment failed\n");
+
 	kernfs_activate(rdtgrp->kn);
 
 	/*
@@ -3894,6 +3956,17 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 	if (ret)
 		goto out_closid_free;
 
+	/*
+	 * Assign the counters if ABMC is already enabled.
+	 * With the limited number of counters, there can be cases
+	 * where only one assignment succeeds. It is not required to fail
+	 * here in that case. Users have the option to assign the
+	 * counter later.
+	 */
+
+	if (rdtgroup_assign_cntrs(rdtgrp) < 0)
+		rdt_last_cmd_puts("Monitor assignment failed\n");
+
 	kernfs_activate(rdtgrp->kn);
 
 	ret = rdtgroup_init_alloc(rdtgrp);
@@ -3989,6 +4062,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	update_closid_rmid(tmpmask, NULL);
 
 	rdtgrp->flags = RDT_DELETED;
+
+	rdtgroup_unassign_cntrs(rdtgrp);
+
 	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 
 	/*
@@ -4035,6 +4111,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
 	update_closid_rmid(tmpmask, NULL);
 
+	rdtgroup_unassign_cntrs(rdtgrp);
+
 	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 	closid_free(rdtgrp->closid);
 
-- 
2.34.1



* [PATCH v5 16/20] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (14 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 15/20] x86/resctrl: Assign/unassign counters by default when ABMC is enabled Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:13   ` Reinette Chatre
  2024-07-13 20:26   ` Markus Elfring
  2024-07-03 21:48 ` [PATCH v5 17/20] x86/resctrl: Introduce the interface switch between monitor modes Babu Moger
                   ` (4 subsequent siblings)
  20 siblings, 2 replies; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

In ABMC mode, a hardware counter must be assigned to an MBM event before
the event can be read.

Report "Unassigned" in case the user attempts to read the events without
assigning the counter.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: New patch.
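
The decision made at the checkresult label in the diff below can be
restated as a small pure function. This is only a sketch for discussion:
mbm_read_status() and CNTR_UNSET are hypothetical names, and the function
takes the ABMC state and counter id as plain arguments instead of reading
them from the resctrl structures.

```c
#include <errno.h>
#include <stddef.h>

#define CNTR_UNSET (-1)         /* stand-in for MON_CNTR_UNSET */

/*
 * Map a read error to the string shown to the user.
 * Returns NULL when there is no error and the caller should print the
 * counter value instead.
 */
static const char *mbm_read_status(int err, int abmc_enabled, int cntr_id)
{
	if (err == -EIO)
		return "Error";

	if (err == -EINVAL) {
		/* Only ABMC mode can tell "no counter" from "no data". */
		if (abmc_enabled && cntr_id == CNTR_UNSET)
			return "Unassigned";
		return "Unavailable";
	}

	return NULL;
}
```

Written this way, the legacy path and the ABMC path with an assigned
counter share the single "Unavailable" return.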
---
 Documentation/arch/x86/resctrl.rst        |  4 ++++
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 19 ++++++++++++++-----
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 4907d0758118..11b7a5f26b40 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -284,6 +284,10 @@ with the following files:
 	until the user unassigns it manually. There is no need to worry
 	about counters being reset during this period.
 
+	In ABMC mode, the MBM event counters will return "Unassigned" if
+	the hardware counter is not assigned to the event. Users need to
+	assign a counter manually to read the events.
+
 	Without ABMC enabled, monitoring will work in "legacy" mode
 	without assignment option.
 
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 50fa1fe9a073..e60b469b7d12 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -562,7 +562,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	struct rdtgroup *rdtgrp;
 	struct rdt_resource *r;
 	union mon_data_bits md;
-	int ret = 0;
+	int ret = 0, index;
 
 	rdtgrp = rdtgroup_kn_lock_live(of->kn);
 	if (!rdtgrp) {
@@ -609,12 +609,21 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 
 checkresult:
 
-	if (rr.err == -EIO)
+	if (rr.err == -EIO) {
 		seq_puts(m, "Error\n");
-	else if (rr.err == -EINVAL)
-		seq_puts(m, "Unavailable\n");
-	else
+	} else if (rr.err == -EINVAL) {
+		if (resctrl_arch_get_abmc_enabled()) {
+			index = mon_event_config_index_get(evtid);
+			if (rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET)
+				seq_puts(m, "Unassigned\n");
+			else
+				seq_puts(m, "Unavailable\n");
+		} else {
+			seq_puts(m, "Unavailable\n");
+		}
+	} else {
 		seq_printf(m, "%llu\n", rr.val);
+	}
 
 out:
 	rdtgroup_kn_unlock(of->kn);
-- 
2.34.1



* [PATCH v5 17/20] x86/resctrl: Introduce the interface switch between monitor modes
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (15 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 16/20] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:14   ` Reinette Chatre
  2024-07-13  7:15   ` Markus Elfring
  2024-07-03 21:48 ` [PATCH v5 18/20] x86/resctrl: Enable AMD ABMC feature by default when supported Babu Moger
                   ` (3 subsequent siblings)
  20 siblings, 2 replies; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Introduce interface to switch between ABMC and legacy modes.

By default ABMC is enabled on boot if the feature is available.
Provide the interface to go back to legacy mode if required.

$ cat /sys/fs/resctrl/info/L3_MON/mbm_mode
[abmc]
legacy

To enable the legacy monitoring feature:
$ echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v4: Minor commit text changes. Keep the default to ABMC when supported.
    Fixed comments to reflect changed interface "mbm_mode".

v3: New patch to address the review comments from upstream.
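
One detail of the write handler in the diff below is easy to miss: input
without a trailing newline is rejected, so "echo abmc" works while
"printf abmc" would get -EINVAL. A minimal user-space sketch of that
parsing step follows; parse_mbm_mode() and enum mbm_mode are hypothetical
names, and the sketch returns the requested mode instead of calling the
arch enable/disable helpers.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum mbm_mode { MBM_MODE_LEGACY, MBM_MODE_ABMC };

/*
 * Parse a buffer written to mbm_mode. The buffer is modified in place:
 * the trailing newline is replaced by a NUL terminator.
 * Returns the requested mode, or -EINVAL for malformed input.
 */
static int parse_mbm_mode(char *buf, size_t nbytes)
{
	/* Valid input requires a trailing newline. */
	if (nbytes == 0 || buf[nbytes - 1] != '\n')
		return -EINVAL;

	buf[nbytes - 1] = '\0';

	if (!strcmp(buf, "legacy"))
		return MBM_MODE_LEGACY;
	if (!strcmp(buf, "abmc"))
		return MBM_MODE_ABMC;

	return -EINVAL;
}
```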
---
 Documentation/arch/x86/resctrl.rst     | 10 +++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 37 +++++++++++++++++++++++++-
 2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 11b7a5f26b40..4c41c5622627 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -291,6 +291,16 @@ with the following files:
 	Without ABMC enabled, monitoring will work in "legacy" mode
 	without assignment option.
 
+	* To enable ABMC feature:
+	  ::
+
+	    # echo  "abmc" > /sys/fs/resctrl/info/L3_MON/mbm_mode
+
+	* To enable the legacy monitoring feature:
+	  ::
+
+	    # echo  "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
+
 "num_mbm_cntrs":
 	The number of monitoring counters available for assignment.
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 475a0c7b2a25..531233779f8d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -910,6 +910,40 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static ssize_t rdtgroup_mbm_mode_write(struct kernfs_open_file *of,
+				       char *buf, size_t nbytes,
+				       loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	int ret = 0;
+
+	if (!r->mon.abmc_capable)
+		return -EINVAL;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	buf[nbytes - 1] = '\0';
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	if (!strcmp(buf, "legacy"))
+		resctrl_arch_abmc_disable();
+	else if (!strcmp(buf, "abmc"))
+		ret = resctrl_arch_abmc_enable();
+	else
+		ret = -EINVAL;
+
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -2103,9 +2137,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_mode",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= rdtgroup_mbm_mode_show,
+		.write		= rdtgroup_mbm_mode_write,
 	},
 	{
 		.name		= "cpus",
-- 
2.34.1



* [PATCH v5 18/20] x86/resctrl: Enable AMD ABMC feature by default when supported
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (16 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 17/20] x86/resctrl: Introduce the interface switch between monitor modes Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:15   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 19/20] x86/resctrl: Introduce interface to list monitor states of all the groups Babu Moger
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Enable ABMC by default during boot when the feature is supported.

Users will not see any difference in behavior when resctrl is mounted.
With automatic counter assignment, monitoring works the same way as in
the legacy monitor mode.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: New patch to enable ABMC by default.
---
 arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
 arch/x86/kernel/cpu/resctrl/internal.h |  1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 17 +++++++++++++++++
 3 files changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6265ef8b610f..b69b2650bde3 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -599,6 +599,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 		d = container_of(hdr, struct rdt_mon_domain, hdr);
 
 		cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+		resctrl_arch_configure_abmc();
 		return;
 	}
 
@@ -620,6 +621,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 	arch_mon_domain_online(r, d);
 
 	resctrl_arch_mbm_evt_config(hw_dom);
+	resctrl_arch_configure_abmc();
 
 	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
 		mon_domain_free(hw_dom);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index beb005775fe4..0f858cff8ab1 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -707,6 +707,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void __init resctrl_file_fflags_init(const char *config,
 				     unsigned long fflags);
 void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
+void resctrl_arch_configure_abmc(void);
 unsigned int mon_event_config_index_get(u32 evtid);
 int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, u32 rmid,
 			     u32 cntr_id, u32 closid, bool enable);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 531233779f8d..d978668c8865 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2733,6 +2733,23 @@ void resctrl_arch_abmc_disable(void)
 	}
 }
 
+void resctrl_arch_configure_abmc(void)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+	bool enable = true;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	if (r->mon.abmc_capable) {
+		if (!hw_res->abmc_enabled)
+			hw_res->abmc_enabled = true;
+		resctrl_abmc_set_one_amd(&enable);
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+}
+
 /*
  * We don't allow rdtgroup directories to be created anywhere
  * except the root directory. Thus when looking for the rdtgroup
-- 
2.34.1



* [PATCH v5 19/20] x86/resctrl: Introduce interface to list monitor states of all the groups
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (17 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 18/20] x86/resctrl: Enable AMD ABMC feature by default when supported Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:16   ` Reinette Chatre
  2024-07-03 21:48 ` [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
  2024-07-12 22:03 ` [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
  20 siblings, 1 reply; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Provide the interface to list the monitor states of all the resctrl
groups in ABMC mode.

Example:
$ cat /sys/fs/resctrl/info/L3_MON/mbm_control

Each line of the list has the following format:

"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

Format for specific type of groups:

- Default CTRL_MON group:
  "//<domain_id>=<flags>"

- Non-default CTRL_MON group:
  "<CTRL_MON group>//<domain_id>=<flags>"

- Child MON group of default CTRL_MON group:
  "/<MON group>/<domain_id>=<flags>"

- Child MON group of non-default CTRL_MON group:
  "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"


Flags can be one of the following:
t  MBM total event is enabled
l  MBM local event is enabled
tl Both total and local MBM events are enabled
_  None of the MBM events are enabled

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: Replaced "assignment flags" with "flags".
    Changes related to mon structure.
    Changes related renaming the interface from mbm_assign_control to
    mbm_control.

v4: Added functionality to query domain specific assignment in
    rdtgroup_abmc_dom_state().

v3: New patch.
    Addresses the feedback to provide the global assignment interface.
    https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
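
To make the list format above concrete, here is a small user-space
sketch that builds one output line. It is an illustration only:
state_to_flags() and format_line() are hypothetical helpers, the numeric
values of the ASSIGN_* bits are assumed, and domain ids are taken to be
0..ndoms-1, which the real code does not guarantee.

```c
#include <stdio.h>

/* Bit values are assumed for this sketch. */
enum { ASSIGN_NONE = 0, ASSIGN_TOTAL = 1, ASSIGN_LOCAL = 2 };

/* Translate a per-domain state into the flag string used in the list. */
static const char *state_to_flags(int state)
{
	switch (state) {
	case ASSIGN_TOTAL | ASSIGN_LOCAL:
		return "tl";
	case ASSIGN_TOTAL:
		return "t";
	case ASSIGN_LOCAL:
		return "l";
	default:
		return "_";     /* ASSIGN_NONE */
	}
}

/*
 * Write "<ctrl>/<mon>/<id>=<flags>;..." into out.
 * Pass "" for ctrl to name the default CTRL_MON group and "" for mon
 * to name the CTRL_MON group itself. Stops early if out is too small.
 */
static int format_line(char *out, size_t len, const char *ctrl,
		       const char *mon, const int *states, int ndoms)
{
	int n = snprintf(out, len, "%s/%s/", ctrl, mon);

	for (int d = 0; d < ndoms && n >= 0 && (size_t)n < len; d++)
		n += snprintf(out + n, len - n, "%d=%s;", d,
			      state_to_flags(states[d]));
	return n;
}
```

With two domains that both have total and local enabled, an empty ctrl
and an empty mon name give the default group line shown in the
documentation hunk.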
---
 Documentation/arch/x86/resctrl.rst     |  54 ++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c  |   1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 130 +++++++++++++++++++++++++
 3 files changed, 185 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 4c41c5622627..05fee779e109 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -304,6 +304,60 @@ with the following files:
 "num_mbm_cntrs":
 	The number of monitoring counters available for assignment.
 
+"mbm_control":
+	Available when ABMC features are supported.
+	Reports the resctrl group and monitor status of each group.
+
+	Each line of the list has the following format:
+		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
+
+	Format for specific types of groups:
+
+	* Default CTRL_MON group:
+		"//<domain_id>=<flags>"
+
+	* Non-default CTRL_MON group:
+		"<CTRL_MON group>//<domain_id>=<flags>"
+
+	* Child MON group of default CTRL_MON group:
+		"/<MON group>/<domain_id>=<flags>"
+
+	* Child MON group of non-default CTRL_MON group:
+		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
+
+	Flags can be one of the following:
+	::
+
+	 t  MBM total event is enabled.
+	 l  MBM local event is enabled.
+	 tl Both total and local MBM events are enabled.
+	 _  None of the MBM events are enabled.
+
+	Examples:
+	::
+
+	 # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
+	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
+	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
+
+	 # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+	 non_default_ctrl_mon_grp//0=tl;1=tl;
+	 non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	 //0=tl;1=tl;
+	 /child_default_mon_grp/0=tl;1=tl;
+
+	 There are four resctrl groups. All the groups have total and local events
+	 enabled on domains 0 and 1.
+
+	 non_default_ctrl_mon_grp// - This is a non-default CTRL_MON group.
+
+	 non_default_ctrl_mon_grp/child_non_default_mon_grp/ - This is a child monitor
+	 group of non-default CTRL_MON group.
+
+	 // - This is a default CTRL_MON group.
+
+	 /child_default_mon_grp/ - This is a child monitor group of default CTRL_MON group.
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b96b0a8bd7d3..684730f1a72d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1244,6 +1244,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 				r->mon.num_mbm_cntrs = 64;
 
 			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
+			resctrl_file_fflags_init("mbm_control", RFTYPE_MON_INFO);
 		}
 	}
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d978668c8865..0de9f23d5389 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -944,6 +944,130 @@ static ssize_t rdtgroup_mbm_mode_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static void rdtgroup_abmc_dom_cfg(void *info)
+{
+	u64 *msrval = info;
+
+	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
+	rdmsrl(MSR_IA32_L3_QOS_ABMC_DSC, *msrval);
+}
+
+/*
+ * Writing the counter id with CfgEn=0 on L3_QOS_ABMC_CFG and reading
+ * L3_QOS_ABMC_DSC back will return configuration of the counter
+ * specified.
+ */
+static int rdtgroup_abmc_dom_state(struct rdt_mon_domain *d, u32 cntr_id,
+				   u32 rmid)
+{
+	union l3_qos_abmc_cfg abmc_cfg = { 0 };
+
+	abmc_cfg.split.cfg_en = 0;
+	abmc_cfg.split.cntr_id = cntr_id;
+
+	smp_call_function_any(&d->hdr.cpu_mask, rdtgroup_abmc_dom_cfg,
+			      &abmc_cfg, 1);
+
+	if (abmc_cfg.split.cntr_en && abmc_cfg.split.bw_src == rmid)
+		return 0;
+	else
+		return -1;
+}
+
+static char *rdtgroup_mon_state_to_str(struct rdtgroup *rdtgrp,
+				       struct rdt_mon_domain *d, char *str)
+{
+	char *tmp = str;
+	int dom_state = ASSIGN_NONE;
+
+	/*
+	 * Query the monitor state for the domain.
+	 * Index 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
+	 * Index 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
+	 */
+	if (rdtgrp->mon.cntr_id[0] != MON_CNTR_UNSET)
+		if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[0], rdtgrp->mon.rmid))
+			dom_state |= ASSIGN_TOTAL;
+
+	if (rdtgrp->mon.cntr_id[1] != MON_CNTR_UNSET)
+		if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[1], rdtgrp->mon.rmid))
+			dom_state |= ASSIGN_LOCAL;
+
+	switch (dom_state) {
+	case ASSIGN_NONE:
+		*tmp++ = '_';
+		break;
+	case (ASSIGN_TOTAL | ASSIGN_LOCAL):
+		*tmp++ = 't';
+		*tmp++ = 'l';
+		break;
+	case ASSIGN_TOTAL:
+		*tmp++ = 't';
+		break;
+	case ASSIGN_LOCAL:
+		*tmp++ = 'l';
+		break;
+	default:
+		break;
+	}
+
+	*tmp = '\0';
+	return str;
+}
+
+static int rdtgroup_mbm_control_show(struct kernfs_open_file *of,
+				     struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+	struct rdt_mon_domain *dom;
+	struct rdtgroup *rdtg;
+	int grp_default = 0;
+	char str[10];
+
+	if (!hw_res->abmc_enabled) {
+		rdt_last_cmd_puts("ABMC feature is not enabled\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&rdtgroup_mutex);
+
+	list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+		struct rdtgroup *crg;
+
+		if (rdtg == &rdtgroup_default) {
+			grp_default = 1;
+			seq_puts(s, "//");
+		} else {
+			grp_default = 0;
+			seq_printf(s, "%s//", rdtg->kn->name);
+		}
+
+		list_for_each_entry(dom, &r->mon_domains, hdr.list)
+			seq_printf(s, "%d=%s;", dom->hdr.id,
+				   rdtgroup_mon_state_to_str(rdtg, dom, str));
+		seq_putc(s, '\n');
+
+		list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+				    mon.crdtgrp_list) {
+			if (grp_default)
+				seq_printf(s, "/%s/", crg->kn->name);
+			else
+				seq_printf(s, "%s/%s/", rdtg->kn->name,
+					   crg->kn->name);
+
+			list_for_each_entry(dom, &r->mon_domains, hdr.list)
+				seq_printf(s, "%d=%s;", dom->hdr.id,
+					   rdtgroup_mon_state_to_str(crg, dom, str));
+			seq_putc(s, '\n');
+		}
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -2156,6 +2280,12 @@ static struct rftype res_common_files[] = {
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= rdtgroup_num_mbm_cntrs_show,
 	},
+	{
+		.name		= "mbm_control",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_mbm_control_show,
+	},
 	{
 		.name		= "cpus_list",
 		.mode		= 0644,
-- 
2.34.1



* [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (18 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 19/20] x86/resctrl: Introduce interface to list monitor states of all the groups Babu Moger
@ 2024-07-03 21:48 ` Babu Moger
  2024-07-12 22:17   ` Reinette Chatre
  2024-07-25  0:03   ` Peter Newman
  2024-07-12 22:03 ` [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
  20 siblings, 2 replies; 95+ messages in thread
From: Babu Moger @ 2024-07-03 21:48 UTC (permalink / raw)
  To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
	kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Introduce the interface to enable events in ABMC mode.

Events can be enabled or disabled by writing to file
/sys/fs/resctrl/info/L3_MON/mbm_control

The format is similar to the list format, with the addition of an op-code
for the assignment operation.
 "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"

Format for specific type of groups:

 * Default CTRL_MON group:
         "//<domain_id><op-code><flags>"

 * Non-default CTRL_MON group:
         "<CTRL_MON group>//<domain_id><op-code><flags>"

 * Child MON group of default CTRL_MON group:
         "/<MON group>/<domain_id><op-code><flags>"

 * Child MON group of non-default CTRL_MON group:
         "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"

Op-code can be one of the following:

 = Update the assignment to match the flags
 + Enable the events named by the flags
 - Disable the events named by the flags

Assignment flags can be one of the following:
 t  MBM total event is enabled
 l  MBM local event is enabled
 tl Both total and local MBM events are enabled
 _  None of the MBM events are enabled. Valid only with the '=' opcode.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v5: Interface name changed from mbm_assign_control to mbm_control.
    Fixed opcode and flags combination.
    "=_" is valid.
    "-_" and "+_" are not valid.
    Minor message update.
    Renamed the function with prefix - rdtgroup_.
    Corrected few documentation mistakes.
    Rebase related changes after SNC support.

v4: Added domain specific assignments. Fixed the opcode parsing.

v3: New patch.
    Addresses the feedback to provide the global assignment interface.
    https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
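
The op-code rules above can be checked against the documentation
examples with a short user-space sketch. It is not the parser from this
patch: parse_flags() and apply_token() are hypothetical helpers, the
ASSIGN_* bit values are assumed, and the sketch is stricter than
rdtgroup_str_to_mon_state() in that it rejects unknown flag characters.
It handles one "<domain_id><op-code><flags>" token and returns the new
state for that domain.

```c
#include <stdlib.h>

/* Bit values are assumed for this sketch. */
enum { ASSIGN_NONE = 0, ASSIGN_TOTAL = 1, ASSIGN_LOCAL = 2 };

/* Parse "t", "l", "tl" or "_" into a state. Returns 0 if valid, else -1. */
static int parse_flags(const char *s, int *state)
{
	*state = ASSIGN_NONE;

	if (s[0] == '_' && s[1] == '\0')
		return 0;
	if (*s == '\0')
		return -1;

	for (; *s; s++) {
		if (*s == 't')
			*state |= ASSIGN_TOTAL;
		else if (*s == 'l')
			*state |= ASSIGN_LOCAL;
		else
			return -1;
	}
	return 0;
}

/*
 * Apply one token such as "0=t", "1-t" or "0+l" to the current state.
 * Stores the domain id in *dom_id and returns the new state, or -1 if
 * the token is malformed or combines '_' with '+' or '-'.
 */
static int apply_token(const char *tok, int cur, int *dom_id)
{
	char *end;
	int flags;
	unsigned long id = strtoul(tok, &end, 10);
	char op;

	if (end == tok)
		return -1;              /* no domain id */

	op = *end;
	if (op != '=' && op != '+' && op != '-')
		return -1;
	if (parse_flags(end + 1, &flags))
		return -1;
	if (flags == ASSIGN_NONE && op != '=')
		return -1;              /* "_" is valid only with '=' */

	*dom_id = (int)id;

	switch (op) {
	case '=':
		return flags;           /* replace the state */
	case '+':
		return cur | flags;     /* enable the named events */
	default:
		return cur & ~flags;    /* disable the named events */
	}
}
```

Starting from "tl", the token "1-t" leaves only the local event and
"1=_" clears both, matching the status listings in the documentation
hunk of this patch.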
---
 Documentation/arch/x86/resctrl.rst     |  81 +++++++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 250 ++++++++++++++++++++++++-
 2 files changed, 329 insertions(+), 2 deletions(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 05fee779e109..5a621235eb2b 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -331,7 +331,7 @@ with the following files:
 	 t  MBM total event is enabled.
 	 l  MBM local event is enabled.
 	 tl Both total and local MBM events are enabled.
-	 _  None of the MBM events are enabled.
+	 _  None of the MBM events are enabled. Only works with opcode '=' for write.
 
 	Examples:
 	::
@@ -358,6 +358,85 @@ with the following files:
 
 	 /child_default_mon_grp/ - This is a child monitor group of default CTRL_MON group.
 
+	Assignment state can be updated by writing to the interface.
+
+	The format is similar to the list format, with the addition of an
+	op-code for the assignment operation.
+
+		"<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
+
+	Format for each type of group:
+
+        * Default CTRL_MON group:
+                "//<domain_id><op-code><flags>"
+
+        * Non-default CTRL_MON group:
+                "<CTRL_MON group>//<domain_id><op-code><flags>"
+
+        * Child MON group of default CTRL_MON group:
+                "/<MON group>/<domain_id><op-code><flags>"
+
+        * Child MON group of non-default CTRL_MON group:
+                "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
+
+	Op-code can be one of the following:
+	::
+
+	 = Update the assignment to match the flags.
+	 + Enable the events named by the flags.
+	 - Disable the events named by the flags.
+
+	Examples:
+	::
+
+	  Initial group status:
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	  //0=tl;1=tl;
+	  /child_default_mon_grp/0=tl;1=tl;
+
+	  To update the default group to enable only total event on domain 0:
+	  # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
+
+	  Assignment status after the update:
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	  //0=t;1=tl;
+	  /child_default_mon_grp/0=tl;1=tl;
+
+	  To update the MON group child_default_mon_grp to remove total event on domain 1:
+	  # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
+
+	  Assignment status after the update:
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+	  //0=t;1=tl;
+	  /child_default_mon_grp/0=tl;1=l;
+
+	  To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
+	  remove both local and total events on domain 1:
+	  # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > \
+			/sys/fs/resctrl/info/L3_MON/mbm_control
+
+	  Assignment status after the update:
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+	  //0=t;1=tl;
+	  /child_default_mon_grp/0=tl;1=l;
+
+	  To update the default group to add the local event on domain 0:
+	  # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
+
+	  Assignment status after the update:
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl;
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+	  //0=tl;1=tl;
+	  /child_default_mon_grp/0=tl;1=l;
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0de9f23d5389..84c0874d7872 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1068,6 +1068,253 @@ static int rdtgroup_mbm_control_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdtgroup_str_to_mon_state(char *flag)
+{
+	int i, mon_state = 0;
+
+	for (i = 0; i < strlen(flag); i++) {
+		switch (*(flag + i)) {
+		case 't':
+			mon_state |= ASSIGN_TOTAL;
+			break;
+		case 'l':
+			mon_state |= ASSIGN_LOCAL;
+			break;
+		case '_':
+			mon_state = ASSIGN_NONE;
+			break;
+		default:
+			mon_state = ASSIGN_NONE;
+			break;
+		}
+	}
+
+	return mon_state;
+}
+
+static struct rdtgroup *rdtgroup_find_grp(enum rdt_group_type rtype, char *p_grp, char *c_grp)
+{
+	struct rdtgroup *rdtg, *crg;
+
+	if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
+		return &rdtgroup_default;
+	} else if (rtype == RDTCTRL_GROUP) {
+		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
+			if (!strcmp(p_grp, rdtg->kn->name))
+				return rdtg;
+	} else if (rtype == RDTMON_GROUP) {
+		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+			if (!strcmp(p_grp, rdtg->kn->name)) {
+				list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+						    mon.crdtgrp_list) {
+					if (!strcmp(c_grp, crg->kn->name))
+						return crg;
+				}
+			}
+		}
+	}
+
+	return NULL;
+}
+
+static int rdtgroup_process_flags(enum rdt_group_type rtype, char *p_grp, char *c_grp, char *tok)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	int op, mon_state, assign_state, unassign_state;
+	char *dom_str, *id_str, *op_str;
+	struct rdt_mon_domain *d;
+	struct rdtgroup *rdtgrp;
+	unsigned long dom_id;
+	int ret = 0, found = 0;
+
+	rdtgrp = rdtgroup_find_grp(rtype, p_grp, c_grp);
+
+	if (!rdtgrp) {
+		rdt_last_cmd_puts("Not a valid resctrl group\n");
+		return -EINVAL;
+	}
+
+next:
+	if (!tok || tok[0] == '\0')
+		return 0;
+
+	/* Start processing the strings for each domain */
+	dom_str = strim(strsep(&tok, ";"));
+
+	op_str = strpbrk(dom_str, "=+-");
+
+	if (op_str) {
+		op = *op_str;
+	} else {
+		rdt_last_cmd_puts("Missing operation character =, + or -\n");
+		return -EINVAL;
+	}
+
+	id_str = strsep(&dom_str, "=+-");
+
+	if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+		rdt_last_cmd_puts("Missing domain id\n");
+		return -EINVAL;
+	}
+
+	/* Verify if the dom_id is valid */
+	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		if (d->hdr.id == dom_id) {
+			found = 1;
+			break;
+		}
+	}
+	if (!found) {
+		rdt_last_cmd_printf("Invalid domain id %lu\n", dom_id);
+		return -EINVAL;
+	}
+
+	mon_state = rdtgroup_str_to_mon_state(dom_str);
+
+	assign_state = 0;
+	unassign_state = 0;
+
+	switch (op) {
+	case '+':
+		if (mon_state == ASSIGN_NONE) {
+			rdt_last_cmd_puts("Invalid assign opcode\n");
+			goto out_fail;
+		}
+		assign_state = mon_state;
+		break;
+	case '-':
+		if (mon_state == ASSIGN_NONE) {
+			rdt_last_cmd_puts("Invalid assign opcode\n");
+			goto out_fail;
+		}
+		unassign_state = mon_state;
+		break;
+	case '=':
+		assign_state = mon_state;
+		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
+		break;
+	default:
+		break;
+	}
+
+	if (assign_state & ASSIGN_TOTAL)
+		ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_TOTAL_EVENT_ID,
+					       rdtgrp->mon.rmid,
+					       rdtgrp->mon.cntr_id[0],
+					       rdtgrp->closid, 1);
+	if (ret)
+		goto out_fail;
+
+	if (assign_state & ASSIGN_LOCAL)
+		ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_LOCAL_EVENT_ID,
+					       rdtgrp->mon.rmid,
+					       rdtgrp->mon.cntr_id[1],
+					       rdtgrp->closid, 1);
+
+	if (ret)
+		goto out_fail;
+
+	if (unassign_state & ASSIGN_TOTAL)
+		ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_TOTAL_EVENT_ID,
+					       rdtgrp->mon.rmid,
+					       rdtgrp->mon.cntr_id[0],
+					       rdtgrp->closid, 0);
+
+	if (ret)
+		goto out_fail;
+
+	if (unassign_state & ASSIGN_LOCAL)
+		ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_LOCAL_EVENT_ID,
+					       rdtgrp->mon.rmid,
+					       rdtgrp->mon.cntr_id[1],
+					       rdtgrp->closid, 0);
+	if (ret)
+		goto out_fail;
+
+	goto next;
+
+out_fail:
+
+	return -EINVAL;
+}
+
+static ssize_t rdtgroup_mbm_control_write(struct kernfs_open_file *of,
+					  char *buf, size_t nbytes,
+					  loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	char *token, *cmon_grp, *mon_grp;
+	struct rdt_hw_resource *hw_res;
+	int ret = 0;
+
+	hw_res = resctrl_to_arch_res(r);
+	if (!hw_res->abmc_enabled)
+		return -EINVAL;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	buf[nbytes - 1] = '\0';
+	rdt_last_cmd_clear();
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	while ((token = strsep(&buf, "\n")) != NULL) {
+		if (strstr(token, "//")) {
+			/*
+			 * The CTRL_MON group processing:
+			 * default CTRL_MON group: "//<flags>"
+			 * non-default CTRL_MON group: "<CTRL_MON group>//flags"
+			 * The CTRL_MON group will be empty string if it is a
+			 * default group.
+			 */
+			cmon_grp = strsep(&token, "//");
+
+			/*
+			 * strsep() returns an empty string for contiguous
+			 * delimiters. Check for the two consecutive delimiters
+			 * and advance the token.
+			 */
+			mon_grp = strsep(&token, "//");
+			if (*mon_grp != '\0') {
+				rdt_last_cmd_printf("Invalid CTRL_MON group format %s\n", token);
+				ret = -EINVAL;
+				break;
+			}
+
+			ret = rdtgroup_process_flags(RDTCTRL_GROUP, cmon_grp, mon_grp, token);
+			if (ret)
+				break;
+		} else if (strstr(token, "/")) {
+			/*
+			 * MON group processing:
+			 * MON_GROUP inside default CTRL_MON group: "/<MON group>/<flags>"
+			 * MON_GROUP within CTRL_MON group: "<CTRL_MON group>/<MON group>/<flags>"
+			 */
+			cmon_grp = strsep(&token, "/");
+
+			/* Extract the MON_GROUP. It cannot be empty string */
+			mon_grp = strsep(&token, "/");
+			if (*mon_grp == '\0') {
+				rdt_last_cmd_printf("Invalid MON_GROUP format %s\n", token);
+				ret = -EINVAL;
+				break;
+			}
+
+			ret = rdtgroup_process_flags(RDTMON_GROUP, cmon_grp, mon_grp, token);
+			if (ret)
+				break;
+		}
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -2282,9 +2529,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_control",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= rdtgroup_mbm_control_show,
+		.write		= rdtgroup_mbm_control_write,
 	},
 	{
 		.name		= "cpus_list",
-- 
2.34.1



* Re: [PATCH v5 01/20] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2024-07-03 21:48 ` [PATCH v5 01/20] x86/cpufeatures: Add support for " Babu Moger
@ 2024-07-12 21:55   ` Reinette Chatre
  2024-07-15 18:36     ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 21:55 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that
> RMIDs currently assigned to a processor will be tracked by hardware. The
> counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, there can be only limited number of
> groups that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups
> are being tracked for certain point of time. Users do not have the option
> to monitor a group or set of groups for certain period of time without
> worrying about RMID being reset in between.
> 
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID and monitor the bandwidth as long as it is assigned.
> The assigned RMID will be tracked by the hardware until the user unassigns
> it manually. There is no need to worry about counters being reset during
> this period. Additionally, the user can specify a bitmask identifying the
> specific bandwidth types from the given source to track with the counter.
> 
> Without ABMC enabled, monitoring will work in current mode without
> assignment option.
> 
> Linux resctrl subsystem provides the interface to count maximum of two
> memory bandwidth events per group, from a combination of available total
> and local events. Keeping the current interface, users can enable a maximum
> of 2 ABMC counters per group. User will also have the option to enable only
> one counter to the group. If the system runs out of assignable ABMC
> counters, kernel will display an error. Users need to disable an already
> enabled counter to make space for new assignments.
> 
> The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
> Bits Description
> 5    ABMC (Assignable Bandwidth Monitoring Counters)
> 
> The feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> 
> Note: Checkpatch checks/warnings are ignored to maintain coding style.

This note may be more appropriate below the '---' separator line.

> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette


* Re: [PATCH v5 03/20] x86/resctrl: Consolidate monitoring related data from rdt_resource
  2024-07-03 21:48 ` [PATCH v5 03/20] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2024-07-12 21:57   ` Reinette Chatre
  2024-07-15 19:05     ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 21:57 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> The cache allocation and memory bandwidth allocation feature properties
> are consolidated into cache and membw structures respectively. In

Let "In preparation ... " start a new paragraph.

Quoting Documentation/process/maintainer-tip.rst:
	It's also useful to structure the changelog into several paragraphs
	and not lump everything together into a single one. A good structure
	is to explain the context, the problem and the solution in separate
	paragraphs and this order.

> preparation for more monitoring properties that will clobber the existing
> resource struct more, re-organize the monitoring specific properties into
> separate structure.

"re-organize the monitoring specific properties into separate structure" ->
"re-organize the monitoring specific properties to also be in a separate structure."

> 
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...

> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index b0875b99e811..e43fc5bb5a3a 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -182,6 +182,16 @@ enum resctrl_scope {
>   	RESCTRL_L3_NODE,
>   };
>   
> +/**
> + * struct resctrl_mon - Monitoring related data
> + * @num_rmid:		Number of RMIDs available
> + * @evt_list:		List of monitoring events
> + */
> +struct resctrl_mon {
> +	int			num_rmid;
> +	struct list_head	evt_list;
> +};
> +
>   /**
>    * struct rdt_resource - attributes of a resctrl resource
>    * @rid:		The index of the resource
> @@ -207,11 +217,11 @@ struct rdt_resource {
>   	int			rid;
>   	bool			alloc_capable;
>   	bool			mon_capable;
> -	int			num_rmid;
>   	enum resctrl_scope	ctrl_scope;
>   	enum resctrl_scope	mon_scope;
>   	struct resctrl_cache	cache;
>   	struct resctrl_membw	membw;
> +	struct resctrl_mon	mon;
>   	struct list_head	ctrl_domains;
>   	struct list_head	mon_domains;
>   	char			*name;
> @@ -221,7 +231,6 @@ struct rdt_resource {
>   	int			(*parse_ctrlval)(struct rdt_parse_data *data,
>   						 struct resctrl_schema *s,
>   						 struct rdt_ctrl_domain *d);
> -	struct list_head	evt_list;
>   	unsigned long		fflags;
>   	bool			cdp_capable;
>   };

struct rdt_resource's kernel-doc still refers to the members
removed in this patch. Its kernel-doc also needs an update for the new
member added.

Reinette


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
                   ` (19 preceding siblings ...)
  2024-07-03 21:48 ` [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
@ 2024-07-12 22:03 ` Reinette Chatre
  2024-07-17 17:19   ` Moger, Babu
  20 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:03 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> # Linux Implementation
> 
> Linux resctrl subsystem provides the interface to count maximum of two
> memory bandwidth events per group, from a combination of available total
> and local events. Keeping the current interface, users can enable a maximum
> of 2 ABMC counters per group. User will also have the option to enable only
> one counter to the group. If the system runs out of assignable ABMC
> counters, kernel will display an error. Users need to disable an already
> enabled counter to make space for new assignments.

The implementation appears to be converging on an interface that can
be generic enough to be used by other features discussed along the way.
"Linux implementation" summary can thus add:

	Create a generic interface aimed to support user space assignment
	of scarce counters used for monitoring. First usage of interface
	is by ABMC with option to expand usage to "soft-RMID" and MPAM
	counters in future.


> # Examples
> 
> a. Check if ABMC support is available
> 	#mount -t resctrl resctrl /sys/fs/resctrl/
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> 	[abmc]
> 	legacy
> 
> 	Linux kernel detected ABMC feature and it is enabled.

How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
info file and be the final step to make this generic so that another architecture
can more easily support assigning hardware counters without needing to call
the feature AMD's "abmc".

Expanding on this it may be possible to add a new "sw_mbm_cntrs" feature that
will be the "soft-RMID" feature while also reflecting the "mbm_cntrs" name
so that when user space enables that feature its properties can be found in
"num_mbm_cntrs".

The "abmc" kernel parameter remains, but that does seem separate from this
resctrl fs feature since it is explicitly tied to X86_FEATURE_ABMC, which
makes it architecture specific.

> 
> b. Check how many ABMC counters are available.
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/num_cntrs
> 	32

This is now num_mbm_cntrs

> 
> c. Create few resctrl groups.
> 
> 	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
> 
> 
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_control
>     to list and modify the group's monitoring states. File provides single place
>     to list monitoring states of all the resctrl groups. It makes it easier for
>     user space to learn about the counters are used without needing to traverse
>     all the groups thus reducing the number of filesystem calls.
> 
> 	The list follows the following format:
> 
> 	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> 
> 	Format for specific type of groups:
> 
> 	* Default CTRL_MON group:
> 	 "//<domain_id>=<flags>"
> 
>         * Non-default CTRL_MON group:
>                 "<CTRL_MON group>//<domain_id>=<flags>"
> 
>         * Child MON group of default CTRL_MON group:
>                 "/<MON group>/<domain_id>=<flags>"
> 
>         * Child MON group of non-default CTRL_MON group:
>                 "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> 
>         Flags can be one of the following:
> 
>          t  MBM total event is enabled.
>          l  MBM local event is enabled.
>          tl Both total and local MBM events are enabled.
>          _  None of the MBM events are enabled

The language needs to be changed here (and in the many copied places) to
be specific about what setting the flag accomplishes. For example, in
"legacy" mode user space can be expected to find all events enabled, no?
Needing a new feature to set a flag to accomplish something that is
possible in legacy mode can thus cause confusion.

If I understand the implementation, reading "mbm_control" will fail
if the system is ABMC capable but ABMC is disabled. Why can "mbm_control" not
always be displayed to user space? What if "mbm_control" is always
available and provides specific information to user space?
For example:
	t  MBM total event is enabled but may not always be counted.
	T  MBM total event is enabled and being counted.

On AMD systems resource groups will have "t" associated with monitor
groups when ABMC disabled, "T" when ABMC enabled and a counter assigned.
On Intel systems monitor groups will always have "T".

For "soft-RMID" the flag could possibly continue to be "T"?

I am trying to find ways to communicate to user space consistently
and clearly and any insights will be appreciated. We really do not want
to add this interface and then find that it just causes confusion.
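
To make the stakes concrete, here is a minimal user-space sketch of how a
tool might decode one line of "mbm_control" in the listing format quoted
above. It is an illustration only: the struct, the function name and the
domain cap are made up for this example, and it only knows the 't', 'l'
and '_' flags from this series. A flag such as the "T" floated above would
need its own case, which is why its meaning has to be unambiguous.

```c
#include <stdlib.h>
#include <string.h>

#define MBM_MAX_DOMS 8	/* arbitrary cap chosen for this sketch */

/* Hypothetical container for one parsed line of mbm_control. */
struct mbm_line {
	char ctrl[64];	/* empty for the default CTRL_MON group */
	char mon[64];	/* empty when the line names a CTRL_MON group */
	int ndoms;
	unsigned long dom_id[MBM_MAX_DOMS];
	int total[MBM_MAX_DOMS];	/* 't' present for this domain */
	int local[MBM_MAX_DOMS];	/* 'l' present for this domain */
};

/*
 * Parse "<CTRL_MON group>/<MON group>/<domain_id>=<flags>;...".
 * Returns 0 on success and -1 on malformed input.
 */
int mbm_parse_line(const char *line, struct mbm_line *out)
{
	const char *s1 = strchr(line, '/');
	const char *s2 = s1 ? strchr(s1 + 1, '/') : NULL;
	const char *p;

	if (!s2 || (size_t)(s1 - line) >= sizeof(out->ctrl) ||
	    (size_t)(s2 - s1 - 1) >= sizeof(out->mon))
		return -1;

	/* Zero first so the copied names are always NUL terminated. */
	memset(out, 0, sizeof(*out));
	memcpy(out->ctrl, line, s1 - line);
	memcpy(out->mon, s1 + 1, s2 - s1 - 1);

	for (p = s2 + 1; *p && *p != '\n'; ) {
		char *end;
		int i = out->ndoms;

		if (i == MBM_MAX_DOMS)
			return -1;
		out->dom_id[i] = strtoul(p, &end, 10);
		if (end == p || *end != '=')
			return -1;
		/* Walk the flags up to the ';' that closes this domain. */
		for (p = end + 1; *p && *p != ';' && *p != '\n'; p++) {
			if (*p == 't')
				out->total[i] = 1;
			else if (*p == 'l')
				out->local[i] = 1;
			else if (*p != '_')
				return -1;
		}
		if (*p == ';')
			p++;
		out->ndoms++;
	}
	return 0;
}
```

Fed the line "//0=t;1=tl;" this yields empty group names and two domains,
with only the total event set on domain 0.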

It is not quite obvious to me when the new files should be visible and
what they should present to the user. "mbm_mode" is now always visible.
Should "num_mbm_cntrs" not also always be visible? Right now "num_mbm_cntrs"
appears to be associated only with ABMC. Should it not also, for example,
be the file that "soft-RMID" may use to share how many counters are
available? Its contents would thus be dynamic based on which "MBM mode" is
active, which raises the question of what it should contain when "legacy"
mode is enabled. Should "num_mbm_cntrs" perhaps show "0" to user space when
"legacy" mode is active?


> 
> 	Examples:
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> 	//0=tl;1=tl;
> 	/child_default_mon_grp/0=tl;1=tl;
> 	
> 	There are four groups and all the groups have local and total
> 	event enabled on domain 0 and 1.

"local and total event" is vague, can it be made specific with, for example,
"local and total MBM events"

> 
> 	=tl means both total and local events are enabled.

Same here (and all copied places in this series)

> 
> 	"//" - This is a default CTRL_MON group
> 
> 	"non_default_ctrl_mon_grp//" - This is non-default CTRL_MON group
> 
> 	"/child_default_mon_grp/"  - This is Child MON group of the defult group

Same typos as in previous version of cover letter.

> 
> 	"non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
> 	MON group of the non-default group
> 
> e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_control.
> 
> 	The write format is similar to the above list format with addition of
> 	op-code for the assignment operation.
> 	
> 	* Default CTRL_MON group:
> 	        "//<domain_id><op-code><flags>"
> 	
> 	* Non-default CTRL_MON group:
> 	        "<CTRL_MON group>//<domain_id><op-code><flags>"
> 	
> 	* Child MON group of default CTRL_MON group:
> 	        "/<MON group>/<domain_id><op-code><flags>"
> 	
> 	* Child MON group of non-default CTRL_MON group:
> 	        "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
> 	
> 	Op-code can be one of the following:
> 	
> 	= Update the assignment to match the flag.
> 	+ Assign a new state.
> 	- Unassign a new state.

Please be consistent with terminology. Above switches between "flag"
and "state" while it then continues below using "event". Also,
"Unassign a _new_ state" is unexpected, it should probably be an
_existing_ (not "new") state/flag/event?
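
One way to pin the terminology down is to state the three op-codes as
operations on the set of events assigned to a group in a domain. The
sketch below follows the switch statement in rdtgroup_process_flags()
from patch 20 as far as it can be read from the posted code. The EV_*
names and mbm_apply_op() are hypothetical, chosen for this example only;
the patch itself uses ASSIGN_TOTAL and ASSIGN_LOCAL.

```c
/* Hypothetical event bits for this sketch. */
enum {
	EV_NONE  = 0,
	EV_TOTAL = 1 << 0,
	EV_LOCAL = 1 << 1,
};

/*
 * Return the set of assigned events after applying <op-code><flags>
 * to the current set "cur", or -1 for an unknown op-code.
 */
int mbm_apply_op(int cur, char op, int flags)
{
	switch (op) {
	case '=':	/* the result matches the listed events exactly */
		return flags;
	case '+':	/* assign the listed events, keep the others */
		return cur | flags;
	case '-':	/* unassign the listed events, keep the others */
		return cur & ~flags;
	default:
		return -1;
	}
}
```

With this reading, the examples further down are consistent: "//0=t"
takes domain 0 from "tl" to "t", "1-t" takes "tl" to "l", and "0+l"
takes "t" back to "tl".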

> 
> 	Flags can be one of the following:
> 
>          t  MBM total event.
>          l  MBM local event.
>          tl Both total and local MBM events.
>          _  None of the MBM events. Only works with '=' op-code.
> 	
> 	Initial group status:
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> 	//0=tl;1=tl;
> 	/child_default_mon_grp/0=tl;1=tl;
> 
> 	To update the default group to enable only total event on domain 0:
> 	# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
> 
> 	Assignment status after the update:
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> 	//0=t;1=tl;
> 	/child_default_mon_grp/0=tl;1=tl;
> 
> 	To update the MON group child_default_mon_grp to remove total event on domain 1:
> 	# echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
> 
> 	Assignment status after the update:
> 	$ cat /sys/fs/resctrl/info/L3_MON/mbm_control
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> 	//0=t;1=tl;
> 	/child_default_mon_grp/0=tl;1=l;
> 
> 	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
> 	remove both local and total events on domain 1:
> 	# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
> 	       /sys/fs/resctrl/info/L3_MON/mbm_control
> 
> 	Assignment status after the update:
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> 	//0=t;1=tl;
> 	/child_default_mon_grp/0=tl;1=l;
> 
> 	To update the default group to add a local event domain 0.
> 	# echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
> 
> 	Assignment status after the update:
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> 	//0=tl;1=tl;
> 	/child_default_mon_grp/0=tl;1=l;
> 
> 
> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
>     There is no change in reading the events with ABMC. If the event is unassigned
>     when reading, then the read will come back as "Unassigned".
> 	
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	779247936
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 	765207488
> 	
> g. Users will have the option to go back to legacy mbm_mode if required.
>     This can be done using the following command. Note that switching the
>     mbm_mode will reset all the mbm counters of all resctrl groups.

mbm -> MBM (throughout)

> 
> 	# echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> 	abmc
> 	[legacy]
> 
> h. Check the bandwidth configuration for the group. Note that bandwidth
>     configuration has a domain scope. Total event defaults to 0x7F (to
>     count all the events) and local event defaults to 0x15 (to count all
>     the local numa events). The event bitmap decoding is available at
>     https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>     in section "mbm_total_bytes_config", "mbm_local_bytes_config":
> 	
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 	0=0x7f;1=0x7f
> 	
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 	0=0x15;1=0x15
> 	
> j. Change the bandwidth source for domain 0 for the total event to count only reads.
>     Note that this change effects total events on the domain 0.
> 	
> 	#echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 	0=0x33;1=0x7F
> 	
> k. Now read the total event again. The first read will come back with "Unavailable"
>     status. The subsequent read of mbm_total_bytes will display only the read events.
> 	
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	Unavailable
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	314101
> 	
> l. Unmount the resctrl
> 	
> 	#umount /sys/fs/resctrl/
> 

Reinette


* Re: [PATCH v5 04/20] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2024-07-03 21:48 ` [PATCH v5 04/20] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2024-07-12 22:04   ` Reinette Chatre
  2024-07-15 20:04     ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:04 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
> Bits Description
> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
>       Monitoring Counter ID + 1
> 
> The feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).

<insert snippet about what the patch does>

> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v5: Name change num_cntrs to num_mbm_cntrs.
>      Moved abmc_capable to resctrl_mon.
> 
> v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We dont
>      need to separate this as arch code.
> 
> v3: Removed changes related to mon_features.
>      Moved rdt_cpu_has to core.c and added new function resctrl_arch_has_abmc.
>      Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
>      rdt_resource. (James)
> 
> v2: Changed the field name to mbm_assign_capable from abmc_capable.
> ---
>   arch/x86/kernel/cpu/resctrl/monitor.c | 12 ++++++++++++
>   include/linux/resctrl.h               |  4 ++++
>   2 files changed, 16 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 795fe91a8feb..87d40f149ebc 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1229,6 +1229,18 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>   			mbm_local_event.configurable = true;
>   			mbm_config_rftype_init("mbm_local_bytes_config");
>   		}
> +
> +		if (rdt_cpu_has(X86_FEATURE_ABMC)) {
> +			r->mon.abmc_capable = true;
> +			/*
> +			 * Query CPUID_Fn80000020_EBX_x05 for number of
> +			 * ABMC counters
> +			 */
> +			cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
> +			r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
> +			if (WARN_ON(r->mon.num_mbm_cntrs > 64))
> +				r->mon.num_mbm_cntrs = 64;
> +		}
>   	}
>   
>   	l3_mon_evt_init(r);
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index e43fc5bb5a3a..62f0f002ef41 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -185,10 +185,14 @@ enum resctrl_scope {
>   /**
>    * struct resctrl_mon - Monitoring related data
>    * @num_rmid:		Number of RMIDs available
> + * @num_mbm_cntrs:	Number of monitoring counters
> + * @abmc_capable:	Is system capable of supporting monitor assignment?
>    * @evt_list:		List of monitoring events
>    */
>   struct resctrl_mon {
>   	int			num_rmid;
> +	int			num_mbm_cntrs;
> +	bool			abmc_capable;
>   	struct list_head	evt_list;
>   };
>   

How about renaming "abmc_capable" to "mbm_cntr_capable"? That would
(a) connect the capability to the "num_mbm_cntrs" property, and (b)
remove the AMD marketing name from the resctrl filesystem code that
will be shared by all architectures.
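
Whatever the capability flag ends up being called, the value behind
"num_mbm_cntrs" comes from a small decode of the CPUID output. A minimal
sketch of that step, mirroring the quoted hunk (bits 15:0 of EBX, plus
one, clamped to 64), is below. The function and macro names are
hypothetical and the code is plain user-space C, not the kernel
implementation.

```c
#include <stdint.h>

#define MBM_CNTRS_MAX 64	/* same clamp as the quoted hunk */

/*
 * Decode the counter count from the EBX value returned by
 * CPUID Fn8000_0020 with ECX=5, the way the quoted hunk does.
 */
unsigned int mbm_cntrs_from_ebx(uint32_t ebx)
{
	unsigned int n = (ebx & 0xFFFF) + 1;

	return n > MBM_CNTRS_MAX ? MBM_CNTRS_MAX : n;
}
```

In user space the EBX value could be obtained with __get_cpuid_count()
from the compiler's <cpuid.h> on x86. An EBX value of 31 decodes to the
32 counters shown in the cover letter example.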

Reinette



* Re: [PATCH v5 05/20] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
  2024-07-03 21:48 ` [PATCH v5 05/20] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
@ 2024-07-12 22:04   ` Reinette Chatre
  0 siblings, 0 replies; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:04 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
> fflags for resctrl files.
> 
> Adding new files will involve adding another function to initialize
> the fflags. This can be simplified by adding a new function
> resctrl_file_fflags_init() and passing the file name and flags
> to be initialized.
> 
> Consolidate fflags initialization into resctrl_file_fflags_init() and
> remove thread_throttle_mode_init() and mbm_config_rftype_init().
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette


* Re: [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-07-03 21:48 ` [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2024-07-12 22:05   ` Reinette Chatre
  2024-07-16 15:13     ` Moger, Babu
  2024-08-16 16:29     ` James Morse
  0 siblings, 2 replies; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:05 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> Add the functionality to enable/disable AMD ABMC feature.
> 
> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
> L3_QOS_EXT_CFG.  When the state of ABMC is changed, the MSR needs
> to be updated on all the logical processors in the QOS Domain.
> 
> Hardware counters will reset when ABMC state is changed. Reset the
> architectural state so that reading of hardware counter is not considered
> as an overflow in next update.
> 
> The ABMC feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
>      Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
>      Introduced resctrl_arch_get_abmc_enabled to get abmc state from
>      non-arch code.
>      Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
>      Modified commit log to make it clear about AMD ABMC feature.
> 
> v3: No changes.
> 
> v2: Few text changes in commit message.
> ---
>   arch/x86/include/asm/msr-index.h       |  1 +
>   arch/x86/kernel/cpu/resctrl/internal.h | 13 +++++
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 66 ++++++++++++++++++++++++++
>   3 files changed, 80 insertions(+)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 01342963011e..263b2d9d00ed 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1174,6 +1174,7 @@
>   #define MSR_IA32_MBA_BW_BASE		0xc0000200
>   #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>   #define MSR_IA32_EVT_CFG_BASE		0xc0000400
> +#define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
>   
>   /* MSR_IA32_VMX_MISC bits */
>   #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 2bd207624eec..0ce9797f80fe 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -97,6 +97,9 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
>   	return cpu;
>   }
>   
> +/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */

Please be consistent throughout the series and end sentences with a period.

> +#define ABMC_ENABLE			BIT(0)
> +
>   struct rdt_fs_context {
>   	struct kernfs_fs_context	kfc;
>   	bool				enable_cdpl2;
> @@ -477,6 +480,7 @@ struct rdt_parse_data {
>    * @mbm_cfg_mask:	Bandwidth sources that can be tracked when Bandwidth
>    *			Monitoring Event Configuration (BMEC) is supported.
>    * @cdp_enabled:	CDP state of this resource
> + * @abmc_enabled:	ABMC feature is enabled
>    *
>    * Members of this structure are either private to the architecture
>    * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
> @@ -491,6 +495,7 @@ struct rdt_hw_resource {
>   	unsigned int		mbm_width;
>   	unsigned int		mbm_cfg_mask;
>   	bool			cdp_enabled;
> +	bool			abmc_enabled;
>   };

mbm_cntr_enabled? This is architecture specific code so there is more flexibility
here, but it may make implementation easier to understand if consistent naming is used
between fs and arch code.

>   
>   static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
> @@ -536,6 +541,14 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
>   
>   void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d);
>   
> +static inline bool resctrl_arch_get_abmc_enabled(void)
> +{
> +	return rdt_resources_all[RDT_RESOURCE_L3].abmc_enabled;
> +}
> +
> +int resctrl_arch_abmc_enable(void);
> +void resctrl_arch_abmc_disable(void);
> +
>   /*
>    * To return the common struct rdt_resource, which is contained in struct
>    * rdt_hw_resource, walk the resctrl member of struct rdt_hw_resource.
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 7e76f8d839fc..471fc0dbd7c3 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2402,6 +2402,72 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
>   	return 0;
>   }
>   
> +/*
> + * Update L3_QOS_EXT_CFG MSR on all the CPUs associated with the resource.
> + */
> +static void resctrl_abmc_set_one_amd(void *arg)
> +{
> +	bool *enable = arg;
> +	u64 msrval;
> +
> +	rdmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
> +
> +	if (*enable)
> +		msrval |= ABMC_ENABLE;
> +	else
> +		msrval &= ~ABMC_ENABLE;
> +
> +	wrmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
> +}

msr_set_bit() and msr_clear_bit() can be used here.
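
As a rough illustration of how much the callback shrinks, here is a
minimal userspace sketch. It cannot touch a real MSR, so fake_msr,
mock_msr_set_bit() and mock_msr_clear_bit() are hypothetical stand-ins
that only imitate the read-modify-write; the kernel helpers themselves
take the MSR number and a bit position.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the contents of the L3_QOS_EXT_CFG MSR. */
static uint64_t fake_msr;

/* Mock set-bit helper: read, OR in the bit, write back. */
static void mock_msr_set_bit(uint8_t bit)
{
	fake_msr |= (1ULL << bit);
}

/* Mock clear-bit helper: read, mask out the bit, write back. */
static void mock_msr_clear_bit(uint8_t bit)
{
	fake_msr &= ~(1ULL << bit);
}

#define MOCK_ABMC_ENABLE_BIT	0	/* bit 0 enables ABMC, per the patch */

/* The per-CPU callback reduces to a single branch. */
static void mock_abmc_set_one(void *arg)
{
	bool *enable = arg;

	if (*enable)
		mock_msr_set_bit(MOCK_ABMC_ENABLE_BIT);
	else
		mock_msr_clear_bit(MOCK_ABMC_ENABLE_BIT);
}
```

The other bits of the register are left untouched in both directions,
which is the same behavior as the open-coded rdmsrl()/wrmsrl() pair.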

> +
> +static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
> +{
> +	struct rdt_mon_domain *d;
> +
> +	/*
> +	 * Hardware counters will reset after switching the monitor mode.
> +	 * Reset the architectural state so that reading of hardware
> +	 * counter is not considered as an overflow in the next update.
> +	 */
> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +		on_each_cpu_mask(&d->hdr.cpu_mask,
> +				 resctrl_abmc_set_one_amd, &enable, 1);
> +		resctrl_arch_reset_rmid_all(r, d);
> +	}
> +
> +	return 0;
> +}

Seems like _resctrl_abmc_enable() can just return void.

> +
> +int resctrl_arch_abmc_enable(void)

resctrl_arch_mbm_cntr_enable()? I will not point out the remaining instances.

> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> +	int ret = 0;
> +
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +	if (r->mon.abmc_capable && !hw_res->abmc_enabled) {
> +		ret = _resctrl_abmc_enable(r, true);
> +		if (!ret)
> +			hw_res->abmc_enabled = true;

The above error handling seems unnecessary.

> +	}
> +
> +	return ret;

resctrl_arch_abmc_enable() should probably keep returning an int even though
this implementation does not need it, since other archs may indeed return an error.
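
To make the shape concrete, a minimal userspace sketch of the two
comments combined: the inner helper returns void and the arch entry
point keeps its int return. struct mock_hw_res and the mock_* functions
are hypothetical stand-ins, not the resctrl structures.

```c
#include <stdbool.h>

/* Hypothetical mock state standing in for struct rdt_hw_resource. */
struct mock_hw_res {
	bool capable;	/* counter assignment is supported */
	bool enabled;	/* counter assignment is currently on */
};

/* The inner helper cannot fail, so it returns void. */
static void mock_set_all(struct mock_hw_res *hw, bool enable)
{
	/* The real code would update the MSR on every domain here. */
	(void)hw;
	(void)enable;
}

/*
 * The arch entry point still returns int so that another
 * architecture's implementation is free to report an error.
 */
static int mock_arch_enable(struct mock_hw_res *hw)
{
	if (hw->capable && !hw->enabled) {
		mock_set_all(hw, true);
		hw->enabled = true;
	}

	return 0;
}
```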

> +}
> +
> +void resctrl_arch_abmc_disable(void)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> +
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +	if (hw_res->abmc_enabled) {
> +		_resctrl_abmc_enable(r, false);
> +		hw_res->abmc_enabled = false;
> +	}
> +}
> +
>   /*
>    * We don't allow rdtgroup directories to be created anywhere
>    * except the root directory. Thus when looking for the rdtgroup

Reinette

* Re: [PATCH v5 07/20] x86/resctrl: Introduce the interface to display monitor mode
  2024-07-03 21:48 ` [PATCH v5 07/20] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
@ 2024-07-12 22:06   ` Reinette Chatre
  2024-07-16 16:51     ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:06 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID and monitor the bandwidth as long as it is assigned.
> ABMC mode is enabled by default when supported. System can be one mode
> at a time (Legacy monitor mode or ABMC mode).
> 
> Provide an interface to display the monitor mode on the system.
>      $cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>      [abmc]
>      legacy

<insert snippet about what happens when user switches from one mode
to another>

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v5: Changed interface name to mbm_mode.
>      It will be always available even if ABMC feature is not supported.
>      Added description in resctrl.rst about ABMC mode.
>      Fixed display abmc and legacy consistantly.
> 
> v4: Fixed the checks for legacy and abmc mode. Default it ABMC.
> 
> v3: New patch to display ABMC capability.
> ---
>   Documentation/arch/x86/resctrl.rst     | 30 ++++++++++++++++++++++++++
>   arch/x86/kernel/cpu/resctrl/monitor.c  |  2 ++
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 26 ++++++++++++++++++++++
>   3 files changed, 58 insertions(+)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 30586728a4cd..108e494fd7cc 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -257,6 +257,36 @@ with the following files:
>   	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>   	    0=0x30;1=0x30;3=0x15;4=0x15
>   
> +"mbm_mode":
> +	Reports the list of assignable monitoring features supported. The
> +	enclosed brackets indicate which feature is enabled.
> +	::
> +
> +	  cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> +	  [abmc]
> +	  legacy
> +

"mbm_cntr" mode can be documented here with the details on how AMD's ABMC is
one example of how it may be implemented on a system.

> +	The bandwidth monitoring feature on AMD system only guarantees that
> +	RMIDs currently assigned to a processor will be tracked by hardware.
> +	The counters of any other RMIDs which are no longer being tracked
> +	will be reset to zero. The MBM event counters return "Unavailable"
> +	for the RMIDs that are not tracked by hardware. So, there can be
> +	only limited number of groups that can give guaranteed monitoring
> +	numbers. With ever changing configurations there is no way to
> +	definitely know which of these groups are being tracked for certain
> +	point of time. Users do not have the option to monitor a group or
> +	set of groups for certain period of time without worrying about
> +	RMID being reset in between.
> +
> +	The ABMC feature provides an option to the user to assign a
> +	hardware counter to an RMID and monitor the bandwidth as long as
> +	it is assigned. The assigned RMID will be tracked by the hardware
> +	until the user unassigns it manually. There is no need to worry
> +	about counters being reset during this period.
> +
> +	Without ABMC enabled, monitoring will work in "legacy" mode
> +	without assignment option.

Let "legacy" be a distinct mode, instead of an alternative to ABMC.
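
One way to picture "distinct modes" is a table of modes where each entry
is listed on its own line and the active one is bracketed. The sketch
below is userspace C under stated assumptions: the mode names
("mbm_cntr", "legacy") are not settled in this thread, and the mock_*
names and the supported-mode bitmask are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mode table; the final names were not settled here. */
enum mock_mbm_mode { MOCK_MODE_MBM_CNTR, MOCK_MODE_LEGACY, MOCK_MODE_COUNT };

static const char * const mock_mode_names[MOCK_MODE_COUNT] = {
	[MOCK_MODE_MBM_CNTR]	= "mbm_cntr",
	[MOCK_MODE_LEGACY]	= "legacy",
};

/*
 * Write one line per supported mode into buf, with the active mode in
 * brackets. 'supported' is a bitmask indexed by enum mock_mbm_mode.
 * Returns the number of bytes written, excluding the trailing NUL.
 */
static size_t mock_mbm_mode_show(char *buf, size_t len, unsigned int supported,
				 enum mock_mbm_mode active)
{
	size_t off = 0;
	int i, n;

	if (!len)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < MOCK_MODE_COUNT; i++) {
		if (!(supported & (1U << i)))
			continue;

		if (i == (int)active)
			n = snprintf(buf + off, len - off, "[%s]\n",
				     mock_mode_names[i]);
		else
			n = snprintf(buf + off, len - off, "%s\n",
				     mock_mode_names[i]);

		if (n < 0 || (size_t)n >= len - off) {
			buf[off] = '\0';	/* drop a partially written line */
			break;
		}
		off += (size_t)n;
	}

	return off;
}
```

With this layout a system without counter assignment simply has one
supported mode, and no mode is described as the absence of another.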

> +
>   "max_threshold_occupancy":
>   		Read/write file provides the largest value (in
>   		bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 12793762ca24..6c4cb36b4b50 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1245,6 +1245,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>   		}
>   	}
>   
> +	resctrl_file_fflags_init("mbm_mode", RFTYPE_MON_INFO);
> +

Is this special flag assignment necessary? With the file always visible, I think it
can just be initialized in res_common_files below with the flag already assigned.

>   	l3_mon_evt_init(r);
>   
>   	r->mon_capable = true;
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 471fc0dbd7c3..3988d7b86817 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -845,6 +845,26 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>   	return ret;
>   }
>   
> +static int rdtgroup_mbm_mode_show(struct kernfs_open_file *of,
> +				  struct seq_file *s, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +
> +	if (r->mon.abmc_capable) {
> +		if (resctrl_arch_get_abmc_enabled()) {
> +			seq_puts(s, "[abmc]\n");
> +			seq_puts(s, "legacy\n");
> +		} else {
> +			seq_puts(s, "abmc\n");
> +			seq_puts(s, "[legacy]\n");
> +		}
> +	} else {
> +		seq_puts(s, "[legacy]\n");
> +	}
> +
> +	return 0;
> +}
> +
>   #ifdef CONFIG_PROC_CPU_RESCTRL
>   
>   /*
> @@ -1901,6 +1921,12 @@ static struct rftype res_common_files[] = {
>   		.seq_show	= mbm_local_bytes_config_show,
>   		.write		= mbm_local_bytes_config_write,
>   	},
> +	{
> +		.name		= "mbm_mode",
> +		.mode		= 0444,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= rdtgroup_mbm_mode_show,
> +	},
>   	{
>   		.name		= "cpus",
>   		.mode		= 0644,

Reinette

* Re: [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap
  2024-07-03 21:48 ` [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap Babu Moger
@ 2024-07-12 22:07   ` Reinette Chatre
  2024-07-16 17:59     ` Moger, Babu
  2024-07-26 22:48   ` Peter Newman
  1 sibling, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:07 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> Hardware provides a set of counters when the ABMC feature is supported.
> These counters are used for enabling the events in resctrl group when
> the feature is enabled.
> 
> Introduce mbm_cntrs_free_map bitmap to track available and free counters.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v5:
>    Updated the comments and commit log.
>    Few renames
>     num_cntrs_free_map -> mbm_cntrs_free_map
>     num_cntrs_init -> mbm_cntrs_init
>     Added initialization in rdt_get_tree because the default ABMC
>     enablement happens during the init.
> 
> v4: Changed the name to num_cntrs where applicable.
>      Used bitmap apis.
>      Added more comments for the globals.
> 
> v3: Changed the bitmap name to assign_cntrs_free_map. Removed abmc
>      from the name.
> 
> v2: Changed the bitmap name to assignable_counter_free_map from
>      abmc_counter_free_map.
> ---
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 29 ++++++++++++++++++++++++--
>   1 file changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 4f47f52e01c2..b3d3fa048f15 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -185,6 +185,23 @@ bool closid_allocated(unsigned int closid)
>   	return !test_bit(closid, &closid_free_map);
>   }
>   
> +/*
> + * Counter bitmap and its length for tracking available counters.
> + * ABMC feature provides set of hardware counters for enabling events.
> + * Each event takes one hardware counter. Kernel needs to keep track

What is meant by "Kernel" here? It looks to be the fs code, but the
implementation has both fs and arch code reaching into the counter
management. This should not be the case: either the fs code or the
arch code needs to manage the counters, not both.

> + * of number of available counters.
> + */
> +static unsigned long mbm_cntrs_free_map;

With the lengths involved this needs a proper DECLARE_BITMAP().
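
A single unsigned long can only track 32 or 64 counters, depending on
the word size. The userspace sketch below imitates a multi-word bitmap
to show the idea; the MOCK_* macros, the 96-counter bound and the
mock_* helpers are hypothetical, and kernel code would use
DECLARE_BITMAP() with the bitmap API rather than these loops.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define MOCK_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define MOCK_BITS_TO_LONGS(n)	\
	(((n) + MOCK_BITS_PER_LONG - 1) / MOCK_BITS_PER_LONG)
/* Userspace imitation of the kernel's DECLARE_BITMAP(). */
#define MOCK_DECLARE_BITMAP(name, bits)	\
	unsigned long name[MOCK_BITS_TO_LONGS(bits)]

/* Hypothetical upper bound, chosen to span more than one long. */
#define MOCK_MAX_CNTRS	96

static MOCK_DECLARE_BITMAP(mock_free_map, MOCK_MAX_CNTRS);

/* Mark the first 'num' counters as free and everything else as used. */
static void mock_cntrs_init(size_t num)
{
	size_t i;

	for (i = 0; i < MOCK_BITS_TO_LONGS(MOCK_MAX_CNTRS); i++)
		mock_free_map[i] = 0;

	for (i = 0; i < num && i < MOCK_MAX_CNTRS; i++)
		mock_free_map[i / MOCK_BITS_PER_LONG] |=
			1UL << (i % MOCK_BITS_PER_LONG);
}

/* Claim the lowest free counter, or return -ENOSPC if none is left. */
static int mock_cntr_alloc(void)
{
	size_t i;

	for (i = 0; i < MOCK_MAX_CNTRS; i++) {
		unsigned long mask = 1UL << (i % MOCK_BITS_PER_LONG);

		if (mock_free_map[i / MOCK_BITS_PER_LONG] & mask) {
			mock_free_map[i / MOCK_BITS_PER_LONG] &= ~mask;
			return (int)i;
		}
	}

	return -ENOSPC;
}
```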

> +static unsigned int mbm_cntrs_free_map_len;
> +
> +static void mbm_cntrs_init(void)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +
> +	bitmap_fill(&mbm_cntrs_free_map, r->mon.num_mbm_cntrs);
> +	mbm_cntrs_free_map_len = r->mon.num_mbm_cntrs;
> +}
> +
>   /**
>    * rdtgroup_mode_by_closid - Return mode of resource group with closid
>    * @closid: closid if the resource group
> @@ -2466,6 +2483,12 @@ static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
>   {
>   	struct rdt_mon_domain *d;
>   
> +	/*
> +	 * Clear all the previous assignments while switching the monitor
> +	 * mode.
> +	 */
> +	mbm_cntrs_init();
> +

If the counters are managed by fs code then the arch code should not be
doing this. If needed the fs code should init the counters before calling
the arch helpers.
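
A minimal userspace sketch of that ordering, assuming the fs layer owns
the free map: the fs entry point resets its own bookkeeping and only
then calls into the arch helper, which never touches the map. All
mock_* names and the counter count are hypothetical.

```c
#include <stdbool.h>

#define MOCK_NUM_CNTRS	4	/* hypothetical counter count */

/* fs-side bookkeeping: true means the counter is free. */
static bool mock_cntr_free[MOCK_NUM_CNTRS];

/* arch-side state: only what the hardware layer needs to know. */
static bool mock_arch_enabled;

/* Arch helper: changes hardware state only, never the fs free map. */
static int mock_arch_mbm_cntr_enable(bool enable)
{
	mock_arch_enabled = enable;
	return 0;
}

/* fs helper: the only code that writes the free map. */
static void mock_fs_cntrs_init(void)
{
	int i;

	for (i = 0; i < MOCK_NUM_CNTRS; i++)
		mock_cntr_free[i] = true;
}

/* fs entry point for a mode switch: reset bookkeeping, then call arch. */
static int mock_fs_set_mbm_cntr_mode(bool enable)
{
	mock_fs_cntrs_init();

	return mock_arch_mbm_cntr_enable(enable);
}
```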

>   	/*
>   	 * Hardware counters will reset after switching the monitor mode.
>   	 * Reset the architectural state so that reading of hardware
> @@ -2724,10 +2747,10 @@ static void schemata_list_destroy(void)
>   
>   static int rdt_get_tree(struct fs_context *fc)
>   {
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>   	struct rdt_fs_context *ctx = rdt_fc2context(fc);
>   	unsigned long flags = RFTYPE_CTRL_BASE;
>   	struct rdt_mon_domain *dom;
> -	struct rdt_resource *r;
>   	int ret;
>   
>   	cpus_read_lock();
> @@ -2756,6 +2779,9 @@ static int rdt_get_tree(struct fs_context *fc)
>   
>   	closid_init();
>   
> +	if (r->mon.abmc_capable)
> +		mbm_cntrs_init();
> +
>   	if (resctrl_arch_mon_capable())
>   		flags |= RFTYPE_MON;
>   
> @@ -2800,7 +2826,6 @@ static int rdt_get_tree(struct fs_context *fc)
>   		resctrl_mounted = true;
>   
>   	if (is_mbm_enabled()) {
> -		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>   		list_for_each_entry(dom, &r->mon_domains, hdr.list)
>   			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL,
>   						   RESCTRL_PICK_ANY_CPU);

Reinette

* Re: [PATCH v5 10/20] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
  2024-07-03 21:48 ` [PATCH v5 10/20] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg Babu Moger
@ 2024-07-12 22:08   ` Reinette Chatre
  2024-07-16 19:21     ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:08 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> If the BMEC (Bandwidth Monitoring Event Configuration) feature is
> supported, the bandwidth events can be configured to track specific
> events. The event configuration is domain specific. ABMC (Assignable
> Bandwidth Monitoring Counters) feature needs event configuration
> information to assign hardware counter to an RMID. Event configurations
> are not stored in resctrl but instead always read from or written to
> hardware directly when prompted by user space.
> 
> Read the event configuration from the hardware during the domain
> initialization. Save the configuration information in the rdt_hw_domain,

rdt_hw_domain -> rdt_hw_mon_domain

> so it can be used for counter assignment.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v5: Exported mon_event_config_index_get.
>      Renamed arch_domain_mbm_evt_config to resctrl_arch_mbm_evt_config.
> 
> v4: Read the configuration information from the hardware to initialize.
>      Added few commit messages.
>      Fixed the tab spaces.
> 
> v3: Minor changes related to rebase in mbm_config_write_domain.
> 
> v2: No changes.
> ---
>   arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
>   arch/x86/kernel/cpu/resctrl/internal.h |  6 ++++++
>   arch/x86/kernel/cpu/resctrl/monitor.c  | 22 ++++++++++++++++++++++
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c |  2 +-
>   4 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index ff5cb693b396..6265ef8b610f 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -619,6 +619,8 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>   
>   	arch_mon_domain_online(r, d);
>   
> +	resctrl_arch_mbm_evt_config(hw_dom);
> +

This does not look to be an arch call called by the fs code so special
naming does not seem to be required? If it _was_ an arch callback then
it cannot take a HW resource as parameter since the fs code does not have
access to that.


>   	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
>   		mon_domain_free(hw_dom);
>   		return;
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 0ce9797f80fe..4cb1a5d014a3 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -401,6 +401,8 @@ struct rdt_hw_ctrl_domain {
>    * @d_resctrl:	Properties exposed to the resctrl file system
>    * @arch_mbm_total:	arch private state for MBM total bandwidth
>    * @arch_mbm_local:	arch private state for MBM local bandwidth
> + * @mbm_total_cfg:	MBM total bandwidth configuration
> + * @mbm_local_cfg:	MBM local bandwidth configuration
>    *
>    * Members of this structure are accessed via helpers that provide abstraction.
>    */
> @@ -408,6 +410,8 @@ struct rdt_hw_mon_domain {
>   	struct rdt_mon_domain		d_resctrl;
>   	struct arch_mbm_state		*arch_mbm_total;
>   	struct arch_mbm_state		*arch_mbm_local;
> +	u32				mbm_total_cfg;
> +	u32				mbm_local_cfg;
>   };
>   
>   static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
> @@ -662,6 +666,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free);
>   void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>   void __init resctrl_file_fflags_init(const char *config,
>   				     unsigned long fflags);
> +void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
> +unsigned int mon_event_config_index_get(u32 evtid);
>   void rdt_staged_configs_clear(void);
>   bool closid_allocated(unsigned int closid);
>   int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 7a93a6d2b2de..b96b0a8bd7d3 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1256,6 +1256,28 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>   	return 0;
>   }
>   
> +void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom)

A function is expected to have a verb in its name. The verb here seems to be
"config", which does not seem appropriate and creates confusion with
resctrl_arch_event_config_set(). How about resctrl_arch_mbm_evt_config_init(),
with a proper initializer of the config values that also covers the case when
events are not configurable (INVALID_CONFIG_VALUE, introduced in the next patch)?
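
A rough userspace sketch of such an initializer, under stated
assumptions: the sentinel value, the 7-bit mask standing in for
MAX_EVT_CONFIG_BITS, the fake register array and all mock_* names are
hypothetical and only model the control flow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel; the value used by the series is not shown here. */
#define MOCK_INVALID_CONFIG_VALUE	UINT32_MAX
/* Assumed stand-in for MAX_EVT_CONFIG_BITS. */
#define MOCK_EVT_CONFIG_MASK		0x7fU

struct mock_mon_domain {
	uint32_t mbm_total_cfg;
	uint32_t mbm_local_cfg;
};

/* Fake QOS_EVT_CFG_n registers: index 0 is total, index 1 is local. */
static uint64_t mock_evt_cfg_regs[2];

static void mock_mbm_evt_config_init(struct mock_mon_domain *d,
				     bool total_configurable,
				     bool local_configurable)
{
	/* Start from the sentinel so unconfigurable events are obvious. */
	d->mbm_total_cfg = MOCK_INVALID_CONFIG_VALUE;
	d->mbm_local_cfg = MOCK_INVALID_CONFIG_VALUE;

	if (total_configurable)
		d->mbm_total_cfg =
			(uint32_t)(mock_evt_cfg_regs[0] & MOCK_EVT_CONFIG_MASK);

	if (local_configurable)
		d->mbm_local_cfg =
			(uint32_t)(mock_evt_cfg_regs[1] & MOCK_EVT_CONFIG_MASK);
}
```

Both fields then hold a defined value on every path, so a later reader
of the saved configuration can tell "not configurable" apart from a
stale or zero value.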

> +{
> +	unsigned int index;
> +	u64 msrval;
> +
> +	/*
> +	 * Read the configuration registers QOS_EVT_CFG_n, where <n> is
> +	 * the BMEC event number (EvtID).
> +	 */
> +	if (mbm_total_event.configurable) {
> +		index = mon_event_config_index_get(QOS_L3_MBM_TOTAL_EVENT_ID);
> +		rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
> +		hw_dom->mbm_total_cfg = msrval & MAX_EVT_CONFIG_BITS;
> +	}
> +
> +	if (mbm_local_event.configurable) {
> +		index = mon_event_config_index_get(QOS_L3_MBM_LOCAL_EVENT_ID);
> +		rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
> +		hw_dom->mbm_local_cfg = msrval & MAX_EVT_CONFIG_BITS;
> +	}
> +}
> +
>   void __exit rdt_put_mon_l3_config(void)
>   {
>   	dom_data_exit();
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index b3d3fa048f15..b2b751741dd8 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1606,7 +1606,7 @@ struct mon_config_info {
>    *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
>    *         INVALID_CONFIG_INDEX for invalid evtid
>    */
> -static inline unsigned int mon_event_config_index_get(u32 evtid)
> +unsigned int mon_event_config_index_get(u32 evtid)
>   {
>   	switch (evtid) {
>   	case QOS_L3_MBM_TOTAL_EVENT_ID:

Reinette

* Re: [PATCH v5 13/20] x86/resctrl: Add the interface to assign hardware counter
  2024-07-03 21:48 ` [PATCH v5 13/20] x86/resctrl: Add the interface to assign hardware counter Babu Moger
@ 2024-07-12 22:09   ` Reinette Chatre
  2024-07-16 20:45     ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:09 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID and monitor the bandwidth as long as it is assigned.
> The assigned RMID will be tracked by the hardware until the user unassigns
> it manually.
> 
> Individual counters are configured by writing to L3_QOS_ABMC_CFG MSR
> and specifying the counter id, bandwidth source, and bandwidth types.
> 
> Provide the interface to assign the counter ids to RMID.
> 

Again this is a mix of a couple of layers: this single patch
introduces fs code (mbm_cntr_alloc() and rdtgroup_assign_cntr()) as well
as architecture specific code (resctrl_arch_assign_cntr() and rdtgroup_abmc_cfg()).
Lumping this all together without any guidance to the reader makes this very difficult
to navigate. This work needs to be split into fs and arch parts with
clear descriptions of how the layers interact.

> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>      Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>      Monitoring (ABMC).
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> v5: Few name changes to match cntr_id.
>      Changed the function names to
>      rdtgroup_assign_cntr
>      resctr_arch_assign_cntr
>      More comments on commit log.
>      Added function summary.
> 
> v4: Commit message update.
>      User bitmap APIs where applicable.
>      Changed the interfaces considering MPAM(arm).
>      Added domain specific assignment.
> 
> v3: Removed the static from the prototype of rdtgroup_assign_abmc.
>      The function is not called directly from user anymore. These
>      changes are related to global assignment interface.
> 
> v2: Minor text changes in commit message.
> ---
>   arch/x86/kernel/cpu/resctrl/internal.h |  3 +
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 96 ++++++++++++++++++++++++++
>   2 files changed, 99 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 6925c947682d..66460375056c 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -708,6 +708,9 @@ void __init resctrl_file_fflags_init(const char *config,
>   				     unsigned long fflags);
>   void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
>   unsigned int mon_event_config_index_get(u32 evtid);
> +int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, u32 rmid,
> +			     u32 cntr_id, u32 closid, bool enable);
> +int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, u32 evtid);
>   void rdt_staged_configs_clear(void);
>   bool closid_allocated(unsigned int closid);
>   int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index d2663f1345b7..44f6eff42c30 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -202,6 +202,19 @@ static void mbm_cntrs_init(void)
>   	mbm_cntrs_free_map_len = r->mon.num_mbm_cntrs;
>   }
>   
> +static int mbm_cntr_alloc(void)
> +{
> +	u32 cntr_id = find_first_bit(&mbm_cntrs_free_map,
> +				     mbm_cntrs_free_map_len);
> +
> +	if (cntr_id >= mbm_cntrs_free_map_len)
> +		return -ENOSPC;
> +
> +	__clear_bit(cntr_id, &mbm_cntrs_free_map);
> +
> +	return cntr_id;
> +}
> +
>   /**
>    * rdtgroup_mode_by_closid - Return mode of resource group with closid
>    * @closid: closid if the resource group
> @@ -1860,6 +1873,89 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>   	return ret ?: nbytes;
>   }
>   
> +static void rdtgroup_abmc_cfg(void *info)
> +{
> +	u64 *msrval = info;
> +
> +	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
> +}
> +
> +/*
> + * Send an IPI to the domain to assign the counter id to RMID.
> + */
> +int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, u32 rmid,

u32 evtid -> enum resctrl_event_id evtid
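
For illustration, a small userspace sketch of the index lookup with an
enum parameter. The enum mirrors the event names used in the patch, but
its numeric values, the sentinel and the mock_* names are assumptions
made for this example. The 0/1/invalid mapping follows the comment
quoted from rdtgroup.c.

```c
#include <limits.h>

/* Mirrors the event names in the patch; numeric values are assumptions. */
enum mock_event_id {
	MOCK_L3_OCCUP_EVENT_ID		= 1,
	MOCK_L3_MBM_TOTAL_EVENT_ID	= 2,
	MOCK_L3_MBM_LOCAL_EVENT_ID	= 3,
};

/* Hypothetical sentinel for "no configuration slot". */
#define MOCK_INVALID_CONFIG_INDEX	UINT_MAX

static unsigned int mock_event_config_index_get(enum mock_event_id evtid)
{
	switch (evtid) {
	case MOCK_L3_MBM_TOTAL_EVENT_ID:
		return 0;
	case MOCK_L3_MBM_LOCAL_EVENT_ID:
		return 1;
	default:
		/* Occupancy and anything else has no configuration slot. */
		return MOCK_INVALID_CONFIG_INDEX;
	}
}
```

The enum type documents the set of valid arguments in the prototype
itself, which a bare u32 does not.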

> +			     u32 cntr_id, u32 closid, bool enable)
> +{
> +	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
> +	struct arch_mbm_state *arch_mbm;
> +
> +	abmc_cfg.split.cfg_en = 1;
> +	abmc_cfg.split.cntr_en = enable ? 1 : 0;
> +	abmc_cfg.split.cntr_id = cntr_id;
> +	abmc_cfg.split.bw_src = rmid;
> +
> +	/* Update the event configuration from the domain */
> +	if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
> +		abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
> +		arch_mbm = &hw_dom->arch_mbm_total[rmid];
> +	} else {
> +		abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
> +		arch_mbm = &hw_dom->arch_mbm_local[rmid];
> +	}
> +
> +	smp_call_function_any(&d->hdr.cpu_mask, rdtgroup_abmc_cfg, &abmc_cfg, 1);
> +
> +	/*
> +	 * Reset the architectural state so that reading of hardware
> +	 * counter is not considered as an overflow in next update.
> +	 */
> +	if (arch_mbm)
> +		memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
> +
> +	return 0;
> +}
> +
> +/*
> + * Assign a hardware counter id to the group. Allocate a new counter id
> + * if the event is unassigned.
> + */
> +int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, u32 evtid)

u32 evtid -> enum resctrl_event_id evtid

> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +	int cntr_id = 0, index;
> +	struct rdt_mon_domain *d;

Please use reverse fir tree order for these declarations.

> +
> +	index = mon_event_config_index_get(evtid);
> +	if (index == INVALID_CONFIG_INDEX) {
> +		rdt_last_cmd_puts("Invalid event id\n");

This is a kernel bug and can be a WARN (once) instead. No need to message user space.
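
A userspace sketch of the warn-once pattern, to show the intended
behavior: the condition is reported the first time it triggers and
every call still fails. mock_warn_once(), mock_warn_count and the
other mock_* names are hypothetical imitations, not the kernel macro.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings were actually printed. */
static unsigned int mock_warn_count;

/*
 * Userspace imitation of a warn-once: report the first time cond is
 * true, and always evaluate to cond so it can be used in an if ().
 */
static bool mock_warn_once(bool cond, bool *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = true;
		mock_warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}

	return cond;
}

#define MOCK_INVALID_INDEX	(~0U)	/* hypothetical sentinel */

static int mock_assign_cntr(unsigned int index)
{
	static bool warned;

	/* An invalid index is a caller bug, so warn instead of telling the user. */
	if (mock_warn_once(index == MOCK_INVALID_INDEX, &warned,
			   "invalid event index"))
		return -EINVAL;

	return 0;
}
```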

> +		return -EINVAL;
> +	}
> +
> +	/* Nothing to do if event has been assigned already */
> +	if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET) {
> +		rdt_last_cmd_puts("ABMC counter is assigned already\n");
> +		return 0;
> +	}
> +
> +	/*
> +	 * Allocate a new counter id and update domains
> +	 */
> +	cntr_id = mbm_cntr_alloc();
> +	if (cntr_id < 0) {
> +		rdt_last_cmd_puts("Out of ABMC counters\n");
> +		return -ENOSPC;
> +	}
> +
> +	rdtgrp->mon.cntr_id[index] = cntr_id;
> +
> +	list_for_each_entry(d, &r->mon_domains, hdr.list)
> +		resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
> +					 cntr_id, rdtgrp->closid, 1);
> +
> +	return 0;
> +}
> +
>   /* rdtgroup information files for one cache resource. */
>   static struct rftype res_common_files[] = {
>   	{

Reinette

* Re: [PATCH v5 11/20] x86/resctrl: Remove MSR reading of event configuration value
  2024-07-03 21:48 ` [PATCH v5 11/20] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
@ 2024-07-12 22:10   ` Reinette Chatre
  2024-07-16 19:34     ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:10 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> The event configuration is domain specific and initialized during domain
> initialization. It is not required to read the configuration register
> every time user asks for it. Use the value stored in rdt_mon_hw_domain

rdt_mon_hw_domain -> rdt_hw_mon_domain

> instead. Also update the configuration value when user writes it.

Please separate the context/problem/solution clearly.

> 
> Introduce resctrl_arch_event_config_get() and
> resctrl_arch_event_config_set() to get/set architecture domain specific
> mbm_total_cfg/mbm_local_cfg values.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v5: Introduced resctrl_arch_event_config_get() and
>      resctrl_arch_event_config_set() based on our discussion.
>      https://lore.kernel.org/lkml/68e861f9-245d-4496-a72e-46fc57d19c62@amd.com/
> 
> v4: New patch.
> ---
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 112 +++++++++++++++----------
>   include/linux/resctrl.h                |   4 +
>   2 files changed, 72 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index b2b751741dd8..91c5d45ac367 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1591,10 +1591,59 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
>   }
>   
>   struct mon_config_info {
> +	struct rdt_mon_domain *d;
>   	u32 evtid;
>   	u32 mon_config;
>   };

as seen above, mon_config is a u32

>   
> +#define INVALID_CONFIG_VALUE   UINT_MAX

So an invalid config value can be U32_MAX?

> +
> +unsigned int resctrl_arch_event_config_get(struct rdt_mon_domain *d,
> +					   enum resctrl_event_id eventid)
> +{
> +	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> +
> +	switch (eventid) {
> +	case QOS_L3_OCCUP_EVENT_ID:
> +		break;
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		return hw_dom->mbm_total_cfg;
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		return hw_dom->mbm_local_cfg;
> +	}
> +
> +	/* Never expect to get here */
> +	WARN_ON_ONCE(1);
> +
> +	return INVALID_CONFIG_VALUE;
> +}
> +
> +void resctrl_arch_event_config_set(void *info)
> +{
> +	struct mon_config_info *mon_info = info;
> +	struct rdt_hw_mon_domain *hw_dom;
> +	unsigned int index;
> +
> +	index = mon_event_config_index_get(mon_info->evtid);
> +	if (index == INVALID_CONFIG_VALUE) {

INVALID_CONFIG_INDEX?

> +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> +		return;
> +	}
> +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> +
> +	hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
> +
> +	switch (mon_info->evtid) {
> +	case QOS_L3_OCCUP_EVENT_ID:
> +		break;
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		hw_dom->mbm_total_cfg = mon_info->mon_config;
> +		break;
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		hw_dom->mbm_local_cfg =  mon_info->mon_config;

Please add a break here.

> +	}
> +}
> +
>   #define INVALID_CONFIG_INDEX   UINT_MAX
>   
>   /**
> @@ -1619,33 +1668,11 @@ unsigned int mon_event_config_index_get(u32 evtid)
>   	}
>   }
>   
> -static void mon_event_config_read(void *info)
> -{
> -	struct mon_config_info *mon_info = info;
> -	unsigned int index;
> -	u64 msrval;
> -
> -	index = mon_event_config_index_get(mon_info->evtid);
> -	if (index == INVALID_CONFIG_INDEX) {
> -		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> -		return;
> -	}
> -	rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
> -
> -	/* Report only the valid event configuration bits */
> -	mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
> -}
> -
> -static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
> -{
> -	smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
> -}
> -
>   static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
>   {
> -	struct mon_config_info mon_info = {0};
>   	struct rdt_mon_domain *dom;
>   	bool sep = false;
> +	int val;
>   
>   	cpus_read_lock();
>   	mutex_lock(&rdtgroup_mutex);
> @@ -1654,11 +1681,13 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
>   		if (sep)
>   			seq_puts(s, ";");
>   
> -		memset(&mon_info, 0, sizeof(struct mon_config_info));
> -		mon_info.evtid = evtid;
> -		mondata_config_read(dom, &mon_info);
> +		val = resctrl_arch_event_config_get(dom, evtid);

There are too many types used interchangeably. The mon_config is a "u32", but the new function
returns "unsigned int", which is then assigned to an "int". Please just use one type
consistently, it is a u32 so resctrl_arch_event_config_get() can return u32 and "val" should
be u32.
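
To illustrate the single-type idea, here is a minimal userspace sketch, not
kernel code. The struct, enum and function names with a "_model" suffix are
hypothetical stand-ins, and the event id values are assumptions made only so
the example is self-contained. The configuration value stays a u32 from the
getter through to the formatted output:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t u32;

#define INVALID_CONFIG_VALUE	UINT32_MAX

/* Assumed event ids, for illustration only. */
enum event_id_model {
	EVT_OCCUP	= 1,
	EVT_MBM_TOTAL	= 2,
	EVT_MBM_LOCAL	= 3,
};

/* Hypothetical stand-in for the per-domain cached configuration. */
struct hw_mon_domain_model {
	u32 mbm_total_cfg;
	u32 mbm_local_cfg;
};

/* Returns u32, matching the type of the stored configuration. */
static u32 event_config_get_model(const struct hw_mon_domain_model *hw_dom,
				  enum event_id_model evtid)
{
	switch (evtid) {
	case EVT_MBM_TOTAL:
		return hw_dom->mbm_total_cfg;
	case EVT_MBM_LOCAL:
		return hw_dom->mbm_local_cfg;
	case EVT_OCCUP:
		break;
	}

	return INVALID_CONFIG_VALUE;
}

/* Formats one "<domain id>=<config>" entry; "val" is a u32 as well. */
static int format_config_model(char *buf, size_t len, int dom_id,
			       const struct hw_mon_domain_model *hw_dom,
			       enum event_id_model evtid)
{
	u32 val;

	assert(buf && len);
	val = event_config_get_model(hw_dom, evtid);

	return snprintf(buf, len, "%d=0x%02" PRIx32, dom_id, val);
}
```

With one type throughout there is no signed/unsigned conversion to reason
about when comparing against INVALID_CONFIG_VALUE.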

> +		if (val == INVALID_CONFIG_VALUE) {
> +			rdt_last_cmd_puts("Invalid event configuration\n");

I do not see a reason to print message to user space here. If this error is encountered
then it is a kernel bug and resctrl_arch_event_config_get() would already have triggered
a WARN.

Since this is a "never should happen" scenario I wonder if we can not just print
the INVALID_CONFIG_VALUE to user space?


> +			break;
> +		}
>   
> -		seq_printf(s, "%d=0x%02x", dom->hdr.id, mon_info.mon_config);
> +		seq_printf(s, "%d=0x%02x", dom->hdr.id, val);
>   		sep = true;
>   	}
>   	seq_puts(s, "\n");
> @@ -1689,33 +1718,27 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>   	return 0;
>   }
>   
> -static void mon_event_config_write(void *info)
> -{
> -	struct mon_config_info *mon_info = info;
> -	unsigned int index;
> -
> -	index = mon_event_config_index_get(mon_info->evtid);
> -	if (index == INVALID_CONFIG_INDEX) {
> -		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> -		return;
> -	}
> -	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> -}
>   
>   static void mbm_config_write_domain(struct rdt_resource *r,
>   				    struct rdt_mon_domain *d, u32 evtid, u32 val)
>   {
>   	struct mon_config_info mon_info = {0};
> +	int config_val;
>   
>   	/*
> -	 * Read the current config value first. If both are the same then
> +	 * Check the current config value first. If both are the same then
>   	 * no need to write it again.
>   	 */
> -	mon_info.evtid = evtid;
> -	mondata_config_read(d, &mon_info);
> -	if (mon_info.mon_config == val)
> +	config_val = resctrl_arch_event_config_get(d, evtid);
> +	if (config_val == INVALID_CONFIG_VALUE) {
> +		rdt_last_cmd_puts("Invalid event configuration\n");

Same here about the unneeded print to user space. When this is encountered it is
a kernel bug.

> +		return;
> +	}
> +	if (config_val == val)
>   		return;
>   
> +	mon_info.d = d;
> +	mon_info.evtid = evtid;
>   	mon_info.mon_config = val;
>   
>   	/*
> @@ -1724,7 +1747,8 @@ static void mbm_config_write_domain(struct rdt_resource *r,
>   	 * are scoped at the domain level. Writing any of these MSRs
>   	 * on one CPU is observed by all the CPUs in the domain.
>   	 */
> -	smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_write,
> +	smp_call_function_any(&d->hdr.cpu_mask,
> +			      resctrl_arch_event_config_set,
>   			      &mon_info, 1);
>   
>   	/*
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 62f0f002ef41..f017258ebf85 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -352,6 +352,10 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
>    */
>   void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
>   
> +void resctrl_arch_event_config_set(void *info);
> +unsigned int resctrl_arch_event_config_get(struct rdt_mon_domain *d,
> +					   enum resctrl_event_id eventid);
> +
>   extern unsigned int resctrl_rmid_realloc_threshold;
>   extern unsigned int resctrl_rmid_realloc_limit;
>   

Reinette

* Re: [PATCH v5 15/20] x86/resctrl: Assign/unassign counters by default when ABMC is enabled
  2024-07-03 21:48 ` [PATCH v5 15/20] x86/resctrl: Assign/unassign counters by default when ABMC is enabled Babu Moger
@ 2024-07-12 22:10   ` Reinette Chatre
  2024-07-16 20:58     ` Moger, Babu
  2024-07-26 23:22   ` Peter Newman
  1 sibling, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:10 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> Assign/unassign counters on resctrl group creation/deletion. If the
> counters are exhausted, report the warnings and continue. It is not
> required to fail group creation for assignment failures. Users have
> the option to modify the assignments later.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v5: Removed the code to enable/disable ABMC during the mount.
>      That will be another patch.
>      Added arch callers to get the arch specific data.
>      Renamed functions to match the other abmc function.
>      Added code comments for assignment failures.
> 
> v4: Few name changes based on the upstream discussion.
>      Commit message update.
> 
> v3: This is a new patch. Patch addresses the upstream comment to enable
>      ABMC feature by default if the feature is available.
> ---
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 78 ++++++++++++++++++++++++++
>   1 file changed, 78 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index ffde30b36c1a..475a0c7b2a25 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2910,6 +2910,46 @@ static void schemata_list_destroy(void)
>   	}
>   }
>   
> +/*
> + * Called when new group is created. Assign the counters if ABMC is
> + * already enabled. Two counters are required per group, one for total
> + * event and one for local event. With limited number of counters,
> + * the assignments can fail in some cases. But, it is not required to
> + * fail the group creation. Users have the option to modify the
> + * assignments after the group creation.
> + */
> +static int rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
> +{
> +	int ret = 0;
> +
> +	if (!resctrl_arch_get_abmc_enabled())
> +		return 0;
> +
> +	if (is_mbm_total_enabled())
> +		ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> +	if (!ret && is_mbm_local_enabled())
> +		ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
> +
> +	return ret;
> +}
> +
> +static int rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
> +{
> +	int ret = 0;
> +
> +	if (!resctrl_arch_get_abmc_enabled())
> +		return 0;
> +
> +	if (is_mbm_total_enabled())
> +		ret = rdtgroup_unassign_cntr(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> +	if (!ret && is_mbm_local_enabled())
> +		ret = rdtgroup_unassign_cntr(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
> +
> +	return ret;
> +}
> +
>   static int rdt_get_tree(struct fs_context *fc)
>   {
>   	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> @@ -2972,6 +3012,16 @@ static int rdt_get_tree(struct fs_context *fc)
>   		if (ret < 0)
>   			goto out_mongrp;
>   		rdtgroup_default.mon.mon_data_kn = kn_mondata;
> +
> +		/*
> +		 * Assign the counters if ABMC is already enabled.
> +		 * With limited number of counters, the assignments can
> +		 * fail in some cases. But, it is not required to fail
> +		 * the group creation. Users have the option to modify
> +		 * the assignments after the group creation.
> +		 */

The function has detailed comments - it seems unnecessary to me that the
same comments are duplicated at each call site.

> +		if (rdtgroup_assign_cntrs(&rdtgroup_default) < 0)
> +			rdt_last_cmd_puts("Monitor assignment failed\n");

rdtgroup_assign_cntrs() already prints message, why print another? Error
handling can then be dropped.

>   	}
>   
>   	ret = rdt_pseudo_lock_init();
> @@ -3246,6 +3296,8 @@ static void rdt_kill_sb(struct super_block *sb)
>   	cpus_read_lock();
>   	mutex_lock(&rdtgroup_mutex);
>   
> +	rdtgroup_unassign_cntrs(&rdtgroup_default);
> +

This seems appropriate to be in the "Put everything back to default values"
section.

>   	rdt_disable_ctx();
>   
>   	/*Put everything back to default values. */
> @@ -3850,6 +3902,16 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
>   		goto out_unlock;
>   	}
>   
> +	/*
> +	 * Assign the counters if ABMC is already enabled.
> +	 * With the limited number of counters, there can be cases
> +	 * only on assignment succeed. It is not required to fail
> +	 * here in that case. Users have the option to modify the
> +	 * assignments later.
> +	 */
> +	if (rdtgroup_assign_cntrs(rdtgrp) < 0)
> +		rdt_last_cmd_puts("Monitor assignment failed\n");
> +
>   	kernfs_activate(rdtgrp->kn);
>   
>   	/*
> @@ -3894,6 +3956,17 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>   	if (ret)
>   		goto out_closid_free;
>   
> +	/*
> +	 * Assign the counters if ABMC is already enabled.
> +	 * With the limited number of counters, there can be cases
> +	 * only on assignment succeed. It is not required to fail
> +	 * here in that case. Users have the option to assign the
> +	 * counter later.
> +	 */
> +
> +	if (rdtgroup_assign_cntrs(rdtgrp) < 0)
> +		rdt_last_cmd_puts("Monitor assignment failed\n");
> +
>   	kernfs_activate(rdtgrp->kn);
>   
>   	ret = rdtgroup_init_alloc(rdtgrp);
> @@ -3989,6 +4062,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>   	update_closid_rmid(tmpmask, NULL);
>   
>   	rdtgrp->flags = RDT_DELETED;
> +
> +	rdtgroup_unassign_cntrs(rdtgrp);
> +
>   	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>   
>   	/*
> @@ -4035,6 +4111,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>   	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
>   	update_closid_rmid(tmpmask, NULL);
>   
> +	rdtgroup_unassign_cntrs(rdtgrp);
> +
>   	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>   	closid_free(rdtgrp->closid);
>   

Reinette

* Re: [PATCH v5 12/20] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-07-03 21:48 ` [PATCH v5 12/20] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2024-07-12 22:13   ` Reinette Chatre
  2024-07-16 20:24     ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:13 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID and monitor the bandwidth as long as the counter
> is assigned. The bandwidth events will be tracked by the hardware until
> the user changes the configuration. Each resctrl group can configure
> maximum two counters, one for total event and one for local event.
> 
> The counters are configured by writing to MSR L3_QOS_ABMC_CFG.
> Configuration is done by setting the counter id, bandwidth source (RMID)
> and bandwidth configuration supported by BMEC(Bandwidth Monitoring Event
> Configuration). Reading L3_QOS_ABMC_DSC returns the configuration of the
> counter id specified in L3_QOS_ABMC_CFG.
> 
> Attempts to read or write these MSRs when ABMC is not enabled will result
> in a #GP(0) exception.
> 
> Introduce data structures and definitions for ABMC assignments.
> 
> MSR L3_QOS_ABMC_CFG (0xC000_03FDh) and L3_QOS_ABMC_DSC (0xC000_03FEh)
> details.
> =========================================================================
> Bits 	Mnemonic	Description			Access Reset
> 							Type   Value
> =========================================================================
> 63 	CfgEn 		Configuration Enable 		R/W 	0
> 
> 62 	CtrEn 		Enable/disable Tracking		R/W 	0
> 
> 61:53 	– 		Reserved 			MBZ 	0
> 
> 52:48 	CtrID 		Counter Identifier		R/W	0
> 
> 47 	IsCOS		BwSrc field is a CLOSID		R/W	0
> 			(not an RMID)
> 
> 46:44 	–		Reserved			MBZ	0
> 
> 43:32	BwSrc		Bandwidth Source		R/W	0
> 			(RMID or CLOSID)
> 
> 31:0	BwType		Bandwidth configuration		R/W	0
> 			to track for this counter
> ==========================================================================
> 
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).

The changelog only describes the hardware interface, yet the patch contains
both the hardware interface and new driver support for that interface.

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> v5: Moved assignment flags here (patch 10/19 of v4).
>      Added MON_CNTR_UNSET definition to initialize cntr_id's.
>      More details in commit log.
>      Renamed few fields in l3_qos_abmc_cfg for readability.
> 
> v4: Added more descriptions.
>      Changed the name abmc_ctr_id to ctr_id.
>      Added L3_QOS_ABMC_DSC. Used for reading the configuration.
> 
> v3: No changes.
> 
> v2: No changes.
> ---
>   arch/x86/include/asm/msr-index.h       |  2 ++
>   arch/x86/kernel/cpu/resctrl/internal.h | 40 ++++++++++++++++++++++++++
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 18 ++++++++++++
>   3 files changed, 60 insertions(+)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 263b2d9d00ed..5e44ff91f459 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1175,6 +1175,8 @@
>   #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>   #define MSR_IA32_EVT_CFG_BASE		0xc0000400
>   #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
> +#define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
> +#define MSR_IA32_L3_QOS_ABMC_DSC	0xc00003fe
>   
>   /* MSR_IA32_VMX_MISC bits */
>   #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 4cb1a5d014a3..6925c947682d 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -100,6 +100,18 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
>   /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
>   #define ABMC_ENABLE			BIT(0)
>   
> +/*
> + * Assignment flags for ABMC feature
> + */
> +#define ASSIGN_NONE	0
> +#define ASSIGN_TOTAL	BIT(QOS_L3_MBM_TOTAL_EVENT_ID)
> +#define ASSIGN_LOCAL	BIT(QOS_L3_MBM_LOCAL_EVENT_ID)

These flags do not appear to be part of the hardware interface and there
is no explanation of what they mean or how they will be used. They are
also not used in this patch. It is thus not possible to understand whether
they belong in this patch or are appropriate in this work.

> +
> +#define MON_CNTR_UNSET	U32_MAX
> +
> +/* Maximum assignable counters per resctrl group */
> +#define MAX_CNTRS	2
> +
>   struct rdt_fs_context {
>   	struct kernfs_fs_context	kfc;
>   	bool				enable_cdpl2;
> @@ -228,12 +240,14 @@ enum rdtgrp_mode {
>    * @parent:			parent rdtgrp
>    * @crdtgrp_list:		child rdtgroup node list
>    * @rmid:			rmid for this rdtgroup
> + * @cntr_id:			ABMC counter ids assigned to this group

struct mongroup is private to resctrl fs so it cannot contain an
architecture specific feature. Having it contain a generic "cntr_id"
may be ok at this point, but it should not be termed "ABMC counter".

>    */
>   struct mongroup {
>   	struct kernfs_node	*mon_data_kn;
>   	struct rdtgroup		*parent;
>   	struct list_head	crdtgrp_list;
>   	u32			rmid;
> +	u32			cntr_id[MAX_CNTRS];

This is a significant addition yet is silently included as part of a patch
that just introduces hardware interface. This is how resctrl will manage
the hardware counters. It is significant since this is what dictates that it
is resctrl fs that will manage the counters, which makes it important which
interfaces are made available and from where it is called. Through
this series I have also not come across a description of this architecture.
With this introduction counters are maintained per monitor group, yet
the new interface supports assigning counters per domain. There
is no high level explanation of this architecture and the reader is forced
to decipher it from the implementation, making this work harder to review
than necessary.

Would it be possible to present the fs and architecture code
separately? I think doing so will make it easier to understand.

>   };
>   
>   /**
> @@ -607,6 +621,32 @@ union cpuid_0x10_x_edx {
>   	unsigned int full;
>   };
>   
> +/*
> + * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
> + * @bw_type		: Bandwidth configuration(supported by BMEC)
> + *			  to track this counter id.

Does "to track this counter id" mean "tracked by @cntr_id"?

> + * @bw_src		: Bandwidth Source (RMID or CLOSID).

Please do not capitalize words mid sentence, like "Source"
above, "Identifier", and "Enable" in two instances below.

> + * @reserved1		: Reserved.
> + * @is_clos		: BwSrc field is a CLOSID (not an RMID).

Just stick to @bw_src.

> + * @cntr_id		: Counter Identifier.
> + * @reserved		: Reserved.
> + * @cntr_en		: Tracking Enable bit.

Can this be more detailed about what happens when this bit is set/clear?

> + * @cfg_en		: Configuration Enable bit.

What is difference between "configuration enable" and "tracking enable"?
What is relationship, if any, to @bw_type that is the bandwidth configuration?

> + */
> +union l3_qos_abmc_cfg {
> +	struct {
> +		unsigned long	bw_type	:32,
> +				bw_src	:12,
> +				reserved1: 3,
> +				is_clos	: 1,
> +				cntr_id	: 5,
> +				reserved : 9,
> +				cntr_en	: 1,
> +				cfg_en	: 1;
> +	} split;

Please check the spacing in this data structure. Tabs are used inconsistently
and the members are not lining up either.
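
As a rough illustration of consistent alignment, the userspace sketch below
lines the members up and checks that they land on the bit positions listed in
the changelog table. It is an assumption-laden model, not the patch itself:
the "_model" names and the encode helper are hypothetical, uint64_t is used
instead of unsigned long to keep it portable outside the kernel, and
bit-field layout is implementation-defined, so the expected values assume the
usual little-endian x86-64 GCC/Clang allocation from the least significant
bit upwards.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the L3_QOS_ABMC_CFG layout described above. */
union l3_qos_abmc_cfg_model {
	struct {
		uint64_t	bw_type		:32,	/* bits 31:0  */
				bw_src		:12,	/* bits 43:32 */
				reserved1	: 3,	/* bits 46:44 */
				is_clos		: 1,	/* bit  47    */
				cntr_id		: 5,	/* bits 52:48 */
				reserved	: 9,	/* bits 61:53 */
				cntr_en		: 1,	/* bit  62    */
				cfg_en		: 1;	/* bit  63    */
	} split;
	uint64_t full;
};

/* Hypothetical helper: build the register value for one counter. */
static uint64_t abmc_cfg_encode_model(uint32_t cntr_id, uint32_t rmid,
				      uint32_t bw_type, int enable)
{
	union l3_qos_abmc_cfg_model cfg = { .full = 0 };

	/* cntr_id is a 5 bit field and bw_src a 12 bit field. */
	assert(cntr_id < 32 && rmid < 4096);

	cfg.split.cfg_en = 1;
	cfg.split.cntr_en = enable ? 1 : 0;
	cfg.split.cntr_id = cntr_id;
	cfg.split.bw_src = rmid;
	cfg.split.bw_type = bw_type;

	return cfg.full;
}
```

A check like this makes a mismatch between the bit-field order and the
documented register layout visible immediately.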

> +	unsigned long full;
> +};
> +
>   void rdt_last_cmd_clear(void);
>   void rdt_last_cmd_puts(const char *s);
>   __printf(1, 2)
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 91c5d45ac367..d2663f1345b7 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2505,6 +2505,7 @@ static void resctrl_abmc_set_one_amd(void *arg)
>   
>   static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
>   {
> +	struct rdtgroup *prgrp, *crgrp;
>   	struct rdt_mon_domain *d;
>   
>   	/*
> @@ -2513,6 +2514,17 @@ static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
>   	 */
>   	mbm_cntrs_init();
>   
> +	/* Reset the cntr_id's for all the monitor groups */
> +	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> +		prgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
> +		prgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
> +		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list,
> +				    mon.crdtgrp_list) {
> +			crgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
> +			crgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
> +		}
> +	}
> +

No. The counters are in the monitor group that is a structure that is private
to the fs. The architecture code should not be accessing it. This should only be
done by fs code.
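
One possible shape of that split, as a userspace sketch only: the arch helper
changes the (modelled) hardware mode and knows nothing about monitor groups,
while the fs side resets its own bookkeeping afterwards. All names ending in
"_model", and both fs_* helpers, are hypothetical and exist only for this
illustration; a flat array stands in for the real group lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MON_CNTR_UNSET	UINT32_MAX
#define MAX_CNTRS	2

/* fs-private state: only the fs layer reads or writes this. */
struct mongroup_model {
	uint32_t cntr_id[MAX_CNTRS];
};

/* Arch layer: switches the modelled hardware mode, nothing else. */
static int arch_abmc_set_model(bool *hw_abmc_enabled, bool enable)
{
	*hw_abmc_enabled = enable;
	return 0;
}

/* fs layer: forget every counter assignment it is tracking. */
static void fs_cntrs_reset_all(struct mongroup_model *groups, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		for (size_t j = 0; j < MAX_CNTRS; j++)
			groups[i].cntr_id[j] = MON_CNTR_UNSET;
}

/* fs layer: ask arch to switch mode, then reset fs bookkeeping. */
static int fs_mbm_mode_switch(struct mongroup_model *groups, size_t nr,
			      bool *hw_abmc_enabled, bool enable)
{
	int ret;

	assert(groups && hw_abmc_enabled);

	ret = arch_abmc_set_model(hw_abmc_enabled, enable);
	if (ret)
		return ret;

	fs_cntrs_reset_all(groups, nr);
	return 0;
}
```

The point of the sketch is only the direction of the calls: fs code calls
into arch code, and arch code never reaches back into struct mongroup.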

>   	/*
>   	 * Hardware counters will reset after switching the monitor mode.
>   	 * Reset the architectural state so that reading of hardware
> @@ -3573,6 +3585,8 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
>   		return ret;
>   	}
>   	rdtgrp->mon.rmid = ret;
> +	rdtgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
> +	rdtgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
>   
>   	ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
>   	if (ret) {
> @@ -4128,6 +4142,10 @@ static void __init rdtgroup_setup_default(void)
>   	rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
>   	rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
>   	rdtgroup_default.type = RDTCTRL_GROUP;
> +
> +	rdtgroup_default.mon.cntr_id[0] = MON_CNTR_UNSET;
> +	rdtgroup_default.mon.cntr_id[1] = MON_CNTR_UNSET;
> +
>   	INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
>   
>   	list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);

Reinette

* Re: [PATCH v5 16/20] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode
  2024-07-03 21:48 ` [PATCH v5 16/20] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode Babu Moger
@ 2024-07-12 22:13   ` Reinette Chatre
  2024-07-16 21:04     ` Moger, Babu
  2024-07-13 20:26   ` Markus Elfring
  1 sibling, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:13 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> In ABMC mode, the hardware counter should be assigned to read the MBM
> events.
> 
> Report "Unassigned" in case the user attempts to read the events without
> assigning the counter.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v5: New patch.
> ---
>   Documentation/arch/x86/resctrl.rst        |  4 ++++
>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 19 ++++++++++++++-----
>   2 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 4907d0758118..11b7a5f26b40 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -284,6 +284,10 @@ with the following files:
>   	until the user unassigns it manually. There is no need to worry
>   	about counters being reset during this period.
>   
> +	In ABMC mode, the MBM event counters will return "Unassigned" if
> +	the hardware counter is not assigned to the event. Users need to
> +	assign a counter manually to read the events.

This no longer seems accurate with counters assigned by default.

> +
>   	Without ABMC enabled, monitoring will work in "legacy" mode
>   	without assignment option.
>   
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 50fa1fe9a073..e60b469b7d12 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -562,7 +562,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>   	struct rdtgroup *rdtgrp;
>   	struct rdt_resource *r;
>   	union mon_data_bits md;
> -	int ret = 0;
> +	int ret = 0, index;
>   
>   	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>   	if (!rdtgrp) {
> @@ -609,12 +609,21 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>   
>   checkresult:
>   
> -	if (rr.err == -EIO)
> +	if (rr.err == -EIO) {
>   		seq_puts(m, "Error\n");
> -	else if (rr.err == -EINVAL)
> -		seq_puts(m, "Unavailable\n");
> -	else
> +	} else if (rr.err == -EINVAL) {
> +		if (resctrl_arch_get_abmc_enabled()) {
> +			index = mon_event_config_index_get(evtid);
> +			if (rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET)
> +				seq_puts(m, "Unassigned\n");
> +			else
> +				seq_puts(m, "Unavailable\n");
> +		} else {
> +			seq_puts(m, "Unavailable\n");
> +		}
> +	} else {
>   		seq_printf(m, "%llu\n", rr.val);
> +	}
>   

This still attempts to read from hardware, which is futile when it is
already known that no counter is assigned. Why not just print "Unassigned"
right away, without trying to read data from hardware when the read is
known to fail?
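
A minimal userspace sketch of the early check, under stated assumptions: the
hardware read is replaced by a hypothetical stub that always succeeds and
counts how often it is called, and the function names are made up for this
example. The "Unassigned" case returns before any read is attempted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MON_CNTR_UNSET	UINT32_MAX

/* Counts modelled hardware reads so the early return can be observed. */
static int hw_reads;

/* Hypothetical stand-in for the hardware read; always succeeds here. */
static int read_hw_model(unsigned long long *val)
{
	hw_reads++;
	*val = 4096;
	return 0;
}

static void mondata_show_model(bool abmc_enabled, uint32_t cntr_id,
			       char *buf, size_t len)
{
	unsigned long long val;

	assert(buf && len);

	/* No counter assigned: report it without touching hardware. */
	if (abmc_enabled && cntr_id == MON_CNTR_UNSET) {
		snprintf(buf, len, "Unassigned\n");
		return;
	}

	if (read_hw_model(&val))
		snprintf(buf, len, "Unavailable\n");
	else
		snprintf(buf, len, "%llu\n", val);
}
```

Legacy mode is unaffected in this model since the check only applies when
ABMC is enabled.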

>   out:
>   	rdtgroup_kn_unlock(of->kn);

Reinette

* Re: [PATCH v5 17/20] x86/resctrl: Introduce the interface switch between monitor modes
  2024-07-03 21:48 ` [PATCH v5 17/20] x86/resctrl: Introduce the interface switch between monitor modes Babu Moger
@ 2024-07-12 22:14   ` Reinette Chatre
  2024-07-16 22:46     ` Moger, Babu
  2024-07-13  7:15   ` Markus Elfring
  1 sibling, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:14 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> Introduce interface to switch between ABMC and legacy modes.
> 
> By default ABMC is enabled on boot if the feature is available.
> Provide the interface to go back to legacy mode if required.
> 
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> [abmc]
> legacy
> 
> To enable the legacy monitoring feature:
> $ echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v4: Minor commit text changes. Keep the default to ABMC when supported.
>      Fixed comments to reflect changed interface "mbm_mode".
> 
> v3: New patch to address the review comments from upstream.
> ---
>   Documentation/arch/x86/resctrl.rst     | 10 +++++++
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 37 +++++++++++++++++++++++++-
>   2 files changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 11b7a5f26b40..4c41c5622627 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -291,6 +291,16 @@ with the following files:
>   	Without ABMC enabled, monitoring will work in "legacy" mode
>   	without assignment option.
>   
> +	* To enable ABMC feature:
> +	  ::
> +
> +	    # echo  "abmc" > /sys/fs/resctrl/info/L3_MON/mbm_mode
> +
> +	* To enable the legacy monitoring feature:
> +	  ::
> +
> +	    # echo  "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
> +

Needs details on what the user can expect to happen to counters/data when
switching between modes.

>   "num_mbm_cntrs":
>   	The number of monitoring counters available for assignment.
>   
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 475a0c7b2a25..531233779f8d 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -910,6 +910,40 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>   	return 0;
>   }
>   
> +static ssize_t rdtgroup_mbm_mode_write(struct kernfs_open_file *of,
> +				       char *buf, size_t nbytes,
> +				       loff_t off)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	int ret = 0;
> +
> +	if (!r->mon.abmc_capable)
> +		return -EINVAL;
> +

Why should a user not be able to write "legacy" into this
file if "legacy" is the only mode supported?
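
To illustrate the question above, here is a minimal userspace sketch, not taken from the patch, in which "legacy" is always accepted and only "abmc" is gated on the capability. The function name demo_mbm_mode_write() and its parameters are hypothetical stand-ins for the resctrl state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for the mode write handler. "legacy" is always
 * accepted; only "abmc" depends on the hardware capability.
 */
static int demo_mbm_mode_write(bool abmc_capable, const char *buf,
			       bool *abmc_enabled)
{
	assert(buf && abmc_enabled);

	if (!strcmp(buf, "legacy")) {
		*abmc_enabled = false;
		return 0;
	}

	if (!strcmp(buf, "abmc")) {
		if (!abmc_capable)
			return -EINVAL;
		*abmc_enabled = true;
		return 0;
	}

	return -EINVAL;
}
```

With this shape, writing "legacy" on a system without ABMC is a harmless no-op instead of an error.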

> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +
> +	buf[nbytes - 1] = '\0';
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +
> +	if (!strcmp(buf, "legacy"))
> +		resctrl_arch_abmc_disable();
> +	else if (!strcmp(buf, "abmc"))
> +		ret = resctrl_arch_abmc_enable();
> +	else
> +		ret = -EINVAL;
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>   #ifdef CONFIG_PROC_CPU_RESCTRL
>   
>   /*
> @@ -2103,9 +2137,10 @@ static struct rftype res_common_files[] = {
>   	},
>   	{
>   		.name		= "mbm_mode",
> -		.mode		= 0444,
> +		.mode		= 0644,
>   		.kf_ops		= &rdtgroup_kf_single_ops,
>   		.seq_show	= rdtgroup_mbm_mode_show,
> +		.write		= rdtgroup_mbm_mode_write,
>   	},
>   	{
>   		.name		= "cpus",

Reinette

* Re: [PATCH v5 18/20] x86/resctrl: Enable AMD ABMC feature by default when supported
  2024-07-03 21:48 ` [PATCH v5 18/20] x86/resctrl: Enable AMD ABMC feature by default when supported Babu Moger
@ 2024-07-12 22:15   ` Reinette Chatre
  2024-07-16 23:23     ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:15 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> Enable ABMC by default when supported during the boot up.
> 
> Users will not see any difference in the behavior when resctrl is
> mounted. With automatic assignment everything will work as running
> in the legacy monitor mode.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v5: New patch to enable ABMC by default.
> ---
>   arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
>   arch/x86/kernel/cpu/resctrl/internal.h |  1 +
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 17 +++++++++++++++++
>   3 files changed, 20 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 6265ef8b610f..b69b2650bde3 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -599,6 +599,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>   		d = container_of(hdr, struct rdt_mon_domain, hdr);
>   
>   		cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
> +		resctrl_arch_configure_abmc();
>   		return;
>   	}
>   
> @@ -620,6 +621,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>   	arch_mon_domain_online(r, d);
>   
>   	resctrl_arch_mbm_evt_config(hw_dom);
> +	resctrl_arch_configure_abmc();
>   
>   	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
>   		mon_domain_free(hw_dom);
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index beb005775fe4..0f858cff8ab1 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -707,6 +707,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>   void __init resctrl_file_fflags_init(const char *config,
>   				     unsigned long fflags);
>   void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
> +void resctrl_arch_configure_abmc(void);
>   unsigned int mon_event_config_index_get(u32 evtid);
>   int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, u32 rmid,
>   			     u32 cntr_id, u32 closid, bool enable);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 531233779f8d..d978668c8865 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2733,6 +2733,23 @@ void resctrl_arch_abmc_disable(void)
>   	}
>   }
>   
> +void resctrl_arch_configure_abmc(void)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> +	bool enable = true;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	if (r->mon.abmc_capable) {
> +		if (!hw_res->abmc_enabled)
> +			hw_res->abmc_enabled = true;
> +		resctrl_abmc_set_one_amd(&enable);
> +	}

This does not look right. It is not architecture code that needs to
decide if this feature is enabled or not, right? The feature is enabled
via fs (for example when user writes to mbm_mode). If the default is
enabled then it should be set by fs. resctrl_arch_configure_abmc()
then checks if feature is capable and enabled before it configures
it on the CPU.
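
A rough sketch of the split described above, assuming a toy state structure in place of the real resctrl data: the fs side decides the default, the arch side only applies it when the feature is both capable and enabled. All names here (demo_mon_state, demo_fs_set_default_mode(), demo_arch_configure_abmc()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the capability/enabled state kept by resctrl. */
struct demo_mon_state {
	bool abmc_capable;	/* discovered from hardware */
	bool abmc_enabled;	/* policy, owned by the fs side */
	int cpus_configured;	/* counts simulated per-CPU configuration */
};

/* fs side: the default (enabled when supported) is a policy decision. */
static void demo_fs_set_default_mode(struct demo_mon_state *s)
{
	assert(s);
	s->abmc_enabled = s->abmc_capable;
}

/* arch side: never changes policy, only configures what fs asked for. */
static void demo_arch_configure_abmc(struct demo_mon_state *s)
{
	assert(s);

	if (!s->abmc_capable || !s->abmc_enabled)
		return;

	s->cpus_configured++;	/* stand-in for the per-CPU MSR write */
}
```

The point of the split is that the arch helper can be called from the CPU online path without ever overriding a mode the user selected through mbm_mode.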

> +
> +	mutex_unlock(&rdtgroup_mutex);
> +}
> +
>   /*
>    * We don't allow rdtgroup directories to be created anywhere
>    * except the root directory. Thus when looking for the rdtgroup

Reinette

* Re: [PATCH v5 19/20] x86/resctrl: Introduce interface to list monitor states of all the groups
  2024-07-03 21:48 ` [PATCH v5 19/20] x86/resctrl: Introduce interface to list monitor states of all the groups Babu Moger
@ 2024-07-12 22:16   ` Reinette Chatre
  2024-07-17 15:22     ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:16 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> Provide the interface to list the monitor states of all the resctrl
> groups in ABMC mode.
> 
> Example:
> $cat /sys/fs/resctrl/info/L3_MON/mbm_control
> 
> List follows the following format:
> 
> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> 
> Format for specific type of groups:
> 
> - Default CTRL_MON group:
>    "//<domain_id>=<flags>"
> 
> - Non-default CTRL_MON group:
>    "<CTRL_MON group>//<domain_id>=<flags>"
> 
> - Child MON group of default CTRL_MON group:
>    "/<MON group>/<domain_id>=<flags>"
> 
> - Child MON group of non-default CTRL_MON group:
>    "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> 
> 
> Flags can be one of the following:
> t  MBM total event is enabled
> l  MBM local event is enabled
> tl Both total and local MBM events are enabled
> _  None of the MBM events are enabled
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v5: Replaced "assignment flags" with "flags".
>      Changes related to mon structure.
>      Changes related renaming the interface from mbm_assign_control to
>      mbm_control.
> 
> v4: Added functionality to query domain specific assigment in.
>      rdtgroup_abmc_dom_state().
> 
> v3: New patch.
>      Addresses the feedback to provide the global assignment interface.
>      https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
> ---
>   Documentation/arch/x86/resctrl.rst     |  54 ++++++++++
>   arch/x86/kernel/cpu/resctrl/monitor.c  |   1 +
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 130 +++++++++++++++++++++++++
>   3 files changed, 185 insertions(+)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 4c41c5622627..05fee779e109 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -304,6 +304,60 @@ with the following files:
>   "num_mbm_cntrs":
>   	The number of monitoring counters available for assignment.
>   
> +"mbm_control":
> +	Available when ABMC features are supported.

"Available when ABMC features are supported." can be dropped

> +	Reports the resctrl group and monitor status of each group.
> +
> +	List follows the following format:
> +		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> +
> +	Format for specific type of grpups:

grpups -> groups

> +
> +	* Default CTRL_MON group:
> +		"//<domain_id>=<flags>"
> +
> +	* Non-default CTRL_MON group:
> +		"<CTRL_MON group>//<domain_id>=<flags>"
> +
> +	* Child MON group of default CTRL_MON group:
> +		"/<MON group>/<domain_id>=<flags>"
> +
> +	* Child MON group of non-default CTRL_MON group:
> +		"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> +
> +	Flags can be one of the following:
> +	::
> +
> +	 t  MBM total event is enabled.
> +	 l  MBM local event is enabled.
> +	 tl Both total and local MBM events are enabled.
> +	 _  None of the MBM events are enabled.
> +
> +	Examples:
> +	::
> +
> +	 # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
> +	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
> +	 # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
> +
> +	 # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> +	 non_default_ctrl_mon_grp//0=tl;1=tl;
> +	 non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	 //0=tl;1=tl;
> +	 /child_default_mon_grp/0=tl;1=tl;
> +
> +	 There are four resctrl groups. All the groups have total and local events are
> +	 enabled on domain 0 and 1.

"All the groups have total and local events are enabled" -> "All the groups have total and local events enabled"?

> +

The text below seems to repeat the earlier description.

> +	 non_default_ctrl_mon_grp// - This is a non-default CTRL_MON group.
> +
> +	 non_default_ctrl_mon_grp/child_non_default_mon_grp/ - This is a child monitor
> +	 group of non-default CTRL_MON group.
> +
> +	 // - This is a default CTRL_MON group.
> +
> +	 /child_default_mon_grp/ - This is a child monitor group of default CTRL_MON group.
> +
>   "max_threshold_occupancy":
>   		Read/write file provides the largest value (in
>   		bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index b96b0a8bd7d3..684730f1a72d 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1244,6 +1244,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>   				r->mon.num_mbm_cntrs = 64;
>   
>   			resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
> +			resctrl_file_fflags_init("mbm_control", RFTYPE_MON_INFO);

Shouldn't this file always be present?

>   		}
>   	}
>   
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index d978668c8865..0de9f23d5389 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -944,6 +944,130 @@ static ssize_t rdtgroup_mbm_mode_write(struct kernfs_open_file *of,
>   	return ret ?: nbytes;
>   }
>   
> +static void rdtgroup_abmc_dom_cfg(void *info)
> +{
> +	u64 *msrval = info;
> +
> +	wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
> +	rdmsrl(MSR_IA32_L3_QOS_ABMC_DSC, *msrval);
> +}
> +
> +/*
> + * Writing the counter id with CfgEn=0 on L3_QOS_ABMC_CFG and reading
> + * L3_QOS_ABMC_DSC back will return configuration of the counter
> + * specified.

Can this be expanded to explain what the return values mean?

> + */
> +static int rdtgroup_abmc_dom_state(struct rdt_mon_domain *d, u32 cntr_id,
> +				   u32 rmid)
> +{
> +	union l3_qos_abmc_cfg abmc_cfg = { 0 };
> +
> +	abmc_cfg.split.cfg_en = 0;
> +	abmc_cfg.split.cntr_id = cntr_id;
> +
> +	smp_call_function_any(&d->hdr.cpu_mask, rdtgroup_abmc_dom_cfg,
> +			      &abmc_cfg, 1);
> +
> +	if (abmc_cfg.split.cntr_en && abmc_cfg.split.bw_src == rmid)
> +		return 0;
> +	else
> +		return -1;
> +}
> +
> +static char *rdtgroup_mon_state_to_str(struct rdtgroup *rdtgrp,
> +				       struct rdt_mon_domain *d, char *str)
> +{
> +	char *tmp = str;
> +	int dom_state = ASSIGN_NONE;

Please use reverse fir tree order for these declarations.

> +
> +	/*
> +	 * Query the monitor state for the domain.
> +	 * Index 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
> +	 * Index 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID

Why not use the helper?

> +	 */
> +	if (rdtgrp->mon.cntr_id[0] != MON_CNTR_UNSET)
> +		if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[0], rdtgrp->mon.rmid))
> +			dom_state |= ASSIGN_TOTAL;
> +
> +	if (rdtgrp->mon.cntr_id[1] != MON_CNTR_UNSET)
> +		if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[1], rdtgrp->mon.rmid))
> +			dom_state |= ASSIGN_LOCAL;
> +
> +	switch (dom_state) {
> +	case ASSIGN_NONE:
> +		*tmp++ = '_';
> +		break;
> +	case (ASSIGN_TOTAL | ASSIGN_LOCAL):
> +		*tmp++ = 't';
> +		*tmp++ = 'l';
> +		break;
> +	case ASSIGN_TOTAL:
> +		*tmp++ = 't';
> +		break;
> +	case ASSIGN_LOCAL:
> +		*tmp++ = 'l';
> +		break;
> +	default:
> +		break;
> +	}

This switch statement does not scale. Adding new flags will be painful. Can flags not
just incrementally be printed as learned from hardware with "_" printed as last resort?
This would eliminate the need for these "ASSIGN" flags.
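
A minimal userspace sketch of the incremental approach, under stated assumptions: the table mbm_flag_tbl, struct mbm_flag and mon_state_to_str() are hypothetical names, and the "assigned" bitmask only stands in for whatever is learned from hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table: one entry per event flag, in print order. */
struct mbm_flag {
	unsigned int bit;	/* bit in the "assigned" mask (stand-in for hardware state) */
	char ch;		/* character shown in mbm_control */
};

static const struct mbm_flag mbm_flag_tbl[] = {
	{ 1u << 0, 't' },	/* MBM total event */
	{ 1u << 1, 'l' },	/* MBM local event */
};

/*
 * Append one character per assigned event; fall back to "_" only when
 * nothing was printed. A new flag needs a new table entry, nothing else.
 */
static char *mon_state_to_str(unsigned int assigned, char *str, size_t len)
{
	size_t i, n = 0;

	assert(str && len >= 2);

	for (i = 0; i < sizeof(mbm_flag_tbl) / sizeof(mbm_flag_tbl[0]); i++) {
		/* Keep one byte free for the terminating NUL. */
		if ((assigned & mbm_flag_tbl[i].bit) && n + 1 < len)
			str[n++] = mbm_flag_tbl[i].ch;
	}

	if (n == 0)
		str[n++] = '_';

	str[n] = '\0';
	return str;
}
```

Compared with the switch, there is no case per flag combination, so the output for N flags does not need 2^N branches.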

> +
> +	*tmp = '\0';
> +	return str;
> +}
> +
> +static int rdtgroup_mbm_control_show(struct kernfs_open_file *of,
> +				     struct seq_file *s, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> +	struct rdt_mon_domain *dom;
> +	struct rdtgroup *rdtg;
> +	int grp_default = 0;
> +	char str[10];
> +
> +	if (!hw_res->abmc_enabled) {
> +		rdt_last_cmd_puts("ABMC feature is not enabled\n");
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
> +		struct rdtgroup *crg;
> +
> +		if (rdtg == &rdtgroup_default) {
> +			grp_default = 1;
> +			seq_puts(s, "//");
> +		} else {
> +			grp_default = 0;
> +			seq_printf(s, "%s//", rdtg->kn->name);
> +		}

Isn't the default resource group's name already empty string? That should
eliminate the need for this special handling, no?

> +
> +		list_for_each_entry(dom, &r->mon_domains, hdr.list)
> +			seq_printf(s, "%d=%s;", dom->hdr.id,
> +				   rdtgroup_mon_state_to_str(rdtg, dom, str));
> +		seq_putc(s, '\n');
> +
> +		list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
> +				    mon.crdtgrp_list) {
> +			if (grp_default)
> +				seq_printf(s, "/%s/", crg->kn->name);
> +			else
> +				seq_printf(s, "%s/%s/", rdtg->kn->name,
> +					   crg->kn->name);
> +

Same here .... with default group having name of empty string it can just be
printed directly, no?
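
If the default group's name is indeed the empty string (an assumption carried over from the question above, not verified here), one format string covers all four documented cases. A small userspace sketch with a hypothetical helper format_group_prefix():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the "<CTRL_MON group>/<MON group>/" prefix. An empty ctrl name
 * means the default CTRL_MON group; an empty mon name means the line
 * describes the CTRL_MON group itself.
 */
static int format_group_prefix(char *buf, size_t len,
			       const char *ctrl, const char *mon)
{
	assert(buf && ctrl && mon);

	return snprintf(buf, len, "%s/%s/", ctrl, mon);
}
```

For example, ("", "") yields "//" and ("", "child_default_mon_grp") yields "/child_default_mon_grp/", with no special case for the default group.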

> +			list_for_each_entry(dom, &r->mon_domains, hdr.list)
> +				seq_printf(s, "%d=%s;", dom->hdr.id,
> +					   rdtgroup_mon_state_to_str(crg, dom, str));
> +			seq_putc(s, '\n');
> +		}
> +	}
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +
> +	return 0;
> +}
> +
>   #ifdef CONFIG_PROC_CPU_RESCTRL
>   
>   /*
> @@ -2156,6 +2280,12 @@ static struct rftype res_common_files[] = {
>   		.kf_ops		= &rdtgroup_kf_single_ops,
>   		.seq_show	= rdtgroup_num_mbm_cntrs_show,
>   	},
> +	{
> +		.name		= "mbm_control",
> +		.mode		= 0444,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= rdtgroup_mbm_control_show,
> +	},
>   	{
>   		.name		= "cpus_list",
>   		.mode		= 0644,

Reinette

* Re: [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-07-03 21:48 ` [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
@ 2024-07-12 22:17   ` Reinette Chatre
  2024-07-17 16:22     ` Moger, Babu
  2024-07-25  0:03   ` Peter Newman
  1 sibling, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-12 22:17 UTC (permalink / raw)
  To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> Introduce the interface to enable events in ABMC mode.

As mentioned in cover letter, please take care with terms. This
interface does not "enable events" - note that events can be
"enabled" even in legacy mode. This is the interface to
assign counters.

> 
> Events can be enabled or disabled by writing to file
> /sys/fs/resctrl/info/L3_MON/mbm_control
> 
> Format is similar to the list format with addition of op-code for the
> assignment operation.
>   "<CTRL_MON group>/<MON group>/<op-code><flags>"

Missing a "domain_id".

> 
> Format for specific type of groups:
> 
>   * Default CTRL_MON group:
>           "//<domain_id><op-code><flags>"
> 
>   * Non-default CTRL_MON group:
>           "<CTRL_MON group>//<domain_id><op-code><flags>"
> 
>   * Child MON group of default CTRL_MON group:
>           "/<MON group>/<domain_id><op-code><flags>"
> 
>   * Child MON group of non-default CTRL_MON group:
>           "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
> 
> Op-code can be one of the following:
> 
>   = Update the assignment to match the flags
>   + enable a new state
>   - disable a new state

(note comment in cover letter about consistent terms)

> 
> Assignment flags can be one of the following:
>   t  MBM total event is enabled
>   l  MBM local event is enabled
>   tl Both total and local MBM events are enabled
>   _  None of the MBM events are enabled. Valid only with '=" opcode.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v5: Interface name changed from mbm_assign_control to mbm_control.
>      Fixed opcode and flags combination.
>      '=_" is valid.
>      "-_" amd "+_" is not valid.
>      Minor message update.
>      Renamed the function with prefix - rdtgroup_.
>      Corrected few documentation mistakes.
>      Rebase related changes after SNC support.
> 
> v4: Added domain specific assignments. Fixed the opcode parsing.
> 
> v3: New patch.
>      Addresses the feedback to provide the global assignment interface.
>      https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
> ---
>   Documentation/arch/x86/resctrl.rst     |  81 +++++++-
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 250 ++++++++++++++++++++++++-
>   2 files changed, 329 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 05fee779e109..5a621235eb2b 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -331,7 +331,7 @@ with the following files:
>   	 t  MBM total event is enabled.
>   	 l  MBM local event is enabled.
>   	 tl Both total and local MBM events are enabled.
> -	 _  None of the MBM events are enabled.
> +	 _  None of the MBM events are enabled. Only works with opcode '=' for write.
>   
>   	Examples:
>   	::
> @@ -358,6 +358,85 @@ with the following files:
>   
>   	 /child_default_mon_grp/ - This is a child monitor group of default CTRL_MON group.
>   
> +	Assignment state can be updated by writing to the interface.
> +
> +	Format is similar to the list format with addition of op-code for the
> +	assignment operation.
> +
> +		"<CTRL_MON group>/<MON group>/<op-code><flags>"

Missing domain_id

> +
> +	Format for each type of groups:
> +
> +        * Default CTRL_MON group:
> +                "//<domain_id><op-code><flags>"
> +
> +        * Non-default CTRL_MON group:
> +                "<CTRL_MON group>//<domain_id><op-code><flags>"
> +
> +        * Child MON group of default CTRL_MON group:
> +                "/<MON group>/<domain_id><op-code><flags>"
> +
> +        * Child MON group of non-default CTRL_MON group:
> +                "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
> +
> +	Op-code can be one of the following:
> +	::
> +
> +	 = Update the assignment to match the flags.
> +	 + Add a new state.
> +	 - delete a new state.
> +
> +	Examples:
> +	::
> +
> +	  Initial group status:
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	  //0=tl;1=tl;
> +	  /child_default_mon_grp/0=tl;1=tl;
> +
> +	  To update the default group to enable only total event on domain 0:
> +	  # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
> +
> +	  Assignment status after the update:
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	  //0=t;1=tl;
> +	  /child_default_mon_grp/0=tl;1=tl;
> +
> +	  To update the MON group child_default_mon_grp to remove total event on domain 1:
> +	  # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
> +
> +	  Assignment status after the update:
> +	  $ cat /sys/fs/resctrl/info/L3_MON/mbm_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> +	  //0=t;1=tl;
> +	  /child_default_mon_grp/0=tl;1=l;
> +
> +	  To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
> +	  remove both local and total events on domain 1:
> +	  # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
> +			/sys/fs/resctrl/info/L3_MON/mbm_control
> +
> +	  Assignment status after the update:
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> +	  //0=t;1=tl;
> +	  /child_default_mon_grp/0=tl;1=l;
> +
> +	  To update the default group to add a local event domain 0.
> +	  # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
> +
> +	  Assignment status after the update:
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl;
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> +	  //0=tl;1=tl;
> +	  /child_default_mon_grp/0=tl;1=l;
> +
>   "max_threshold_occupancy":
>   		Read/write file provides the largest value (in
>   		bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 0de9f23d5389..84c0874d7872 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1068,6 +1068,253 @@ static int rdtgroup_mbm_control_show(struct kernfs_open_file *of,
>   	return 0;
>   }
>   
> +static int rdtgroup_str_to_mon_state(char *flag)
> +{
> +	int i, mon_state = 0;
> +
> +	for (i = 0; i < strlen(flag); i++) {
> +		switch (*(flag + i)) {
> +		case 't':
> +			mon_state |= ASSIGN_TOTAL;
> +			break;
> +		case 'l':
> +			mon_state |= ASSIGN_LOCAL;
> +			break;
> +		case '_':
> +			mon_state = ASSIGN_NONE;
> +			break;
> +		default:
> +			mon_state = ASSIGN_NONE;
> +			break;
> +		}
> +	}
> +

No. As I mentioned before this makes all this work for nothing
by preventing us from ever adding another flag. Please do not
have a default catchall that unassigns all flags.
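
A sketch of a parser that rejects unknown characters instead of silently unassigning everything. The ASSIGN_* values are assumed for illustration (the patch defines its own elsewhere), str_to_mon_state() is a hypothetical name, and rejecting an empty string or "_" mixed with other flags is an extra assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>

/* Assumed values for illustration only. */
#define ASSIGN_NONE	0
#define ASSIGN_TOTAL	(1 << 0)
#define ASSIGN_LOCAL	(1 << 1)

/*
 * Return the parsed state (>= 0), or -EINVAL for anything that is not
 * understood. Unknown characters are an error so that a flag added
 * later can never be mistaken for "unassign all" by this code.
 */
static int str_to_mon_state(const char *flag)
{
	int mon_state = ASSIGN_NONE;
	int seen_none = 0, seen_event = 0;

	assert(flag);

	if (*flag == '\0')
		return -EINVAL;

	for (; *flag; flag++) {
		switch (*flag) {
		case 't':
			mon_state |= ASSIGN_TOTAL;
			seen_event = 1;
			break;
		case 'l':
			mon_state |= ASSIGN_LOCAL;
			seen_event = 1;
			break;
		case '_':
			seen_none = 1;
			break;
		default:
			return -EINVAL;
		}
	}

	/* "_" only makes sense on its own. */
	if (seen_none && seen_event)
		return -EINVAL;

	return mon_state;
}
```

The caller can then distinguish "explicitly none" from "invalid input" by checking for a negative return value before looking at the opcode.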

> +	return mon_state;
> +}
> +
> +static struct rdtgroup *rdtgroup_find_grp(enum rdt_group_type rtype, char *p_grp, char *c_grp)
> +{
> +	struct rdtgroup *rdtg, *crg;
> +
> +	if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
> +		return &rdtgroup_default;
> +	} else if (rtype == RDTCTRL_GROUP) {
> +		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
> +			if (!strcmp(p_grp, rdtg->kn->name))
> +				return rdtg;
> +	} else if (rtype == RDTMON_GROUP) {
> +		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
> +			if (!strcmp(p_grp, rdtg->kn->name)) {
> +				list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
> +						    mon.crdtgrp_list) {
> +					if (!strcmp(c_grp, crg->kn->name))
> +						return crg;
> +				}
> +			}
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static int rdtgroup_process_flags(enum rdt_group_type rtype, char *p_grp, char *c_grp, char *tok)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +	int op, mon_state, assign_state, unassign_state;
> +	char *dom_str, *id_str, *op_str;
> +	struct rdt_mon_domain *d;
> +	struct rdtgroup *rdtgrp;
> +	unsigned long dom_id;
> +	int ret, found = 0;
> +
> +	rdtgrp = rdtgroup_find_grp(rtype, p_grp, c_grp);
> +
> +	if (!rdtgrp) {
> +		rdt_last_cmd_puts("Not a valid resctrl group\n");
> +		return -EINVAL;
> +	}
> +
> +next:
> +	if (!tok || tok[0] == '\0')
> +		return 0;
> +
> +	/* Start processing the strings for each domain */
> +	dom_str = strim(strsep(&tok, ";"));
> +
> +	op_str = strpbrk(dom_str, "=+-");
> +
> +	if (op_str) {
> +		op = *op_str;
> +	} else {
> +		rdt_last_cmd_puts("Missing operation =, +, -, _ character\n");
> +		return -EINVAL;
> +	}
> +
> +	id_str = strsep(&dom_str, "=+-");
> +
> +	if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
> +		rdt_last_cmd_puts("Missing domain id\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Verify if the dom_id is valid */
> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +		if (d->hdr.id == dom_id) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +	if (!found) {
> +		rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
> +		return -EINVAL;
> +	}
> +
> +	mon_state = rdtgroup_str_to_mon_state(dom_str);
> +
> +	assign_state = 0;
> +	unassign_state = 0;
> +
> +	switch (op) {
> +	case '+':
> +		if (mon_state == ASSIGN_NONE) {
> +			rdt_last_cmd_puts("Invalid assign opcode\n");
> +			goto out_fail;
> +		}
> +		assign_state = mon_state;
> +		break;
> +	case '-':
> +		if (mon_state == ASSIGN_NONE) {
> +			rdt_last_cmd_puts("Invalid assign opcode\n");
> +			goto out_fail;
> +		}
> +		unassign_state = mon_state;
> +		break;
> +	case '=':
> +		assign_state = mon_state;
> +		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
> +		break;
> +	default:
> +		break;
> +	}
> +

this flow is not clear to me ... I see how an existing counter is
configured but I do not see any counter being freed/allocated, where is that
done?

> +	if (assign_state & ASSIGN_TOTAL)
> +		ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_TOTAL_EVENT_ID,
> +					       rdtgrp->mon.rmid,
> +					       rdtgrp->mon.cntr_id[0],
> +					       rdtgrp->closid, 1);
> +	if (ret)
> +		goto out_fail;
> +
> +	if (assign_state & ASSIGN_LOCAL)
> +		ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_LOCAL_EVENT_ID,
> +					       rdtgrp->mon.rmid,
> +					       rdtgrp->mon.cntr_id[1],
> +					       rdtgrp->closid, 1);
> +
> +	if (ret)
> +		goto out_fail;
> +
> +	if (unassign_state & ASSIGN_TOTAL)
> +		ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_TOTAL_EVENT_ID,
> +					       rdtgrp->mon.rmid,
> +					       rdtgrp->mon.cntr_id[0],
> +					       rdtgrp->closid, 0);
> +
> +	if (ret)
> +		goto out_fail;
> +
> +	if (unassign_state & ASSIGN_LOCAL)
> +		ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_LOCAL_EVENT_ID,
> +					       rdtgrp->mon.rmid,
> +					       rdtgrp->mon.cntr_id[1],
> +					       rdtgrp->closid, 0);
> +	if (ret)
> +		goto out_fail;
> +
> +	goto next;
> +
> +out_fail:
> +
> +	return -EINVAL;
> +}
> +
> +static ssize_t rdtgroup_mbm_control_write(struct kernfs_open_file *of,
> +					  char *buf, size_t nbytes,
> +					  loff_t off)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	char *token, *cmon_grp, *mon_grp;
> +	struct rdt_hw_resource *hw_res;
> +	int ret;
> +
> +	hw_res = resctrl_to_arch_res(r);
> +	if (!hw_res->abmc_enabled)
> +		return -EINVAL;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +
> +	buf[nbytes - 1] = '\0';
> +	rdt_last_cmd_clear();

rdt_last_cmd_clear() should be called with mutex held

> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	while ((token = strsep(&buf, "\n")) != NULL) {
> +		if (strstr(token, "//")) {
> +			/*
> +			 * The CTRL_MON group processing:
> +			 * default CTRL_MON group: "//<flags>"
> +			 * non-default CTRL_MON group: "<CTRL_MON group>//flags"
> +			 * The CTRL_MON group will be empty string if it is a
> +			 * default group.
> +			 */
> +			cmon_grp = strsep(&token, "//");
> +
> +			/*
> +			 * strsep returns empty string for contiguous delimiters.
> +			 * Make sure check for two consicutive delimiters and

consicutive -> consecutive

> +			 * advance the token.
> +			 */
> +			mon_grp = strsep(&token, "//");
> +			if (*mon_grp != '\0') {
> +				rdt_last_cmd_printf("Invalid CTRL_MON group format %s\n", token);
> +				ret = -EINVAL;
> +				break;
> +			}
> +
> +			ret = rdtgroup_process_flags(RDTCTRL_GROUP, cmon_grp, mon_grp, token);
> +			if (ret)
> +				break;
> +		} else if (strstr(token, "/")) {
> +			/*
> +			 * MON group processing:
> +			 * MON_GROUP inside default CTRL_MON group: "/<MON group>/<flags>"
> +			 * MON_GROUP within CTRL_MON group: "<CTRL_MON group>/<MON group>/<flags>"
> +			 */
> +			cmon_grp = strsep(&token, "/");
> +
> +			/* Extract the MON_GROUP. It cannot be empty string */
> +			mon_grp = strsep(&token, "/");
> +			if (*mon_grp == '\0') {
> +				rdt_last_cmd_printf("Invalid MON_GROUP format %s\n", token);
> +				ret = -EINVAL;
> +				break;
> +			}
> +
> +			ret = rdtgroup_process_flags(RDTMON_GROUP, cmon_grp, mon_grp, token);
> +			if (ret)
> +				break;
> +		}

can these two blocks not be merged? strsep(&token, "//") and strsep(&token, "/") do the same
thing, no?
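
For reference, strsep() treats its second argument as a set of delimiter characters, so "//" and "/" describe the same set. A userspace sketch of a single merged split, with the hypothetical helper split_group_path():

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for strsep() on glibc */
#endif
#include <assert.h>
#include <string.h>

/*
 * Split "<CTRL_MON group>/<MON group>/<rest>" in place. Either group
 * name may be empty. Returns 0 on success, -1 if fewer than two '/'
 * separators were present.
 */
static int split_group_path(char *line, char **ctrl, char **mon, char **rest)
{
	char *tok = line;

	assert(line && ctrl && mon && rest);

	*ctrl = strsep(&tok, "/");	/* empty string for the default group */
	*mon = strsep(&tok, "/");	/* empty string for a CTRL_MON group line */
	*rest = tok;			/* "<domain_id><op-code><flags>..." */

	return (*mon && *rest) ? 0 : -1;
}
```

After the split, an empty mon string selects the CTRL_MON group path and a non-empty one selects the MON group path, so one block can handle both forms.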

> +	}
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>   #ifdef CONFIG_PROC_CPU_RESCTRL
>   
>   /*
> @@ -2282,9 +2529,10 @@ static struct rftype res_common_files[] = {
>   	},
>   	{
>   		.name		= "mbm_control",
> -		.mode		= 0444,
> +		.mode		= 0644,
>   		.kf_ops		= &rdtgroup_kf_single_ops,
>   		.seq_show	= rdtgroup_mbm_control_show,
> +		.write		= rdtgroup_mbm_control_write,
>   	},
>   	{
>   		.name		= "cpus_list",

Reinette

* Re: [PATCH v5 17/20] x86/resctrl: Introduce the interface switch between monitor modes
  2024-07-03 21:48 ` [PATCH v5 17/20] x86/resctrl: Introduce the interface switch between monitor modes Babu Moger
  2024-07-12 22:14   ` Reinette Chatre
@ 2024-07-13  7:15   ` Markus Elfring
  1 sibling, 0 replies; 95+ messages in thread
From: Markus Elfring @ 2024-07-13  7:15 UTC (permalink / raw)
  To: Babu Moger, x86, Borislav Petkov, Dave Hansen, Fenghua Yu,
	Ingo Molnar, Jonathan Corbet, Reinette Chatre, Thomas Gleixner
  Cc: LKML, linux-doc, Breno Leitao, Daniel Sneddon, H. Peter Anvin,
	Ilpo Järvinen, James Morse, Jim Mattson, Jithu Joseph,
	Josh Poimboeuf, Julia Lawall, Kai Huang, Kan Liang, Kim Phillips,
	Lukas Bulwahn, Maciej Wieczor-Retman, Paolo Bonzini,
	Paul E. McKenney, Peter Newman, Peter Zijlstra, Randy Dunlap,
	Rick Edgecombe, Sandipan Das, Sean Christopherson,
	Stephane Eranian, Tejun Heo, Yan-Jie Wang

…
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -910,6 +910,40 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>
> +static ssize_t rdtgroup_mbm_mode_write(struct kernfs_open_file *of,
> +				       char *buf, size_t nbytes,
> +				       loff_t off)
> +{
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
…

Would you be interested in applying statements like the following?

* guard(cpus_read_lock)();
  https://elixir.bootlin.com/linux/v6.10-rc7/source/include/linux/cleanup.h#L133

* guard(mutex)(&rdtgroup_mutex);
  https://elixir.bootlin.com/linux/v6.10-rc7/source/include/linux/mutex.h#L196
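
As a rough illustration of the scope-based pattern behind guard(), here is a
minimal userspace sketch. It assumes GCC or Clang (for the cleanup variable
attribute) and uses a pthread mutex in place of rdtgroup_mutex. The names
DEMO_GUARD, demo_mutex, demo_lock_depth and demo_mode_write are hypothetical
and are not kernel API; the kernel's guard() in cleanup.h is the real
interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_lock_depth;	/* 1 while the lock is held, 0 otherwise */

/* Cleanup handler: called automatically when the guard variable leaves scope. */
static void demo_unlock(pthread_mutex_t **m)
{
	demo_lock_depth--;
	pthread_mutex_unlock(*m);
}

/*
 * Hypothetical stand-in for guard(mutex)(&lock): take the lock now and
 * release it on every exit path from the enclosing scope.
 */
#define DEMO_GUARD(m) \
	pthread_mutex_t *demo_guard __attribute__((cleanup(demo_unlock), unused)) = \
		(pthread_mutex_lock(m), demo_lock_depth++, (m))

/* Hypothetical write handler: no explicit unlock calls are needed. */
static ssize_t demo_mode_write(const char *buf, size_t nbytes)
{
	DEMO_GUARD(&demo_mutex);

	if (!buf || nbytes == 0)
		return -1;	/* early return, the lock is still released */

	return (ssize_t)nbytes;
}
```

With this shape, error paths added later to the handler cannot forget the
unlock, which is the main attraction of the guard() style.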


Regards,
Markus

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 16/20] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode
  2024-07-03 21:48 ` [PATCH v5 16/20] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode Babu Moger
  2024-07-12 22:13   ` Reinette Chatre
@ 2024-07-13 20:26   ` Markus Elfring
  1 sibling, 0 replies; 95+ messages in thread
From: Markus Elfring @ 2024-07-13 20:26 UTC (permalink / raw)
  To: Babu Moger, x86, Borislav Petkov, Dave Hansen, Fenghua Yu,
	Ingo Molnar, Jonathan Corbet, Reinette Chatre, Thomas Gleixner
  Cc: LKML, linux-doc, Breno Leitao, Daniel Sneddon, H. Peter Anvin,
	Ilpo Järvinen, James Morse, Jim Mattson, Jithu Joseph,
	Josh Poimboeuf, Julia Lawall, Kai Huang, Kan Liang, Kim Phillips,
	Lukas Bulwahn, Maciej Wieczor-Retman, Paolo Bonzini,
	Paul E. McKenney, Peter Newman, Peter Zijlstra, Randy Dunlap,
	Rick Edgecombe, Sandipan Das, Sean Christopherson,
	Stephane Eranian, Tejun Heo, Yan-Jie Wang

…
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -609,12 +609,21 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
…
+		if (resctrl_arch_get_abmc_enabled()) {
+			index = mon_event_config_index_get(evtid);
+			if (rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET)
+				seq_puts(m, "Unassigned\n");
+			else
+				seq_puts(m, "Unavailable\n");
+		} else {
…

I suggest restricting the scope of the local variable shown above to this if branch.

What do you think about a code variant like the following?

			int index = mon_event_config_index_get(evtid);

			seq_puts(m,
				 rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET
				 ? "Unassigned\n"
				 : "Unavailable\n");


Regards,
Markus

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 01/20] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  2024-07-12 21:55   ` Reinette Chatre
@ 2024-07-15 18:36     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-15 18:36 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 16:55, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> Users can create as many monitor groups as RMIDs supported by the hardware.
>> However, bandwidth monitoring feature on AMD system only guarantees that
>> RMIDs currently assigned to a processor will be tracked by hardware. The
>> counters of any other RMIDs which are no longer being tracked will be
>> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
>> that are not tracked by hardware. So, there can be only limited number of
>> groups that can give guaranteed monitoring numbers. With ever changing
>> configurations there is no way to definitely know which of these groups
>> are being tracked for certain point of time. Users do not have the option
>> to monitor a group or set of groups for certain period of time without
>> worrying about RMID being reset in between.
>>
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID and monitor the bandwidth as long as it is assigned.
>> The assigned RMID will be tracked by the hardware until the user unassigns
>> it manually. There is no need to worry about counters being reset during
>> this period. Additionally, the user can specify a bitmask identifying the
>> specific bandwidth types from the given source to track with the counter.
>>
>> Without ABMC enabled, monitoring will work in current mode without
>> assignment option.
>>
>> Linux resctrl subsystem provides the interface to count maximum of two
>> memory bandwidth events per group, from a combination of available total
>> and local events. Keeping the current interface, users can enable a maximum
>> of 2 ABMC counters per group. User will also have the option to enable only
>> one counter to the group. If the system runs out of assignable ABMC
>> counters, kernel will display an error. Users need to disable an already
>> enabled counter to make space for new assignments.
>>
>> The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
>> Bits Description
>> 5    ABMC (Assignable Bandwidth Monitoring Counters)
>>
>> The feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Note: Checkpatch checks/warnings are ignored to maintain coding style.
> 
> This note may be more appropriate below the '---' separator line.

Sure.

> 
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 03/20] x86/resctrl: Consolidate monitoring related data from rdt_resource
  2024-07-12 21:57   ` Reinette Chatre
@ 2024-07-15 19:05     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-15 19:05 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 16:57, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> The cache allocation and memory bandwidth allocation feature properties
>> are consolidated into cache and membw structures respectively. In
> 
> Let "In preparation ... " start a new paragraph.
> 
> Quoting Documentation/process/maintainer-tip.rst:
>     It's also useful to structure the changelog into several paragraphs
>     and not lump everything together into a single one. A good structure
>     is to explain the context, the problem and the solution in separate
>     paragraphs and this order.

OK, sure.
> 
>> preparation for more monitoring properties that will clobber the existing
>> resource struct more, re-organize the monitoring specific properties into
>> separate structure.
> 
> "re-organize the monitoring specific properties into separate structure" ->
> "re-organize the monitoring specific properties to also be in a separate
> structure."

Sure.
> 
>>
>> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ...
> 
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index b0875b99e811..e43fc5bb5a3a 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -182,6 +182,16 @@ enum resctrl_scope {
>>       RESCTRL_L3_NODE,
>>   };
>>   +/**
>> + * struct resctrl_mon - Monitoring related data
>> + * @num_rmid:        Number of RMIDs available
>> + * @evt_list:        List of monitoring events
>> + */
>> +struct resctrl_mon {
>> +    int            num_rmid;
>> +    struct list_head    evt_list;
>> +};
>> +
>>   /**
>>    * struct rdt_resource - attributes of a resctrl resource
>>    * @rid:        The index of the resource
>> @@ -207,11 +217,11 @@ struct rdt_resource {
>>       int            rid;
>>       bool            alloc_capable;
>>       bool            mon_capable;
>> -    int            num_rmid;
>>       enum resctrl_scope    ctrl_scope;
>>       enum resctrl_scope    mon_scope;
>>       struct resctrl_cache    cache;
>>       struct resctrl_membw    membw;
>> +    struct resctrl_mon    mon;
>>       struct list_head    ctrl_domains;
>>       struct list_head    mon_domains;
>>       char            *name;
>> @@ -221,7 +231,6 @@ struct rdt_resource {
>>       int            (*parse_ctrlval)(struct rdt_parse_data *data,
>>                            struct resctrl_schema *s,
>>                            struct rdt_ctrl_domain *d);
>> -    struct list_head    evt_list;
>>       unsigned long        fflags;
>>       bool            cdp_capable;
>>   };
> 
> struct rdt_resource's kernel-doc still refers to the members
> removed in this patch. Its kernel-doc also needs an update for the new
> member added.

Yea. My bad. Will correct it.

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 04/20] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2024-07-12 22:04   ` Reinette Chatre
@ 2024-07-15 20:04     ` Moger, Babu
  2024-07-16 15:11       ` Reinette Chatre
  0 siblings, 1 reply; 95+ messages in thread
From: Moger, Babu @ 2024-07-15 20:04 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 17:04, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
>> Bits Description
>> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
>>       Monitoring Counter ID + 1
>>
>> The feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
> 
> <insert snippet about what the patch does>

OK, sure.
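
For reference, the counter-count computation in the quoted patch can be
sketched as a pure function. This is only an illustration that mirrors the
v5 code as posted (mask EBX bits 15:0, add one, cap at 64); the names
demo_num_mbm_cntrs and DEMO_MAX_MBM_CNTRS are hypothetical, and whether the
"+ 1" is needed depends on how the APM defines the MAX_ABMC field.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_MBM_CNTRS 64u	/* upper bound applied in the quoted patch */

/*
 * Hypothetical helper: derive the number of assignable counters from the
 * EBX value returned by CPUID Fn8000_0020, subleaf 5.
 */
static unsigned int demo_num_mbm_cntrs(uint32_t ebx)
{
	/* Only bits 15:0 carry the counter field; higher bits are ignored. */
	unsigned int n = (ebx & 0xFFFFu) + 1;

	/* Clamp instead of trusting an unexpectedly large hardware value. */
	return n > DEMO_MAX_MBM_CNTRS ? DEMO_MAX_MBM_CNTRS : n;
}
```

For example, an EBX value of 0x1f would yield 32 counters under these
assumptions, and anything at or above 0x3f in the low 16 bits is capped at 64.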

> 
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v5: Name change num_cntrs to num_mbm_cntrs.
>>      Moved abmc_capable to resctrl_mon.
>>
>> v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We don't
>>      need to separate this as arch code.
>>
>> v3: Removed changes related to mon_features.
>>      Moved rdt_cpu_has to core.c and added new function
>> resctrl_arch_has_abmc.
>>      Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
>>      rdt_resource. (James)
>>
>> v2: Changed the field name to mbm_assign_capable from abmc_capable.
>> ---
>>   arch/x86/kernel/cpu/resctrl/monitor.c | 12 ++++++++++++
>>   include/linux/resctrl.h               |  4 ++++
>>   2 files changed, 16 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 795fe91a8feb..87d40f149ebc 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1229,6 +1229,18 @@ int __init rdt_get_mon_l3_config(struct
>> rdt_resource *r)
>>               mbm_local_event.configurable = true;
>>               mbm_config_rftype_init("mbm_local_bytes_config");
>>           }
>> +
>> +        if (rdt_cpu_has(X86_FEATURE_ABMC)) {
>> +            r->mon.abmc_capable = true;
>> +            /*
>> +             * Query CPUID_Fn80000020_EBX_x05 for number of
>> +             * ABMC counters
>> +             */
>> +            cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>> +            r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
>> +            if (WARN_ON(r->mon.num_mbm_cntrs > 64))
>> +                r->mon.num_mbm_cntrs = 64;
>> +        }
>>       }
>>         l3_mon_evt_init(r);
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index e43fc5bb5a3a..62f0f002ef41 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -185,10 +185,14 @@ enum resctrl_scope {
>>   /**
>>    * struct resctrl_mon - Monitoring related data
>>    * @num_rmid:        Number of RMIDs available
>> + * @num_mbm_cntrs:    Number of monitoring counters
>> + * @abmc_capable:    Is system capable of supporting monitor assignment?
>>    * @evt_list:        List of monitoring events
>>    */
>>   struct resctrl_mon {
>>       int            num_rmid;
>> +    int            num_mbm_cntrs;
>> +    bool            abmc_capable;
>>       struct list_head    evt_list;
>>   };
>>   
> 
> How about renaming "abmc_capable" to "mbm_cntr_capable"? That would,
> (a) connect the capability to the "num_mbm_cntrs" property, and (b)
> remove the AMD marketing name from the resctrl filesystem code that
> will be shared by all architectures.

"mbm_cntr_capable" does not give full meaning of the feature.

How about "mbm_cntr_assignable"?

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 04/20] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  2024-07-15 20:04     ` Moger, Babu
@ 2024-07-16 15:11       ` Reinette Chatre
  0 siblings, 0 replies; 95+ messages in thread
From: Reinette Chatre @ 2024-07-16 15:11 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/15/24 1:04 PM, Moger, Babu wrote:
> On 7/12/24 17:04, Reinette Chatre wrote:
>> On 7/3/24 2:48 PM, Babu Moger wrote:

>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>> index e43fc5bb5a3a..62f0f002ef41 100644
>>> --- a/include/linux/resctrl.h
>>> +++ b/include/linux/resctrl.h
>>> @@ -185,10 +185,14 @@ enum resctrl_scope {
>>>    /**
>>>     * struct resctrl_mon - Monitoring related data
>>>     * @num_rmid:        Number of RMIDs available
>>> + * @num_mbm_cntrs:    Number of monitoring counters
>>> + * @abmc_capable:    Is system capable of supporting monitor assignment?
>>>     * @evt_list:        List of monitoring events
>>>     */
>>>    struct resctrl_mon {
>>>        int            num_rmid;
>>> +    int            num_mbm_cntrs;
>>> +    bool            abmc_capable;
>>>        struct list_head    evt_list;
>>>    };
>>>    
>>
>> How about renaming "abmc_capable" to "mbm_cntr_capable"? That would,
>> (a) connect the capability to the "num_mbm_cntrs" property, and (b)
>> remove the AMD marketing name from the resctrl filesystem code that
>> will be shared by all architectures.
> 
> "mbm_cntr_capable" does not give full meaning of the feature.
> 
> How about "mbm_cntr_assignable"?
> 

"mbm_cntr_assignable" sounds good to me.

Thank you.

Reinette

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-07-12 22:05   ` Reinette Chatre
@ 2024-07-16 15:13     ` Moger, Babu
  2024-07-16 17:51       ` Reinette Chatre
  2024-08-16 16:29     ` James Morse
  1 sibling, 1 reply; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 15:13 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 17:05, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> Add the functionality to enable/disable AMD ABMC feature.
>>
>> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
>> L3_QOS_EXT_CFG.  When the state of ABMC is changed, the MSR needs
>> to be updated on all the logical processors in the QOS Domain.
>>
>> Hardware counters will reset when ABMC state is changed. Reset the
>> architectural state so that reading of hardware counter is not considered
>> as an overflow in next update.
>>
>> The ABMC feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> ---
>> v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
>>      Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
>>      Introduced resctrl_arch_get_abmc_enabled to get abmc state from
>>      non-arch code.
>>      Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
>>      Modified commit log to make it clear about AMD ABMC feature.
>>
>> v3: No changes.
>>
>> v2: Few text changes in commit message.
>> ---
>>   arch/x86/include/asm/msr-index.h       |  1 +
>>   arch/x86/kernel/cpu/resctrl/internal.h | 13 +++++
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 66 ++++++++++++++++++++++++++
>>   3 files changed, 80 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/msr-index.h
>> b/arch/x86/include/asm/msr-index.h
>> index 01342963011e..263b2d9d00ed 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1174,6 +1174,7 @@
>>   #define MSR_IA32_MBA_BW_BASE        0xc0000200
>>   #define MSR_IA32_SMBA_BW_BASE        0xc0000280
>>   #define MSR_IA32_EVT_CFG_BASE        0xc0000400
>> +#define MSR_IA32_L3_QOS_EXT_CFG        0xc00003ff
>>     /* MSR_IA32_VMX_MISC bits */
>>   #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 2bd207624eec..0ce9797f80fe 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -97,6 +97,9 @@ cpumask_any_housekeeping(const struct cpumask *mask,
>> int exclude_cpu)
>>       return cpu;
>>   }
>>   +/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
> 
> Please be consistent throughout series to have sentences end with period.

Sure.

> 
>> +#define ABMC_ENABLE            BIT(0)
>> +
>>   struct rdt_fs_context {
>>       struct kernfs_fs_context    kfc;
>>       bool                enable_cdpl2;
>> @@ -477,6 +480,7 @@ struct rdt_parse_data {
>>    * @mbm_cfg_mask:    Bandwidth sources that can be tracked when Bandwidth
>>    *            Monitoring Event Configuration (BMEC) is supported.
>>    * @cdp_enabled:    CDP state of this resource
>> + * @abmc_enabled:    ABMC feature is enabled
>>    *
>>    * Members of this structure are either private to the architecture
>>    * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
>> @@ -491,6 +495,7 @@ struct rdt_hw_resource {
>>       unsigned int        mbm_width;
>>       unsigned int        mbm_cfg_mask;
>>       bool            cdp_enabled;
>> +    bool            abmc_enabled;
>>   };
> 
> mbm_cntr_enabled? This is architecture specific code so there is more
> flexibility here, but it may make implementation easier to understand if
> consistent naming is used between fs and arch code.

How about "mbm_cntr_assign_enabled" or "cntr_assign_enabled"?

> 
>>     static inline struct rdt_hw_resource *resctrl_to_arch_res(struct
>> rdt_resource *r)
>> @@ -536,6 +541,14 @@ int resctrl_arch_set_cdp_enabled(enum
>> resctrl_res_level l, bool enable);
>>     void arch_mon_domain_online(struct rdt_resource *r, struct
>> rdt_mon_domain *d);
>>   +static inline bool resctrl_arch_get_abmc_enabled(void)
>> +{
>> +    return rdt_resources_all[RDT_RESOURCE_L3].abmc_enabled;
>> +}
>> +
>> +int resctrl_arch_abmc_enable(void);
>> +void resctrl_arch_abmc_disable(void);
>> +
>>   /*
>>    * To return the common struct rdt_resource, which is contained in struct
>>    * rdt_hw_resource, walk the resctrl member of struct rdt_hw_resource.
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 7e76f8d839fc..471fc0dbd7c3 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2402,6 +2402,72 @@ int resctrl_arch_set_cdp_enabled(enum
>> resctrl_res_level l, bool enable)
>>       return 0;
>>   }
>>   +/*
>> + * Update L3_QOS_EXT_CFG MSR on all the CPUs associated with the resource.
>> + */
>> +static void resctrl_abmc_set_one_amd(void *arg)
>> +{
>> +    bool *enable = arg;
>> +    u64 msrval;
>> +
>> +    rdmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
>> +
>> +    if (*enable)
>> +        msrval |= ABMC_ENABLE;
>> +    else
>> +        msrval &= ~ABMC_ENABLE;
>> +
>> +    wrmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
>> +}
> 
> msr_set_bit() and msr_clear_bit() can be used here.

Sure.
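
To make the effect of the quoted read-modify-write explicit, here is a small
userspace model of it as a pure function. It is a sketch only: the MSR access
itself is left out, and DEMO_ABMC_ENABLE and demo_ext_cfg_update are
hypothetical names. In the kernel, msr_set_bit() and msr_clear_bit() would
perform the equivalent update on the real register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of L3_QOS_EXT_CFG enables ABMC, per the quoted patch. */
#define DEMO_ABMC_ENABLE (UINT64_C(1) << 0)

/*
 * Hypothetical model of the update: change only the enable bit and leave
 * every other bit of the register value untouched.
 */
static uint64_t demo_ext_cfg_update(uint64_t msrval, bool enable)
{
	if (enable)
		return msrval | DEMO_ABMC_ENABLE;

	return msrval & ~DEMO_ABMC_ENABLE;
}
```

The update is idempotent, so running it on every CPU of a domain is safe even
if some CPUs already hold the requested state.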

> 
>> +
>> +static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
>> +{
>> +    struct rdt_mon_domain *d;
>> +
>> +    /*
>> +     * Hardware counters will reset after switching the monitor mode.
>> +     * Reset the architectural state so that reading of hardware
>> +     * counter is not considered as an overflow in the next update.
>> +     */
>> +    list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +        on_each_cpu_mask(&d->hdr.cpu_mask,
>> +                 resctrl_abmc_set_one_amd, &enable, 1);
>> +        resctrl_arch_reset_rmid_all(r, d);
>> +    }
>> +
>> +    return 0;
>> +}
> 
> Seems like _resctrl_abmc_enable() can just return void.

Sure.
> 
>> +
>> +int resctrl_arch_abmc_enable(void)
> 
> resctrl_arch_mbm_cntr_enable()? I'll no longer point all these out.

Sure.

> 
>> +{
>> +    struct rdt_resource *r =
>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +    struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> +    int ret = 0;
>> +
>> +    lockdep_assert_held(&rdtgroup_mutex);
>> +
>> +    if (r->mon.abmc_capable && !hw_res->abmc_enabled) {
>> +        ret = _resctrl_abmc_enable(r, true);
>> +        if (!ret)
>> +            hw_res->abmc_enabled = true;
> 
> The above error handling seems unnecessary.

Sure.

> 
>> +    }
>> +
>> +    return ret;
> 
> resctrl_arch_abmc_enable() should probably keep returning an int even though
> this implementation does not need it since other archs may indeed return
> error.

Yea. Sure.

> 
>> +}
>> +
>> +void resctrl_arch_abmc_disable(void)
>> +{
>> +    struct rdt_resource *r =
>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +    struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> +
>> +    lockdep_assert_held(&rdtgroup_mutex);
>> +
>> +    if (hw_res->abmc_enabled) {
>> +        _resctrl_abmc_enable(r, false);
>> +        hw_res->abmc_enabled = false;
>> +    }
>> +}
>> +
>>   /*
>>    * We don't allow rdtgroup directories to be created anywhere
>>    * except the root directory. Thus when looking for the rdtgroup
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 07/20] x86/resctrl: Introduce the interface to display monitor mode
  2024-07-12 22:06   ` Reinette Chatre
@ 2024-07-16 16:51     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 16:51 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 17:06, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID and monitor the bandwidth as long as it is assigned.
>> ABMC mode is enabled by default when supported. The system can be in only
>> one mode at a time (Legacy monitor mode or ABMC mode).
>>
>> Provide an interface to display the monitor mode on the system.
>>      $cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>      [abmc]
>>      legacy
> 
> <insert snippet about what happens when user switches from one mode
> to another>

Sure.

> 
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v5: Changed interface name to mbm_mode.
>>      It will be always available even if ABMC feature is not supported.
>>      Added description in resctrl.rst about ABMC mode.
>>      Fixed display abmc and legacy consistently.
>>
>> v4: Fixed the checks for legacy and abmc mode. Default is ABMC.
>>
>> v3: New patch to display ABMC capability.
>> ---
>>   Documentation/arch/x86/resctrl.rst     | 30 ++++++++++++++++++++++++++
>>   arch/x86/kernel/cpu/resctrl/monitor.c  |  2 ++
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 26 ++++++++++++++++++++++
>>   3 files changed, 58 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst
>> b/Documentation/arch/x86/resctrl.rst
>> index 30586728a4cd..108e494fd7cc 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -257,6 +257,36 @@ with the following files:
>>           # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>>           0=0x30;1=0x30;3=0x15;4=0x15
>>   +"mbm_mode":
>> +    Reports the list of assignable monitoring features supported. The
>> +    enclosed brackets indicate which feature is enabled.
>> +    ::
>> +
>> +      cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>> +      [abmc]
>> +      legacy
>> +
> 
> "mbm_cntr" mode can be documented here with the details on how AMD's ABMC is
> one example of how it may be implemented on a system.

Will display as "mbm_cntr_assignable".

Yes. Will add details specific to AMD's ABMC.

> 
>> +    The bandwidth monitoring feature on AMD system only guarantees that
>> +    RMIDs currently assigned to a processor will be tracked by hardware.
>> +    The counters of any other RMIDs which are no longer being tracked
>> +    will be reset to zero. The MBM event counters return "Unavailable"
>> +    for the RMIDs that are not tracked by hardware. So, there can be
>> +    only limited number of groups that can give guaranteed monitoring
>> +    numbers. With ever changing configurations there is no way to
>> +    definitely know which of these groups are being tracked for certain
>> +    point of time. Users do not have the option to monitor a group or
>> +    set of groups for certain period of time without worrying about
>> +    RMID being reset in between.
>> +
>> +    The ABMC feature provides an option to the user to assign a
>> +    hardware counter to an RMID and monitor the bandwidth as long as
>> +    it is assigned. The assigned RMID will be tracked by the hardware
>> +    until the user unassigns it manually. There is no need to worry
>> +    about counters being reset during this period.
>> +
>> +    Without ABMC enabled, monitoring will work in "legacy" mode
>> +    without assignment option.
> 
> Let "legacy" be a distinct mode, instead of an alternative to ABMC.

Will add as another mode.
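
For context, the v5 output format under discussion (every supported mode
listed, the active one in brackets) can be sketched in userspace as below.
This mirrors the v5 patch as posted, with the "abmc" and "legacy" strings it
uses; the mode names are expected to change in the next version, and
demo_mbm_mode_show is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter: write the mode list into 'out' and return the
 * snprintf() result. 'capable' says whether counter assignment is
 * supported at all, 'enabled' whether it is currently active.
 */
static int demo_mbm_mode_show(char *out, size_t len, bool capable, bool enabled)
{
	/* Without assignment support, legacy is the only mode to report. */
	if (!capable)
		return snprintf(out, len, "[legacy]\n");

	if (enabled)
		return snprintf(out, len, "[abmc]\nlegacy\n");

	return snprintf(out, len, "abmc\n[legacy]\n");
}
```

Keeping the list order fixed and moving only the brackets lets a script find
the active mode without depending on line position.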

> 
>> +
>>   "max_threshold_occupancy":
>>           Read/write file provides the largest value (in
>>           bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 12793762ca24..6c4cb36b4b50 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1245,6 +1245,8 @@ int __init rdt_get_mon_l3_config(struct
>> rdt_resource *r)
>>           }
>>       }
>>   +    resctrl_file_fflags_init("mbm_mode", RFTYPE_MON_INFO);
>> +
> 
> Is this special flag assignment necessary? With file always visible I
> think it can just be initialized in res_common_files below with the flag
> already assigned?

Yes. We can do that.

> 
>>       l3_mon_evt_init(r);
>>         r->mon_capable = true;
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 471fc0dbd7c3..3988d7b86817 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -845,6 +845,26 @@ static int rdtgroup_rmid_show(struct
>> kernfs_open_file *of,
>>       return ret;
>>   }
>>   +static int rdtgroup_mbm_mode_show(struct kernfs_open_file *of,
>> +                  struct seq_file *s, void *v)
>> +{
>> +    struct rdt_resource *r = of->kn->parent->priv;
>> +
>> +    if (r->mon.abmc_capable) {
>> +        if (resctrl_arch_get_abmc_enabled()) {
>> +            seq_puts(s, "[abmc]\n");
>> +            seq_puts(s, "legacy\n");
>> +        } else {
>> +            seq_puts(s, "abmc\n");
>> +            seq_puts(s, "[legacy]\n");
>> +        }
>> +    } else {
>> +        seq_puts(s, "[legacy]\n");
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   #ifdef CONFIG_PROC_CPU_RESCTRL
>>     /*
>> @@ -1901,6 +1921,12 @@ static struct rftype res_common_files[] = {
>>           .seq_show    = mbm_local_bytes_config_show,
>>           .write        = mbm_local_bytes_config_write,
>>       },
>> +    {
>> +        .name        = "mbm_mode",
>> +        .mode        = 0444,
>> +        .kf_ops        = &rdtgroup_kf_single_ops,
>> +        .seq_show    = rdtgroup_mbm_mode_show,
>> +    },
>>       {
>>           .name        = "cpus",
>>           .mode        = 0644,
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-07-16 15:13     ` Moger, Babu
@ 2024-07-16 17:51       ` Reinette Chatre
  2024-07-16 18:48         ` Moger, Babu
  2024-07-18 21:11         ` Moger, Babu
  0 siblings, 2 replies; 95+ messages in thread
From: Reinette Chatre @ 2024-07-16 17:51 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/16/24 8:13 AM, Moger, Babu wrote:
> On 7/12/24 17:05, Reinette Chatre wrote:
>> On 7/3/24 2:48 PM, Babu Moger wrote:
>>> Add the functionality to enable/disable AMD ABMC feature.
>>>
>>> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
>>> L3_QOS_EXT_CFG.  When the state of ABMC is changed, the MSR needs
>>> to be updated on all the logical processors in the QOS Domain.
>>>
>>> Hardware counters will reset when ABMC state is changed. Reset the
>>> architectural state so that reading of hardware counter is not considered
>>> as an overflow in next update.
>>>
>>> The ABMC feature details are documented in APM listed below [1].
>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>> Monitoring (ABMC).
>>>
>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>> ---
>>> v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
>>>       Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
>>>       Introduced resctrl_arch_get_abmc_enabled to get abmc state from
>>>       non-arch code.
>>>       Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
>>>       Modified commit log to make it clear about AMD ABMC feature.
>>>
>>> v3: No changes.
>>>
>>> v2: Few text changes in commit message.
>>> ---
>>>    arch/x86/include/asm/msr-index.h       |  1 +
>>>    arch/x86/kernel/cpu/resctrl/internal.h | 13 +++++
>>>    arch/x86/kernel/cpu/resctrl/rdtgroup.c | 66 ++++++++++++++++++++++++++
>>>    3 files changed, 80 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/msr-index.h
>>> b/arch/x86/include/asm/msr-index.h
>>> index 01342963011e..263b2d9d00ed 100644
>>> --- a/arch/x86/include/asm/msr-index.h
>>> +++ b/arch/x86/include/asm/msr-index.h
>>> @@ -1174,6 +1174,7 @@
>>>    #define MSR_IA32_MBA_BW_BASE        0xc0000200
>>>    #define MSR_IA32_SMBA_BW_BASE        0xc0000280
>>>    #define MSR_IA32_EVT_CFG_BASE        0xc0000400
>>> +#define MSR_IA32_L3_QOS_EXT_CFG        0xc00003ff
>>>      /* MSR_IA32_VMX_MISC bits */
>>>    #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
>>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>>> b/arch/x86/kernel/cpu/resctrl/internal.h
>>> index 2bd207624eec..0ce9797f80fe 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>>> @@ -97,6 +97,9 @@ cpumask_any_housekeeping(const struct cpumask *mask,
>>> int exclude_cpu)
>>>        return cpu;
>>>    }
>>>    +/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
>>
>> Please be consistent throughout series to have sentences end with period.
> 
> Sure.
> 
>>
>>> +#define ABMC_ENABLE            BIT(0)
>>> +
>>>    struct rdt_fs_context {
>>>        struct kernfs_fs_context    kfc;
>>>        bool                enable_cdpl2;
>>> @@ -477,6 +480,7 @@ struct rdt_parse_data {
>>>     * @mbm_cfg_mask:    Bandwidth sources that can be tracked when Bandwidth
>>>     *            Monitoring Event Configuration (BMEC) is supported.
>>>     * @cdp_enabled:    CDP state of this resource
>>> + * @abmc_enabled:    ABMC feature is enabled
>>>     *
>>>     * Members of this structure are either private to the architecture
>>>     * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
>>> @@ -491,6 +495,7 @@ struct rdt_hw_resource {
>>>        unsigned int        mbm_width;
>>>        unsigned int        mbm_cfg_mask;
>>>        bool            cdp_enabled;
>>> +    bool            abmc_enabled;
>>>    };
>>
>> mbm_cntr_enabled? This is architecture specific code so there is more
>> flexibility here, but it may make implementation easier to understand if
>> consistent naming is used between fs and arch code.
> 
> How about "mbm_cntr_assign_enabled" or "cntr_assign_enabled" ?

My preference is to keep the term "mbm_cntr" to be consistent with the
other variables/struct members to help when reading the code.
"mbm_cntr_assign_enabled" does seem to be getting long though.
Are you planning to use it by assigning it to a local variable with a
shorter name?

As a sidenote, I will be offline for large portions of the next few weeks
and thus unresponsive during this time. I'll be back to a regular
schedule on August 12th.

Reinette

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap
  2024-07-12 22:07   ` Reinette Chatre
@ 2024-07-16 17:59     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 17:59 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 17:07, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> Hardware provides a set of counters when the ABMC feature is supported.
>> These counters are used for enabling the events in resctrl group when
>> the feature is enabled.
>>
>> Introduce mbm_cntrs_free_map bitmap to track available and free counters.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v5:
>>    Updated the comments and commit log.
>>    Few renames
>>     num_cntrs_free_map -> mbm_cntrs_free_map
>>     num_cntrs_init -> mbm_cntrs_init
>>     Added initialization in rdt_get_tree because the default ABMC
>>     enablement happens during the init.
>>
>> v4: Changed the name to num_cntrs where applicable.
>>      Used bitmap apis.
>>      Added more comments for the globals.
>>
>> v3: Changed the bitmap name to assign_cntrs_free_map. Removed abmc
>>      from the name.
>>
>> v2: Changed the bitmap name to assignable_counter_free_map from
>>      abmc_counter_free_map.
>> ---
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 29 ++++++++++++++++++++++++--
>>   1 file changed, 27 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 4f47f52e01c2..b3d3fa048f15 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -185,6 +185,23 @@ bool closid_allocated(unsigned int closid)
>>       return !test_bit(closid, &closid_free_map);
>>   }
>>   +/*
>> + * Counter bitmap and its length for tracking available counters.
>> + * ABMC feature provides set of hardware counters for enabling events.
>> + * Each event takes one hardware counter. Kernel needs to keep track
> 
> What is meant with "Kernel" here? It looks to be the fs code but the
> implementation has both fs and arch code reaching into the counter
> management. This should not be the case, either the fs code or the
> arch code needs to manage the counters, not both.

Yes. This needs to be done in the FS code.

> 
>> + * of number of available counters.
>> + */
>> +static unsigned long mbm_cntrs_free_map;
> 
> With the lengths involved this needs a proper DECLARE_BITMAP()

Sure.
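
To make the sizing concern concrete, here is a minimal user-space sketch (not
kernel code) of a multi-word free map. A single unsigned long can only track
BITS_PER_LONG counters, so filling it with a larger count would write past the
variable. MAX_MBM_CNTRS, mbm_cntr_alloc() and mbm_cntr_free() are hypothetical
names used only for illustration; the array declaration stands in for the
kernel's DECLARE_BITMAP().

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical upper bound on counters; not taken from the patch. */
#define MAX_MBM_CNTRS	128
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* User-space stand-in for DECLARE_BITMAP(mbm_cntrs_free_map, MAX_MBM_CNTRS). */
static unsigned long mbm_cntrs_free_map[BITS_TO_LONGS(MAX_MBM_CNTRS)];
static unsigned int mbm_cntrs_free_map_len;

/* Mark the first num_cntrs counters as free (bit set == free). */
static void mbm_cntrs_init(unsigned int num_cntrs)
{
	unsigned int i;

	assert(num_cntrs <= MAX_MBM_CNTRS);
	memset(mbm_cntrs_free_map, 0, sizeof(mbm_cntrs_free_map));
	for (i = 0; i < num_cntrs; i++)
		mbm_cntrs_free_map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
	mbm_cntrs_free_map_len = num_cntrs;
}

/* Return the lowest free counter id and mark it used, or -1 if none is free. */
static int mbm_cntr_alloc(void)
{
	unsigned int i;

	for (i = 0; i < mbm_cntrs_free_map_len; i++) {
		unsigned long mask = 1UL << (i % BITS_PER_LONG);

		if (mbm_cntrs_free_map[i / BITS_PER_LONG] & mask) {
			mbm_cntrs_free_map[i / BITS_PER_LONG] &= ~mask;
			return (int)i;
		}
	}
	return -1;
}

/* Hand a counter back to the pool. */
static void mbm_cntr_free(unsigned int cntr_id)
{
	assert(cntr_id < mbm_cntrs_free_map_len);
	mbm_cntrs_free_map[cntr_id / BITS_PER_LONG] |= 1UL << (cntr_id % BITS_PER_LONG);
}
```

In the kernel the open-coded loops would presumably be replaced by the
existing bitmap helpers; the sketch only shows why the storage has to span
more than one word once the counter count can exceed BITS_PER_LONG.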

> 
>> +static unsigned int mbm_cntrs_free_map_len;
>> +
>> +static void mbm_cntrs_init(void)
>> +{
>> +    struct rdt_resource *r =
>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +
>> +    bitmap_fill(&mbm_cntrs_free_map, r->mon.num_mbm_cntrs);
>> +    mbm_cntrs_free_map_len = r->mon.num_mbm_cntrs;
>> +}
>> +
>>   /**
>>    * rdtgroup_mode_by_closid - Return mode of resource group with closid
>>    * @closid: closid if the resource group
>> @@ -2466,6 +2483,12 @@ static int _resctrl_abmc_enable(struct
>> rdt_resource *r, bool enable)
>>   {
>>       struct rdt_mon_domain *d;
>>   +    /*
>> +     * Clear all the previous assignments while switching the monitor
>> +     * mode.
>> +     */
>> +    mbm_cntrs_init();
>> +
> 
> If the counters are managed by fs code then the arch code should not be
> doing this. If needed the fs code should init the counters before calling
> the arch helpers.

Yes. We cannot make this call from here. I need to move this call to the FS
layer so that it runs before the arch helper is called. Thanks for the good
point.
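
A rough sketch of the ordering being discussed, as plain user-space C with
stand-in state: the fs side owns the counter pool and resets it, and only then
calls the arch helper, which touches nothing but its own (hardware) state. All
names here (fs_mbm_mode_switch(), arch_abmc_set(), fs_free_cntrs) are
hypothetical and only illustrate the split; they are not the functions the
next version will use.

```c
#include <assert.h>
#include <stdbool.h>

/* fs-owned state: number of counters currently free (hypothetical). */
static unsigned int fs_free_cntrs;

/* arch-owned state: stand-in for the enable bit in L3_QOS_EXT_CFG. */
static bool arch_abmc_enabled;

/* Arch helper: only touches hardware state, never the counter pool. */
static void arch_abmc_set(bool enable)
{
	arch_abmc_enabled = enable;
}

/* fs helper: forget all previous assignments. */
static void fs_mbm_cntrs_init(unsigned int num_cntrs)
{
	fs_free_cntrs = num_cntrs;
}

/* fs entry point for a mode switch; returns true if the mode changed. */
static bool fs_mbm_mode_switch(bool enable, unsigned int num_cntrs)
{
	if (arch_abmc_enabled == enable)
		return false;

	/* Reset the fs-owned pool first, then ask the arch code to switch. */
	fs_mbm_cntrs_init(num_cntrs);
	arch_abmc_set(enable);
	return true;
}
```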

> 
>>       /*
>>        * Hardware counters will reset after switching the monitor mode.
>>        * Reset the architectural state so that reading of hardware
>> @@ -2724,10 +2747,10 @@ static void schemata_list_destroy(void)
>>     static int rdt_get_tree(struct fs_context *fc)
>>   {
>> +    struct rdt_resource *r =
>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>       struct rdt_fs_context *ctx = rdt_fc2context(fc);
>>       unsigned long flags = RFTYPE_CTRL_BASE;
>>       struct rdt_mon_domain *dom;
>> -    struct rdt_resource *r;
>>       int ret;
>>         cpus_read_lock();
>> @@ -2756,6 +2779,9 @@ static int rdt_get_tree(struct fs_context *fc)
>>         closid_init();
>>   +    if (r->mon.abmc_capable)
>> +        mbm_cntrs_init();
>> +
>>       if (resctrl_arch_mon_capable())
>>           flags |= RFTYPE_MON;
>>   @@ -2800,7 +2826,6 @@ static int rdt_get_tree(struct fs_context *fc)
>>           resctrl_mounted = true;
>>         if (is_mbm_enabled()) {
>> -        r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>           list_for_each_entry(dom, &r->mon_domains, hdr.list)
>>               mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL,
>>                              RESCTRL_PICK_ANY_CPU);
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-07-16 17:51       ` Reinette Chatre
@ 2024-07-16 18:48         ` Moger, Babu
  2024-07-16 20:41           ` Reinette Chatre
  2024-07-18 21:11         ` Moger, Babu
  1 sibling, 1 reply; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 18:48 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/16/24 12:51, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/16/24 8:13 AM, Moger, Babu wrote:
>> On 7/12/24 17:05, Reinette Chatre wrote:
>>> On 7/3/24 2:48 PM, Babu Moger wrote:
>>>> Add the functionality to enable/disable AMD ABMC feature.
>>>>
>>>> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
>>>> L3_QOS_EXT_CFG.  When the state of ABMC is changed, the MSR needs
>>>> to be updated on all the logical processors in the QOS Domain.
>>>>
>>>> Hardware counters will reset when ABMC state is changed. Reset the
>>>> architectural state so that reading of hardware counter is not considered
>>>> as an overflow in next update.
>>>>
>>>> The ABMC feature details are documented in APM listed below [1].
>>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>>> Monitoring (ABMC).
>>>>
>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>>> ---
>>>> v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
>>>>       Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
>>>>       Introduced resctrl_arch_get_abmc_enabled to get abmc state from
>>>>       non-arch code.
>>>>       Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
>>>>       Modified commit log to make it clear about AMD ABMC feature.
>>>>
>>>> v3: No changes.
>>>>
>>>> v2: Few text changes in commit message.
>>>> ---
>>>>    arch/x86/include/asm/msr-index.h       |  1 +
>>>>    arch/x86/kernel/cpu/resctrl/internal.h | 13 +++++
>>>>    arch/x86/kernel/cpu/resctrl/rdtgroup.c | 66 ++++++++++++++++++++++++++
>>>>    3 files changed, 80 insertions(+)
>>>>
>>>> diff --git a/arch/x86/include/asm/msr-index.h
>>>> b/arch/x86/include/asm/msr-index.h
>>>> index 01342963011e..263b2d9d00ed 100644
>>>> --- a/arch/x86/include/asm/msr-index.h
>>>> +++ b/arch/x86/include/asm/msr-index.h
>>>> @@ -1174,6 +1174,7 @@
>>>>    #define MSR_IA32_MBA_BW_BASE        0xc0000200
>>>>    #define MSR_IA32_SMBA_BW_BASE        0xc0000280
>>>>    #define MSR_IA32_EVT_CFG_BASE        0xc0000400
>>>> +#define MSR_IA32_L3_QOS_EXT_CFG        0xc00003ff
>>>>      /* MSR_IA32_VMX_MISC bits */
>>>>    #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>>>> b/arch/x86/kernel/cpu/resctrl/internal.h
>>>> index 2bd207624eec..0ce9797f80fe 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>>>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>>>> @@ -97,6 +97,9 @@ cpumask_any_housekeeping(const struct cpumask *mask,
>>>> int exclude_cpu)
>>>>        return cpu;
>>>>    }
>>>>    +/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
>>>
>>> Please be consistent throughout series to have sentences end with period.
>>
>> Sure.
>>
>>>
>>>> +#define ABMC_ENABLE            BIT(0)
>>>> +
>>>>    struct rdt_fs_context {
>>>>        struct kernfs_fs_context    kfc;
>>>>        bool                enable_cdpl2;
>>>> @@ -477,6 +480,7 @@ struct rdt_parse_data {
>>>>     * @mbm_cfg_mask:    Bandwidth sources that can be tracked when
>>>> Bandwidth
>>>>     *            Monitoring Event Configuration (BMEC) is supported.
>>>>     * @cdp_enabled:    CDP state of this resource
>>>> + * @abmc_enabled:    ABMC feature is enabled
>>>>     *
>>>>     * Members of this structure are either private to the architecture
>>>>     * e.g. mbm_width, or accessed via helpers that provide
>>>> abstraction. e.g.
>>>> @@ -491,6 +495,7 @@ struct rdt_hw_resource {
>>>>        unsigned int        mbm_width;
>>>>        unsigned int        mbm_cfg_mask;
>>>>        bool            cdp_enabled;
>>>> +    bool            abmc_enabled;
>>>>    };
>>>
>>> mbm_cntr_enabled? This is architecture specific code so there is more
>>> flexibility here, but it may make implementation easier to understand if
>>> consistent naming is used between fs and arch code.
>>
>> How about "mbm_cntr_assign_enabled" or "cntr_assign_enabled" ?
> 
> My preference is to keep the term "mbm_cntr" to be consistent with the
> other variables/struct members to help when reading the code.
> "mbm_cntr_assign_enabled" does seem to be getting long though.
> Are you planning to use it by assigning it to a local variable with a
> shorter name?

Yes. We can do that.

> 
> As a sidenote, I will be offline for large portions of the next few weeks
> and thus unresponsive during this time. I'll be back to a regular
> schedule on August 12th.

Thanks for the heads up.
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 10/20] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
  2024-07-12 22:08   ` Reinette Chatre
@ 2024-07-16 19:21     ` Moger, Babu
  2024-07-16 20:42       ` Reinette Chatre
  0 siblings, 1 reply; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 19:21 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 17:08, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> If the BMEC (Bandwidth Monitoring Event Configuration) feature is
>> supported, the bandwidth events can be configured to track specific
>> events. The event configuration is domain specific. ABMC (Assignable
>> Bandwidth Monitoring Counters) feature needs event configuration
>> information to assign hardware counter to an RMID. Event configurations
>> are not stored in resctrl but instead always read from or written to
>> hardware directly when prompted by user space.
>>
>> Read the event configuration from the hardware during the domain
>> initialization. Save the configuration information in the rdt_hw_domain,
> 
> rdt_hw_domain -> rdt_hw_mon_domain

Sure.

> 
>> so it can be used for counter assignment.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v5: Exported mon_event_config_index_get.
>>      Renamed arch_domain_mbm_evt_config to resctrl_arch_mbm_evt_config.
>>
>> v4: Read the configuration information from the hardware to initialize.
>>      Added few commit messages.
>>      Fixed the tab spaces.
>>
>> v3: Minor changes related to rebase in mbm_config_write_domain.
>>
>> v2: No changes.
>> ---
>>   arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
>>   arch/x86/kernel/cpu/resctrl/internal.h |  6 ++++++
>>   arch/x86/kernel/cpu/resctrl/monitor.c  | 22 ++++++++++++++++++++++
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c |  2 +-
>>   4 files changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c
>> b/arch/x86/kernel/cpu/resctrl/core.c
>> index ff5cb693b396..6265ef8b610f 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -619,6 +619,8 @@ static void domain_add_cpu_mon(int cpu, struct
>> rdt_resource *r)
>>         arch_mon_domain_online(r, d);
>>   +    resctrl_arch_mbm_evt_config(hw_dom);
>> +
> 
> This does not look to be an arch call called by the fs code so special
> naming does not seem to be required? If it _was_ an arch callback then

Yes. Correct.

> it cannot take a HW resource as parameter since the fs code does not have
> access to that.
> 
> 
>>       if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
>>           mon_domain_free(hw_dom);
>>           return;
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 0ce9797f80fe..4cb1a5d014a3 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -401,6 +401,8 @@ struct rdt_hw_ctrl_domain {
>>    * @d_resctrl:    Properties exposed to the resctrl file system
>>    * @arch_mbm_total:    arch private state for MBM total bandwidth
>>    * @arch_mbm_local:    arch private state for MBM local bandwidth
>> + * @mbm_total_cfg:    MBM total bandwidth configuration
>> + * @mbm_local_cfg:    MBM local bandwidth configuration
>>    *
>>    * Members of this structure are accessed via helpers that provide
>> abstraction.
>>    */
>> @@ -408,6 +410,8 @@ struct rdt_hw_mon_domain {
>>       struct rdt_mon_domain        d_resctrl;
>>       struct arch_mbm_state        *arch_mbm_total;
>>       struct arch_mbm_state        *arch_mbm_local;
>> +    u32                mbm_total_cfg;
>> +    u32                mbm_local_cfg;
>>   };
>>     static inline struct rdt_hw_ctrl_domain
>> *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
>> @@ -662,6 +666,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool
>> force_free);
>>   void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>>   void __init resctrl_file_fflags_init(const char *config,
>>                        unsigned long fflags);
>> +void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
>> +unsigned int mon_event_config_index_get(u32 evtid);
>>   void rdt_staged_configs_clear(void);
>>   bool closid_allocated(unsigned int closid);
>>   int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 7a93a6d2b2de..b96b0a8bd7d3 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1256,6 +1256,28 @@ int __init rdt_get_mon_l3_config(struct
>> rdt_resource *r)
>>       return 0;
>>   }
>>   +void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom)
> 
> A function is expected to have a verb in its name and the verb here seems
> to be "config", which does not seem appropriate and creates confusion with
> resctrl_arch_event_config_set(). How about resctrl_arch_mbm_evt_config_init()
> with proper initializer of the config values to also cover case when events
> are not configurable (INVALID_CONFIG_VALUE introduced in next patch?) ?

Sorry. I am not clear on this comment. Can you please elaborate?

> 
>> +{
>> +    unsigned int index;
>> +    u64 msrval;
>> +
>> +    /*
>> +     * Read the configuration registers QOS_EVT_CFG_n, where <n> is
>> +     * the BMEC event number (EvtID).
>> +     */
>> +    if (mbm_total_event.configurable) {
>> +        index = mon_event_config_index_get(QOS_L3_MBM_TOTAL_EVENT_ID);
>> +        rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
>> +        hw_dom->mbm_total_cfg = msrval & MAX_EVT_CONFIG_BITS;
>> +    }
>> +
>> +    if (mbm_local_event.configurable) {
>> +        index = mon_event_config_index_get(QOS_L3_MBM_LOCAL_EVENT_ID);
>> +        rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
>> +        hw_dom->mbm_local_cfg = msrval & MAX_EVT_CONFIG_BITS;
>> +    }
>> +}
>> +
>>   void __exit rdt_put_mon_l3_config(void)
>>   {
>>       dom_data_exit();
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index b3d3fa048f15..b2b751741dd8 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1606,7 +1606,7 @@ struct mon_config_info {
>>    *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
>>    *         INVALID_CONFIG_INDEX for invalid evtid
>>    */
>> -static inline unsigned int mon_event_config_index_get(u32 evtid)
>> +unsigned int mon_event_config_index_get(u32 evtid)
>>   {
>>       switch (evtid) {
>>       case QOS_L3_MBM_TOTAL_EVENT_ID:
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 11/20] x86/resctrl: Remove MSR reading of event configuration value
  2024-07-12 22:10   ` Reinette Chatre
@ 2024-07-16 19:34     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 19:34 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 17:10, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> The event configuration is domain specific and initialized during domain
>> initialization. It is not required to read the configuration register
>> every time user asks for it. Use the value stored in rdt_mon_hw_domain
> 
> rdt_mon_hw_domain -> rdt_hw_mon_domain
> 
>> instead. Also update the configuration value when user writes it.
> 
> Please separate the context/problem/solution clearly.

Sure.

> 
>>
>> Introduce resctrl_arch_event_config_get() and
>> resctrl_arch_event_config_set() to get/set architecture domain specific
>> mbm_total_cfg/mbm_local_cfg values.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v5: Introduced resctrl_arch_event_config_get() and
>>      resctrl_arch_event_config_set() based on our discussion.
>>     
>> https://lore.kernel.org/lkml/68e861f9-245d-4496-a72e-46fc57d19c62@amd.com/
>>
>> v4: New patch.
>> ---
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 112 +++++++++++++++----------
>>   include/linux/resctrl.h                |   4 +
>>   2 files changed, 72 insertions(+), 44 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index b2b751741dd8..91c5d45ac367 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1591,10 +1591,59 @@ static int rdtgroup_size_show(struct
>> kernfs_open_file *of,
>>   }
>>     struct mon_config_info {
>> +    struct rdt_mon_domain *d;
>>       u32 evtid;
>>       u32 mon_config;
>>   };
> 
> as seen above, mon_config is a u32
> 
>>   +#define INVALID_CONFIG_VALUE   UINT_MAX
> 
> So an invalid config value can be U32_MAX?

Sure.

> 
>> +
>> +unsigned int resctrl_arch_event_config_get(struct rdt_mon_domain *d,
>> +                       enum resctrl_event_id eventid)
>> +{
>> +    struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>> +
>> +    switch (eventid) {
>> +    case QOS_L3_OCCUP_EVENT_ID:
>> +        break;
>> +    case QOS_L3_MBM_TOTAL_EVENT_ID:
>> +        return hw_dom->mbm_total_cfg;
>> +    case QOS_L3_MBM_LOCAL_EVENT_ID:
>> +        return hw_dom->mbm_local_cfg;
>> +    }
>> +
>> +    /* Never expect to get here */
>> +    WARN_ON_ONCE(1);
>> +
>> +    return INVALID_CONFIG_VALUE;
>> +}
>> +
>> +void resctrl_arch_event_config_set(void *info)
>> +{
>> +    struct mon_config_info *mon_info = info;
>> +    struct rdt_hw_mon_domain *hw_dom;
>> +    unsigned int index;
>> +
>> +    index = mon_event_config_index_get(mon_info->evtid);
>> +    if (index == INVALID_CONFIG_VALUE) {
> 
> INVALID_CONFIG_INDEX?

Yes.

> 
>> +        pr_warn_once("Invalid event id %d\n", mon_info->evtid);
>> +        return;
>> +    }
>> +    wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
>> +
>> +    hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
>> +
>> +    switch (mon_info->evtid) {
>> +    case QOS_L3_OCCUP_EVENT_ID:
>> +        break;
>> +    case QOS_L3_MBM_TOTAL_EVENT_ID:
>> +        hw_dom->mbm_total_cfg = mon_info->mon_config;
>> +        break;
>> +    case QOS_L3_MBM_LOCAL_EVENT_ID:
>> +        hw_dom->mbm_local_cfg =  mon_info->mon_config;
> 
> Please add a break here.

Sure.
> 
>> +    }
>> +}
>> +
>>   #define INVALID_CONFIG_INDEX   UINT_MAX
>>     /**
>> @@ -1619,33 +1668,11 @@ unsigned int mon_event_config_index_get(u32 evtid)
>>       }
>>   }
>>   -static void mon_event_config_read(void *info)
>> -{
>> -    struct mon_config_info *mon_info = info;
>> -    unsigned int index;
>> -    u64 msrval;
>> -
>> -    index = mon_event_config_index_get(mon_info->evtid);
>> -    if (index == INVALID_CONFIG_INDEX) {
>> -        pr_warn_once("Invalid event id %d\n", mon_info->evtid);
>> -        return;
>> -    }
>> -    rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
>> -
>> -    /* Report only the valid event configuration bits */
>> -    mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
>> -}
>> -
>> -static void mondata_config_read(struct rdt_mon_domain *d, struct
>> mon_config_info *mon_info)
>> -{
>> -    smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read,
>> mon_info, 1);
>> -}
>> -
>>   static int mbm_config_show(struct seq_file *s, struct rdt_resource *r,
>> u32 evtid)
>>   {
>> -    struct mon_config_info mon_info = {0};
>>       struct rdt_mon_domain *dom;
>>       bool sep = false;
>> +    int val;
>>         cpus_read_lock();
>>       mutex_lock(&rdtgroup_mutex);
>> @@ -1654,11 +1681,13 @@ static int mbm_config_show(struct seq_file *s,
>> struct rdt_resource *r, u32 evtid
>>           if (sep)
>>               seq_puts(s, ";");
>>   -        memset(&mon_info, 0, sizeof(struct mon_config_info));
>> -        mon_info.evtid = evtid;
>> -        mondata_config_read(dom, &mon_info);
>> +        val = resctrl_arch_event_config_get(dom, evtid);
> 
> There are too many types used interchangeably. The mon_config is a "u32",
> but the new function returns "unsigned int", which is then assigned to an
> "int". Please just use one type consistently, it is a u32 so
> resctrl_arch_event_config_get() can return u32 and "val" should be u32.
> 

Sure. Will do.
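
As a sketch of what the single-type version could look like, here is a
user-space approximation in which the cached value, the getter's return type,
the sentinel and the caller's local are all 32-bit unsigned. The struct, enum
and function names are hypothetical stand-ins for the kernel ones, and
uint32_t plays the role of u32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sentinel for "no such configuration"; same width as the config value. */
#define INVALID_CONFIG_VALUE	UINT32_MAX

/* Hypothetical stand-ins for the resctrl event ids. */
enum event_id { EVT_OCCUP = 1, EVT_MBM_TOTAL = 2, EVT_MBM_LOCAL = 3 };

/* Stand-in for the cached per-domain configuration. */
struct mon_domain_cfg {
	uint32_t mbm_total_cfg;
	uint32_t mbm_local_cfg;
};

/* Return the cached configuration, or the sentinel for other events. */
static uint32_t event_config_get(const struct mon_domain_cfg *d,
				 enum event_id evt)
{
	switch (evt) {
	case EVT_MBM_TOTAL:
		return d->mbm_total_cfg;
	case EVT_MBM_LOCAL:
		return d->mbm_local_cfg;
	default:
		return INVALID_CONFIG_VALUE;
	}
}

/* A hardware write is only needed when the requested value differs. */
static bool event_config_needs_write(const struct mon_domain_cfg *d,
				     enum event_id evt, uint32_t val)
{
	uint32_t cur = event_config_get(d, evt);

	return cur != val;
}
```

With one type throughout there is no implicit signed/unsigned conversion when
comparing against the sentinel or when printing the value with 0x%02x.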

>> +        if (val == INVALID_CONFIG_VALUE) {
>> +            rdt_last_cmd_puts("Invalid event configuration\n");
> 
> I do not see a reason to print message to user space here. If this error is
> encountered then it is a kernel bug and resctrl_arch_event_config_get() would
> already have triggered a WARN.
> 
> Since this is a "never should happen" scenario I wonder if we can not just
> print the INVALID_CONFIG_VALUE to user space?

Ok. Will remove the check and the message.

> 
> 
>> +            break;
>> +        }
>>   -        seq_printf(s, "%d=0x%02x", dom->hdr.id, mon_info.mon_config);
>> +        seq_printf(s, "%d=0x%02x", dom->hdr.id, val);
>>           sep = true;
>>       }
>>       seq_puts(s, "\n");
>> @@ -1689,33 +1718,27 @@ static int mbm_local_bytes_config_show(struct
>> kernfs_open_file *of,
>>       return 0;
>>   }
>>   -static void mon_event_config_write(void *info)
>> -{
>> -    struct mon_config_info *mon_info = info;
>> -    unsigned int index;
>> -
>> -    index = mon_event_config_index_get(mon_info->evtid);
>> -    if (index == INVALID_CONFIG_INDEX) {
>> -        pr_warn_once("Invalid event id %d\n", mon_info->evtid);
>> -        return;
>> -    }
>> -    wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
>> -}
>>     static void mbm_config_write_domain(struct rdt_resource *r,
>>                       struct rdt_mon_domain *d, u32 evtid, u32 val)
>>   {
>>       struct mon_config_info mon_info = {0};
>> +    int config_val;
>>         /*
>> -     * Read the current config value first. If both are the same then
>> +     * Check the current config value first. If both are the same then
>>        * no need to write it again.
>>        */
>> -    mon_info.evtid = evtid;
>> -    mondata_config_read(d, &mon_info);
>> -    if (mon_info.mon_config == val)
>> +    config_val = resctrl_arch_event_config_get(d, evtid);
>> +    if (config_val == INVALID_CONFIG_VALUE) {
>> +        rdt_last_cmd_puts("Invalid event configuration\n");
> 
> same here about unneeded print to user space. When this is encountered it is
> a kernel bug.

Ok. Will remove the check and the message.

> 
>> +        return;
>> +    }
>> +    if (config_val == val)
>>           return;
>>   +    mon_info.d = d;
>> +    mon_info.evtid = evtid;
>>       mon_info.mon_config = val;
>>         /*
>> @@ -1724,7 +1747,8 @@ static void mbm_config_write_domain(struct
>> rdt_resource *r,
>>        * are scoped at the domain level. Writing any of these MSRs
>>        * on one CPU is observed by all the CPUs in the domain.
>>        */
>> -    smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_write,
>> +    smp_call_function_any(&d->hdr.cpu_mask,
>> +                  resctrl_arch_event_config_set,
>>                     &mon_info, 1);
>>         /*
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 62f0f002ef41..f017258ebf85 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -352,6 +352,10 @@ void resctrl_arch_reset_rmid(struct rdt_resource
>> *r, struct rdt_mon_domain *d,
>>    */
>>   void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct
>> rdt_mon_domain *d);
>>   +void resctrl_arch_event_config_set(void *info);
>> +unsigned int resctrl_arch_event_config_get(struct rdt_mon_domain *d,
>> +                       enum resctrl_event_id eventid);
>> +
>>   extern unsigned int resctrl_rmid_realloc_threshold;
>>   extern unsigned int resctrl_rmid_realloc_limit;
>>   
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 12/20] x86/resctrl: Add data structures and definitions for ABMC assignment
  2024-07-12 22:13   ` Reinette Chatre
@ 2024-07-16 20:24     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 20:24 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 17:13, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID and monitor the bandwidth as long as the counter
>> is assigned. The bandwidth events will be tracked by the hardware until
>> the user changes the configuration. Each resctrl group can configure a
>> maximum of two counters, one for the total event and one for the local event.
>>
>> The counters are configured by writing to MSR L3_QOS_ABMC_CFG.
>> Configuration is done by setting the counter id, bandwidth source (RMID)
>> and bandwidth configuration supported by BMEC (Bandwidth Monitoring Event
>> Configuration). Reading L3_QOS_ABMC_DSC returns the configuration of the
>> counter id specified in L3_QOS_ABMC_CFG.
>>
>> Attempts to read or write these MSRs when ABMC is not enabled will result
>> in a #GP(0) exception.
>>
>> Introduce data structures and definitions for ABMC assignments.
>>
>> MSR L3_QOS_ABMC_CFG (0xC000_03FDh) and L3_QOS_ABMC_DSC (0xC000_03FEh)
>> details.
>> =========================================================================
>> Bits     Mnemonic    Description            Access Reset
>>                             Type   Value
>> =========================================================================
>> 63     CfgEn         Configuration Enable         R/W     0
>>
>> 62     CtrEn         Enable/disable Tracking        R/W     0
>>
>> 61:53     –         Reserved             MBZ     0
>>
>> 52:48     CtrID         Counter Identifier        R/W    0
>>
>> 47     IsCOS        BwSrc field is a CLOSID        R/W    0
>>             (not an RMID)
>>
>> 46:44     –        Reserved            MBZ    0
>>
>> 43:32    BwSrc        Bandwidth Source        R/W    0
>>             (RMID or CLOSID)
>>
>> 31:0    BwType        Bandwidth configuration        R/W    0
>>             to track for this counter
>> ==========================================================================
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
> 
> The changelog only describes the hardware interface yet the patch contains
> part hardware interface part new driver support for hardware interface.
> 

Yes. I may have to separate this.

>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> ---
>> v5: Moved assignment flags here (patch 10/19 of v4).
>>      Added MON_CNTR_UNSET definition to initialize cntr_id's.
>>      More details in commit log.
>>      Renamed few fields in l3_qos_abmc_cfg for readability.
>>
>> v4: Added more descriptions.
>>      Changed the name abmc_ctr_id to ctr_id.
>>      Added L3_QOS_ABMC_DSC. Used for reading the configuration.
>>
>> v3: No changes.
>>
>> v2: No changes.
>> ---
>>   arch/x86/include/asm/msr-index.h       |  2 ++
>>   arch/x86/kernel/cpu/resctrl/internal.h | 40 ++++++++++++++++++++++++++
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 18 ++++++++++++
>>   3 files changed, 60 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/msr-index.h
>> b/arch/x86/include/asm/msr-index.h
>> index 263b2d9d00ed..5e44ff91f459 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1175,6 +1175,8 @@
>>   #define MSR_IA32_SMBA_BW_BASE        0xc0000280
>>   #define MSR_IA32_EVT_CFG_BASE        0xc0000400
>>   #define MSR_IA32_L3_QOS_EXT_CFG        0xc00003ff
>> +#define MSR_IA32_L3_QOS_ABMC_CFG    0xc00003fd
>> +#define MSR_IA32_L3_QOS_ABMC_DSC    0xc00003fe
>>     /* MSR_IA32_VMX_MISC bits */
>>   #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 4cb1a5d014a3..6925c947682d 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -100,6 +100,18 @@ cpumask_any_housekeeping(const struct cpumask
>> *mask, int exclude_cpu)
>>   /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
>>   #define ABMC_ENABLE            BIT(0)
>>   +/*
>> + * Assignment flags for ABMC feature
>> + */
>> +#define ASSIGN_NONE    0
>> +#define ASSIGN_TOTAL    BIT(QOS_L3_MBM_TOTAL_EVENT_ID)
>> +#define ASSIGN_LOCAL    BIT(QOS_L3_MBM_LOCAL_EVENT_ID)
> 
> These flags do not appear to be part of hardware interface and there
> is no explanation for what they mean or how they will be used. They are
> also not used in this patch. It is thus not possible to understand if
> they belong in this patch or is appropriate in this work.

OK. Will remove them from here and introduce them later, when they are used.

> 
>> +
>> +#define MON_CNTR_UNSET    U32_MAX
>> +
>> +/* Maximum assignable counters per resctrl group */
>> +#define MAX_CNTRS    2
>> +
>>   struct rdt_fs_context {
>>       struct kernfs_fs_context    kfc;
>>       bool                enable_cdpl2;
>> @@ -228,12 +240,14 @@ enum rdtgrp_mode {
>>    * @parent:            parent rdtgrp
>>    * @crdtgrp_list:        child rdtgroup node list
>>    * @rmid:            rmid for this rdtgroup
>> + * @cntr_id:            ABMC counter ids assigned to this group
> 
> struct mongroup is private to resctrl fs so it cannot contain an
> architecture specific feature. Having it contain a generic "cntr_id"
> may be ok at this point, but it should not be termed "ABMC counter".

Ok. Sure.

> 
>>    */
>>   struct mongroup {
>>       struct kernfs_node    *mon_data_kn;
>>       struct rdtgroup        *parent;
>>       struct list_head    crdtgrp_list;
>>       u32            rmid;
>> +    u32            cntr_id[MAX_CNTRS];
> 
> This is a significant addition yet is silently included as part of a patch
> that just introduces hardware interface. This is how resctrl will manage
> the hardware counters. It is significant since this is what dictates that it
> is resctrl fs that will manage the counters, which makes it important which
> interfaces are made available and from where it is called. Through
> this series I have also not come across a description of this architecture.
> With this introduction counters are maintained per monitor group, yet
> the new interface supports assigning counters per domain. There
> is no high level explanation of this architecture and the reader is forced
> to decipher it from the implementation making this work harder to review
> than necessary.
> 
> Would it be possible to present the fs and architecture code
> separately? I think doing so will make it easier to understand.

Sure. Will separate the two parts.

> 
>>   };
>>     /**
>> @@ -607,6 +621,32 @@ union cpuid_0x10_x_edx {
>>       unsigned int full;
>>   };
>>   +/*
>> + * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
>> + * @bw_type        : Bandwidth configuration(supported by BMEC)
>> + *              to track this counter id.
> 
> Does "to track this counter id" mean "tracked by @cntr_id"?

Yea. Sure.

> 
>> + * @bw_src        : Bandwidth Source (RMID or CLOSID).
> 
> Please do not capitalize words mid sentence, like "Source"
> above, "Identifier", and "Enable" in two instances below.
> 
>> + * @reserved1        : Reserved.
>> + * @is_clos        : BwSrc field is a CLOSID (not an RMID).
> 
> Just stick to @bw_src.

Sure.

> 
>> + * @cntr_id        : Counter Identifier.
>> + * @reserved        : Reserved.
>> + * @cntr_en        : Tracking Enable bit.
> 
> Can this be more detailed about what happens when this bit is set/clear?

Sure. Will add it.
cfg_en = 1, cntr_en = 0:
  Counter will be configured and tracking is not enabled.

cfg_en = 1, cntr_en = 1:
  Counter will be configured and tracking will be enabled.
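
As a minimal sketch of the two combinations above, the following standalone
C program (userspace, not kernel code) builds the 64-bit L3_QOS_ABMC_CFG
value with explicit shifts taken from the register table in the changelog.
The helper name abmc_cfg_encode() and the ABMC_* macros are assumptions made
for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions taken from the L3_QOS_ABMC_CFG table in the changelog. */
#define ABMC_CFG_EN_SHIFT	63	/* CfgEn: bit 63 */
#define ABMC_CNTR_EN_SHIFT	62	/* CtrEn: bit 62 */
#define ABMC_CNTR_ID_SHIFT	48	/* CtrID: bits 52:48 */
#define ABMC_CNTR_ID_MASK	0x1fULL
#define ABMC_BW_SRC_SHIFT	32	/* BwSrc: bits 43:32 */
#define ABMC_BW_SRC_MASK	0xfffULL

/* Hypothetical helper, not part of the patch: build the MSR value. */
static uint64_t abmc_cfg_encode(uint32_t cntr_id, uint32_t rmid,
				uint32_t bw_type, bool track)
{
	/* cfg_en = 1: the counter is configured in both cases. */
	uint64_t val = 1ULL << ABMC_CFG_EN_SHIFT;

	assert(cntr_id <= ABMC_CNTR_ID_MASK);
	assert(rmid <= ABMC_BW_SRC_MASK);

	/* cntr_en selects whether tracking is enabled for this counter. */
	if (track)
		val |= 1ULL << ABMC_CNTR_EN_SHIFT;

	val |= (uint64_t)cntr_id << ABMC_CNTR_ID_SHIFT;
	val |= (uint64_t)rmid << ABMC_BW_SRC_SHIFT;
	val |= bw_type;			/* BwType: bits 31:0 */

	return val;
}
```

Explicit shifts avoid relying on compiler bitfield layout, which is one
reason a sketch like this may be easier to check against the table than the
union in the patch.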


> 
>> + * @cfg_en        : Configuration Enable bit.
> 
> What is difference between "configuration enable" and "tracking enable"?
> What is relationship, if any, to @bw_type that is the bandwidth
> configuration?
> 
>> + */
>> +union l3_qos_abmc_cfg {
>> +    struct {
>> +        unsigned long    bw_type    :32,
>> +                bw_src    :12,
>> +                reserved1: 3,
>> +                is_clos    : 1,
>> +                cntr_id    : 5,
>> +                reserved : 9,
>> +                cntr_en    : 1,
>> +                cfg_en    : 1;
>> +    } split;
> 
> Please check the spacing in this data structure. Tabs are used inconsistently
> and the members are not lining up either.

Sure.

> 
>> +    unsigned long full;
>> +};
>> +
>>   void rdt_last_cmd_clear(void);
>>   void rdt_last_cmd_puts(const char *s);
>>   __printf(1, 2)
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 91c5d45ac367..d2663f1345b7 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2505,6 +2505,7 @@ static void resctrl_abmc_set_one_amd(void *arg)
>>     static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
>>   {
>> +    struct rdtgroup *prgrp, *crgrp;
>>       struct rdt_mon_domain *d;
>>         /*
>> @@ -2513,6 +2514,17 @@ static int _resctrl_abmc_enable(struct
>> rdt_resource *r, bool enable)
>>        */
>>       mbm_cntrs_init();
>>   +    /* Reset the cntr_id's for all the monitor groups */
>> +    list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>> +        prgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
>> +        prgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
>> +        list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list,
>> +                    mon.crdtgrp_list) {
>> +            crgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
>> +            crgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
>> +        }
>> +    }
>> +
> 
> No. The counters are in the monitor group that is a structure that is private
> to the fs. The architecture code should not be accessing it. This should
> only be
> done by fs code.

Will move this code to the fs part, so that it runs before this function is called.

> 
>>       /*
>>        * Hardware counters will reset after switching the monitor mode.
>>        * Reset the architectural state so that reading of hardware
>> @@ -3573,6 +3585,8 @@ static int mkdir_rdt_prepare_rmid_alloc(struct
>> rdtgroup *rdtgrp)
>>           return ret;
>>       }
>>       rdtgrp->mon.rmid = ret;
>> +    rdtgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
>> +    rdtgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
>>         ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp,
>> &rdtgrp->mon.mon_data_kn);
>>       if (ret) {
>> @@ -4128,6 +4142,10 @@ static void __init rdtgroup_setup_default(void)
>>       rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
>>       rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
>>       rdtgroup_default.type = RDTCTRL_GROUP;
>> +
>> +    rdtgroup_default.mon.cntr_id[0] = MON_CNTR_UNSET;
>> +    rdtgroup_default.mon.cntr_id[1] = MON_CNTR_UNSET;
>> +
>>       INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
>>         list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
> 
> Reinette
> 

-- 
Thanks
Babu Moger


* Re: [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-07-16 18:48         ` Moger, Babu
@ 2024-07-16 20:41           ` Reinette Chatre
  0 siblings, 0 replies; 95+ messages in thread
From: Reinette Chatre @ 2024-07-16 20:41 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/16/24 11:48 AM, Moger, Babu wrote:
> On 7/16/24 12:51, Reinette Chatre wrote:
>> On 7/16/24 8:13 AM, Moger, Babu wrote:
>>> On 7/12/24 17:05, Reinette Chatre wrote:
>>>> On 7/3/24 2:48 PM, Babu Moger wrote:

>>>>> @@ -491,6 +495,7 @@ struct rdt_hw_resource {
>>>>>         unsigned int        mbm_width;
>>>>>         unsigned int        mbm_cfg_mask;
>>>>>         bool            cdp_enabled;
>>>>> +    bool            abmc_enabled;
>>>>>     };
>>>>
>>>> mbm_cntr_enabled? This is architecture specific code so there is more
>>>> flexibility
>>>> here, but it may make implementation easier to understand if consistent
>>>> naming is used
>>>> between fs and arch code.
>>>
>>> How about "mbm_cntr_assign_enabled" or "cntr_assign_enabled" ?
>>
>> My preference is to keep the term "mbm_cntr" to be consistent with the
>> other variables/struct members to help when reading the code.
>> "mbm_cntr_assign_enabled" does seem to be getting long though.
>> Are you planning to use it by assigning it to a local variable with shorter
>> name?
> 
> Yes. We can do that.

ok. It is not clear to me how this will turn out. I'm afraid the length may
start to be cumbersome, but we can see how it turns out.

Reinette


* Re: [PATCH v5 10/20] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
  2024-07-16 19:21     ` Moger, Babu
@ 2024-07-16 20:42       ` Reinette Chatre
  2024-07-16 22:43         ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-07-16 20:42 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/16/24 12:21 PM, Moger, Babu wrote:
> On 7/12/24 17:08, Reinette Chatre wrote:
>> On 7/3/24 2:48 PM, Babu Moger wrote:

>>> @@ -662,6 +666,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool
>>> force_free);
>>>    void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>>>    void __init resctrl_file_fflags_init(const char *config,
>>>                         unsigned long fflags);
>>> +void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
>>> +unsigned int mon_event_config_index_get(u32 evtid);
>>>    void rdt_staged_configs_clear(void);
>>>    bool closid_allocated(unsigned int closid);
>>>    int resctrl_find_cleanest_closid(void);
>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> index 7a93a6d2b2de..b96b0a8bd7d3 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> @@ -1256,6 +1256,28 @@ int __init rdt_get_mon_l3_config(struct
>>> rdt_resource *r)
>>>        return 0;
>>>    }
>>>    +void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom)
>>
>> A function is expected to have a verb in its name and the verb here seems
>> to be
>> "config", which does not seem appropriate and creates confusion with
>> resctrl_arch_event_config_set(). How about resctrl_arch_mbm_evt_config_init()
>> with proper initializer of the config values to also cover case when
>> events are
>> not configurable (INVALID_CONFIG_VALUE introduced in next patch?) ?
> 
> Sorry. I am not clear on this comment. Can you please elaborate?

This comment has two parts.

First, there is the naming of the function.
The name of the function should reflect what the function does and I
believe that resctrl_arch_mbm_evt_config() is close enough to
resctrl_arch_event_config_set() to cause confusion while also lacking
an expected verb in the function name (since "config" should not be
considered a verb here). I proposed resctrl_arch_mbm_evt_config_init()
as a new function name that has the "init" verb to indicate that this
function "init"ializes the MBM config values.

Second, there is the work done by the function.
In this implementation the function initializes
rdt_hw_mon_domain->mbm_total_cfg and rdt_hw_mon_domain->mbm_local_cfg
when the events are configurable. I proposed that as an initializer
the function can be expected to initialize rdt_hw_mon_domain->mbm_total_cfg
and rdt_hw_mon_domain->mbm_local_cfg whether the events are configurable
or not. In the latter case they can be initialized with INVALID_CONFIG_VALUE?
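
The second point can be sketched as follows. This is a standalone
userspace illustration, not the kernel implementation: struct
mon_domain_cfg stands in for the two fields of rdt_hw_mon_domain, the
function name and parameters are hypothetical, and the value chosen for
INVALID_CONFIG_VALUE is an assumption since its real definition comes from
a later patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed placeholder; the real INVALID_CONFIG_VALUE is in a later patch. */
#define INVALID_CONFIG_VALUE	UINT32_MAX

/* Hypothetical stand-in for the two fields of struct rdt_hw_mon_domain. */
struct mon_domain_cfg {
	uint32_t mbm_total_cfg;
	uint32_t mbm_local_cfg;
};

/*
 * Initialize both config values unconditionally. total_cfg and local_cfg
 * are the values the caller obtained for a configurable event; they are
 * ignored for an event that is not configurable.
 */
static void mbm_evt_config_init(struct mon_domain_cfg *dom,
				bool total_configurable, uint32_t total_cfg,
				bool local_configurable, uint32_t local_cfg)
{
	assert(dom);

	dom->mbm_total_cfg = total_configurable ? total_cfg
						: INVALID_CONFIG_VALUE;
	dom->mbm_local_cfg = local_configurable ? local_cfg
						: INVALID_CONFIG_VALUE;
}
```

With this shape, later readers of the two fields never see uninitialized
data and can test against INVALID_CONFIG_VALUE instead of re-checking
whether the event is configurable.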

Reinette




* Re: [PATCH v5 13/20] x86/resctrl: Add the interface to assign hardware counter
  2024-07-12 22:09   ` Reinette Chatre
@ 2024-07-16 20:45     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 20:45 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 17:09, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID and monitor the bandwidth as long as it is assigned.
>> The assigned RMID will be tracked by the hardware until the user unassigns
>> it manually.
>>
>> Individual counters are configured by writing to L3_QOS_ABMC_CFG MSR
>> and specifying the counter id, bandwidth source, and bandwidth types.
>>
>> Provide the interface to assign the counter ids to RMID.
>>
> 
> Again this is a mix of a couple of layers where this single patch
> introduces fs code (mbm_cntr_alloc() and rdtgroup_assign_cntr()) as well
> as architecture specific code (resctrl_arch_assign_cntr() and
> rdtgroup_abmc_cfg()).
> Lumping this all together without any guidance to reader makes this very
> difficult
> to navigate. This work needs to be split into fs and arch parts with
> clear descriptions of how the layers interact.

Agree. We need to separate it. Will do.
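
A rough sketch of how the two layers could be separated, written as
standalone userspace C rather than kernel code. The fs side owns the free
map and the per-group counter id; the arch side is reduced to a stub hook.
All function names here (cntrs_init(), cntr_alloc(), cntr_free(),
arch_assign_cntr(), group_assign_cntr()) are hypothetical, and the free map
is a plain 64-bit word instead of the kernel bitmap API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MON_CNTR_UNSET	UINT32_MAX

/* fs side: a set bit means the counter id is free. */
static uint64_t cntrs_free_map;
static unsigned int cntrs_free_map_len;

static void cntrs_init(unsigned int num_cntrs)
{
	assert(num_cntrs > 0 && num_cntrs <= 64);
	cntrs_free_map_len = num_cntrs;
	cntrs_free_map = (num_cntrs == 64) ? UINT64_MAX
					   : (1ULL << num_cntrs) - 1;
}

/* Return the lowest free counter id and mark it used, or -ENOSPC. */
static int cntr_alloc(void)
{
	for (unsigned int id = 0; id < cntrs_free_map_len; id++) {
		if (cntrs_free_map & (1ULL << id)) {
			cntrs_free_map &= ~(1ULL << id);
			return (int)id;
		}
	}
	return -ENOSPC;
}

static void cntr_free(unsigned int id)
{
	assert(id < cntrs_free_map_len);
	cntrs_free_map |= 1ULL << id;
}

/* arch side (stub): real code would program the hardware here. */
static unsigned int arch_calls;

static int arch_assign_cntr(uint32_t rmid, uint32_t cntr_id)
{
	(void)rmid;
	(void)cntr_id;
	arch_calls++;
	return 0;
}

/* fs side: decides when a counter is needed, then calls the arch hook. */
static int group_assign_cntr(uint32_t *group_cntr_id, uint32_t rmid)
{
	int id;

	if (*group_cntr_id != MON_CNTR_UNSET)
		return 0;	/* already assigned, nothing to do */

	id = cntr_alloc();
	if (id < 0)
		return id;

	*group_cntr_id = (uint32_t)id;
	return arch_assign_cntr(rmid, (uint32_t)id);
}
```

The arch hook never sees the group structure, only the rmid and counter
id, which is the property the review asks for.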

> 
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>      Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable
>> Bandwidth
>>      Monitoring (ABMC).
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> ---
>> v5: Few name changes to match cntr_id.
>>      Changed the function names to
>>      rdtgroup_assign_cntr
>>      resctrl_arch_assign_cntr
>>      More comments on commit log.
>>      Added function summary.
>>
>> v4: Commit message update.
>>      Used bitmap APIs where applicable.
>>      Changed the interfaces considering MPAM(arm).
>>      Added domain specific assignment.
>>
>> v3: Removed the static from the prototype of rdtgroup_assign_abmc.
>>      The function is not called directly from user anymore. These
>>      changes are related to global assignment interface.
>>
>> v2: Minor text changes in commit message.
>> ---
>>   arch/x86/kernel/cpu/resctrl/internal.h |  3 +
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 96 ++++++++++++++++++++++++++
>>   2 files changed, 99 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 6925c947682d..66460375056c 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -708,6 +708,9 @@ void __init resctrl_file_fflags_init(const char
>> *config,
>>                        unsigned long fflags);
>>   void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
>>   unsigned int mon_event_config_index_get(u32 evtid);
>> +int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, u32
>> rmid,
>> +                 u32 cntr_id, u32 closid, bool enable);
>> +int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, u32 evtid);
>>   void rdt_staged_configs_clear(void);
>>   bool closid_allocated(unsigned int closid);
>>   int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index d2663f1345b7..44f6eff42c30 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -202,6 +202,19 @@ static void mbm_cntrs_init(void)
>>       mbm_cntrs_free_map_len = r->mon.num_mbm_cntrs;
>>   }
>>   +static int mbm_cntr_alloc(void)
>> +{
>> +    u32 cntr_id = find_first_bit(&mbm_cntrs_free_map,
>> +                     mbm_cntrs_free_map_len);
>> +
>> +    if (cntr_id >= mbm_cntrs_free_map_len)
>> +        return -ENOSPC;
>> +
>> +    __clear_bit(cntr_id, &mbm_cntrs_free_map);
>> +
>> +    return cntr_id;
>> +}
>> +
>>   /**
>>    * rdtgroup_mode_by_closid - Return mode of resource group with closid
>>    * @closid: closid if the resource group
>> @@ -1860,6 +1873,89 @@ static ssize_t
>> mbm_local_bytes_config_write(struct kernfs_open_file *of,
>>       return ret ?: nbytes;
>>   }
>>   +static void rdtgroup_abmc_cfg(void *info)
>> +{
>> +    u64 *msrval = info;
>> +
>> +    wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
>> +}
>> +
>> +/*
>> + * Send an IPI to the domain to assign the counter id to RMID.
>> + */
>> +int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, u32
>> rmid,
> 
> u32 evtid -> enum resctrl_event_id evtid

Sure.

> 
>> +                 u32 cntr_id, u32 closid, bool enable)
>> +{
>> +    struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>> +    union l3_qos_abmc_cfg abmc_cfg = { 0 };
>> +    struct arch_mbm_state *arch_mbm;
>> +
>> +    abmc_cfg.split.cfg_en = 1;
>> +    abmc_cfg.split.cntr_en = enable ? 1 : 0;
>> +    abmc_cfg.split.cntr_id = cntr_id;
>> +    abmc_cfg.split.bw_src = rmid;
>> +
>> +    /* Update the event configuration from the domain */
>> +    if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
>> +        abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
>> +        arch_mbm = &hw_dom->arch_mbm_total[rmid];
>> +    } else {
>> +        abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
>> +        arch_mbm = &hw_dom->arch_mbm_local[rmid];
>> +    }
>> +
>> +    smp_call_function_any(&d->hdr.cpu_mask, rdtgroup_abmc_cfg,
>> &abmc_cfg, 1);
>> +
>> +    /*
>> +     * Reset the architectural state so that reading of hardware
>> +     * counter is not considered as an overflow in next update.
>> +     */
>> +    if (arch_mbm)
>> +        memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Assign a hardware counter id to the group. Allocate a new counter id
>> + * if the event is unassigned.
>> + */
>> +int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, u32 evtid)
> 
> u32 evtid -> enum resctrl_event_id evtid

Sure.

> 
>> +{
>> +    struct rdt_resource *r =
>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +    int cntr_id = 0, index;
>> +    struct rdt_mon_domain *d;
> 
> Please use reverse fir tree order.

Sure.

> 
>> +
>> +    index = mon_event_config_index_get(evtid);
>> +    if (index == INVALID_CONFIG_INDEX) {
>> +        rdt_last_cmd_puts("Invalid event id\n");
> 
> This is a kernel bug and can be a WARN (once) instead. No need to message
> user space.

Sure.

> 
>> +        return -EINVAL;
>> +    }
>> +
>> +    /* Nothing to do if event has been assigned already */
>> +    if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET) {
>> +        rdt_last_cmd_puts("ABMC counter is assigned already\n");
>> +        return 0;
>> +    }
>> +
>> +    /*
>> +     * Allocate a new counter id and update domains
>> +     */
>> +    cntr_id = mbm_cntr_alloc();
>> +    if (cntr_id < 0) {
>> +        rdt_last_cmd_puts("Out of ABMC counters\n");
>> +        return -ENOSPC;
>> +    }
>> +
>> +    rdtgrp->mon.cntr_id[index] = cntr_id;
>> +
>> +    list_for_each_entry(d, &r->mon_domains, hdr.list)
>> +        resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
>> +                     cntr_id, rdtgrp->closid, 1);
>> +
>> +    return 0;
>> +}
>> +
>>   /* rdtgroup information files for one cache resource. */
>>   static struct rftype res_common_files[] = {
>>       {
> 
> Reinette
> 

-- 
Thanks
Babu Moger


* Re: [PATCH v5 15/20] x86/resctrl: Assign/unassign counters by default when ABMC is enabled
  2024-07-12 22:10   ` Reinette Chatre
@ 2024-07-16 20:58     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 20:58 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,


On 7/12/24 17:10, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> Assign/unassign counters on resctrl group creation/deletion. If the
>> counters are exhausted, report the warnings and continue. It is not
>> required to fail group creation for assignment failures. Users have
>> the option to modify the assignments later.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v5: Removed the code to enable/disable ABMC during the mount.
>>      That will be another patch.
>>      Added arch callers to get the arch specific data.
>>      Renamed functions to match the other abmc function.
>>      Added code comments for assignment failures.
>>
>> v4: Few name changes based on the upstream discussion.
>>      Commit message update.
>>
>> v3: This is a new patch. Patch addresses the upstream comment to enable
>>      ABMC feature by default if the feature is available.
>> ---
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 78 ++++++++++++++++++++++++++
>>   1 file changed, 78 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index ffde30b36c1a..475a0c7b2a25 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2910,6 +2910,46 @@ static void schemata_list_destroy(void)
>>       }
>>   }
>>   +/*
>> + * Called when new group is created. Assign the counters if ABMC is
>> + * already enabled. Two counters are required per group, one for total
>> + * event and one for local event. With limited number of counters,
>> + * the assignments can fail in some cases. But, it is not required to
>> + * fail the group creation. Users have the option to modify the
>> + * assignments after the group creation.
>> + */
>> +static int rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>> +{
>> +    int ret = 0;
>> +
>> +    if (!resctrl_arch_get_abmc_enabled())
>> +        return 0;
>> +
>> +    if (is_mbm_total_enabled())
>> +        ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> +    if (!ret && is_mbm_local_enabled())
>> +        ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +
>> +    return ret;
>> +}
>> +
>> +static int rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
>> +{
>> +    int ret = 0;
>> +
>> +    if (!resctrl_arch_get_abmc_enabled())
>> +        return 0;
>> +
>> +    if (is_mbm_total_enabled())
>> +        ret = rdtgroup_unassign_cntr(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> +    if (!ret && is_mbm_local_enabled())
>> +        ret = rdtgroup_unassign_cntr(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +
>> +    return ret;
>> +}
>> +
>>   static int rdt_get_tree(struct fs_context *fc)
>>   {
>>       struct rdt_resource *r =
>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> @@ -2972,6 +3012,16 @@ static int rdt_get_tree(struct fs_context *fc)
>>           if (ret < 0)
>>               goto out_mongrp;
>>           rdtgroup_default.mon.mon_data_kn = kn_mondata;
>> +
>> +        /*
>> +         * Assign the counters if ABMC is already enabled.
>> +         * With limited number of counters, the assignments can
>> +         * fail in some cases. But, it is not required to fail
>> +         * the group creation. Users have the option to modify
>> +         * the assignments after the group creation.
>> +         */
> 
> The function has detailed comments - it seems unnecessary to me that the
> same comments are duplicated at each call site.

Sure. Will remove duplicates.

> 
>> +        if (rdtgroup_assign_cntrs(&rdtgroup_default) < 0)
>> +            rdt_last_cmd_puts("Monitor assignment failed\n");
> 
> rdtgroup_assign_cntrs() already prints message, why print another? Error
> handling can then be dropped.

Sure.

> 
>>       }
>>         ret = rdt_pseudo_lock_init();
>> @@ -3246,6 +3296,8 @@ static void rdt_kill_sb(struct super_block *sb)
>>       cpus_read_lock();
>>       mutex_lock(&rdtgroup_mutex);
>>   +    rdtgroup_unassign_cntrs(&rdtgroup_default);
>> +
> 
> This seems appropriate to be in the "Put everything back to default values"
> section.

Sure. Will move it down.

> 
>>       rdt_disable_ctx();
>>         /*Put everything back to default values. */
>> @@ -3850,6 +3902,16 @@ static int rdtgroup_mkdir_mon(struct kernfs_node
>> *parent_kn,
>>           goto out_unlock;
>>       }
>>   +    /*
>> +     * Assign the counters if ABMC is already enabled.
>> +     * With the limited number of counters, there can be cases
>> +     * only on assignment succeed. It is not required to fail
>> +     * here in that case. Users have the option to modify the
>> +     * assignments later.
>> +     */
>> +    if (rdtgroup_assign_cntrs(rdtgrp) < 0)
>> +        rdt_last_cmd_puts("Monitor assignment failed\n");
>> +
>>       kernfs_activate(rdtgrp->kn);
>>         /*
>> @@ -3894,6 +3956,17 @@ static int rdtgroup_mkdir_ctrl_mon(struct
>> kernfs_node *parent_kn,
>>       if (ret)
>>           goto out_closid_free;
>>   +    /*
>> +     * Assign the counters if ABMC is already enabled.
>> +     * With the limited number of counters, there can be cases
>> +     * only on assignment succeed. It is not required to fail
>> +     * here in that case. Users have the option to assign the
>> +     * counter later.
>> +     */
>> +
>> +    if (rdtgroup_assign_cntrs(rdtgrp) < 0)
>> +        rdt_last_cmd_puts("Monitor assignment failed\n");
>> +
>>       kernfs_activate(rdtgrp->kn);
>>         ret = rdtgroup_init_alloc(rdtgrp);
>> @@ -3989,6 +4062,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup
>> *rdtgrp, cpumask_var_t tmpmask)
>>       update_closid_rmid(tmpmask, NULL);
>>         rdtgrp->flags = RDT_DELETED;
>> +
>> +    rdtgroup_unassign_cntrs(rdtgrp);
>> +
>>       free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>         /*
>> @@ -4035,6 +4111,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup
>> *rdtgrp, cpumask_var_t tmpmask)
>>       cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
>>       update_closid_rmid(tmpmask, NULL);
>>   +    rdtgroup_unassign_cntrs(rdtgrp);
>> +
>>       free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>       closid_free(rdtgrp->closid);
>>   
> 
> Reinette
> 

-- 
Thanks
Babu Moger


* Re: [PATCH v5 16/20] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode
  2024-07-12 22:13   ` Reinette Chatre
@ 2024-07-16 21:04     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 21:04 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,


On 7/12/24 17:13, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> In ABMC mode, the hardware counter should be assigned to read the MBM
>> events.
>>
>> Report "Unassigned" in case the user attempts to read the events without
>> assigning the counter.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v5: New patch.
>> ---
>>   Documentation/arch/x86/resctrl.rst        |  4 ++++
>>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 19 ++++++++++++++-----
>>   2 files changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst
>> b/Documentation/arch/x86/resctrl.rst
>> index 4907d0758118..11b7a5f26b40 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -284,6 +284,10 @@ with the following files:
>>       until the user unassigns it manually. There is no need to worry
>>       about counters being reset during this period.
>>   +    In ABMC mode, the MBM event counters will return "Unassigned" if
>> +    the hardware counter is not assigned to the event. Users need to
>> +    assign a counter manually to read the events.
> 
> This no longer seems accurate with counters assigned by default.

But, there are still cases where counters are not available for
assignment. Users can create more groups than the number of available
counters. I will add those details here.
> 
>> +
>>       Without ABMC enabled, monitoring will work in "legacy" mode
>>       without assignment option.
>>   diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 50fa1fe9a073..e60b469b7d12 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -562,7 +562,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void
>> *arg)
>>       struct rdtgroup *rdtgrp;
>>       struct rdt_resource *r;
>>       union mon_data_bits md;
>> -    int ret = 0;
>> +    int ret = 0, index;
>>         rdtgrp = rdtgroup_kn_lock_live(of->kn);
>>       if (!rdtgrp) {
>> @@ -609,12 +609,21 @@ int rdtgroup_mondata_show(struct seq_file *m, void
>> *arg)
>>     checkresult:
>>   -    if (rr.err == -EIO)
>> +    if (rr.err == -EIO) {
>>           seq_puts(m, "Error\n");
>> -    else if (rr.err == -EINVAL)
>> -        seq_puts(m, "Unavailable\n");
>> -    else
>> +    } else if (rr.err == -EINVAL) {
>> +        if (resctrl_arch_get_abmc_enabled()) {
>> +            index = mon_event_config_index_get(evtid);
>> +            if (rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET)
>> +                seq_puts(m, "Unassigned\n");
>> +            else
>> +                seq_puts(m, "Unavailable\n");
>> +        } else {
>> +            seq_puts(m, "Unavailable\n");
>> +        }
>> +    } else {
>>           seq_printf(m, "%llu\n", rr.val);
>> +    }
>>   
> 
> This still attempts to read from hardware that is futile to do knowing
> that a counter is not assigned. Why not just print "Unassigned" right away
> without trying to read data from hardware when knowing it will fail?

Yes, we can do that. Will do it.
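
For illustration, the early check discussed above could look roughly like
the stand-alone sketch below. It is not the kernel code: struct mon_state,
mondata_show(), fake_hw_read() and the value chosen for MON_CNTR_UNSET are
hypothetical stand-ins so the logic can be compiled and tried in user space.

```c
#include <errno.h>
#include <stdio.h>

#define MON_CNTR_UNSET (-1)	/* assumed sentinel value, for illustration only */

/* Hypothetical, simplified stand-in for the group's monitor state. */
struct mon_state {
	int cntr_id[2];		/* index 0: total event, index 1: local event */
};

typedef int (*hw_read_fn)(unsigned long long *val);

/* Fake hardware read used for demonstration; counts how often it runs. */
static int hw_reads;

static int fake_hw_read(unsigned long long *val)
{
	hw_reads++;
	*val = 42;
	return 0;
}

/* Format what a read of an MBM event file would report. */
static void mondata_show(char *buf, size_t len, int abmc_enabled,
			 const struct mon_state *mon, int index,
			 hw_read_fn read_hw)
{
	unsigned long long val = 0;
	int err;

	/* No counter assigned: report it without touching the hardware. */
	if (abmc_enabled && mon->cntr_id[index] == MON_CNTR_UNSET) {
		snprintf(buf, len, "Unassigned\n");
		return;
	}

	err = read_hw(&val);
	if (err == -EIO)
		snprintf(buf, len, "Error\n");
	else if (err == -EINVAL)
		snprintf(buf, len, "Unavailable\n");
	else
		snprintf(buf, len, "%llu\n", val);
}
```

With this ordering the -EINVAL path only ever means "Unavailable", so the
nested ABMC check inside the error handling is no longer needed.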

> 
>>   out:
>>       rdtgroup_kn_unlock(of->kn);
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 10/20] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
  2024-07-16 20:42       ` Reinette Chatre
@ 2024-07-16 22:43         ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 22:43 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/16/2024 3:42 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/16/24 12:21 PM, Moger, Babu wrote:
>> On 7/12/24 17:08, Reinette Chatre wrote:
>>> On 7/3/24 2:48 PM, Babu Moger wrote:
> 
>>>> @@ -662,6 +666,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool
>>>> force_free);
>>>>    void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>>>>    void __init resctrl_file_fflags_init(const char *config,
>>>>                         unsigned long fflags);
>>>> +void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
>>>> +unsigned int mon_event_config_index_get(u32 evtid);
>>>>    void rdt_staged_configs_clear(void);
>>>>    bool closid_allocated(unsigned int closid);
>>>>    int resctrl_find_cleanest_closid(void);
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> index 7a93a6d2b2de..b96b0a8bd7d3 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> @@ -1256,6 +1256,28 @@ int __init rdt_get_mon_l3_config(struct
>>>> rdt_resource *r)
>>>>        return 0;
>>>>    }
>>>>    +void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom)
>>>
>>> A function is expected to have a verb in its name and the verb here 
>>> seems
>>> to be
>>> "config", which does not seem appropriate and creates confusion with
>>> resctrl_arch_event_config_set(). How about 
>>> resctrl_arch_mbm_evt_config_init()
>>> with proper initializer of the config values to also cover case when
>>> events are
>>> not configurable (INVALID_CONFIG_VALUE introduced in next patch?) ?
>>
>> Sorry. I am not clear on this comment. Can you please elaborate?
> 
> This comment has two parts.
> 
> First, there is the naming of the function.
> The name of the function should reflect what the function does and I
> believe that resctrl_arch_mbm_evt_config() is close enough to
> resctrl_arch_event_config_set() to cause confusion while also lacking
> an expected verb in the function name (since "config" should not be
> considered a verb here) . I proposed resctrl_arch_mbm_evt_config_init()
> as a new function name that has the "init" verb to indicate that this
> function "init"ializes the MBM config values.

Yes. Makes sense.
> 
> Second, there is the work done by the function.
> In this implementation the function initializes
> rdt_hw_mon_domain->mbm_total_cfg and rdt_hw_mon_domain->mbm_local_cfg
> when the events are configurable. I proposed that as an initializer
> the function can be expected to initialize rdt_hw_mon_domain->mbm_total_cfg
> and rdt_hw_mon_domain->mbm_local_cfg whether the events are configurable
> or not. In the latter case they can be initialized with 
> INVALID_CONFIG_VALUE?

Yes. Thanks for the clarifications.
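
As a rough sketch of the initializer described above (not the actual
patch): struct hw_mon_domain is a simplified stand-in for
rdt_hw_mon_domain, the value used for INVALID_CONFIG_VALUE is an
assumption, and the default configuration values are passed in as
parameters rather than guessed.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_CONFIG_VALUE UINT32_MAX	/* assumed sentinel value */

/* Simplified stand-in for struct rdt_hw_mon_domain. */
struct hw_mon_domain {
	uint32_t mbm_total_cfg;
	uint32_t mbm_local_cfg;
};

/*
 * Initialize both config fields in every case, so that callers never
 * see uninitialized values when the events are not configurable.
 */
static void mbm_evt_config_init(struct hw_mon_domain *d, bool configurable,
				uint32_t total_default, uint32_t local_default)
{
	if (configurable) {
		d->mbm_total_cfg = total_default;
		d->mbm_local_cfg = local_default;
	} else {
		d->mbm_total_cfg = INVALID_CONFIG_VALUE;
		d->mbm_local_cfg = INVALID_CONFIG_VALUE;
	}
}
```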

-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 17/20] x86/resctrl: Introduce the interface switch between monitor modes
  2024-07-12 22:14   ` Reinette Chatre
@ 2024-07-16 22:46     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 22:46 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/2024 5:14 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> Introduce interface to switch between ABMC and legacy modes.
>>
>> By default ABMC is enabled on boot if the feature is available.
>> Provide the interface to go back to legacy mode if required.
>>
>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>> [abmc]
>> legacy
>>
>> To enable the legacy monitoring feature:
>> $ echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v4: Minor commit text changes. Keep the default to ABMC when supported.
>>      Fixed comments to reflect changed interface "mbm_mode".
>>
>> v3: New patch to address the review comments from upstream.
>> ---
>>   Documentation/arch/x86/resctrl.rst     | 10 +++++++
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 37 +++++++++++++++++++++++++-
>>   2 files changed, 46 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst 
>> b/Documentation/arch/x86/resctrl.rst
>> index 11b7a5f26b40..4c41c5622627 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -291,6 +291,16 @@ with the following files:
>>       Without ABMC enabled, monitoring will work in "legacy" mode
>>       without assignment option.
>> +    * To enable ABMC feature:
>> +      ::
>> +
>> +        # echo  "abmc" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>> +
>> +    * To enable the legacy monitoring feature:
>> +      ::
>> +
>> +        # echo  "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>> +
> 
> Needs details on what user can expect to happen to counters/data when
> switching between modes.

Sure. Will add the details.

> 
>>   "num_mbm_cntrs":
>>       The number of monitoring counters available for assignment.
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 475a0c7b2a25..531233779f8d 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -910,6 +910,40 @@ static int rdtgroup_num_mbm_cntrs_show(struct 
>> kernfs_open_file *of,
>>       return 0;
>>   }
>> +static ssize_t rdtgroup_mbm_mode_write(struct kernfs_open_file *of,
>> +                       char *buf, size_t nbytes,
>> +                       loff_t off)
>> +{
>> +    struct rdt_resource *r = of->kn->parent->priv;
>> +    int ret = 0;
>> +
>> +    if (!r->mon.abmc_capable)
>> +        return -EINVAL;
>> +
> 
> Why should a user not be able to write "legacy" into this
> file if "legacy" is the only mode supported?

Yes. Will fix it.
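
One possible shape for the fix, as a user-space sketch only:
mbm_mode_parse() and enum mbm_mode are hypothetical names, and the
capability check is simply moved so that it applies to "abmc" alone.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

enum mbm_mode { MBM_MODE_LEGACY, MBM_MODE_ABMC };	/* hypothetical */

/*
 * Parse a write to mbm_mode. "legacy" is accepted on any system;
 * "abmc" is accepted only when the hardware is ABMC capable.
 */
static int mbm_mode_parse(char *buf, size_t nbytes, bool abmc_capable,
			  enum mbm_mode *mode)
{
	/* Valid input requires a trailing newline */
	if (nbytes == 0 || buf[nbytes - 1] != '\n')
		return -EINVAL;

	buf[nbytes - 1] = '\0';

	if (!strcmp(buf, "legacy")) {
		*mode = MBM_MODE_LEGACY;
		return 0;
	}

	if (!strcmp(buf, "abmc")) {
		if (!abmc_capable)
			return -EINVAL;
		*mode = MBM_MODE_ABMC;
		return 0;
	}

	return -EINVAL;
}
```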

> 
>> +    /* Valid input requires a trailing newline */
>> +    if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> +        return -EINVAL;
>> +
>> +    buf[nbytes - 1] = '\0';
>> +
>> +    cpus_read_lock();
>> +    mutex_lock(&rdtgroup_mutex);
>> +
>> +    rdt_last_cmd_clear();
>> +
>> +    if (!strcmp(buf, "legacy"))
>> +        resctrl_arch_abmc_disable();
>> +    else if (!strcmp(buf, "abmc"))
>> +        ret = resctrl_arch_abmc_enable();
>> +    else
>> +        ret = -EINVAL;
>> +
>> +    mutex_unlock(&rdtgroup_mutex);
>> +    cpus_read_unlock();
>> +
>> +    return ret ?: nbytes;
>> +}
>> +
>>   #ifdef CONFIG_PROC_CPU_RESCTRL
>>   /*
>> @@ -2103,9 +2137,10 @@ static struct rftype res_common_files[] = {
>>       },
>>       {
>>           .name        = "mbm_mode",
>> -        .mode        = 0444,
>> +        .mode        = 0644,
>>           .kf_ops        = &rdtgroup_kf_single_ops,
>>           .seq_show    = rdtgroup_mbm_mode_show,
>> +        .write        = rdtgroup_mbm_mode_write,
>>       },
>>       {
>>           .name        = "cpus",
> 
> Reinette
> 

-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 18/20] x86/resctrl: Enable AMD ABMC feature by default when supported
  2024-07-12 22:15   ` Reinette Chatre
@ 2024-07-16 23:23     ` Moger, Babu
  2024-07-26  0:16       ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Moger, Babu @ 2024-07-16 23:23 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/2024 5:15 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> Enable ABMC by default when supported during the boot up.
>>
>> Users will not see any difference in the behavior when resctrl is
>> mounted. With automatic assignment everything will work as running
>> in the legacy monitor mode.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v5: New patch to enable ABMC by default.
>> ---
>>   arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
>>   arch/x86/kernel/cpu/resctrl/internal.h |  1 +
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 17 +++++++++++++++++
>>   3 files changed, 20 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
>> b/arch/x86/kernel/cpu/resctrl/core.c
>> index 6265ef8b610f..b69b2650bde3 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -599,6 +599,7 @@ static void domain_add_cpu_mon(int cpu, struct 
>> rdt_resource *r)
>>           d = container_of(hdr, struct rdt_mon_domain, hdr);
>>           cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
>> +        resctrl_arch_configure_abmc();
>>           return;
>>       }
>> @@ -620,6 +621,7 @@ static void domain_add_cpu_mon(int cpu, struct 
>> rdt_resource *r)
>>       arch_mon_domain_online(r, d);
>>       resctrl_arch_mbm_evt_config(hw_dom);
>> +    resctrl_arch_configure_abmc();
>>       if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
>>           mon_domain_free(hw_dom);
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index beb005775fe4..0f858cff8ab1 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -707,6 +707,7 @@ void rdt_domain_reconfigure_cdp(struct 
>> rdt_resource *r);
>>   void __init resctrl_file_fflags_init(const char *config,
>>                        unsigned long fflags);
>>   void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
>> +void resctrl_arch_configure_abmc(void);
>>   unsigned int mon_event_config_index_get(u32 evtid);
>>   int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, 
>> u32 rmid,
>>                    u32 cntr_id, u32 closid, bool enable);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 531233779f8d..d978668c8865 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2733,6 +2733,23 @@ void resctrl_arch_abmc_disable(void)
>>       }
>>   }
>> +void resctrl_arch_configure_abmc(void)
>> +{
>> +    struct rdt_resource *r = 
>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +    struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> +    bool enable = true;
>> +
>> +    mutex_lock(&rdtgroup_mutex);
>> +
>> +    if (r->mon.abmc_capable) {
>> +        if (!hw_res->abmc_enabled)
>> +            hw_res->abmc_enabled = true;
>> +        resctrl_abmc_set_one_amd(&enable);
>> +    }
> 
> This does not look right. It is not architecture code that needs to
> decide if this feature is enabled or not, right? The feature is enabled
> via fs (for example when user writes to mbm_mode). If the default is
> enabled then it should be set by fs. resctrl_arch_configure_abmc()
> then checks if feature is capable and enabled before it configures
> it on the CPU.

That is correct. But this is a default setting that should be done during
initialization, like rdtgroup_setup_default(). I can move this inside
rdtgroup_init(). I will have to change a few things to make sure the arch
and fs code stay separate (like accessing abmc_enabled).
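
The split being discussed can be pictured with the sketch below. It is
only a model under stated assumptions: struct res_state,
fs_init_default_mode() and arch_configure_abmc() are hypothetical names,
and the counter stands in for the per-CPU hardware programming.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of the state involved. */
struct res_state {
	bool abmc_capable;	/* discovered from hardware */
	bool abmc_enabled;	/* policy, owned by the fs side */
	int cpus_configured;	/* stands in for per-CPU MSR writes */
};

/* fs side: choose the default mode once, during initialization. */
static void fs_init_default_mode(struct res_state *r)
{
	r->abmc_enabled = r->abmc_capable;
}

/*
 * arch side: never decides the policy. It only programs the CPU when
 * the feature is both capable and already enabled by the fs side.
 */
static bool arch_configure_abmc(struct res_state *r)
{
	if (!r->abmc_capable || !r->abmc_enabled)
		return false;

	r->cpus_configured++;
	return true;
}
```
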
Thanks
- Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 19/20] x86/resctrl: Introduce interface to list monitor states of all the groups
  2024-07-12 22:16   ` Reinette Chatre
@ 2024-07-17 15:22     ` Moger, Babu
  2024-08-01 21:37       ` Reinette Chatre
  0 siblings, 1 reply; 95+ messages in thread
From: Moger, Babu @ 2024-07-17 15:22 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 17:16, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> Provide the interface to list the monitor states of all the resctrl
>> groups in ABMC mode.
>>
>> Example:
>> $cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> List follows the following format:
>>
>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Format for specific type of groups:
>>
>> - Default CTRL_MON group:
>>    "//<domain_id>=<flags>"
>>
>> - Non-default CTRL_MON group:
>>    "<CTRL_MON group>//<domain_id>=<flags>"
>>
>> - Child MON group of default CTRL_MON group:
>>    "/<MON group>/<domain_id>=<flags>"
>>
>> - Child MON group of non-default CTRL_MON group:
>>    "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>>
>> Flags can be one of the following:
>> t  MBM total event is enabled
>> l  MBM local event is enabled
>> tl Both total and local MBM events are enabled
>> _  None of the MBM events are enabled
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v5: Replaced "assignment flags" with "flags".
>>      Changes related to mon structure.
>>      Changes related renaming the interface from mbm_assign_control to
>>      mbm_control.
>>
>> v4: Added functionality to query domain specific assignment in
>>      rdtgroup_abmc_dom_state().
>>
>> v3: New patch.
>>      Addresses the feedback to provide the global assignment interface.
>>     
>> https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
>> ---
>>   Documentation/arch/x86/resctrl.rst     |  54 ++++++++++
>>   arch/x86/kernel/cpu/resctrl/monitor.c  |   1 +
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 130 +++++++++++++++++++++++++
>>   3 files changed, 185 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst
>> b/Documentation/arch/x86/resctrl.rst
>> index 4c41c5622627..05fee779e109 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -304,6 +304,60 @@ with the following files:
>>   "num_mbm_cntrs":
>>       The number of monitoring counters available for assignment.
>>   +"mbm_control":
>> +    Available when ABMC features are supported.
> 
> "Available when ABMC features are supported." can be dropped
> 

Ok. Sure.

>> +    Reports the resctrl group and monitor status of each group.
>> +
>> +    List follows the following format:
>> +        "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>> +
>> +    Format for specific type of grpups:
> 
> grpups -> groups

Sure.

> 
>> +
>> +    * Default CTRL_MON group:
>> +        "//<domain_id>=<flags>"
>> +
>> +    * Non-default CTRL_MON group:
>> +        "<CTRL_MON group>//<domain_id>=<flags>"
>> +
>> +    * Child MON group of default CTRL_MON group:
>> +        "/<MON group>/<domain_id>=<flags>"
>> +
>> +    * Child MON group of non-default CTRL_MON group:
>> +        "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>> +
>> +    Flags can be one of the following:
>> +    ::
>> +
>> +     t  MBM total event is enabled.
>> +     l  MBM local event is enabled.
>> +     tl Both total and local MBM events are enabled.
>> +     _  None of the MBM events are enabled.
>> +
>> +    Examples:
>> +    ::
>> +
>> +     # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>> +     # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>> +     # mkdir
>> /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>> +
>> +     # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> +     non_default_ctrl_mon_grp//0=tl;1=tl;
>> +     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +     //0=tl;1=tl;
>> +     /child_default_mon_grp/0=tl;1=tl;
>> +
>> +     There are four resctrl groups. All the groups have total and local
>> events are
>> +     enabled on domain 0 and 1.
> 
> "All the groups have total and local events are enabled" -> "All the
> groups have total and local events enabled"?
> 

Sure.

>> +
> 
> The text below seems to repeat earlier description.

I can remove it.

> 
>> +     non_default_ctrl_mon_grp// - This is a non-default CTRL_MON group.
>> +
>> +     non_default_ctrl_mon_grp/child_non_default_mon_grp/ - This is a
>> child monitor
>> +     group of non-default CTRL_MON group.
>> +
>> +     // - This is a default CTRL_MON group.
>> +
>> +     /child_default_mon_grp/ - This is a child monitor group of default
>> CTRL_MON group.
>> +
>>   "max_threshold_occupancy":
>>           Read/write file provides the largest value (in
>>           bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index b96b0a8bd7d3..684730f1a72d 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1244,6 +1244,7 @@ int __init rdt_get_mon_l3_config(struct
>> rdt_resource *r)
>>                   r->mon.num_mbm_cntrs = 64;
>>                 resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
>> +            resctrl_file_fflags_init("mbm_control", RFTYPE_MON_INFO);
> 
> Shouldn't this file always be present?
> 

This is only relevant when monitor assignment features are supported.
Having the file without the feature is not useful.


>>           }
>>       }
>>   diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index d978668c8865..0de9f23d5389 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -944,6 +944,130 @@ static ssize_t rdtgroup_mbm_mode_write(struct
>> kernfs_open_file *of,
>>       return ret ?: nbytes;
>>   }
>>   +static void rdtgroup_abmc_dom_cfg(void *info)
>> +{
>> +    u64 *msrval = info;
>> +
>> +    wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
>> +    rdmsrl(MSR_IA32_L3_QOS_ABMC_DSC, *msrval);
>> +}
>> +
>> +/*
>> + * Writing the counter id with CfgEn=0 on L3_QOS_ABMC_CFG and reading
>> + * L3_QOS_ABMC_DSC back will return configuration of the counter
>> + * specified.
> 
> Can this be expanded to explain what the return values mean?

Sure. Basically it returns the counter id with its configuration.

Will add a few more details.

> 
>> + */
>> +static int rdtgroup_abmc_dom_state(struct rdt_mon_domain *d, u32 cntr_id,
>> +                   u32 rmid)
>> +{
>> +    union l3_qos_abmc_cfg abmc_cfg = { 0 };
>> +
>> +    abmc_cfg.split.cfg_en = 0;
>> +    abmc_cfg.split.cntr_id = cntr_id;
>> +
>> +    smp_call_function_any(&d->hdr.cpu_mask, rdtgroup_abmc_dom_cfg,
>> +                  &abmc_cfg, 1);
>> +
>> +    if (abmc_cfg.split.cntr_en && abmc_cfg.split.bw_src == rmid)
>> +        return 0;
>> +    else
>> +        return -1;
>> +}
>> +
>> +static char *rdtgroup_mon_state_to_str(struct rdtgroup *rdtgrp,
>> +                       struct rdt_mon_domain *d, char *str)
>> +{
>> +    char *tmp = str;
>> +    int dom_state = ASSIGN_NONE;
> 
> reverse fir

Sure.

> 
>> +
>> +    /*
>> +     * Query the monitor state for the domain.
>> +     * Index 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
>> +     * Index 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
> 
> Why not use the helper?

Yes.

> 
>> +     */
>> +    if (rdtgrp->mon.cntr_id[0] != MON_CNTR_UNSET)
>> +        if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[0],
>> rdtgrp->mon.rmid))
>> +            dom_state |= ASSIGN_TOTAL;
>> +
>> +    if (rdtgrp->mon.cntr_id[1] != MON_CNTR_UNSET)
>> +        if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[1],
>> rdtgrp->mon.rmid))
>> +            dom_state |= ASSIGN_LOCAL;
>> +
>> +    switch (dom_state) {
>> +    case ASSIGN_NONE:
>> +        *tmp++ = '_';
>> +        break;
>> +    case (ASSIGN_TOTAL | ASSIGN_LOCAL):
>> +        *tmp++ = 't';
>> +        *tmp++ = 'l';
>> +        break;
>> +    case ASSIGN_TOTAL:
>> +        *tmp++ = 't';
>> +        break;
>> +    case ASSIGN_LOCAL:
>> +        *tmp++ = 'l';
>> +        break;
>> +    default:
>> +        break;
>> +    }
> 
> This switch statement does not scale. Adding new flags will be painful.
> Can flags not
> just incrementally be printed as learned from hardware with "_" printed as
> last resort?
> This would eliminate the need for these "ASSIGN" flags.

Let me try to understand this.

You want to remove the switch statement.

if (rdtgrp->mon.cntr_id[0] != MON_CNTR_UNSET)
    if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[0], rdtgrp->mon.rmid))
        *tmp++ = 't';

if (rdtgrp->mon.cntr_id[1] != MON_CNTR_UNSET)
    if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[1], rdtgrp->mon.rmid))
        *tmp++ = 'l';

If none of these flags are available, then
    *tmp++ = '_';

Is that the idea?
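
Written out as a stand-alone function, the incremental approach might
look like the sketch below. The two booleans stand in for the results of
the per-event hardware queries, and the function name is reused here
only for readability; this is not the code from the patch.

```c
#include <stdbool.h>

/*
 * Append one character per assigned event and fall back to '_' only
 * when nothing was appended. The caller must supply a buffer of at
 * least 3 bytes ("tl" plus the terminating NUL).
 */
static char *mon_state_to_str(bool total_assigned, bool local_assigned,
			      char *str)
{
	char *tmp = str;

	if (total_assigned)
		*tmp++ = 't';

	if (local_assigned)
		*tmp++ = 'l';

	/* Last resort: no flag was printed. */
	if (tmp == str)
		*tmp++ = '_';

	*tmp = '\0';
	return str;
}
```

A new flag would then need one more "if" block rather than additional
case combinations.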

> 
>> +
>> +    *tmp = '\0';
>> +    return str;
>> +}
>> +
>> +static int rdtgroup_mbm_control_show(struct kernfs_open_file *of,
>> +                     struct seq_file *s, void *v)
>> +{
>> +    struct rdt_resource *r = of->kn->parent->priv;
>> +    struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> +    struct rdt_mon_domain *dom;
>> +    struct rdtgroup *rdtg;
>> +    int grp_default = 0;
>> +    char str[10];
>> +
>> +    if (!hw_res->abmc_enabled) {
>> +        rdt_last_cmd_puts("ABMC feature is not enabled\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    mutex_lock(&rdtgroup_mutex);
>> +
>> +    list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>> +        struct rdtgroup *crg;
>> +
>> +        if (rdtg == &rdtgroup_default) {
>> +            grp_default = 1;
>> +            seq_puts(s, "//");
>> +        } else {
>> +            grp_default = 0;
>> +            seq_printf(s, "%s//", rdtg->kn->name);
>> +        }
> 
> Isn't the default resource group's name already empty string? That should
> eliminate the need for this special handling, no?

Yea. Let me try that.
> 
>> +
>> +        list_for_each_entry(dom, &r->mon_domains, hdr.list)
>> +            seq_printf(s, "%d=%s;", dom->hdr.id,
>> +                   rdtgroup_mon_state_to_str(rdtg, dom, str));
>> +        seq_putc(s, '\n');
>> +
>> +        list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
>> +                    mon.crdtgrp_list) {
>> +            if (grp_default)
>> +                seq_printf(s, "/%s/", crg->kn->name);
>> +            else
>> +                seq_printf(s, "%s/%s/", rdtg->kn->name,
>> +                       crg->kn->name);
>> +
> 
> Same here .... with default group having name of empty string it can just be
> printed directly, no?

Yea. Let me try that.
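
A small sketch of why the special case may be unnecessary, assuming (as
suggested above, not verified here) that the default group's name is the
empty string. format_group_prefix() is a hypothetical helper.

```c
#include <stdio.h>

/*
 * Build the "<CTRL_MON group>/<MON group>/" prefix. An empty ctrl_name
 * models the default CTRL_MON group; a NULL mon_name means the line
 * describes the CTRL_MON group itself.
 */
static int format_group_prefix(char *buf, size_t len,
			       const char *ctrl_name, const char *mon_name)
{
	return snprintf(buf, len, "%s/%s/", ctrl_name,
			mon_name ? mon_name : "");
}
```

With an empty name the same format string yields "//" for the default
group and "/child/" for its child MON groups, so no grp_default flag is
needed.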

> 
>> +            list_for_each_entry(dom, &r->mon_domains, hdr.list)
>> +                seq_printf(s, "%d=%s;", dom->hdr.id,
>> +                       rdtgroup_mon_state_to_str(crg, dom, str));
>> +            seq_putc(s, '\n');
>> +        }
>> +    }
>> +
>> +    mutex_unlock(&rdtgroup_mutex);
>> +
>> +    return 0;
>> +}
>> +
>>   #ifdef CONFIG_PROC_CPU_RESCTRL
>>     /*
>> @@ -2156,6 +2280,12 @@ static struct rftype res_common_files[] = {
>>           .kf_ops        = &rdtgroup_kf_single_ops,
>>           .seq_show    = rdtgroup_num_mbm_cntrs_show,
>>       },
>> +    {
>> +        .name        = "mbm_control",
>> +        .mode        = 0444,
>> +        .kf_ops        = &rdtgroup_kf_single_ops,
>> +        .seq_show    = rdtgroup_mbm_control_show,
>> +    },
>>       {
>>           .name        = "cpus_list",
>>           .mode        = 0644,
> 
> Reinette
> 

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-07-12 22:17   ` Reinette Chatre
@ 2024-07-17 16:22     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-17 16:22 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 17:17, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> Introduce the interface to enable events in ABMC mode.
> 
> As mentioned in cover letter, please take care with terms. This
> interface does not "enable events" - note that events can be
> "enabled" even in legacy mode. This is the interface to
> assign counters.
> 
>>
>> Events can be enabled or disabled by writing to file
>> /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Format is similar to the list format with addition of op-code for the
>> assignment operation.
>>   "<CTRL_MON group>/<MON group>/<op-code><flags>"
> 
> Missing a "domain_id".

Yea. Will fix it.

> 
>>
>> Format for specific type of groups:
>>
>>   * Default CTRL_MON group:
>>           "//<domain_id><op-code><flags>"
>>
>>   * Non-default CTRL_MON group:
>>           "<CTRL_MON group>//<domain_id><op-code><flags>"
>>
>>   * Child MON group of default CTRL_MON group:
>>           "/<MON group>/<domain_id><op-code><flags>"
>>
>>   * Child MON group of non-default CTRL_MON group:
>>           "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
>>
>> Op-code can be one of the following:
>>
>>   = Update the assignment to match the flags
>>   + enable a new state
>>   - disable a new state
> 
> (note comment in cover letter about consistent terms)

Sure.
> 
>>
>> Assignment flags can be one of the following:
>>   t  MBM total event is enabled
>>   l  MBM local event is enabled
>>   tl Both total and local MBM events are enabled
>>   _  None of the MBM events are enabled. Valid only with '=" opcode.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v5: Interface name changed from mbm_assign_control to mbm_control.
>>      Fixed opcode and flags combination.
>>      '=_" is valid.
>>      "-_" amd "+_" is not valid.
>>      Minor message update.
>>      Renamed the function with prefix - rdtgroup_.
>>      Corrected few documentation mistakes.
>>      Rebase related changes after SNC support.
>>
>> v4: Added domain specific assignments. Fixed the opcode parsing.
>>
>> v3: New patch.
>>      Addresses the feedback to provide the global assignment interface.
>>     
>> https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
>> ---
>>   Documentation/arch/x86/resctrl.rst     |  81 +++++++-
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 250 ++++++++++++++++++++++++-
>>   2 files changed, 329 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst
>> b/Documentation/arch/x86/resctrl.rst
>> index 05fee779e109..5a621235eb2b 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -331,7 +331,7 @@ with the following files:
>>        t  MBM total event is enabled.
>>        l  MBM local event is enabled.
>>        tl Both total and local MBM events are enabled.
>> -     _  None of the MBM events are enabled.
>> +     _  None of the MBM events are enabled. Only works with opcode '='
>> for write.
>>         Examples:
>>       ::
>> @@ -358,6 +358,85 @@ with the following files:
>>          /child_default_mon_grp/ - This is a child monitor group of
>> default CTRL_MON group.
>>   +    Assignment state can be updated by writing to the interface.
>> +
>> +    Format is similar to the list format with addition of op-code for the
>> +    assignment operation.
>> +
>> +        "<CTRL_MON group>/<MON group>/<op-code><flags>"
> 
> Missing domain_id

Sure.
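
To make the "<domain_id><op-code><flags>" part concrete, here is a
user-space sketch of parsing one such token and applying it. It is not
the parser from the patch: parse_dom_token(), apply_op() and the
FLAG_TOTAL/FLAG_LOCAL values are hypothetical and only follow the rules
stated in the commit message ('_' is valid only with '=').

```c
#include <errno.h>
#include <stdlib.h>

#define FLAG_TOTAL 0x1u	/* hypothetical bit for 't' */
#define FLAG_LOCAL 0x2u	/* hypothetical bit for 'l' */

/* Parse a token such as "0=tl", "1-t", "0+l" or "1=_". */
static int parse_dom_token(const char *tok, unsigned long *dom_id,
			   char *op, unsigned int *flags)
{
	char *end;

	/* The token must start with the domain id. */
	if (*tok < '0' || *tok > '9')
		return -EINVAL;
	*dom_id = strtoul(tok, &end, 10);

	if (*end != '=' && *end != '+' && *end != '-')
		return -EINVAL;
	*op = *end++;

	*flags = 0;
	if (*end == '\0')
		return -EINVAL;

	/* '_' means "no events" and only makes sense with '='. */
	if (end[0] == '_' && end[1] == '\0')
		return *op == '=' ? 0 : -EINVAL;

	for (; *end; end++) {
		if (*end == 't')
			*flags |= FLAG_TOTAL;
		else if (*end == 'l')
			*flags |= FLAG_LOCAL;
		else
			return -EINVAL;
	}
	return 0;
}

/* Compute the new assignment state from the current one. */
static unsigned int apply_op(unsigned int cur, char op, unsigned int flags)
{
	if (op == '+')
		return cur | flags;
	if (op == '-')
		return cur & ~flags;
	return flags;	/* '=' replaces the state */
}
```

For example, starting from "tl", the token "1-t" leaves only the local
event assigned on domain 1.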

> 
>> +
>> +    Format for each type of groups:
>> +
>> +        * Default CTRL_MON group:
>> +                "//<domain_id><op-code><flags>"
>> +
>> +        * Non-default CTRL_MON group:
>> +                "<CTRL_MON group>//<domain_id><op-code><flags>"
>> +
>> +        * Child MON group of default CTRL_MON group:
>> +                "/<MON group>/<domain_id><op-code><flags>"
>> +
>> +        * Child MON group of non-default CTRL_MON group:
>> +                "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
>> +
>> +    Op-code can be one of the following:
>> +    ::
>> +
>> +     = Update the assignment to match the flags.
>> +     + Add a new state.
>> +     - delete a new state.
>> +
>> +    Examples:
>> +    ::
>> +
>> +      Initial group status:
>> +      # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> +      non_default_ctrl_mon_grp//0=tl;1=tl;
>> +      non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +      //0=tl;1=tl;
>> +      /child_default_mon_grp/0=tl;1=tl;
>> +
>> +      To update the default group to enable only total event on domain 0:
>> +      # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>> +
>> +      Assignment status after the update:
>> +      # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> +      non_default_ctrl_mon_grp//0=tl;1=tl;
>> +      non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +      //0=t;1=tl;
>> +      /child_default_mon_grp/0=tl;1=tl;
>> +
>> +      To update the MON group child_default_mon_grp to remove total
>> event on domain 1:
>> +      # echo "/child_default_mon_grp/1-t" >
>> /sys/fs/resctrl/info/L3_MON/mbm_control
>> +
>> +      Assignment status after the update:
>> +      $ cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> +      non_default_ctrl_mon_grp//0=tl;1=tl;
>> +      non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> +      //0=t;1=tl;
>> +      /child_default_mon_grp/0=tl;1=l;
>> +
>> +      To update the MON group
>> non_default_ctrl_mon_grp/child_non_default_mon_grp to
>> +      remove both local and total events on domain 1:
>> +      # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
>> +            /sys/fs/resctrl/info/L3_MON/mbm_control
>> +
>> +      Assignment status after the update:
>> +      non_default_ctrl_mon_grp//0=tl;1=tl;
>> +      non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> +      //0=t;1=tl;
>> +      /child_default_mon_grp/0=tl;1=l;
>> +
>> +      To update the default group to add a local event domain 0.
>> +      # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
>> +
>> +      Assignment status after the update:
>> +      # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> +      non_default_ctrl_mon_grp//0=tl;1=tl;
>> +      non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> +      //0=tl;1=tl;
>> +      /child_default_mon_grp/0=tl;1=l;
>> +
>>   "max_threshold_occupancy":
>>           Read/write file provides the largest value (in
>>           bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 0de9f23d5389..84c0874d7872 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1068,6 +1068,253 @@ static int rdtgroup_mbm_control_show(struct
>> kernfs_open_file *of,
>>       return 0;
>>   }
>>   +static int rdtgroup_str_to_mon_state(char *flag)
>> +{
>> +    int i, mon_state = 0;
>> +
>> +    for (i = 0; i < strlen(flag); i++) {
>> +        switch (*(flag + i)) {
>> +        case 't':
>> +            mon_state |= ASSIGN_TOTAL;
>> +            break;
>> +        case 'l':
>> +            mon_state |= ASSIGN_LOCAL;
>> +            break;
>> +        case '_':
>> +            mon_state = ASSIGN_NONE;
>> +            break;
>> +        default:
>> +            mon_state = ASSIGN_NONE;
>> +            break;
>> +        }
>> +    }
>> +
> 
> No. As I mentioned before this makes all this work for nothing
> by preventing us from ever adding another flag. Please do not
> have a default catchall that unassigns all flags.

Will remove default ASSIGN_NONE.
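
A minimal sketch of what rejecting unknown flags could look like, written as
standalone user-space C rather than the actual kernel change. The ASSIGN_*
values and the helper name str_to_mon_state() are assumptions for
illustration only; the real definitions are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the patch's real ASSIGN_* definitions may differ. */
#define ASSIGN_NONE	0
#define ASSIGN_TOTAL	(1 << 0)
#define ASSIGN_LOCAL	(1 << 1)

/*
 * Hypothetical helper: parse a flag string such as "t", "l", "tl" or "_".
 * Returns the ASSIGN_* mask on success, or -1 for an empty string, an
 * unknown flag, or '_' combined with another flag.
 */
static int str_to_mon_state(const char *flags)
{
	int state = ASSIGN_NONE;
	int seen_none = 0;
	size_t i;

	assert(flags != NULL);

	if (flags[0] == '\0')
		return -1;

	for (i = 0; flags[i] != '\0'; i++) {
		switch (flags[i]) {
		case 't':
			state |= ASSIGN_TOTAL;
			break;
		case 'l':
			state |= ASSIGN_LOCAL;
			break;
		case '_':
			seen_none = 1;
			break;
		default:
			/* Unknown flag: report an error instead of unassigning. */
			return -1;
		}
	}

	/* '_' is only meaningful on its own. */
	if (seen_none && state != ASSIGN_NONE)
		return -1;

	return state;
}
```

With this shape, a flag added later fails loudly on an older parser instead
of silently clearing the existing assignments.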

> 
>> +    return mon_state;
>> +}
>> +
>> +static struct rdtgroup *rdtgroup_find_grp(enum rdt_group_type rtype,
>> char *p_grp, char *c_grp)
>> +{
>> +    struct rdtgroup *rdtg, *crg;
>> +
>> +    if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
>> +        return &rdtgroup_default;
>> +    } else if (rtype == RDTCTRL_GROUP) {
>> +        list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
>> +            if (!strcmp(p_grp, rdtg->kn->name))
>> +                return rdtg;
>> +    } else if (rtype == RDTMON_GROUP) {
>> +        list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>> +            if (!strcmp(p_grp, rdtg->kn->name)) {
>> +                list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
>> +                            mon.crdtgrp_list) {
>> +                    if (!strcmp(c_grp, crg->kn->name))
>> +                        return crg;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +static int rdtgroup_process_flags(enum rdt_group_type rtype, char
>> *p_grp, char *c_grp, char *tok)
>> +{
>> +    struct rdt_resource *r =
>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +    int op, mon_state, assign_state, unassign_state;
>> +    char *dom_str, *id_str, *op_str;
>> +    struct rdt_mon_domain *d;
>> +    struct rdtgroup *rdtgrp;
>> +    unsigned long dom_id;
>> +    int ret, found = 0;
>> +
>> +    rdtgrp = rdtgroup_find_grp(rtype, p_grp, c_grp);
>> +
>> +    if (!rdtgrp) {
>> +        rdt_last_cmd_puts("Not a valid resctrl group\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +next:
>> +    if (!tok || tok[0] == '\0')
>> +        return 0;
>> +
>> +    /* Start processing the strings for each domain */
>> +    dom_str = strim(strsep(&tok, ";"));
>> +
>> +    op_str = strpbrk(dom_str, "=+-");
>> +
>> +    if (op_str) {
>> +        op = *op_str;
>> +    } else {
>> +        rdt_last_cmd_puts("Missing operation =, +, -, _ character\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    id_str = strsep(&dom_str, "=+-");
>> +
>> +    if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>> +        rdt_last_cmd_puts("Missing domain id\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    /* Verify if the dom_id is valid */
>> +    list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +        if (d->hdr.id == dom_id) {
>> +            found = 1;
>> +            break;
>> +        }
>> +    }
>> +    if (!found) {
>> +        rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
>> +        return -EINVAL;
>> +    }
>> +
>> +    mon_state = rdtgroup_str_to_mon_state(dom_str);
>> +
>> +    assign_state = 0;
>> +    unassign_state = 0;
>> +
>> +    switch (op) {
>> +    case '+':
>> +        if (mon_state == ASSIGN_NONE) {
>> +            rdt_last_cmd_puts("Invalid assign opcode\n");
>> +            goto out_fail;
>> +        }
>> +        assign_state = mon_state;
>> +        break;
>> +    case '-':
>> +        if (mon_state == ASSIGN_NONE) {
>> +            rdt_last_cmd_puts("Invalid assign opcode\n");
>> +            goto out_fail;
>> +        }
>> +        unassign_state = mon_state;
>> +        break;
>> +    case '=':
>> +        assign_state = mon_state;
>> +        unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
>> +        break;
>> +    default:
>> +        break;
>> +    }
>> +
> 
> this flow is not clear to me ... I see how an existing counter is
> configured but I do not see any counter being freed/allocated, where is that
> done?

My bad. There is a bug here. The following code updates the assignment
states if the group already has the counter assigned.
I need to add a check to allocate/free the counters based on the
assign/unassign state requested. Good catch.
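
One possible way to derive which counters need to be allocated or freed is
to compare the requested masks against the group's current state. The sketch
below is standalone user-space C under assumed ASSIGN_* values; struct
mon_update and plan_update() are hypothetical names, not part of the series.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the patch's real ASSIGN_* definitions may differ. */
#define ASSIGN_NONE	0
#define ASSIGN_TOTAL	(1 << 0)
#define ASSIGN_LOCAL	(1 << 1)
#define ASSIGN_ALL	(ASSIGN_TOTAL | ASSIGN_LOCAL)

/* Hypothetical result of planning one "<domain_id><op-code><flags>" update. */
struct mon_update {
	int to_alloc;	/* events that need a counter allocated and assigned */
	int to_free;	/* events whose counter must be unassigned and freed */
};

/*
 * Translate an op-code and the requested event mask into the set of
 * counters to allocate and free, given the currently assigned events.
 * Returns 0 on success, -1 on an invalid op-code/flag combination.
 */
static int plan_update(int op, int requested, int current,
		       struct mon_update *u)
{
	int assign = 0, unassign = 0;

	assert(u != NULL);

	switch (op) {
	case '+':
		if (requested == ASSIGN_NONE)
			return -1;
		assign = requested;
		break;
	case '-':
		if (requested == ASSIGN_NONE)
			return -1;
		unassign = requested;
		break;
	case '=':
		assign = requested;
		unassign = ASSIGN_ALL & ~requested;
		break;
	default:
		return -1;
	}

	/* Only touch counters whose state actually changes. */
	u->to_alloc = assign & ~current;
	u->to_free = unassign & current;
	return 0;
}
```

For example, writing "=t" to a group that currently has "tl" yields nothing
to allocate and the local event's counter to free.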



> 
>> +    if (assign_state & ASSIGN_TOTAL)
>> +        ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_TOTAL_EVENT_ID,
>> +                           rdtgrp->mon.rmid,
>> +                           rdtgrp->mon.cntr_id[0],
>> +                           rdtgrp->closid, 1);
>> +    if (ret)
>> +        goto out_fail;
>> +
>> +    if (assign_state & ASSIGN_LOCAL)
>> +        ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_LOCAL_EVENT_ID,
>> +                           rdtgrp->mon.rmid,
>> +                           rdtgrp->mon.cntr_id[1],
>> +                           rdtgrp->closid, 1);
>> +
>> +    if (ret)
>> +        goto out_fail;
>> +
>> +    if (unassign_state & ASSIGN_TOTAL)
>> +        ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_TOTAL_EVENT_ID,
>> +                           rdtgrp->mon.rmid,
>> +                           rdtgrp->mon.cntr_id[0],
>> +                           rdtgrp->closid, 0);
>> +
>> +    if (ret)
>> +        goto out_fail;
>> +
>> +    if (unassign_state & ASSIGN_LOCAL)
>> +        ret = resctrl_arch_assign_cntr(d, QOS_L3_MBM_LOCAL_EVENT_ID,
>> +                           rdtgrp->mon.rmid,
>> +                           rdtgrp->mon.cntr_id[1],
>> +                           rdtgrp->closid, 0);
>> +    if (ret)
>> +        goto out_fail;
>> +
>> +    goto next;
>> +
>> +out_fail:
>> +
>> +    return -EINVAL;
>> +}
>> +
>> +static ssize_t rdtgroup_mbm_control_write(struct kernfs_open_file *of,
>> +                      char *buf, size_t nbytes,
>> +                      loff_t off)
>> +{
>> +    struct rdt_resource *r = of->kn->parent->priv;
>> +    char *token, *cmon_grp, *mon_grp;
>> +    struct rdt_hw_resource *hw_res;
>> +    int ret;
>> +
>> +    hw_res = resctrl_to_arch_res(r);
>> +    if (!hw_res->abmc_enabled)
>> +        return -EINVAL;
>> +
>> +    /* Valid input requires a trailing newline */
>> +    if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> +        return -EINVAL;
>> +
>> +    buf[nbytes - 1] = '\0';
>> +    rdt_last_cmd_clear();
> 
> rdt_last_cmd_clear() should be called with mutex held

Sure.
> 
>> +
>> +    cpus_read_lock();
>> +    mutex_lock(&rdtgroup_mutex);
>> +
>> +    while ((token = strsep(&buf, "\n")) != NULL) {
>> +        if (strstr(token, "//")) {
>> +            /*
>> +             * The CTRL_MON group processing:
>> +             * default CTRL_MON group: "//<flags>"
>> +             * non-default CTRL_MON group: "<CTRL_MON group>//flags"
>> +             * The CTRL_MON group will be empty string if it is a
>> +             * default group.
>> +             */
>> +            cmon_grp = strsep(&token, "//");
>> +
>> +            /*
>> +             * strsep returns empty string for contiguous delimiters.
>> +             * Make sure check for two consicutive delimiters and
> 
> consicutive -> consecutive

Sure.

> 
>> +             * advance the token.
>> +             */
>> +            mon_grp = strsep(&token, "//");
>> +            if (*mon_grp != '\0') {
>> +                rdt_last_cmd_printf("Invalid CTRL_MON group format
>> %s\n", token);
>> +                ret = -EINVAL;
>> +                break;
>> +            }
>> +
>> +            ret = rdtgroup_process_flags(RDTCTRL_GROUP, cmon_grp,
>> mon_grp, token);
>> +            if (ret)
>> +                break;
>> +        } else if (strstr(token, "/")) {
>> +            /*
>> +             * MON group processing:
>> +             * MON_GROUP inside default CTRL_MON group: "/<MON
>> group>/<flags>"
>> +             * MON_GROUP within CTRL_MON group: "<CTRL_MON group>/<MON
>> group>/<flags>"
>> +             */
>> +            cmon_grp = strsep(&token, "/");
>> +
>> +            /* Extract the MON_GROUP. It cannot be empty string */
>> +            mon_grp = strsep(&token, "/");
>> +            if (*mon_grp == '\0') {
>> +                rdt_last_cmd_printf("Invalid MON_GROUP format %s\n",
>> token);
>> +                ret = -EINVAL;
>> +                break;
>> +            }
>> +
>> +            ret = rdtgroup_process_flags(RDTMON_GROUP, cmon_grp,
>> mon_grp, token);
>> +            if (ret)
>> +                break;
>> +        }
> 
> can these two blocks not be merged? strsep(&token, "//") and
> strsep(&token, "/") do the same
> thing, no?

Sure. Will check.
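
For reference, strsep() treats its second argument as a set of delimiter
characters, so "//" and "/" name the same set and a single code path can
handle both group types. A small user-space sketch (split_group_path() is a
made-up helper name, not something from the patch):

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "<CTRL_MON group>/<MON group>/<rest>" in place.
 * An empty *ctrl means the default CTRL_MON group; an empty *mon means the
 * line addresses the CTRL_MON group itself.
 * Returns 0 on success, -1 if the line has fewer than two '/' characters.
 */
static int split_group_path(char *line, char **ctrl, char **mon, char **rest)
{
	char *p = line;

	assert(line && ctrl && mon && rest);

	*ctrl = strsep(&p, "/");
	if (!p)			/* no '/' found at all */
		return -1;

	*mon = strsep(&p, "/");
	if (!p)			/* only one '/' found */
		return -1;

	*rest = p;		/* "<domain_id><op-code><flags>;..." */
	return 0;
}
```

The caller can then pick RDTCTRL_GROUP or RDTMON_GROUP based on whether the
MON group string is empty.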

> 
>> +    }
>> +
>> +    mutex_unlock(&rdtgroup_mutex);
>> +    cpus_read_unlock();
>> +
>> +    return ret ?: nbytes;
>> +}
>> +
>>   #ifdef CONFIG_PROC_CPU_RESCTRL
>>     /*
>> @@ -2282,9 +2529,10 @@ static struct rftype res_common_files[] = {
>>       },
>>       {
>>           .name        = "mbm_control",
>> -        .mode        = 0444,
>> +        .mode        = 0644,
>>           .kf_ops        = &rdtgroup_kf_single_ops,
>>           .seq_show    = rdtgroup_mbm_control_show,
>> +        .write        = rdtgroup_mbm_control_write,
>>       },
>>       {
>>           .name        = "cpus_list",
> 
> Reinette
> 

-- 
Thanks
Babu Moger


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-07-12 22:03 ` [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
@ 2024-07-17 17:19   ` Moger, Babu
  2024-08-01 21:49     ` Reinette Chatre
  0 siblings, 1 reply; 95+ messages in thread
From: Moger, Babu @ 2024-07-17 17:19 UTC (permalink / raw)
  To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/12/24 17:03, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> # Linux Implementation
>>
>> Linux resctrl subsystem provides the interface to count maximum of two
>> memory bandwidth events per group, from a combination of available total
>> and local events. Keeping the current interface, users can enable a maximum
>> of 2 ABMC counters per group. User will also have the option to enable only
>> one counter to the group. If the system runs out of assignable ABMC
>> counters, kernel will display an error. Users need to disable an already
>> enabled counter to make space for new assignments.
> 
> The implementation appears to be converging on an interface that can
> be generic enough to be used by other features discussed along the way.
> "Linux implementation" summary can thus add:
> 
>     Create a generic interface aimed to support user space assignment
>     of scarce counters used for monitoring. First usage of interface
>     is by ABMC with option to expand usage to "soft-RMID" and MPAM
>     counters in future.

Sure.

> 
> 
>> # Examples
>>
>> a. Check if ABMC support is available
>>     #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>     [abmc]
>>     legacy
>>
>>     Linux kernel detected ABMC feature and it is enabled.
> 
> How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
> info file and be the final step to make this generic so that another
> architecture
> can more easily support assigning hardware counters without needing to call
> the feature AMD's "abmc".

I think we already settled this with "mbm_cntr_assignable".

For "soft-RMID" it will be mbm_sw_assignable.


> 
> Expanding on this it may be possible to add a new "sw_mbm_cntrs" feature that
> will be the "soft-RMID" feature while also reflecting the "mbm_cntrs" name
> so that when user space enables that feature its properties can be found in
> "num_mbm_cntrs".
> 
> The "abmc" kernel parameter remains but that does seem separate from this
> resctrl fs feature since it is explicitly tied to X86_FEATURE_ABMC surely
> making it architecture specific.
> 
>>
>> b. Check how many ABMC counters are available.
>>
>>     #cat /sys/fs/resctrl/info/L3_MON/num_cntrs
>>     32
> 
> This is now num_mbm_cntrs

Sure.

> 
>>
>> c. Create a few resctrl groups.
>>
>>     # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>>     # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>>     # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>
>>
>> d. This series adds a new interface file
>>     /sys/fs/resctrl/info/L3_MON/mbm_control to list and modify the
>>     group's monitoring states. The file provides a single place to list
>>     the monitoring states of all the resctrl groups. It makes it easier
>>     for user space to learn how the counters are used without needing
>>     to traverse all the groups, thus reducing the number of filesystem
>>     calls.
>>
>>     The list follows the following format:
>>
>>     "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>>     Format for specific type of groups:
>>
>>     * Default CTRL_MON group:
>>      "//<domain_id>=<flags>"
>>
>>         * Non-default CTRL_MON group:
>>                 "<CTRL_MON group>//<domain_id>=<flags>"
>>
>>         * Child MON group of default CTRL_MON group:
>>                 "/<MON group>/<domain_id>=<flags>"
>>
>>         * Child MON group of non-default CTRL_MON group:
>>                 "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>>         Flags can be one of the following:
>>
>>          t  MBM total event is enabled.
>>          l  MBM local event is enabled.
>>          tl Both total and local MBM events are enabled.
>>          _  None of the MBM events are enabled
> 
> The language needs to be changed here (and in the many copied places) to
> be specific about what setting the flag accomplishes. For example, in
> "legacy" mode user space can be expected to find all events enabled, no?
> Needing a new feature to set a flag to accomplish something that is
> possible in legacy mode can thus cause confusion.

Yes. It is possible to do it. But I feel it is unnecessary.

> 
> If I understand the implementation reading "mbm_control" will fail
> if system is ABMC capable but it is disabled. Why can "mbm_control" not
> always be displayed to user space? For example, what if "mbm_control" is
> always available to user space and it can provide specific information to
> user space. For example:
>     t  MBM total event is enabled but may not always be counted.
>     T  MBM total event is enabled and being counted.
> 
> On AMD systems resource groups will have "t" associated with monitor
> groups when ABMC disabled, "T" when ABMC enabled and a counter assigned.
> On Intel systems monitor groups will always have "T".

I think more flags will add more confusion.

> 
> For "soft-RMID" the flag could possibly continue to be "T"?
> 
> I am trying to find ways to communicate to user space consistently
> and clearly and any insights will be appreciated. We really do not want
> to add this interface and then find that it just causes confusion.
> 
> It is not quite obvious to me when the new files should be visible and
> what they should present to the user. "mbm_mode" is now always visible.
> Should "num_mbm_cntrs" not also always be visible? Right now "num_mbm_cntrs"
> appears to be only associated to ABMC, should it not also, for example,
> be the file that "soft-RMID" may use to share how many counters are
> available? Its contents will thus be dynamic based on which "MBM mode" is
> active, begging the question, what should it contain when "legacy" mode is
> enabled, should "num_mbm_cntrs" perhaps show "0" to user space when
> "legacy" mode is active?

It's good we have this discussion.

How about we go with the simple way for now? The mbm_mode will only be
available when ABMC or Soft_RMID (MPAM feature) is supported. Same for
num_mbm_cntrs.


> 
>>
>>     Examples:
>>
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>     //0=tl;1=tl;
>>     /child_default_mon_grp/0=tl;1=tl;
>>     
>>     There are four groups and all the groups have local and total
>>     event enabled on domain 0 and 1.
> 
> "local and total event" is vague, can it be made specific with, for example,
> "local and total MBM events"

Sure.

> 
>>
>>     =tl means both total and local events are enabled.
> 
> Same here (and all copied places in this series)

Sure.

> 
>>
>>     "//" - This is a default CTRL_MON group
>>
>>     "non_default_ctrl_mon_grp//" - This is non-default CTRL_MON group
>>
>>     "/child_default_mon_grp/"  - This is Child MON group of the defult
>> group
> 
> Same typos as in previous version of cover letter.

Oh. no. Will fix it.

> 
>>
>>     "non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
>>     MON group of the non-default group
>>
>> e. Update the group assignment states using the interface file
>> /sys/fs/resctrl/info/L3_MON/mbm_control.
>>
>>     The write format is similar to the above list format with addition of
>>     op-code for the assignment operation.
>>     
>>     * Default CTRL_MON group:
>>             "//<domain_id><op-code><flags>"
>>     
>>     * Non-default CTRL_MON group:
>>             "<CTRL_MON group>//<domain_id><op-code><flags>"
>>     
>>     * Child MON group of default CTRL_MON group:
>>             "/<MON group>/<domain_id><op-code><flags>"
>>     
>>     * Child MON group of non-default CTRL_MON group:
>>             "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
>>     
>>     Op-code can be one of the following:
>>     
>>     = Update the assignment to match the flag.
>>     + Assign a new state.
>>     - Unassign a new state.
> 
> Please be consistent with terminology. Above switches between "flag"
> and "state" while it then continues below using "event". Also,
> "Unassign a _new_ state" is unexpected, it should probably be an
> _existing_ (not "new") state/flag/event?

I will use event consistently.

> 
>>
>>     Flags can be one of the following:
>>
>>          t  MBM total event.
>>          l  MBM local event.
>>          tl Both total and local MBM events.
>>          _  None of the MBM events. Only works with '=' op-code.
>>     
>>     Initial group status:
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>     //0=tl;1=tl;
>>     /child_default_mon_grp/0=tl;1=tl;
>>
>>     To update the default group to enable only total event on domain 0:
>>     # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>     Assignment status after the update:
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>     //0=t;1=tl;
>>     /child_default_mon_grp/0=tl;1=tl;
>>
>>     To update the MON group child_default_mon_grp to remove total event
>>     on domain 1:
>>     # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>     Assignment status after the update:
>>     $ cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>     //0=t;1=tl;
>>     /child_default_mon_grp/0=tl;1=l;
>>
>>     To update the MON group
>> non_default_ctrl_mon_grp/child_non_default_mon_grp to
>>     remove both local and total events on domain 1:
>>     # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
>>            /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>     Assignment status after the update:
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>>     //0=t;1=tl;
>>     /child_default_mon_grp/0=tl;1=l;
>>
>>     To update the default group to add a local event domain 0.
>>     # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>     Assignment status after the update:
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>>     //0=tl;1=tl;
>>     /child_default_mon_grp/0=tl;1=l;
>>
>>
>> f. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
>>     There is no change in reading the events with ABMC. If the event is
>>     unassigned when reading, then the read will come back as "Unassigned".
>>     
>>     # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>     779247936
>>     # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>     765207488
>>     
>> g. Users will have the option to go back to legacy mbm_mode if required.
>>     This can be done using the following command. Note that switching the
>>     mbm_mode will reset all the mbm counters of all resctrl groups.
> 
> mbm -> MBM (throughout)

Sure.

> 
>>
>>     # echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>     abmc
>>     [legacy]
>>
>> h. Check the bandwidth configuration for the group. Note that bandwidth
>>     configuration has a domain scope. Total event defaults to 0x7F (to
>>     count all the events) and local event defaults to 0x15 (to count all
>>     the local numa events). The event bitmap decoding is available at
>>     https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>>     in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>>     
>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>     0=0x7f;1=0x7f
>>     
>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>>     0=0x15;1=0x15
>>     
>> j. Change the bandwidth source for domain 0 for the total event to count
>>     only reads.
>>     Note that this change affects total events on domain 0.
>>     
>>     #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>     0=0x33;1=0x7F
>>     
>> k. Now read the total event again. The first read will come back with
>>     "Unavailable" status. The subsequent read of mbm_total_bytes will
>>     display only the read events.
>>     
>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>     Unavailable
>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>     314101
>>     
>> l. Unmount the resctrl
>>     
>>     #umount /sys/fs/resctrl/
>>
> 
> Reinette
> 

-- 
Thanks
Babu Moger


* Re: [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-07-16 17:51       ` Reinette Chatre
  2024-07-16 18:48         ` Moger, Babu
@ 2024-07-18 21:11         ` Moger, Babu
  1 sibling, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-18 21:11 UTC (permalink / raw)
  To: Reinette Chatre, corbet@lwn.net, fenghua.yu@intel.com,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com
  Cc: x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
	rdunlap@infradead.org, tj@kernel.org, peterz@infradead.org,
	yanjiewtw@gmail.com, Phillips, Kim, lukas.bulwahn@gmail.com,
	seanjc@google.com, jmattson@google.com, leitao@debian.org,
	jpoimboe@kernel.org, rick.p.edgecombe@intel.com,
	kirill.shutemov@linux.intel.com, jithu.joseph@intel.com,
	kai.huang@intel.com, kan.liang@linux.intel.com,
	daniel.sneddon@linux.intel.com, pbonzini@redhat.com,
	Das1, Sandipan, ilpo.jarvinen@linux.intel.com,
	peternewman@google.com, maciej.wieczor-retman@intel.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	eranian@google.com, james.morse@arm.com

Hi Reinette,

On 7/16/24 12:51, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/16/24 8:13 AM, Moger, Babu wrote:
>> On 7/12/24 17:05, Reinette Chatre wrote:
>>> On 7/3/24 2:48 PM, Babu Moger wrote:
>>>> Add the functionality to enable/disable AMD ABMC feature.
>>>>
>>>> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
>>>> L3_QOS_EXT_CFG.  When the state of ABMC is changed, the MSR needs
>>>> to be updated on all the logical processors in the QOS Domain.
>>>>
>>>> Hardware counters will reset when ABMC state is changed. Reset the
>>>> architectural state so that reading of hardware counter is not considered
>>>> as an overflow in next update.
>>>>
>>>> The ABMC feature details are documented in APM listed below [1].
>>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>>> Monitoring (ABMC).
>>>>
>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>>> ---
>>>> v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
>>>>       Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
>>>>       Introduced resctrl_arch_get_abmc_enabled to get abmc state from
>>>>       non-arch code.
>>>>       Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
>>>>       Modified commit log to make it clear about AMD ABMC feature.
>>>>
>>>> v3: No changes.
>>>>
>>>> v2: Few text changes in commit message.
>>>> ---
>>>>    arch/x86/include/asm/msr-index.h       |  1 +
>>>>    arch/x86/kernel/cpu/resctrl/internal.h | 13 +++++
>>>>    arch/x86/kernel/cpu/resctrl/rdtgroup.c | 66 ++++++++++++++++++++++++++
>>>>    3 files changed, 80 insertions(+)
>>>>
>>>> diff --git a/arch/x86/include/asm/msr-index.h
>>>> b/arch/x86/include/asm/msr-index.h
>>>> index 01342963011e..263b2d9d00ed 100644
>>>> --- a/arch/x86/include/asm/msr-index.h
>>>> +++ b/arch/x86/include/asm/msr-index.h
>>>> @@ -1174,6 +1174,7 @@
>>>>    #define MSR_IA32_MBA_BW_BASE        0xc0000200
>>>>    #define MSR_IA32_SMBA_BW_BASE        0xc0000280
>>>>    #define MSR_IA32_EVT_CFG_BASE        0xc0000400
>>>> +#define MSR_IA32_L3_QOS_EXT_CFG        0xc00003ff
>>>>      /* MSR_IA32_VMX_MISC bits */
>>>>    #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>>>> b/arch/x86/kernel/cpu/resctrl/internal.h
>>>> index 2bd207624eec..0ce9797f80fe 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>>>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>>>> @@ -97,6 +97,9 @@ cpumask_any_housekeeping(const struct cpumask *mask,
>>>> int exclude_cpu)
>>>>        return cpu;
>>>>    }
>>>>    +/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
>>>
>>> Please be consistent throughout series to have sentences end with period.
>>
>> Sure.
>>
>>>
>>>> +#define ABMC_ENABLE            BIT(0)
>>>> +
>>>>    struct rdt_fs_context {
>>>>        struct kernfs_fs_context    kfc;
>>>>        bool                enable_cdpl2;
>>>> @@ -477,6 +480,7 @@ struct rdt_parse_data {
>>>>     * @mbm_cfg_mask:    Bandwidth sources that can be tracked when
>>>> Bandwidth
>>>>     *            Monitoring Event Configuration (BMEC) is supported.
>>>>     * @cdp_enabled:    CDP state of this resource
>>>> + * @abmc_enabled:    ABMC feature is enabled
>>>>     *
>>>>     * Members of this structure are either private to the architecture
>>>>     * e.g. mbm_width, or accessed via helpers that provide
>>>> abstraction. e.g.
>>>> @@ -491,6 +495,7 @@ struct rdt_hw_resource {
>>>>        unsigned int        mbm_width;
>>>>        unsigned int        mbm_cfg_mask;
>>>>        bool            cdp_enabled;
>>>> +    bool            abmc_enabled;
>>>>    };
>>>
>>> mbm_cntr_enabled? This is architecture specific code so there is more
>>> flexibility
>>> here, but it may make implementation easier to understand if consistent
>>> naming is used
>>> between fs and arch code.
>>
>> How about "mbm_cntr_assign_enabled" or "cntr_assign_enabled" ?
> 
> My preference is to keep the term "mbm_cntr" to be consistent with the
> other variables/struct members to help when reading the code.
> "mbm_cntr_assign_enabled" does seem to be getting long though.
> Are you planning to use it by assigning it to a local variable with shorter
> name?
> 
> As a sidenote, I will be offline for large portions of the next few weeks
> and thus unresponsive during this time. I'll be back to a regular
> schedule on August 12th.
> 
I will start working on v6 sometime next week. I will address all the
things which we have discussed already.

We still have to figure out a few other items related to displaying the new
interface files when the ABMC feature is available vs. not available. We can
discuss it with v6.
-- 
Thanks
Babu Moger


* Re: [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-07-03 21:48 ` [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
  2024-07-12 22:17   ` Reinette Chatre
@ 2024-07-25  0:03   ` Peter Newman
  2024-07-25  1:22     ` Moger, Babu
  1 sibling, 1 reply; 95+ messages in thread
From: Peter Newman @ 2024-07-25  0:03 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Babu,

On Wed, Jul 3, 2024 at 2:51 PM Babu Moger <babu.moger@amd.com> wrote:
>
> Introduce the interface to enable events in ABMC mode.
>
> Events can be enabled or disabled by writing to file
> /sys/fs/resctrl/info/L3_MON/mbm_control
>
> Format is similar to the list format with addition of op-code for the
> assignment operation.
>  "<CTRL_MON group>/<MON group>/<op-code><flags>"
>
> Format for specific type of groups:
>
>  * Default CTRL_MON group:
>          "//<domain_id><op-code><flags>"
>
>  * Non-default CTRL_MON group:
>          "<CTRL_MON group>//<domain_id><op-code><flags>"
>
>  * Child MON group of default CTRL_MON group:
>          "/<MON group>/<domain_id><op-code><flags>"
>
>  * Child MON group of non-default CTRL_MON group:
>          "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"

Just a reminder, Reinette and I had discussed[1] omitting the
domain_id for performing the same operation on all domains.

I would really appreciate this, otherwise our most typical operations
could be really tedious and needlessly serialized.

# cat mbm_control
//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;12=tl;13=tl;14=tl;15=tl;16=tl;17=tl;18=tl;19=tl;20=tl;21=tl;22=tl;23=tl;24=tl;25=tl;26=tl;27=tl;28=tl;29=tl;30=tl;31=tl;
# echo '//-l' > mbm_control
-bash: echo: write error: Invalid argument
# cat ../last_cmd_status
Missing domain id

If you can't get to it in this series, I'll push a
scalability-oriented series after the basic assignment support is
merged.

Thanks!
-Peter

[1] https://lore.kernel.org/lkml/CALPaoChcJq5zoPchB2j0aM+nZpQe1xoo7w2QQUjtH+c58Yyxag@mail.gmail.com/

* Re: [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-07-25  0:03   ` Peter Newman
@ 2024-07-25  1:22     ` Moger, Babu
  2024-07-25 17:11       ` Peter Newman
  0 siblings, 1 reply; 95+ messages in thread
From: Moger, Babu @ 2024-07-25  1:22 UTC (permalink / raw)
  To: Peter Newman, Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter,

On 7/24/2024 7:03 PM, Peter Newman wrote:
> Hi Babu,
> 
> On Wed, Jul 3, 2024 at 2:51 PM Babu Moger <babu.moger@amd.com> wrote:
>>
>> Introduce the interface to enable events in ABMC mode.
>>
>> Events can be enabled or disabled by writing to file
>> /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Format is similar to the list format with addition of op-code for the
>> assignment operation.
>>   "<CTRL_MON group>/<MON group>/<op-code><flags>"
>>
>> Format for specific type of groups:
>>
>>   * Default CTRL_MON group:
>>           "//<domain_id><op-code><flags>"
>>
>>   * Non-default CTRL_MON group:
>>           "<CTRL_MON group>//<domain_id><op-code><flags>"
>>
>>   * Child MON group of default CTRL_MON group:
>>           "/<MON group>/<domain_id><op-code><flags>"
>>
>>   * Child MON group of non-default CTRL_MON group:
>>           "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
> 
> Just a reminder, Reinette and I had discussed[1] omitting the
> domain_id for performing the same operation on all domains.

Yes, I remember. Let's refresh our memory.
> 
> I would really appreciate this, otherwise our most typical operations
> could be really tedious and needlessly serialized.

> 
> # cat mbm_control
> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;12=tl;13=tl;14=tl;15=tl;16=tl;17=tl;18=tl;19=tl;20=tl;21=tl;22=tl;23=tl;24=tl;25=tl;26=tl;27=tl;28=tl;29=tl;30=tl;31=tl;
> # echo '//-l' > mbm_control

What is the expectation here?
You want to unassign local event on all the domains?

The domain id makes it easy to parse the command. Without it, the parsing
code becomes messy.

How about something like this? We can use one past the max domain id to mean
all the domains. In the above case there are 32 domains (0-31), so 32 is the
total number of domains. We can get that detail by looking through all the
domains, and we can print it when we list them.

# cat mbm_control
//0=tl;1=tl;2=tl;3=tl;... 31=tl;
Max domain id is 31. Use domain-id 32 to apply the flags on all the 
domains.

echo '//32-l' > mbm_control

There is only one syscall, but IPIs will be sent to all the domains.

Any other ideas?

> -bash: echo: write error: Invalid argument
> # cat ../last_cmd_status
> Missing domain id
> 
> If you can't get to it in this series, I'll push a
> scalability-oriented series after the basic assignment support is
> merged.

Let's try to get this resolved in this series.

> 
> Thanks!
> -Peter
> 
> [1] https://lore.kernel.org/lkml/CALPaoChcJq5zoPchB2j0aM+nZpQe1xoo7w2QQUjtH+c58Yyxag@mail.gmail.com/

-- 
- Babu Moger

* Re: [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-07-25  1:22     ` Moger, Babu
@ 2024-07-25 17:11       ` Peter Newman
  2024-07-25 17:28         ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Peter Newman @ 2024-07-25 17:11 UTC (permalink / raw)
  To: babu.moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Babu,

On Wed, Jul 24, 2024 at 6:23 PM Moger, Babu <bmoger@amd.com> wrote:
>
> Hi Peter,
>
> On 7/24/2024 7:03 PM, Peter Newman wrote:
> > Hi Babu,
> >
> > On Wed, Jul 3, 2024 at 2:51 PM Babu Moger <babu.moger@amd.com> wrote:
> >>
> >> Introduce the interface to enable events in ABMC mode.
> >>
> >> Events can be enabled or disabled by writing to file
> >> /sys/fs/resctrl/info/L3_MON/mbm_control
> >>
> >> Format is similar to the list format with addition of op-code for the
> >> assignment operation.
> >>   "<CTRL_MON group>/<MON group>/<op-code><flags>"
> >>
> >> Format for specific type of groups:
> >>
> >>   * Default CTRL_MON group:
> >>           "//<domain_id><op-code><flags>"
> >>
> >>   * Non-default CTRL_MON group:
> >>           "<CTRL_MON group>//<domain_id><op-code><flags>"
> >>
> >>   * Child MON group of default CTRL_MON group:
> >>           "/<MON group>/<domain_id><op-code><flags>"
> >>
> >>   * Child MON group of non-default CTRL_MON group:
> >>           "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
> >
> > Just a reminder, Reinette and I had discussed[1] omitting the
> > domain_id for performing the same operation on all domains.
>
> Yes. I remember. Lets refresh our memory.
> >
> > I would really appreciate this, otherwise our most typical operations
> > could be really tedious and needlessly serialized.
>
> >
> > # cat mbm_control
> > //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;12=tl;13=tl;14=tl;15=tl;16=tl;17=tl;18=tl;19=tl;20=tl;21=tl;22=tl;23=tl;24=tl;25=tl;26=tl;27=tl;28=tl;29=tl;30=tl;31=tl;
> > # echo '//-l' > mbm_control
>
> What is the expectation here?
> You want to unassign local event on all the domains?

Correct.

>
> Domain id makes it easy to parse the command. Without that it parsing
> code becomes  messy.
>
> How about something like this? We can use the max domain id to mean all
> the domains. In the above case there are 32 domains(0-31). 32 is total
> number of domains. We can get that details looking through all the
> domains. We can print that detail when we list it.

This sounds like only a minor simplification to the parsing code. It
seems like it would be easy to determine if the final '/' is
immediately followed by an opcode (+-=_) rather than a number.
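
To illustrate the check described above, here is a minimal userspace
sketch. It is not code from the series: parse_dom_spec(), is_opcode() and
ALL_DOMAINS are hypothetical names, and the op-code set is simply the
"+-=_" characters listed above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel meaning "apply to every domain". */
#define ALL_DOMAINS (-1L)

/* True if c is one of the op-code characters named in this thread. */
static bool is_opcode(char c)
{
	return c != '\0' && strchr("+-=_", c) != NULL;
}

/*
 * Parse the text that follows the final '/' of an mbm_control entry.
 * An op-code in the first position means the domain id was omitted.
 * Returns 0 on success and -1 on malformed input.
 */
static int parse_dom_spec(const char *s, long *dom, char *op,
			  const char **flags)
{
	char *end;

	assert(s && dom && op && flags);

	if (is_opcode(*s)) {
		/* No domain id given: same operation on all domains. */
		*dom = ALL_DOMAINS;
	} else if (isdigit((unsigned char)*s)) {
		*dom = strtol(s, &end, 10);
		s = end;
		if (!is_opcode(*s))
			return -1;
	} else {
		return -1;
	}

	*op = *s;
	*flags = s + 1;
	return 0;
}
```

With this shape, "-l" and "1-l" go through the same code path and differ
only in the domain value handed to the caller.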

-Peter

* Re: [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-07-25 17:11       ` Peter Newman
@ 2024-07-25 17:28         ` Moger, Babu
  2024-08-01 18:56           ` Reinette Chatre
  0 siblings, 1 reply; 95+ messages in thread
From: Moger, Babu @ 2024-07-25 17:28 UTC (permalink / raw)
  To: Peter Newman
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter,

On 7/25/24 12:11, Peter Newman wrote:
> Hi Babu,
> 
> On Wed, Jul 24, 2024 at 6:23 PM Moger, Babu <bmoger@amd.com> wrote:
>>
>> Hi Peter,
>>
>> On 7/24/2024 7:03 PM, Peter Newman wrote:
>>> Hi Babu,
>>>
>>> On Wed, Jul 3, 2024 at 2:51 PM Babu Moger <babu.moger@amd.com> wrote:
>>>>
>>>> Introduce the interface to enable events in ABMC mode.
>>>>
>>>> Events can be enabled or disabled by writing to file
>>>> /sys/fs/resctrl/info/L3_MON/mbm_control
>>>>
>>>> Format is similar to the list format with addition of op-code for the
>>>> assignment operation.
>>>>   "<CTRL_MON group>/<MON group>/<op-code><flags>"
>>>>
>>>> Format for specific type of groups:
>>>>
>>>>   * Default CTRL_MON group:
>>>>           "//<domain_id><op-code><flags>"
>>>>
>>>>   * Non-default CTRL_MON group:
>>>>           "<CTRL_MON group>//<domain_id><op-code><flags>"
>>>>
>>>>   * Child MON group of default CTRL_MON group:
>>>>           "/<MON group>/<domain_id><op-code><flags>"
>>>>
>>>>   * Child MON group of non-default CTRL_MON group:
>>>>           "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
>>>
>>> Just a reminder, Reinette and I had discussed[1] omitting the
>>> domain_id for performing the same operation on all domains.
>>
>> Yes. I remember. Lets refresh our memory.
>>>
>>> I would really appreciate this, otherwise our most typical operations
>>> could be really tedious and needlessly serialized.
>>
>>>
>>> # cat mbm_control
>>> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;12=tl;13=tl;14=tl;15=tl;16=tl;17=tl;18=tl;19=tl;20=tl;21=tl;22=tl;23=tl;24=tl;25=tl;26=tl;27=tl;28=tl;29=tl;30=tl;31=tl;
>>> # echo '//-l' > mbm_control
>>
>> What is the expectation here?
>> You want to unassign local event on all the domains?
> 
> Correct.
> 
>>
>> Domain id makes it easy to parse the command. Without that it parsing
>> code becomes  messy.
>>
>> How about something like this? We can use the max domain id to mean all
>> the domains. In the above case there are 32 domains(0-31). 32 is total
>> number of domains. We can get that details looking through all the
>> domains. We can print that detail when we list it.
> 
> This sounds like only a minor simplification to the parsing code. It
> seems like it would be easy to determine if the final '/' is
> immediately followed by an opcode (+-=_) rather than a number.

Ok. Will try to get that working. Will let you know if there are
complexities with that.
-- 
Thanks
Babu Moger

* Re: [PATCH v5 18/20] x86/resctrl: Enable AMD ABMC feature by default when supported
  2024-07-16 23:23     ` Moger, Babu
@ 2024-07-26  0:16       ` Moger, Babu
  2024-08-01 21:40         ` Reinette Chatre
  0 siblings, 1 reply; 95+ messages in thread
From: Moger, Babu @ 2024-07-26  0:16 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 7/16/2024 6:23 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 7/12/2024 5:15 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 7/3/24 2:48 PM, Babu Moger wrote:
>>> Enable ABMC by default when supported during the boot up.
>>>
>>> Users will not see any difference in the behavior when resctrl is
>>> mounted. With automatic assignment everything will work as running
>>> in the legacy monitor mode.
>>>
>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>> ---
>>> v5: New patch to enable ABMC by default.
>>> ---
>>>   arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
>>>   arch/x86/kernel/cpu/resctrl/internal.h |  1 +
>>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 17 +++++++++++++++++
>>>   3 files changed, 20 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
>>> b/arch/x86/kernel/cpu/resctrl/core.c
>>> index 6265ef8b610f..b69b2650bde3 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>> @@ -599,6 +599,7 @@ static void domain_add_cpu_mon(int cpu, struct 
>>> rdt_resource *r)
>>>           d = container_of(hdr, struct rdt_mon_domain, hdr);
>>>           cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
>>> +        resctrl_arch_configure_abmc();
>>>           return;
>>>       }
>>> @@ -620,6 +621,7 @@ static void domain_add_cpu_mon(int cpu, struct 
>>> rdt_resource *r)
>>>       arch_mon_domain_online(r, d);
>>>       resctrl_arch_mbm_evt_config(hw_dom);
>>> +    resctrl_arch_configure_abmc();
>>>       if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
>>>           mon_domain_free(hw_dom);
>>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
>>> b/arch/x86/kernel/cpu/resctrl/internal.h
>>> index beb005775fe4..0f858cff8ab1 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>>> @@ -707,6 +707,7 @@ void rdt_domain_reconfigure_cdp(struct 
>>> rdt_resource *r);
>>>   void __init resctrl_file_fflags_init(const char *config,
>>>                        unsigned long fflags);
>>>   void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
>>> +void resctrl_arch_configure_abmc(void);
>>>   unsigned int mon_event_config_index_get(u32 evtid);
>>>   int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, 
>>> u32 rmid,
>>>                    u32 cntr_id, u32 closid, bool enable);
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
>>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 531233779f8d..d978668c8865 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -2733,6 +2733,23 @@ void resctrl_arch_abmc_disable(void)
>>>       }
>>>   }
>>> +void resctrl_arch_configure_abmc(void)
>>> +{
>>> +    struct rdt_resource *r = 
>>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>> +    struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>> +    bool enable = true;
>>> +
>>> +    mutex_lock(&rdtgroup_mutex);
>>> +
>>> +    if (r->mon.abmc_capable) {
>>> +        if (!hw_res->abmc_enabled)
>>> +            hw_res->abmc_enabled = true;
>>> +        resctrl_abmc_set_one_amd(&enable);
>>> +    }
>>
>> This does not look right. It is not architecture code that needs to
>> decide if this feature is enabled or not, right? The feature is enabled
>> via fs (for example when user writes to mbm_mode). If the default is
>> enabled then it should be set by fs. resctrl_arch_configure_abmc()
>> then checks if feature is capable and enabled before it configures
>> it on the CPU.

Looking at the code again, I think it is fine to do it here. This is
arch initialization code. I am checking whether the feature is available and
enabling it by default. The fs code is not initialized yet at this stage.

The other option is to move everything to rdt_enable_ctx(), which runs at
mount time.

I will keep it as is for now. We can discuss this more in v6.

> 
> That is correct. But this is a default setting should be done during the 
> initialization. This is like rdtgroup_setup_default(). I can move this 
> inside rdtgroup_init(void). I will have to change few things make sure 
> arch and fs code separate (like accessing abmc_enabled).
> Thanks
> - Babu Moger
> 

-- 
- Babu Moger

* Re: [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap
  2024-07-03 21:48 ` [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap Babu Moger
  2024-07-12 22:07   ` Reinette Chatre
@ 2024-07-26 22:48   ` Peter Newman
  2024-07-26 23:53     ` Moger, Babu
  2024-08-01 21:05     ` Reinette Chatre
  1 sibling, 2 replies; 95+ messages in thread
From: Peter Newman @ 2024-07-26 22:48 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Babu,

On Wed, Jul 3, 2024 at 2:50 PM Babu Moger <babu.moger@amd.com> wrote:
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 4f47f52e01c2..b3d3fa048f15 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -185,6 +185,23 @@ bool closid_allocated(unsigned int closid)
>         return !test_bit(closid, &closid_free_map);
>  }
>
> +/*
> + * Counter bitmap and its length for tracking available counters.
> + * ABMC feature provides set of hardware counters for enabling events.
> + * Each event takes one hardware counter. Kernel needs to keep track
> + * of number of available counters.
> + */
> +static unsigned long mbm_cntrs_free_map;
> +static unsigned int mbm_cntrs_free_map_len;

If counter assignment is supported at a per-domain granularity, then
counter id allocation needs to be done per-domain rather than
globally. For example, if I free a counter from one group in a
particular domain, it should be available to allocate to another group
only in that domain.
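
As a rough illustration of what per-domain allocation means, here is a
small userspace sketch. It is not taken from the series: struct dom_cntrs,
cntr_alloc() and cntr_free() are hypothetical names, and NUM_CNTRS is an
assumed value rather than a hardware fact.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DOMAINS 32	/* matches the 32 domains in the output below */
#define NUM_CNTRS   32	/* assumed number of counters per domain */

/* One free map per domain; a set bit means that counter id is free. */
struct dom_cntrs {
	uint32_t free_map;
};

static struct dom_cntrs domains[NUM_DOMAINS];

/* Mark every counter in every domain as free. */
static void cntrs_init(void)
{
	for (int i = 0; i < NUM_DOMAINS; i++)
		domains[i].free_map = UINT32_MAX;
}

/* Allocate the lowest free counter id in one domain, or return -1. */
static int cntr_alloc(int dom)
{
	assert(dom >= 0 && dom < NUM_DOMAINS);

	for (int id = 0; id < NUM_CNTRS; id++) {
		uint32_t bit = UINT32_C(1) << id;

		if (domains[dom].free_map & bit) {
			domains[dom].free_map &= ~bit;
			return id;
		}
	}
	return -1;
}

/* Return a counter id to the free map of the domain it came from. */
static void cntr_free(int dom, int id)
{
	assert(dom >= 0 && dom < NUM_DOMAINS);
	assert(id >= 0 && id < NUM_CNTRS);

	domains[dom].free_map |= UINT32_C(1) << id;
}
```

Freeing a counter in domain 1 makes it available to another group in
domain 1 only. A group that needs a counter in domain 0 still has to find
one in domain 0's map.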

When I attempt this using the current series, the resulting behavior
is quite interesting. I noticed Reinette also commented on this later
in the series, noticing that counters are only allocated permanently
to groups and never move as a result of writing to mbm_control.

# grep 'g1[45]' info/L3_MON/mbm_control
test/g14/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;12=tl;13=tl;14=tl;15=tl;16=tl;17=tl;18=tl;19=tl;20=tl;21=tl;22=tl;23=tl;24=tl;25=tl;26=tl;27=tl;28=tl;29=tl;30=tl;31=tl;
test/g15/0=_;1=_;2=_;3=_;4=_;5=_;6=_;7=_;8=_;9=_;10=_;11=_;12=_;13=_;14=_;15=_;16=_;17=_;18=_;19=_;20=_;21=_;22=_;23=_;24=_;25=_;26=_;27=_;28=_;29=_;30=_;31=_;

[domains 2-31 omitted for clarity below]

# echo 'test/g14/1-t' > info/L3_MON/mbm_control
# grep 'g1[45]' info/L3_MON/mbm_control
test/g14/0=tl;1=l;
test/g15/0=_;1=_;

# echo "test/g15/1+t" > info/L3_MON/mbm_control
# grep 'g1[45]' info/L3_MON/mbm_control
test/g14/0=tl;1=_;
test/g15/0=_;1=_;


-Peter

* Re: [PATCH v5 15/20] x86/resctrl: Assign/unassign counters by default when ABMC is enabled
  2024-07-03 21:48 ` [PATCH v5 15/20] x86/resctrl: Assign/unassign counters by default when ABMC is enabled Babu Moger
  2024-07-12 22:10   ` Reinette Chatre
@ 2024-07-26 23:22   ` Peter Newman
  2024-07-26 23:57     ` Moger, Babu
  1 sibling, 1 reply; 95+ messages in thread
From: Peter Newman @ 2024-07-26 23:22 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Babu,

On Wed, Jul 3, 2024 at 2:50 PM Babu Moger <babu.moger@amd.com> wrote:

> @@ -3894,6 +3956,17 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>         if (ret)
>                 goto out_closid_free;
>
> +       /*
> +        * Assign the counters if ABMC is already enabled.
> +        * With the limited number of counters, there can be cases
> +        * only on assignment succeed. It is not required to fail
> +        * here in that case. Users have the option to assign the
> +        * counter later.
> +        */
> +
> +       if (rdtgroup_assign_cntrs(rdtgrp) < 0)
> +               rdt_last_cmd_puts("Monitor assignment failed\n");

Supposing rdtgroup_init_alloc() below fails, would you want to release
the counters allocated here?
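
A sketch of the unwinding I have in mind, written as standalone userspace
C. Everything here is a hypothetical stand-in (struct group,
assign_cntrs(), unassign_cntrs(), init_alloc(), fail_init_alloc); none of
it is the actual rdtgroup code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a resource group; only tracks how many counters it holds. */
struct group {
	int cntrs_held;
};

/* Knob that lets a caller simulate a failure in the later init step. */
static bool fail_init_alloc;

static int assign_cntrs(struct group *g)
{
	g->cntrs_held = 2;	/* e.g. one for total, one for local */
	return 0;
}

static void unassign_cntrs(struct group *g)
{
	g->cntrs_held = 0;
}

static int init_alloc(struct group *g)
{
	(void)g;
	return fail_init_alloc ? -1 : 0;
}

static int mkdir_ctrl_mon(struct group *g)
{
	int ret;

	assert(g);

	/* As in the patch, a failed assignment does not fail the mkdir. */
	(void)assign_cntrs(g);

	ret = init_alloc(g);
	if (ret < 0)
		goto out_unassign;

	return 0;

out_unassign:
	/* Give the counters back so a failed mkdir does not leak them. */
	unassign_cntrs(g);
	return ret;
}
```

The point is only the extra label on the error path: whatever was assigned
before the failing step is released before returning.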

> +
>         kernfs_activate(rdtgrp->kn);
>
>         ret = rdtgroup_init_alloc(rdtgrp);

-Peter

* Re: [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap
  2024-07-26 22:48   ` Peter Newman
@ 2024-07-26 23:53     ` Moger, Babu
  2024-08-01 21:05     ` Reinette Chatre
  1 sibling, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-26 23:53 UTC (permalink / raw)
  To: Peter Newman, Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter,

On 7/26/2024 5:48 PM, Peter Newman wrote:
> Hi Babu,
> 
> On Wed, Jul 3, 2024 at 2:50 PM Babu Moger <babu.moger@amd.com> wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 4f47f52e01c2..b3d3fa048f15 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -185,6 +185,23 @@ bool closid_allocated(unsigned int closid)
>>          return !test_bit(closid, &closid_free_map);
>>   }
>>
>> +/*
>> + * Counter bitmap and its length for tracking available counters.
>> + * ABMC feature provides set of hardware counters for enabling events.
>> + * Each event takes one hardware counter. Kernel needs to keep track
>> + * of number of available counters.
>> + */
>> +static unsigned long mbm_cntrs_free_map;
>> +static unsigned int mbm_cntrs_free_map_len;
> 
> If counter assignment is supported at a per-domain granularity, then
> counter id allocation needs to be done per-domain rather than
> globally. For example, if I free a counter from one group in a
> particular domain, it should be available to allocate to another group
> only in that domain.

Yes. I noticed the problem.
> 
> When I attempt this using the current series, the resulting behavior
> is quite interesting. I noticed Reinette also commented on this later
> in the series, noticing that counters are only allocated permanently
> to groups and never move as a result of writing to mbm_control.

Working on fixing it right now.

We need to have a bitmap at the group level (global) as well as at the domain
level. We need to set/clear bits in both places when a counter is
assigned/unassigned. If all the domains are cleared, then we need to free
the counter at the group level. Will address it in v6. Thanks for the comments.
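
Roughly, the two-level bookkeeping could look like the userspace sketch
below. This is only an illustration of the idea, not the v6 code: the
names global_free_map, dom_mask, cntr_get(), cntr_assign_dom() and
cntr_unassign_dom() are hypothetical, and NUM_CNTRS and NUM_DOMAINS are
assumed values.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CNTRS   32	/* assumed number of counter ids */
#define NUM_DOMAINS 32	/* assumed number of monitoring domains */

/* Global level: a set bit means the counter id is free. */
static uint32_t global_free_map = UINT32_MAX;

/* Domain level: for each counter id, the domains where it is assigned. */
static uint32_t dom_mask[NUM_CNTRS];

/* Take the lowest free counter id from the global map, or return -1. */
static int cntr_get(void)
{
	for (int id = 0; id < NUM_CNTRS; id++) {
		uint32_t bit = UINT32_C(1) << id;

		if (global_free_map & bit) {
			global_free_map &= ~bit;
			return id;
		}
	}
	return -1;
}

/* Record that counter id is assigned in domain dom. */
static void cntr_assign_dom(int id, int dom)
{
	assert(id >= 0 && id < NUM_CNTRS);
	assert(dom >= 0 && dom < NUM_DOMAINS);

	dom_mask[id] |= UINT32_C(1) << dom;
}

/*
 * Clear the domain bit. Once no domain uses the counter any more,
 * release it at the global level too. Returns 1 if it was released.
 */
static int cntr_unassign_dom(int id, int dom)
{
	assert(id >= 0 && id < NUM_CNTRS);
	assert(dom >= 0 && dom < NUM_DOMAINS);

	dom_mask[id] &= ~(UINT32_C(1) << dom);
	if (dom_mask[id] == 0) {
		global_free_map |= UINT32_C(1) << id;
		return 1;
	}
	return 0;
}
```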

> 
> # grep 'g1[45]' info/L3_MON/mbm_control
> test/g14/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;12=tl;13=tl;14=tl;15=tl;16=tl;17=tl;18=tl;19=tl;20=tl;21=tl;22=tl;23=tl;24=tl;25=tl;26=tl;27=tl;28=tl;29=tl;30=tl;31=tl;
> test/g15/0=_;1=_;2=_;3=_;4=_;5=_;6=_;7=_;8=_;9=_;10=_;11=_;12=_;13=_;14=_;15=_;16=_;17=_;18=_;19=_;20=_;21=_;22=_;23=_;24=_;25=_;26=_;27=_;28=_;29=_;30=_;31=_;
> 
> [domains 2-31 omitted for clarity below]
> 
> # echo 'test/g14/1-t' > info/L3_MON/mbm_control
> # grep 'g1[45]' info/L3_MON/mbm_control
> test/g14/0=tl;1=l;
> test/g15/0=_;1=_;
> 
> # echo "test/g15/1+t" > info/L3_MON/mbm_control
> # grep 'g1[45]' info/L3_MON/mbm_control
> test/g14/0=tl;1=_;
> test/g15/0=_;1=_;
> 
> 
> -Peter
> 

-- 
- Babu Moger

* Re: [PATCH v5 15/20] x86/resctrl: Assign/unassign counters by default when ABMC is enabled
  2024-07-26 23:22   ` Peter Newman
@ 2024-07-26 23:57     ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-07-26 23:57 UTC (permalink / raw)
  To: Peter Newman, Babu Moger
  Cc: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter,

On 7/26/2024 6:22 PM, Peter Newman wrote:
> Hi Babu,
> 
> On Wed, Jul 3, 2024 at 2:50 PM Babu Moger <babu.moger@amd.com> wrote:
> 
>> @@ -3894,6 +3956,17 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>>          if (ret)
>>                  goto out_closid_free;
>>
>> +       /*
>> +        * Assign the counters if ABMC is already enabled.
>> +        * With the limited number of counters, there can be cases
>> +        * only on assignment succeed. It is not required to fail
>> +        * here in that case. Users have the option to assign the
>> +        * counter later.
>> +        */
>> +
>> +       if (rdtgroup_assign_cntrs(rdtgrp) < 0)
>> +               rdt_last_cmd_puts("Monitor assignment failed\n");
> 
> Supposing rdtgroup_init_alloc() below fails, would you want to release
> the counters allocated here?

Yes, sure. Will fix it in v6.
> 
>> +
>>          kernfs_activate(rdtgrp->kn);
>>
>>          ret = rdtgroup_init_alloc(rdtgrp);
> 
> -Peter
> 

-- 
- Babu Moger

* Re: [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-07-25 17:28         ` Moger, Babu
@ 2024-08-01 18:56           ` Reinette Chatre
  2024-08-01 19:40             ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-08-01 18:56 UTC (permalink / raw)
  To: babu.moger, Peter Newman
  Cc: corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86, hpa,
	paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Babu and Peter,

On 7/25/24 10:28 AM, Moger, Babu wrote:
> Hi Peter,
> 
> On 7/25/24 12:11, Peter Newman wrote:
>> Hi Babu,
>>
>> On Wed, Jul 24, 2024 at 6:23 PM Moger, Babu <bmoger@amd.com> wrote:
>>>
>>> Hi Peter,
>>>
>>> On 7/24/2024 7:03 PM, Peter Newman wrote:
>>>> Hi Babu,
>>>>
>>>> On Wed, Jul 3, 2024 at 2:51 PM Babu Moger <babu.moger@amd.com> wrote:
>>>>>
>>>>> Introduce the interface to enable events in ABMC mode.
>>>>>
>>>>> Events can be enabled or disabled by writing to file
>>>>> /sys/fs/resctrl/info/L3_MON/mbm_control
>>>>>
>>>>> Format is similar to the list format with addition of op-code for the
>>>>> assignment operation.
>>>>>    "<CTRL_MON group>/<MON group>/<op-code><flags>"
>>>>>
>>>>> Format for specific type of groups:
>>>>>
>>>>>    * Default CTRL_MON group:
>>>>>            "//<domain_id><op-code><flags>"
>>>>>
>>>>>    * Non-default CTRL_MON group:
>>>>>            "<CTRL_MON group>//<domain_id><op-code><flags>"
>>>>>
>>>>>    * Child MON group of default CTRL_MON group:
>>>>>            "/<MON group>/<domain_id><op-code><flags>"
>>>>>
>>>>>    * Child MON group of non-default CTRL_MON group:
>>>>>            "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
>>>>
>>>> Just a reminder, Reinette and I had discussed[1] omitting the
>>>> domain_id for performing the same operation on all domains.
>>>
>>> Yes. I remember. Lets refresh our memory.
>>>>
>>>> I would really appreciate this, otherwise our most typical operations
>>>> could be really tedious and needlessly serialized.
>>>
>>>>
>>>> # cat mbm_control
>>>> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;12=tl;13=tl;14=tl;15=tl;16=tl;17=tl;18=tl;19=tl;20=tl;21=tl;22=tl;23=tl;24=tl;25=tl;26=tl;27=tl;28=tl;29=tl;30=tl;31=tl;
>>>> # echo '//-l' > mbm_control
>>>
>>> What is the expectation here?
>>> You want to unassign local event on all the domains?
>>
>> Correct.
>>
>>>
>>> Domain id makes it easy to parse the command. Without that it parsing
>>> code becomes  messy.
>>>
>>> How about something like this? We can use the max domain id to mean all
>>> the domains. In the above case there are 32 domains(0-31). 32 is total
>>> number of domains. We can get that details looking through all the
>>> domains. We can print that detail when we list it.
>>
>> This sounds like only a minor simplification to the parsing code. It
>> seems like it would be easy to determine if the final '/' is
>> immediately followed by an opcode (+-=_) rather than a number.
> 
> Ok. Will try to get that working. Will let you know if there are
> complexities with that.--
> Thanks
> Babu Moger

Dave suggested [1]  "*" to indicate "all domains". This seems an intuitive
addition to the interface to accomplish this goal.
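
For illustration only, a domain token that accepts "*" could be turned
into a domain mask along the lines of the userspace sketch below.
parse_domain_token() is a hypothetical helper, and the limit of 32 domains
is an assumption made so that the mask fits in a u32.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse the domain part of "<domain_id><op-code><flags>".
 * "*" selects every domain, a number selects one domain.
 * Returns a bitmask of the selected domains and points *rest at the
 * op-code, or returns 0 if the token is malformed or out of range.
 */
static uint32_t parse_domain_token(const char *s, unsigned int num_doms,
				   const char **rest)
{
	unsigned long id;
	char *end;

	assert(s && rest && num_doms >= 1 && num_doms <= 32);

	if (*s == '*') {
		*rest = s + 1;
		return num_doms == 32 ? UINT32_MAX
				      : (UINT32_C(1) << num_doms) - 1;
	}

	/* Reject a sign or whitespace that strtoul() would accept. */
	if (!isdigit((unsigned char)*s))
		return 0;

	id = strtoul(s, &end, 10);
	if (id >= num_doms)
		return 0;

	*rest = end;
	return UINT32_C(1) << id;
}
```

Because "*" occupies the same position as a domain id, the rest of the
parser does not need to change: it can keep expecting an op-code right
after the domain token.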

Reinette

[1] https://lore.kernel.org/lkml/ZierjRNDMfg5swT8@e133380.arm.com/

* Re: [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of the groups
  2024-08-01 18:56           ` Reinette Chatre
@ 2024-08-01 19:40             ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-08-01 19:40 UTC (permalink / raw)
  To: Reinette Chatre, Peter Newman
  Cc: corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86, hpa,
	paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Reinette,

On 8/1/24 13:56, Reinette Chatre wrote:
> Hi Babu and Peter,
> 
> On 7/25/24 10:28 AM, Moger, Babu wrote:
>> Hi Peter,
>>
>> On 7/25/24 12:11, Peter Newman wrote:
>>> Hi Babu,
>>>
>>> On Wed, Jul 24, 2024 at 6:23 PM Moger, Babu <bmoger@amd.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> On 7/24/2024 7:03 PM, Peter Newman wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On Wed, Jul 3, 2024 at 2:51 PM Babu Moger <babu.moger@amd.com> wrote:
>>>>>>
>>>>>> Introduce the interface to enable events in ABMC mode.
>>>>>>
>>>>>> Events can be enabled or disabled by writing to file
>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_control
>>>>>>
>>>>>> Format is similar to the list format with addition of op-code for the
>>>>>> assignment operation.
>>>>>>    "<CTRL_MON group>/<MON group>/<op-code><flags>"
>>>>>>
>>>>>> Format for specific type of groups:
>>>>>>
>>>>>>    * Default CTRL_MON group:
>>>>>>            "//<domain_id><op-code><flags>"
>>>>>>
>>>>>>    * Non-default CTRL_MON group:
>>>>>>            "<CTRL_MON group>//<domain_id><op-code><flags>"
>>>>>>
>>>>>>    * Child MON group of default CTRL_MON group:
>>>>>>            "/<MON group>/<domain_id><op-code><flags>"
>>>>>>
>>>>>>    * Child MON group of non-default CTRL_MON group:
>>>>>>            "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
>>>>>
>>>>> Just a reminder, Reinette and I had discussed[1] omitting the
>>>>> domain_id for performing the same operation on all domains.
>>>>
>>>> Yes, I remember. Let's refresh our memory.
>>>>>
>>>>> I would really appreciate this, otherwise our most typical operations
>>>>> could be really tedious and needlessly serialized.
>>>>
>>>>>
>>>>> # cat mbm_control
>>>>> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;12=tl;13=tl;14=tl;15=tl;16=tl;17=tl;18=tl;19=tl;20=tl;21=tl;22=tl;23=tl;24=tl;25=tl;26=tl;27=tl;28=tl;29=tl;30=tl;31=tl;
>>>>> # echo '//-l' > mbm_control
>>>>
>>>> What is the expectation here?
>>>> You want to unassign local event on all the domains?
>>>
>>> Correct.
>>>
>>>>
>>>> Domain id makes it easy to parse the command. Without that, the parsing
>>>> code becomes messy.
>>>>
>>>> How about something like this? We can use the max domain id to mean all
>>>> the domains. In the above case there are 32 domains(0-31). 32 is total
>>>> number of domains. We can get that details looking through all the
>>>> domains. We can print that detail when we list it.
>>>
>>> This sounds like only a minor simplification to the parsing code. It
>>> seems like it would be easy to determine if the final '/' is
>>> immediately followed by an opcode (+-=_) rather than a number.
>>
>> Ok. Will try to get that working. Will let you know if there are
>> complexities with that.--
>> Thanks
>> Babu Moger
> 
> Dave suggested [1]  "*" to indicate "all domains". This seems an intuitive
> addition to the interface to accomplish this goal.

Yes, that is correct. I will try to address that in v6.
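
To make the parsing question concrete, here is a minimal user-space
sketch (not code from this series) of how the text after the final '/'
could be split into domain, op-code and flags. It accepts a numeric
domain id, "*" as suggested by Dave, or no domain at all as suggested
by Peter. The names parse_dom_op() and DOM_ALL are hypothetical, and
the op-code set "+-=_" is taken from Peter's message above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define DOM_ALL (-1)	/* hypothetical sentinel: apply to every domain */

/*
 * Parse the text that follows the final '/' of one mbm_control entry,
 * e.g. "1-t", "*=tl" or "-l". Returns 0 on success and fills in the
 * domain id (or DOM_ALL), the op-code and a pointer to the flags.
 * Returns -1 if no valid op-code follows the (optional) domain.
 */
static int parse_dom_op(const char *s, long *dom, char *op, const char **flags)
{
	char *end;

	assert(s && dom && op && flags);

	if (*s == '*') {				/* explicit "all domains" */
		*dom = DOM_ALL;
		s++;
	} else if (isdigit((unsigned char)*s)) {	/* explicit domain id */
		*dom = strtol(s, &end, 10);
		s = end;
	} else {					/* domain omitted */
		*dom = DOM_ALL;
	}

	/* strchr() also matches the terminating NUL, so reject it first. */
	if (*s == '\0' || !strchr("+-=_", *s))
		return -1;

	*op = *s;
	*flags = s + 1;
	return 0;
}
```

With this shape, "//-l" and "//*-l" would both reach the same
"all domains" path, so the two proposals are not mutually exclusive.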

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap
  2024-07-26 22:48   ` Peter Newman
  2024-07-26 23:53     ` Moger, Babu
@ 2024-08-01 21:05     ` Reinette Chatre
  1 sibling, 0 replies; 95+ messages in thread
From: Reinette Chatre @ 2024-08-01 21:05 UTC (permalink / raw)
  To: Peter Newman, Babu Moger
  Cc: corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86, hpa,
	paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter,

On 7/26/24 3:48 PM, Peter Newman wrote:
> Hi Babu,
> 
> On Wed, Jul 3, 2024 at 2:50 PM Babu Moger <babu.moger@amd.com> wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 4f47f52e01c2..b3d3fa048f15 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -185,6 +185,23 @@ bool closid_allocated(unsigned int closid)
>>          return !test_bit(closid, &closid_free_map);
>>   }
>>
>> +/*
>> + * Counter bitmap and its length for tracking available counters.
>> + * ABMC feature provides set of hardware counters for enabling events.
>> + * Each event takes one hardware counter. Kernel needs to keep track
>> + * of number of available counters.
>> + */
>> +static unsigned long mbm_cntrs_free_map;
>> +static unsigned int mbm_cntrs_free_map_len;
> 
> If counter assignment is supported at a per-domain granularity, then
> counter id allocation needs to be done per-domain rather than
> globally. For example, if I free a counter from one group in a

It is not obvious to me that counter assignment supported per-domain
requires allocation per-domain. I think this may get complicated when
resources are monitored with one counter when tasks run in one domain
and another counter when the same tasks run in another domain.

> particular domain, it should be available to allocate to another group
> only in that domain.
> 
> When I attempt this using the current series, the resulting behavior
> is quite interesting. I noticed Reinette also commented on this later
> in the series, noticing that counters are only allocated permanently
> to groups and never move as a result of writing to mbm_control.

As I understand it, this is separate from how the counter is allocated and is
instead just a gap in the current implementation of the intended interface.

> 
> # grep 'g1[45]' info/L3_MON/mbm_control
> test/g14/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;12=tl;13=tl;14=tl;15=tl;16=tl;17=tl;18=tl;19=tl;20=tl;21=tl;22=tl;23=tl;24=tl;25=tl;26=tl;27=tl;28=tl;29=tl;30=tl;31=tl;
> test/g15/0=_;1=_;2=_;3=_;4=_;5=_;6=_;7=_;8=_;9=_;10=_;11=_;12=_;13=_;14=_;15=_;16=_;17=_;18=_;19=_;20=_;21=_;22=_;23=_;24=_;25=_;26=_;27=_;28=_;29=_;30=_;31=_;
> 
> [domains 2-31 omitted for clarity below]
> 
> # echo 'test/g14/1-t' > info/L3_MON/mbm_control
> # grep 'g1[45]' info/L3_MON/mbm_control
> test/g14/0=tl;1=l;
> test/g15/0=_;1=_;
> 
> # echo "test/g15/1+t" > info/L3_MON/mbm_control
> # grep 'g1[45]' info/L3_MON/mbm_control
> test/g14/0=tl;1=_;
> test/g15/0=_;1=_;

Thank you very much for trying this out.
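
For reference, the per-domain allocation Peter describes could look
roughly like the sketch below. This is only an illustration of that
proposal, not code from the series and not a statement that it is the
right design. NUM_DOMAINS, dom_free_map and the helper names are
hypothetical; 32 counters matches the num_cntrs value shown in the
cover letter.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DOMAINS 2	/* illustrative; Peter's system has 32 domains */
#define NUM_CNTRS   32	/* value reported in the cover letter example */

/* One free-counter bitmap per domain; a set bit means "free". */
static uint32_t dom_free_map[NUM_DOMAINS];

static void cntr_maps_init(void)
{
	for (int d = 0; d < NUM_DOMAINS; d++)
		dom_free_map[d] = UINT32_MAX;	/* all 32 counters free */
}

/* Allocate the lowest free counter id in one domain, or -1 if none. */
static int cntr_alloc(int dom)
{
	assert(dom >= 0 && dom < NUM_DOMAINS);

	for (int id = 0; id < NUM_CNTRS; id++) {
		uint32_t bit = UINT32_C(1) << id;

		if (dom_free_map[dom] & bit) {
			dom_free_map[dom] &= ~bit;
			return id;
		}
	}
	return -1;
}

/* Return a counter to the free pool of the domain it was taken from. */
static void cntr_free(int dom, int id)
{
	assert(dom >= 0 && dom < NUM_DOMAINS);
	assert(id >= 0 && id < NUM_CNTRS);
	dom_free_map[dom] |= UINT32_C(1) << id;
}
```

In this model a counter freed in domain 1 becomes available only in
domain 1, which is the behavior Peter asks for. The single global
mbm_cntrs_free_map in the patch cannot express that.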

Reinette



^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 19/20] x86/resctrl: Introduce interface to list monitor states of all the groups
  2024-07-17 15:22     ` Moger, Babu
@ 2024-08-01 21:37       ` Reinette Chatre
  2024-08-02 16:10         ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-08-01 21:37 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/17/24 8:22 AM, Moger, Babu wrote:
> On 7/12/24 17:16, Reinette Chatre wrote:
>> On 7/3/24 2:48 PM, Babu Moger wrote:

>>> +     */
>>> +    if (rdtgrp->mon.cntr_id[0] != MON_CNTR_UNSET)
>>> +        if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[0],
>>> rdtgrp->mon.rmid))
>>> +            dom_state |= ASSIGN_TOTAL;
>>> +
>>> +    if (rdtgrp->mon.cntr_id[1] != MON_CNTR_UNSET)
>>> +        if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[1],
>>> rdtgrp->mon.rmid))
>>> +            dom_state |= ASSIGN_LOCAL;
>>> +
>>> +    switch (dom_state) {
>>> +    case ASSIGN_NONE:
>>> +        *tmp++ = '_';
>>> +        break;
>>> +    case (ASSIGN_TOTAL | ASSIGN_LOCAL):
>>> +        *tmp++ = 't';
>>> +        *tmp++ = 'l';
>>> +        break;
>>> +    case ASSIGN_TOTAL:
>>> +        *tmp++ = 't';
>>> +        break;
>>> +    case ASSIGN_LOCAL:
>>> +        *tmp++ = 'l';
>>> +        break;
>>> +    default:
>>> +        break;
>>> +    }
>>
>> This switch statement does not scale. Adding new flags will be painful.
>> Can flags not
>> just incrementally be printed as learned from hardware with "_" printed as
>> last resort?
>> This would eliminate the need for these "ASSIGN" flags.
> 
> Let me try to understand this.
> 
> You want to remove switch statement.
> 
> if (rdtgrp->mon.cntr_id[0] != MON_CNTR_UNSET)
>     if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[0], rdtgrp->mon.rmid))
>      *tmp++ = 't';
> 
> if (rdtgrp->mon.cntr_id[1] != MON_CNTR_UNSET)
>     if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[1], rdtgrp->mon.rmid))
>     *tmp++ = 'l';
> 
> If none of these flags are available, then
>     *tmp++ = '_';
> 
> Is that the idea?

Indeed. Thank you. Can this be done without hard coding the counter index?
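
One possible shape, as a user-space sketch only: drive the printing
from a small table so that adding an event means adding a table entry.
The table, CNTR_UNSET (a stand-in for MON_CNTR_UNSET) and the
dom_assigned bitmask (a stand-in for the per-domain hardware query done
by rdtgroup_abmc_dom_state()) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CNTR_UNSET (-1)	/* stand-in for MON_CNTR_UNSET */

/* Hypothetical table: one entry per assignable MBM event, in print order. */
struct mbm_evt_flag {
	int  cntr_index;	/* index into the group's cntr_id[] array */
	char flag;		/* character shown in mbm_control */
};

static const struct mbm_evt_flag evt_flags[] = {
	{ 0, 't' },	/* total */
	{ 1, 'l' },	/* local */
};

/*
 * Write the flags for one domain into buf (NUL terminated) and return
 * the number of flag characters written. Bit n of dom_assigned is set
 * when counter n is assigned in this domain. "_" is printed only when
 * no event has a counter assigned.
 */
static size_t print_dom_flags(const int *cntr_id, uint32_t dom_assigned, char *buf)
{
	char *tmp = buf;

	for (size_t i = 0; i < sizeof(evt_flags) / sizeof(evt_flags[0]); i++) {
		int id = cntr_id[evt_flags[i].cntr_index];

		if (id == CNTR_UNSET)
			continue;
		assert(id >= 0 && id < 32);
		if (dom_assigned & (UINT32_C(1) << id))
			*tmp++ = evt_flags[i].flag;
	}

	if (tmp == buf)
		*tmp++ = '_';
	*tmp = '\0';

	return (size_t)(tmp - buf);
}
```

The loop body no longer names index 0 or 1, and the ASSIGN_* bit
combinations disappear along with the switch statement.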

Reinette




^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 18/20] x86/resctrl: Enable AMD ABMC feature by default when supported
  2024-07-26  0:16       ` Moger, Babu
@ 2024-08-01 21:40         ` Reinette Chatre
  0 siblings, 0 replies; 95+ messages in thread
From: Reinette Chatre @ 2024-08-01 21:40 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/25/24 5:16 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 7/16/2024 6:23 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 7/12/2024 5:15 PM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 7/3/24 2:48 PM, Babu Moger wrote:
>>>> Enable ABMC by default when supported during the boot up.
>>>>
>>>> Users will not see any difference in the behavior when resctrl is
>>>> mounted. With automatic assignment everything will work as running
>>>> in the legacy monitor mode.
>>>>
>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>>> ---
>>>> v5: New patch to enable ABMC by default.
>>>> ---
>>>>   arch/x86/kernel/cpu/resctrl/core.c     |  2 ++
>>>>   arch/x86/kernel/cpu/resctrl/internal.h |  1 +
>>>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 17 +++++++++++++++++
>>>>   3 files changed, 20 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>>>> index 6265ef8b610f..b69b2650bde3 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>>> @@ -599,6 +599,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>>>>           d = container_of(hdr, struct rdt_mon_domain, hdr);
>>>>           cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
>>>> +        resctrl_arch_configure_abmc();
>>>>           return;
>>>>       }
>>>> @@ -620,6 +621,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>>>>       arch_mon_domain_online(r, d);
>>>>       resctrl_arch_mbm_evt_config(hw_dom);
>>>> +    resctrl_arch_configure_abmc();
>>>>       if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
>>>>           mon_domain_free(hw_dom);
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>>>> index beb005775fe4..0f858cff8ab1 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>>>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>>>> @@ -707,6 +707,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>>>>   void __init resctrl_file_fflags_init(const char *config,
>>>>                        unsigned long fflags);
>>>>   void resctrl_arch_mbm_evt_config(struct rdt_hw_mon_domain *hw_dom);
>>>> +void resctrl_arch_configure_abmc(void);
>>>>   unsigned int mon_event_config_index_get(u32 evtid);
>>>>   int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, u32 evtid, u32 rmid,
>>>>                    u32 cntr_id, u32 closid, bool enable);
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> index 531233779f8d..d978668c8865 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> @@ -2733,6 +2733,23 @@ void resctrl_arch_abmc_disable(void)
>>>>       }
>>>>   }
>>>> +void resctrl_arch_configure_abmc(void)
>>>> +{
>>>> +    struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>>> +    struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>>> +    bool enable = true;
>>>> +
>>>> +    mutex_lock(&rdtgroup_mutex);
>>>> +
>>>> +    if (r->mon.abmc_capable) {
>>>> +        if (!hw_res->abmc_enabled)
>>>> +            hw_res->abmc_enabled = true;
>>>> +        resctrl_abmc_set_one_amd(&enable);
>>>> +    }
>>>
>>> This does not look right. It is not architecture code that needs to
>>> decide if this feature is enabled or not, right? The feature is enabled
>>> via fs (for example when user writes to mbm_mode). If the default is
>>> enabled then it should be set by fs. resctrl_arch_configure_abmc()
>>> then checks if feature is capable and enabled before it configures
>>> it on the CPU.
> 
> Looking at the code again, I think it is fine to do it here. This is arch initialization code. I am checking if the feature is available and enable it by default. The fs code is not initialized yet at this stage.
> 
> Other option is to move everything to rdt_enable_ctx which is during the mount time.
> 
> I will keep it as is now. We can discuss more on this in v6.

I will take a look at v6. At this time I do still believe that this should be
controlled only by fs code.
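
To illustrate the split I have in mind, here is a rough user-space
sketch, not kernel code. struct mon_state, fs_set_abmc() and
arch_configure_abmc() are hypothetical names, and the counter
cpus_configured only stands in for the per-CPU hardware write. The
sketch does not address the initialization-order concern raised above.

```c
#include <assert.h>
#include <stdbool.h>

struct mon_state {
	bool abmc_capable;	/* discovered by architecture code at boot */
	bool abmc_enabled;	/* policy decision, owned by the fs layer */
	int  cpus_configured;	/* stand-in for per-CPU hardware writes */
};

/* fs layer: pick the default, or act on a write to mbm_mode. */
static void fs_set_abmc(struct mon_state *s, bool enable)
{
	assert(s);
	s->abmc_enabled = enable && s->abmc_capable;
}

/*
 * arch layer: called when a CPU comes online. It never decides the
 * policy; it only applies it when the feature is capable and enabled.
 */
static bool arch_configure_abmc(struct mon_state *s)
{
	assert(s);
	if (!s->abmc_capable || !s->abmc_enabled)
		return false;

	s->cpus_configured++;
	return true;
}
```

Here the architecture hook is a no-op until the fs layer has made its
decision, which is the ordering described in my earlier comment.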

Reinette

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-07-17 17:19   ` Moger, Babu
@ 2024-08-01 21:49     ` Reinette Chatre
  2024-08-01 22:45       ` Peter Newman
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-08-01 21:49 UTC (permalink / raw)
  To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Babu,

On 7/17/24 10:19 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 7/12/24 17:03, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 7/3/24 2:48 PM, Babu Moger wrote:
>>> # Linux Implementation
>>>
>>> Linux resctrl subsystem provides the interface to count maximum of two
>>> memory bandwidth events per group, from a combination of available total
>>> and local events. Keeping the current interface, users can enable a maximum
>>> of 2 ABMC counters per group. User will also have the option to enable only
>>> one counter to the group. If the system runs out of assignable ABMC
>>> counters, kernel will display an error. Users need to disable an already
>>> enabled counter to make space for new assignments.
>>
>> The implementation appears to be converging on an interface that can
>> be generic enough to be used by other features discussed along the way.
>> "Linux implementation" summary can thus add:
>>
>>      Create a generic interface aimed to support user space assignment
>>      of scarce counters used for monitoring. First usage of interface
>>      is by ABMC with option to expand usage to "soft-RMID" and MPAM
>>      counters in future.
> 
> Sure.
> 
>>
>>
>>> # Examples
>>>
>>> a. Check if ABMC support is available
>>>      #mount -t resctrl resctrl /sys/fs/resctrl/
>>>
>>>      #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>>      [abmc]
>>>      legacy
>>>
>>>      Linux kernel detected ABMC feature and it is enabled.
>>
>> How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
>> info file and be the final step to make this generic so that another
>> architecture
>> can more easily support assigning hardware counters without needing to call
>> the feature AMD's "abmc".
> 
> I think we already settled this with "mbm_cntr_assignable".
> 
> For "soft-RMID" it will be mbm_sw_assignable.

Maybe getting a bit long but how about "mbm_cntr_sw_assignable" to match
with the term "mbm_cntr" in accompanying "num_mbm_cntrs"?

>> Expanding on this it may be possible to add a new "sw_mbm_cntrs" feature that
>> will be the "soft-RMID" feature while also reflecting the "mbm_cntrs" name
>> so that when user space enables that feature its properties can be found in
>> "num_mbm_cntrs".
>>
>> The "abmc" kernel parameter remains but that does seem separate from this
>> resctrl fs feature since it is explicitly tied to X86_FEATURE_ABMC surely
>> making it architecture specific.
>>
>>>
>>> b. Check how many ABMC counters are available.
>>>
>>>      #cat /sys/fs/resctrl/info/L3_MON/num_cntrs
>>>      32
>>
>> This is now num_mbm_cntrs
> 
> Sure.
> 
>>
>>>
>>> c. Create few resctrl groups.
>>>
>>>      # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>>>      # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>>>      # mkdir
>>> /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>>
>>>
>>> d. This series adds a new interface file
>>> /sys/fs/resctrl/info/L3_MON/mbm_control
>>>      to list and modify the group's monitoring states. File provides
>>> single place
>>>      to list monitoring states of all the resctrl groups. It makes it
>>> easier for
>>>      user space to learn about the counters are used without needing to
>>> traverse
>>>      all the groups thus reducing the number of filesystem calls.
>>>
>>>      The list follows the following format:
>>>
>>>      "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>>
>>>      Format for specific type of groups:
>>>
>>>      * Default CTRL_MON group:
>>>       "//<domain_id>=<flags>"
>>>
>>>          * Non-default CTRL_MON group:
>>>                  "<CTRL_MON group>//<domain_id>=<flags>"
>>>
>>>          * Child MON group of default CTRL_MON group:
>>>                  "/<MON group>/<domain_id>=<flags>"
>>>
>>>          * Child MON group of non-default CTRL_MON group:
>>>                  "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>>
>>>          Flags can be one of the following:
>>>
>>>           t  MBM total event is enabled.
>>>           l  MBM local event is enabled.
>>>           tl Both total and local MBM events are enabled.
>>>           _  None of the MBM events are enabled
>>
>> The language needs to be changed here (and in the many copied places) to
>> be specific about what setting the flag accomplishes. For example, in
>> "legacy" mode user space can be expected to find all events enabled, no?
>> Needing a new feature to set a flag to accomplish something that is
>> possible in legacy mode can thus cause confusion.
> 
> Yes. It is possible to do it. But I feel it is unnecessary.
> 
>>
>> If I understand the implementation reading "mbm_control" will fail
>> if system is ABMC capable but it is disabled. Why can "mbm_control" not
>> always be displayed to user space? For example, what if "mbm_control" is
>> always available to user space and it can provide specific information to
>> user space. For example:
>>      t  MBM total event is enabled but may not always be counted.
>>      T  MBM total event is enabled and being counted.
>>
>> On AMD systems resource groups will have "t" associated with monitor
>> groups when ABMC disabled, "T" when ABMC enabled and a counter assigned.
>> On Intel systems monitor groups will always have "T".
> 
> I think more flags will add more confusion.
> 
>>
>> For "soft-RMID" the flag could possibly continue to be "T"?
>>
>> I am trying to find ways to communicate to user space consistently
>> and clearly and any insights will be appreciated. We really do not want
>> to add this interface and then find that it just causes confusion.
>>
>> It is not quite obvious to me when the new files should be visible and
>> what they should present to the user. "mbm_mode" is now always visible.
>> Should "num_mbm_cntrs" not also always be visible? Right now "num_mbm_cntrs"
>> appears to be only associated to ABMC, should it not also, for example,
>> be the file that "soft-RMID" may use to share how many counters are
>> available? Its contents will thus be dynamic based on which "MBM mode" is
>> active, begging the question, what should it contain when "legacy" mode is
>> enabled, should "num_mbm_cntrs" perhaps show "0" to user space when
>> "legacy" mode is active?
> 
> It's good we have this discussion.
> 
> How about we go with the simple way for now. The mbm_mode will only be available
> when ABMC or Soft_RMID(MPAM feature) is supported. Same way for the
> num_mbm_cntrs.

If ABMC or Soft_RMID is supported then user can still enable "legacy" instead.
What will num_mbm_cntrs and mbm_control display when user enables
"legacy"?

Reinette

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-01 21:49     ` Reinette Chatre
@ 2024-08-01 22:45       ` Peter Newman
  2024-08-02 16:13         ` Reinette Chatre
  0 siblings, 1 reply; 95+ messages in thread
From: Peter Newman @ 2024-08-01 22:45 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86,
	hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Reinette and Babu,

On Thu, Aug 1, 2024 at 2:50 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Babu,
>
> On 7/17/24 10:19 AM, Moger, Babu wrote:
> > Hi Reinette,
> >
> > On 7/12/24 17:03, Reinette Chatre wrote:
> >> Hi Babu,
> >>
> >> On 7/3/24 2:48 PM, Babu Moger wrote:
> >>> # Linux Implementation
> >>>
> >>> Linux resctrl subsystem provides the interface to count maximum of two
> >>> memory bandwidth events per group, from a combination of available total
> >>> and local events. Keeping the current interface, users can enable a maximum
> >>> of 2 ABMC counters per group. User will also have the option to enable only
> >>> one counter to the group. If the system runs out of assignable ABMC
> >>> counters, kernel will display an error. Users need to disable an already
> >>> enabled counter to make space for new assignments.
> >>
> >> The implementation appears to be converging on an interface that can
> >> be generic enough to be used by other features discussed along the way.
> >> "Linux implementation" summary can thus add:
> >>
> >>      Create a generic interface aimed to support user space assignment
> >>      of scarce counters used for monitoring. First usage of interface
> >>      is by ABMC with option to expand usage to "soft-RMID" and MPAM
> >>      counters in future.
> >
> > Sure.
> >
> >>
> >>
> >>> # Examples
> >>>
> >>> a. Check if ABMC support is available
> >>>      #mount -t resctrl resctrl /sys/fs/resctrl/
> >>>
> >>>      #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> >>>      [abmc]
> >>>      legacy
> >>>
> >>>      Linux kernel detected ABMC feature and it is enabled.
> >>
> >> How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
> >> info file and be the final step to make this generic so that another
> >> architecture
> >> can more easily support assigning hardware counters without needing to call
> >> the feature AMD's "abmc".
> >
> > I think we already settled this with "mbm_cntr_assignable".
> >
> > For "soft-RMID" it will be mbm_sw_assignable.
>
> Maybe getting a bit long but how about "mbm_cntr_sw_assignable" to match
> with the term "mbm_cntr" in accompanying "num_mbm_cntrs"?

My users are pushing for a consistent interface regardless of whether
counter assignment is implemented in hardware or software, so I would
like to avoid exposing implementation differences in the interface
where possible.

The main semantic difference with SW assignments is that it is not
possible to assign counters to individual events. Because the
implementation is assigning RMIDs to groups, assignment results in all
events being counted.

I was considering introducing a boolean mbm_assign_events node to
indicate whether assigning individual events is supported. If true,
num_mbm_cntrs indicates the number of events which can be counted,
otherwise it indicates the number of groups to which counters can be
assigned and attempting to assign a single event is silently upgraded
to assigning counters to all events in the group.

However, if we don't expect to see these semantics in any other
implementation, these semantics could be implicit in the definition of
a SW assignable counter.
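
As a sketch of the semantics described above (illustration only, the
names are hypothetical and nothing here is from the series):
assign_events stands for the proposed boolean mbm_assign_events node,
and the second helper shows how the unit counted by num_mbm_cntrs
changes between the two modes.

```c
#include <assert.h>
#include <stdbool.h>

#define EVT_TOTAL 0x1u
#define EVT_LOCAL 0x2u
#define EVT_ALL   (EVT_TOTAL | EVT_LOCAL)

/*
 * Events that end up counted when user space requests `requested`.
 * With per-event assignment the request is honored as is. Without it,
 * any non-empty request is silently upgraded to all events.
 */
static unsigned int effective_events(unsigned int requested, bool assign_events)
{
	requested &= EVT_ALL;

	if (assign_events || requested == 0)
		return requested;

	return EVT_ALL;
}

/*
 * Units of num_mbm_cntrs consumed by the request: one per event when
 * events are individually assignable, otherwise one per group.
 */
static unsigned int cntrs_consumed(unsigned int requested, bool assign_events)
{
	unsigned int eff = effective_events(requested, assign_events);

	assert(eff <= EVT_ALL);

	if (!assign_events)
		return eff ? 1 : 0;

	return ((eff & EVT_TOTAL) ? 1 : 0) + ((eff & EVT_LOCAL) ? 1 : 0);
}
```

For example, requesting only the total event yields EVT_TOTAL and one
counter in the per-event case, but EVT_ALL and one group slot in the
SW case.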

-Peter

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 19/20] x86/resctrl: Introduce interface to list monitor states of all the groups
  2024-08-01 21:37       ` Reinette Chatre
@ 2024-08-02 16:10         ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-08-02 16:10 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
	dave.hansen
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, james.morse

Hi Reinette,

On 8/1/2024 4:37 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/17/24 8:22 AM, Moger, Babu wrote:
>> On 7/12/24 17:16, Reinette Chatre wrote:
>>> On 7/3/24 2:48 PM, Babu Moger wrote:
> 
>>>> +     */
>>>> +    if (rdtgrp->mon.cntr_id[0] != MON_CNTR_UNSET)
>>>> +        if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[0],
>>>> rdtgrp->mon.rmid))
>>>> +            dom_state |= ASSIGN_TOTAL;
>>>> +
>>>> +    if (rdtgrp->mon.cntr_id[1] != MON_CNTR_UNSET)
>>>> +        if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[1],
>>>> rdtgrp->mon.rmid))
>>>> +            dom_state |= ASSIGN_LOCAL;
>>>> +
>>>> +    switch (dom_state) {
>>>> +    case ASSIGN_NONE:
>>>> +        *tmp++ = '_';
>>>> +        break;
>>>> +    case (ASSIGN_TOTAL | ASSIGN_LOCAL):
>>>> +        *tmp++ = 't';
>>>> +        *tmp++ = 'l';
>>>> +        break;
>>>> +    case ASSIGN_TOTAL:
>>>> +        *tmp++ = 't';
>>>> +        break;
>>>> +    case ASSIGN_LOCAL:
>>>> +        *tmp++ = 'l';
>>>> +        break;
>>>> +    default:
>>>> +        break;
>>>> +    }
>>>
>>> This switch statement does not scale. Adding new flags will be painful.
>>> Can flags not
>>> just incrementally be printed as learned from hardware with "_" 
>>> printed as
>>> last resort?
>>> This would eliminate the need for these "ASSIGN" flags.
>>
>> Let me try to understand this.
>>
>> You want to remove switch statement.
>>
>> if (rdtgrp->mon.cntr_id[0] != MON_CNTR_UNSET)
>>     if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[0], 
>> rdtgrp->mon.rmid))
>>      *tmp++ = 't';
>>
>> if (rdtgrp->mon.cntr_id[1] != MON_CNTR_UNSET)
>>     if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.cntr_id[1], 
>> rdtgrp->mon.rmid))
>>     *tmp++ = 'l';
>>
>> If none of these flags are available, then
>>     *tmp++ = '_';
>>
>> Is that the idea?
> 
> Indeed. Thank you. Can this be done without hard coding the counter index?

Yes. We can do that.

-- 
- Babu Moger

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-01 22:45       ` Peter Newman
@ 2024-08-02 16:13         ` Reinette Chatre
  2024-08-02 18:49           ` Moger, Babu
  2024-08-02 18:49           ` Peter Newman
  0 siblings, 2 replies; 95+ messages in thread
From: Reinette Chatre @ 2024-08-02 16:13 UTC (permalink / raw)
  To: Peter Newman
  Cc: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86,
	hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter,

On 8/1/24 3:45 PM, Peter Newman wrote:
> On Thu, Aug 1, 2024 at 2:50 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 7/17/24 10:19 AM, Moger, Babu wrote:
>>> On 7/12/24 17:03, Reinette Chatre wrote:
>>>> On 7/3/24 2:48 PM, Babu Moger wrote:

>>>>> # Examples
>>>>>
>>>>> a. Check if ABMC support is available
>>>>>       #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>
>>>>>       #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>>>>       [abmc]
>>>>>       legacy
>>>>>
>>>>>       Linux kernel detected ABMC feature and it is enabled.
>>>>
>>>> How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
>>>> info file and be the final step to make this generic so that another
>>>> architecture
>>>> can more easily support assigning hardware counters without needing to call
>>>> the feature AMD's "abmc".
>>>
>>> I think we already settled this with "mbm_cntr_assignable".
>>>
>>> For "soft-RMID" it will be mbm_sw_assignable.
>>
>> Maybe getting a bit long but how about "mbm_cntr_sw_assignable" to match
>> with the term "mbm_cntr" in accompanying "num_mbm_cntrs"?
> 
> My users are pushing for a consistent interface regardless of whether
> counter assignment is implemented in hardware or software, so I would
> like to avoid exposing implementation differences in the interface
> where possible.

This seems a reasonable ask but can we be confident that if hardware
supports assignable counters then there will never be a reason to use
software assignable counters? (This needs to also consider how/if Arm
may use this feature.)

I am of course not familiar with details of the software implementation
- could there be benefits to using it even if hardware counters are
supported?

What I would like to avoid is future complexity of needing a new mount/config
option that user space needs to use to select if a single "mbm_cntr_assignable"
is backed by hardware or software.

> The main semantic difference with SW assignments is that it is not
> possible to assign counters to individual events. Because the
> implementation is assigning RMIDs to groups, assignment results in all
> events being counted.
> 
> I was considering introducing a boolean mbm_assign_events node to
> indicate whether assigning individual events is supported. If true,
> num_mbm_cntrs indicates the number of events which can be counted,
> otherwise it indicates the number of groups to which counters can be
> assigned and attempting to assign a single event is silently upgraded
> to assigning counters to all events in the group.

How were you envisioning your users using the control file ("mbm_control")
in these scenarios? Does this file's interface even work for SW assignment
scenarios?

Users should expect consistent interface for "mbm_control" also.

It sounds to me that a potential "mbm_assign_events" will be false for SW
assignments. That would mean that "num_mbm_cntrs" will
contain the number of groups to which counters can be assigned?
Would user space be required to always enable all flags (enable all events) of
all domains to the same values ... or would enabling of one flag (one event)
in one domain automatically result in all flags (all events) enabled for all
domains ... or would enabling of one flag (one event) in one domain only appear
to user space to be enabled while in reality all flags/events are actually enabled?

> However, If we don't expect to see these semantics in any other
> implementation, these semantics could be implicit in the definition of
> a SW assignable counter.

It is not clear to me how implementation differences between hardware
and software assignment can be hidden from user space. It is possible
to let user space enable individual events and then silently upgrade it
to all events. I see two options here, either "mbm_control" needs to
explicitly show this "silent upgrade" so that user space knows which
events are actually enabled, or "mbm_control" only shows flags/events enabled
from user space perspective. In the former scenario, this needs more
user space support since a generic user space cannot be confident which
flags are set after writing to "mbm_control". In the latter scenario, the
meaning of "num_mbm_cntrs" becomes unclear: user space is expected to rely
on it to know which events can be enabled, but if some events are actually
"silently enabled" while user space still thinks it needs to enable them,
the number of available counters becomes vague.

It is not clear to me how to present hardware and software assignable
counters with a single consistent interface. Actually, what if the
"mbm_mode" is what distinguishes how counters are assigned instead of how
it is backed (hw vs sw)? What if, instead of the "mbm_cntr_assignable" and
"mbm_cntr_sw_assignable" MBM modes, the terms "mbm_cntr_event_assignable"
and "mbm_cntr_group_assignable" are used? Could that replace a
potential "mbm_assign_events" while also supporting user space in
interactions with "mbm_control"?
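
As an illustration only, the difference in units between the two proposed
modes could be captured by a small user-space helper like the sketch below.
The mode strings are the ones proposed above and are not an existing
interface. The helper name monitorable_groups() and the events_per_group
parameter are hypothetical.

```python
# Sketch: how user space might interpret "num_mbm_cntrs" depending on the
# (proposed, not yet existing) MBM mode name.
EVENT_MODE = "mbm_cntr_event_assignable"   # counters are assigned per event
GROUP_MODE = "mbm_cntr_group_assignable"   # counters are assigned per group


def monitorable_groups(mode, num_mbm_cntrs, events_per_group=2):
    """Return how many groups can have all of their events counted.

    events_per_group is an assumption (total + local = 2).
    """
    if mode == EVENT_MODE:
        # num_mbm_cntrs counts events, so each fully monitored group
        # consumes one counter per event.
        return num_mbm_cntrs // events_per_group
    if mode == GROUP_MODE:
        # num_mbm_cntrs already counts groups.
        return num_mbm_cntrs
    raise ValueError(f"unknown MBM mode: {mode!r}")
```

With a single mode name carrying the units, user space would not need a
second node to tell it what the value of "num_mbm_cntrs" means.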

Reinette


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-02 16:13         ` Reinette Chatre
@ 2024-08-02 18:49           ` Moger, Babu
  2024-08-02 19:13             ` Peter Newman
  2024-08-02 18:49           ` Peter Newman
  1 sibling, 1 reply; 95+ messages in thread
From: Moger, Babu @ 2024-08-02 18:49 UTC (permalink / raw)
  To: Reinette Chatre, Peter Newman
  Cc: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86,
	hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter/Reinette,

On 8/2/2024 11:13 AM, Reinette Chatre wrote:
> Hi Peter,
> 
> On 8/1/24 3:45 PM, Peter Newman wrote:
>> On Thu, Aug 1, 2024 at 2:50 PM Reinette Chatre
>> <reinette.chatre@intel.com> wrote:
>>> On 7/17/24 10:19 AM, Moger, Babu wrote:
>>>> On 7/12/24 17:03, Reinette Chatre wrote:
>>>>> On 7/3/24 2:48 PM, Babu Moger wrote:
> 
>>>>>> # Examples
>>>>>>
>>>>>> a. Check if ABMC support is available
>>>>>>       #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>>
>>>>>>       #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>>>>>       [abmc]
>>>>>>       legacy
>>>>>>
>>>>>>       Linux kernel detected ABMC feature and it is enabled.
>>>>>
>>>>> How about renaming "abmc" to "mbm_cntrs"? This will match the 
>>>>> num_mbm_cntrs
>>>>> info file and be the final step to make this generic so that another
>>>>> architecture
>>>>> can more easily support assignining hardware counters without 
>>>>> needing to call
>>>>> the feature AMD's "abmc".
>>>>
>>>> I think we aleady settled this with "mbm_cntr_assignable".
>>>>
>>>> For soft-RMID" it will be mbm_sw_assignable.
>>>
>>> Maybe getting a bit long but how about "mbm_cntr_sw_assignable" to match
>>> with the term "mbm_cntr" in accompanying "num_mbm_cntrs"?
>>
>> My users are pushing for a consistent interface regardless of whether
>> counter assignment is implemented in hardware or software, so I would
>> like to avoid exposing implementation differences in the interface
>> where possible.
> 
> This seems a reasonable ask but can we be confident that if hardware
> supports assignable counters then there will never be a reason to use
> software assignable counters? (This needs to also consider how/if Arm
> may use this feature.)
> 
> I am of course not familiar with details of the software implementation
> - could there be benefits to using it even if hardware counters are
> supported?
> 
> What I would like to avoid is future complexity of needing a new 
> mount/config
> option that user space needs to use to select if a single 
> "mbm_cntr_assignable"
> is backed by hardware or software.
> 
>> The main semantic difference with SW assignments is that it is not
>> possible to assign counters to individual events. Because the
>> implementation is assigning RMIDs to groups, assignment results in all
>> events being counted.
>>
>> I was considering introducing a boolean mbm_assign_events node to
>> indicate whether assigning individual events is supported. If true,
>> num_mbm_cntrs indicates the number of events which can be counted,
>> otherwise it indicates the number of groups to which counters can be
>> assigned and attempting to assign a single event is silently upgraded
>> to assigning counters to all events in the group.
> 
> How were you envisioning your users using the control file ("mbm_control")
> in these scenarios? Does this file's interface even work for SW assignment
> scenarios?
> 
> Users should expect consistent interface for "mbm_control" also.
> 
> It sounds to me that a potential "mbm_assign_events" will be false for SW
> assignments. That would mean that "num_mbm_cntrs" will
> contain the number of groups to which counters can be assigned?
> Would user space be required to always enable all flags (enable all 
> events) of
> all domains to the same values ... or would enabling of one flag (one 
> event)
> in one domain automatically result in all flags (all events) enabled for 
> all
> domains ... or would enabling of one flag (one event) in one domain only 
> appear
> to user space to be enabled while in reality all flags/events are 
> actually enabled?
> 
>> However, If we don't expect to see these semantics in any other
>> implementation, these semantics could be implicit in the definition of
>> a SW assignable counter.
> 
> It is not clear to me how implementation differences between hardware
> and software assignment can be hidden from user space. It is possible
> to let user space enable individual events and then silently upgrade it
> to all events. I see two options here, either "mbm_control" needs to
> explicitly show this "silent upgrade" so that user space knows which
> events are actually enabled, or "mbm_control" only shows flags/events 
> enabled
> from user space perspective. In the former scenario, this needs more
> user space support since a generic user space cannot be confident which
> flags are set after writing to "mbm_control". In the latter scenario,
> meaning of "num_mbm_cntrs" becomes unclear since user space is expected
> to rely on it to know which events can be enabled and if some are
> actually "silently enabled" when user space still thinks it needs to be
> enabled the number of available counters becomes vague.
> 
> It is not clear to me how to present hardware and software assignable
> counters with a single consistent interface. Actually, what if the
> "mbm_mode" is what distinguishes how counters are assigned instead of how
> it is backed (hw vs sw)? What if, instead of "mbm_cntr_assignable" and
> "mbm_cntr_sw_assignable" MBM modes the terms "mbm_cntr_event_assignable"
> and "mbm_cntr_group_assignable" is used? Could that replace a
> potential "mbm_assign_events" while also supporting user space in
> interactions with "mbm_control"?

If I understand correctly, the current interface might work for both the sw
and hw assignments.

In case of SW assignment, you need to manage two counters at context
switch time: one for the total event and one for the local event. Basically,
you need to do an RMID read for both events and then calculate the delta
for each.

If the user assigns only one event, you do the calculations only for the
event the user is interested in. That will save cycles as well. In this case
"mbm_control" will report that only one event is assigned.

In many cases the user will not be interested in both events. Also, events
are configurable, so users can get what they want with just one event.

Does that make sense?
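
To make the idea concrete, here is a rough sketch of the per-event delta
accounting described above. It is not taken from any posted patch. The
44-bit counter width, the read_raw() callback standing in for an RMID
read, and the class name are all assumptions made for illustration.

```python
# Sketch: accumulate bandwidth deltas at context switch, but only for the
# events the user has assigned.
COUNTER_WIDTH = 44  # assumed width of the raw hardware counter, in bits


def counter_delta(prev, cur, width=COUNTER_WIDTH):
    """Difference between two raw readings, tolerating one wraparound."""
    return (cur - prev) & ((1 << width) - 1)


class SoftCounterState:
    """Running byte counts for one group (hypothetical helper)."""

    def __init__(self, assigned_events):
        self.assigned = set(assigned_events)        # e.g. {"local"}
        self.prev = {}                              # last raw reading per event
        self.total = {ev: 0 for ev in self.assigned}

    def on_switch_in(self, read_raw):
        # Snapshot only the assigned events; unassigned ones cost nothing.
        for ev in self.assigned:
            self.prev[ev] = read_raw(ev)

    def on_switch_out(self, read_raw):
        # Credit the bytes counted while the group was running.
        for ev in self.assigned:
            self.total[ev] += counter_delta(self.prev[ev], read_raw(ev))
```

With only "local" assigned, each context switch does one counter read
instead of two.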

-- 
- Babu Moger


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-02 16:13         ` Reinette Chatre
  2024-08-02 18:49           ` Moger, Babu
@ 2024-08-02 18:49           ` Peter Newman
  2024-08-02 20:38             ` Moger, Babu
  2024-08-02 20:55             ` Reinette Chatre
  1 sibling, 2 replies; 95+ messages in thread
From: Peter Newman @ 2024-08-02 18:49 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86,
	hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Reinette,

On Fri, Aug 2, 2024 at 9:14 AM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter,
>
> On 8/1/24 3:45 PM, Peter Newman wrote:
> > On Thu, Aug 1, 2024 at 2:50 PM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >> On 7/17/24 10:19 AM, Moger, Babu wrote:
> >>> On 7/12/24 17:03, Reinette Chatre wrote:
> >>>> On 7/3/24 2:48 PM, Babu Moger wrote:
>
> >>>>> # Examples
> >>>>>
> >>>>> a. Check if ABMC support is available
> >>>>>       #mount -t resctrl resctrl /sys/fs/resctrl/
> >>>>>
> >>>>>       #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> >>>>>       [abmc]
> >>>>>       legacy
> >>>>>
> >>>>>       Linux kernel detected ABMC feature and it is enabled.
> >>>>
> >>>> How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
> >>>> info file and be the final step to make this generic so that another
> >>>> architecture
> >>>> can more easily support assignining hardware counters without needing to call
> >>>> the feature AMD's "abmc".
> >>>
> >>> I think we aleady settled this with "mbm_cntr_assignable".
> >>>
> >>> For soft-RMID" it will be mbm_sw_assignable.
> >>
> >> Maybe getting a bit long but how about "mbm_cntr_sw_assignable" to match
> >> with the term "mbm_cntr" in accompanying "num_mbm_cntrs"?
> >
> > My users are pushing for a consistent interface regardless of whether
> > counter assignment is implemented in hardware or software, so I would
> > like to avoid exposing implementation differences in the interface
> > where possible.
>
> This seems a reasonable ask but can we be confident that if hardware
> supports assignable counters then there will never be a reason to use
> software assignable counters? (This needs to also consider how/if Arm
> may use this feature.)
>
> I am of course not familiar with details of the software implementation
> - could there be benefits to using it even if hardware counters are
> supported?

I can't see any situation where the user would want to choose software
over hardware counters. The number of groups which can be monitored by
software assignable counters will always be less than with hardware,
due to the need for consuming one RMID (and the counters automatically
allocated to it by the AMD hardware) for all unassigned groups.

I consider software assignable a workaround to enable measuring
bandwidth reliably on a large number of groups on pre-ABMC AMD
hardware, or rather salvaging MBM on pre-ABMC hardware making use of
our users' effort to adapt to counter assignment in resctrl. We hope
no future implementations will choose to silently drop bandwidth
counts, so fingers crossed, the software implementation can be phased
out when these generations of AMD hardware are decommissioned.

The MPAM specification natively supports (or requires) counter
assignment in hardware. From what I recall in the last of James'
prototypes I looked at, MBM was only supported if the implementation
provided as many bandwidth counters as there were possible monitoring
groups, so that it could assume a monitor ID for every PARTID:PMG
combination.

>
> What I would like to avoid is future complexity of needing a new mount/config
> option that user space needs to use to select if a single "mbm_cntr_assignable"
> is backed by hardware or software.

In my testing so far, automatically enabling counter assignment and
automatically allocating counters for all events in new groups works
well enough.

The only configuration I need is the ability to disable the automatic
counter allocation so that a userspace agent can have control of where
all the counters are assigned at all times. It's easy to implement
this as a simple flag if the user accepts that they need to manually
deallocate any automatically-allocated counters from groups created
before the flag was cleared.

>
> > The main semantic difference with SW assignments is that it is not
> > possible to assign counters to individual events. Because the
> > implementation is assigning RMIDs to groups, assignment results in all
> > events being counted.
> >
> > I was considering introducing a boolean mbm_assign_events node to
> > indicate whether assigning individual events is supported. If true,
> > num_mbm_cntrs indicates the number of events which can be counted,
> > otherwise it indicates the number of groups to which counters can be
> > assigned and attempting to assign a single event is silently upgraded
> > to assigning counters to all events in the group.
>
> How were you envisioning your users using the control file ("mbm_control")
> in these scenarios? Does this file's interface even work for SW assignment
> scenarios?
>
> Users should expect consistent interface for "mbm_control" also.
>
> It sounds to me that a potential "mbm_assign_events" will be false for SW
> assignments. That would mean that "num_mbm_cntrs" will
> contain the number of groups to which counters can be assigned?
> Would user space be required to always enable all flags (enable all events) of
> all domains to the same values ... or would enabling of one flag (one event)
> in one domain automatically result in all flags (all events) enabled for all
> domains ... or would enabling of one flag (one event) in one domain only appear
> to user space to be enabled while in reality all flags/events are actually enabled?

I believe mbm_control should always accurately reflect which events
are being counted.

The behavior as I've implemented today is:

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_events
0

# cat /sys/fs/resctrl/info/L3_MON/mbm_control
test//0=_;1=_;
//0=_;1=_;

# echo "test//1+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
# cat /sys/fs/resctrl/info/L3_MON/mbm_control
test//0=_;1=tl;
//0=_;1=_;

# echo "test//1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
# cat /sys/fs/resctrl/info/L3_MON/mbm_control
test//0=_;1=_;
//0=_;1=_;
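
For anyone who wants to reason about these semantics without the patches,
the listing above can be reproduced with a toy user-space model. This is
only a sketch of the observable behavior, not the kernel implementation.
The class name and its methods are made up for illustration.

```python
# Toy model of group-assignable "mbm_control" semantics: assigning or
# unassigning any single event in a domain affects all events there.
import re

ALL_EVENTS = "tl"  # 't' = total, 'l' = local, as in the listing above
_OP_RE = re.compile(r"(\d+)([+-])([tl]+)")


class MbmControlModel:
    def __init__(self, groups, domains):
        # state[(ctrl_group, mon_group)][domain_id] -> set of event flags
        self.state = {g: {d: set() for d in domains} for g in groups}

    def write(self, line):
        """Apply one '<ctrl>/<mon>/<domain><op><flags>' request."""
        ctrl, mon, op = line.split("/", 2)
        m = _OP_RE.fullmatch(op)
        if m is None:
            raise ValueError(f"cannot parse {op!r}")
        domain, opcode = int(m.group(1)), m.group(2)
        events = self.state[(ctrl, mon)][domain]
        if opcode == "+":
            events.update(ALL_EVENTS)   # silent upgrade to all events
        else:
            events.clear()              # silent downgrade to no events

    def read(self):
        """Render the state in the same layout as the listing above."""
        lines = []
        for (ctrl, mon), doms in self.state.items():
            fields = "".join(
                "{}={};".format(
                    d, "".join(f for f in ALL_EVENTS if f in ev) or "_")
                for d, ev in sorted(doms.items()))
            lines.append(f"{ctrl}/{mon}/{fields}")
        return "\n".join(lines)
```

Because read() always reports what is really being counted, the "+l"
request shows up as "tl" and the "-t" request as "_".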


>
> > However, If we don't expect to see these semantics in any other
> > implementation, these semantics could be implicit in the definition of
> > a SW assignable counter.
>
> It is not clear to me how implementation differences between hardware
> and software assignment can be hidden from user space. It is possible
> to let user space enable individual events and then silently upgrade it
> to all events. I see two options here, either "mbm_control" needs to
> explicitly show this "silent upgrade" so that user space knows which
> events are actually enabled, or "mbm_control" only shows flags/events enabled
> from user space perspective. In the former scenario, this needs more
> user space support since a generic user space cannot be confident which
> flags are set after writing to "mbm_control". In the latter scenario,
> meaning of "num_mbm_cntrs" becomes unclear since user space is expected
> to rely on it to know which events can be enabled and if some are
> actually "silently enabled" when user space still thinks it needs to be
> enabled the number of available counters becomes vague.
>
> It is not clear to me how to present hardware and software assignable
> counters with a single consistent interface. Actually, what if the
> "mbm_mode" is what distinguishes how counters are assigned instead of how
> it is backed (hw vs sw)? What if, instead of "mbm_cntr_assignable" and
> "mbm_cntr_sw_assignable" MBM modes the terms "mbm_cntr_event_assignable"
> and "mbm_cntr_group_assignable" is used? Could that replace a
> potential "mbm_assign_events" while also supporting user space in
> interactions with "mbm_control"?

If I understand this correctly, is this a preference that the info
node be named differently if its value will have different units,
rather than a second node to indicate what the value of num_mbm_cntrs
actually means? This sounds reasonable to me.

I think it's also important to note that in MPAM, the MBWU (memory
bandwidth usage) monitors don't have a concept of local versus total
bandwidth, so event assignment would likely not apply there either.
What the counted bandwidth actually represents is more implicit in the
monitor's position in the memory system in the particular
implementation. On a theoretical multi-socket system, resctrl would
require knowledge about the system's architecture to stitch together
the counts from different types of monitors to produce a local and
total value. I don't know if we'd program this SoC-specific knowledge
into the kernel to produce a unified MBM resource like we're
accustomed to now or if we'd present multiple MBM resources, each only
providing an mbm_total_bytes event. In this case, the counters would
have to be assigned separately in each MBM resource, especially if the
different MBM resources support a different number of counters.

Thanks,
-Peter


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-02 18:49           ` Moger, Babu
@ 2024-08-02 19:13             ` Peter Newman
  2024-08-02 20:23               ` Moger, Babu
  0 siblings, 1 reply; 95+ messages in thread
From: Peter Newman @ 2024-08-02 19:13 UTC (permalink / raw)
  To: babu.moger
  Cc: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Babu,

On Fri, Aug 2, 2024 at 11:49 AM Moger, Babu <bmoger@amd.com> wrote:
>
> Hi Peter/Reinette,
>
> On 8/2/2024 11:13 AM, Reinette Chatre wrote:
> > Hi Peter,
> >
> > On 8/1/24 3:45 PM, Peter Newman wrote:
> >> However, If we don't expect to see these semantics in any other
> >> implementation, these semantics could be implicit in the definition of
> >> a SW assignable counter.
> >
> > It is not clear to me how implementation differences between hardware
> > and software assignment can be hidden from user space. It is possible
> > to let user space enable individual events and then silently upgrade it
> > to all events. I see two options here, either "mbm_control" needs to
> > explicitly show this "silent upgrade" so that user space knows which
> > events are actually enabled, or "mbm_control" only shows flags/events
> > enabled
> > from user space perspective. In the former scenario, this needs more
> > user space support since a generic user space cannot be confident which
> > flags are set after writing to "mbm_control". In the latter scenario,
> > meaning of "num_mbm_cntrs" becomes unclear since user space is expected
> > to rely on it to know which events can be enabled and if some are
> > actually "silently enabled" when user space still thinks it needs to be
> > enabled the number of available counters becomes vague.
> >
> > It is not clear to me how to present hardware and software assignable
> > counters with a single consistent interface. Actually, what if the
> > "mbm_mode" is what distinguishes how counters are assigned instead of how
> > it is backed (hw vs sw)? What if, instead of "mbm_cntr_assignable" and
> > "mbm_cntr_sw_assignable" MBM modes the terms "mbm_cntr_event_assignable"
> > and "mbm_cntr_group_assignable" is used? Could that replace a
> > potential "mbm_assign_events" while also supporting user space in
> > interactions with "mbm_control"?
>
> If I understand correctly, current interface might work for both the sw
> and hw assignments.
>
> In case of SW assignment, you need to manage two counters at context
> switch time. One for total event and one for local event. Basically, you
> need to calculate delta for both events. You need to do rmid read for
> both events and then calculate the delta.
>
> If the user assigns only one event you do the calculations only for the
> event user is interested in. That will save cycles as well. In this case
> "mbm_control" will report as one one event is assigned.
>
> In many cases user will not interested in both the events. Also events
> are configurable so users can get what they want with just one event.
>
> Does that make sense?

I think you've confused soft-RMID with soft-ABMC. Or more likely I've
confused you by not using consistent terminology.

soft-RMIDs are simulated by reading the counters of HW RMIDs
permanently assigned to each CPU at context switch. We found the
context switch cost of this approach unacceptable.

soft-ABMC is permanently associating an RMID with the local and total
counter-pair that will be automatically associated with it when it is
first loaded into a PQR_ASSOC MSR in a domain, then using the
mbm_control interface to choose which group to associate with these
RMIDs. This does not require any context switching work. This
technique is specific to the behavior of AMD hardware.
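
A rough model of the bookkeeping, in case it helps: each domain has a
small pool of RMIDs that keep their counters, one of which is shared by
all unassigned groups. This is only a sketch. The num_tracked_rmids
parameter, the choice of RMID 0 as the shared one, and the class itself
are assumptions for illustration, not what the patches do.

```python
# Sketch: soft-ABMC style assignment of groups to counter-backed RMIDs in
# one domain. No context-switch work is modeled because none is needed.
class SoftAbmcDomain:
    def __init__(self, num_tracked_rmids):
        if num_tracked_rmids < 2:
            raise ValueError("need one shared RMID plus at least one more")
        self.shared_rmid = 0                        # used by unassigned groups
        self.free = list(range(1, num_tracked_rmids))
        self.assigned = {}                          # group name -> RMID

    @property
    def num_assignable(self):
        # One RMID is always consumed by the unassigned groups.
        return len(self.free) + len(self.assigned)

    def assign(self, group):
        if group in self.assigned:
            return self.assigned[group]
        if not self.free:
            raise RuntimeError("no free counters in this domain")
        rmid = self.free.pop(0)
        self.assigned[group] = rmid
        return rmid

    def unassign(self, group):
        rmid = self.assigned.pop(group, None)
        if rmid is not None:
            self.free.append(rmid)

    def rmid_for(self, group):
        # Value that would be loaded into PQR_ASSOC for tasks in this group.
        return self.assigned.get(group, self.shared_rmid)
```

The shared RMID is why the number of groups that can be monitored this
way is one less than the number of RMIDs that keep their counters.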

-Peter


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-02 19:13             ` Peter Newman
@ 2024-08-02 20:23               ` Moger, Babu
  0 siblings, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-08-02 20:23 UTC (permalink / raw)
  To: Peter Newman, babu.moger
  Cc: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen,
	x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter,

On 8/2/2024 2:13 PM, Peter Newman wrote:
> Hi Babu,
> 
> On Fri, Aug 2, 2024 at 11:49 AM Moger, Babu <bmoger@amd.com> wrote:
>>
>> Hi Peter/Reinette,
>>
>> On 8/2/2024 11:13 AM, Reinette Chatre wrote:
>>> Hi Peter,
>>>
>>> On 8/1/24 3:45 PM, Peter Newman wrote:
>>>> However, If we don't expect to see these semantics in any other
>>>> implementation, these semantics could be implicit in the definition of
>>>> a SW assignable counter.
>>>
>>> It is not clear to me how implementation differences between hardware
>>> and software assignment can be hidden from user space. It is possible
>>> to let user space enable individual events and then silently upgrade it
>>> to all events. I see two options here, either "mbm_control" needs to
>>> explicitly show this "silent upgrade" so that user space knows which
>>> events are actually enabled, or "mbm_control" only shows flags/events
>>> enabled
>>> from user space perspective. In the former scenario, this needs more
>>> user space support since a generic user space cannot be confident which
>>> flags are set after writing to "mbm_control". In the latter scenario,
>>> meaning of "num_mbm_cntrs" becomes unclear since user space is expected
>>> to rely on it to know which events can be enabled and if some are
>>> actually "silently enabled" when user space still thinks it needs to be
>>> enabled the number of available counters becomes vague.
>>>
>>> It is not clear to me how to present hardware and software assignable
>>> counters with a single consistent interface. Actually, what if the
>>> "mbm_mode" is what distinguishes how counters are assigned instead of how
>>> it is backed (hw vs sw)? What if, instead of "mbm_cntr_assignable" and
>>> "mbm_cntr_sw_assignable" MBM modes the terms "mbm_cntr_event_assignable"
>>> and "mbm_cntr_group_assignable" is used? Could that replace a
>>> potential "mbm_assign_events" while also supporting user space in
>>> interactions with "mbm_control"?
>>
>> If I understand correctly, current interface might work for both the sw
>> and hw assignments.
>>
>> In case of SW assignment, you need to manage two counters at context
>> switch time. One for total event and one for local event. Basically, you
>> need to calculate delta for both events. You need to do rmid read for
>> both events and then calculate the delta.
>>
>> If the user assigns only one event you do the calculations only for the
>> event user is interested in. That will save cycles as well. In this case
>> "mbm_control" will report as one one event is assigned.
>>
>> In many cases user will not interested in both the events. Also events
>> are configurable so users can get what they want with just one event.
>>
>> Does that make sense?
> 
> I think you've confused soft-RMID for soft-ABMC. Or more likely I've
> confused you by not using consistent terminology.
> 
> soft-RMIDs are simulated by reading the counters of HW RMIDs
> permanently assigned to each CPU at context switch. We found the
> context switch cost of this approach unacceptable.
> 
> soft-ABMC is permanently associating an RMID with the local and total
> counter-pair that will be automatically associated with it when it is
> first loaded into a PQR_ASSOC MSR in a domain, then using the
> mbm_control interface to choose which group to associate with these
> RMIDs. This does not require any context switching work. This
> technique is specific to the behavior of AMD hardware.

Got it.

I assume you have not posted the patches for this yet, right?

thanks

Babu Moger


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-02 18:49           ` Peter Newman
@ 2024-08-02 20:38             ` Moger, Babu
  2024-08-02 20:55             ` Reinette Chatre
  1 sibling, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-08-02 20:38 UTC (permalink / raw)
  To: Peter Newman, Reinette Chatre
  Cc: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86,
	hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter/Reinette,

On 8/2/2024 1:49 PM, Peter Newman wrote:
> Hi Reinette,
> 
> On Fri, Aug 2, 2024 at 9:14 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter,
>>
>> On 8/1/24 3:45 PM, Peter Newman wrote:
>>> On Thu, Aug 1, 2024 at 2:50 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>> On 7/17/24 10:19 AM, Moger, Babu wrote:
>>>>> On 7/12/24 17:03, Reinette Chatre wrote:
>>>>>> On 7/3/24 2:48 PM, Babu Moger wrote:
>>
>>>>>>> # Examples
>>>>>>>
>>>>>>> a. Check if ABMC support is available
>>>>>>>        #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>
>>>>>>>        #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>>>>>>        [abmc]
>>>>>>>        legacy
>>>>>>>
>>>>>>>        Linux kernel detected ABMC feature and it is enabled.
>>>>>>
>>>>>> How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
>>>>>> info file and be the final step to make this generic so that another
>>>>>> architecture
>>>>>> can more easily support assignining hardware counters without needing to call
>>>>>> the feature AMD's "abmc".
>>>>>
>>>>> I think we aleady settled this with "mbm_cntr_assignable".
>>>>>
>>>>> For soft-RMID" it will be mbm_sw_assignable.
>>>>
>>>> Maybe getting a bit long but how about "mbm_cntr_sw_assignable" to match
>>>> with the term "mbm_cntr" in accompanying "num_mbm_cntrs"?
>>>
>>> My users are pushing for a consistent interface regardless of whether
>>> counter assignment is implemented in hardware or software, so I would
>>> like to avoid exposing implementation differences in the interface
>>> where possible.
>>
>> This seems a reasonable ask but can we be confident that if hardware
>> supports assignable counters then there will never be a reason to use
>> software assignable counters? (This needs to also consider how/if Arm
>> may use this feature.)
>>
>> I am of course not familiar with details of the software implementation
>> - could there be benefits to using it even if hardware counters are
>> supported?
> 
> I can't see any situation where the user would want to choose software
> over hardware counters. The number of groups which can be monitored by
> software assignable counters will always be less than with hardware,
> due to the need for consuming one RMID (and the counters automatically
> allocated to it by the AMD hardware) for all unassigned groups.
> 
> I consider software assignable a workaround to enable measuring
> bandwidth reliably on a large number of groups on pre-ABMC AMD
> hardware, or rather salvaging MBM on pre-ABMC hardware making use of
> our users' effort to adapt to counter assignment in resctrl. We hope
> no future implementations will choose to silently drop bandwidth
> counts, so fingers crossed, the software implementation can be phased
> out when these generations of AMD hardware are decommissioned.
> 
> The MPAM specification natively supports (or requires) counter
> assignment in hardware. From what I recall in the last of James'
> prototypes I looked at, MBM was only supported if the implementation
> provided as many bandwidth counters as there were possible monitoring
> groups, so that it could assume a monitor IDs for every PARTID:PMG
> combination.
> 
>>
>> What I would like to avoid is future complexity of needing a new mount/config
>> option that user space needs to use to select if a single "mbm_cntr_assignable"
>> is backed by hardware or software.
> 
> In my testing so far, automatically enabling counter assignment and
> automatically allocating counters for all events in new groups works
> well enough.
> 
> The only configuration I need is the ability to disable the automatic
> counter allocation so that a userspace agent can have control of where
> all the counters are assigned at all times. It's easy to implement
> this as a simple flag if the user accepts that they need to manually
> deallocate any automatically-allocated counters from groups created
> before the flag was cleared.
> 
>>
>>> The main semantic difference with SW assignments is that it is not
>>> possible to assign counters to individual events. Because the
>>> implementation is assigning RMIDs to groups, assignment results in all
>>> events being counted.
>>>
>>> I was considering introducing a boolean mbm_assign_events node to
>>> indicate whether assigning individual events is supported. If true,
>>> num_mbm_cntrs indicates the number of events which can be counted,
>>> otherwise it indicates the number of groups to which counters can be
>>> assigned and attempting to assign a single event is silently upgraded
>>> to assigning counters to all events in the group.
>>
>> How were you envisioning your users using the control file ("mbm_control")
>> in these scenarios? Does this file's interface even work for SW assignment
>> scenarios?
>>
>> Users should expect consistent interface for "mbm_control" also.
>>
>> It sounds to me that a potential "mbm_assign_events" will be false for SW
>> assignments. That would mean that "num_mbm_cntrs" will
>> contain the number of groups to which counters can be assigned?
>> Would user space be required to always enable all flags (enable all events) of
>> all domains to the same values ... or would enabling of one flag (one event)
>> in one domain automatically result in all flags (all events) enabled for all
>> domains ... or would enabling of one flag (one event) in one domain only appear
>> to user space to be enabled while in reality all flags/events are actually enabled?
> 
> I believe mbm_control should always accurately reflect which events
> are being counted.
> 
> The behavior as I've implemented today is:
> 
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_events
> 0
> 
> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> test//0=_;1=_;
> //0=_;1=_;
> 
> # echo "test//1+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> test//0=_;1=tl;
> //0=_;1=_;
> 
> # echo "test//1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> test//0=_;1=_;
> //0=_;1=_;

It enables/disables the events automatically ("silent upgrade/degrade").
This looks good to me.
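
A toy model of the "silent upgrade/degrade" in the transcript above may make
the rule explicit. This is only a sketch inferred from that one example, not
the kernel implementation: apply_group_assign and ALL_EVENTS are hypothetical
names, and the handling of "=" and of removing an event that is not currently
assigned is an assumption.

```python
# Toy model of group-granular ("soft-ABMC" style) assignment for one domain.
ALL_EVENTS = frozenset("tl")  # t = total, l = local


def apply_group_assign(current, opcode, flags):
    """Return the flag set of one domain after applying <opcode><flags>."""
    current = set(current)
    flags = set(flags) - {"_"}  # "_" stands for "no events"
    if opcode == "+":
        # Asking for any event assigns counters to every event.
        return set(ALL_EVENTS) if flags else current
    if opcode == "-":
        # Removing any assigned event releases the whole group.
        return set() if flags & current else current
    if opcode == "=":
        # Assumption: "=" follows the same all-or-nothing rule.
        return set(ALL_EVENTS) if flags else set()
    raise ValueError(f"unknown opcode {opcode!r}")


state = apply_group_assign(set(), "+", "l")
print(sorted(state))  # ['l', 't']
state = apply_group_assign(state, "-", "t")
print(sorted(state))  # []
```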

> 
> 
>>
>>> However, if we don't expect to see these semantics in any other
>>> implementation, these semantics could be implicit in the definition of
>>> a SW assignable counter.
>>
>> It is not clear to me how implementation differences between hardware
>> and software assignment can be hidden from user space. It is possible
>> to let user space enable individual events and then silently upgrade it
>> to all events. I see two options here, either "mbm_control" needs to
>> explicitly show this "silent upgrade" so that user space knows which
>> events are actually enabled, or "mbm_control" only shows flags/events enabled
>> from user space perspective. In the former scenario, this needs more
>> user space support since a generic user space cannot be confident which
>> flags are set after writing to "mbm_control". In the latter scenario,
>> meaning of "num_mbm_cntrs" becomes unclear since user space is expected
>> to rely on it to know which events can be enabled and if some are
>> actually "silently enabled" when user space still thinks it needs to be
>> enabled the number of available counters becomes vague.
>>
>> It is not clear to me how to present hardware and software assignable
>> counters with a single consistent interface. Actually, what if the
>> "mbm_mode" is what distinguishes how counters are assigned instead of how
>> it is backed (hw vs sw)? What if, instead of "mbm_cntr_assignable" and
>> "mbm_cntr_sw_assignable" MBM modes the terms "mbm_cntr_event_assignable"
>> and "mbm_cntr_group_assignable" is used? Could that replace a
>> potential "mbm_assign_events" while also supporting user space in
>> interactions with "mbm_control"?
> 
> If I understand this correctly, is this a preference that the info
> node be named differently if its value will have different units,
> rather than a second node to indicate what the value of num_mbm_cntrs
> actually means? This sounds reasonable to me.

Looks like we are agreeing on the "silent upgrade/degrade" option.

"mbm_mode" will look like below (replaced event with evt and group with grp).

# cat /sys/fs/resctrl/info/L3_MON/mbm_mode
[mbm_cntr_evt_assignable]
mbm_cntr_grp_assignable
legacy

Does that look ok?
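
For illustration, user space could pick out the active mode from such a
listing as sketched below. This assumes only what the listing shows (one mode
per line, the active one in square brackets); active_mbm_mode is a
hypothetical helper name, and the mode names are the ones proposed here, not
a settled interface.

```python
def active_mbm_mode(text):
    """Return (active_mode, all_modes) from mbm_mode-style text.

    The active mode is the entry wrapped in square brackets.
    """
    active = None
    modes = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        modes.append(name)
    return active, modes


sample = "[mbm_cntr_evt_assignable]\nmbm_cntr_grp_assignable\nlegacy\n"
active, modes = active_mbm_mode(sample)
print(active)  # mbm_cntr_evt_assignable
print(modes)   # ['mbm_cntr_evt_assignable', 'mbm_cntr_grp_assignable', 'legacy']
```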

I am not clear on what num_mbm_cntrs means in the case of
mbm_cntr_grp_assignable.

Peter, how do you figure out how many counters are available in soft-ABMC?


> 
> I think it's also important to note that in MPAM, the MBWU (memory
> bandwidth usage) monitors don't have a concept of local versus total
> bandwidth, so event assignment would likely not apply there either.
> What the counted bandwidth actually represents is more implicit in the
> monitor's position in the memory system in the particular
> implementation. On a theoretical multi-socket system, resctrl would
> require knowledge about the system's architecture to stitch together
> the counts from different types of monitors to produce a local and
> total value. I don't know if we'd program this SoC-specific knowledge
> into the kernel to produce a unified MBM resource like we're
> accustomed to now or if we'd present multiple MBM resources, each only
> providing an mbm_total_bytes event. In this case, the counters would
> have to be assigned separately in each MBM resource, especially if the
> different MBM resources support a different number of counters.
> 
> Thanks,
> -Peter
> 

-- 
- Babu Moger


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-02 18:49           ` Peter Newman
  2024-08-02 20:38             ` Moger, Babu
@ 2024-08-02 20:55             ` Reinette Chatre
  2024-08-02 22:50               ` Peter Newman
  2024-08-03  0:49               ` Moger, Babu
  1 sibling, 2 replies; 95+ messages in thread
From: Reinette Chatre @ 2024-08-02 20:55 UTC (permalink / raw)
  To: Peter Newman
  Cc: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86,
	hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter,

On 8/2/24 11:49 AM, Peter Newman wrote:
> On Fri, Aug 2, 2024 at 9:14 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 8/1/24 3:45 PM, Peter Newman wrote:
>>> On Thu, Aug 1, 2024 at 2:50 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>> On 7/17/24 10:19 AM, Moger, Babu wrote:
>>>>> On 7/12/24 17:03, Reinette Chatre wrote:
>>>>>> On 7/3/24 2:48 PM, Babu Moger wrote:
>>
>>>>>>> # Examples
>>>>>>>
>>>>>>> a. Check if ABMC support is available
>>>>>>>        #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>
>>>>>>>        #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>>>>>>        [abmc]
>>>>>>>        legacy
>>>>>>>
>>>>>>>        Linux kernel detected ABMC feature and it is enabled.
>>>>>>
>>>>>> How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
>>>>>> info file and be the final step to make this generic so that another
>>>>>> architecture
>>>>>> can more easily support assigning hardware counters without needing to call
>>>>>> the feature AMD's "abmc".
>>>>>
>>>>> I think we already settled this with "mbm_cntr_assignable".
>>>>>
>>>>> For "soft-RMID" it will be mbm_sw_assignable.
>>>>
>>>> Maybe getting a bit long but how about "mbm_cntr_sw_assignable" to match
>>>> with the term "mbm_cntr" in accompanying "num_mbm_cntrs"?
>>>
>>> My users are pushing for a consistent interface regardless of whether
>>> counter assignment is implemented in hardware or software, so I would
>>> like to avoid exposing implementation differences in the interface
>>> where possible.
>>
>> This seems a reasonable ask but can we be confident that if hardware
>> supports assignable counters then there will never be a reason to use
>> software assignable counters? (This needs to also consider how/if Arm
>> may use this feature.)
>>
>> I am of course not familiar with details of the software implementation
>> - could there be benefits to using it even if hardware counters are
>> supported?
> 
> I can't see any situation where the user would want to choose software
> over hardware counters. The number of groups which can be monitored by
> software assignable counters will always be less than with hardware,
> due to the need for consuming one RMID (and the counters automatically
> allocated to it by the AMD hardware) for all unassigned groups.

Thank you for clarifying. This seems specific to this software implementation,
and I missed that there was a shift from soft-RMIDs to soft-ABMC. If I remember
correctly this depends on undocumented hardware specific knowledge.
  
> I consider software assignable a workaround to enable measuring
> bandwidth reliably on a large number of groups on pre-ABMC AMD
> hardware, or rather salvaging MBM on pre-ABMC hardware making use of
> our users' effort to adapt to counter assignment in resctrl. We hope
> no future implementations will choose to silently drop bandwidth
> counts, so fingers crossed, the software implementation can be phased
> out when these generations of AMD hardware are decommissioned.

That sounds ideal.

> 
> The MPAM specification natively supports (or requires) counter
> assignment in hardware. From what I recall in the last of James'
> prototypes I looked at, MBM was only supported if the implementation
> provided as many bandwidth counters as there were possible monitoring
> groups, so that it could assume a monitor ID for every PARTID:PMG
> combination.

Thank you for this insight.

> 
>>
>> What I would like to avoid is future complexity of needing a new mount/config
>> option that user space needs to use to select if a single "mbm_cntr_assignable"
>> is backed by hardware or software.
> 
> In my testing so far, automatically enabling counter assignment and
> automatically allocating counters for all events in new groups works
> well enough.
> 
> The only configuration I need is the ability to disable the automatic
> counter allocation so that a userspace agent can have control of where
> all the counters are assigned at all times. It's easy to implement
> this as a simple flag if the user accepts that they need to manually
> deallocate any automatically-allocated counters from groups created
> before the flag was cleared.
> 
>>
>>> The main semantic difference with SW assignments is that it is not
>>> possible to assign counters to individual events. Because the
>>> implementation is assigning RMIDs to groups, assignment results in all
>>> events being counted.
>>>
>>> I was considering introducing a boolean mbm_assign_events node to
>>> indicate whether assigning individual events is supported. If true,
>>> num_mbm_cntrs indicates the number of events which can be counted,
>>> otherwise it indicates the number of groups to which counters can be
>>> assigned and attempting to assign a single event is silently upgraded
>>> to assigning counters to all events in the group.
>>
>> How were you envisioning your users using the control file ("mbm_control")
>> in these scenarios? Does this file's interface even work for SW assignment
>> scenarios?
>>
>> Users should expect consistent interface for "mbm_control" also.
>>
>> It sounds to me that a potential "mbm_assign_events" will be false for SW
>> assignments. That would mean that "num_mbm_cntrs" will
>> contain the number of groups to which counters can be assigned?
>> Would user space be required to always enable all flags (enable all events) of
>> all domains to the same values ... or would enabling of one flag (one event)
>> in one domain automatically result in all flags (all events) enabled for all
>> domains ... or would enabling of one flag (one event) in one domain only appear
>> to user space to be enabled while in reality all flags/events are actually enabled?
> 
> I believe mbm_control should always accurately reflect which events
> are being counted.

I agree.

> 
> The behavior as I've implemented today is:
> 
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_events
> 0
> 
> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> test//0=_;1=_;
> //0=_;1=_;
> 
> # echo "test//1+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> test//0=_;1=tl;
> //0=_;1=_;
> 
> # echo "test//1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> test//0=_;1=_;
> //0=_;1=_;
> 
> 

This highlights how there cannot be a generic/consistent interface between hardware
and software implementation. If resctrl implements something like above without any
other hints to user space then it will push complexity to user space since user space
would not know if setting one flag results in setting more than that flag, which may
force a user space implementation to always follow a write with a read to
confirm what actually resulted from the write. Similarly, it needs to be clear
that removing a flag impacts other flags, without user space needing to "try and
see what happens".
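
To make the "write, then read back" burden concrete, here is a rough sketch
of what a generic user space tool would need. It assumes the read format seen
in the transcript above ("<ctrl group>/<mon group>/<domain>=<flags>;...");
parse_mbm_control and unexpected_flags are hypothetical helper names, not an
existing API.

```python
def parse_mbm_control(text):
    """Parse lines such as 'test//0=_;1=tl;' into
    {(ctrl_group, mon_group): {domain_id: set_of_flags}}."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        ctrl, mon, domains = line.split("/", 2)
        per_domain = {}
        for item in domains.split(";"):
            if not item:
                continue
            dom, flags = item.split("=", 1)
            # "_" means no event is assigned in this domain.
            per_domain[int(dom)] = set() if flags == "_" else set(flags)
        state[(ctrl, mon)] = per_domain
    return state


def unexpected_flags(requested, actual):
    """Flags present after the write that the caller did not ask for."""
    return set(actual) - set(requested)


# Contents read back after: echo "test//1+l" > mbm_control
after = "test//0=_;1=tl;\n//0=_;1=_;\n"
state = parse_mbm_control(after)
print(sorted(unexpected_flags({"l"}, state[("test", "")][1])))  # ['t']
```

Without a hint such as the mode name, every write would need this kind of
check before user space could trust its own view of the assignments.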

It is not clear to me how to interpret the above example when it comes to the
RMID management though. If the RMID assignment is per group then I expected all
the domains of a group to have the same flag(s)?

>>
>>> However, if we don't expect to see these semantics in any other
>>> implementation, these semantics could be implicit in the definition of
>>> a SW assignable counter.
>>
>> It is not clear to me how implementation differences between hardware
>> and software assignment can be hidden from user space. It is possible
>> to let user space enable individual events and then silently upgrade it
>> to all events. I see two options here, either "mbm_control" needs to
>> explicitly show this "silent upgrade" so that user space knows which
>> events are actually enabled, or "mbm_control" only shows flags/events enabled
>> from user space perspective. In the former scenario, this needs more
>> user space support since a generic user space cannot be confident which
>> flags are set after writing to "mbm_control". In the latter scenario,
>> meaning of "num_mbm_cntrs" becomes unclear since user space is expected
>> to rely on it to know which events can be enabled and if some are
>> actually "silently enabled" when user space still thinks it needs to be
>> enabled the number of available counters becomes vague.
>>
>> It is not clear to me how to present hardware and software assignable
>> counters with a single consistent interface. Actually, what if the
>> "mbm_mode" is what distinguishes how counters are assigned instead of how
>> it is backed (hw vs sw)? What if, instead of "mbm_cntr_assignable" and
>> "mbm_cntr_sw_assignable" MBM modes the terms "mbm_cntr_event_assignable"
>> and "mbm_cntr_group_assignable" is used? Could that replace a
>> potential "mbm_assign_events" while also supporting user space in
>> interactions with "mbm_control"?
> 
> If I understand this correctly, is this a preference that the info
> node be named differently if its value will have different units,
> rather than a second node to indicate what the value of num_mbm_cntrs
> actually means? This sounds reasonable to me.

Indeed. As you highlighted, user space may not need to know if
counters are backed by hardware or software, but user space needs to
know what to expect from (how to interact with) the interface.

> I think it's also important to note that in MPAM, the MBWU (memory
> bandwidth usage) monitors don't have a concept of local versus total
> bandwidth, so event assignment would likely not apply there either.
> What the counted bandwidth actually represents is more implicit in the
> monitor's position in the memory system in the particular
> implementation. On a theoretical multi-socket system, resctrl would
> require knowledge about the system's architecture to stitch together
> the counts from different types of monitors to produce a local and
> total value. I don't know if we'd program this SoC-specific knowledge
> into the kernel to produce a unified MBM resource like we're
> accustomed to now or if we'd present multiple MBM resources, each only
> providing an mbm_total_bytes event. In this case, the counters would
> have to be assigned separately in each MBM resource, especially if the
> different MBM resources support a different number of counters.
> 

"total" and "local" bandwidth is already a grey area after the
introduction of mbm_total_bytes_config/mbm_local_bytes_config where
user space could set values reported to not be constrained by the
"total" and "local" terms. We keep sticking with it though, even in
this implementation that uses the "t" and "l" flags, knowing that
what is actually monitored when "l" is set is just what the user
configured via mbm_local_bytes_config, which theoretically
can be "total" bandwidth.
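
As a rough illustration of that grey area: the sketch below decodes the event
bitmask written to mbm_local_bytes_config. The bit meanings and the defaults
(0x15 for local, 0x7f for total) are taken from the existing resctrl
documentation for these files and should be checked against the kernel in
use; EVENT_BITS and decode_config are hypothetical names.

```python
# Bit positions of the mbm_*_bytes_config files, per the resctrl docs.
EVENT_BITS = {
    0: "reads to memory in the local NUMA domain",
    1: "reads to memory in the non-local NUMA domain",
    2: "non-temporal writes to local NUMA domain",
    3: "non-temporal writes to non-local NUMA domain",
    4: "reads to slow memory in the local NUMA domain",
    5: "reads to slow memory in the non-local NUMA domain",
    6: "dirty victims from the QOS domain to all types of memory",
}

LOCAL_DEFAULT = 0x15  # documented default for mbm_local_bytes_config
TOTAL_DEFAULT = 0x7F  # documented default for mbm_total_bytes_config


def decode_config(mask):
    """List the event types selected by a config bitmask."""
    return [name for bit, name in sorted(EVENT_BITS.items()) if mask & (1 << bit)]


for label, mask in (("local, default 0x15", LOCAL_DEFAULT),
                    ("local, reconfigured to 0x7f", TOTAL_DEFAULT)):
    print(label)
    for name in decode_config(mask):
        print("   " + name)
```

With the second setting, the "l" flag would be counting everything that "t"
counts by default.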

Reinette

ps. I will be offline next week.


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-02 20:55             ` Reinette Chatre
@ 2024-08-02 22:50               ` Peter Newman
  2024-08-14 17:37                 ` Reinette Chatre
  2024-08-03  0:49               ` Moger, Babu
  1 sibling, 1 reply; 95+ messages in thread
From: Peter Newman @ 2024-08-02 22:50 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86,
	hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Reinette,

On Fri, Aug 2, 2024 at 1:55 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter,
>
> On 8/2/24 11:49 AM, Peter Newman wrote:
> > On Fri, Aug 2, 2024 at 9:14 AM Reinette Chatre
> >> I am of course not familiar with details of the software implementation
> >> - could there be benefits to using it even if hardware counters are
> >> supported?
> >
> > I can't see any situation where the user would want to choose software
> > over hardware counters. The number of groups which can be monitored by
> > software assignable counters will always be less than with hardware,
> > due to the need for consuming one RMID (and the counters automatically
> > allocated to it by the AMD hardware) for all unassigned groups.
>
> Thank you for clarifying. This seems specific to this software implementation,
> and I missed that there was a shift from soft-RMIDs to soft-ABMC. If I remember
> correctly this depends on undocumented hardware specific knowledge.

For the benefit of anyone else who needs to monitor bandwidth on a
large number of monitoring groups on pre-ABMC AMD implementations,
hopefully a future AMD publication will clarify, at least on some
existing, pre-ABMC models, exactly when the QM_CTR.U bit is set.


> >
> > The behavior as I've implemented today is:
> >
> > # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_events
> > 0
> >
> > # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> > test//0=_;1=_;
> > //0=_;1=_;
> >
> > # echo "test//1+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
> > # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> > test//0=_;1=tl;
> > //0=_;1=_;
> >
> > # echo "test//1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
> > # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> > test//0=_;1=_;
> > //0=_;1=_;
> >
> >
>
> This highlights how there cannot be a generic/consistent interface between hardware
> and software implementation. If resctrl implements something like above without any
> other hints to user space then it will push complexity to user space since user space
> would not know if setting one flag results in setting more than that flag, which may
> force a user space implementation to always follow a write with a read that
> needs to confirm what actually resulted from the write. Similarly, that removing a
> flag impacts other flags needs to be clear without user space needing to "try and
> see what happens".

I'll return to this topic in the context of MPAM below...

> It is not clear to me how to interpret the above example when it comes to the
> RMID management though. If the RMID assignment is per group then I expected all
> the domains of a group to have the same flag(s)?

The group RMIDs are never programmed into any MSRs and the RMID space
is independent in each domain, so it is still possible to do
per-domain assignment. (and like with soft RMIDs, this enables us to
create unlimited groups, but we've never been limited by the size of
the RMID space)

However, in our use cases, jobs are not confined to any domain, so
bandwidth measurements must be done simultaneously in all domains, so
we have no current use for per-domain assignment. But if any Google
users did begin to see value in confining jobs to domains, this could
change.
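
For the "all domains at once" use case, an agent would just build the same
operation for every domain. A minimal sketch follows; assign_all_domains is a
hypothetical helper, and joining several domain operations with ";" in a
single write is an assumption based on the read format shown above.

```python
def assign_all_domains(ctrl_group, mon_group, domain_ids, opcode, flags):
    """Build one mbm_control write that applies <opcode><flags> to every domain."""
    if opcode not in ("+", "-", "="):
        raise ValueError(f"unknown opcode {opcode!r}")
    ops = ";".join(f"{dom}{opcode}{flags}" for dom in domain_ids)
    return f"{ctrl_group}/{mon_group}/{ops}"


# Assign both events of group "test" in domains 0 and 1.
print(assign_all_domains("test", "", [0, 1], "+", "tl"))  # test//0+tl;1+tl
```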

>
> >>
> >>> However, if we don't expect to see these semantics in any other
> >>> implementation, these semantics could be implicit in the definition of
> >>> a SW assignable counter.
> >>
> >> It is not clear to me how implementation differences between hardware
> >> and software assignment can be hidden from user space. It is possible
> >> to let user space enable individual events and then silently upgrade it
> >> to all events. I see two options here, either "mbm_control" needs to
> >> explicitly show this "silent upgrade" so that user space knows which
> >> events are actually enabled, or "mbm_control" only shows flags/events enabled
> >> from user space perspective. In the former scenario, this needs more
> >> user space support since a generic user space cannot be confident which
> >> flags are set after writing to "mbm_control". In the latter scenario,
> >> meaning of "num_mbm_cntrs" becomes unclear since user space is expected
> >> to rely on it to know which events can be enabled and if some are
> >> actually "silently enabled" when user space still thinks it needs to be
> >> enabled the number of available counters becomes vague.
> >>
> >> It is not clear to me how to present hardware and software assignable
> >> counters with a single consistent interface. Actually, what if the
> >> "mbm_mode" is what distinguishes how counters are assigned instead of how
> >> it is backed (hw vs sw)? What if, instead of "mbm_cntr_assignable" and
> >> "mbm_cntr_sw_assignable" MBM modes the terms "mbm_cntr_event_assignable"
> >> and "mbm_cntr_group_assignable" is used? Could that replace a
> >> potential "mbm_assign_events" while also supporting user space in
> >> interactions with "mbm_control"?
> >
> > If I understand this correctly, is this a preference that the info
> > node be named differently if its value will have different units,
> > rather than a second node to indicate what the value of num_mbm_cntrs
> > actually means? This sounds reasonable to me.
>
> Indeed. As you highlighted, user space may not need to know if
> counters are backed by hardware or software, but user space needs to
> know what to expect from (how to interact with) interface.
>
> > I think it's also important to note that in MPAM, the MBWU (memory
> > bandwidth usage) monitors don't have a concept of local versus total
> > bandwidth, so event assignment would likely not apply there either.
> > What the counted bandwidth actually represents is more implicit in the
> > monitor's position in the memory system in the particular
> > implementation. On a theoretical multi-socket system, resctrl would
> > require knowledge about the system's architecture to stitch together
> > the counts from different types of monitors to produce a local and
> > total value. I don't know if we'd program this SoC-specific knowledge
> > into the kernel to produce a unified MBM resource like we're
> > accustomed to now or if we'd present multiple MBM resources, each only
> > providing an mbm_total_bytes event. In this case, the counters would
> > have to be assigned separately in each MBM resource, especially if the
> > different MBM resources support a different number of counters.
> >
>
> "total" and "local" bandwidth is already in grey area after the
> introduction of mbm_total_bytes_config/mbm_local_bytes_config where
> user space could set values reported to not be constrained by the
> "total" and "local" terms. We keep sticking with it though, even in
> this implementation that uses the "t" and "l" flags, knowing that
> what is actually monitored when "l" is set is just what the user
> configured via mbm_local_bytes_config, which theoretically
> can be "total" bandwidth.

If it makes sense to support a separate, group-assignment interface at
least for MPAM, this would be a better fit for soft-ABMC, even if it
does have to stay downstream.

Thanks,
-Peter


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-02 20:55             ` Reinette Chatre
  2024-08-02 22:50               ` Peter Newman
@ 2024-08-03  0:49               ` Moger, Babu
  1 sibling, 0 replies; 95+ messages in thread
From: Moger, Babu @ 2024-08-03  0:49 UTC (permalink / raw)
  To: Reinette Chatre, Peter Newman
  Cc: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86,
	hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter/Reinette,

On 8/2/2024 3:55 PM, Reinette Chatre wrote:
> Hi Peter,
> 
> On 8/2/24 11:49 AM, Peter Newman wrote:
>> On Fri, Aug 2, 2024 at 9:14 AM Reinette Chatre
>> <reinette.chatre@intel.com> wrote:
>>> On 8/1/24 3:45 PM, Peter Newman wrote:
>>>> On Thu, Aug 1, 2024 at 2:50 PM Reinette Chatre
>>>> <reinette.chatre@intel.com> wrote:
>>>>> On 7/17/24 10:19 AM, Moger, Babu wrote:
>>>>>> On 7/12/24 17:03, Reinette Chatre wrote:
>>>>>>> On 7/3/24 2:48 PM, Babu Moger wrote:
>>>
>>>>>>>> # Examples
>>>>>>>>
>>>>>>>> a. Check if ABMC support is available
>>>>>>>>        #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>>
>>>>>>>>        #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>>>>>>>        [abmc]
>>>>>>>>        legacy
>>>>>>>>
>>>>>>>>        Linux kernel detected ABMC feature and it is enabled.
>>>>>>>
>>>>>>> How about renaming "abmc" to "mbm_cntrs"? This will match the 
>>>>>>> num_mbm_cntrs
>>>>>>> info file and be the final step to make this generic so that another
>>>>>>> architecture
>>>>>>> can more easily support assigning hardware counters without
>>>>>>> needing to call
>>>>>>> the feature AMD's "abmc".
>>>>>>
>>>>>> I think we already settled this with "mbm_cntr_assignable".
>>>>>>
>>>>>> For "soft-RMID" it will be mbm_sw_assignable.
>>>>>
>>>>> Maybe getting a bit long but how about "mbm_cntr_sw_assignable" to 
>>>>> match
>>>>> with the term "mbm_cntr" in accompanying "num_mbm_cntrs"?
>>>>
>>>> My users are pushing for a consistent interface regardless of whether
>>>> counter assignment is implemented in hardware or software, so I would
>>>> like to avoid exposing implementation differences in the interface
>>>> where possible.
>>>
>>> This seems a reasonable ask but can we be confident that if hardware
>>> supports assignable counters then there will never be a reason to use
>>> software assignable counters? (This needs to also consider how/if Arm
>>> may use this feature.)
>>>
>>> I am of course not familiar with details of the software implementation
>>> - could there be benefits to using it even if hardware counters are
>>> supported?
>>
>> I can't see any situation where the user would want to choose software
>> over hardware counters. The number of groups which can be monitored by
>> software assignable counters will always be less than with hardware,
>> due to the need for consuming one RMID (and the counters automatically
>> allocated to it by the AMD hardware) for all unassigned groups.
> 
> Thank you for clarifying. This seems specific to this software 
> implementation,
> and I missed that there was a shift from soft-RMIDs to soft-ABMC. If I 
> remember
> correctly this depends on undocumented hardware specific knowledge.
> 
>> I consider software assignable a workaround to enable measuring
>> bandwidth reliably on a large number of groups on pre-ABMC AMD
>> hardware, or rather salvaging MBM on pre-ABMC hardware making use of
>> our users' effort to adapt to counter assignment in resctrl. We hope
>> no future implementations will choose to silently drop bandwidth
>> counts, so fingers crossed, the software implementation can be phased
>> out when these generations of AMD hardware are decommissioned.
> 
> That sounds ideal.
> 
>>
>> The MPAM specification natively supports (or requires) counter
>> assignment in hardware. From what I recall in the last of James'
>> prototypes I looked at, MBM was only supported if the implementation
>> provided as many bandwidth counters as there were possible monitoring
>> groups, so that it could assume a monitor ID for every PARTID:PMG
>> combination.
> 
> Thank you for this insight.
> 
>>
>>>
>>> What I would like to avoid is future complexity of needing a new 
>>> mount/config
>>> option that user space needs to use to select if a single 
>>> "mbm_cntr_assignable"
>>> is backed by hardware or software.
>>
>> In my testing so far, automatically enabling counter assignment and
>> automatically allocating counters for all events in new groups works
>> well enough.
>>
>> The only configuration I need is the ability to disable the automatic
>> counter allocation so that a userspace agent can have control of where
>> all the counters are assigned at all times. It's easy to implement
>> this as a simple flag if the user accepts that they need to manually
>> deallocate any automatically-allocated counters from groups created
>> before the flag was cleared.
>>
>>>
>>>> The main semantic difference with SW assignments is that it is not
>>>> possible to assign counters to individual events. Because the
>>>> implementation is assigning RMIDs to groups, assignment results in all
>>>> events being counted.
>>>>
>>>> I was considering introducing a boolean mbm_assign_events node to
>>>> indicate whether assigning individual events is supported. If true,
>>>> num_mbm_cntrs indicates the number of events which can be counted,
>>>> otherwise it indicates the number of groups to which counters can be
>>>> assigned and attempting to assign a single event is silently upgraded
>>>> to assigning counters to all events in the group.
>>>
>>> How were you envisioning your users using the control file 
>>> ("mbm_control")
>>> in these scenarios? Does this file's interface even work for SW 
>>> assignment
>>> scenarios?
>>>
>>> Users should expect consistent interface for "mbm_control" also.
>>>
>>> It sounds to me that a potential "mbm_assign_events" will be false 
>>> for SW
>>> assignments. That would mean that "num_mbm_cntrs" will
>>> contain the number of groups to which counters can be assigned?
>>> Would user space be required to always enable all flags (enable all 
>>> events) of
>>> all domains to the same values ... or would enabling of one flag (one 
>>> event)
>>> in one domain automatically result in all flags (all events) enabled 
>>> for all
>>> domains ... or would enabling of one flag (one event) in one domain 
>>> only appear
>>> to user space to be enabled while in reality all flags/events are 
>>> actually enabled?
>>
>> I believe mbm_control should always accurately reflect which events
>> are being counted.
> 
> I agree.
> 
>>
>> The behavior as I've implemented today is:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_events
>> 0
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> test//0=_;1=_;
>> //0=_;1=_;
>>
>> # echo "test//1+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> test//0=_;1=tl;
>> //0=_;1=_;
>>
>> # echo "test//1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> test//0=_;1=_;
>> //0=_;1=_;
>>
>>
> 
> This highlights how there cannot be a generic/consistent interface 
> between hardware
> and software implementation. If resctrl implements something like above 
> without any
> other hints to user space then it will push complexity to user space 
> since user space
> would not know if setting one flag results in setting more than that 
> flag, which may
> force a user space implementation to always follow a write with a read that
> needs to confirm what actually resulted from the write. Similarly, that 
> removing a
> flag impacts other flags needs to be clear without user space needing to 
> "try and
> see what happens".
> 
> It is not clear to me how to interpret the above example when it comes 
> to the
> RMID management though. If the RMID assignment is per group then I 
> expected all
> the domains of a group to have the same flag(s)?
> 
>>>
>>>> However, if we don't expect to see these semantics in any other
>>>> implementation, these semantics could be implicit in the definition of
>>>> a SW assignable counter.
>>>
>>> It is not clear to me how implementation differences between hardware
>>> and software assignment can be hidden from user space. It is possible
>>> to let user space enable individual events and then silently upgrade it
>>> to all events. I see two options here, either "mbm_control" needs to
>>> explicitly show this "silent upgrade" so that user space knows which
>>> events are actually enabled, or "mbm_control" only shows flags/events enabled
>>> from user space perspective. In the former scenario, this needs more
>>> user space support since a generic user space cannot be confident which
>>> flags are set after writing to "mbm_control". In the latter scenario,
>>> meaning of "num_mbm_cntrs" becomes unclear since user space is expected
>>> to rely on it to know which events can be enabled and if some are
>>> actually "silently enabled" when user space still thinks it needs to be
>>> enabled the number of available counters becomes vague.
>>>
>>> It is not clear to me how to present hardware and software assignable
>>> counters with a single consistent interface. Actually, what if the
>>> "mbm_mode" is what distinguishes how counters are assigned instead of how
>>> it is backed (hw vs sw)? What if, instead of "mbm_cntr_assignable" and
>>> "mbm_cntr_sw_assignable" MBM modes the terms "mbm_cntr_event_assignable"
>>> and "mbm_cntr_group_assignable" is used? Could that replace a
>>> potential "mbm_assign_events" while also supporting user space in
>>> interactions with "mbm_control"?
>>
>> If I understand this correctly, is this a preference that the info
>> node be named differently if its value will have different units,
>> rather than a second node to indicate what the value of num_mbm_cntrs
>> actually means? This sounds reasonable to me.
> 
> Indeed. As you highlighted, user space may not need to know if
> counters are backed by hardware or software, but user space needs to
> know what to expect from (how to interact with) interface.
> 
>> I think it's also important to note that in MPAM, the MBWU (memory
>> bandwidth usage) monitors don't have a concept of local versus total
>> bandwidth, so event assignment would likely not apply there either.
>> What the counted bandwidth actually represents is more implicit in the
>> monitor's position in the memory system in the particular
>> implementation. On a theoretical multi-socket system, resctrl would
>> require knowledge about the system's architecture to stitch together
>> the counts from different types of monitors to produce a local and
>> total value. I don't know if we'd program this SoC-specific knowledge
>> into the kernel to produce a unified MBM resource like we're
>> accustomed to now or if we'd present multiple MBM resources, each only
>> providing an mbm_total_bytes event. In this case, the counters would
>> have to be assigned separately in each MBM resource, especially if the
>> different MBM resources support a different number of counters.
>>
> 
> "total" and "local" bandwidth is already in grey area after the
> introduction of mbm_total_bytes_config/mbm_local_bytes_config where
> user space could set values reported to not be constrained by the
> "total" and "local" terms. We keep sticking with it though, even in
> this implementation that uses the "t" and "l" flags, knowing that
> what is actually monitored when "l" is set is just what the user
> configured via mbm_local_bytes_config, which theoretically
> can be "total" bandwidth.
> 
> Reinette
> 
> ps. I will be offline next week.

Thanks for the heads up.

Looks like we still need to figure out a few things about the interface.

However, I need to resolve a few issues with v5. I can go ahead and post v6
next week, and we can continue our discussion there. That way we keep making
forward progress on the series. Let me know what you think.

Thanks
- Babu Moger


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-02 22:50               ` Peter Newman
@ 2024-08-14 17:37                 ` Reinette Chatre
  2024-08-15 23:06                   ` Peter Newman
  0 siblings, 1 reply; 95+ messages in thread
From: Reinette Chatre @ 2024-08-14 17:37 UTC (permalink / raw)
  To: Peter Newman
  Cc: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86,
	hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter,

On 8/2/24 3:50 PM, Peter Newman wrote:
> On Fri, Aug 2, 2024 at 1:55 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 8/2/24 11:49 AM, Peter Newman wrote:
>>> On Fri, Aug 2, 2024 at 9:14 AM Reinette Chatre
>>>> I am of course not familiar with details of the software implementation
>>>> - could there be benefits to using it even if hardware counters are
>>>> supported?
>>>
>>> I can't see any situation where the user would want to choose software
>>> over hardware counters. The number of groups which can be monitored by
>>> software assignable counters will always be less than with hardware,
>>> due to the need for consuming one RMID (and the counters automatically
>>> allocated to it by the AMD hardware) for all unassigned groups.
>>
>> Thank you for clarifying. This seems specific to this software implementation,
>> and I missed that there was a shift from soft-RMIDs to soft-ABMC. If I remember
>> correctly this depends on undocumented hardware specific knowledge.
> 
> For the benefit of anyone else who needs to monitor bandwidth on a
> large number of monitoring groups on pre-ABMC AMD implementations,
> hopefully a future AMD publication will clarify, at least on some
> existing, pre-ABMC models, exactly when the QM_CTR.U bit is set.
> 
> 
>>>
>>> The behavior as I've implemented today is:
>>>
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_events
>>> 0
>>>
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>> test//0=_;1=_;
>>> //0=_;1=_;
>>>
>>> # echo "test//1+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>> test//0=_;1=tl;
>>> //0=_;1=_;
>>>
>>> # echo "test//1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>> test//0=_;1=_;
>>> //0=_;1=_;
>>>
>>>
>>
>> This highlights how there cannot be a generic/consistent interface between hardware
>> and software implementation. If resctrl implements something like above without any
>> other hints to user space then it will push complexity to user space since user space
>> would not know if setting one flag results in setting more than that flag, which may
>> force a user space implementation to always follow a write with a read that
>> needs to confirm what actually resulted from the write. Similarly, that removing a
>> flag impacts other flags needs to be clear without user space needing to "try and
>> see what happens".
> 
> I'll return to this topic in the context of MPAM below...
> 
>> It is not clear to me how to interpret the above example when it comes to the
>> RMID management though. If the RMID assignment is per group then I expected all
>> the domains of a group to have the same flag(s)?
> 
> The group RMIDs are never programmed into any MSRs and the RMID space
> is independent in each domain, so it is still possible to do
> per-domain assignment. (and like with soft RMIDs, this enables us to
> create unlimited groups, but we've never been limited by the size of
> the RMID space)
> 
> However, in our use cases, jobs are not confined to any domain, so
> bandwidth measurements must be done simultaneously in all domains, so
> we have no current use for per-domain assignment. But if any Google
> users did begin to see value in confining jobs to domains, this could
> change.
> 
>>
>>>>
>>>>> However, If we don't expect to see these semantics in any other
>>>>> implementation, these semantics could be implicit in the definition of
>>>>> a SW assignable counter.
>>>>
>>>> It is not clear to me how implementation differences between hardware
>>>> and software assignment can be hidden from user space. It is possible
>>>> to let user space enable individual events and then silently upgrade it
>>>> to all events. I see two options here, either "mbm_control" needs to
>>>> explicitly show this "silent upgrade" so that user space knows which
>>>> events are actually enabled, or "mbm_control" only shows flags/events enabled
>>>> from user space perspective. In the former scenario, this needs more
>>>> user space support since a generic user space cannot be confident which
>>>> flags are set after writing to "mbm_control". In the latter scenario,
>>>> meaning of "num_mbm_cntrs" becomes unclear since user space is expected
>>>> to rely on it to know which events can be enabled and if some are
>>>> actually "silently enabled" when user space still thinks it needs to be
>>>> enabled the number of available counters becomes vague.
>>>>
>>>> It is not clear to me how to present hardware and software assignable
>>>> counters with a single consistent interface. Actually, what if the
>>>> "mbm_mode" is what distinguishes how counters are assigned instead of how
>>>> it is backed (hw vs sw)? What if, instead of "mbm_cntr_assignable" and
>>>> "mbm_cntr_sw_assignable" MBM modes the terms "mbm_cntr_event_assignable"
>>>> and "mbm_cntr_group_assignable" is used? Could that replace a
>>>> potential "mbm_assign_events" while also supporting user space in
>>>> interactions with "mbm_control"?
>>>
>>> If I understand this correctly, is this a preference that the info
>>> node be named differently if its value will have different units,
>>> rather than a second node to indicate what the value of num_mbm_cntrs
>>> actually means? This sounds reasonable to me.
>>
>> Indeed. As you highlighted, user space may not need to know if
>> counters are backed by hardware or software, but user space needs to
>> know what to expect from (how to interact with) interface.
>>
>>> I think it's also important to note that in MPAM, the MBWU (memory
>>> bandwidth usage) monitors don't have a concept of local versus total
>>> bandwidth, so event assignment would likely not apply there either.
>>> What the counted bandwidth actually represents is more implicit in the
>>> monitor's position in the memory system in the particular
>>> implementation. On a theoretical multi-socket system, resctrl would
>>> require knowledge about the system's architecture to stitch together
>>> the counts from different types of monitors to produce a local and
>>> total value. I don't know if we'd program this SoC-specific knowledge
>>> into the kernel to produce a unified MBM resource like we're
>>> accustomed to now or if we'd present multiple MBM resources, each only
>>> providing an mbm_total_bytes event. In this case, the counters would
>>> have to be assigned separately in each MBM resource, especially if the
>>> different MBM resources support a different number of counters.
>>>
>>
>> "total" and "local" bandwidth is already in grey area after the
>> introduction of mbm_total_bytes_config/mbm_local_bytes_config where
>> user space could set values reported to not be constrained by the
>> "total" and "local" terms. We keep sticking with it though, even in
>> this implementation that uses the "t" and "l" flags, knowing that
>> what is actually monitored when "l" is set is just what the user
>> configured via mbm_local_bytes_config, which theoretically
>> can be "total" bandwidth.
> 
> If it makes sense to support a separate, group-assignment interface at
> least for MPAM, this would be a better fit for soft-ABMC, even if it
> does have to stay downstream.

(apologies for the delay)

Could we please take a step back and confirm/agree on what is meant by
"group-assignment"? In a previous message [1] I latched onto the statement
"the implementation is assigning RMIDs to groups, assignment results in all
events being counted.". In this I understood "groups" to be resctrl groups
and I understood this to mean that when a (soft-ABMC) counter is assigned
it applies to the entire resctrl group (all domains, all events). The
subsequent example in [2] was thus unexpected to me when the interface
was used to assign a (soft-ABMC) counter to the group but not all domains
were impacted.

Considering this, could you please elaborate on what is meant by
"group assignment"?

Thank you

Reinette

[1] https://lore.kernel.org/lkml/CALPaoCi_TBZnULHQpYns+H+30jODZvyQpUHJRDHNwjQzajrD=A@mail.gmail.com/
[2] https://lore.kernel.org/lkml/CALPaoCi1CwLy_HbFNOxPfdReEJstd3c+DvOMJHb5P9jBP+iatw@mail.gmail.com/



* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-14 17:37                 ` Reinette Chatre
@ 2024-08-15 23:06                   ` Peter Newman
  2024-08-16  1:45                     ` Reinette Chatre
  0 siblings, 1 reply; 95+ messages in thread
From: Peter Newman @ 2024-08-15 23:06 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86,
	hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Reinette,

On Wed, Aug 14, 2024 at 10:37 AM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter,
>
> (apologies for the delay)
>
> Could we please take a step back and confirm/agree what is meant with "group-
> assignment"? In a previous message [1] I latched onto the statement
> "the implementation is assigning RMIDs to groups, assignment results in all
> events being counted.". In this I understood "groups" to be resctrl groups
> and I understood this to mean that when a (soft-ABMC) counter is assigned
> it applies to the entire resctrl group (all domains, all events). The
> subsequent example in [2] was thus unexpected to me when the interface
> was used to assign a (soft-ABMC) counter to the group but not all domains
> were impacted.
>
> Considering this, could you please elaborate what is meant with
> "group assignment"?

By "group assignment", I just mean that assigning counters to individual
MBM events is not possible: an assignment results in counters being
assigned to all MBM events for a group in a domain.
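
As a rough illustration of that per-group, per-domain view, here is a minimal
user-space sketch in Python that splits one line of mbm_control output (as
quoted earlier in this thread) into the group names and the set of flags per
domain. The function name is hypothetical, and the field layout (control
group, monitor group, then "domain=flags;" entries, with "_" meaning no flags)
is an assumption based only on the output shown in this thread.

```python
def parse_mbm_control_line(line):
    """Split one mbm_control line into (ctrl_group, mon_group, {domain: flags}).

    Assumed layout, inferred from the thread's examples:
        <ctrl group>/<mon group>/<domain>=<flags>;<domain>=<flags>;
    where "_" stands for "no events assigned".
    """
    ctrl_group, mon_group, assignments = line.strip().split("/", 2)
    domains = {}
    for entry in assignments.split(";"):
        if not entry:
            continue  # the trailing ';' leaves an empty field
        domain_id, flags = entry.split("=", 1)
        domains[int(domain_id)] = set() if flags == "_" else set(flags)
    return ctrl_group, mon_group, domains


# Lines taken from the example output quoted in this thread.
print(parse_mbm_control_line("test//0=_;1=tl;"))
print(parse_mbm_control_line("//0=_;1=_;"))
```

With group assignment as described here, each domain's flag set would be
expected to be either empty or to contain every MBM event flag.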

I only omitted per-domain assignment in soft-ABMC before because
Google doesn't have a use-case for it. I started the prototype before
Babu's proposed interface required domain-scoped assignments[1]. Now
that some sort of domain selector is required, I'm reconsidering.

-Peter

[1] https://lore.kernel.org/lkml/cover.1705688538.git.babu.moger@amd.com/


* Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
  2024-08-15 23:06                   ` Peter Newman
@ 2024-08-16  1:45                     ` Reinette Chatre
  0 siblings, 0 replies; 95+ messages in thread
From: Reinette Chatre @ 2024-08-16  1:45 UTC (permalink / raw)
  To: Peter Newman
  Cc: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen, x86,
	hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	maciej.wieczor-retman, linux-doc, linux-kernel, eranian,
	james.morse

Hi Peter,

On 8/15/24 4:06 PM, Peter Newman wrote:
> On Wed, Aug 14, 2024 at 10:37 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter,
>>
>> (apologies for the delay)
>>
>> Could we please take a step back and confirm/agree what is meant with "group-
>> assignment"? In a previous message [1] I latched onto the statement
>> "the implementation is assigning RMIDs to groups, assignment results in all
>> events being counted.". In this I understood "groups" to be resctrl groups
>> and I understood this to mean that when a (soft-ABMC) counter is assigned
>> it applies to the entire resctrl group (all domains, all events). The
>> subsequent example in [2] was thus unexpected to me when the interface
>> was used to assign a (soft-ABMC) counter to the group but not all domains
>> were impacted.
>>
>> Considering this, could you please elaborate what is meant with
>> "group assignment"?
> 
> By "group assignment", I just mean assigning counters to individual
> MBM events is not possible, or that assignment results in counters
> being assigned to all MBM events for a group in a domain.

Thank you for clarifying. I still think it is possible to use an entry
in "mbm_mode" to indicate to user space what to expect from the mbm_control
interface, but I withdraw my original naming suggestions since they would create
confusion about what is meant by "group".
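
For reference, the write-then-read-back check that user space would otherwise
be forced into could look roughly like the Python sketch below. It is only an
illustration: the helper names are hypothetical, only the "+" and "-"
operations seen in the examples in this thread are modelled, and the read-back
value is passed in as a plain set rather than read from resctrl.

```python
def expected_flags(current, op, flags):
    """Flags user space would expect after applying one operation.

    Only "+" (add) and "-" (remove) are modelled, as seen in the thread's
    examples; anything else is rejected.
    """
    requested = set(flags)
    if op == "+":
        return current | requested
    if op == "-":
        return current - requested
    raise ValueError(f"unsupported operation: {op!r}")


def side_effects(current, op, flags, observed):
    """Return (extra, missing) flags relative to what was requested."""
    expected = expected_flags(current, op, flags)
    return observed - expected, expected - observed


# Values taken from the soft-ABMC example in this thread (domain 1 of "test").
print(side_effects(set(), "+", "l", {"t", "l"}))  # "+l" was read back as "tl"
print(side_effects({"t", "l"}, "-", "t", set()))  # "-t" was read back as "_"
```

A non-empty result in either position means the write changed more than the
flag that was named, which is the behavior an "mbm_mode" entry could make
predictable up front.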

> 
> I only omitted per-domain assignment in soft-ABMC before because
> Google doesn't have a use-case for it. I started the prototype before
> Babu's proposed interface required domain-scoped assignments[1]. Now
> that some sort of domain selector is required, I'm reconsidering.

Could you please elaborate on what you mean by the required "domain selector"?
The latest ABMC version (v6) added support for assigning all domains using '*'.

Reinette


* Re: [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature
  2024-07-12 22:05   ` Reinette Chatre
  2024-07-16 15:13     ` Moger, Babu
@ 2024-08-16 16:29     ` James Morse
  1 sibling, 0 replies; 95+ messages in thread
From: James Morse @ 2024-08-16 16:29 UTC (permalink / raw)
  To: Babu Moger
  Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
	lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
	rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
	kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
	peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
	eranian, dave.hansen, fenghua.yu, mingo, tglx, corbet, bp,
	Reinette Chatre

Hello!

On 12/07/2024 23:05, Reinette Chatre wrote:
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> Add the functionality to enable/disable the AMD ABMC feature.
>>
>> The AMD ABMC feature is enabled by setting the enable bit (bit 0) in MSR
>> L3_QOS_EXT_CFG.  When the state of ABMC is changed, the MSR needs
>> to be updated on all the logical processors in the QOS Domain.
>>
>> Hardware counters will reset when the ABMC state is changed. Reset the
>> architectural state so that the next read of a hardware counter is not
>> treated as an overflow.
>>
>> The ABMC feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
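
To make the sequence in the commit message concrete, here is a minimal user-space sketch of it: flip bit 0 in a per-CPU copy of the config register for every logical processor in the domain, then clear the saved counter state so the next read is not mistaken for an overflow. This is a model only, not the code from the patch: the struct, the function, the macro names and the domain size of four CPUs are all hypothetical, and plain arrays stand in for MSR writes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for illustration only. */
#define MODEL_CPUS_PER_DOMAIN 4
#define MODEL_ABMC_ENABLE     (1ULL << 0)  /* bit 0, as in the commit message */

/* Plain arrays standing in for per-CPU L3_QOS_EXT_CFG and saved counter state. */
struct model_domain {
	uint64_t ext_cfg[MODEL_CPUS_PER_DOMAIN];
	uint64_t prev_count[MODEL_CPUS_PER_DOMAIN];
};

/*
 * Set or clear the enable bit on every logical processor in the domain,
 * leaving all other bits of the register untouched, then reset the saved
 * counter state because the hardware counters reset on a mode change.
 */
static void model_abmc_set(struct model_domain *d, bool enable)
{
	for (int cpu = 0; cpu < MODEL_CPUS_PER_DOMAIN; cpu++) {
		if (enable)
			d->ext_cfg[cpu] |= MODEL_ABMC_ENABLE;
		else
			d->ext_cfg[cpu] &= ~MODEL_ABMC_ENABLE;
	}

	/* Forget the previously read values so no stale delta is computed. */
	memset(d->prev_count, 0, sizeof(d->prev_count));
}
```

In the kernel the per-CPU update would be done with an MSR write on each CPU in the domain rather than an array store; the sketch only shows the ordering of the two steps.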

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 7e76f8d839fc..471fc0dbd7c3 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2402,6 +2402,72 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool

>> +int resctrl_arch_abmc_enable(void)
>> +{
>> +    struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +    struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> +    int ret = 0;
>> +
>> +    lockdep_assert_held(&rdtgroup_mutex);
>> +
>> +    if (r->mon.abmc_capable && !hw_res->abmc_enabled) {
>> +        ret = _resctrl_abmc_enable(r, true);
>> +        if (!ret)
>> +            hw_res->abmc_enabled = true;
>> +    }
>> +
>> +    return ret;

> resctrl_arch_abmc_enable() should probably keep returning an int even though
> this implementation does not need it since other archs may indeed return error.

Just as a datapoint on this: arm64 does indeed need to be able to return an error here.
This helper gets used to allocate all the monitors (and an array to hold them) which can fail.
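
As a rough illustration of why the int return matters, the sketch below models an enable helper that has to allocate an array of monitors before it can succeed. It is a user-space model under stated assumptions, not the arm64 implementation: every name in it (model_monitor, model_mon_state, model_arch_abmc_enable, model_arch_abmc_disable) is hypothetical, and calloc() stands in for whatever allocator the real code uses.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-monitor state. */
struct model_monitor {
	unsigned long long count;
};

/* Hypothetical architecture state guarded by the enable helper. */
struct model_mon_state {
	struct model_monitor *monitors;
	size_t nr_monitors;
	bool enabled;
};

/*
 * Enable assignable counters. Returns 0 on success or a negative errno.
 * Unlike a helper that only sets a register bit, this one can fail
 * because it has to allocate the array that holds the monitors.
 */
static int model_arch_abmc_enable(struct model_mon_state *s, size_t nr)
{
	if (s->enabled)
		return 0;		/* already on, nothing to do */
	if (nr == 0)
		return -EINVAL;

	s->monitors = calloc(nr, sizeof(*s->monitors));
	if (!s->monitors)
		return -ENOMEM;		/* caller must see this failure */

	s->nr_monitors = nr;
	s->enabled = true;
	return 0;
}

/* Disable and release the monitors; safe to call when already disabled. */
static void model_arch_abmc_disable(struct model_mon_state *s)
{
	free(s->monitors);
	s->monitors = NULL;
	s->nr_monitors = 0;
	s->enabled = false;
}
```

A caller would check the return value and leave the previous monitoring mode in place on error, which is only possible if the arch helper keeps returning an int.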


Thanks,

James



Thread overview: 95+ messages
2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2024-07-03 21:48 ` [PATCH v5 01/20] x86/cpufeatures: Add support for " Babu Moger
2024-07-12 21:55   ` Reinette Chatre
2024-07-15 18:36     ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 02/20] x86/resctrl: Add ABMC feature in the command line options Babu Moger
2024-07-03 21:48 ` [PATCH v5 03/20] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
2024-07-12 21:57   ` Reinette Chatre
2024-07-15 19:05     ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 04/20] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
2024-07-12 22:04   ` Reinette Chatre
2024-07-15 20:04     ` Moger, Babu
2024-07-16 15:11       ` Reinette Chatre
2024-07-03 21:48 ` [PATCH v5 05/20] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
2024-07-12 22:04   ` Reinette Chatre
2024-07-03 21:48 ` [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
2024-07-12 22:05   ` Reinette Chatre
2024-07-16 15:13     ` Moger, Babu
2024-07-16 17:51       ` Reinette Chatre
2024-07-16 18:48         ` Moger, Babu
2024-07-16 20:41           ` Reinette Chatre
2024-07-18 21:11         ` Moger, Babu
2024-08-16 16:29     ` James Morse
2024-07-03 21:48 ` [PATCH v5 07/20] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
2024-07-12 22:06   ` Reinette Chatre
2024-07-16 16:51     ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 08/20] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
2024-07-03 21:48 ` [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap Babu Moger
2024-07-12 22:07   ` Reinette Chatre
2024-07-16 17:59     ` Moger, Babu
2024-07-26 22:48   ` Peter Newman
2024-07-26 23:53     ` Moger, Babu
2024-08-01 21:05     ` Reinette Chatre
2024-07-03 21:48 ` [PATCH v5 10/20] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg Babu Moger
2024-07-12 22:08   ` Reinette Chatre
2024-07-16 19:21     ` Moger, Babu
2024-07-16 20:42       ` Reinette Chatre
2024-07-16 22:43         ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 11/20] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
2024-07-12 22:10   ` Reinette Chatre
2024-07-16 19:34     ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 12/20] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
2024-07-12 22:13   ` Reinette Chatre
2024-07-16 20:24     ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 13/20] x86/resctrl: Add the interface to assign hardware counter Babu Moger
2024-07-12 22:09   ` Reinette Chatre
2024-07-16 20:45     ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 14/20] x86/resctrl: Add the interface to unassign " Babu Moger
2024-07-03 21:48 ` [PATCH v5 15/20] x86/resctrl: Assign/unassign counters by default when ABMC is enabled Babu Moger
2024-07-12 22:10   ` Reinette Chatre
2024-07-16 20:58     ` Moger, Babu
2024-07-26 23:22   ` Peter Newman
2024-07-26 23:57     ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 16/20] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode Babu Moger
2024-07-12 22:13   ` Reinette Chatre
2024-07-16 21:04     ` Moger, Babu
2024-07-13 20:26   ` Markus Elfring
2024-07-03 21:48 ` [PATCH v5 17/20] x86/resctrl: Introduce the interface switch between monitor modes Babu Moger
2024-07-12 22:14   ` Reinette Chatre
2024-07-16 22:46     ` Moger, Babu
2024-07-13  7:15   ` Markus Elfring
2024-07-03 21:48 ` [PATCH v5 18/20] x86/resctrl: Enable AMD ABMC feature by default when supported Babu Moger
2024-07-12 22:15   ` Reinette Chatre
2024-07-16 23:23     ` Moger, Babu
2024-07-26  0:16       ` Moger, Babu
2024-08-01 21:40         ` Reinette Chatre
2024-07-03 21:48 ` [PATCH v5 19/20] x86/resctrl: Introduce interface to list monitor states of all the groups Babu Moger
2024-07-12 22:16   ` Reinette Chatre
2024-07-17 15:22     ` Moger, Babu
2024-08-01 21:37       ` Reinette Chatre
2024-08-02 16:10         ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
2024-07-12 22:17   ` Reinette Chatre
2024-07-17 16:22     ` Moger, Babu
2024-07-25  0:03   ` Peter Newman
2024-07-25  1:22     ` Moger, Babu
2024-07-25 17:11       ` Peter Newman
2024-07-25 17:28         ` Moger, Babu
2024-08-01 18:56           ` Reinette Chatre
2024-08-01 19:40             ` Moger, Babu
2024-07-12 22:03 ` [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
2024-07-17 17:19   ` Moger, Babu
2024-08-01 21:49     ` Reinette Chatre
2024-08-01 22:45       ` Peter Newman
2024-08-02 16:13         ` Reinette Chatre
2024-08-02 18:49           ` Moger, Babu
2024-08-02 19:13             ` Peter Newman
2024-08-02 20:23               ` Moger, Babu
2024-08-02 18:49           ` Peter Newman
2024-08-02 20:38             ` Moger, Babu
2024-08-02 20:55             ` Reinette Chatre
2024-08-02 22:50               ` Peter Newman
2024-08-14 17:37                 ` Reinette Chatre
2024-08-15 23:06                   ` Peter Newman
2024-08-16  1:45                     ` Reinette Chatre
2024-08-03  0:49               ` Moger, Babu
