* [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
@ 2024-08-06 22:00 Babu Moger
2024-08-06 22:00 ` [PATCH v6 01/22] x86/cpufeatures: Add support for " Babu Moger
` (22 more replies)
0 siblings, 23 replies; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
This series adds support for Assignable Bandwidth Monitoring Counters
(ABMC), also known as the QoS RMID Pinning feature.
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
The patches are based on top of commit
325700e2e9e8 (tip/master) Merge branch into tip/master: 'x86/timers'.
# Introduction
Users can create as many monitor groups as there are RMIDs supported by the
hardware. However, the bandwidth monitoring feature on AMD systems only
guarantees that RMIDs currently assigned to a processor will be tracked by
hardware. The counters of any other RMIDs that are no longer being tracked
are reset to zero, and the MBM event counters return "Unavailable" for the
RMIDs that are not tracked by hardware. As a result, only a limited number
of groups can give guaranteed monitoring numbers. With ever-changing
configurations, there is no way to know definitively which of these groups
are being tracked at a given point in time. Users have no way to monitor a
group or set of groups for a certain period of time without worrying about
the RMID being reset in between.
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
The assigned RMID will be tracked by the hardware until the user unassigns
it manually. There is no need to worry about counters being reset during
this period. Additionally, the user can specify a bitmask identifying the
specific bandwidth types from the given source to track with the counter.
Without ABMC enabled, monitoring works in the current mode, without the
assignment option.
# Linux Implementation
The series creates a generic interface to support user-space assignment
of scarce monitoring counters. ABMC is the first user of the interface,
with the option to extend it to "soft-ABMC" and MPAM counters in the future.
The feature adds the following interface files:
/sys/fs/resctrl/info/L3_MON/mbm_mode: Reports the list of assignable
monitoring features supported. The enclosed brackets indicate which
feature is enabled.
/sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
counters available for assignment.
/sys/fs/resctrl/info/L3_MON/mbm_control: Reports the monitoring state of
each resctrl group. Assignment states can be updated by writing to this
file.
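As an illustration of consuming these files from user space, the sketch below extracts the enabled mode (the bracketed entry) from the text of mbm_mode. It assumes only the output format shown in the examples of this cover letter; the helpers mbm_enabled_mode() and read_small_file() are hypothetical and are not part of any kernel or library API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the enabled mode in text read from mbm_mode, e.g.
 * "[mbm_cntr_assign]\nlegacy\n". Copies the name without brackets
 * into out and returns 0, or returns -1 if no bracketed entry is
 * found or out is too small.
 */
static int mbm_enabled_mode(const char *text, char *out, size_t outlen)
{
	const char *lb, *rb;
	size_t len;

	assert(text && out);
	lb = strchr(text, '[');
	if (!lb)
		return -1;
	rb = strchr(lb + 1, ']');
	if (!rb)
		return -1;
	len = (size_t)(rb - lb - 1);
	if (len + 1 > outlen)
		return -1;
	memcpy(out, lb + 1, len);
	out[len] = '\0';
	return 0;
}

/* Read a small sysfs-style file into buf (NUL-terminated); 0 on success. */
static int read_small_file(const char *path, char *buf, size_t buflen)
{
	FILE *f;
	size_t n;

	assert(buflen > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, buflen - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}
```

A tool might read /sys/fs/resctrl/info/L3_MON/mbm_mode with read_small_file() and treat "mbm_cntr_assign" as the signal that events must have counters assigned before MBM reads return byte counts.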
# Examples
a. Check if ABMC support is available.
# mount -t resctrl resctrl /sys/fs/resctrl/
# cat /sys/fs/resctrl/info/L3_MON/mbm_mode
[mbm_cntr_assign]
legacy
The ABMC feature is detected and enabled.
b. Check how many ABMC counters are available.
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
32
c. Create a few resctrl groups.
# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
d. This series adds a new interface file, /sys/fs/resctrl/info/L3_MON/mbm_control,
to list and modify the groups' monitoring states. The file provides a single
place to list the monitoring states of all resctrl groups. This makes it easier
for user space to learn which counters are in use without traversing all the
groups, thus reducing the number of file system calls.
Each line of the list has the following format:
"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
Format for specific type of groups:
* Default CTRL_MON group:
"//<domain_id>=<flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id>=<flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id>=<flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
Flags can be one of the following:
t MBM total event is enabled.
l MBM local event is enabled.
tl Both total and local MBM events are enabled.
_ None of the MBM events are enabled.
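The list format above is regular enough to parse with a few string calls. The sketch below splits one line into the CTRL_MON group, the MON group (an empty string stands for the default group or the CTRL_MON group itself) and per-domain flag masks. It is a user-space illustration that assumes only the format described here; the struct, the MBM_FLAG_* constants and MAX_DOMAINS are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MBM_FLAG_TOTAL 0x1 /* 't' */
#define MBM_FLAG_LOCAL 0x2 /* 'l' */
#define MAX_DOMAINS    8   /* arbitrary limit for this sketch */

struct mbm_ctrl_entry {
	char ctrl_grp[64];  /* "" means the default CTRL_MON group */
	char mon_grp[64];   /* "" means the CTRL_MON group itself */
	int ndom;
	int dom_id[MAX_DOMAINS];
	unsigned int flags[MAX_DOMAINS];
};

/* Convert "t", "l", "tl" or "_" into a bitmask; -1 on error. */
static int parse_flags(const char *s, size_t len)
{
	int mask = 0;
	size_t i;

	if (len == 1 && s[0] == '_')
		return 0;
	if (len == 0)
		return -1;
	for (i = 0; i < len; i++) {
		if (s[i] == 't')
			mask |= MBM_FLAG_TOTAL;
		else if (s[i] == 'l')
			mask |= MBM_FLAG_LOCAL;
		else
			return -1;
	}
	return mask;
}

static int copy_field(char *dst, size_t dstlen, const char *src, size_t len)
{
	if (len >= dstlen)
		return -1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}

/* Parse "<CTRL_MON group>/<MON group>/<id>=<flags>;<id>=<flags>;..." */
static int parse_mbm_control_line(const char *line, struct mbm_ctrl_entry *e)
{
	const char *s1, *s2, *p;

	assert(line && e);
	s1 = strchr(line, '/');
	s2 = s1 ? strchr(s1 + 1, '/') : NULL;
	if (!s1 || !s2)
		return -1;
	if (copy_field(e->ctrl_grp, sizeof(e->ctrl_grp), line, s1 - line) ||
	    copy_field(e->mon_grp, sizeof(e->mon_grp), s1 + 1, s2 - s1 - 1))
		return -1;

	e->ndom = 0;
	for (p = s2 + 1; *p && *p != '\n';) {
		char *end;
		long id = strtol(p, &end, 10);
		const char *semi;
		int mask;

		if (end == p || *end != '=' || e->ndom == MAX_DOMAINS)
			return -1;
		p = end + 1;
		semi = strchr(p, ';');
		if (!semi)
			semi = p + strcspn(p, "\n");
		mask = parse_flags(p, semi - p);
		if (mask < 0)
			return -1;
		e->dom_id[e->ndom] = (int)id;
		e->flags[e->ndom] = (unsigned int)mask;
		e->ndom++;
		p = (*semi == ';') ? semi + 1 : semi;
	}
	return 0;
}
```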
Examples:
# cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=tl;
There are four groups, and all of them have both the local and total
events enabled on domains 0 and 1.
e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_control.
The write format is similar to the above list format with addition
of opcode for the assignment operation.
"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
* Default CTRL_MON group:
"//<domain_id><opcode><flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id><opcode><flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id><opcode><flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
Opcode can be one of the following:
= Update the assignment to match the flags.
+ Assign a new event.
- Unassign an event.
Flags can be one of the following:
t MBM total event.
l MBM local event.
tl Both total and local MBM events.
_ None of the MBM events. Only works with '=' opcode.
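The opcode rules can be summarized as a small function on a per-domain event mask. This is a user-space model of the rules stated above (including that "_" is only valid with "="), not the kernel's parser. Treating '-' on an event that is not currently assigned as a no-op is an assumption of this sketch, and the MBM_FLAG_* constants are hypothetical.

```c
#include <assert.h>

#define MBM_FLAG_TOTAL 0x1 /* 't' */
#define MBM_FLAG_LOCAL 0x2 /* 'l' */

/*
 * Apply one write operation to the current per-domain event mask.
 * cur:   currently assigned events (MBM_FLAG_* bits)
 * op:    '=', '+' or '-'
 * flags: requested events; 0 stands for '_' (no events)
 * Returns the new mask, or -1 for an invalid opcode/flags combination.
 */
static int mbm_apply_op(unsigned int cur, char op, unsigned int flags)
{
	assert((flags & ~(unsigned int)(MBM_FLAG_TOTAL | MBM_FLAG_LOCAL)) == 0);

	switch (op) {
	case '=':
		return (int)flags;          /* replace; "=_" unassigns everything */
	case '+':
		if (!flags)
			return -1;          /* "+_" is not valid */
		return (int)(cur | flags);  /* assign additional events */
	case '-':
		if (!flags)
			return -1;          /* "-_" is not valid */
		return (int)(cur & ~flags); /* unassign the given events */
	default:
		return -1;                  /* unknown opcode */
	}
}
```

Each transition in the examples that follow corresponds to one call: "//0=t" turns "tl" into "t", "/child_default_mon_grp/1-t" turns "tl" into "l", and "//0+l" turns "t" back into "tl".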
Initial group status:
# cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=tl;
To update the default group to enable only the total event on domain 0:
# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=t;1=tl;
/child_default_mon_grp/0=tl;1=tl;
To update the MON group child_default_mon_grp to remove the total event on domain 1:
# echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=t;1=tl;
/child_default_mon_grp/0=tl;1=l;
To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
remove both local and total events on domain 1:
# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > \
/sys/fs/resctrl/info/L3_MON/mbm_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
//0=t;1=tl;
/child_default_mon_grp/0=tl;1=l;
To update the default group to add the local event on domain 0:
# echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=l;
To update the non-default CTRL_MON group non_default_ctrl_mon_grp to unassign all
the MBM events on all the domains:
# echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_control
non_default_ctrl_mon_grp//0=_;1=_;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=l;
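Tools that prefer C over echo can build the same request strings with snprintf() and write them to mbm_control. format_mbm_control() and write_mbm_control() are hypothetical helpers; the trailing newline mirrors what echo sends, and root privileges are assumed for the write.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one request: "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>".
 * ctrl and mon may be "" (default CTRL_MON group / the CTRL_MON group
 * itself); domain may be a number or "*" for all domains.
 * Returns 0 on success, -1 if buf is too small.
 */
static int format_mbm_control(char *buf, size_t len, const char *ctrl,
			      const char *mon, const char *domain, char op,
			      const char *flags)
{
	int n;

	assert(buf && len > 0);
	n = snprintf(buf, len, "%s/%s/%s%c%s\n", ctrl, mon, domain, op, flags);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one request to the control file; returns 0 on success. */
static int write_mbm_control(const char *req)
{
	FILE *f = fopen("/sys/fs/resctrl/info/L3_MON/mbm_control", "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fputs(req, f) == EOF)
		ret = -1;
	/* stdio buffers the data, so a write error is typically seen here */
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```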
f. Read the mbm_total_bytes and mbm_local_bytes events of the default group.
Reading the events does not change with ABMC. If an event is unassigned at
the time of the read, the read returns "Unassigned".
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
779247936
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
765207488
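A monitoring tool reading these files now has to handle three kinds of content: a byte count, "Unassigned" (ABMC mode, no counter assigned to the event) and "Unavailable" (the hardware is not tracking the RMID). The sketch below classifies the text of one read; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum mbm_read_status {
	MBM_READ_OK,
	MBM_READ_UNASSIGNED,  /* ABMC mode, no counter assigned to the event */
	MBM_READ_UNAVAILABLE, /* hardware is not tracking the RMID */
	MBM_READ_ERROR,
};

/* Classify the text read from mon_data/<domain>/mbm_{total,local}_bytes. */
static enum mbm_read_status parse_mbm_value(const char *text,
					    unsigned long long *bytes)
{
	char *end;

	assert(text && bytes);
	if (strncmp(text, "Unassigned", 10) == 0)
		return MBM_READ_UNASSIGNED;
	if (strncmp(text, "Unavailable", 11) == 0)
		return MBM_READ_UNAVAILABLE;
	if (text[0] < '0' || text[0] > '9')
		return MBM_READ_ERROR;  /* reject signs, spaces, other text */

	errno = 0;
	*bytes = strtoull(text, &end, 10);
	if (errno != 0 || (*end != '\0' && *end != '\n'))
		return MBM_READ_ERROR;
	return MBM_READ_OK;
}
```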
g. Check the bandwidth configuration for the group. Note that bandwidth
configuration has domain scope. The total event defaults to 0x7F (to
count all the events) and the local event defaults to 0x15 (to count all
the local NUMA events). The event bitmap decoding is available at
https://www.kernel.org/doc/Documentation/x86/resctrl.rst
in the sections "mbm_total_bytes_config" and "mbm_local_bytes_config":
# cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
0=0x7f;1=0x7f
# cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
0=0x15;1=0x15
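To make these values easier to interpret, the sketch below decodes a configuration value bit by bit. The bit descriptions are reproduced from the mbm_total_bytes_config section of Documentation/arch/x86/resctrl.rst as we recall it; treat that document as authoritative. mbm_cfg_decode() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Event source selected by each bit of mbm_{total,local}_bytes_config. */
static const char *const mbm_cfg_bits[] = {
	"Reads to memory in the local NUMA domain",                  /* bit 0 */
	"Reads to memory in the non-local NUMA domain",              /* bit 1 */
	"Non-temporal writes to local NUMA domain",                  /* bit 2 */
	"Non-temporal writes to non-local NUMA domain",              /* bit 3 */
	"Reads to slow memory in the local NUMA domain",             /* bit 4 */
	"Reads to slow memory in the non-local NUMA domain",         /* bit 5 */
	"Dirty victims from the QOS domain to all types of memory",  /* bit 6 */
};

#define MBM_CFG_NBITS      (sizeof(mbm_cfg_bits) / sizeof(mbm_cfg_bits[0]))
#define MBM_CFG_VALID_MASK ((1u << MBM_CFG_NBITS) - 1) /* 0x7f */

/* Print the sources selected by cfg (if out != NULL); return bits set. */
static int mbm_cfg_decode(unsigned int cfg, FILE *out)
{
	unsigned int i;
	int n = 0;

	assert((cfg & ~MBM_CFG_VALID_MASK) == 0);
	for (i = 0; i < MBM_CFG_NBITS; i++) {
		if (cfg & (1u << i)) {
			if (out)
				fprintf(out, "  bit %u: %s\n", i, mbm_cfg_bits[i]);
			n++;
		}
	}
	return n;
}
```

With this table, 0x15 selects bits 0, 2 and 4 (local reads, local non-temporal writes and local slow-memory reads), and 0x33 selects bits 0, 1, 4 and 5, that is, only the read sources, which is the value used in step h.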
h. Change the bandwidth source for domain 0 for the total event to count only reads.
Note that this change affects the total events of all groups on domain 0.
# echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
# cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
0=0x33;1=0x7f
i. Now read the total event again. The first read returns "Unavailable";
subsequent reads of mbm_total_bytes display only the read events.
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
Unavailable
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
314101
j. Users can switch back to the legacy mbm_mode if required, using the
following command. Note that switching mbm_mode resets all the MBM counters
of all resctrl groups.
# echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
# cat /sys/fs/resctrl/info/L3_MON/mbm_mode
mbm_cntr_assign
[legacy]
k. Unmount the resctrl file system.
# umount /sys/fs/resctrl/
---
v6:
We still need to finalize a few interface details on mbm_mode and mbm_control
in the case of ABMC and Soft-ABMC. We can continue the discussion with this series.
Added support for domain-id '*' to update all the domains at once.
Fixed the assign interface to allocate a counter if one is not
already assigned.
Fixed the unassign interface to free the counter if it is no longer
assigned in any of the domains.
Renamed abmc_capable to mbm_cntr_assignable.
Renamed abmc_enabled to mbm_cntr_assign_enabled.
Used msr_set_bit() and msr_clear_bit() for MSR updates.
Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().
Changed the display name from num_cntrs to num_mbm_cntrs.
Removed the variable mbm_cntrs_free_map_len, which is not required.
Removed the mbm_cntrs_init() call from arch code. This needs to be done at a higher level.
Used DECLARE_BITMAP to initialize mbm_cntrs_free_map.
Removed unused config value definitions.
Introduced mbm_cntr_map to track counters at domain level. With this,
we don't need an MSR read to get the counter configuration.
Separated all the counter id management to upper level in FS code.
Added checks to detect "Unassigned" before reading the RMID.
More details in each patch.
v5:
Rebase changes (because of SNC support)
Interface changes.
/sys/fs/resctrl/mbm_assign to /sys/fs/resctrl/mbm_mode.
/sys/fs/resctrl/mbm_assign_control to /sys/fs/resctrl/mbm_control.
Added a few arch-specific routines:
resctrl_arch_get_abmc_enabled.
resctrl_arch_abmc_enable.
resctrl_arch_abmc_disable.
A few renames:
num_cntrs_free_map -> mbm_cntrs_free_map
num_cntrs_init -> mbm_cntrs_init
arch_domain_mbm_evt_config -> resctrl_arch_mbm_evt_config
Introduced resctrl_arch_event_config_get and
resctrl_arch_event_config_set() to update event configuration.
Removed the mon_state field from mongroup. Added MON_CNTR_UNSET to initialize counters.
Renamed ctr_id to cntr_id for the hardware counter.
Report "Unassigned" in case the user attempts to read the events without assigning the counter.
ABMC is enabled during boot. It can be enabled or disabled later.
Fixed the opcode and flags combinations:
'=_' is valid.
'-_' and '+_' are not valid.
Addressed all the comments as far as I know. If I missed something, it is not intentional.
v4:
Main change is domain specific event assignment.
Kept the ABMC feature as a default.
Dynamic switching between ABMC and mbm_legacy is still allowed.
We are still not clear about the mount option.
Moved the monitoring-related data from rdt_resource into the resctrl_mon structure.
Fixed the display of legacy and ABMC mode.
Used bitmap APIs when possible.
Removed event configuration read from MSRs. We can use the
internally saved data (patch 12).
Added more comments about L3_QOS_ABMC_CFG MSR.
Added IPIs to read the assignment status for each domain (patches 18 and 19).
More details in each patch.
v3:
This series adds the support for global assignment mode discussed in
the thread. https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
Removed the individual assignment mode and included the global assignment interface.
Added following interface files.
a. /sys/fs/resctrl/info/L3_MON/mbm_assign
Used for displaying the current assignment mode and switch between
ABMC and legacy mode.
b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Used for listing the groups' assignment modes and modifying the assignment states.
c. Most of the changes are related to the new interface.
d. Addressed the comments from Reinette, James and Peter.
e. Hope I have addressed most of the major feedback discussed. If I missed
something, it is not intentional. Please feel free to comment.
f. Sending this as an RFC as per Reinette's comment. So, this is still open
for discussion.
v2:
a. Major change is the way ABMC is enabled. Earlier, user needed to remount
with -o abmc to enable ABMC feature. Removed that option now.
Now users can enable ABMC with "echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
b. Added new word 21 to x86/cpufeatures.h.
c. Display unsupported if user attempts to read the events when ABMC is enabled
and event is not assigned.
d. Display monitor_state as "Unsupported" when ABMC is disabled.
e. Text updates and rebase to latest tip tree (as of Jan 18).
f. This series is still work in progress. I am yet to hear from ARM developers.
v5:
https://lore.kernel.org/lkml/cover.1720043311.git.babu.moger@amd.com/
v4:
https://lore.kernel.org/lkml/cover.1716552602.git.babu.moger@amd.com/
v3:
https://lore.kernel.org/lkml/cover.1711674410.git.babu.moger@amd.com/
v2:
https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
v1 :
https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
Babu Moger (22):
x86/cpufeatures: Add support for Assignable Bandwidth Monitoring
Counters (ABMC)
x86/resctrl: Add ABMC feature in the command line options
x86/resctrl: Consolidate monitoring related data from rdt_resource
x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
x86/resctrl: Add support to enable/disable AMD ABMC feature
x86/resctrl: Introduce the interface to display monitor mode
x86/resctrl: Introduce interface to display number of monitoring
counters
x86/resctrl: Introduce MBM counters bitmap
x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
x86/resctrl: Remove MSR reading of event configuration value
x86/resctrl: Introduce mbm_cntr_map to track counters at domain
x86/resctrl: Add data structures and definitions for ABMC assignment
x86/resctrl: Introduce cntr_id in mongroup for assignments
x86/resctrl: Add the interface to assign a hardware counter
x86/resctrl: Add the interface to unassign a MBM counter
x86/resctrl: Assign/unassign counters by default when ABMC is enabled
x86/resctrl: Report "Unassigned" for MBM events in ABMC mode
x86/resctrl: Introduce the interface to switch between monitor modes
x86/resctrl: Enable AMD ABMC feature by default when supported
x86/resctrl: Introduce interface to list monitor states of all the
groups
x86/resctrl: Introduce interface to modify assignment states of the
groups
.../admin-guide/kernel-parameters.txt | 2 +-
Documentation/arch/x86/resctrl.rst | 201 ++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 3 +
arch/x86/kernel/cpu/cpuid-deps.c | 3 +
arch/x86/kernel/cpu/resctrl/core.c | 12 +-
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 +-
arch/x86/kernel/cpu/resctrl/internal.h | 90 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 65 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 948 ++++++++++++++++--
arch/x86/kernel/cpu/scattered.c | 1 +
include/linux/resctrl.h | 25 +-
12 files changed, 1262 insertions(+), 102 deletions(-)
--
2.34.1
* [PATCH v6 01/22] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-07 16:32 ` Thomas Gleixner
2024-08-06 22:00 ` [PATCH v6 02/22] x86/resctrl: Add ABMC feature in the command line options Babu Moger
` (21 subsequent siblings)
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Users can create as many monitor groups as there are RMIDs supported by the
hardware. However, the bandwidth monitoring feature on AMD systems only
guarantees that RMIDs currently assigned to a processor will be tracked by
hardware. The counters of any other RMIDs that are no longer being tracked
are reset to zero, and the MBM event counters return "Unavailable" for the
RMIDs that are not tracked by hardware. As a result, only a limited number
of groups can give guaranteed monitoring numbers. With ever-changing
configurations, there is no way to know definitively which of these groups
are being tracked at a given point in time. Users have no way to monitor a
group or set of groups for a certain period of time without worrying about
the RMID being reset in between.
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
The assigned RMID will be tracked by the hardware until the user unassigns
it manually. There is no need to worry about counters being reset during
this period. Additionally, the user can specify a bitmask identifying the
specific bandwidth types from the given source to track with the counter.
Without ABMC enabled, monitoring works in the current mode, without the
assignment option.
The Linux resctrl subsystem provides an interface to count a maximum of two
memory bandwidth events per group, from a combination of the available total
and local events. Keeping the current interface, users can enable a maximum
of two ABMC counters per group. Users also have the option to enable only
one counter for the group. If the system runs out of assignable ABMC
counters, the kernel will display an error. Users need to disable an already
enabled counter to make space for new assignments.
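The v6 notes say free counters are tracked in a bitmap (mbm_cntrs_free_map, declared with DECLARE_BITMAP). The user-space sketch below models that idea to show where the "out of counters" error comes from; it uses plain unsigned long words instead of the kernel bitmap API, the counter count of 32 is taken from the cover letter example, and all names are hypothetical. A group that enables both total and local events would consume two counters.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define NUM_MBM_CNTRS 32 /* num_mbm_cntrs value from the cover letter example */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS ((NUM_MBM_CNTRS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical model: a set bit means the counter is free. */
static unsigned long cntr_free_map[MAP_WORDS];

/* Mark every counter as free. */
static void cntrs_init(void)
{
	unsigned int i;

	memset(cntr_free_map, 0, sizeof(cntr_free_map));
	for (i = 0; i < NUM_MBM_CNTRS; i++)
		cntr_free_map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/* Take the lowest free counter; returns its id, or -1 if none are left. */
static int cntr_alloc(void)
{
	unsigned int i;

	for (i = 0; i < NUM_MBM_CNTRS; i++) {
		unsigned long bit = 1UL << (i % BITS_PER_WORD);

		if (cntr_free_map[i / BITS_PER_WORD] & bit) {
			cntr_free_map[i / BITS_PER_WORD] &= ~bit;
			return (int)i;
		}
	}
	return -1; /* out of counters: the caller reports an error */
}

/* Return a counter to the free pool when its event is unassigned. */
static void cntr_free(int id)
{
	assert(id >= 0 && id < NUM_MBM_CNTRS);
	cntr_free_map[(unsigned int)id / BITS_PER_WORD] |=
		1UL << ((unsigned int)id % BITS_PER_WORD);
}
```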
The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
Bits Description
5 ABMC (Assignable Bandwidth Monitoring Counters)
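A user-space way to check the raw CPUID bit is sketched below, assuming GCC or Clang on x86 and the __get_cpuid_count() helper from their <cpuid.h>. Note that, per the cpuid-deps.c change in this patch, the kernel also clears X86_FEATURE_ABMC when CQM_MBM_TOTAL, CQM_MBM_LOCAL or BMEC is missing, so the raw bit alone does not guarantee that resctrl will use ABMC. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define ABMC_CPUID_LEAF 0x80000020u
#define ABMC_EBX_BIT    5

/* Is the ABMC bit set in CPUID_Fn80000020_EBX (subleaf 0)? */
static bool abmc_bit_set(unsigned int ebx)
{
	return (ebx >> ABMC_EBX_BIT) & 1u;
}

/* Query the running CPU; false on non-x86 builds or if the leaf is absent. */
static bool cpu_has_abmc(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* Returns 0 when the leaf exceeds the maximum extended leaf. */
	if (!__get_cpuid_count(ABMC_CPUID_LEAF, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return abmc_bit_set(ebx);
#else
	return false;
#endif
}
```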
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
Note: Checkpatch checks/warnings are ignored to maintain coding style.
v6: Added Reinette's Reviewed-by. Moved the Checkpatch note below ---.
v5: Minor rebase change and subject line update.
v4: Changes because of rebase. Feature word 21 has a few more additions now.
Changed the text to "tracked by hardware" instead of active.
v3: Change because of rebase. Actual patch did not change.
v2: Added dependency on X86_FEATURE_BMEC.
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 3 +++
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 5 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dd4682857c12..9dc54d24e8a5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -473,6 +473,7 @@
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* BHI_DIS_S HW control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* Clear branch history at vmexit using SW loop */
#define X86_FEATURE_FAST_CPPC (21*32 + 5) /* AMD Fast CPPC */
+#define X86_FEATURE_ABMC (21*32+ 6) /* "" Assignable Bandwidth Monitoring Counters */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index b7d9f530ae16..5227a6232e9e 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -70,6 +70,9 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
{ X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
{ X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
+ { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_TOTAL },
+ { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_LOCAL },
+ { X86_FEATURE_ABMC, X86_FEATURE_BMEC },
{ X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL },
{ X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
{ X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index c84c30188fdf..87f63e6b2994 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_MBA, CPUID_EBX, 6, 0x80000008, 0 },
{ X86_FEATURE_SMBA, CPUID_EBX, 2, 0x80000020, 0 },
{ X86_FEATURE_BMEC, CPUID_EBX, 3, 0x80000020, 0 },
+ { X86_FEATURE_ABMC, CPUID_EBX, 5, 0x80000020, 0 },
{ X86_FEATURE_PERFMON_V2, CPUID_EAX, 0, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_PMC_FREEZE, CPUID_EAX, 2, 0x80000022, 0 },
--
2.34.1
* [PATCH v6 02/22] x86/resctrl: Add ABMC feature in the command line options
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2024-08-06 22:00 ` [PATCH v6 01/22] x86/cpufeatures: Add support for " Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-06 22:00 ` [PATCH v6 03/22] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
` (20 subsequent siblings)
22 siblings, 0 replies; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Add the command line option to enable or disable the new resctrl feature
ABMC (Assignable Bandwidth Monitoring Counters).
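The new "abmc" token follows the existing rdt= syntax, so booting with rdt=!abmc should turn the feature off. The sketch below is a simplified user-space model of parsing such a comma-separated list, where a leading '!' sets force_off and a plain name sets force_on. It is loosely modeled on the rdt_options[] table touched by this patch, is not a copy of the kernel's parser, and its names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified model of the rdt= option table; only a few entries shown. */
struct rdt_opt {
	const char *name;
	bool force_on;
	bool force_off;
};

static struct rdt_opt rdt_opts[] = {
	{ "mba" }, { "smba" }, { "bmec" }, { "abmc" },
};
#define NUM_RDT_OPTS (sizeof(rdt_opts) / sizeof(rdt_opts[0]))

/* Parse e.g. "abmc,!mba"; a leading '!' turns a feature off. Modifies str. */
static void parse_rdt_opts(char *str)
{
	char *tok;

	assert(str != NULL);
	for (tok = strtok(str, ","); tok; tok = strtok(NULL, ",")) {
		bool off = (*tok == '!');
		size_t i;

		if (off)
			tok++;
		for (i = 0; i < NUM_RDT_OPTS; i++) {
			if (strcmp(tok, rdt_opts[i].name) == 0) {
				if (off)
					rdt_opts[i].force_off = true;
				else
					rdt_opts[i].force_on = true;
				break;
			}
		}
	}
}
```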
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: No changes
v5: No changes
v4: No changes
v3: No changes
v2: No changes
---
Documentation/admin-guide/kernel-parameters.txt | 2 +-
Documentation/arch/x86/resctrl.rst | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 09126bb8cc9f..12cc0a26c82a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5604,7 +5604,7 @@
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
- mba, smba, bmec.
+ mba, smba, bmec, abmc.
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a824affd741d..30586728a4cd 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -26,6 +26,7 @@ MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
MBA (Memory Bandwidth Allocation) "mba"
SMBA (Slow Memory Bandwidth Allocation) ""
BMEC (Bandwidth Monitoring Event Configuration) ""
+ABMC (Assignable Bandwidth Monitoring Counters) ""
=============================================== ================================
Historically, new features were made visible by default in /proc/cpuinfo. This
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1930fce9dfe9..9417d8bb7029 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -801,6 +801,7 @@ enum {
RDT_FLAG_MBA,
RDT_FLAG_SMBA,
RDT_FLAG_BMEC,
+ RDT_FLAG_ABMC,
};
#define RDT_OPT(idx, n, f) \
@@ -826,6 +827,7 @@ static struct rdt_options rdt_options[] __initdata = {
RDT_OPT(RDT_FLAG_MBA, "mba", X86_FEATURE_MBA),
RDT_OPT(RDT_FLAG_SMBA, "smba", X86_FEATURE_SMBA),
RDT_OPT(RDT_FLAG_BMEC, "bmec", X86_FEATURE_BMEC),
+ RDT_OPT(RDT_FLAG_ABMC, "abmc", X86_FEATURE_ABMC),
};
#define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
--
2.34.1
* [PATCH v6 03/22] x86/resctrl: Consolidate monitoring related data from rdt_resource
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2024-08-06 22:00 ` [PATCH v6 01/22] x86/cpufeatures: Add support for " Babu Moger
2024-08-06 22:00 ` [PATCH v6 02/22] x86/resctrl: Add ABMC feature in the command line options Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 21:29 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 04/22] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
` (19 subsequent siblings)
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
The cache allocation and memory bandwidth allocation feature properties
are consolidated into cache and membw structures respectively.
In preparation for more monitoring properties that would further clutter
the existing resource struct, reorganize the monitoring-specific
properties into a separate structure as well.
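To show what the reorganization means for callers, the sketch below uses simplified stand-ins for struct resctrl_mon and struct rdt_resource (evt_list and most other members omitted) and mirrors the SNC arithmetic of logical_rmid_to_physical_rmid() after this change, with the NUMA node passed in place of cpu_to_node(cpu). The function name and the numbers in the tests are illustrative only.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel structures (list_head omitted). */
struct resctrl_mon {
	int num_rmid;           /* RMIDs available to each SNC node */
};

struct rdt_resource {
	int rid;
	int mon_capable;
	struct resctrl_mon mon; /* monitoring data now grouped here */
};

/*
 * With Sub-NUMA Clustering, each SNC node sharing an L3 cache owns a
 * slice of num_rmid physical RMIDs, so the logical RMID is offset by
 * the node's index within the cache.
 */
static int logical_to_physical_rmid(const struct rdt_resource *r, int node,
				    int snc_nodes_per_l3_cache, int lrmid)
{
	assert(snc_nodes_per_l3_cache > 0 && node >= 0);
	if (snc_nodes_per_l3_cache == 1)
		return lrmid;
	return lrmid + (node % snc_nodes_per_l3_cache) * r->mon.num_rmid;
}
```

For example, with 256 hardware RMIDs and two SNC nodes per L3 cache, rdt_get_mon_l3_config() would set mon.num_rmid to 128, so logical RMID 5 on node 3 maps to physical RMID 133.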
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Update commit message and update kernel doc for rdt_resource.
v5: Commit message update.
Also changes related to data structure updates due to SNC support.
v4: New patch.
---
arch/x86/kernel/cpu/resctrl/core.c | 2 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 18 +++++++++---------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
include/linux/resctrl.h | 15 +++++++++++----
4 files changed, 25 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 9417d8bb7029..4a2d0955ccdc 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -617,7 +617,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
arch_mon_domain_online(r, d);
- if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
mon_domain_free(hw_dom);
return;
}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 851b561850e0..795fe91a8feb 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -222,7 +222,7 @@ static int logical_rmid_to_physical_rmid(int cpu, int lrmid)
if (snc_nodes_per_l3_cache == 1)
return lrmid;
- return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
+ return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->mon.num_rmid;
}
static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
@@ -297,11 +297,11 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
if (is_mbm_total_enabled())
memset(hw_dom->arch_mbm_total, 0,
- sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
+ sizeof(*hw_dom->arch_mbm_total) * r->mon.num_rmid);
if (is_mbm_local_enabled())
memset(hw_dom->arch_mbm_local, 0,
- sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+ sizeof(*hw_dom->arch_mbm_local) * r->mon.num_rmid);
}
static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
@@ -1083,14 +1083,14 @@ static struct mon_evt mbm_local_event = {
*/
static void l3_mon_evt_init(struct rdt_resource *r)
{
- INIT_LIST_HEAD(&r->evt_list);
+ INIT_LIST_HEAD(&r->mon.evt_list);
if (is_llc_occupancy_enabled())
- list_add_tail(&llc_occupancy_event.list, &r->evt_list);
+ list_add_tail(&llc_occupancy_event.list, &r->mon.evt_list);
if (is_mbm_total_enabled())
- list_add_tail(&mbm_total_event.list, &r->evt_list);
+ list_add_tail(&mbm_total_event.list, &r->mon.evt_list);
if (is_mbm_local_enabled())
- list_add_tail(&mbm_local_event.list, &r->evt_list);
+ list_add_tail(&mbm_local_event.list, &r->mon.evt_list);
}
/*
@@ -1186,7 +1186,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
- r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
+ r->mon.num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
@@ -1201,7 +1201,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
*
* For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
*/
- threshold = resctrl_rmid_realloc_limit / r->num_rmid;
+ threshold = resctrl_rmid_realloc_limit / r->mon.num_rmid;
/*
* Because num_rmid may not be a power of two, round the value
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d7163b764c62..f9f3b5db1987 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1097,7 +1097,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
{
struct rdt_resource *r = of->kn->parent->priv;
- seq_printf(seq, "%d\n", r->num_rmid);
+ seq_printf(seq, "%d\n", r->mon.num_rmid);
return 0;
}
@@ -1108,7 +1108,7 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
struct rdt_resource *r = of->kn->parent->priv;
struct mon_evt *mevt;
- list_for_each_entry(mevt, &r->evt_list, list) {
+ list_for_each_entry(mevt, &r->mon.evt_list, list) {
seq_printf(seq, "%s\n", mevt->name);
if (mevt->configurable)
seq_printf(seq, "%s_config\n", mevt->name);
@@ -3057,13 +3057,13 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
struct mon_evt *mevt;
int ret;
- if (WARN_ON(list_empty(&r->evt_list)))
+ if (WARN_ON(list_empty(&r->mon.evt_list)))
return -EPERM;
priv.u.rid = r->rid;
priv.u.domid = do_sum ? d->ci->id : d->hdr.id;
priv.u.sum = do_sum;
- list_for_each_entry(mevt, &r->evt_list, list) {
+ list_for_each_entry(mevt, &r->mon.evt_list, list) {
priv.u.evtid = mevt->evtid;
ret = mon_addfile(kn, mevt->name, priv.priv);
if (ret)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index b0875b99e811..1097559f4987 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -182,12 +182,21 @@ enum resctrl_scope {
RESCTRL_L3_NODE,
};
+/**
+ * struct resctrl_mon - Monitoring related data
+ * @num_rmid: Number of RMIDs available
+ * @evt_list: List of monitoring events
+ */
+struct resctrl_mon {
+ int num_rmid;
+ struct list_head evt_list;
+};
+
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
- * @num_rmid: Number of RMIDs available
* @ctrl_scope: Scope of this resource for control functions
* @mon_scope: Scope of this resource for monitor functions
* @cache: Cache allocation related data
@@ -199,7 +208,6 @@ enum resctrl_scope {
* @default_ctrl: Specifies default cache cbm or memory B/W percent.
* @format_str: Per resource format string to show domain value
* @parse_ctrlval: Per resource function pointer to parse control values
- * @evt_list: List of monitoring events
* @fflags: flags to choose base and info files
* @cdp_capable: Is the CDP feature available on this resource
*/
@@ -207,11 +215,11 @@ struct rdt_resource {
int rid;
bool alloc_capable;
bool mon_capable;
- int num_rmid;
enum resctrl_scope ctrl_scope;
enum resctrl_scope mon_scope;
struct resctrl_cache cache;
struct resctrl_membw membw;
+ struct resctrl_mon mon;
struct list_head ctrl_domains;
struct list_head mon_domains;
char *name;
@@ -221,7 +229,6 @@ struct rdt_resource {
int (*parse_ctrlval)(struct rdt_parse_data *data,
struct resctrl_schema *s,
struct rdt_ctrl_domain *d);
- struct list_head evt_list;
unsigned long fflags;
bool cdp_capable;
};
--
2.34.1
* [PATCH v6 04/22] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (2 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 03/22] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-07 16:33 ` Thomas Gleixner
2024-08-16 21:30 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 05/22] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
` (18 subsequent siblings)
22 siblings, 2 replies; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
  Bits   Description
  15:0   MAX_ABMC: Maximum supported Assignable Bandwidth Monitoring
         Counter ID + 1
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Detect the feature and the number of assignable counters supported.
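For illustration, a minimal userspace sketch of the decode this patch performs. It works on a raw EBX value instead of executing CPUID, so it runs on any machine; the function and macro names are hypothetical. Bits 15:0 hold the maximum counter ID, so the count is that value plus one, and the result is capped at 64 as the patch does with WARN_ON().

```c
#include <stdint.h>

/* Cap applied by the patch (it warns and clamps above this). */
#define ABMC_MAX_CNTRS 64u

/*
 * Hypothetical helper: decode the number of assignable MBM counters
 * from EBX of CPUID Fn8000_0020 subleaf 5. Bits 15:0 (MAX_ABMC) hold
 * the maximum supported counter ID, so the count is MAX_ABMC + 1.
 */
static unsigned int abmc_num_cntrs(uint32_t ebx)
{
	unsigned int n = (ebx & 0xFFFFu) + 1;

	/* Upper EBX bits are ignored; the result is clamped to 64. */
	return n > ABMC_MAX_CNTRS ? ABMC_MAX_CNTRS : n;
}
```

For example, an EBX of 0x1F decodes to 32 counters.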
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Commit message update.
Renamed abmc_capable to mbm_cntr_assignable.
v5: Name change num_cntrs to num_mbm_cntrs.
Moved abmc_capable to resctrl_mon.
v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We don't
need to separate this as arch code.
v3: Removed changes related to mon_features.
Moved rdt_cpu_has to core.c and added new function resctrl_arch_has_abmc.
Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
rdt_resource. (James)
v2: Changed the field name to mbm_assign_capable from abmc_capable.
---
arch/x86/kernel/cpu/resctrl/monitor.c | 12 ++++++++++++
include/linux/resctrl.h | 4 ++++
2 files changed, 16 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 795fe91a8feb..88312b5f0069 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1229,6 +1229,18 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
mbm_local_event.configurable = true;
mbm_config_rftype_init("mbm_local_bytes_config");
}
+
+ if (rdt_cpu_has(X86_FEATURE_ABMC)) {
+ r->mon.mbm_cntr_assignable = true;
+ /*
+ * Query CPUID_Fn80000020_EBX_x05 for number of
+ * ABMC counters.
+ */
+ cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
+ r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
+ if (WARN_ON(r->mon.num_mbm_cntrs > 64))
+ r->mon.num_mbm_cntrs = 64;
+ }
}
l3_mon_evt_init(r);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 1097559f4987..72c498deeb5e 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -185,10 +185,14 @@ enum resctrl_scope {
/**
* struct resctrl_mon - Monitoring related data
* @num_rmid: Number of RMIDs available
+ * @num_mbm_cntrs: Number of monitoring counters
+ * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
* @evt_list: List of monitoring events
*/
struct resctrl_mon {
int num_rmid;
+ int num_mbm_cntrs;
+ bool mbm_cntr_assignable;
struct list_head evt_list;
};
--
2.34.1
* [PATCH v6 05/22] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (3 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 04/22] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-06 22:00 ` [PATCH v6 06/22] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
` (17 subsequent siblings)
22 siblings, 0 replies; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
fflags for resctrl files.
Adding new files will involve adding another function to initialize
the fflags. This can be simplified by adding a new function
resctrl_file_fflags_init() and passing the file name and flags
to be initialized.
Consolidate fflags initialization into resctrl_file_fflags_init() and
remove thread_throttle_mode_init() and mbm_config_rftype_init().
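The consolidation amounts to one name lookup plus a caller-supplied flags value. Below is a simplified userspace sketch of that pattern. The struct layout, the file table and the RFTYPE_* values are placeholders, not the kernel definitions.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder flag values; the real RFTYPE_* constants differ. */
#define RFTYPE_CTRL_INFO 0x01ul
#define RFTYPE_MON_INFO  0x02ul
#define RFTYPE_RES_CACHE 0x10ul
#define RFTYPE_RES_MB    0x20ul

struct rftype {
	const char *name;
	unsigned long fflags;	/* 0: not exposed until initialized */
};

/* Hypothetical file table standing in for res_common_files[]. */
static struct rftype files[] = {
	{ .name = "thread_throttle_mode" },
	{ .name = "mbm_total_bytes_config" },
	{ .name = "mbm_local_bytes_config" },
};

static struct rftype *get_rftype_by_name(const char *name)
{
	for (size_t i = 0; i < sizeof(files) / sizeof(files[0]); i++)
		if (strcmp(files[i].name, name) == 0)
			return &files[i];
	return NULL;
}

/* One helper replaces the per-file init functions. */
static void file_fflags_init(const char *name, unsigned long fflags)
{
	struct rftype *rft = get_rftype_by_name(name);

	if (rft)	/* Unknown names are silently ignored. */
		rft->fflags = fflags;
}
```

Adding a new info file then needs only one more call with its flags, not a new init function.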
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v6: Added Reviewed-by from Reinette.
v5: Commit message update.
v4: Commit message update.
v3: New patch to display ABMC capability.
---
arch/x86/kernel/cpu/resctrl/core.c | 4 +++-
arch/x86/kernel/cpu/resctrl/internal.h | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 +++-------------
4 files changed, 12 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 4a2d0955ccdc..ff5cb693b396 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -226,7 +226,9 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
r->membw.throttle_mode = THREAD_THROTTLE_PER_THREAD;
else
r->membw.throttle_mode = THREAD_THROTTLE_MAX;
- thread_throttle_mode_init();
+
+ resctrl_file_fflags_init("thread_throttle_mode",
+ RFTYPE_CTRL_INFO | RFTYPE_RES_MB);
r->alloc_capable = true;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 955999aecfca..2bd207624eec 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -647,8 +647,8 @@ void cqm_handle_limbo(struct work_struct *work);
bool has_busy_rmid(struct rdt_mon_domain *d);
void __check_limbo(struct rdt_mon_domain *d, bool force_free);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
-void __init thread_throttle_mode_init(void);
-void __init mbm_config_rftype_init(const char *config);
+void __init resctrl_file_fflags_init(const char *config,
+ unsigned long fflags);
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 88312b5f0069..5e8706ab6361 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1223,11 +1223,13 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
mbm_total_event.configurable = true;
- mbm_config_rftype_init("mbm_total_bytes_config");
+ resctrl_file_fflags_init("mbm_total_bytes_config",
+ RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
mbm_local_event.configurable = true;
- mbm_config_rftype_init("mbm_local_bytes_config");
+ resctrl_file_fflags_init("mbm_local_bytes_config",
+ RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
if (rdt_cpu_has(X86_FEATURE_ABMC)) {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f9f3b5db1987..7e76f8d839fc 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2020,24 +2020,14 @@ static struct rftype *rdtgroup_get_rftype_by_name(const char *name)
return NULL;
}
-void __init thread_throttle_mode_init(void)
-{
- struct rftype *rft;
-
- rft = rdtgroup_get_rftype_by_name("thread_throttle_mode");
- if (!rft)
- return;
-
- rft->fflags = RFTYPE_CTRL_INFO | RFTYPE_RES_MB;
-}
-
-void __init mbm_config_rftype_init(const char *config)
+void __init resctrl_file_fflags_init(const char *config,
+ unsigned long fflags)
{
struct rftype *rft;
rft = rdtgroup_get_rftype_by_name(config);
if (rft)
- rft->fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE;
+ rft->fflags = fflags;
}
/**
--
2.34.1
* [PATCH v6 06/22] x86/resctrl: Add support to enable/disable AMD ABMC feature
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (4 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 05/22] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 16:29 ` James Morse
2024-08-16 21:31 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 07/22] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
` (16 subsequent siblings)
22 siblings, 2 replies; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
Add the functionality to enable/disable the AMD ABMC feature.
The AMD ABMC feature is enabled by setting the enable bit (bit 0) in MSR
L3_QOS_EXT_CFG. When the state of ABMC is changed, the MSR needs to be
updated on all the logical processors in the QOS domain.
Hardware counters are reset when the ABMC state is changed. Reset the
architectural state so that the next update does not treat the hardware
counter reading as an overflow.
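The reason for the architectural reset can be shown with a small userspace simulation. The MSR and the per-RMID "previous reading" are plain arrays here, and the CPU count, RMID count and 24-bit counter width are assumptions for the sketch. The delta helper uses the common shift-and-subtract technique for wrapping N-bit counters. If the stale previous value is kept after the hardware counter restarts, cur < prev is interpreted as a wrap-around.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS         4	/* hypothetical: CPUs in one QoS domain */
#define NR_RMIDS        8	/* hypothetical RMID count */
#define ABMC_ENABLE_BIT 0	/* bit 0 of L3_QOS_EXT_CFG enables ABMC */
#define MBM_WIDTH       24	/* assumed hardware counter width in bits */

/* Simulated per-CPU copies of MSR L3_QOS_EXT_CFG (0xc00003ff). */
static uint64_t l3_qos_ext_cfg[NR_CPUS];

/* Simulated architectural state: last raw counter value per RMID. */
static uint64_t prev_msr[NR_RMIDS];

/* Counter increments between two raw reads of a MBM_WIDTH-bit counter. */
static uint64_t mbm_delta(uint64_t prev, uint64_t cur)
{
	unsigned int shift = 64 - MBM_WIDTH;

	/* Shifting up first makes the subtraction wrap at MBM_WIDTH bits. */
	return ((cur << shift) - (prev << shift)) >> shift;
}

static void abmc_set_one(int cpu, bool enable)
{
	if (enable)
		l3_qos_ext_cfg[cpu] |= 1ull << ABMC_ENABLE_BIT;
	else
		l3_qos_ext_cfg[cpu] &= ~(1ull << ABMC_ENABLE_BIT);
}

static void abmc_enable_domain(const int *cpus, int ncpus, bool enable)
{
	/* Every logical CPU in the domain gets the same enable bit. */
	for (int i = 0; i < ncpus; i++)
		abmc_set_one(cpus[i], enable);

	/*
	 * The hardware counters restart after the switch. Forget the old
	 * readings so mbm_delta() does not see cur < prev and report a wrap.
	 */
	for (int r = 0; r < NR_RMIDS; r++)
		prev_msr[r] = 0;
}
```

With a stale prev of 1000 and a fresh reading of 10, mbm_delta() reports 2^24 - 990 increments. After the reset it reports 10.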
The ABMC feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Renamed abmc_enabled to mbm_cntr_assign_enabled.
Used msr_set_bit and msr_clear_bit for msr updates.
Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().
Made _resctrl_abmc_enable() return void.
v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
Introduced resctrl_arch_get_abmc_enabled to get abmc state from
non-arch code.
Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
Modified commit log to make it clear about AMD ABMC feature.
v3: No changes.
v2: Few text changes in commit message.
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 13 ++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 57 ++++++++++++++++++++++++++
3 files changed, 71 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 82c6a4d350e0..d86469bf5d41 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1182,6 +1182,7 @@
#define MSR_IA32_MBA_BW_BASE 0xc0000200
#define MSR_IA32_SMBA_BW_BASE 0xc0000280
#define MSR_IA32_EVT_CFG_BASE 0xc0000400
+#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
/* MSR_IA32_VMX_MISC bits */
#define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 2bd207624eec..154983a67646 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -56,6 +56,9 @@
/* Max event bits supported */
#define MAX_EVT_CONFIG_BITS GENMASK(6, 0)
+/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
+#define ABMC_ENABLE_BIT 0
+
/**
* cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
* aren't marked nohz_full
@@ -477,6 +480,7 @@ struct rdt_parse_data {
* @mbm_cfg_mask: Bandwidth sources that can be tracked when Bandwidth
* Monitoring Event Configuration (BMEC) is supported.
* @cdp_enabled: CDP state of this resource
+ * @mbm_cntr_assign_enabled: ABMC feature is enabled
*
* Members of this structure are either private to the architecture
* e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
@@ -491,6 +495,7 @@ struct rdt_hw_resource {
unsigned int mbm_width;
unsigned int mbm_cfg_mask;
bool cdp_enabled;
+ bool mbm_cntr_assign_enabled;
};
static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
@@ -536,6 +541,14 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d);
+static inline bool resctrl_arch_get_abmc_enabled(void)
+{
+ return rdt_resources_all[RDT_RESOURCE_L3].mbm_cntr_assign_enabled;
+}
+
+int resctrl_arch_mbm_cntr_assign_enable(void);
+void resctrl_arch_mbm_cntr_assign_disable(void);
+
/*
* To return the common struct rdt_resource, which is contained in struct
* rdt_hw_resource, walk the resctrl member of struct rdt_hw_resource.
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 7e76f8d839fc..6075b1e5bb77 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2402,6 +2402,63 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
return 0;
}
+/*
+ * Update L3_QOS_EXT_CFG MSR on all the CPUs associated with the resource.
+ */
+static void resctrl_abmc_set_one_amd(void *arg)
+{
+ bool *enable = arg;
+
+ if (*enable)
+ msr_set_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+ else
+ msr_clear_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+}
+
+static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
+{
+ struct rdt_mon_domain *d;
+
+ /*
+ * Hardware counters will reset after switching the monitor mode.
+ * Reset the architectural state so that reading of hardware
+ * counter is not considered as an overflow in the next update.
+ */
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ on_each_cpu_mask(&d->hdr.cpu_mask,
+ resctrl_abmc_set_one_amd, &enable, 1);
+ resctrl_arch_reset_rmid_all(r, d);
+ }
+}
+
+int resctrl_arch_mbm_cntr_assign_enable(void)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (r->mon.mbm_cntr_assignable && !hw_res->mbm_cntr_assign_enabled) {
+ _resctrl_abmc_enable(r, true);
+ hw_res->mbm_cntr_assign_enabled = true;
+ }
+
+ return 0;
+}
+
+void resctrl_arch_mbm_cntr_assign_disable(void)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (hw_res->mbm_cntr_assign_enabled) {
+ _resctrl_abmc_enable(r, false);
+ hw_res->mbm_cntr_assign_enabled = false;
+ }
+}
+
/*
* We don't allow rdtgroup directories to be created anywhere
* except the root directory. Thus when looking for the rdtgroup
--
2.34.1
* [PATCH v6 07/22] x86/resctrl: Introduce the interface to display monitor mode
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (5 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 06/22] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 16:56 ` James Morse
2024-08-16 21:32 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 08/22] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
` (15 subsequent siblings)
22 siblings, 2 replies; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
The mbm_mode file displays the list of supported monitor modes.
mbm_cntr_assign is one of the currently supported modes. It is also
called the ABMC (Assignable Bandwidth Monitoring Counters) feature. The
ABMC feature provides an option to assign a hardware counter to an RMID
and monitor the bandwidth as long as the counter is assigned. ABMC mode
is enabled by default when supported.
Legacy mode works without the assignment option.
Provide an interface to display the monitor mode on the system.
$ cat /sys/fs/resctrl/info/L3_MON/mbm_mode
[mbm_cntr_assign]
legacy
Switching the mbm_mode will reset all the MBM counters of all resctrl
groups.
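The display rule in rdtgroup_mbm_mode_show() can be shown with a small userspace sketch. It pairs the three possible outputs with a helper that a userspace tool could use to extract the bracketed (active) mode. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Mirror of the show logic: brackets mark the active mode. */
static int format_mbm_mode(char *buf, size_t len, bool assignable, bool enabled)
{
	if (!assignable)
		return snprintf(buf, len, "[legacy]\n");
	if (enabled)
		return snprintf(buf, len, "[mbm_cntr_assign]\nlegacy\n");
	return snprintf(buf, len, "mbm_cntr_assign\n[legacy]\n");
}

/* Copy the bracketed (active) mode name out of mbm_mode text. */
static bool active_mbm_mode(const char *text, char *out, size_t len)
{
	const char *open = strchr(text, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t n;

	if (!close || len == 0)
		return false;
	n = (size_t)(close - open - 1);
	if (n >= len)
		return false;
	memcpy(out, open + 1, n);
	out[n] = '\0';
	return true;
}
```

On a system without ABMC only "[legacy]" is printed, so a parser that looks for the brackets works the same way on both kinds of systems.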
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Added documentation for mbm_cntr_assign and legacy mode.
Moved mbm_mode fflags initialization to static initialization.
v5: Changed interface name to mbm_mode.
It will be always available even if ABMC feature is not supported.
Added description in resctrl.rst about ABMC mode.
Fixed display abmc and legacy consistently.
v4: Fixed the checks for legacy and abmc mode. Default is ABMC.
v3: New patch to display ABMC capability.
---
Documentation/arch/x86/resctrl.rst | 34 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 27 ++++++++++++++++++++
2 files changed, 61 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 30586728a4cd..d4ec605b200a 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -257,6 +257,40 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
0=0x30;1=0x30;3=0x15;4=0x15
+"mbm_mode":
+ Reports the list of assignable monitoring features supported. The
+ enclosed brackets indicate which feature is enabled.
+ ::
+
+ cat /sys/fs/resctrl/info/L3_MON/mbm_mode
+ [mbm_cntr_assign]
+ legacy
+
+ "mbm_cntr_assign":
+ AMD's ABMC feature is one of the supported mbm_cntr_assign modes.
+ The bandwidth monitoring feature on AMD systems only guarantees
+ that RMIDs currently assigned to a processor will be tracked by
+ hardware. The counters of any other RMIDs which are no longer
+ being tracked will be reset to zero. The MBM event counters
+ return "Unavailable" for the RMIDs that are not tracked by
+ hardware. So, there can be only a limited number of groups that can
+ give guaranteed monitoring numbers. With ever-changing configurations
+ there is no way to definitely know which of these groups are being
+ tracked at a given point in time. Users do not have the option to
+ monitor a group or set of groups for a certain period of time without
+ worrying about the RMID being reset in between.
+
+ The ABMC feature provides an option to the user to assign a hardware
+ counter to an RMID and monitor the bandwidth as long as it is assigned.
+ The assigned RMID will be tracked by the hardware until the user
+ unassigns it manually. There is no need to worry about counters being
+ reset during this period.
+
+ "legacy":
+ Legacy mode works without the assignment option. The monitoring works
+ as long as there are enough RMID counters available to support number
+ of monitoring groups.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6075b1e5bb77..d8f85b20ab8f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -845,6 +845,26 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
return ret;
}
+static int rdtgroup_mbm_mode_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+
+ if (r->mon.mbm_cntr_assignable) {
+ if (resctrl_arch_get_abmc_enabled()) {
+ seq_puts(s, "[mbm_cntr_assign]\n");
+ seq_puts(s, "legacy\n");
+ } else {
+ seq_puts(s, "mbm_cntr_assign\n");
+ seq_puts(s, "[legacy]\n");
+ }
+ } else {
+ seq_puts(s, "[legacy]\n");
+ }
+
+ return 0;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -1901,6 +1921,13 @@ static struct rftype res_common_files[] = {
.seq_show = mbm_local_bytes_config_show,
.write = mbm_local_bytes_config_write,
},
+ {
+ .name = "mbm_mode",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_mbm_mode_show,
+ .fflags = RFTYPE_MON_INFO,
+ },
{
.name = "cpus",
.mode = 0644,
--
2.34.1
* [PATCH v6 08/22] x86/resctrl: Introduce interface to display number of monitoring counters
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (6 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 07/22] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 21:34 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 09/22] x86/resctrl: Introduce MBM counters bitmap Babu Moger
` (14 subsequent siblings)
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as the counter is
assigned. The number of possible assignments depends on the number of
monitoring counters available.
Provide the interface to display the number of monitoring counters
supported.
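To show how the new file limits assignments, here is a userspace sketch that parses the contents of info/L3_MON/num_mbm_cntrs and works out how many groups can have all of their events assigned. It assumes one hardware counter per assigned event, as the counter-bitmap patch in this series states. The helper names and the two-events-per-group figure (mbm_total_bytes plus mbm_local_bytes) are illustrative.

```c
#include <stdlib.h>

/* Parse num_mbm_cntrs file contents such as "32\n"; -1 on bad input. */
static long parse_num_mbm_cntrs(const char *text)
{
	char *end;
	long n = strtol(text, &end, 10);

	if (end == text || (*end != '\n' && *end != '\0') || n < 0)
		return -1;
	return n;
}

/* Assumes one counter per assigned event. */
static long max_fully_monitored_groups(long num_cntrs, long events_per_group)
{
	return events_per_group > 0 ? num_cntrs / events_per_group : 0;
}
```

With 32 counters and both MBM events assigned per group, at most 16 groups can be monitored at the same time.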
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: No changes.
v5: Changed the display name from num_cntrs to num_mbm_cntrs.
Updated the commit message.
Moved the patch after mbm_mode is introduced.
v4: Changed the counter name to num_cntrs. And few text changes.
v3: Changed the field name to mbm_assign_cntrs.
v2: Changed the field name to mbm_assignable_counters from abmc_counte
---
Documentation/arch/x86/resctrl.rst | 3 +++
arch/x86/kernel/cpu/resctrl/monitor.c | 2 ++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
3 files changed, 21 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index d4ec605b200a..fe9f10766c4f 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -291,6 +291,9 @@ with the following files:
as long as there are enough RMID counters available to support number
of monitoring groups.
+"num_mbm_cntrs":
+ The number of monitoring counters available for assignment.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 5e8706ab6361..83329cefebf7 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1242,6 +1242,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
if (WARN_ON(r->mon.num_mbm_cntrs > 64))
r->mon.num_mbm_cntrs = 64;
+
+ resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d8f85b20ab8f..ab4fab3b7cf1 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -865,6 +865,16 @@ static int rdtgroup_mbm_mode_show(struct kernfs_open_file *of,
return 0;
}
+static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+
+ seq_printf(s, "%d\n", r->mon.num_mbm_cntrs);
+
+ return 0;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -1936,6 +1946,12 @@ static struct rftype res_common_files[] = {
.seq_show = rdtgroup_cpus_show,
.fflags = RFTYPE_BASE,
},
+ {
+ .name = "num_mbm_cntrs",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_num_mbm_cntrs_show,
+ },
{
.name = "cpus_list",
.mode = 0644,
--
2.34.1
* [PATCH v6 09/22] x86/resctrl: Introduce MBM counters bitmap
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (7 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 08/22] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 16:29 ` James Morse
2024-08-16 21:35 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 10/22] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg Babu Moger
` (13 subsequent siblings)
22 siblings, 2 replies; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
When the mbm_cntr_assignable feature is supported, the hardware provides a
set of counters. These counters are used to assign events in a resctrl
group when the feature is enabled.
Introduce the mbm_cntrs_free_map bitmap to track which counters are free,
and a set of routines to allocate and free the counters.
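The allocator follows the same "set bit means free" convention as the patch: bitmap_fill() marks all counters free, find_first_bit() locates the lowest free one, and freeing sets the bit again. A userspace sketch follows. Because the counter count is capped at 64, a single uint64_t stands in for DECLARE_BITMAP(mbm_cntrs_free_map, 64). The names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Bit set = counter free, matching the patch's bitmap_fill() usage. */
static uint64_t cntrs_free_map;
static unsigned int num_cntrs;

static void cntrs_init(unsigned int n)
{
	num_cntrs = n > 64 ? 64 : n;
	/* Avoid the undefined 1ull << 64 when all 64 counters exist. */
	cntrs_free_map = num_cntrs == 64 ? ~0ull : (1ull << num_cntrs) - 1;
}

/* Return the lowest free counter ID, or -ENOSPC when all are in use. */
static int cntr_alloc(void)
{
	for (unsigned int id = 0; id < num_cntrs; id++) {
		if (cntrs_free_map & (1ull << id)) {
			cntrs_free_map &= ~(1ull << id);
			return (int)id;
		}
	}
	return -ENOSPC;
}

static void cntr_free(unsigned int id)
{
	if (id < num_cntrs)
		cntrs_free_map |= 1ull << id;
}
```

Handing out the lowest free ID means a freed counter is reused first, which is the same behavior as the find_first_bit() based version.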
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Removed the variable mbm_cntrs_free_map_len. This is not required.
Removed the call mbm_cntrs_init() in arch code. This needs to be
done at higher level.
Used DECLARE_BITMAP to initialize mbm_cntrs_free_map.
Moved all the counter interfaces mbm_cntr_alloc() and mbm_cntr_free()
in here as part of separating arch and fs bits.
v5:
Updated the comments and commit log.
Few renames
num_cntrs_free_map -> mbm_cntrs_free_map
num_cntrs_init -> mbm_cntrs_init
Added initialization in rdt_get_tree because the default ABMC
enablement happens during the init.
v4: Changed the name to num_cntrs where applicable.
Used bitmap apis.
Added more comments for the globals.
v3: Changed the bitmap name to assign_cntrs_free_map. Removed abmc
from the name.
v2: Changed the bitmap name to assignable_counter_free_map from
abmc_counter_free_map.
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 33 ++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 154983a67646..6263362496a3 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -662,6 +662,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
void __init resctrl_file_fflags_init(const char *config,
unsigned long fflags);
+int mbm_cntr_alloc(struct rdt_resource *r);
+void mbm_cntr_free(u32 cntr_id);
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ab4fab3b7cf1..c818965e36c9 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -185,6 +185,37 @@ bool closid_allocated(unsigned int closid)
return !test_bit(closid, &closid_free_map);
}
+/*
+ * Counter bitmap for tracking the available counters.
+ * The ABMC feature provides a set of hardware counters for enabling events.
+ * Each event takes one hardware counter. The kernel needs to keep track
+ * of the number of available counters.
+ */
+static DECLARE_BITMAP(mbm_cntrs_free_map, 64);
+
+static void mbm_cntrs_init(struct rdt_resource *r)
+{
+ bitmap_fill(mbm_cntrs_free_map, r->mon.num_mbm_cntrs);
+}
+
+int mbm_cntr_alloc(struct rdt_resource *r)
+{
+ int cntr_id;
+
+ cntr_id = find_first_bit(mbm_cntrs_free_map, r->mon.num_mbm_cntrs);
+ if (cntr_id >= r->mon.num_mbm_cntrs)
+ return -ENOSPC;
+
+ __clear_bit(cntr_id, mbm_cntrs_free_map);
+
+ return cntr_id;
+}
+
+void mbm_cntr_free(u32 cntr_id)
+{
+ __set_bit(cntr_id, mbm_cntrs_free_map);
+}
+
/**
* rdtgroup_mode_by_closid - Return mode of resource group with closid
* @closid: closid if the resource group
@@ -2748,6 +2779,8 @@ static int rdt_get_tree(struct fs_context *fc)
closid_init();
+ mbm_cntrs_init(&rdt_resources_all[RDT_RESOURCE_L3].r_resctrl);
+
if (resctrl_arch_mon_capable())
flags |= RFTYPE_MON;
--
2.34.1
* [PATCH v6 10/22] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (8 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 09/22] x86/resctrl: Introduce MBM counters bitmap Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-06 22:00 ` [PATCH v6 11/22] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
` (12 subsequent siblings)
22 siblings, 0 replies; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
If the BMEC (Bandwidth Monitoring Event Configuration) feature is
supported, the bandwidth events can be configured to track specific
events. The event configuration is domain specific. The ABMC (Assignable
Bandwidth Monitoring Counters) feature needs the event configuration
information to assign a hardware counter to an RMID. Event configurations
are not stored in resctrl but instead always read from or written to
hardware directly when prompted by user space.
Read the event configuration from the hardware during the domain
initialization. Save the configuration information in rdt_hw_mon_domain,
so it can be used for counter assignment.
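The per-domain initialization described above can be sketched in userspace as follows. This is a minimal model, assuming an array in place of the QOS_EVT_CFG_n MSRs (index 0 for the total event, index 1 for the local event, as mon_event_config_index_get() returns) and hypothetical model_* types for the event and domain state. Reading both events through one helper keeps each result in its own field, so the local event's value cannot end up in mbm_total_cfg.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_EVT_CONFIG_BITS  0x7fu		/* GENMASK(6, 0) */
#define INVALID_CONFIG_VALUE UINT32_MAX

/* Hypothetical stand-ins for the MBM events and rdt_hw_mon_domain. */
struct model_event { bool configurable; unsigned int cfg_index; };
struct model_hw_mon_domain { uint32_t mbm_total_cfg; uint32_t mbm_local_cfg; };

/* evt_cfg_msr[n] models MSR_IA32_EVT_CFG_BASE + n. */
static uint32_t model_read_cfg(const struct model_event *evt,
			       const uint64_t *evt_cfg_msr)
{
	if (!evt->configurable)
		return INVALID_CONFIG_VALUE;
	/* Keep only the valid event configuration bits. */
	return (uint32_t)(evt_cfg_msr[evt->cfg_index] & MAX_EVT_CONFIG_BITS);
}

static void model_evt_config_init(struct model_hw_mon_domain *hw_dom,
				  const struct model_event *total,
				  const struct model_event *local,
				  const uint64_t *evt_cfg_msr)
{
	hw_dom->mbm_total_cfg = model_read_cfg(total, evt_cfg_msr);
	hw_dom->mbm_local_cfg = model_read_cfg(local, evt_cfg_msr);
}
```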
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Renamed resctrl_arch_mbm_evt_config -> resctrl_mbm_evt_config_init
Initialized value to INVALID_CONFIG_VALUE if it is not configurable.
Minor commit message update.
v5: Exported mon_event_config_index_get.
Renamed arch_domain_mbm_evt_config to resctrl_arch_mbm_evt_config.
v4: Read the configuration information from the hardware to initialize.
Added more details to the commit message.
Fixed the tab spaces.
v3: Minor changes related to rebase in mbm_config_write_domain.
v2: No changes.
---
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
arch/x86/kernel/cpu/resctrl/internal.h | 9 +++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 26 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 +---
4 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index ff5cb693b396..6fb0cfdb5529 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -619,6 +619,8 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
arch_mon_domain_online(r, d);
+ resctrl_mbm_evt_config_init(hw_dom);
+
if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
mon_domain_free(hw_dom);
return;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 6263362496a3..4d8cc36a8d79 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -56,6 +56,9 @@
/* Max event bits supported */
#define MAX_EVT_CONFIG_BITS GENMASK(6, 0)
+#define INVALID_CONFIG_VALUE U32_MAX
+#define INVALID_CONFIG_INDEX UINT_MAX
+
/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
#define ABMC_ENABLE_BIT 0
@@ -401,6 +404,8 @@ struct rdt_hw_ctrl_domain {
* @d_resctrl: Properties exposed to the resctrl file system
* @arch_mbm_total: arch private state for MBM total bandwidth
* @arch_mbm_local: arch private state for MBM local bandwidth
+ * @mbm_total_cfg: MBM total bandwidth configuration
+ * @mbm_local_cfg: MBM local bandwidth configuration
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
@@ -408,6 +413,8 @@ struct rdt_hw_mon_domain {
struct rdt_mon_domain d_resctrl;
struct arch_mbm_state *arch_mbm_total;
struct arch_mbm_state *arch_mbm_local;
+ u32 mbm_total_cfg;
+ u32 mbm_local_cfg;
};
static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
@@ -664,6 +671,8 @@ void __init resctrl_file_fflags_init(const char *config,
unsigned long fflags);
int mbm_cntr_alloc(struct rdt_resource *r);
void mbm_cntr_free(u32 cntr_id);
+void resctrl_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
+unsigned int mon_event_config_index_get(u32 evtid);
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 83329cefebf7..2f4d0c12b80d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1254,6 +1254,32 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
return 0;
}
+void resctrl_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom)
+{
+ unsigned int index;
+ u64 msrval;
+
+ /*
+ * Read the configuration registers QOS_EVT_CFG_n, where <n> is
+ * the BMEC event number (EvtID).
+ */
+ if (mbm_total_event.configurable) {
+ index = mon_event_config_index_get(QOS_L3_MBM_TOTAL_EVENT_ID);
+ rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
+ hw_dom->mbm_total_cfg = msrval & MAX_EVT_CONFIG_BITS;
+ } else {
+ hw_dom->mbm_total_cfg = INVALID_CONFIG_VALUE;
+ }
+
+ if (mbm_local_event.configurable) {
+ index = mon_event_config_index_get(QOS_L3_MBM_LOCAL_EVENT_ID);
+ rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
+ hw_dom->mbm_local_cfg = msrval & MAX_EVT_CONFIG_BITS;
+ } else {
+ hw_dom->mbm_local_cfg = INVALID_CONFIG_VALUE;
+ }
+}
+
void __exit rdt_put_mon_l3_config(void)
{
dom_data_exit();
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index c818965e36c9..02afd3442876 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1609,8 +1609,6 @@ struct mon_config_info {
u32 mon_config;
};
-#define INVALID_CONFIG_INDEX UINT_MAX
-
/**
* mon_event_config_index_get - get the hardware index for the
* configurable event
@@ -1620,7 +1618,7 @@ struct mon_config_info {
* 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
* INVALID_CONFIG_INDEX for invalid evtid
*/
-static inline unsigned int mon_event_config_index_get(u32 evtid)
+unsigned int mon_event_config_index_get(u32 evtid)
{
switch (evtid) {
case QOS_L3_MBM_TOTAL_EVENT_ID:
--
2.34.1
* [PATCH v6 11/22] x86/resctrl: Remove MSR reading of event configuration value
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (9 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 10/22] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 21:36 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 12/22] x86/resctrl: Introduce mbm_cntr_map to track counters at domain Babu Moger
` (11 subsequent siblings)
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
The event configuration is domain specific and is set up during domain
initialization. The value is stored in rdt_hw_mon_domain, so there is no
need to read the configuration register every time the user asks for it.
Use the value stored in rdt_hw_mon_domain instead.
Introduce resctrl_arch_event_config_get() and
resctrl_arch_event_config_set() to get/set architecture domain specific
mbm_total_cfg/mbm_local_cfg values. Also, remove unused config value
definitions.
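The caching scheme described above (read from the cached per-domain value, write the MSR only when the value changes, and update the cache on every write) can be modeled in userspace as below. The model_* names are hypothetical, and the msr_writes counter stands in for the IPI plus wrmsr() that resctrl_arch_event_config_set() performs on a CPU in the domain.

```c
#include <stdint.h>

#define INVALID_CONFIG_VALUE UINT32_MAX

enum model_evt { MODEL_OCCUP, MODEL_MBM_TOTAL, MODEL_MBM_LOCAL };

/* Hypothetical per-domain cache plus a count of simulated MSR writes. */
struct model_dom {
	uint32_t mbm_total_cfg;
	uint32_t mbm_local_cfg;
	unsigned int msr_writes;
};

static uint32_t model_config_get(const struct model_dom *d, enum model_evt evt)
{
	switch (evt) {
	case MODEL_MBM_TOTAL:
		return d->mbm_total_cfg;
	case MODEL_MBM_LOCAL:
		return d->mbm_local_cfg;
	default:
		return INVALID_CONFIG_VALUE;	/* occupancy is not configurable */
	}
}

/* Mirrors mbm_config_write_domain(): skip the write if nothing changes. */
static void model_config_write(struct model_dom *d, enum model_evt evt,
			       uint32_t val)
{
	uint32_t cur = model_config_get(d, evt);

	if (cur == INVALID_CONFIG_VALUE || cur == val)
		return;

	d->msr_writes++;		/* stands in for wrmsr() on a domain CPU */
	if (evt == MODEL_MBM_TOTAL)	/* keep the cache in sync with hardware */
		d->mbm_total_cfg = val;
	else
		d->mbm_local_cfg = val;
}
```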
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Fixed inconsistency with types. Made all the types u32 for config
value.
Removed a few rdt_last_cmd_puts() calls as they are not necessary.
Removed unused config value definitions.
Few more updates to commit message.
v5: Introduced resctrl_arch_event_config_get() and
resctrl_arch_event_config_set() based on our discussion.
https://lore.kernel.org/lkml/68e861f9-245d-4496-a72e-46fc57d19c62@amd.com/
v4: New patch.
---
arch/x86/kernel/cpu/resctrl/internal.h | 21 -----
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 104 ++++++++++++++-----------
include/linux/resctrl.h | 4 +
3 files changed, 64 insertions(+), 65 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4d8cc36a8d79..1021227d8c7e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -32,27 +32,6 @@
*/
#define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
-/* Reads to Local DRAM Memory */
-#define READS_TO_LOCAL_MEM BIT(0)
-
-/* Reads to Remote DRAM Memory */
-#define READS_TO_REMOTE_MEM BIT(1)
-
-/* Non-Temporal Writes to Local Memory */
-#define NON_TEMP_WRITE_TO_LOCAL_MEM BIT(2)
-
-/* Non-Temporal Writes to Remote Memory */
-#define NON_TEMP_WRITE_TO_REMOTE_MEM BIT(3)
-
-/* Reads to Local Memory the system identifies as "Slow Memory" */
-#define READS_TO_LOCAL_S_MEM BIT(4)
-
-/* Reads to Remote Memory the system identifies as "Slow Memory" */
-#define READS_TO_REMOTE_S_MEM BIT(5)
-
-/* Dirty Victims to All Types of Memory */
-#define DIRTY_VICTIMS_TO_ALL_MEM BIT(6)
-
/* Max event bits supported */
#define MAX_EVT_CONFIG_BITS GENMASK(6, 0)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 02afd3442876..0047b4eb0ff5 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1605,10 +1605,57 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
}
struct mon_config_info {
+ struct rdt_mon_domain *d;
u32 evtid;
u32 mon_config;
};
+u32 resctrl_arch_event_config_get(struct rdt_mon_domain *d,
+ enum resctrl_event_id eventid)
+{
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+
+ switch (eventid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ break;
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ return hw_dom->mbm_total_cfg;
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ return hw_dom->mbm_local_cfg;
+ }
+
+ /* Never expect to get here */
+ WARN_ON_ONCE(1);
+
+ return INVALID_CONFIG_VALUE;
+}
+
+void resctrl_arch_event_config_set(void *info)
+{
+ struct mon_config_info *mon_info = info;
+ struct rdt_hw_mon_domain *hw_dom;
+ unsigned int index;
+
+ index = mon_event_config_index_get(mon_info->evtid);
+ if (index == INVALID_CONFIG_INDEX)
+ return;
+
+ wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
+
+ hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
+
+ switch (mon_info->evtid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ break;
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ hw_dom->mbm_total_cfg = mon_info->mon_config;
+ break;
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ hw_dom->mbm_local_cfg = mon_info->mon_config;
+ break;
+ }
+}
+
/**
* mon_event_config_index_get - get the hardware index for the
* configurable event
@@ -1631,33 +1678,11 @@ unsigned int mon_event_config_index_get(u32 evtid)
}
}
-static void mon_event_config_read(void *info)
-{
- struct mon_config_info *mon_info = info;
- unsigned int index;
- u64 msrval;
-
- index = mon_event_config_index_get(mon_info->evtid);
- if (index == INVALID_CONFIG_INDEX) {
- pr_warn_once("Invalid event id %d\n", mon_info->evtid);
- return;
- }
- rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
-
- /* Report only the valid event configuration bits */
- mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
-}
-
-static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
-{
- smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
-}
-
static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
{
- struct mon_config_info mon_info = {0};
struct rdt_mon_domain *dom;
bool sep = false;
+ u32 val;
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
@@ -1666,11 +1691,11 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
if (sep)
seq_puts(s, ";");
- memset(&mon_info, 0, sizeof(struct mon_config_info));
- mon_info.evtid = evtid;
- mondata_config_read(dom, &mon_info);
+ val = resctrl_arch_event_config_get(dom, evtid);
+ if (val == INVALID_CONFIG_VALUE)
+ break;
- seq_printf(s, "%d=0x%02x", dom->hdr.id, mon_info.mon_config);
+ seq_printf(s, "%d=0x%02x", dom->hdr.id, val);
sep = true;
}
seq_puts(s, "\n");
@@ -1701,33 +1726,23 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
return 0;
}
-static void mon_event_config_write(void *info)
-{
- struct mon_config_info *mon_info = info;
- unsigned int index;
-
- index = mon_event_config_index_get(mon_info->evtid);
- if (index == INVALID_CONFIG_INDEX) {
- pr_warn_once("Invalid event id %d\n", mon_info->evtid);
- return;
- }
- wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
-}
static void mbm_config_write_domain(struct rdt_resource *r,
struct rdt_mon_domain *d, u32 evtid, u32 val)
{
struct mon_config_info mon_info = {0};
+ u32 config_val;
/*
- * Read the current config value first. If both are the same then
+ * Check the current config value first. If both are the same then
* no need to write it again.
*/
- mon_info.evtid = evtid;
- mondata_config_read(d, &mon_info);
- if (mon_info.mon_config == val)
+ config_val = resctrl_arch_event_config_get(d, evtid);
+ if (config_val == INVALID_CONFIG_VALUE || config_val == val)
return;
+ mon_info.d = d;
+ mon_info.evtid = evtid;
mon_info.mon_config = val;
/*
@@ -1736,7 +1751,8 @@ static void mbm_config_write_domain(struct rdt_resource *r,
* are scoped at the domain level. Writing any of these MSRs
* on one CPU is observed by all the CPUs in the domain.
*/
- smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_write,
+ smp_call_function_any(&d->hdr.cpu_mask,
+ resctrl_arch_event_config_set,
&mon_info, 1);
/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 72c498deeb5e..ef08f75191f2 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -350,6 +350,10 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
*/
void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_arch_event_config_set(void *info);
+u32 resctrl_arch_event_config_get(struct rdt_mon_domain *d,
+ enum resctrl_event_id eventid);
+
extern unsigned int resctrl_rmid_realloc_threshold;
extern unsigned int resctrl_rmid_realloc_limit;
--
2.34.1
* [PATCH v6 12/22] x86/resctrl: Introduce mbm_cntr_map to track counters at domain
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (10 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 11/22] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 21:37 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 13/22] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
` (10 subsequent siblings)
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
The MBM counters are allocated at the resctrl group level and tracked by
mbm_cntrs_free_map. Each counter is then assigned to domains based on
user input, so the assignment must also be tracked at the domain level.
Add the mbm_cntr_map bitmap in the rdt_mon_domain structure to keep track
of assignment at the domain level. The global counter in mbm_cntrs_free_map
can be released once the assignment has been cleared in all the domains.
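This patch only allocates and frees d->mbm_cntr_map; the release rule stated above (return the counter to the global free map only after every domain has dropped it) might look like the following userspace sketch. NUM_DOMAINS, dom_unassign() and release_if_unused() are hypothetical names, and one uint64_t per domain stands in for the kernel bitmaps.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_DOMAINS 2	/* hypothetical number of L3 monitor domains */

/* Set bit = counter assigned in this domain (models d->mbm_cntr_map). */
static uint64_t dom_cntr_map[NUM_DOMAINS];
/* Set bit = counter free globally (models mbm_cntrs_free_map). */
static uint64_t global_free_map;

static void dom_unassign(unsigned int dom, unsigned int cntr_id)
{
	dom_cntr_map[dom] &= ~(UINT64_C(1) << cntr_id);
}

/* Release the global counter only if no domain still has it assigned. */
static bool release_if_unused(unsigned int cntr_id)
{
	for (unsigned int i = 0; i < NUM_DOMAINS; i++)
		if (dom_cntr_map[i] & (UINT64_C(1) << cntr_id))
			return false;

	global_free_map |= UINT64_C(1) << cntr_id;
	return true;
}
```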
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: New patch to add domain level assignment.
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 ++++++++++
include/linux/resctrl.h | 2 ++
2 files changed, 12 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0047b4eb0ff5..1a90c671a027 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4127,6 +4127,7 @@ static void __init rdtgroup_setup_default(void)
static void domain_destroy_mon_state(struct rdt_mon_domain *d)
{
+ bitmap_free(d->mbm_cntr_map);
bitmap_free(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
@@ -4200,6 +4201,15 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
return -ENOMEM;
}
}
+ if (is_mbm_enabled()) {
+ d->mbm_cntr_map = bitmap_zalloc(r->mon.num_mbm_cntrs, GFP_KERNEL);
+ if (!d->mbm_cntr_map) {
+ bitmap_free(d->rmid_busy_llc);
+ kfree(d->mbm_total);
+ kfree(d->mbm_local);
+ return -ENOMEM;
+ }
+ }
return 0;
}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index ef08f75191f2..034fa994e84f 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -105,6 +105,7 @@ struct rdt_ctrl_domain {
* @cqm_limbo: worker to periodically read CQM h/w counters
* @mbm_work_cpu: worker CPU for MBM h/w counters
* @cqm_work_cpu: worker CPU for CQM h/w counters
+ * @mbm_cntr_map: bitmap to track domain counter assignment
*/
struct rdt_mon_domain {
struct rdt_domain_hdr hdr;
@@ -116,6 +117,7 @@ struct rdt_mon_domain {
struct delayed_work cqm_limbo;
int mbm_work_cpu;
int cqm_work_cpu;
+ unsigned long *mbm_cntr_map;
};
/**
--
2.34.1
* [PATCH v6 13/22] x86/resctrl: Add data structures and definitions for ABMC assignment
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (11 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 12/22] x86/resctrl: Introduce mbm_cntr_map to track counters at domain Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 21:38 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 14/22] x86/resctrl: Introduce cntr_id in mongroup for assignments Babu Moger
` (9 subsequent siblings)
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as the counter
is assigned. The bandwidth events will be tracked by the hardware until
the user changes the configuration. Each resctrl group can configure a
maximum of two counters: one for the total event and one for the local event.
The ABMC feature implements a pair of MSRs, L3_QOS_ABMC_CFG (C000_03FDh)
and L3_QOS_ABMC_DSC (C000_03FEh). The counters are configured by writing
to MSR L3_QOS_ABMC_CFG. Configuration is done by setting the counter id,
bandwidth source (RMID) and bandwidth configuration supported by BMEC
(Bandwidth Monitoring Event Configuration).
L3_QOS_ABMC_DSC is a read-only MSR. Reading L3_QOS_ABMC_DSC returns the
configuration of the counter id specified in L3_QOS_ABMC_CFG.cntr_id
with the RMID (bw_src) and event configuration (bw_type).
Attempts to read or write these MSRs when ABMC is not enabled will result
in a #GP(0) exception.
Introduce data structures and definitions for ABMC MSRs.
MSR L3_QOS_ABMC_CFG (0xC000_03FDh) and L3_QOS_ABMC_DSC (0xC000_03FEh)
details.
=========================================================================
Bits Mnemonic Description Access Reset
Type Value
=========================================================================
63 CfgEn Configuration Enable R/W 0
62 CtrEn Enable/disable Tracking R/W 0
61:53 – Reserved MBZ 0
52:48 CtrID Counter Identifier R/W 0
47 IsCOS BwSrc field is a CLOSID R/W 0
(not an RMID)
46:44 – Reserved MBZ 0
43:32 BwSrc Bandwidth Source R/W 0
(RMID or CLOSID)
31:0 BwType Bandwidth configuration R/W 0
to track for this counter
==========================================================================
Configuration and tracking:
CfgEn=1,CtrEn=0 : Configure CtrID but do not track events yet.
CfgEn=1,CtrEn=1 : Configure CtrID and start tracking events.
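The bit layout in the table above can be checked with explicit shifts. This is a sketch, not the patch's code: the MODEL_* macros and model_abmc_cfg() are hypothetical, while the patch uses union l3_qos_abmc_cfg, whose bitfield order depends on the compiler allocating fields from the least significant bit (as GCC does on x86-64). Shifts make the layout explicit regardless of bitfield rules.

```c
#include <stdint.h>

/* Field positions taken from the L3_QOS_ABMC_CFG table. */
#define MODEL_ABMC_CFG_EN     (UINT64_C(1) << 63)
#define MODEL_ABMC_CTR_EN     (UINT64_C(1) << 62)
#define MODEL_ABMC_CTR_ID(x)  (((uint64_t)(x) & 0x1f) << 48)	/* bits 52:48 */
#define MODEL_ABMC_IS_COS     (UINT64_C(1) << 47)		/* unused: RMID mode */
#define MODEL_ABMC_BW_SRC(x)  (((uint64_t)(x) & 0xfff) << 32)	/* bits 43:32 */
#define MODEL_ABMC_BW_TYPE(x) ((uint64_t)(x) & 0xffffffffu)	/* bits 31:0 */

static uint64_t model_abmc_cfg(uint32_t cntr_id, uint32_t rmid,
			       uint32_t bw_type, int track)
{
	/* CfgEn is always set; CtrEn selects configure-only vs. tracking. */
	uint64_t val = MODEL_ABMC_CFG_EN | MODEL_ABMC_CTR_ID(cntr_id) |
		       MODEL_ABMC_BW_SRC(rmid) | MODEL_ABMC_BW_TYPE(bw_type);

	if (track)
		val |= MODEL_ABMC_CTR_EN;
	return val;	/* IsCOS stays 0: BwSrc holds an RMID */
}
```

For example, counter 3 tracking RMID 5 with all seven BMEC event bits set encodes as 0xC00300050000007F; with CtrEn cleared it becomes 0x800300050000007F.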
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Removed all the fs related changes.
Added note on CfgEn,CtrEn.
Removed the definitions which are not used.
Removed cntr_id initialization.
v5: Moved assignment flags here (patch 10/19 of v4).
Added MON_CNTR_UNSET definition to initialize cntr_id's.
More details in commit log.
Renamed few fields in l3_qos_abmc_cfg for readability.
v4: Added more descriptions.
Changed the name abmc_ctr_id to ctr_id.
Added L3_QOS_ABMC_DSC. Used for reading the configuration.
v3: No changes.
v2: No changes.
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/cpu/resctrl/internal.h | 26 ++++++++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index d86469bf5d41..5b3931a59d5a 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1183,6 +1183,8 @@
#define MSR_IA32_SMBA_BW_BASE 0xc0000280
#define MSR_IA32_EVT_CFG_BASE 0xc0000400
#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
+#define MSR_IA32_L3_QOS_ABMC_CFG 0xc00003fd
+#define MSR_IA32_L3_QOS_ABMC_DSC 0xc00003fe
/* MSR_IA32_VMX_MISC bits */
#define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1021227d8c7e..af3efa35a62e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -589,6 +589,32 @@ union cpuid_0x10_x_edx {
unsigned int full;
};
+/*
+ * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
+ * @bw_type : Bandwidth configuration (supported by BMEC)
+ * tracked by the @cntr_id.
+ * @bw_src : Bandwidth source (RMID or CLOSID).
+ * @reserved1 : Reserved.
+ * @is_clos : @bw_src field is a CLOSID (not an RMID).
+ * @cntr_id : Counter identifier.
+ * @reserved : Reserved.
+ * @cntr_en : Tracking enable bit.
+ * @cfg_en : Configuration enable bit.
+ */
+union l3_qos_abmc_cfg {
+ struct {
+ unsigned long bw_type :32,
+ bw_src :12,
+ reserved1: 3,
+ is_clos : 1,
+ cntr_id : 5,
+ reserved : 9,
+ cntr_en : 1,
+ cfg_en : 1;
+ } split;
+ unsigned long full;
+};
+
void rdt_last_cmd_clear(void);
void rdt_last_cmd_puts(const char *s);
__printf(1, 2)
--
2.34.1
* [PATCH v6 14/22] x86/resctrl: Introduce cntr_id in mongroup for assignments
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (12 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 13/22] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 21:38 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 15/22] x86/resctrl: Add the interface to assign a hardware counter Babu Moger
` (8 subsequent siblings)
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
The mbm_cntr_assignable feature provides an option to the user to assign a
hardware counter to an RMID and monitor the bandwidth as long as the
counter is assigned. There can be two counters per monitor group, one
for the total event and another for the local event.
Introduce cntr_id to manage the assignments.
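The cntr_id array and its index convention can be sketched in userspace as follows. The model_* names are hypothetical; the event ID values (1 for occupancy, 2 for MBM total, 3 for MBM local) follow enum resctrl_event_id, and the index mapping mirrors mon_event_config_index_get(). Initializing the slots in a loop, rather than writing cntr_id[0] and cntr_id[1] by hand, stays correct if MAX_CNTRS ever changes.

```c
#include <limits.h>
#include <stdint.h>

#define MAX_CNTRS            2
#define MON_CNTR_UNSET       UINT32_MAX
#define INVALID_CONFIG_INDEX UINT_MAX

enum model_event_id { MODEL_OCCUP = 1, MODEL_MBM_TOTAL = 2, MODEL_MBM_LOCAL = 3 };

struct model_mongroup {
	uint32_t rmid;
	uint32_t cntr_id[MAX_CNTRS];	/* [0] total event, [1] local event */
};

/* Same mapping as mon_event_config_index_get(). */
static unsigned int model_cfg_index(enum model_event_id evtid)
{
	switch (evtid) {
	case MODEL_MBM_TOTAL:
		return 0;
	case MODEL_MBM_LOCAL:
		return 1;
	default:
		return INVALID_CONFIG_INDEX;
	}
}

static void model_mongroup_init(struct model_mongroup *g, uint32_t rmid)
{
	g->rmid = rmid;
	for (unsigned int i = 0; i < MAX_CNTRS; i++)
		g->cntr_id[i] = MON_CNTR_UNSET;	/* no counter assigned yet */
}
```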
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: New patch.
Separated FS and arch bits.
---
arch/x86/kernel/cpu/resctrl/internal.h | 7 +++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++++++
2 files changed, 13 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index af3efa35a62e..d93082b65d69 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -41,6 +41,11 @@
/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
#define ABMC_ENABLE_BIT 0
+/* Maximum assignable counters per resctrl group */
+#define MAX_CNTRS 2
+
+#define MON_CNTR_UNSET U32_MAX
+
/**
* cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
* aren't marked nohz_full
@@ -210,12 +215,14 @@ enum rdtgrp_mode {
* @parent: parent rdtgrp
* @crdtgrp_list: child rdtgroup node list
* @rmid: rmid for this rdtgroup
+ * @cntr_id: Counter ids for assignment
*/
struct mongroup {
struct kernfs_node *mon_data_kn;
struct rdtgroup *parent;
struct list_head crdtgrp_list;
u32 rmid;
+ u32 cntr_id[MAX_CNTRS];
};
/**
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1a90c671a027..60696b248b56 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3564,6 +3564,9 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
}
rdtgrp->mon.rmid = ret;
+ rdtgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+ rdtgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+
ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
if (ret) {
rdt_last_cmd_puts("kernfs subdir error\n");
@@ -4118,6 +4121,9 @@ static void __init rdtgroup_setup_default(void)
rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
rdtgroup_default.type = RDTCTRL_GROUP;
+ rdtgroup_default.mon.cntr_id[0] = MON_CNTR_UNSET;
+ rdtgroup_default.mon.cntr_id[1] = MON_CNTR_UNSET;
+
INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
--
2.34.1
* [PATCH v6 15/22] x86/resctrl: Add the interface to assign a hardware counter
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (13 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 14/22] x86/resctrl: Introduce cntr_id in mongroup for assignments Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 16:30 ` James Morse
2024-08-16 21:41 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 16/22] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
` (7 subsequent siblings)
22 siblings, 2 replies; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
The assigned RMID will be tracked by the hardware until the user unassigns
it manually.
Counters are configured by writing to L3_QOS_ABMC_CFG MSR and
specifying the counter id, bandwidth source, and bandwidth types.
Provide the interface to assign counter IDs to an RMID.
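The assignment flow (allocate a counter for the event if the group does not have one yet, then program it in every monitor domain and mark it in that domain's map) can be modeled in userspace as below. Everything here is hypothetical: NUM_CNTRS and NUM_DOMAINS are example sizes, cfg_writes counts the per-domain L3_QOS_ABMC_CFG writes that resctrl_arch_assign_cntr() would trigger, and the model propagates -ENOSPC, whereas rdtgroup_assign_cntr() in the patch returns -EINVAL on allocation failure. That difference is a choice made for this sketch.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_CNTRS      2
#define MON_CNTR_UNSET UINT32_MAX
#define NUM_CNTRS      2	/* hypothetical r->mon.num_mbm_cntrs */
#define NUM_DOMAINS    2	/* hypothetical number of monitor domains */

static uint64_t free_map = (UINT64_C(1) << NUM_CNTRS) - 1;	/* set = free */
static uint64_t dom_map[NUM_DOMAINS];				/* set = assigned */
static unsigned int cfg_writes;	/* simulated L3_QOS_ABMC_CFG writes */

struct model_group { uint32_t rmid; uint32_t cntr_id[MAX_CNTRS]; };

/* Like rdtgroup_alloc_cntr(): allocate only if the slot is still unset. */
static int model_alloc_cntr(struct model_group *g, unsigned int index)
{
	if (g->cntr_id[index] != MON_CNTR_UNSET)
		return 0;	/* the event already has a counter */

	for (uint32_t id = 0; id < NUM_CNTRS; id++) {
		if (free_map & (UINT64_C(1) << id)) {
			free_map &= ~(UINT64_C(1) << id);
			g->cntr_id[index] = id;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Like rdtgroup_assign_cntr(): allocate, then program every domain. */
static int model_assign_cntr(struct model_group *g, unsigned int index)
{
	int ret = model_alloc_cntr(g, index);

	if (ret)
		return ret;

	for (unsigned int d = 0; d < NUM_DOMAINS; d++) {
		cfg_writes++;	/* one IPI + MSR write per domain */
		dom_map[d] |= UINT64_C(1) << g->cntr_id[index];
	}
	return 0;
}
```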
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Removed mbm_cntr_alloc() from this patch to keep fs and arch code
separate.
Added code to update the counter assignment at domain level.
v5: Few name changes to match cntr_id.
Changed the function names to
rdtgroup_assign_cntr
resctr_arch_assign_cntr
More comments on commit log.
Added function summary.
v4: Commit message update.
Used bitmap APIs where applicable.
Changed the interfaces considering MPAM(arm).
Added domain specific assignment.
v3: Removed the static from the prototype of rdtgroup_assign_abmc.
The function is not called directly from user anymore. These
changes are related to global assignment interface.
v2: Minor text changes in commit message.
---
arch/x86/kernel/cpu/resctrl/internal.h | 4 ++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 97 ++++++++++++++++++++++++++
2 files changed, 101 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index d93082b65d69..4e8109dee174 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -685,6 +685,10 @@ int mbm_cntr_alloc(struct rdt_resource *r);
void mbm_cntr_free(u32 cntr_id);
void resctrl_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
unsigned int mon_event_config_index_get(u32 evtid);
+int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evtid,
+ u32 rmid, u32 cntr_id, u32 closid, bool assign);
+int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
+int rdtgroup_alloc_cntr(struct rdtgroup *rdtgrp, int index);
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 60696b248b56..1ee91a7293a8 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1864,6 +1864,103 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+static void rdtgroup_abmc_cfg(void *info)
+{
+ u64 *msrval = info;
+
+ wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
+}
+
+/*
+ * Send an IPI to the domain to assign the counter id to RMID.
+ */
+int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evtid,
+ u32 rmid, u32 cntr_id, u32 closid, bool assign)
+{
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+ union l3_qos_abmc_cfg abmc_cfg = { 0 };
+ struct arch_mbm_state *arch_mbm;
+
+ abmc_cfg.split.cfg_en = 1;
+ abmc_cfg.split.cntr_en = assign ? 1 : 0;
+ abmc_cfg.split.cntr_id = cntr_id;
+ abmc_cfg.split.bw_src = rmid;
+
+ /* Update the event configuration from the domain */
+ if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
+ abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
+ arch_mbm = &hw_dom->arch_mbm_total[rmid];
+ } else {
+ abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
+ arch_mbm = &hw_dom->arch_mbm_local[rmid];
+ }
+
+ smp_call_function_any(&d->hdr.cpu_mask, rdtgroup_abmc_cfg, &abmc_cfg, 1);
+
+ /*
+ * Reset the architectural state so that reading of hardware
+ * counter is not considered as an overflow in next update.
+ */
+ if (arch_mbm)
+ memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
+
+ return 0;
+}
+
+/* Allocate a new counter id if the event is unassigned */
+int rdtgroup_alloc_cntr(struct rdtgroup *rdtgrp, int index)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ int cntr_id;
+
+ /* Nothing to do if event has been assigned already */
+ if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET) {
+ rdt_last_cmd_puts("ABMC counter is assigned already\n");
+ return 0;
+ }
+
+ /*
+ * Allocate a new counter id and update domains
+ */
+ cntr_id = mbm_cntr_alloc(r);
+ if (cntr_id < 0) {
+ rdt_last_cmd_puts("Out of ABMC counters\n");
+ return -ENOSPC;
+ }
+
+ rdtgrp->mon.cntr_id[index] = cntr_id;
+
+ return 0;
+}
+
+/*
+ * Assign a hardware counter to the group and assign the counter to
+ * all the domains in the group. Allocate a new MBM counter first if
+ * the event does not already have one assigned.
+ */
+int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_mon_domain *d;
+ int index;
+
+ index = mon_event_config_index_get(evtid);
+ if (index == INVALID_CONFIG_INDEX)
+ return -EINVAL;
+
+ if (rdtgroup_alloc_cntr(rdtgrp, index))
+ return -EINVAL;
+
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
+ rdtgrp->mon.cntr_id[index],
+ rdtgrp->closid, true);
+ set_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map);
+ }
+
+ return 0;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
--
2.34.1
* [PATCH v6 16/22] x86/resctrl: Add the interface to unassign a MBM counter
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (14 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 15/22] x86/resctrl: Add the interface to assign a hardware counter Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 21:41 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 17/22] x86/resctrl: Assign/unassign counters by default when ABMC is enabled Babu Moger
` (6 subsequent siblings)
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC (permalink / raw)
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
The assigned RMID is tracked by the hardware until the user unassigns
it manually.
The hardware provides only a limited number of counters. If the system
runs out of assignable counters, the kernel reports an error when a new
assignment is requested. Users need to unassign an already assigned
counter to make room for a new assignment.
Provide the interface to unassign the counter IDs from the group. Free a
counter only when it is no longer assigned in any of the domains.
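The free-when-unused rule can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: the counter and domain counts and every name (`struct cntr_model`, `model_alloc()`, `model_unassign()`, and so on) are hypothetical, and plain `uint32_t` masks stand in for the kernel bitmaps. A counter ID goes back to the global pool only once no domain still has it assigned, which is the check rdtgroup_free_cntr() performs through rdtgroup_mbm_cntr_test().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes for this sketch; real values come from the hardware. */
#define NUM_CNTRS   8
#define NUM_DOMAINS 2
#define CNTR_UNSET  (-1)

struct cntr_model {
	uint32_t free_map;              /* bit set: counter ID is free */
	uint32_t dom_map[NUM_DOMAINS];  /* bit set: counter assigned in that domain */
};

static void model_init(struct cntr_model *m)
{
	m->free_map = (1u << NUM_CNTRS) - 1;
	for (int d = 0; d < NUM_DOMAINS; d++)
		m->dom_map[d] = 0;
}

/* Take the lowest free counter ID, or CNTR_UNSET if the pool is exhausted. */
static int model_alloc(struct cntr_model *m)
{
	for (int id = 0; id < NUM_CNTRS; id++) {
		if (m->free_map & (1u << id)) {
			m->free_map &= ~(1u << id);
			return id;
		}
	}
	return CNTR_UNSET;
}

/* Mark an allocated counter as assigned in every domain. */
static void model_assign_all(struct cntr_model *m, int id)
{
	assert(id >= 0 && id < NUM_CNTRS);
	for (int d = 0; d < NUM_DOMAINS; d++)
		m->dom_map[d] |= 1u << id;
}

/* True if any domain still has the counter assigned. */
static bool model_in_use(const struct cntr_model *m, int id)
{
	for (int d = 0; d < NUM_DOMAINS; d++)
		if (m->dom_map[d] & (1u << id))
			return true;
	return false;
}

/*
 * Unassign the counter from one domain. The ID is returned to the pool
 * only when no domain uses it; the return value is the group's new ID.
 */
static int model_unassign(struct cntr_model *m, int dom, int id)
{
	assert(id >= 0 && id < NUM_CNTRS);
	m->dom_map[dom] &= ~(1u << id);
	if (!model_in_use(m, id)) {
		m->free_map |= 1u << id;
		return CNTR_UNSET;
	}
	return id;
}
```

Unassigning from domain 0 alone keeps the ID reserved because domain 1 still counts with it; only the second unassign releases it for a new assignment.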
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Removed mbm_cntr_free from this patch.
Added counter test in all the domains and free if it is not assigned to
any domains.
v5: Few name changes to match cntr_id.
Changed the function names to
rdtgroup_unassign_cntr
More comments on commit log.
v4: Added domain specific unassign feature.
Few name changes.
v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
The function is not called directly from user anymore. These
changes are related to global assignment interface.
v2: No changes.
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++++++++++++++++++++++
2 files changed, 54 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4e8109dee174..cc832955b787 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -689,6 +689,8 @@ int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evt
u32 rmid, u32 cntr_id, u32 closid, bool assign);
int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
int rdtgroup_alloc_cntr(struct rdtgroup *rdtgrp, int index);
+int rdtgroup_unassign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
+void rdtgroup_free_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp, int index);
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1ee91a7293a8..0c2215dbd497 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1961,6 +1961,58 @@ int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
return 0;
}
+static int rdtgroup_mbm_cntr_test(struct rdt_resource *r, u32 cntr_id)
+{
+ struct rdt_mon_domain *d;
+
+ list_for_each_entry(d, &r->mon_domains, hdr.list)
+ if (test_bit(cntr_id, d->mbm_cntr_map))
+ return 1;
+
+ return 0;
+}
+
+/* Free the counter id after the event is unassigned */
+void rdtgroup_free_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+ int index)
+{
+ /* Free the counter only if no domain still has it assigned */
+ if (!rdtgroup_mbm_cntr_test(r, rdtgrp->mon.cntr_id[index])) {
+ mbm_cntr_free(rdtgrp->mon.cntr_id[index]);
+ rdtgrp->mon.cntr_id[index] = MON_CNTR_UNSET;
+ }
+}
+
+/*
+ * Unassign a hardware counter from the group and update all the domains
+ * in the group.
+ */
+int rdtgroup_unassign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_mon_domain *d;
+ int index;
+
+ index = mon_event_config_index_get(evtid);
+ if (index == INVALID_CONFIG_INDEX)
+ return -EINVAL;
+
+ if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
+ rdtgrp->mon.cntr_id[index],
+ rdtgrp->closid, false);
+ clear_bit(rdtgrp->mon.cntr_id[index],
+ d->mbm_cntr_map);
+ }
+
+ /* Free the counter at group level */
+ rdtgroup_free_cntr(r, rdtgrp, index);
+ }
+
+ return 0;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
--
2.34.1
* [PATCH v6 17/22] x86/resctrl: Assign/unassign counters by default when ABMC is enabled
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (15 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 16/22] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 21:42 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 18/22] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode Babu Moger
` (5 subsequent siblings)
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC (permalink / raw)
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Assign counters when a resctrl group is created and unassign them when
it is deleted. Two counters are required per group: one for the total
event and one for the local event.
Only a limited number of counters is available for assignment. If the
counters are exhausted, report a warning and continue; there is no need
to fail group creation because of an assignment failure. Users have the
option to modify the assignments later.
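As a rough illustration of the intended behavior, the hypothetical model below creates groups against a pool with a fixed number of counters. All names and the sequential allocator are assumptions for this sketch, not kernel code. Like rdtgroup_assign_cntrs() in this patch, it tries the local event only if the total event succeeded, and group creation reports success even when no counter could be assigned.

```c
#include <assert.h>
#include <stdbool.h>

#define CNTR_UNSET (-1)

enum { EVT_TOTAL = 0, EVT_LOCAL = 1, NUM_EVTS = 2 };

/* Hypothetical pool: hands out IDs 0..limit-1 in order. */
struct pool {
	int next;
	int limit;
};

struct group {
	int cntr_id[NUM_EVTS];
};

static int pool_alloc(struct pool *p)
{
	return p->next < p->limit ? p->next++ : CNTR_UNSET;
}

/* Assign one event; an already assigned event is left alone. */
static int assign_one(struct pool *p, struct group *g, int evt)
{
	assert(evt >= 0 && evt < NUM_EVTS);
	if (g->cntr_id[evt] != CNTR_UNSET)
		return 0;
	int id = pool_alloc(p);
	if (id == CNTR_UNSET)
		return -1;      /* out of counters */
	g->cntr_id[evt] = id;
	return 0;
}

/* Same shape as rdtgroup_assign_cntrs(): stop at the first failure. */
static int group_assign_cntrs(struct pool *p, struct group *g,
			      bool total_en, bool local_en)
{
	int ret = 0;

	if (total_en)
		ret = assign_one(p, g, EVT_TOTAL);
	if (!ret && local_en)
		ret = assign_one(p, g, EVT_LOCAL);
	return ret;
}

/* Group creation never fails because counters ran out. */
static int group_create(struct pool *p, struct group *g)
{
	g->cntr_id[EVT_TOTAL] = CNTR_UNSET;
	g->cntr_id[EVT_LOCAL] = CNTR_UNSET;
	(void)group_assign_cntrs(p, g, true, true);  /* failure ignored on purpose */
	return 0;
}
```

With three counters, the first group gets both events, the second group gets only the total event, and a third group is still created with no counters, leaving the user to reassign later.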
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Removed the redundant comments on all the calls of
rdtgroup_assign_cntrs. Updated the commit message.
Dropped printing error message on every call of rdtgroup_assign_cntrs.
v5: Removed the code to enable/disable ABMC during the mount.
That will be another patch.
Added arch callers to get the arch specific data.
Renamed functions to match the other ABMC functions.
Added code comments for assignment failures.
v4: Few name changes based on the upstream discussion.
Commit message update.
v3: This is a new patch. Patch addresses the upstream comment to enable
ABMC feature by default if the feature is available.
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 55 ++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0c2215dbd497..d93c1d784b91 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2908,6 +2908,46 @@ static void schemata_list_destroy(void)
}
}
+/*
+ * Called when a new group is created. Assign the counters if ABMC is
+ * already enabled. Two counters are required per group, one for the
+ * total event and one for the local event. With a limited number of
+ * counters, the assignments can fail in some cases, but there is no
+ * need to fail the group creation. Users have the option to modify
+ * the assignments after the group creation.
+ */
+static int rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
+{
+ int ret = 0;
+
+ if (!resctrl_arch_get_abmc_enabled())
+ return 0;
+
+ if (is_mbm_total_enabled())
+ ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+ if (!ret && is_mbm_local_enabled())
+ ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+ return ret;
+}
+
+static int rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
+{
+ int ret = 0;
+
+ if (!resctrl_arch_get_abmc_enabled())
+ return 0;
+
+ if (is_mbm_total_enabled())
+ ret = rdtgroup_unassign_cntr(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+ if (!ret && is_mbm_local_enabled())
+ ret = rdtgroup_unassign_cntr(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+ return ret;
+}
+
static int rdt_get_tree(struct fs_context *fc)
{
struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -2969,6 +3009,8 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret < 0)
goto out_mongrp;
rdtgroup_default.mon.mon_data_kn = kn_mondata;
+
+ rdtgroup_assign_cntrs(&rdtgroup_default);
}
ret = rdt_pseudo_lock_init();
@@ -2999,6 +3041,7 @@ static int rdt_get_tree(struct fs_context *fc)
out_psl:
rdt_pseudo_lock_release();
out_mondata:
+ rdtgroup_unassign_cntrs(&rdtgroup_default);
if (resctrl_arch_mon_capable())
kernfs_remove(kn_mondata);
out_mongrp:
@@ -3258,6 +3301,8 @@ static void rdt_kill_sb(struct super_block *sb)
resctrl_arch_disable_alloc();
if (resctrl_arch_mon_capable())
resctrl_arch_disable_mon();
+
+ rdtgroup_unassign_cntrs(&rdtgroup_default);
resctrl_mounted = false;
kernfs_kill_sb(sb);
mutex_unlock(&rdtgroup_mutex);
@@ -3849,6 +3894,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
goto out_unlock;
}
+ rdtgroup_assign_cntrs(rdtgrp);
+
kernfs_activate(rdtgrp->kn);
/*
@@ -3893,6 +3940,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
if (ret)
goto out_closid_free;
+ rdtgroup_assign_cntrs(rdtgrp);
+
kernfs_activate(rdtgrp->kn);
ret = rdtgroup_init_alloc(rdtgrp);
@@ -3918,6 +3967,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
out_del_list:
list_del(&rdtgrp->rdtgroup_list);
out_rmid_free:
+ rdtgroup_unassign_cntrs(rdtgrp);
mkdir_rdt_prepare_rmid_free(rdtgrp);
out_closid_free:
closid_free(closid);
@@ -3988,6 +4038,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
update_closid_rmid(tmpmask, NULL);
rdtgrp->flags = RDT_DELETED;
+
+ rdtgroup_unassign_cntrs(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
/*
@@ -4034,6 +4087,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
update_closid_rmid(tmpmask, NULL);
+ rdtgroup_unassign_cntrs(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
closid_free(rdtgrp->closid);
--
2.34.1
* [PATCH v6 18/22] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (16 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 17/22] x86/resctrl: Assign/unassign counters by default when ABMC is enabled Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 21:42 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
` (4 subsequent siblings)
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC (permalink / raw)
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
In ABMC mode, a hardware counter must be assigned to an MBM event before
the event can be read.
Report "Unassigned" if the user attempts to read an event without
assigning a counter to it.
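The read-side decision can be sketched in userspace C. The event enum, the index mapping (0 for total, 1 for local), and both helper names are hypothetical. As a simplification, any other non-zero error is printed as "Error"; the exact condition for "Error" in the real rdtgroup_mondata_show() lies outside the hunk shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CNTR_UNSET (-1)

/* Hypothetical event IDs for this sketch. */
enum evt { EVT_OCCUP, EVT_MBM_TOTAL, EVT_MBM_LOCAL };

/*
 * Decide before touching the hardware: in ABMC mode an MBM event with
 * no assigned counter yields -ENOENT. LLC occupancy is never affected.
 */
static int check_assigned(bool abmc_enabled, enum evt e, const int cntr_id[2])
{
	assert(cntr_id != NULL);
	if (!abmc_enabled || e == EVT_OCCUP)
		return 0;
	int index = (e == EVT_MBM_TOTAL) ? 0 : 1;
	return cntr_id[index] == CNTR_UNSET ? -ENOENT : 0;
}

/* Render the result as the mon_data file would show it. */
static void format_result(int err, unsigned long long val, char *buf, size_t len)
{
	if (err == -EINVAL)
		snprintf(buf, len, "Unavailable\n");
	else if (err == -ENOENT)
		snprintf(buf, len, "Unassigned\n");
	else if (err)
		snprintf(buf, len, "Error\n");  /* simplification: any other error */
	else
		snprintf(buf, len, "%llu\n", val);
}
```

Checking the assignment up front avoids an IPI to read a counter that cannot hold meaningful data.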
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Added more explanation in resctrl.rst
Added checks to detect "Unassigned" before reading RMID.
v5: New patch.
---
Documentation/arch/x86/resctrl.rst | 11 +++++++++++
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 ++++++++++++-
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index fe9f10766c4f..aea440ee6107 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -294,6 +294,17 @@ with the following files:
"num_mbm_cntrs":
The number of monitoring counters available for assignment.
+ The resctrl subsystem provides an interface to count a maximum of two
+ MBM events per group, from a combination of total and local events.
+ Keeping the current interface, users can assign a maximum of two
+ monitoring counters per group. Users also have the option to assign
+ only one counter to the group.
+
+ With a limited number of counters, the system can run out of assignable counters.
+ In mbm_cntr_assign mode, the MBM event counters will return "Unassigned" if
+ the counter is not assigned to the event when read. Users need to assign a
+ counter manually to read the events.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 50fa1fe9a073..ea918ce7c3ef 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -562,7 +562,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
union mon_data_bits md;
- int ret = 0;
+ int ret = 0, index;
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (!rdtgrp) {
@@ -576,6 +576,15 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
evtid = md.u.evtid;
r = &rdt_resources_all[resid].r_resctrl;
+ if (resctrl_arch_get_abmc_enabled() && evtid != QOS_L3_OCCUP_EVENT_ID) {
+ index = mon_event_config_index_get(evtid);
+ if (index != INVALID_CONFIG_INDEX &&
+ rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET) {
+ rr.err = -ENOENT;
+ goto checkresult;
+ }
+ }
+
if (md.u.sum) {
/*
* This file requires summing across all domains that share
@@ -613,6 +622,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
seq_puts(m, "Error\n");
else if (rr.err == -EINVAL)
seq_puts(m, "Unavailable\n");
+ else if (rr.err == -ENOENT)
+ seq_puts(m, "Unassigned\n");
else
seq_printf(m, "%llu\n", rr.val);
--
2.34.1
* [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (17 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 18/22] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 16:31 ` James Morse
2024-08-16 21:42 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported Babu Moger
` (3 subsequent siblings)
22 siblings, 2 replies; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC (permalink / raw)
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Introduce an interface to switch between ABMC and legacy modes.
By default, ABMC is enabled at boot if the feature is available.
Provide the interface to go back to legacy mode if required.
$ cat /sys/fs/resctrl/info/L3_MON/mbm_mode
[mbm_cntr_assign]
legacy
To enable the "mbm_cntr_assign" mode:
$ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_mode
To enable the legacy monitoring feature:
$ echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
The MBM event counters are reset when mbm_mode is changed.
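The input handling of the new write handler can be modeled in userspace C. This is a hedged sketch: `struct mode_state`, its reset counter, and `mbm_mode_write()` are hypothetical, and the locking taken by the real rdtgroup_mbm_mode_write() (cpus_read_lock() and rdtgroup_mutex) is omitted. It requires the trailing newline that `echo` appends and resets counter state only on an actual switch into mbm_cntr_assign.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct mode_state {
	bool cntr_assign;   /* true: mbm_cntr_assign, false: legacy */
	int resets;         /* hypothetical: number of counter-state resets */
};

/*
 * Returns nbytes on success or a negative errno, like a kernfs write
 * handler. The buffer is modified in place (newline replaced by NUL).
 */
static long mbm_mode_write(struct mode_state *s, char *buf, size_t nbytes)
{
	assert(s != NULL && buf != NULL);

	/* Valid input requires a trailing newline */
	if (nbytes == 0 || buf[nbytes - 1] != '\n')
		return -EINVAL;
	buf[nbytes - 1] = '\0';

	if (!strcmp(buf, "legacy")) {
		s->cntr_assign = false;
	} else if (!strcmp(buf, "mbm_cntr_assign")) {
		if (!s->cntr_assign) {
			s->resets++;        /* stale counter IDs are dropped */
			s->cntr_assign = true;
		}
	} else {
		return -EINVAL;
	}
	return (long)nbytes;
}
```

Writing the mode that is already active is accepted and does nothing, so repeating the same `echo` is harmless.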
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Changed the mode name to mbm_cntr_assign.
Moved all the FS related code here.
Added changes to reset mbm_cntr_map and resctrl group counters.
v5: Change log and mode description text correction.
v4: Minor commit text changes. Keep the default to ABMC when supported.
Fixed comments to reflect changed interface "mbm_mode".
v3: New patch to address the review comments from upstream.
---
Documentation/arch/x86/resctrl.rst | 15 +++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 62 +++++++++++++++++++++++++-
2 files changed, 76 insertions(+), 1 deletion(-)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index aea440ee6107..d6d6a8276401 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -291,6 +291,21 @@ with the following files:
as long as there are enough RMID counters available to support number
of monitoring groups.
+ * To enable the ABMC feature:
+ ::
+
+ # echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_mode
+
+ * To enable the legacy monitoring feature:
+ ::
+
+ # echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
+
+ The MBM event counters are reset when mbm_mode is changed. After moving
+ to mbm_cntr_assign mode, users must assign counters to the events before
+ reading them; otherwise, the MBM event counters return "Unassigned" when
+ read.
+
"num_mbm_cntrs":
The number of monitoring counters available for assignment.
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d93c1d784b91..66febff2a3d3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -896,6 +896,65 @@ static int rdtgroup_mbm_mode_show(struct kernfs_open_file *of,
return 0;
}
+static void rdtgroup_mbm_cntr_reset(struct rdt_resource *r)
+{
+ struct rdtgroup *prgrp, *crgrp;
+ struct rdt_mon_domain *dom;
+
+ mbm_cntrs_init(r);
+
+ list_for_each_entry(dom, &r->mon_domains, hdr.list)
+ bitmap_zero(dom->mbm_cntr_map, r->mon.num_mbm_cntrs);
+
+ /* Reset the cntr_ids of all the monitor groups */
+ list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
+ prgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+ prgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+ list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list,
+ mon.crdtgrp_list) {
+ crgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+ crgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+ }
+ }
+}
+
+static ssize_t rdtgroup_mbm_mode_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes,
+ loff_t off)
+{
+ int mbm_cntr_assign = resctrl_arch_get_abmc_enabled();
+ struct rdt_resource *r = of->kn->parent->priv;
+ int ret = 0;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+
+ buf[nbytes - 1] = '\0';
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ rdt_last_cmd_clear();
+
+ if (!strcmp(buf, "legacy")) {
+ if (mbm_cntr_assign)
+ resctrl_arch_mbm_cntr_assign_disable();
+ } else if (!strcmp(buf, "mbm_cntr_assign")) {
+ if (!mbm_cntr_assign) {
+ rdtgroup_mbm_cntr_reset(r);
+ ret = resctrl_arch_mbm_cntr_assign_enable();
+ }
+ } else {
+ ret = -EINVAL;
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
+ return ret ?: nbytes;
+}
+
static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
struct seq_file *s, void *v)
{
@@ -2127,9 +2186,10 @@ static struct rftype res_common_files[] = {
},
{
.name = "mbm_mode",
- .mode = 0444,
+ .mode = 0644,
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = rdtgroup_mbm_mode_show,
+ .write = rdtgroup_mbm_mode_write,
.fflags = RFTYPE_MON_INFO,
},
{
--
2.34.1
* [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (18 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 16:32 ` James Morse
2024-08-16 22:33 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 21/22] x86/resctrl: Introduce interface to list monitor states of all the groups Babu Moger
` (2 subsequent siblings)
22 siblings, 2 replies; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC (permalink / raw)
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Enable ABMC by default during boot when the feature is supported.
Users will not see any difference in behavior when resctrl is mounted.
With automatic counter assignment, monitoring works as it does in the
legacy monitor mode.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6 : Keeping the default enablement in arch init code for now.
This may need some discussion.
Renamed resctrl_arch_configure_abmc to resctrl_arch_mbm_cntr_assign_configure.
v5: New patch to enable ABMC by default.
---
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 17 +++++++++++++++++
3 files changed, 20 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6fb0cfdb5529..a7980f84c487 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -599,6 +599,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
d = container_of(hdr, struct rdt_mon_domain, hdr);
cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+ resctrl_arch_mbm_cntr_assign_configure();
return;
}
@@ -620,6 +621,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
arch_mon_domain_online(r, d);
resctrl_mbm_evt_config_init(hw_dom);
+ resctrl_arch_mbm_cntr_assign_configure();
if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
mon_domain_free(hw_dom);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index cc832955b787..ba3012f8f940 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -685,6 +685,7 @@ int mbm_cntr_alloc(struct rdt_resource *r);
void mbm_cntr_free(u32 cntr_id);
void resctrl_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
unsigned int mon_event_config_index_get(u32 evtid);
+void resctrl_arch_mbm_cntr_assign_configure(void);
int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evtid,
u32 rmid, u32 cntr_id, u32 closid, bool assign);
int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 66febff2a3d3..d15fd1bde5f4 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2756,6 +2756,23 @@ void resctrl_arch_mbm_cntr_assign_disable(void)
}
}
+void resctrl_arch_mbm_cntr_assign_configure(void)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ bool enable = true;
+
+ mutex_lock(&rdtgroup_mutex);
+
+ if (r->mon.mbm_cntr_assignable) {
+ if (!hw_res->mbm_cntr_assign_enabled)
+ hw_res->mbm_cntr_assign_enabled = true;
+ resctrl_abmc_set_one_amd(&enable);
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+}
+
/*
* We don't allow rdtgroup directories to be created anywhere
* except the root directory. Thus when looking for the rdtgroup
--
2.34.1
* [PATCH v6 21/22] x86/resctrl: Introduce interface to list monitor states of all the groups
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (19 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 16:28 ` James Morse
2024-08-06 22:00 ` [PATCH v6 22/22] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
2024-08-16 21:28 ` [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC (permalink / raw)
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Provide the interface to list the monitor states of all the resctrl
groups in ABMC mode.
Example:
$ cat /sys/fs/resctrl/info/L3_MON/mbm_control
The list uses the following format:
"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
Format for specific type of groups:
- Default CTRL_MON group:
"//<domain_id>=<flags>"
- Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id>=<flags>"
- Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id>=<flags>"
- Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
Flags can be one of the following:
t MBM total event is enabled
l MBM local event is enabled
tl Both total and local MBM events are enabled
_ None of the MBM events are enabled
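A minimal userspace sketch of how one line of this listing could be formatted, assuming a hypothetical `struct dom_state` that already records whether the total and local counters are assigned in each domain. The empty-string convention for the default CTRL_MON group and for CTRL_MON groups themselves follows the format described above; the helper names are illustrative only, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Per-domain assignment state for one group (hypothetical layout). */
struct dom_state {
	int id;
	bool total;   /* total-event counter assigned in this domain */
	bool local;   /* local-event counter assigned in this domain */
};

/* Build "t", "l", "tl" or "_"; out must hold at least 3 bytes. */
static char *flags_str(const struct dom_state *d, char *out)
{
	char *p = out;

	if (d->total)
		*p++ = 't';
	if (d->local)
		*p++ = 'l';
	if (p == out)
		*p++ = '_';
	*p = '\0';
	return out;
}

/*
 * Format "<CTRL_MON group>/<MON group>/<id>=<flags>;..." into buf.
 * ctrl is "" for the default CTRL_MON group; mon is "" for a CTRL_MON
 * group itself. Returns the snprintf-style length.
 */
static int format_line(char *buf, size_t len, const char *ctrl, const char *mon,
		       const struct dom_state *doms, int ndoms)
{
	char flags[3];

	assert(buf != NULL && len > 0);
	int n = snprintf(buf, len, "%s/%s/", ctrl, mon);
	for (int i = 0; i < ndoms && n >= 0 && (size_t)n < len; i++)
		n += snprintf(buf + n, len - n, "%d=%s;", doms[i].id,
			      flags_str(&doms[i], flags));
	return n;
}
```

For the default CTRL_MON group with both events in domain 0 and none in domain 1, this produces `//0=tl;1=_;`.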
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: The domain specific assignment can be determined looking at mbm_cntr_map.
Removed rdtgroup_abmc_dom_cfg() and rdtgroup_abmc_dom_state().
Removed the switch statement for the domain_state detection.
Determined the flags incrementally.
Removed special handling of default group while printing.
v5: Replaced "assignment flags" with "flags".
Changes related to mon structure.
Changes related to renaming the interface from mbm_assign_control to
mbm_control.
v4: Added functionality to query domain-specific assignment in
rdtgroup_abmc_dom_state().
v3: New patch.
Addresses the feedback to provide the global assignment interface.
https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
---
Documentation/arch/x86/resctrl.rst | 45 ++++++++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 75 ++++++++++++++++++++++++++
3 files changed, 121 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index d6d6a8276401..113c22ba6db3 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -320,6 +320,51 @@ with the following files:
the counter is not assigned to the event when read. Users need to assign a
counter manually to read the events.
+"mbm_control":
+ Reports the monitor status of each resctrl group.
+
+ The list uses the following format:
+ "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
+
+ Format for specific type of groups:
+
+ * Default CTRL_MON group:
+ "//<domain_id>=<flags>"
+
+ * Non-default CTRL_MON group:
+ "<CTRL_MON group>//<domain_id>=<flags>"
+
+ * Child MON group of default CTRL_MON group:
+ "/<MON group>/<domain_id>=<flags>"
+
+ * Child MON group of non-default CTRL_MON group:
+ "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
+
+ Flags can be one of the following:
+ ::
+
+ t MBM total event is enabled.
+ l MBM local event is enabled.
+ tl Both total and local MBM events are enabled.
+ _ None of the MBM events are enabled.
+
+ Examples:
+ ::
+
+ # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
+ # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
+ # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=tl;
+
+ There are four resctrl groups. All of them have the total and local MBM
+ events enabled on domains 0 and 1.
+
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 2f4d0c12b80d..87537abedb01 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1244,6 +1244,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
r->mon.num_mbm_cntrs = 64;
resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
+ resctrl_file_fflags_init("mbm_control", RFTYPE_MON_INFO);
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d15fd1bde5f4..d7aadca5e4ab 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -965,6 +965,75 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
return 0;
}
+static char *rdtgroup_mon_state_to_str(struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, char *str)
+{
+ char *tmp = str;
+ int index;
+
+ /*
+ * Query the monitor state for the domain.
+ * Index 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
+ * Index 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
+ */
+ index = mon_event_config_index_get(QOS_L3_MBM_TOTAL_EVENT_ID);
+ if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET &&
+ test_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map))
+ *tmp++ = 't';
+
+ index = mon_event_config_index_get(QOS_L3_MBM_LOCAL_EVENT_ID);
+ if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET &&
+ test_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map))
+ *tmp++ = 'l';
+
+ if (tmp == str)
+ *tmp++ = '_';
+
+ *tmp = '\0';
+ return str;
+}
+
+static int rdtgroup_mbm_control_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ struct rdt_mon_domain *dom;
+ struct rdtgroup *rdtg;
+ char str[10];
+
+ if (!hw_res->mbm_cntr_assign_enabled) {
+ rdt_last_cmd_puts("ABMC feature is not enabled\n");
+ return -EINVAL;
+ }
+
+ mutex_lock(&rdtgroup_mutex);
+
+ list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+ struct rdtgroup *crg;
+
+ seq_printf(s, "%s//", rdtg->kn->name);
+
+ list_for_each_entry(dom, &r->mon_domains, hdr.list)
+ seq_printf(s, "%d=%s;", dom->hdr.id,
+ rdtgroup_mon_state_to_str(rdtg, dom, str));
+ seq_putc(s, '\n');
+
+ list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+ mon.crdtgrp_list) {
+ seq_printf(s, "%s/%s/", rdtg->kn->name, crg->kn->name);
+
+ list_for_each_entry(dom, &r->mon_domains, hdr.list)
+ seq_printf(s, "%d=%s;", dom->hdr.id,
+ rdtgroup_mon_state_to_str(crg, dom, str));
+ seq_putc(s, '\n');
+ }
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+ return 0;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -2206,6 +2275,12 @@ static struct rftype res_common_files[] = {
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = rdtgroup_num_mbm_cntrs_show,
},
+ {
+ .name = "mbm_control",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_mbm_control_show,
+ },
{
.name = "cpus_list",
.mode = 0644,
--
2.34.1
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v6 22/22] x86/resctrl: Introduce interface to modify assignment states of the groups
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (20 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 21/22] x86/resctrl: Introduce interface to list monitor states of all the groups Babu Moger
@ 2024-08-06 22:00 ` Babu Moger
2024-08-16 22:33 ` Reinette Chatre
2024-08-16 21:28 ` [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
22 siblings, 1 reply; 96+ messages in thread
From: Babu Moger @ 2024-08-06 22:00 UTC (permalink / raw)
To: corbet, fenghua.yu, reinette.chatre, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Introduce the interface to assign MBM events in ABMC mode.
Events can be enabled or disabled by writing to the file
/sys/fs/resctrl/info/L3_MON/mbm_control
The format is similar to the list format, with the addition of an opcode
for the assignment operation.
"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
Format for specific types of groups:
* Default CTRL_MON group:
"//<domain_id><opcode><flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id><opcode><flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id><opcode><flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
Domain_id '*' will apply the flags to all the domains.
Opcode can be one of the following:
= Update the assignment to match the flags
+ Assign an MBM event
- Unassign an MBM event
Assignment flags can be one of the following:
t MBM total event
l MBM local event
tl Both total and local MBM events
_ None of the MBM events. Valid only with '=' opcode.
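A minimal userspace sketch of how one "<domain_id><opcode><flags>" token could be decoded into assign/unassign masks, assuming only the grammar described above. The names sk_parse_token(), struct sk_op and the SK_TOTAL/SK_LOCAL bits are hypothetical and are not part of this patch. Unlike rdtgroup_str_to_mon_state() in the patch, the sketch deliberately rejects unknown flag characters and empty flags instead of ignoring them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event bits for this sketch (the patch uses ASSIGN_TOTAL/ASSIGN_LOCAL). */
#define SK_TOTAL 0x1u
#define SK_LOCAL 0x2u

/* Hypothetical result of parsing one token. */
struct sk_op {
	bool all_domains;	/* domain id was '*' */
	unsigned long dom_id;	/* meaningful only when !all_domains */
	unsigned int assign;	/* events to assign */
	unsigned int unassign;	/* events to unassign */
};

/*
 * Parse one "<domain_id><opcode><flags>" token such as "0=tl", "1-t" or "*=_".
 * Returns 0 on success and -1 on malformed input.
 */
static int sk_parse_token(const char *tok, struct sk_op *op)
{
	const char *opc, *p;
	unsigned int flags = 0;
	bool none = false;
	char *end;

	assert(tok && op);

	opc = strpbrk(tok, "=+-");
	if (!opc || opc == tok)
		return -1;		/* missing opcode or domain id */

	if (opc - tok == 1 && tok[0] == '*') {
		op->all_domains = true;
		op->dom_id = 0;
	} else {
		op->all_domains = false;
		op->dom_id = strtoul(tok, &end, 10);
		if (end != opc)
			return -1;	/* domain id is not a plain decimal number */
	}

	for (p = opc + 1; *p; p++) {
		switch (*p) {
		case 't':
			flags |= SK_TOTAL;
			break;
		case 'l':
			flags |= SK_LOCAL;
			break;
		case '_':
			none = true;
			break;
		default:
			return -1;	/* unknown flag character */
		}
	}

	if (none && flags)
		return -1;		/* '_' must stand alone */
	if (!none && !flags)
		return -1;		/* no flags given at all */

	switch (*opc) {
	case '+':
		if (none)
			return -1;	/* "+_" is not valid */
		op->assign = flags;
		op->unassign = 0;
		break;
	case '-':
		if (none)
			return -1;	/* "-_" is not valid */
		op->assign = 0;
		op->unassign = flags;
		break;
	default:		/* '=': make the assignment match the flags exactly */
		op->assign = flags;
		op->unassign = (SK_TOTAL | SK_LOCAL) & ~flags;
		break;
	}
	return 0;
}
```

For example, "0=t" yields assign = SK_TOTAL and unassign = SK_LOCAL, which mirrors the '=' handling in rdtgroup_process_flags(), while "0+_" is rejected.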
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v6: Added support to assign all domains if domain id is '*'.
Fixed the allocation of counter id if it is not assigned already.
v5: Interface name changed from mbm_assign_control to mbm_control.
Fixed opcode and flags combination.
"=_" is valid.
"-_" and "+_" are not valid.
Minor message update.
Renamed the function with prefix - rdtgroup_.
Corrected a few documentation mistakes.
Rebase related changes after SNC support.
v4: Added domain specific assignments. Fixed the opcode parsing.
v3: New patch.
Addresses the feedback to provide the global assignment interface.
https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
---
Documentation/arch/x86/resctrl.rst | 94 +++++++-
arch/x86/kernel/cpu/resctrl/internal.h | 7 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 315 ++++++++++++++++++++++++-
3 files changed, 414 insertions(+), 2 deletions(-)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 113c22ba6db3..ae3b17b7cefe 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -346,7 +346,7 @@ with the following files:
t MBM total event is enabled.
l MBM local event is enabled.
tl Both total and local MBM events are enabled.
- _ None of the MBM events are enabled.
+ _ None of the MBM events are enabled. Valid only with opcode '=' when writing.
Examples:
::
@@ -365,6 +365,98 @@ with the following files:
enabled on domain 0 and 1.
+ Assignment state can be updated by writing to the interface.
+
+ The format is similar to the list format, with the addition of an opcode
+ for the assignment operation.
+
+ "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+ Format for each type of group:
+
+ * Default CTRL_MON group:
+ "//<domain_id><opcode><flags>"
+
+ * Non-default CTRL_MON group:
+ "<CTRL_MON group>//<domain_id><opcode><flags>"
+
+ * Child MON group of default CTRL_MON group:
+ "/<MON group>/<domain_id><opcode><flags>"
+
+ * Child MON group of non-default CTRL_MON group:
+ "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+ Domain_id '*' will apply the flags to all the domains.
+
+ Opcode can be one of the following:
+ ::
+
+ = Update the assignment to match the flags.
+ + Assign an MBM event.
+ - Unassign an MBM event.
+
+ Examples:
+ ::
+
+ Initial group status:
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=tl;
+
+ To update the default group to assign only the total MBM event on domain 0:
+ # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
+
+ Assignment status after the update:
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=t;1=tl;
+ /child_default_mon_grp/0=tl;1=tl;
+
+ To update the MON group child_default_mon_grp to remove the total MBM event on domain 1:
+ # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
+
+ Assignment status after the update:
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=t;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
+ To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
+ unassign both local and total MBM events on domain 1:
+ # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
+
+ Assignment status after the update:
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+ //0=t;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
+ To update the default group to add the local MBM event on domain 0:
+ # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
+
+ Assignment status after the update:
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
+ To update the non-default CTRL_MON group non_default_ctrl_mon_grp to unassign all
+ the MBM events on all the domains:
+ # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
+
+ Assignment status after the update:
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_control
+ non_default_ctrl_mon_grp//0=_;1=_;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index ba3012f8f940..5af225b4a497 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -46,6 +46,13 @@
#define MON_CNTR_UNSET U32_MAX
+/*
+ * Assignment flags for ABMC feature
+ */
+#define ASSIGN_NONE 0
+#define ASSIGN_TOTAL BIT(QOS_L3_MBM_TOTAL_EVENT_ID)
+#define ASSIGN_LOCAL BIT(QOS_L3_MBM_LOCAL_EVENT_ID)
+
/**
* cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
* aren't marked nohz_full
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d7aadca5e4ab..8567fb3a6274 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1034,6 +1034,318 @@ static int rdtgroup_mbm_control_show(struct kernfs_open_file *of,
return 0;
}
+/*
+ * Update the assign states for the domain.
+ *
+ * If this is a new assignment for the group, allocate a counter first and
+ * then update the assignment; otherwise just update the assign state.
+ */
+static int rdtgroup_assign_update(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid,
+ struct rdt_mon_domain *d)
+{
+ int ret, index;
+
+ index = mon_event_config_index_get(evtid);
+ if (index == INVALID_CONFIG_INDEX)
+ return -EINVAL;
+
+ if (rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET) {
+ ret = rdtgroup_alloc_cntr(rdtgrp, index);
+ if (ret < 0)
+ goto out_done;
+ }
+
+ /* Update the state on all domains if d == NULL */
+ if (d == NULL) {
+ ret = rdtgroup_assign_cntr(rdtgrp, evtid);
+ } else {
+ ret = resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
+ rdtgrp->mon.cntr_id[index],
+ rdtgrp->closid, 1);
+ if (!ret)
+ set_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map);
+ }
+
+out_done:
+ return ret;
+}
+
+/*
+ * Update the unassign state for the domain.
+ *
+ * Free the counter if it is unassigned on all the domains else just
+ * update the unassign state
+ */
+static int rdtgroup_unassign_update(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid,
+ struct rdt_mon_domain *d)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ int ret = 0, index;
+
+ index = mon_event_config_index_get(evtid);
+ if (index == INVALID_CONFIG_INDEX)
+ return -EINVAL;
+
+ if (rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET)
+ goto out_done;
+
+ if (d == NULL) {
+ ret = rdtgroup_unassign_cntr(rdtgrp, evtid);
+ } else {
+ ret = resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
+ rdtgrp->mon.cntr_id[index],
+ rdtgrp->closid, 0);
+ if (!ret) {
+ clear_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map);
+ rdtgroup_free_cntr(r, rdtgrp, index);
+ }
+ }
+
+out_done:
+ return ret;
+}
+
+static int rdtgroup_str_to_mon_state(char *flag)
+{
+ int i, mon_state = 0;
+
+ for (i = 0; i < strlen(flag); i++) {
+ switch (*(flag + i)) {
+ case 't':
+ mon_state |= ASSIGN_TOTAL;
+ break;
+ case 'l':
+ mon_state |= ASSIGN_LOCAL;
+ break;
+ case '_':
+ mon_state = ASSIGN_NONE;
+ break;
+ default:
+ break;
+ }
+ }
+
+ return mon_state;
+}
+
+static struct rdtgroup *rdtgroup_find_grp(enum rdt_group_type rtype, char *p_grp, char *c_grp)
+{
+ struct rdtgroup *rdtg, *crg;
+
+ if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
+ return &rdtgroup_default;
+ } else if (rtype == RDTCTRL_GROUP) {
+ list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
+ if (!strcmp(p_grp, rdtg->kn->name))
+ return rdtg;
+ } else if (rtype == RDTMON_GROUP) {
+ list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+ if (!strcmp(p_grp, rdtg->kn->name)) {
+ list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+ mon.crdtgrp_list) {
+ if (!strcmp(c_grp, crg->kn->name))
+ return crg;
+ }
+ }
+ }
+ }
+
+ return NULL;
+}
+
+static int rdtgroup_process_flags(struct rdt_resource *r,
+ enum rdt_group_type rtype,
+ char *p_grp, char *c_grp, char *tok)
+{
+ int op, mon_state, assign_state, unassign_state;
+ char *dom_str, *id_str, *op_str;
+ struct rdt_mon_domain *d;
+ struct rdtgroup *rdtgrp;
+ unsigned long dom_id;
+ int ret, found = 0;
+
+ rdtgrp = rdtgroup_find_grp(rtype, p_grp, c_grp);
+
+ if (!rdtgrp) {
+ rdt_last_cmd_puts("Not a valid resctrl group\n");
+ return -EINVAL;
+ }
+
+next:
+ if (!tok || tok[0] == '\0')
+ return 0;
+
+ /* Start processing the strings for each domain */
+ dom_str = strim(strsep(&tok, ";"));
+
+ op_str = strpbrk(dom_str, "=+-");
+
+ if (op_str) {
+ op = *op_str;
+ } else {
+ rdt_last_cmd_puts("Missing operation =, +, or - character\n");
+ return -EINVAL;
+ }
+
+ id_str = strsep(&dom_str, "=+-");
+
+ /* Check for domain id '*' which means all domains */
+ if (id_str && *id_str == '*') {
+ d = NULL;
+ goto check_state;
+ } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+ rdt_last_cmd_puts("Missing domain id\n");
+ return -EINVAL;
+ }
+
+ /* Verify if the dom_id is valid */
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ if (d->hdr.id == dom_id) {
+ found = 1;
+ break;
+ }
+ }
+
+ if (!found) {
+ rdt_last_cmd_printf("Invalid domain id %lu\n", dom_id);
+ return -EINVAL;
+ }
+
+check_state:
+ mon_state = rdtgroup_str_to_mon_state(dom_str);
+
+ assign_state = 0;
+ unassign_state = 0;
+
+ switch (op) {
+ case '+':
+ if (mon_state == ASSIGN_NONE) {
+ rdt_last_cmd_puts("Invalid assign opcode\n");
+ goto out_fail;
+ }
+ assign_state = mon_state;
+ break;
+ case '-':
+ if (mon_state == ASSIGN_NONE) {
+ rdt_last_cmd_puts("Invalid assign opcode\n");
+ goto out_fail;
+ }
+ unassign_state = mon_state;
+ break;
+ case '=':
+ assign_state = mon_state;
+ unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
+ break;
+ default:
+ break;
+ }
+
+ if (assign_state & ASSIGN_TOTAL) {
+ ret = rdtgroup_assign_update(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID, d);
+ if (ret)
+ goto out_fail;
+ }
+
+ if (assign_state & ASSIGN_LOCAL) {
+ ret = rdtgroup_assign_update(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID, d);
+ if (ret)
+ goto out_fail;
+ }
+
+ if (unassign_state & ASSIGN_TOTAL) {
+ ret = rdtgroup_unassign_update(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID, d);
+ if (ret)
+ goto out_fail;
+ }
+
+ if (unassign_state & ASSIGN_LOCAL) {
+ ret = rdtgroup_unassign_update(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID, d);
+ if (ret)
+ goto out_fail;
+ }
+
+ goto next;
+
+out_fail:
+
+ return -EINVAL;
+}
+
+static ssize_t rdtgroup_mbm_control_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes,
+ loff_t off)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ char *token, *cmon_grp, *mon_grp;
+ int ret = 0;
+
+ if (!resctrl_arch_get_abmc_enabled())
+ return -EINVAL;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+
+ buf[nbytes - 1] = '\0';
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+ rdt_last_cmd_clear();
+
+ while ((token = strsep(&buf, "\n")) != NULL) {
+ if (strstr(token, "//")) {
+ /*
+ * The CTRL_MON group processing:
+ * default CTRL_MON group: "//<flags>"
+ * non-default CTRL_MON group: "<CTRL_MON group>//<flags>"
+ * The CTRL_MON group will be empty string if it is a
+ * default group.
+ */
+ cmon_grp = strsep(&token, "//");
+
+ /*
+ * strsep returns empty string for contiguous delimiters.
+ * Check for two consecutive delimiters and
+ * advance the token.
+ */
+ mon_grp = strsep(&token, "//");
+ if (*mon_grp != '\0') {
+ rdt_last_cmd_printf("Invalid CTRL_MON group format %s\n", token);
+ ret = -EINVAL;
+ break;
+ }
+
+ ret = rdtgroup_process_flags(r, RDTCTRL_GROUP, cmon_grp, mon_grp, token);
+ if (ret)
+ break;
+ } else if (strstr(token, "/")) {
+ /*
+ * MON group processing:
+ * MON_GROUP inside default CTRL_MON group: "/<MON group>/<flags>"
+ * MON_GROUP within CTRL_MON group: "<CTRL_MON group>/<MON group>/<flags>"
+ */
+ cmon_grp = strsep(&token, "/");
+
+ /* Extract the MON_GROUP. It cannot be an empty string */
+ mon_grp = strsep(&token, "/");
+ if (*mon_grp == '\0') {
+ rdt_last_cmd_printf("Invalid MON_GROUP format %s\n", token);
+ ret = -EINVAL;
+ break;
+ }
+
+ ret = rdtgroup_process_flags(r, RDTMON_GROUP, cmon_grp, mon_grp, token);
+ if (ret)
+ break;
+ }
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
+ return ret ?: nbytes;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -2277,9 +2589,10 @@ static struct rftype res_common_files[] = {
},
{
.name = "mbm_control",
- .mode = 0444,
+ .mode = 0644,
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = rdtgroup_mbm_control_show,
+ .write = rdtgroup_mbm_control_write,
},
{
.name = "cpus_list",
--
2.34.1
* Re: [PATCH v6 01/22] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
2024-08-06 22:00 ` [PATCH v6 01/22] x86/cpufeatures: Add support for " Babu Moger
@ 2024-08-07 16:32 ` Thomas Gleixner
2024-08-08 14:46 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Thomas Gleixner @ 2024-08-07 16:32 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, reinette.chatre, mingo, bp,
dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
On Tue, Aug 06 2024 at 17:00, Babu Moger wrote:
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that
> RMIDs currently assigned to a processor will be tracked by hardware. The
> counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, there can be only limited number of
> groups that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups
> are being tracked for certain point of time. Users do not have the option
> to monitor a group or set of groups for certain period of time without
> worrying about RMID being reset in between.
>
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID and monitor the bandwidth as long as it is assigned.
> The assigned RMID will be tracked by the hardware until the user unassigns
> it manually. There is no need to worry about counters being reset during
> this period. Additionally, the user can specify a bitmask identifying the
> specific bandwidth types from the given source to track with the counter.
>
> Without ABMC enabled, monitoring will work in current mode without
> assignment option.
>
> Linux resctrl subsystem provides the interface to count maximum of two
> memory bandwidth events per group, from a combination of available total
> and local events. Keeping the current interface, users can enable a maximum
> of 2 ABMC counters per group. User will also have the option to enable only
> one counter to the group. If the system runs out of assignable ABMC
> counters, kernel will display an error. Users need to disable an already
> enabled counter to make space for new assignments.
>
> The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
> Bits Description
> 5 ABMC (Assignable Bandwidth Monitoring Counters)
Can you please update the CPUID database with that new bit:
https://gitlab.com/x86-cpuid.org/x86-cpuid-db
Thanks,
tglx
* Re: [PATCH v6 04/22] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
2024-08-06 22:00 ` [PATCH v6 04/22] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2024-08-07 16:33 ` Thomas Gleixner
2024-08-16 21:30 ` Reinette Chatre
1 sibling, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2024-08-07 16:33 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, reinette.chatre, mingo, bp,
dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, babu.moger,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
On Tue, Aug 06 2024 at 17:00, Babu Moger wrote:
> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
> Bits Description
> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
> Monitoring Counter ID + 1
This one wants to be in the database too
Thanks,
tglx
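To illustrate the two CPUID fields quoted in these messages (Fn8000_0020 subleaf 0 EBX bit 5 for ABMC support, and subleaf 5 EBX[15:0] for MAX_ABMC), a small C sketch that decodes them, assuming GCC or Clang's <cpuid.h> on x86. The helper names are hypothetical, and treating EBX[15:0] ("maximum counter ID + 1") as the number of counters assumes counter IDs start at 0. The decoders are kept pure so they can be checked on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID 0x80000020, subleaf 0: EBX bit 5 reports ABMC support. */
static bool abmc_supported(uint32_t ebx_subleaf0)
{
	return (ebx_subleaf0 >> 5) & 1u;
}

/* CPUID 0x80000020, subleaf 5: EBX[15:0] = maximum counter ID + 1. */
static unsigned int abmc_num_counters(uint32_t ebx_subleaf5)
{
	return ebx_subleaf5 & 0xffffu;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Returns the number of ABMC counters, or 0 if ABMC is not reported. */
static unsigned int abmc_probe(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(0x80000020, 0, &eax, &ebx, &ecx, &edx))
		return 0;	/* leaf 0x80000020 not available */
	if (!abmc_supported(ebx))
		return 0;
	if (!__get_cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx))
		return 0;
	return abmc_num_counters(ebx);
}
#endif
```

On a CPU without leaf 0x80000020, __get_cpuid_count() returns 0 and abmc_probe() reports no counters rather than reading stale register values.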
* Re: [PATCH v6 01/22] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
2024-08-07 16:32 ` Thomas Gleixner
@ 2024-08-08 14:46 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-08 14:46 UTC (permalink / raw)
To: Thomas Gleixner, corbet, fenghua.yu, reinette.chatre, mingo, bp,
dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Thomas,
On 8/7/24 11:32, Thomas Gleixner wrote:
> On Tue, Aug 06 2024 at 17:00, Babu Moger wrote:
>> Users can create as many monitor groups as RMIDs supported by the hardware.
>> However, bandwidth monitoring feature on AMD system only guarantees that
>> RMIDs currently assigned to a processor will be tracked by hardware. The
>> counters of any other RMIDs which are no longer being tracked will be
>> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
>> that are not tracked by hardware. So, there can be only limited number of
>> groups that can give guaranteed monitoring numbers. With ever changing
>> configurations there is no way to definitely know which of these groups
>> are being tracked for certain point of time. Users do not have the option
>> to monitor a group or set of groups for certain period of time without
>> worrying about RMID being reset in between.
>>
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID and monitor the bandwidth as long as it is assigned.
>> The assigned RMID will be tracked by the hardware until the user unassigns
>> it manually. There is no need to worry about counters being reset during
>> this period. Additionally, the user can specify a bitmask identifying the
>> specific bandwidth types from the given source to track with the counter.
>>
>> Without ABMC enabled, monitoring will work in current mode without
>> assignment option.
>>
>> Linux resctrl subsystem provides the interface to count maximum of two
>> memory bandwidth events per group, from a combination of available total
>> and local events. Keeping the current interface, users can enable a maximum
>> of 2 ABMC counters per group. User will also have the option to enable only
>> one counter to the group. If the system runs out of assignable ABMC
>> counters, kernel will display an error. Users need to disable an already
>> enabled counter to make space for new assignments.
>>
>> The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
>> Bits Description
>> 5 ABMC (Assignable Bandwidth Monitoring Counters)
>
> Can you please update the CPUID database with that new bit:
>
> https://gitlab.com/x86-cpuid.org/x86-cpuid-db
Sure. Should not be a problem. I have asked our management to look at this
new project for contributions.
Thanks
Babu Moger
* Re: [PATCH v6 21/22] x86/resctrl: Introduce interface to list monitor states of all the groups
2024-08-06 22:00 ` [PATCH v6 21/22] x86/resctrl: Introduce interface to list monitor states of all the groups Babu Moger
@ 2024-08-16 16:28 ` James Morse
2024-08-16 20:40 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: James Morse @ 2024-08-16 16:28 UTC (permalink / raw)
To: Babu Moger
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, dave.hansen, tglx, corbet, fenghua.yu, reinette.chatre,
mingo, bp
Hi Babu,
On 06/08/2024 23:00, Babu Moger wrote:
> Provide the interface to list the monitor states of all the resctrl
> groups in ABMC mode.
>
> Example:
> $cat /sys/fs/resctrl/info/L3_MON/mbm_control
>
> List follows the following format:
>
> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>
> Format for specific type of groups:
>
> - Default CTRL_MON group:
> "//<domain_id>=<flags>"
>
> - Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id>=<flags>"
>
> - Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id>=<flags>"
>
> - Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>
> Flags can be one of the following:
> t MBM total event is enabled
> l MBM local event is enabled
> tl Both total and local MBM events are enabled
> _ None of the MBM events are enabled
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index d15fd1bde5f4..d7aadca5e4ab 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -965,6 +965,75 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
> return 0;
> }
>
> +static char *rdtgroup_mon_state_to_str(struct rdtgroup *rdtgrp,
> + struct rdt_mon_domain *d, char *str)
> +{
> + char *tmp = str;
> + int index;
> +
> + /*
> + * Query the monitor state for the domain.
> + * Index 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
> + * Index 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
> + */
> + index = mon_event_config_index_get(QOS_L3_MBM_TOTAL_EVENT_ID);
> + if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET &&
> + test_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map))
> + *tmp++ = 't';
> +
> + index = mon_event_config_index_get(QOS_L3_MBM_LOCAL_EVENT_ID);
> + if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET &&
> + test_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map))
> + *tmp++ = 'l';
> +
> + if (tmp == str)
> + *tmp++ = '_';
> +
> + *tmp = '\0';
> + return str;
> +}
> +
> +static int rdtgroup_mbm_control_show(struct kernfs_open_file *of,
> + struct seq_file *s, void *v)
> +{
> + struct rdt_resource *r = of->kn->parent->priv;
> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
This is filesystem code, once it moves to /fs/ you can't grab an architecture specific
struct like this. (suggestion below).
> + struct rdt_mon_domain *dom;
> + struct rdtgroup *rdtg;
> + char str[10];
Shouldn't new commands that might fail start with rdt_last_cmd_clear()?
> + if (!hw_res->mbm_cntr_assign_enabled) {
I think this should be wrapped up as:
| resctrl_arch_mbm_cntr_assign_test(r)
as this flag is private to the architecture.
> + rdt_last_cmd_puts("ABMC feature is not enabled\n");
lockdep barks that you need to hold rdtgroup_mutex when calling rdt_last_cmd_puts() -
otherwise this can run in parallel with another syscall.
> + return -EINVAL;
> + }
> +
> + mutex_lock(&rdtgroup_mutex);
> +
> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
> + struct rdtgroup *crg;
> +
> + seq_printf(s, "%s//", rdtg->kn->name);
> +
> + list_for_each_entry(dom, &r->mon_domains, hdr.list)
> + seq_printf(s, "%d=%s;", dom->hdr.id,
> + rdtgroup_mon_state_to_str(rdtg, dom, str));
> + seq_putc(s, '\n');
> +
> + list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
> + mon.crdtgrp_list) {
> + seq_printf(s, "%s/%s/", rdtg->kn->name, crg->kn->name);
> +
> + list_for_each_entry(dom, &r->mon_domains, hdr.list)
> + seq_printf(s, "%d=%s;", dom->hdr.id,
> + rdtgroup_mon_state_to_str(crg, dom, str));
> + seq_putc(s, '\n');
> + }
> + }
> +
> + mutex_unlock(&rdtgroup_mutex);
> + return 0;
> +}
Thanks,
James
* Re: [PATCH v6 06/22] x86/resctrl: Add support to enable/disable AMD ABMC feature
2024-08-06 22:00 ` [PATCH v6 06/22] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2024-08-16 16:29 ` James Morse
2024-08-16 20:38 ` Moger, Babu
2024-08-16 21:31 ` Reinette Chatre
1 sibling, 1 reply; 96+ messages in thread
From: James Morse @ 2024-08-16 16:29 UTC (permalink / raw)
To: Babu Moger
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, corbet,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, dave.hansen, reinette.chatre, mingo, fenghua.yu, tglx,
bp
Hi Babu,
Some boring comments about where the code goes...
On 06/08/2024 23:00, Babu Moger wrote:
> Add the functionality to enable/disable AMD ABMC feature.
>
> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
> L3_QOS_EXT_CFG. When the state of ABMC is changed, the MSR needs
> to be updated on all the logical processors in the QOS Domain.
>
> Hardware counters will reset when ABMC state is changed. Reset the
> architectural state so that reading of hardware counter is not considered
> as an overflow in next update.
>
> The ABMC feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 2bd207624eec..154983a67646 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -536,6 +541,14 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
>
> void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d);
>
> +static inline bool resctrl_arch_get_abmc_enabled(void)
> +{
> + return rdt_resources_all[RDT_RESOURCE_L3].mbm_cntr_assign_enabled;
> +}
Once the filesystem code moves to /fs/resctrl, this can't be inlined from the
architectures internal.h. Accessing rdt_resources_all[] from asm/resctrl.h isn't something
that is done today... could you move this to be a non-inline function in core.c?
(this saves me moving it later!)
> +int resctrl_arch_mbm_cntr_assign_enable(void);
> +void resctrl_arch_mbm_cntr_assign_disable(void);
Please add these in linux/resctrl.h - it saves me moving them later!
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 7e76f8d839fc..6075b1e5bb77 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2402,6 +2402,63 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
> +static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
> +{
> + struct rdt_mon_domain *d;
> + /*
> + * Hardware counters will reset after switching the monitor mode.
> + * Reset the architectural state so that reading of hardware
> + * counter is not considered as an overflow in the next update.
> + */
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + on_each_cpu_mask(&d->hdr.cpu_mask,
> + resctrl_abmc_set_one_amd, &enable, 1);
> + resctrl_arch_reset_rmid_all(r, d);
> + }
Is there any mileage in getting resctrl_arch_mbm_cntr_assign_enable()'s caller to do this?
Every architecture that supports this will have to do this, and neither x86 nor arm64 are
able to do it atomically, or quicker than calling resctrl_arch_reset_rmid_all() for each
domain.
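Here is a minimal userspace model of the split suggested here. It uses simplified stand-in structs and a hypothetical filesystem-side `fs_mbm_mode_set()`; none of these names or layouts are the real kernel API. The architecture hook only flips the mode. The common caller then resets the saved per-domain counter state once, on behalf of every architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's monitor domain and resource. */
#define NUM_RMID 4

struct mon_domain {
	unsigned long long prev_msr[NUM_RMID];	/* last raw counter value per RMID */
};

struct resource {
	bool assign_enabled;
	struct mon_domain *domains;
	size_t num_domains;
};

/* Architecture hook: only changes the hardware mode (MSR writes elided). */
static void arch_cntr_assign_set(struct resource *r, bool enable)
{
	r->assign_enabled = enable;
}

/* Architecture hook: forget the saved per-RMID state for one domain. */
static void arch_reset_rmid_all(struct resource *r, struct mon_domain *d)
{
	(void)r;
	memset(d->prev_msr, 0, sizeof(d->prev_msr));
}

/*
 * Filesystem-side caller: the hardware counters restart after a mode
 * change, so reset the saved state of every domain here, once, instead
 * of in each architecture's enable/disable code.
 */
static void fs_mbm_mode_set(struct resource *r, bool enable)
{
	assert(r->domains != NULL || r->num_domains == 0);
	if (r->assign_enabled == enable)
		return;
	arch_cntr_assign_set(r, enable);
	for (size_t i = 0; i < r->num_domains; i++)
		arch_reset_rmid_all(r, &r->domains[i]);
}
```

Babu's reply later in the thread proposes doing this reset in rdtgroup_mbm_mode_write().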
> +}
> +int resctrl_arch_mbm_cntr_assign_enable(void)
Could we pass the struct rdt_resource in - instead of hard coding it to be the L3 - you
already check hw_res->mbm_cntr_assign_enabled so no additional check is needed...
Background: I'd like to reduce the amount of "I magically know it's the L3" to reduce the
work for whoever has to add monitor support for something other than the L3.
(I've currently no plans - but someone is going to build it!)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> + lockdep_assert_held(&rdtgroup_mutex);
After the split between the architecture and filesystem code - this lock is private to the
filesystem. If you need to prevent concurrent enable/disable calls the architecture should
take its own mutex.
| static DEFINE_MUTEX(abmc_lock);
?
> + if (r->mon.mbm_cntr_assignable && !hw_res->mbm_cntr_assign_enabled) {
> + _resctrl_abmc_enable(r, true);
> + hw_res->mbm_cntr_assign_enabled = true;
> + }
> +
> + return 0;
> +}
> +
> +void resctrl_arch_mbm_cntr_assign_disable(void)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> +
> + lockdep_assert_held(&rdtgroup_mutex);
(same plea for passing the resource in, and not referring to the filesystem's locks)
> + if (hw_res->mbm_cntr_assign_enabled) {
> + _resctrl_abmc_enable(r, false);
> + hw_res->mbm_cntr_assign_enabled = false;
> + }
> +}
The work you do in these functions is pretty symmetric. Is it worth combining them into:
| void resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
| {
| 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
|
| 	if (hw_res->mbm_cntr_assign_enabled != enable) {
| 		_resctrl_abmc_enable(r, enable);
| 		hw_res->mbm_cntr_assign_enabled = enable;
| 	}
| }
I think you need a resctrl_arch_mbm_cntr_assign_test() too - I'll comment on that patch...
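Below is a userspace sketch of the combined hook, together with the `resctrl_arch_mbm_cntr_assign_test()` accessor mentioned above. The structs are simplified stand-ins. Both functions take the resource as a parameter instead of hard-coding the L3. An architecture-private lock replaces the filesystem's `rdtgroup_mutex`; it is modelled with `PTHREAD_MUTEX_INITIALIZER` in place of `DEFINE_MUTEX(abmc_lock)`. `abmc_hw_toggle()` and the `toggles` counter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures (not the real layouts). */
struct rdt_resource {
	bool mbm_cntr_assignable;	/* hardware supports counter assignment */
};

struct rdt_hw_resource {
	struct rdt_resource r_resctrl;	/* must stay the first member */
	bool mbm_cntr_assign_enabled;
	int toggles;			/* simulated MSR updates, for testing */
};

/* Architecture-private lock, standing in for DEFINE_MUTEX(abmc_lock). */
static pthread_mutex_t abmc_lock = PTHREAD_MUTEX_INITIALIZER;

static struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
{
	/* container_of() equivalent, valid because r_resctrl is first. */
	return (struct rdt_hw_resource *)r;
}

/* Hypothetical: real code would update L3_QOS_EXT_CFG on each CPU. */
static void abmc_hw_toggle(struct rdt_hw_resource *hw_res, bool enable)
{
	(void)enable;
	hw_res->toggles++;
}

int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
{
	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

	if (enable && !r->mbm_cntr_assignable)
		return -EINVAL;

	pthread_mutex_lock(&abmc_lock);
	if (hw_res->mbm_cntr_assign_enabled != enable) {
		abmc_hw_toggle(hw_res, enable);
		hw_res->mbm_cntr_assign_enabled = enable;
	}
	pthread_mutex_unlock(&abmc_lock);
	return 0;
}

bool resctrl_arch_mbm_cntr_assign_test(struct rdt_resource *r)
{
	return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
}
```

Setting the same value twice is a no-op. As a result, the caller does not need separate enable and disable paths.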
Thanks,
James
* Re: [PATCH v6 09/22] x86/resctrl: Introduce MBM counters bitmap
2024-08-06 22:00 ` [PATCH v6 09/22] x86/resctrl: Introduce MBM counters bitmap Babu Moger
@ 2024-08-16 16:29 ` James Morse
2024-08-16 20:39 ` Moger, Babu
2024-08-16 21:35 ` Reinette Chatre
1 sibling, 1 reply; 96+ messages in thread
From: James Morse @ 2024-08-16 16:29 UTC
To: Babu Moger
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, tglx, mingo, dave.hansen, bp, corbet, fenghua.yu,
reinette.chatre
Hi Babu,
On 06/08/2024 23:00, Babu Moger wrote:
> Hardware provides a set of counters when mbm_cntr_assignable feature is
> supported. These counters are used for assigning the events in resctrl
> group when the feature is enabled.
>
> Introduce mbm_cntrs_free_map bitmap to track available and free counters
> and set of routines to allocate and free the counters.
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index ab4fab3b7cf1..c818965e36c9 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -185,6 +185,37 @@ bool closid_allocated(unsigned int closid)
> return !test_bit(closid, &closid_free_map);
> }
>
> +/*
> + * Counter bitmap for tracking the available counters.
> + * The ABMC feature provides a set of hardware counters for enabling events.
> + * Each event takes one hardware counter. The kernel needs to keep track
> + * of the number of available counters.
> + */
> +static DECLARE_BITMAP(mbm_cntrs_free_map, 64);
Please make this resctrl limit of '64' a define in linux/resctrl.h so the arch code knows
what the limit is!
MPAM platforms may have more than this - and really bad things happen if mbm_cntrs_init()
passes bitmap_fill() a value greater than 64 (it writes past the end of the bitmap).
Even better - could we dynamically allocate this bitmap using the size advertised by the
architecture code?
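The dynamically sized bitmap could look something like the userspace sketch below. The allocator is sized from the counter count the architecture reports, passed in as an assumed `num_cntrs` argument, so having more than 64 counters cannot overflow it. The open-coded bit operations stand in for the kernel's `bitmap_zalloc()`, `bitmap_fill()` and `find_first_bit()`. The `cntr_map` names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical allocator: a set bit means the counter is free. */
struct cntr_map {
	unsigned long *free;
	unsigned int nbits;
};

static int cntr_map_init(struct cntr_map *m, unsigned int num_cntrs)
{
	size_t words = (num_cntrs + BITS_PER_WORD - 1) / BITS_PER_WORD;

	m->free = calloc(words ? words : 1, sizeof(unsigned long));
	if (!m->free)
		return -1;
	m->nbits = num_cntrs;
	for (unsigned int i = 0; i < num_cntrs; i++)	/* like bitmap_fill() */
		m->free[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
	return 0;
}

/* Returns the lowest free counter ID, or -1 when all are assigned. */
static int cntr_alloc(struct cntr_map *m)
{
	for (unsigned int i = 0; i < m->nbits; i++) {
		unsigned long bit = 1UL << (i % BITS_PER_WORD);

		if (m->free[i / BITS_PER_WORD] & bit) {
			m->free[i / BITS_PER_WORD] &= ~bit;
			return (int)i;
		}
	}
	return -1;
}

static void cntr_free(struct cntr_map *m, unsigned int id)
{
	assert(id < m->nbits);
	m->free[id / BITS_PER_WORD] |= 1UL << (id % BITS_PER_WORD);
}

static void cntr_map_destroy(struct cntr_map *m)
{
	free(m->free);
	m->free = NULL;
}
```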
Thanks,
James
* Re: [PATCH v6 15/22] x86/resctrl: Add the interface to assign a hardware counter
2024-08-06 22:00 ` [PATCH v6 15/22] x86/resctrl: Add the interface to assign a hardware counter Babu Moger
@ 2024-08-16 16:30 ` James Morse
2024-08-16 20:39 ` Moger, Babu
2024-08-16 21:41 ` Reinette Chatre
1 sibling, 1 reply; 96+ messages in thread
From: James Morse @ 2024-08-16 16:30 UTC
To: Babu Moger
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, dave.hansen, tglx, mingo, bp, corbet, fenghua.yu,
reinette.chatre
Hi Babu,
On 06/08/2024 23:00, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID and monitor the bandwidth as long as it is assigned.
> The assigned RMID will be tracked by the hardware until the user unassigns
> it manually.
>
> Counters are configured by writing to L3_QOS_ABMC_CFG MSR and
> specifying the counter id, bandwidth source, and bandwidth types.
>
> Provide the interface to assign the counter ids to RMID.
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 60696b248b56..1ee91a7293a8 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1864,6 +1864,103 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
> +/*
> + * Send an IPI to the domain to assign the counter id to RMID.
> + */
> +int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evtid,
> + u32 rmid, u32 cntr_id, u32 closid, bool assign)
MPAM ends up with a per-resource array of monitor-ids that it uses to map cntr_id
allocated by resctrl to the underlying hardware id. Could this function pass the struct
rdt_resource too?
(this saves me having to assume it's the L3 - adding to the technical debt in this area)
Nit: could closid and rmid appear next to each other, and in that order ... just to fit
with other helpers that take both.
> +{
> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
> + struct arch_mbm_state *arch_mbm;
> +
> + abmc_cfg.split.cfg_en = 1;
> + abmc_cfg.split.cntr_en = assign ? 1 : 0;
> + abmc_cfg.split.cntr_id = cntr_id;
> + abmc_cfg.split.bw_src = rmid;
> +
> + /* Update the event configuration from the domain */
> + if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
> + abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
> + arch_mbm = &hw_dom->arch_mbm_total[rmid];
> + } else {
> + abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
> + arch_mbm = &hw_dom->arch_mbm_local[rmid];
> + }
> +
> + smp_call_function_any(&d->hdr.cpu_mask, rdtgroup_abmc_cfg, &abmc_cfg, 1);
> +
> + /*
> + * Reset the architectural state so that reading of hardware
> + * counter is not considered as an overflow in the next update.
> + */
> + if (arch_mbm)
> + memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
> +
> + return 0;
> +}
Thanks,
James
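The following is a userspace model of the signature requested above, using simplified stand-in structs. The resource is passed in, `closid` precedes `rmid`, and the resctrl counter ID is translated through a per-resource table, as MPAM's monitor-ID array would be. `struct abmc_cfg` only names the fields used in the patch; it is not the `L3_QOS_ABMC_CFG` bit layout from the APM. `last_written` is a hypothetical stand-in for the IPI and MSR write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum evt { EVT_TOTAL, EVT_LOCAL };

#define NUM_RMID  8
#define NUM_CNTRS 4

/* Field names follow the patch; this is not the hardware bit layout. */
struct abmc_cfg {
	bool cfg_en, cntr_en;
	uint32_t cntr_id, bw_src, bw_type;
};

struct arch_mbm_state {
	uint64_t prev_msr;
};

struct mon_domain {
	uint32_t mbm_total_cfg, mbm_local_cfg;
	struct arch_mbm_state total[NUM_RMID], local[NUM_RMID];
	struct abmc_cfg last_written;	/* stands in for the MSR write */
};

struct resource {
	uint32_t hw_cntr_id[NUM_CNTRS];	/* resctrl cntr_id -> hardware ID */
};

static int assign_cntr(struct resource *r, struct mon_domain *d,
		       enum evt evtid, uint32_t closid, uint32_t rmid,
		       uint32_t cntr_id, bool assign)
{
	struct abmc_cfg cfg = { .cfg_en = true, .cntr_en = assign };
	struct arch_mbm_state *state;

	(void)closid;	/* unused here; ordered before rmid as suggested */
	if (rmid >= NUM_RMID || cntr_id >= NUM_CNTRS)
		return -1;

	cfg.cntr_id = r->hw_cntr_id[cntr_id];
	cfg.bw_src = rmid;
	if (evtid == EVT_TOTAL) {
		cfg.bw_type = d->mbm_total_cfg;
		state = &d->total[rmid];
	} else {
		cfg.bw_type = d->mbm_local_cfg;
		state = &d->local[rmid];
	}
	d->last_written = cfg;

	/* The counter restarts from zero: drop the old value to avoid a false overflow. */
	memset(state, 0, sizeof(*state));
	return 0;
}
```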
* Re: [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes
2024-08-06 22:00 ` [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
@ 2024-08-16 16:31 ` James Morse
2024-08-16 17:01 ` Reinette Chatre
2024-08-16 21:42 ` Reinette Chatre
1 sibling, 1 reply; 96+ messages in thread
From: James Morse @ 2024-08-16 16:31 UTC
To: Babu Moger
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, mingo, bp, corbet, dave.hansen, fenghua.yu,
reinette.chatre, tglx
Hi Babu,
On 06/08/2024 23:00, Babu Moger wrote:
> Introduce interface to switch between ABMC and legacy modes.
>
> By default ABMC is enabled on boot if the feature is available.
> Provide the interface to go back to legacy mode if required.
I may have missed it on an earlier version ... why would anyone want the non-ABMC
behaviour on hardware that requires it: counters randomly reset and randomly return
'Unavailable'... is that actually useful?
You default this to on, so there isn't a backward compatibility argument here.
It seems like being able to disable this is a source of complexity - is it needed?
For MPAM I'm looking at enabling this on any platform that is short of monitors. If
user-space disables it I don't have an "at random" hardware behaviour to fall back on - it's
extra work to invent a behaviour I'm not sure is useful...
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> [mbm_cntr_assign]
> legacy
>
> To enable the "mbm_cntr_assign" mode:
> $ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>
> To enable the legacy monitoring feature:
> $ echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>
> MBM event counters will reset when mbm_mode is changed.
Thanks,
James
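User space could find the active mode from the `mbm_mode` format shown above, where each mode is on its own line and the enabled one is in square brackets. A sketch follows. Only the file path and the format come from the patch; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the bracketed (enabled) mode name from the file contents into out. */
static int active_mbm_mode(const char *text, char *out, size_t outlen)
{
	const char *open = strchr(text, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close)
		return -1;
	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/* Returns -1 if the file cannot be read, e.g. resctrl is not mounted. */
int read_mbm_mode(char *out, size_t outlen)
{
	char buf[256];
	size_t n;
	FILE *f = fopen("/sys/fs/resctrl/info/L3_MON/mbm_mode", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return active_mbm_mode(buf, out, outlen);
}
```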
* Re: [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported
2024-08-06 22:00 ` [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported Babu Moger
@ 2024-08-16 16:32 ` James Morse
2024-08-16 20:40 ` Moger, Babu
2024-08-16 22:33 ` Reinette Chatre
1 sibling, 1 reply; 96+ messages in thread
From: James Morse @ 2024-08-16 16:32 UTC
To: Babu Moger
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, mingo, bp, corbet, fenghua.yu, reinette.chatre, tglx,
dave.hansen
Hi Babu,
On 06/08/2024 23:00, Babu Moger wrote:
> Enable ABMC by default when supported during the boot up.
>
> Users will not see any difference in the behavior when resctrl is
> mounted. With automatic assignment everything will work as running
> in the legacy monitor mode.
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 6fb0cfdb5529..a7980f84c487 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -599,6 +599,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> d = container_of(hdr, struct rdt_mon_domain, hdr);
>
> cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
> + resctrl_arch_mbm_cntr_assign_configure();
> return;
> }
>
> @@ -620,6 +621,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> arch_mon_domain_online(r, d);
>
> resctrl_mbm_evt_config_init(hw_dom);
> + resctrl_arch_mbm_cntr_assign_configure();
>
> if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
> mon_domain_free(hw_dom);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 66febff2a3d3..d15fd1bde5f4 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2756,6 +2756,23 @@ void resctrl_arch_mbm_cntr_assign_disable(void)
> }
> }
>
> +void resctrl_arch_mbm_cntr_assign_configure(void)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> + bool enable = true;
> +
> + mutex_lock(&rdtgroup_mutex);
As before - this lock isn't available to the architecture code after the filesystem code
moves to /fs/. To prevent concurrent calls to resctrl_abmc_set_one_amd() I think you need
your own mutex.
> + if (r->mon.mbm_cntr_assignable) {
> + if (!hw_res->mbm_cntr_assign_enabled)
> + hw_res->mbm_cntr_assign_enabled = true;
> + resctrl_abmc_set_one_amd(&enable);
> + }
> +
> + mutex_unlock(&rdtgroup_mutex);
> +}
Neither of this function's callers is in filesystem code; could you drop the 'arch' from
the name? It isn't part of the fs/arch interface.
Thanks,
James
* Re: [PATCH v6 07/22] x86/resctrl: Introduce the interface to display monitor mode
2024-08-06 22:00 ` [PATCH v6 07/22] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
@ 2024-08-16 16:56 ` James Morse
2024-08-16 20:38 ` Moger, Babu
2024-08-16 21:32 ` Reinette Chatre
1 sibling, 1 reply; 96+ messages in thread
From: James Morse @ 2024-08-16 16:56 UTC
To: Babu Moger
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, dave.hansen, tglx, bp, mingo, corbet, fenghua.yu,
reinette.chatre
Hi Babu,
On 06/08/2024 23:00, Babu Moger wrote:
> The mbm_mode file displays the list of supported monitor modes.
>
> The mbm_cntr_assign mode is one of the currently supported modes. It is
> also called the ABMC (Assignable Bandwidth Monitoring Counters) feature. The
> ABMC feature provides an option to assign a hardware counter to an RMID and
> monitor the bandwidth as long as it is assigned. ABMC mode is enabled
> by default when supported.
>
> Legacy mode works without the assignment option.
>
> Provide an interface to display the monitor mode on the system.
> $cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> [mbm_cntr_assign]
> legacy
>
> Switching the mbm_mode will reset all the mbm counters of all resctrl
> groups.
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 30586728a4cd..d4ec605b200a 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -257,6 +257,40 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x30;1=0x30;3=0x15;4=0x15
>
> +"mbm_mode":
> + Reports the list of assignable monitoring features supported. The
> + enclosed brackets indicate which feature is enabled.
> + ::
> +
> + cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> + [mbm_cntr_assign]
> + legacy
> +
> + "mbm_cntr_assign":
> + AMD's ABMC feature is one of the supported mbm_cntr_assign modes.
> + The bandwidth monitoring feature on AMD systems only guarantees
> + that RMIDs currently assigned to a processor will be tracked by
> + hardware. The counters of any other RMIDs which are no longer
> + being tracked will be reset to zero. The MBM event counters
> + return "Unavailable" for the RMIDs that are not tracked by
> + hardware. So, only a limited number of groups can give
> + guaranteed monitoring numbers. With ever-changing configurations
> + there is no way to know for certain which of these groups are being
> + tracked at a given point in time. Users do not have the option to
> + monitor a group or set of groups for a certain period of time without
> + worrying about the RMID being reset in between.
> +
> + The ABMC feature provides an option to the user to assign a hardware
> + counter to an RMID and monitor the bandwidth as long as it is assigned.
> + The assigned RMID will be tracked by the hardware until the user
> + unassigns it manually. There is no need to worry about counters being
> + reset during this period.
While debugging my rebase of MPAM on top of this series, I've come back to this wording a
few times to try and work out what I should expect to see ...
Is it possible to disentangle the AMD hardware feature description from the description of
the filesystem behaviour this enables? You are really describing what the hardware does if
you don't enable this mode...
An incomplete suggestion of the shape would be something like:
| In mbm_cntr_assign mode user-space is able to specify which control
| or monitor groups in resctrl should have a hardware counter assigned
| using the 'mbm_control' file. The number of hardware counters available
| is described in the 'num_mbm_cntrs' file.
| Changing this mode will cause all counters on a resource to reset.
|
| The feature is needed on platforms which support more control and monitor
| groups than hardware counters, meaning 'unassigned' control or monitor groups will
| report 'Unavailable' or not count all the traffic in an unpredictable way.
|
| Platforms with AMD's ABMC feature enable this mode by default so that counters
| remain assigned even when the corresponding RMID is not in use by any processor.
> + "Legacy":
Calling "enough hardware counters" 'legacy' is a bit curious... 'default'?
(but I haven't worked out the benefit of disabling this mode, so maybe it doesn't need a
name.)
> + Legacy mode works without the assignment option. The monitoring works
> + as long as there are enough RMID counters available to support the number
> + of monitoring groups.
How can user-space tell this is the case? Could we be specific as to what 'works' means?
Something like:
| By default resctrl assumes each control and monitor group has a hardware counter.
| Hardware without this property will still allow more control or monitor groups
| than 'num_mbm_cntrs' to be created. Reading the mbm files may report
| 'Unavailable' if there is no hardware resource assigned.
N.B. I don't suggest referring to the num_rmid file in these as MPAM doesn't have an
equivalent property.
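To make the "may report 'Unavailable'" wording concrete, here is a sketch of how user space might interpret the contents of an MBM event file such as `mon_data/mon_L3_00/mbm_total_bytes`. Returning `-EAGAIN` for "Unavailable" is a hypothetical choice, not something the patch defines.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 0 and stores the byte count, -EAGAIN if the group has no
 * hardware counter ("Unavailable"), or -EINVAL for anything else.
 */
static int parse_mbm_value(const char *text, unsigned long long *bytes)
{
	static const char unavail[] = "Unavailable";
	char *end;

	if (strncmp(text, unavail, sizeof(unavail) - 1) == 0)
		return -EAGAIN;
	if (text[0] < '0' || text[0] > '9')
		return -EINVAL;	/* e.g. "Error", or a leading sign/space */

	errno = 0;
	*bytes = strtoull(text, &end, 10);
	if (errno != 0 || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	return 0;
}
```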
Thanks,
James
* Re: [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes
2024-08-16 16:31 ` James Morse
@ 2024-08-16 17:01 ` Reinette Chatre
2024-08-16 17:16 ` Peter Newman
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 17:01 UTC
To: James Morse, Babu Moger
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, mingo, bp, corbet, dave.hansen, fenghua.yu, tglx
Hi James,
On 8/16/24 9:31 AM, James Morse wrote:
> Hi Babu,
>
> On 06/08/2024 23:00, Babu Moger wrote:
>> Introduce interface to switch between ABMC and legacy modes.
>>
>> By default ABMC is enabled on boot if the feature is available.
>> Provide the interface to go back to legacy mode if required.
>
> I may have missed it on an earlier version ... why would anyone want the non-ABMC
> behaviour on hardware that requires it: counters randomly reset and randomly return
> 'Unavailable'... is that actually useful?
>
> You default this to on, so there isn't a backward compatibility argument here.
>
> It seems like being able to disable this is a source of complexity - is it needed?
The ability to go back to legacy was added while looking ahead to support the next
"assignable counter" feature that is software based ("soft-RMID" .. "soft-ABMC"?).
This series adds support for ABMC on recent AMD hardware to address the issue described
in cover letter. This issue also exists on earlier AMD hardware that does not have the ABMC
feature and Peter is working on a software solution to address the issue on non-ABMC hardware.
This software solution is expected to have the same interface as the hardware solution but
earlier discussions revealed that it may introduce extra latency that users may only want to
accept during periods of active monitoring. Thus the option to disable the counter assignment
mode.
Your point about users returning to "legacy" mode on ABMC hardware is valid. I do not know
if that is useful, and here I can only speculate: monitoring with ABMC is more accurate but
requires more user space involvement to assign counters, while legacy mode is less accurate
but requires less user space involvement.
> For MPAM I'm looking at enabling this on any platform that is short of monitors. If
> user-space disables it I don't have an "at random" hardware behaviour to fall back on - it's
> extra work to invent a behaviour I'm not sure is useful...
It should not be required for MPAM to have a "legacy" mode. resctrl fs can expose only one
mode that is always enabled. Noting this now is important so that we can get the wording right
in the documentation.
Thanks for chiming in on MPAM's plans for this work.
Reinette
* Re: [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes
2024-08-16 17:01 ` Reinette Chatre
@ 2024-08-16 17:16 ` Peter Newman
2024-08-16 18:09 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Peter Newman @ 2024-08-16 17:16 UTC
To: Reinette Chatre
Cc: James Morse, Babu Moger, x86, hpa, paulmck, rdunlap, tj, peterz,
yanjiewtw, kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao,
jpoimboe, rick.p.edgecombe, kirill.shutemov, jithu.joseph,
kai.huang, kan.liang, daniel.sneddon, pbonzini, sandipan.das,
ilpo.jarvinen, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, mingo, bp, corbet, dave.hansen, fenghua.yu, tglx
Hi Reinette,
On Fri, Aug 16, 2024 at 10:01 AM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi James,
>
> On 8/16/24 9:31 AM, James Morse wrote:
> > Hi Babu,
> >
> > On 06/08/2024 23:00, Babu Moger wrote:
> >> Introduce interface to switch between ABMC and legacy modes.
> >>
> >> By default ABMC is enabled on boot if the feature is available.
> >> Provide the interface to go back to legacy mode if required.
> >
> > I may have missed it on an earlier version ... why would anyone want the non-ABMC
> > behaviour on hardware that requires it: counters randomly reset and randomly return
> > 'Unavailable'... is that actually useful?
> >
> > You default this to on, so there isn't a backward compatibility argument here.
> >
> > It seems like being able to disable this is a source of complexity - is it needed?
>
> The ability to go back to legacy was added while looking ahead to support the next
> "assignable counter" feature that is software based ("soft-RMID" .. "soft-ABMC"?).
>
> This series adds support for ABMC on recent AMD hardware to address the issue described
> in cover letter. This issue also exists on earlier AMD hardware that does not have the ABMC
> feature and Peter is working on a software solution to address the issue on non-ABMC hardware.
> This software solution is expected to have the same interface as the hardware solution but
> earlier discussions revealed that it may introduce extra latency that users may only want to
> accept during periods of active monitoring. Thus the option to disable the counter assignment
> mode.
Sorry again for the soft-RMID/soft-ABMC confusion[1], it was soft-RMID
that impacted context switch latency. Soft-ABMC does not require any
additional work at context switch.
The only disadvantage to soft-ABMC I can think of is that it also
limits reading llc_occupancy event counts to "assigned" groups,
whereas without it, llc_occupancy works reliably on all RMIDs on AMD
hardware.
-Peter
[1] https://lore.kernel.org/lkml/CALPaoChDv+irGEmccaQ6SpsuVS8PZ_cfzPgceq3hD3N2cqNjZA@mail.gmail.com/
* Re: [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes
2024-08-16 17:16 ` Peter Newman
@ 2024-08-16 18:09 ` Reinette Chatre
2024-08-19 14:52 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 18:09 UTC
To: Peter Newman
Cc: James Morse, Babu Moger, x86, hpa, paulmck, rdunlap, tj, peterz,
yanjiewtw, kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao,
jpoimboe, rick.p.edgecombe, kirill.shutemov, jithu.joseph,
kai.huang, kan.liang, daniel.sneddon, pbonzini, sandipan.das,
ilpo.jarvinen, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, mingo, bp, corbet, dave.hansen, fenghua.yu, tglx
Hi Peter,
On 8/16/24 10:16 AM, Peter Newman wrote:
> Hi Reinette,
>
> On Fri, Aug 16, 2024 at 10:01 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi James,
>>
>> On 8/16/24 9:31 AM, James Morse wrote:
>>> Hi Babu,
>>>
>>> On 06/08/2024 23:00, Babu Moger wrote:
>>>> Introduce interface to switch between ABMC and legacy modes.
>>>>
>>>> By default ABMC is enabled on boot if the feature is available.
>>>> Provide the interface to go back to legacy mode if required.
>>>
>>> I may have missed it on an earlier version ... why would anyone want the non-ABMC
>>> behaviour on hardware that requires it: counters randomly reset and randomly return
>>> 'Unavailable'... is that actually useful?
>>>
>>> You default this to on, so there isn't a backward compatibility argument here.
>>>
>>> It seems like being able to disable this is a source of complexity - is it needed?
>>
>> The ability to go back to legacy was added while looking ahead to support the next
>> "assignable counter" feature that is software based ("soft-RMID" .. "soft-ABMC"?).
>>
>> This series adds support for ABMC on recent AMD hardware to address the issue described
>> in cover letter. This issue also exists on earlier AMD hardware that does not have the ABMC
>> feature and Peter is working on a software solution to address the issue on non-ABMC hardware.
>> This software solution is expected to have the same interface as the hardware solution but
>> earlier discussions revealed that it may introduce extra latency that users may only want to
>> accept during periods of active monitoring. Thus the option to disable the counter assignment
>> mode.
>
> Sorry again for the soft-RMID/soft-ABMC confusion[1], it was soft-RMID
> that impacted context switch latency. Soft-ABMC does not require any
> additional work at context switch.
No problem. I did read [1] but I do not think I've seen soft-ABMC yet so
my understanding of what it does is vague.
> The only disadvantage to soft-ABMC I can think of is that it also
> limits reading llc_occupancy event counts to "assigned" groups,
> whereas without it, llc_occupancy works reliably on all RMIDs on AMD
> hardware.
hmmm ... keeping original llc_occupancy behavior does seem useful enough
as motivation to keep the "legacy"/"default" mbm_assign_mode? It does sound
to me as though soft-ABMC may not be as accurate when it comes to llc_occupancy.
As I understand it, the hardware may tag entries in the cache with an RMID, and that tag has a longer
lifetime than the tasks that allocated that data into the cache. If soft-ABMC
permanently associates an RMID with a local and total counter pair but that
RMID is dynamically assigned to resctrl groups then a group may not always
get the same RMID ... and thus its llc_occupancy data would be a combination of
its cache allocations and all the cache allocations of resource groups that had
that RMID before it. This may need significantly enhanced "limbo" handling?
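The limbo handling referred to here can be modelled roughly as shown below. This is a simplified userspace sketch, not the kernel's implementation. A released RMID is parked until its measured llc_occupancy falls to or below a reuse threshold, comparable to the `max_threshold_occupancy` file. That way a new group does not inherit cache lines still tagged by the previous owner. The `occ[]` array is a hypothetical stand-in for the hardware read.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_RMID 4

enum rmid_state { RMID_FREE, RMID_BUSY, RMID_LIMBO };

struct rmid_pool {
	enum rmid_state state[NUM_RMID];
	uint64_t threshold;	/* bytes, cf. max_threshold_occupancy */
};

/* A released RMID may still tag cache lines, so park it in limbo. */
static void rmid_release(struct rmid_pool *p, int rmid)
{
	assert(rmid >= 0 && rmid < NUM_RMID && p->state[rmid] == RMID_BUSY);
	p->state[rmid] = RMID_LIMBO;
}

/* occ[] stands in for reading each RMID's llc_occupancy from hardware. */
static void rmid_check_limbo(struct rmid_pool *p, const uint64_t occ[NUM_RMID])
{
	for (int i = 0; i < NUM_RMID; i++)
		if (p->state[i] == RMID_LIMBO && occ[i] <= p->threshold)
			p->state[i] = RMID_FREE;
}

/* Returns a free RMID, or -1 if all are busy or still draining. */
static int rmid_alloc(struct rmid_pool *p)
{
	for (int i = 0; i < NUM_RMID; i++) {
		if (p->state[i] == RMID_FREE) {
			p->state[i] = RMID_BUSY;
			return i;
		}
	}
	return -1;
}
```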
Reinette
* Re: [PATCH v6 06/22] x86/resctrl: Add support to enable/disable AMD ABMC feature
2024-08-16 16:29 ` James Morse
@ 2024-08-16 20:38 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-16 20:38 UTC
To: James Morse
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, corbet,
kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, dave.hansen, reinette.chatre, mingo, fenghua.yu, tglx,
bp
Hi James,
On 8/16/24 11:29, James Morse wrote:
> Hi Babu,
>
> Some boring comments about where the code goes...
No worries. Let's address it when we can.
>
> On 06/08/2024 23:00, Babu Moger wrote:
>> Add the functionality to enable/disable AMD ABMC feature.
>>
>> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
>> L3_QOS_EXT_CFG. When the state of ABMC is changed, the MSR needs
>> to be updated on all the logical processors in the QOS Domain.
>>
>> Hardware counters will reset when ABMC state is changed. Reset the
>> architectural state so that reading of hardware counter is not considered
>> as an overflow in next update.
>>
>> The ABMC feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 2bd207624eec..154983a67646 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>
>> @@ -536,6 +541,14 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
>>
>> void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d);
>>
>> +static inline bool resctrl_arch_get_abmc_enabled(void)
>> +{
>> + return rdt_resources_all[RDT_RESOURCE_L3].mbm_cntr_assign_enabled;
>> +}
>
> Once the filesystem code moves to /fs/resctrl, this can't be inlined from the
> architecture's internal.h. Accessing rdt_resources_all[] from asm/resctrl.h isn't something
> that is done today... could you move this to be a non-inline function in core.c?
Sure.
>
> (this saves me moving it later!)
>
>
>> +int resctrl_arch_mbm_cntr_assign_enable(void);
>> +void resctrl_arch_mbm_cntr_assign_disable(void);
>
> Please add these in linux/resctrl.h - it saves me moving them later!
>
Sure.
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 7e76f8d839fc..6075b1e5bb77 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2402,6 +2402,63 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
>
>> +static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
>> +{
>> + struct rdt_mon_domain *d;
>
>
>> + /*
>> + * Hardware counters will reset after switching the monitor mode.
>> + * Reset the architectural state so that reading of hardware
>> + * counter is not considered as an overflow in the next update.
>> + */
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + on_each_cpu_mask(&d->hdr.cpu_mask,
>> + resctrl_abmc_set_one_amd, &enable, 1);
>> + resctrl_arch_reset_rmid_all(r, d);
>> + }
>
> Is there any mileage in getting resctrl_arch_mbm_cntr_assign_enable()'s caller to do this?
> Every architecture that supports this will have to do this, and neither x86 nor arm64 are
> able to do it atomically, or quicker than calling resctrl_arch_reset_rmid_all() for each
> domain.
Yes. I think it is better to do it at a higher level (in
rdtgroup_mbm_mode_write). That way it is common across all the architectures.
>
>> +}
>
>
>> +int resctrl_arch_mbm_cntr_assign_enable(void)
>
> Could we pass the struct rdt_resource in - instead of hard coding it to be the L3 - you
> already check hw_res->mbm_cntr_assign_enabled so no additional check is needed...
>
> Background: I'd like to reduce the amount of "I magically know it's the L3" to reduce the
> work for whoever has to add monitor support for something other than the L3.
> (I've currently no plans - but someone is going to build it!)
Yes. We can pass struct rdt_resource.
>
>
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>
>> + lockdep_assert_held(&rdtgroup_mutex);
>
> After the split between the architecture and filesystem code - this lock is private to the
> filesystem. If you need to prevent concurrent enable/disable calls the architecture should
> take its own mutex.
>
> | static DEFINE_MUTEX(abmc_lock);
> ?
These calls originate from the filesystem (in this case
rdtgroup_mbm_mode_write()), which already holds the mutex. I don't think we
need a separate lock here. Let me know if I am missing something.
>
>
>> + if (r->mon.mbm_cntr_assignable && !hw_res->mbm_cntr_assign_enabled) {
>> + _resctrl_abmc_enable(r, true);
>> + hw_res->mbm_cntr_assign_enabled = true;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +void resctrl_arch_mbm_cntr_assign_disable(void)
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> +
>> + lockdep_assert_held(&rdtgroup_mutex);
>
> (same plea for passing the resource in, and not referring to the filesystem's locks)
Sure.
>
>
>> + if (hw_res->mbm_cntr_assign_enabled) {
>> + _resctrl_abmc_enable(r, false);
>> + hw_res->mbm_cntr_assign_enabled = false;
>> + }
>> +}
>
>
> The work you do in these functions is pretty symmetric. Is it worth combining them into:
> | void resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable) {
> | struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> |
> | if (hw_res->mbm_cntr_assign_enabled != enable) {
> | _resctrl_abmc_enable(r, enable);
> | hw_res->mbm_cntr_assign_enabled = enable;
> | }
> | }
Yes. We can do it.
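A minimal userspace sketch of how the symmetric helper and the caller-side reset discussed above could fit together. Everything here is a simplified stand-in: the structs, the NUM_DOMAINS/NUM_RMIDS sizes, the hw_mode_switches counter and the filesystem-side mbm_mode_write() are hypothetical; the -EINVAL check for non-assignable hardware is an assumption carried over from the old enable path; and resctrl_arch_mbm_cntr_assign_test() only models the helper James suggests below. It is not the patch's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define NUM_DOMAINS	2	/* hypothetical sizes for the model */
#define NUM_RMIDS	4

/* Simplified stand-ins for the kernel structures. */
struct rdt_mon_domain {
	unsigned long long prev_msr[NUM_RMIDS];	/* models struct arch_mbm_state */
};

struct rdt_resource {
	bool mbm_cntr_assignable;
	struct rdt_mon_domain domains[NUM_DOMAINS];
};

struct rdt_hw_resource {
	struct rdt_resource r_resctrl;
	bool mbm_cntr_assign_enabled;	/* private to the architecture */
};

static int hw_mode_switches;	/* counts simulated per-domain mode updates */

static struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
{
	return container_of(r, struct rdt_hw_resource, r_resctrl);
}

/* Arch: models the per-domain IPIs that flip the hardware mode. */
static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
{
	(void)r;
	(void)enable;
	hw_mode_switches += NUM_DOMAINS;
}

/* Arch: one symmetric helper replacing the enable/disable pair. */
static int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
{
	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

	if (enable && !r->mbm_cntr_assignable)
		return -EINVAL;

	if (hw_res->mbm_cntr_assign_enabled != enable) {
		_resctrl_abmc_enable(r, enable);
		hw_res->mbm_cntr_assign_enabled = enable;
	}
	return 0;
}

/* Arch: lets the filesystem query the mode without reading arch state. */
static bool resctrl_arch_mbm_cntr_assign_test(struct rdt_resource *r)
{
	return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
}

/* Arch: models resctrl_arch_reset_rmid_all() for one domain. */
static void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
{
	(void)r;
	assert(d != NULL);
	memset(d->prev_msr, 0, sizeof(d->prev_msr));
}

/* Filesystem: the mode-switch path shared by every architecture. */
static int mbm_mode_write(struct rdt_resource *r, bool enable)
{
	int i, ret;

	if (resctrl_arch_mbm_cntr_assign_test(r) == enable)
		return 0;

	ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
	if (ret)
		return ret;

	/* Counters reset on a mode switch, so forget the previously read values. */
	for (i = 0; i < NUM_DOMAINS; i++)
		resctrl_arch_reset_rmid_all(r, &r->domains[i]);
	return 0;
}
```

The filesystem side never reads mbm_cntr_assign_enabled directly; it goes through the test helper, which keeps the flag private to the architecture, and the per-domain reset loop lives in common code rather than in each architecture.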
>
> I think you need a resctrl_arch_mbm_cntr_assign_test() too - I'll comment on that patch...
>
>
> Thanks,
>
> James
>
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v6 07/22] x86/resctrl: Introduce the interface to display monitor mode
2024-08-16 16:56 ` James Morse
@ 2024-08-16 20:38 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-16 20:38 UTC (permalink / raw)
To: James Morse
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, dave.hansen, tglx, bp, mingo, corbet, fenghua.yu,
reinette.chatre
Hi James,
On 8/16/24 11:56, James Morse wrote:
> Hi Babu,
>
> On 06/08/2024 23:00, Babu Moger wrote:
>> The mbm_mode displays list of monitor modes supported.
>>
>> The mbm_cntr_assign is one of the currently supported modes. It is also
>> called ABMC (Assignable Bandwidth Monitoring Counters) feature. ABMC
>> feature provides option to assign a hardware counter to an RMID and
>> monitor the bandwidth as long as it is assigned. ABMC mode is enabled
>> by default when supported.
>>
>> Legacy mode works without the assignment option.
>>
>> Provide an interface to display the monitor mode on the system.
>> $cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>> [mbm_cntr_assign]
>> legacy
>>
>> Switching the mbm_mode will reset all the mbm counters of all resctrl
>> groups.
>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 30586728a4cd..d4ec605b200a 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -257,6 +257,40 @@ with the following files:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>> 0=0x30;1=0x30;3=0x15;4=0x15
>>
>> +"mbm_mode":
>> + Reports the list of assignable monitoring features supported. The
>> + enclosed brackets indicate which feature is enabled.
>> + ::
>> +
>> + cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>> + [mbm_cntr_assign]
>> + legacy
>> +
>> + "mbm_cntr_assign":
>> + AMD's ABMC feature is one of the mbm_cntr_assign mode supported.
>> + The bandwidth monitoring feature on AMD system only guarantees
>> + that RMIDs currently assigned to a processor will be tracked by
>> + hardware. The counters of any other RMIDs which are no longer
>> + being tracked will be reset to zero. The MBM event counters
>> + return "Unavailable" for the RMIDs that are not tracked by
>> + hardware. So, there can be only limited number of groups that can
>> + give guaranteed monitoring numbers. With ever changing configurations
>> + there is no way to definitely know which of these groups are being
>> + tracked for certain point of time. Users do not have the option to
>> + monitor a group or set of groups for certain period of time without
>> + worrying about RMID being reset in between.
>> +
>> + The ABMC feature provides an option to the user to assign a hardware
>> + counter to an RMID and monitor the bandwidth as long as it is assigned.
>> + The assigned RMID will be tracked by the hardware until the user
>> + unassigns it manually. There is no need to worry about counters being
>> + reset during this period.
>
> While debugging my rebase of MPAM on top of this series, I've come back to this wording a
> few times to try and work out what I should expect to see ...
>
> Is it possible to disentangle the AMD hardware feature description from the description of
> the filesystem behaviour this enables? You are really describing what the hardware does if
> you don't enable this mode...
>
> An incomplete suggestion of the shape would be something like:
>
> | In mbm_cntr_assign mode user-space is able to specify which control
> | or monitor groups in resctrl should have a hardware counter assigned
> | using the 'mbm_control' file. The number of hardware counters available
> | is described in the 'num_mbm_cntrs' file.
> | Changing this mode will cause all counters on a resource to reset.
> |
> | The feature is needed on platforms which support more control and monitor
> | groups than hardware counters, meaning 'unassigned' control or monitor groups will
> | report 'Unavailable' or not count all the traffic in an unpredictable way.
> |
> | Platforms with AMDs ABMC feature enable this mode by default so that counters
> | remain assigned even when the corresponding RMID is not in use by any processor.
>
Looks good to me.
>
>> + "Legacy":
>
> Calling "enough hardware counters" 'legacy' is a bit curious.... 'default'?
"Default" sounds good to me if there are no objections from others.
> (but I haven't worked out the benefit of disabling this mode, so maybe it doesn't need a
> name.)
>
>> + Legacy mode works without the assignment option. The monitoring works
>> + as long as there are enough RMID counters available to support number
>> + of monitoring groups.
>
> How can user-space tell this is the case? Could we be specific as to what 'works' means?
>
> Something like:
> | By default resctrl assumes each control and monitor group has a hardware counter.
> | Hardware without this property will still allow more control or monitor groups
> | than 'num_mbm_cntrs' to be created. Reading the mbm files may report
> | 'Unavailable' if there is no hardware resource assigned.
Looks good.
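For reference, a small userspace sketch of the mbm_mode read format described in this patch: one supported mode per line, with the enabled mode in square brackets. The mbm_mode_names[] table and the mbm_mode_show()/mbm_mode_line() helpers are hypothetical; the second name is still under discussion ("legacy" vs "default"), and listing only that mode when counter assignment is unsupported is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name table; "legacy" may be renamed per the discussion above. */
static const char *const mbm_mode_names[] = { "mbm_cntr_assign", "legacy" };

/* Append one mode line, bracketed when it is the active mode. */
static size_t mbm_mode_line(char *buf, size_t len, size_t off, const char *name, bool active)
{
	int n = snprintf(buf + off, len - off, active ? "[%s]\n" : "%s\n", name);

	assert(n > 0 && (size_t)n < len - off);	/* the sketch assumes buf is large enough */
	return off + (size_t)n;
}

/* Model of reading info/L3_MON/mbm_mode. Returns the number of bytes written. */
static size_t mbm_mode_show(char *buf, size_t len, bool cntr_assignable, bool cntr_assign_enabled)
{
	size_t off = 0;

	assert(len > 0);
	buf[0] = '\0';
	if (cntr_assignable)
		off = mbm_mode_line(buf, len, off, mbm_mode_names[0], cntr_assign_enabled);
	return mbm_mode_line(buf, len, off, mbm_mode_names[1], !cntr_assign_enabled);
}
```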
>
>
> N.B. I don't suggest referring to the num_rmid file in these as MPAM doesn't have an
> equivalent property.
>
>
> Thanks,
>
> James
>
--
Thanks
Babu Moger
* Re: [PATCH v6 09/22] x86/resctrl: Introduce MBM counters bitmap
2024-08-16 16:29 ` James Morse
@ 2024-08-16 20:39 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-16 20:39 UTC (permalink / raw)
To: James Morse
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, tglx, mingo, dave.hansen, bp, corbet, fenghua.yu,
reinette.chatre
Hi James,
On 8/16/24 11:29, James Morse wrote:
> Hi Babu,
>
> On 06/08/2024 23:00, Babu Moger wrote:
>> Hardware provides a set of counters when mbm_cntr_assignable feature is
>> supported. These counters are used for assigning the events in resctrl
>> group when the feature is enabled.
>>
>> Introduce mbm_cntrs_free_map bitmap to track available and free counters
>> and set of routines to allocate and free the counters.
>
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index ab4fab3b7cf1..c818965e36c9 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -185,6 +185,37 @@ bool closid_allocated(unsigned int closid)
>> return !test_bit(closid, &closid_free_map);
>> }
>>
>> +/*
>> + * Counter bitmap for tracking the available counters.
>> + * ABMC feature provides set of hardware counters for enabling events.
>> + * Each event takes one hardware counter. Kernel needs to keep track
>> + * of number of available counters.
>> + */
>> +static DECLARE_BITMAP(mbm_cntrs_free_map, 64);
>
> Please make this resctrl limit of '64' a define in linux/resctrl.h so the arch code knows
> what the limit is!
>
> MPAM platforms may have more than this - and really bad things happen if mbm_cntrs_init()
> passes bitmap_fill() a value greater than 64.
>
> Even better - could we dynamically allocate this bitmap using the size advertised by the
> architecture code?
Yes. Actually, I was already thinking about allocating it dynamically. It needs
a few other changes as well. Will do.
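A userspace sketch of sizing the free-counter map from the count reported by the architecture instead of a fixed 64, so a larger count can never overrun the map. As in the patch, a set bit means the counter is free. The mbm_cntrs_init()/mbm_cntr_alloc()/mbm_cntr_free()/mbm_cntrs_exit() signatures are assumptions; in the kernel this would more likely use bitmap_zalloc(), bitmap_fill(), find_first_bit() and bitmap_free() rather than the hand-rolled loops below.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG		(CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

static unsigned long *mbm_cntrs_free_map;
static unsigned int mbm_cntrs_free_map_len;

/* Allocate the map for num_cntrs counters and mark every counter free. */
static int mbm_cntrs_init(unsigned int num_cntrs)
{
	unsigned int i;

	mbm_cntrs_free_map = calloc(BITS_TO_LONGS(num_cntrs), sizeof(unsigned long));
	if (!mbm_cntrs_free_map)
		return -ENOMEM;
	mbm_cntrs_free_map_len = num_cntrs;

	for (i = 0; i < num_cntrs; i++)
		mbm_cntrs_free_map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
	return 0;
}

/* Claim the lowest free counter, or return -ENOSPC if none is left. */
static int mbm_cntr_alloc(void)
{
	unsigned int i;

	for (i = 0; i < mbm_cntrs_free_map_len; i++) {
		unsigned long *word = &mbm_cntrs_free_map[i / BITS_PER_LONG];
		unsigned long bit = 1UL << (i % BITS_PER_LONG);

		if (*word & bit) {
			*word &= ~bit;
			return (int)i;
		}
	}
	return -ENOSPC;
}

/* Return a counter to the free pool. */
static void mbm_cntr_free(unsigned int cntr_id)
{
	assert(cntr_id < mbm_cntrs_free_map_len);
	mbm_cntrs_free_map[cntr_id / BITS_PER_LONG] |= 1UL << (cntr_id % BITS_PER_LONG);
}

static void mbm_cntrs_exit(void)
{
	free(mbm_cntrs_free_map);
	mbm_cntrs_free_map = NULL;
	mbm_cntrs_free_map_len = 0;
}
```

With 100 counters the map spans two longs on a 64-bit build, which is exactly the case where a fixed DECLARE_BITMAP(..., 64) would go wrong.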
--
Thanks
Babu Moger
* Re: [PATCH v6 15/22] x86/resctrl: Add the interface to assign a hardware counter
2024-08-16 16:30 ` James Morse
@ 2024-08-16 20:39 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-16 20:39 UTC (permalink / raw)
To: James Morse
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, dave.hansen, tglx, mingo, bp, corbet, fenghua.yu,
reinette.chatre
Hi James,
On 8/16/24 11:30, James Morse wrote:
> Hi Babu,
>
> On 06/08/2024 23:00, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID and monitor the bandwidth as long as it is assigned.
>> The assigned RMID will be tracked by the hardware until the user unassigns
>> it manually.
>>
>> Counters are configured by writing to L3_QOS_ABMC_CFG MSR and
>> specifying the counter id, bandwidth source, and bandwidth types.
>>
>> Provide the interface to assign the counter ids to RMID.
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 60696b248b56..1ee91a7293a8 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1864,6 +1864,103 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>
>> +/*
>> + * Send an IPI to the domain to assign the counter id to RMID.
>> + */
>> +int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evtid,
>> + u32 rmid, u32 cntr_id, u32 closid, bool assign)
>
> MPAM ends up with a per-resource array of monitor-ids that it uses to map cntr_id
> allocated by resctrl to the underlying hardware id. Could this function pass the struct
> rdt_resource too?
> (this saves me having to assume its the L3 - adding to the technical debt in this area)
Yes. We can pass struct rdt_resource. That brings this function to seven
parameters; I hope that is fine.
>
> Nit: could closid and rmid appear next to each other, and in that order ... just to fit
> with other helpers that take both.
Sure.
>
>
>> +{
>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
>> + struct arch_mbm_state *arch_mbm;
>> +
>> + abmc_cfg.split.cfg_en = 1;
>> + abmc_cfg.split.cntr_en = assign ? 1 : 0;
>> + abmc_cfg.split.cntr_id = cntr_id;
>> + abmc_cfg.split.bw_src = rmid;
>> +
>> + /* Update the event configuration from the domain */
>> + if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
>> + abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
>> + arch_mbm = &hw_dom->arch_mbm_total[rmid];
>> + } else {
>> + abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
>> + arch_mbm = &hw_dom->arch_mbm_local[rmid];
>> + }
>> +
>> + smp_call_function_any(&d->hdr.cpu_mask, rdtgroup_abmc_cfg, &abmc_cfg, 1);
>> +
>> + /*
>> + * Reset the architectural state so that reading of hardware
>> + * counter is not considered as an overflow in next update.
>> + */
>> + if (arch_mbm)
>> + memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
>> +
>> + return 0;
>> +}
>
>
> Thanks,
>
> James
>
--
Thanks
Babu Moger
* Re: [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported
2024-08-16 16:32 ` James Morse
@ 2024-08-16 20:40 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-16 20:40 UTC (permalink / raw)
To: James Morse
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, mingo, bp, corbet, fenghua.yu, reinette.chatre, tglx,
dave.hansen
Hi James,
On 8/16/24 11:32, James Morse wrote:
> Hi Babu,
>
> On 06/08/2024 23:00, Babu Moger wrote:
>> Enable ABMC by default when supported during the boot up.
>>
>> Users will not see any difference in the behavior when resctrl is
>> mounted. With automatic assignment everything will work as running
>> in the legacy monitor mode.
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 6fb0cfdb5529..a7980f84c487 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -599,6 +599,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>> d = container_of(hdr, struct rdt_mon_domain, hdr);
>>
>> cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
>> + resctrl_arch_mbm_cntr_assign_configure();
>> return;
>> }
>>
>> @@ -620,6 +621,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>> arch_mon_domain_online(r, d);
>>
>> resctrl_mbm_evt_config_init(hw_dom);
>> + resctrl_arch_mbm_cntr_assign_configure();
>>
>> if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
>> mon_domain_free(hw_dom);
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 66febff2a3d3..d15fd1bde5f4 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2756,6 +2756,23 @@ void resctrl_arch_mbm_cntr_assign_disable(void)
>> }
>> }
>>
>> +void resctrl_arch_mbm_cntr_assign_configure(void)
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> + bool enable = true;
>> +
>> + mutex_lock(&rdtgroup_mutex);
>
> As before - this lock isn't available to the architecture code after the filesystem code
> moves to /fs/. To prevent concurrent calls to resctrl_abmc_set_one_amd() I think you need
> your own mutex.
These calls originate from the domain_add_cpu_mon() sequence, which already
holds the lock, so we should be good. I will remove the lock.
>
>
>> + if (r->mon.mbm_cntr_assignable) {
>> + if (!hw_res->mbm_cntr_assign_enabled)
>> + hw_res->mbm_cntr_assign_enabled = true;
>> + resctrl_abmc_set_one_amd(&enable);
>> + }
>> +
>> + mutex_unlock(&rdtgroup_mutex);
>> +}
>
> Neither of this functions callers are in filesystem code, could you drop the 'arch' from
> the name - it isn't part of the fs/arch interface.
Yes. Sure.
>
>
> Thanks,
>
> James
>
--
Thanks
Babu Moger
* Re: [PATCH v6 21/22] x86/resctrl: Introduce interface to list monitor states of all the groups
2024-08-16 16:28 ` James Morse
@ 2024-08-16 20:40 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-16 20:40 UTC (permalink / raw)
To: James Morse
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, dave.hansen, tglx, corbet, fenghua.yu, reinette.chatre,
mingo, bp
Hi James,
On 8/16/24 11:28, James Morse wrote:
> Hi Babu,
>
> On 06/08/2024 23:00, Babu Moger wrote:
>> Provide the interface to list the monitor states of all the resctrl
>> groups in ABMC mode.
>>
>> Example:
>> $cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> List follows the following format:
>>
>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Format for specific type of groups:
>>
>> - Default CTRL_MON group:
>> "//<domain_id>=<flags>"
>>
>> - Non-default CTRL_MON group:
>> "<CTRL_MON group>//<domain_id>=<flags>"
>>
>> - Child MON group of default CTRL_MON group:
>> "/<MON group>/<domain_id>=<flags>"
>>
>> - Child MON group of non-default CTRL_MON group:
>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Flags can be one of the following:
>> t MBM total event is enabled
>> l MBM local event is enabled
>> tl Both total and local MBM events are enabled
>> _ None of the MBM events are enabled
>
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index d15fd1bde5f4..d7aadca5e4ab 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -965,6 +965,75 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>> return 0;
>> }
>>
>> +static char *rdtgroup_mon_state_to_str(struct rdtgroup *rdtgrp,
>> + struct rdt_mon_domain *d, char *str)
>> +{
>> + char *tmp = str;
>> + int index;
>> +
>> + /*
>> + * Query the monitor state for the domain.
>> + * Index 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
>> + * Index 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
>> + */
>> + index = mon_event_config_index_get(QOS_L3_MBM_TOTAL_EVENT_ID);
>> + if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET &&
>> + test_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map))
>> + *tmp++ = 't';
>> +
>> + index = mon_event_config_index_get(QOS_L3_MBM_LOCAL_EVENT_ID);
>> + if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET &&
>> + test_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map))
>> + *tmp++ = 'l';
>> +
>> + if (tmp == str)
>> + *tmp++ = '_';
>> +
>> + *tmp = '\0';
>> + return str;
>> +}
>> +
>> +static int rdtgroup_mbm_control_show(struct kernfs_open_file *of,
>> + struct seq_file *s, void *v)
>> +{
>> + struct rdt_resource *r = of->kn->parent->priv;
>
>> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>
> This is filesystem code, once it moves to /fs/ you can't grab an architecture specific
> struct like this. (suggestion below).
Yes. Correct. I don't need this. I will remove this.
>
>
>> + struct rdt_mon_domain *dom;
>> + struct rdtgroup *rdtg;
>> + char str[10];
>
> Shouldn't new commands that might fail start with rdt_last_cmd_clear()?
Yes. Correct.
>
>
>> + if (!hw_res->mbm_cntr_assign_enabled) {
>
> I think this should be wrapped up as:
> | resctrl_arch_mbm_cntr_assign_test(r)
>
> as this flag is private to the architecture.
I can use resctrl_arch_get_abmc_enabled()
>
>
>> + rdt_last_cmd_puts("ABMC feature is not enabled\n");
>
> lockdep barks that you need to hold rdtgroup_mutex when calling rdt_last_cmd_puts() -
> otherwise this can run in parallel with another syscall.
Ok. Sure. Will fix it.
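A sketch of the ordering James points at: take the mutex first, clear the last command status, and only then report the error, so rdt_last_cmd_puts() always runs under the lock. This is a userspace model. The pthread mutex, the rdtgroup_mutex_held flag standing in for lockdep_assert_held(), and the placeholder output line are assumptions; cntr_assign_enabled stands in for whichever arch query (for example resctrl_arch_get_abmc_enabled()) is finally used.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-ins for rdtgroup_mutex and lockdep. */
static pthread_mutex_t rdtgroup_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool rdtgroup_mutex_held;
static char last_cmd_status[128];

/* Stand-in for the arch query of the counter assignment mode. */
static bool cntr_assign_enabled;

static void rdt_last_cmd_clear(void)
{
	assert(rdtgroup_mutex_held);	/* models lockdep_assert_held() */
	last_cmd_status[0] = '\0';
}

static void rdt_last_cmd_puts(const char *s)
{
	size_t used = strlen(last_cmd_status);

	assert(rdtgroup_mutex_held);
	snprintf(last_cmd_status + used, sizeof(last_cmd_status) - used, "%s", s);
}

static int mbm_control_show(char *out, size_t len)
{
	int ret = 0;

	pthread_mutex_lock(&rdtgroup_mutex);
	rdtgroup_mutex_held = true;

	rdt_last_cmd_clear();
	if (!cntr_assign_enabled) {
		rdt_last_cmd_puts("ABMC feature is not enabled\n");
		ret = -EINVAL;
		goto out_unlock;
	}

	/* Placeholder for the walk over every group and domain. */
	snprintf(out, len, "//0=tl;1=tl;\n");

out_unlock:
	rdtgroup_mutex_held = false;
	pthread_mutex_unlock(&rdtgroup_mutex);
	return ret;
}
```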
>
>> + return -EINVAL;
>> + }
>> +
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>> + struct rdtgroup *crg;
>> +
>> + seq_printf(s, "%s//", rdtg->kn->name);
>> +
>> + list_for_each_entry(dom, &r->mon_domains, hdr.list)
>> + seq_printf(s, "%d=%s;", dom->hdr.id,
>> + rdtgroup_mon_state_to_str(rdtg, dom, str));
>> + seq_putc(s, '\n');
>> +
>> + list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
>> + mon.crdtgrp_list) {
>> + seq_printf(s, "%s/%s/", rdtg->kn->name, crg->kn->name);
>> +
>> + list_for_each_entry(dom, &r->mon_domains, hdr.list)
>> + seq_printf(s, "%d=%s;", dom->hdr.id,
>> + rdtgroup_mon_state_to_str(crg, dom, str));
>> + seq_putc(s, '\n');
>> + }
>> + }
>> +
>> + mutex_unlock(&rdtgroup_mutex);
>> + return 0;
>> +}
>
>
> Thanks,
>
> James
>
>
--
Thanks
Babu Moger
* Re: [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (21 preceding siblings ...)
2024-08-06 22:00 ` [PATCH v6 22/22] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
@ 2024-08-16 21:28 ` Reinette Chatre
2024-08-22 1:31 ` Moger, Babu
22 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:28 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
>
> Feature adds following interface files:
>
> /sys/fs/resctrl/info/L3_MON/mbm_mode: Reports the list of assignable
> monitoring features supported. The enclosed brackets indicate which
> feature is enabled.
I've been considering this file as a generic file where all future "MBM modes"
can be captured, while this series treats it as specific to "assignable monitoring
features" (btw, should this be "assignable monitoring modes" to match the name?).
Looking closer at this implementation, making "mbm_mode" specific to "assignable
monitoring features" does make things easier, but in that case I think it should have
a less generic name to avoid the obstacles we have with the existing "mon_features".
Apologies that this goes back to something close to what you had earlier ... maybe
"mbm_assign_mode"?
>
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> counters available for assignment.
>
> /sys/fs/resctrl/info/L3_MON/mbm_control: Reports the resctrl group and monitor
> status of each group. Assignment state can be updated by writing to the
> interface.
>
> # Examples
>
> a. Check if ABMC support is available
> #mount -t resctrl resctrl /sys/fs/resctrl/
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> [mbm_cntr_assign]
> legacy
>
> ABMC feature is detected and it is enabled.
>
> b. Check how many ABMC counters are available.
>
> #cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 32
>
> c. Create few resctrl groups.
>
> # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
> # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
> # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>
>
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_control
> to list and modify the group's monitoring states. File provides single place
> to list monitoring states of all the resctrl groups. It makes it easier for
> user space to learn about the counters are used without needing to traverse
"to learn about the counters are used" -> "to learn the counters that are used" or
"to learn about the used counters" or ...?
> all the groups thus reducing the number of file system calls.
>
> The list follows the following format:
>
> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>
> Format for specific type of groups:
>
> * Default CTRL_MON group:
> "//<domain_id>=<flags>"
>
> * Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id>=<flags>"
>
> * Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id>=<flags>"
>
> * Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>
> Flags can be one of the following:
>
> t MBM total event is enabled.
> l MBM local event is enabled.
> tl Both total and local MBM events are enabled.
> _ None of the MBM events are enabled
>
> Examples:
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=tl;1=tl;
> /child_default_mon_grp/0=tl;1=tl;
>
> There are four groups and all the groups have local and total
> event enabled on domain 0 and 1.
>
> e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_control.
>
> The write format is similar to the above list format with addition
> of opcode for the assignment operation.
> “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
>
>
> * Default CTRL_MON group:
> "//<domain_id><opcode><flags>"
>
> * Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id><opcode><flags>"
>
> * Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id><opcode><flags>"
>
> * Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>
> Opcode can be one of the following:
>
> = Update the assignment to match the flag.
> + Assign a new event.
> - Unassign a new event.
Since user space can provide more than one flag the text could be more accurate
noting this. Eg. "Update the assignment to match the flag" -> "Update the assignment
to match the flags.".
>
> Flags can be one of the following:
>
> t MBM total event.
> l MBM local event.
> tl Both total and local MBM events.
> _ None of the MBM events. Only works with '=' opcode.
Please take care with the implementation, which seems to accept a variety of
combinations. If I understand correctly, the implementation supports flags like,
for example, "tttt", "llll", "ltlt" ... those may not be an issue, but of most
concern is a pattern like "_lt" that (unexpectedly) appears to result in both
total and local being set.
>
> Initial group status:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=tl;1=tl;
> /child_default_mon_grp/0=tl;1=tl;
>
> To update the default group to enable only total event on domain 0:
> # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>
> Assignment status after the update:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=t;1=tl;
> /child_default_mon_grp/0=tl;1=tl;
>
> To update the MON group child_default_mon_grp to remove total event on domain 1:
> # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>
> Assignment status after the update:
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=t;1=tl;
> /child_default_mon_grp/0=tl;1=l;
>
> To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
> remove both local and total events on domain 1:
> # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
> /sys/fs/resctrl/info/L3_MON/mbm_control
>
> Assignment status after the update:
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> //0=t;1=tl;
> /child_default_mon_grp/0=tl;1=l;
>
> To update the default group to add a local event domain 0.
> # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
>
> Assignment status after the update:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> //0=tl;1=tl;
> /child_default_mon_grp/0=tl;1=l;
>
> To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all
> the MBM events on all the domains.
> # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
>
> Assignment status after the update:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> non_default_ctrl_mon_grp//0=_;1=_;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> //0=tl;1=tl;
> /child_default_mon_grp/0=tl;1=l;
>
>
> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
> There is no change in reading the events with ABMC. If the event is unassigned
> when reading, then the read will come back as "Unassigned".
>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 779247936
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 765207488
>
> g. Check the bandwidth configuration for the group. Note that bandwidth
> configuration has a domain scope. Total event defaults to 0x7F (to
> count all the events) and local event defaults to 0x15 (to count all
> the local numa events). The event bitmap decoding is available at
> https://www.kernel.org/doc/Documentation/x86/resctrl.rst
> in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 0=0x7f;1=0x7f
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x15;1=0x15
>
> h. Change the bandwidth source for domain 0 for the total event to count only reads.
> Note that this change effects total events on the domain 0.
>
> #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 0=0x33;1=0x7F
>
> i. Now read the total event again. The first read will come back with "Unavailable"
> status. The subsequent read of mbm_total_bytes will display only the read events.
>
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> Unavailable
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 314101
>
> j. Users will have the option to go back to legacy mbm_mode if required.
> This can be done using the following command. Note that switching the
> mbm_mode will reset all the mbm counters of all resctrl groups.
"reset all the mbm counters" -> "reset all the MBM counters"
>
> # echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
> # cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> mbm_cntr_assign
> [legacy]
>
>
> k. Unmount the resctrl
>
> #umount /sys/fs/resctrl/
> ---
> v6:
> We still need to finalize few interface details on mbm_mode and mbm_control
> in case of ABMC and Soft-ABMC. We can continue the discussion with this series.
Could you please list the details that need to be finalized?
Thank you
Reinette
* Re: [PATCH v6 03/22] x86/resctrl: Consolidate monitoring related data from rdt_resource
2024-08-06 22:00 ` [PATCH v6 03/22] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2024-08-16 21:29 ` Reinette Chatre
2024-08-19 14:46 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:29 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> The cache allocation and memory bandwidth allocation feature properties
> are consolidated into cache and membw structures respectively.
"are consolidated into cache and membw structures respectively" ->
"are consolidated into struct resctrl_cache and struct resctrl_membw respectively"
>
> In preparation for more monitoring properties that will clobber the
> existing resource struct more, re-organize the monitoring specific
> properties to also be in a separate structure.
>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> @@ -182,12 +182,21 @@ enum resctrl_scope {
> RESCTRL_L3_NODE,
> };
>
> +/**
> + * struct resctrl_mon - Monitoring related data
To capture that this is not global monitoring data but instead
resource specific:
"Monitoring related data" -> "Monitoring related data of a resctrl resource"
> + * @num_rmid: Number of RMIDs available
> + * @evt_list: List of monitoring events
> + */
> +struct resctrl_mon {
> + int num_rmid;
> + struct list_head evt_list;
> +};
> +
> /**
> * struct rdt_resource - attributes of a resctrl resource
> * @rid: The index of the resource
> * @alloc_capable: Is allocation available on this machine
> * @mon_capable: Is monitor feature available on this machine
> - * @num_rmid: Number of RMIDs available
> * @ctrl_scope: Scope of this resource for control functions
> * @mon_scope: Scope of this resource for monitor functions
> * @cache: Cache allocation related data
> @@ -199,7 +208,6 @@ enum resctrl_scope {
> * @default_ctrl: Specifies default cache cbm or memory B/W percent.
> * @format_str: Per resource format string to show domain value
> * @parse_ctrlval: Per resource function pointer to parse control values
> - * @evt_list: List of monitoring events
> * @fflags: flags to choose base and info files
> * @cdp_capable: Is the CDP feature available on this resource
> */
Please add a kernel-doc entry for the new member.
> @@ -207,11 +215,11 @@ struct rdt_resource {
> int rid;
> bool alloc_capable;
> bool mon_capable;
> - int num_rmid;
> enum resctrl_scope ctrl_scope;
> enum resctrl_scope mon_scope;
> struct resctrl_cache cache;
> struct resctrl_membw membw;
> + struct resctrl_mon mon;
> struct list_head ctrl_domains;
> struct list_head mon_domains;
> char *name;
> @@ -221,7 +229,6 @@ struct rdt_resource {
> int (*parse_ctrlval)(struct rdt_parse_data *data,
> struct resctrl_schema *s,
> struct rdt_ctrl_domain *d);
> - struct list_head evt_list;
> unsigned long fflags;
> bool cdp_capable;
> };
Reinette
* Re: [PATCH v6 04/22] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
2024-08-06 22:00 ` [PATCH v6 04/22] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
2024-08-07 16:33 ` Thomas Gleixner
@ 2024-08-16 21:30 ` Reinette Chatre
2024-08-19 15:37 ` Moger, Babu
1 sibling, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:30 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
> Bits Description
> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
> Monitoring Counter ID + 1
>
> The feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Detect the feature and number of assignable counters supported.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: Commit message update.
> Renamed abmc_capable to mbm_cntr_assignable.
>
> v5: Name change num_cntrs to num_mbm_cntrs.
> Moved abmc_capable to resctrl_mon.
>
> v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We dont
> need to separate this as arch code.
>
> v3: Removed changes related to mon_features.
> Moved rdt_cpu_has to core.c and added new function resctrl_arch_has_abmc.
> Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
> rdt_resource. (James)
>
> v2: Changed the field name to mbm_assign_capable from abmc_capable.
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 12 ++++++++++++
> include/linux/resctrl.h | 4 ++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 795fe91a8feb..88312b5f0069 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1229,6 +1229,18 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
> mbm_local_event.configurable = true;
> mbm_config_rftype_init("mbm_local_bytes_config");
> }
> +
> + if (rdt_cpu_has(X86_FEATURE_ABMC)) {
> + r->mon.mbm_cntr_assignable = true;
> + /*
> + * Query CPUID_Fn80000020_EBX_x05 for number of
> + * ABMC counters.
> + */
At this point this comment seems unnecessary. Not an issue, it can stay if you
prefer.
> + cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
> + r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
> + if (WARN_ON(r->mon.num_mbm_cntrs > 64))
Please document where this "64" limit comes from. This is potentially a problem
since the resctrl fs managed bitmap is hardcoded to be of size 64 but the arch code
sets how many counters are supported. Will comment more later on bitmap portions, but
to handle this I expect resctrl fs should at least sanity check the number of counters
before attempting to initialize its bitmap ... or better, as James suggests, make the
bitmap creation dynamic.
> + r->mon.num_mbm_cntrs = 64;
> + }
> }
>
> l3_mon_evt_init(r);
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 1097559f4987..72c498deeb5e 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -185,10 +185,14 @@ enum resctrl_scope {
> /**
> * struct resctrl_mon - Monitoring related data
> * @num_rmid: Number of RMIDs available
> + * @num_mbm_cntrs: Number of monitoring counters
> + * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
> * @evt_list: List of monitoring events
> */
> struct resctrl_mon {
> int num_rmid;
> + int num_mbm_cntrs;
> + bool mbm_cntr_assignable;
> struct list_head evt_list;
> };
>
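Here is a small user-space sketch of the sanity check suggested above. It assumes the count is decoded from CPUID Fn8000_0020_EBX_x5 bits 15:0 exactly as the patch does ((ebx & 0xFFFF) + 1), and that resctrl fs validates the count against the capacity of the bitmap it is about to initialize instead of relying on the arch code to clamp it. The raw EBX value is passed in rather than read with cpuid_count(). abmc_num_cntrs_from_ebx(), mbm_cntrs_fit() and MBM_CNTR_BITMAP_BITS are hypothetical names, not code from this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capacity of the fixed-size tracking bitmap used in v6 (assumption). */
#define MBM_CNTR_BITMAP_BITS 64u

/*
 * Hypothetical helper: decode the number of assignable counters from the
 * raw EBX value of CPUID Fn8000_0020 subleaf 5, mirroring the patch.
 * Only bits 15:0 are used; upper bits are ignored.
 */
static unsigned int abmc_num_cntrs_from_ebx(uint32_t ebx)
{
	return (ebx & 0xFFFFu) + 1;
}

/*
 * Hypothetical check resctrl fs could run before initializing its
 * bitmap: reject a count the bitmap cannot represent.
 */
static bool mbm_cntrs_fit(unsigned int num_cntrs, unsigned int bitmap_bits)
{
	return num_cntrs > 0 && num_cntrs <= bitmap_bits;
}
```

With a dynamically sized bitmap, as James suggests, the upper bound would come from the allocation itself rather than from a constant.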
Reinette
* Re: [PATCH v6 06/22] x86/resctrl: Add support to enable/disable AMD ABMC feature
2024-08-06 22:00 ` [PATCH v6 06/22] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
2024-08-16 16:29 ` James Morse
@ 2024-08-16 21:31 ` Reinette Chatre
2024-08-19 18:07 ` Moger, Babu
1 sibling, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:31 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> Add the functionality to enable/disable AMD ABMC feature.
>
> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
> L3_QOS_EXT_CFG. When the state of ABMC is changed, the MSR needs
> to be updated on all the logical processors in the QOS Domain.
>
> Hardware counters will reset when ABMC state is changed. Reset the
Could you please clarify how this works when ABMC state is changed on
one CPU in a domain vs all (as done in this patch) CPUs of a domain? In this
patch it is clear that all hardware counters are reset and consequently
the architectural state maintained by resctrl is reset also. Later, when
the code is added to handle CPU online I see that ABMC state is changed
on a new online CPU but I do not see matching reset of architectural state.
(more in that patch later)
> architectural state so that reading of hardware counter is not considered
"architectural state" -> "architectural state maintained by resctrl"
> as an overflow in next update.
"so that reading of hardware counter" -> "so that reading of the/a(?)
hardware counter"
>
> The ABMC feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: Renamed abmc_enabled to mbm_cntr_assign_enabled.
> Used msr_set_bit and msr_clear_bit for msr updates.
> Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
> Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().
> Made _resctrl_abmc_enable to return void.
>
> v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
> Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
> Introduced resctrl_arch_get_abmc_enabled to get abmc state from
> non-arch code.
> Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
> Modified commit log to make it clear about AMD ABMC feature.
>
> v3: No changes.
>
> v2: Few text changes in commit message.
> ---
> arch/x86/include/asm/msr-index.h | 1 +
> arch/x86/kernel/cpu/resctrl/internal.h | 13 ++++++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 57 ++++++++++++++++++++++++++
> 3 files changed, 71 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 82c6a4d350e0..d86469bf5d41 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1182,6 +1182,7 @@
> #define MSR_IA32_MBA_BW_BASE 0xc0000200
> #define MSR_IA32_SMBA_BW_BASE 0xc0000280
> #define MSR_IA32_EVT_CFG_BASE 0xc0000400
> +#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
>
> /* MSR_IA32_VMX_MISC bits */
> #define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 2bd207624eec..154983a67646 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -56,6 +56,9 @@
> /* Max event bits supported */
> #define MAX_EVT_CONFIG_BITS GENMASK(6, 0)
>
> +/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
> +#define ABMC_ENABLE_BIT 0
> +
> /**
> * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
> * aren't marked nohz_full
> @@ -477,6 +480,7 @@ struct rdt_parse_data {
> * @mbm_cfg_mask: Bandwidth sources that can be tracked when Bandwidth
> * Monitoring Event Configuration (BMEC) is supported.
> * @cdp_enabled: CDP state of this resource
> + * @mbm_cntr_assign_enabled: ABMC feature is enabled
> *
> * Members of this structure are either private to the architecture
> * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
> @@ -491,6 +495,7 @@ struct rdt_hw_resource {
> unsigned int mbm_width;
> unsigned int mbm_cfg_mask;
> bool cdp_enabled;
> + bool mbm_cntr_assign_enabled;
> };
>
> static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
> @@ -536,6 +541,14 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
>
> void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d);
>
> +static inline bool resctrl_arch_get_abmc_enabled(void)
This function will be called by resctrl fs code. Please contain the "abmc" naming to the
x86 architecture code and let resctrl fs just refer to it as "mbm_assign"/"mbm_cntr_assign".
> +{
> + return rdt_resources_all[RDT_RESOURCE_L3].mbm_cntr_assign_enabled;
> +}
> +
> +int resctrl_arch_mbm_cntr_assign_enable(void);
> +void resctrl_arch_mbm_cntr_assign_disable(void);
> +
> /*
> * To return the common struct rdt_resource, which is contained in struct
> * rdt_hw_resource, walk the resctrl member of struct rdt_hw_resource.
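The reason resctrl must also reset the architectural state it maintains when the ABMC state changes can be shown with the width-limited delta arithmetic used for MBM counters. mbm_delta() below is modelled on resctrl's mbm_overflow_count(), and the state struct is loosely modelled on struct arch_mbm_state. Both are user-space stand-ins with hypothetical names, not code from this series. If hardware zeroes a counter but the saved previous value is left stale, the next read looks like a wrap-around and produces a huge bogus delta.

```c
#include <stdint.h>

/*
 * Width-limited delta between two raw counter reads (width in 1..64).
 * Shifting both values up by (64 - width) makes the unsigned
 * subtraction wrap at the hardware counter width.
 */
static uint64_t mbm_delta(uint64_t prev_msr, uint64_t cur_msr, unsigned int width)
{
	uint64_t shift = 64 - width;
	uint64_t chunks = (cur_msr << shift) - (prev_msr << shift);

	return chunks >> shift;
}

/* Hypothetical per-counter state kept by software. */
struct arch_mbm_state_sketch {
	uint64_t prev_msr;	/* last raw value read from hardware */
	uint64_t chunks;	/* accumulated count */
};

/* Clear the software state together with the hardware counter reset. */
static void arch_mbm_state_reset(struct arch_mbm_state_sketch *s)
{
	s->prev_msr = 0;
	s->chunks = 0;
}

/* Accumulate the delta since the previous read. */
static void arch_mbm_update(struct arch_mbm_state_sketch *s, uint64_t cur_msr,
			    unsigned int width)
{
	s->chunks += mbm_delta(s->prev_msr, cur_msr, width);
	s->prev_msr = cur_msr;
}
```

With a 24-bit counter, a stale prev_msr of 1000 followed by a post-reset read of 10 yields a delta of 16776226 instead of 10.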
Reinette
* Re: [PATCH v6 07/22] x86/resctrl: Introduce the interface to display monitor mode
2024-08-06 22:00 ` [PATCH v6 07/22] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
2024-08-16 16:56 ` James Morse
@ 2024-08-16 21:32 ` Reinette Chatre
2024-08-19 19:27 ` Moger, Babu
1 sibling, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:32 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
(expanding on what James said)
On 8/6/24 3:00 PM, Babu Moger wrote:
> The mbm_mode displays list of monitor modes supported.
>
> The mbm_cntr_assign is one of the currently supported modes. It is also
> called ABMC (Assignable Bandwidth Monitoring Counters) feature. ABMC
> feature provides option to assign a hardware counter to an RMID and
> monitor the bandwidth as long as it is assigned. ABMC mode is enabled
> by default when supported.
>
> Legacy mode works without the assignment option.
>
> Provide an interface to display the monitor mode on the system.
> $cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> [mbm_cntr_assign]
> legacy
>
> Switching the mbm_mode will reset all the mbm counters of all resctrl
> groups.
The changelog also needs to be clear to distinguish the resctrl fs
"mbm_cntr_assign" mode from how it is backed by ABMC on AMD hardware.
for example (please improve):
Introduce "mbm_cntr_assign" mode that provides the option to assign a
hardware counter to an RMID and monitor the bandwidth as long as it is
assigned. On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
Bandwidth Monitoring Counters) hardware feature. "mbm_cntr_assign" mode
is enabled by default when supported.
"default" mode is the existing monitoring mode that works without the
explicit counter assignment, instead relying on dynamic counter assignment
by hardware that may result in hardware not dedicating a counter resulting in
monitoring data reads returning "Unavailable".
Provide an interface to display the monitor mode on the system.
$cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_assign]
default
Switching the mbm_assign_mode will reset all the MBM counters of all resctrl
groups.
....
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 6075b1e5bb77..d8f85b20ab8f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -845,6 +845,26 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
> return ret;
> }
>
> +static int rdtgroup_mbm_mode_show(struct kernfs_open_file *of,
> + struct seq_file *s, void *v)
> +{
> + struct rdt_resource *r = of->kn->parent->priv;
> +
> + if (r->mon.mbm_cntr_assignable) {
> + if (resctrl_arch_get_abmc_enabled()) {
Since this state can change during runtime this access needs to be protected.
> + seq_puts(s, "[mbm_cntr_assign]\n");
> + seq_puts(s, "legacy\n");
> + } else {
> + seq_puts(s, "mbm_cntr_assign\n");
> + seq_puts(s, "[legacy]\n");
> + }
> + } else {
> + seq_puts(s, "[legacy]\n");
> + }
> +
> + return 0;
> +}
> +
> #ifdef CONFIG_PROC_CPU_RESCTRL
>
> /*
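Here is a user-space sketch of the protection requested above. The show path reads the mode only while holding the same lock the mode-switch path takes, much as resctrl handlers take rdtgroup_mutex. A pthread mutex stands in for the kernel mutex and snprintf() stands in for seq_puts(). mode_mutex, cntr_assign_enabled, mbm_mode_set() and mbm_mode_show() are hypothetical names. The "legacy" spelling follows the v6 patch rather than the "default" naming proposed above.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for rdtgroup_mutex and the arch-maintained mode state. */
static pthread_mutex_t mode_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool cntr_assign_enabled;

/* Writer side: switching the mode updates the state under the lock. */
static void mbm_mode_set(bool enable)
{
	pthread_mutex_lock(&mode_mutex);
	cntr_assign_enabled = enable;
	pthread_mutex_unlock(&mode_mutex);
}

/*
 * Reader side: hold the lock across the whole output so it cannot
 * interleave with a concurrent mode switch. Returns snprintf()'s length.
 */
static int mbm_mode_show(bool assignable, char *buf, size_t buflen)
{
	int ret;

	/* Capability is fixed at boot, so no lock is needed for this case. */
	if (!assignable)
		return snprintf(buf, buflen, "[legacy]\n");

	pthread_mutex_lock(&mode_mutex);
	if (cntr_assign_enabled)
		ret = snprintf(buf, buflen, "[mbm_cntr_assign]\nlegacy\n");
	else
		ret = snprintf(buf, buflen, "mbm_cntr_assign\n[legacy]\n");
	pthread_mutex_unlock(&mode_mutex);

	return ret;
}
```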
Reinette
* Re: [PATCH v6 08/22] x86/resctrl: Introduce interface to display number of monitoring counters
2024-08-06 22:00 ` [PATCH v6 08/22] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
@ 2024-08-16 21:34 ` Reinette Chatre
2024-08-20 15:56 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:34 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
Here and in all patches, when referring to resctrl fs please use the more
generic "mbm_cntr_assign" mode to distinguish it from the hardware/architecture
specific code that involves ABMC. Something like
"The ABMC feature provides" -> ""mbm_cntr_assign" mode provides"
I also think that being explicit with this separation will help us to see
gaps in interface between resctrl fs and arch.
> counter to an RMID and monitor the bandwidth as long as the counter is
Please clarify the scope of this feature. Above mentions that a counter is
assigned to an RMID but later it is mentioned that the counter is assigned
to an event. Perhaps consistently mention that a counter is assigned to
an RMID,event pair?
> assigned. Number of assignments depend on number of monitoring counters
> available.
>
> Provide the interface to display the number of monitoring counters
> supported.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: No changes.
>
> v5: Changed the display name from num_cntrs to num_mbm_cntrs.
> Updated the commit message.
> Moved the patch after mbm_mode is introduced.
>
> v4: Changed the counter name to num_cntrs. And few text changes.
>
> v3: Changed the field name to mbm_assign_cntrs.
>
> v2: Changed the field name to mbm_assignable_counters from abmc_counte
> ---
> Documentation/arch/x86/resctrl.rst | 3 +++
> arch/x86/kernel/cpu/resctrl/monitor.c | 2 ++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
> 3 files changed, 21 insertions(+)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index d4ec605b200a..fe9f10766c4f 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -291,6 +291,9 @@ with the following files:
> as long as there are enough RMID counters available to support number
> of monitoring groups.
>
> +"num_mbm_cntrs":
> + The number of monitoring counters available for assignment.
> +
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 5e8706ab6361..83329cefebf7 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1242,6 +1242,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
> r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
> if (WARN_ON(r->mon.num_mbm_cntrs > 64))
> r->mon.num_mbm_cntrs = 64;
> +
> + resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
The arch code should not access the resctrl file flags. This should be moved to make
the MPAM support easier. With the arch code setting r->mon.mbm_cntr_assignable the
fs code can use that to set the flags. Something similar to below patch is needed:
https://lore.kernel.org/lkml/20240802172853.22529-27-james.morse@arm.com/
> }
> }
>
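Here is a sketch of the split suggested above. The arch code only reports the capability (mbm_cntr_assignable), and resctrl fs decides which info files to expose from it. This is loosely in the spirit of the linked patch, whose contents are not reproduced here. All names below are hypothetical user-space stand-ins, including the flag value used for RFTYPE_MON_INFO.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for RFTYPE_MON_INFO. */
#define SKETCH_RFTYPE_MON_INFO (1UL << 0)

/* Hypothetical fs-owned info file table. */
struct sketch_rftype {
	const char *name;
	unsigned long fflags;
};

static struct sketch_rftype sketch_files[] = {
	{ "num_rmids", SKETCH_RFTYPE_MON_INFO },
	{ "num_mbm_cntrs", 0 },	/* enabled only when counters are assignable */
};

/* What the arch code provides: capability and counter count. */
struct sketch_resctrl_mon {
	bool mbm_cntr_assignable;
	int num_mbm_cntrs;
};

/* Look up the flags of an info file by name (0 if not found). */
static unsigned long sketch_file_fflags(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_files) / sizeof(sketch_files[0]); i++)
		if (strcmp(sketch_files[i].name, name) == 0)
			return sketch_files[i].fflags;
	return 0;
}

/* fs code, not arch code, sets the file flags from the capability. */
static void sketch_fs_mon_files_init(const struct sketch_resctrl_mon *mon)
{
	size_t i;

	if (!mon->mbm_cntr_assignable)
		return;
	for (i = 0; i < sizeof(sketch_files) / sizeof(sketch_files[0]); i++)
		if (strcmp(sketch_files[i].name, "num_mbm_cntrs") == 0)
			sketch_files[i].fflags |= SKETCH_RFTYPE_MON_INFO;
}
```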
Reinette
* Re: [PATCH v6 09/22] x86/resctrl: Introduce MBM counters bitmap
2024-08-06 22:00 ` [PATCH v6 09/22] x86/resctrl: Introduce MBM counters bitmap Babu Moger
2024-08-16 16:29 ` James Morse
@ 2024-08-16 21:35 ` Reinette Chatre
2024-08-19 15:49 ` Moger, Babu
1 sibling, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:35 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> Hardware provides a set of counters when mbm_cntr_assignable feature is
> supported. These counters are used for assigning the events in resctrl
> group when the feature is enabled.
"in resctrl group" -> "in a resctrl group"?
>
> Introduce mbm_cntrs_free_map bitmap to track available and free counters
What is the difference between an available and a free counter?
> and set of routines to allocate and free the counters.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 33 ++++++++++++++++++++++++++
> 2 files changed, 35 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 154983a67646..6263362496a3 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -662,6 +662,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free);
> void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
> void __init resctrl_file_fflags_init(const char *config,
> unsigned long fflags);
> +int mbm_cntr_alloc(struct rdt_resource *r);
> +void mbm_cntr_free(u32 cntr_id);
> void rdt_staged_configs_clear(void);
> bool closid_allocated(unsigned int closid);
> int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index ab4fab3b7cf1..c818965e36c9 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -185,6 +185,37 @@ bool closid_allocated(unsigned int closid)
> return !test_bit(closid, &closid_free_map);
> }
>
> +/*
> + * Counter bitmap for tracking the available counters.
> + * ABMC feature provides set of hardware counters for enabling events.
"ABMC feature" -> "mbm_cntr_assign mode"
> + * Each event takes one hardware counter. Kernel needs to keep track
"Each event takes one hardware counter" -> "Each RMID and event pair takes
one hardware counter" ?
> + * of number of available counters.
"of number of available counters" -> "of the number of available counters"?
> + */
> +static DECLARE_BITMAP(mbm_cntrs_free_map, 64);
> +
> +static void mbm_cntrs_init(struct rdt_resource *r)
> +{
> + bitmap_fill(mbm_cntrs_free_map, r->mon.num_mbm_cntrs);
Apart from what James mentioned about the different sizes, please also
add checking that the resource actually supports monitoring and
assignable counters before proceeding with the bitmap ops.
> +}
> +
> +int mbm_cntr_alloc(struct rdt_resource *r)
> +{
> + int cntr_id;
> +
> + cntr_id = find_first_bit(mbm_cntrs_free_map, r->mon.num_mbm_cntrs);
> + if (cntr_id >= r->mon.num_mbm_cntrs)
> + return -ENOSPC;
> +
> + __clear_bit(cntr_id, mbm_cntrs_free_map);
> +
> + return cntr_id;
> +}
> +
> +void mbm_cntr_free(u32 cntr_id)
> +{
> + __set_bit(cntr_id, mbm_cntrs_free_map);
> +}
> +
> /**
> * rdtgroup_mode_by_closid - Return mode of resource group with closid
> * @closid: closid if the resource group
> @@ -2748,6 +2779,8 @@ static int rdt_get_tree(struct fs_context *fc)
>
> closid_init();
>
> + mbm_cntrs_init(&rdt_resources_all[RDT_RESOURCE_L3].r_resctrl);
> +
> if (resctrl_arch_mon_capable())
> flags |= RFTYPE_MON;
>
This is also an example of what James mentioned elsewhere where there is an
assumption that this feature applies to the L3 resource. This has a consequence
that some code is global (like mbm_cntrs_free_map), assuming the L3 resource, while
other code takes the resource as a parameter (e.g. mbm_cntr_alloc()). This results
in an inconsistent interface where, for example, allocating a counter needs the resource
as a parameter but freeing a counter does not. James already proposed different
treatment of the bitmap and L3 resource parameters, I expect with such guidance
the interfaces will become more intuitive.
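Here is a user-space sketch of the more consistent interface described above. The free-counter bitmap is owned by the resource, sized at init time from the counter count rather than fixed at 64 bits, and both allocation and free take the resource. struct mon_res_sketch and the mon_cntr*() helpers are hypothetical names. Plain unsigned long arithmetic stands in for the kernel bitmap API.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the monitoring part of a resctrl resource. */
struct mon_res_sketch {
	unsigned int num_mbm_cntrs;
	unsigned long *cntr_free_map;	/* bit set => counter is free */
};

/* Allocate a bitmap large enough for all counters and mark them free. */
static int mon_cntrs_init(struct mon_res_sketch *r)
{
	size_t words = (r->num_mbm_cntrs + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;
	unsigned int i;

	r->cntr_free_map = calloc(words ? words : 1, sizeof(unsigned long));
	if (!r->cntr_free_map)
		return -ENOMEM;
	for (i = 0; i < r->num_mbm_cntrs; i++)
		r->cntr_free_map[i / SKETCH_BITS_PER_LONG] |= 1UL << (i % SKETCH_BITS_PER_LONG);
	return 0;
}

/* Return the lowest free counter ID, or -ENOSPC if none is left. */
static int mon_cntr_alloc(struct mon_res_sketch *r)
{
	unsigned int i;

	for (i = 0; i < r->num_mbm_cntrs; i++) {
		unsigned long bit = 1UL << (i % SKETCH_BITS_PER_LONG);

		if (r->cntr_free_map[i / SKETCH_BITS_PER_LONG] & bit) {
			r->cntr_free_map[i / SKETCH_BITS_PER_LONG] &= ~bit;
			return (int)i;
		}
	}
	return -ENOSPC;
}

/* Freeing takes the same resource argument as allocation. */
static void mon_cntr_free(struct mon_res_sketch *r, unsigned int cntr_id)
{
	if (cntr_id < r->num_mbm_cntrs)
		r->cntr_free_map[cntr_id / SKETCH_BITS_PER_LONG] |=
			1UL << (cntr_id % SKETCH_BITS_PER_LONG);
}

static void mon_cntrs_exit(struct mon_res_sketch *r)
{
	free(r->cntr_free_map);
	r->cntr_free_map = NULL;
}
```

Because the bitmap is sized from num_mbm_cntrs, a count above 64 needs no clamping in the arch code.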
Reinette
* Re: [PATCH v6 11/22] x86/resctrl: Remove MSR reading of event configuration value
2024-08-06 22:00 ` [PATCH v6 11/22] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
@ 2024-08-16 21:36 ` Reinette Chatre
2024-08-20 16:19 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:36 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> The event configuration is domain specific and initialized during domain
> initialization. The values is stored in rdt_hw_mon_domain.
"The values is stored in rdt_hw_mon_domain." -> "The values are stored
in struct rdt_hw_mon_domain."
>
> It is not required to read the configuration register every time user asks
> for it. Use the value stored in rdt_hw_mon_domain instead.
"rdt_hw_mon_domain" -> "struct rdt_hw_mon_domain"
>
> Introduce resctrl_arch_event_config_get() and
> resctrl_arch_event_config_set() to get/set architecture domain specific
> mbm_total_cfg/mbm_local_cfg values. Also, remove unused config value
> definitions.
hmmm ... while the config values are not used they are now established
ABI and any other architecture that wants to support configurable events
will need to follow these definitions. It is thus required to keep them
documented in the kernel in support of future changes. I
understand that they are documented in user docs, but could we keep them
in the kernel code also? Since they are unused they could perhaps be moved
to comments as a compromise?
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: Fixed inconstancy with types. Made all the types to u32 for config
> value.
> Removed few rdt_last_cmd_puts as it is not necessary.
> Removed unused config value definitions.
> Few more updates to commit message.
>
> v5: Introduced resctrl_arch_event_config_get and
> resctrl_arch_event_config_get() based on our discussion.
> https://lore.kernel.org/lkml/68e861f9-245d-4496-a72e-46fc57d19c62@amd.com/
>
> v4: New patch.
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 21 -----
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 104 ++++++++++++++-----------
> include/linux/resctrl.h | 4 +
> 3 files changed, 64 insertions(+), 65 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 4d8cc36a8d79..1021227d8c7e 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -32,27 +32,6 @@
> */
> #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>
> -/* Reads to Local DRAM Memory */
> -#define READS_TO_LOCAL_MEM BIT(0)
> -
> -/* Reads to Remote DRAM Memory */
> -#define READS_TO_REMOTE_MEM BIT(1)
> -
> -/* Non-Temporal Writes to Local Memory */
> -#define NON_TEMP_WRITE_TO_LOCAL_MEM BIT(2)
> -
> -/* Non-Temporal Writes to Remote Memory */
> -#define NON_TEMP_WRITE_TO_REMOTE_MEM BIT(3)
> -
> -/* Reads to Local Memory the system identifies as "Slow Memory" */
> -#define READS_TO_LOCAL_S_MEM BIT(4)
> -
> -/* Reads to Remote Memory the system identifies as "Slow Memory" */
> -#define READS_TO_REMOTE_S_MEM BIT(5)
> -
> -/* Dirty Victims to All Types of Memory */
> -#define DIRTY_VICTIMS_TO_ALL_MEM BIT(6)
> -
> /* Max event bits supported */
> #define MAX_EVT_CONFIG_BITS GENMASK(6, 0)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 02afd3442876..0047b4eb0ff5 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1605,10 +1605,57 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
> }
>
> struct mon_config_info {
> + struct rdt_mon_domain *d;
> u32 evtid;
> u32 mon_config;
> };
>
> +u32 resctrl_arch_event_config_get(struct rdt_mon_domain *d,
> + enum resctrl_event_id eventid)
> +{
> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> +
> + switch (eventid) {
> + case QOS_L3_OCCUP_EVENT_ID:
> + break;
> + case QOS_L3_MBM_TOTAL_EVENT_ID:
> + return hw_dom->mbm_total_cfg;
> + case QOS_L3_MBM_LOCAL_EVENT_ID:
> + return hw_dom->mbm_local_cfg;
> + }
> +
> + /* Never expect to get here */
> + WARN_ON_ONCE(1);
> +
> + return INVALID_CONFIG_VALUE;
> +}
> +
> +void resctrl_arch_event_config_set(void *info)
> +{
> + struct mon_config_info *mon_info = info;
> + struct rdt_hw_mon_domain *hw_dom;
> + unsigned int index;
> +
> + index = mon_event_config_index_get(mon_info->evtid);
> + if (index == INVALID_CONFIG_INDEX)
> + return;
> +
> + wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> +
> + hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
> +
> + switch (mon_info->evtid) {
> + case QOS_L3_OCCUP_EVENT_ID:
> + break;
> + case QOS_L3_MBM_TOTAL_EVENT_ID:
> + hw_dom->mbm_total_cfg = mon_info->mon_config;
> + break;
> + case QOS_L3_MBM_LOCAL_EVENT_ID:
> + hw_dom->mbm_local_cfg = mon_info->mon_config;
> + break;
> + }
> +}
> +
> /**
> * mon_event_config_index_get - get the hardware index for the
> * configurable event
> @@ -1631,33 +1678,11 @@ unsigned int mon_event_config_index_get(u32 evtid)
> }
> }
>
> -static void mon_event_config_read(void *info)
> -{
> - struct mon_config_info *mon_info = info;
> - unsigned int index;
> - u64 msrval;
> -
> - index = mon_event_config_index_get(mon_info->evtid);
> - if (index == INVALID_CONFIG_INDEX) {
> - pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> - return;
> - }
> - rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
> -
> - /* Report only the valid event configuration bits */
> - mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
> -}
> -
> -static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
> -{
> - smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
> -}
> -
> static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
> {
> - struct mon_config_info mon_info = {0};
> struct rdt_mon_domain *dom;
> bool sep = false;
> + u32 val;
>
> cpus_read_lock();
> mutex_lock(&rdtgroup_mutex);
> @@ -1666,11 +1691,11 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
> if (sep)
> seq_puts(s, ";");
>
> - memset(&mon_info, 0, sizeof(struct mon_config_info));
> - mon_info.evtid = evtid;
> - mondata_config_read(dom, &mon_info);
> + val = resctrl_arch_event_config_get(dom, evtid);
> + if (val == INVALID_CONFIG_VALUE)
Can this check and the "break" that follows be dropped? val being
INVALID_CONFIG_VALUE would be a kernel bug and resctrl_arch_event_config_get()
would already have printed the WARN. In this unlikely scenario I find it
unexpected that mbm_config_show() will return success in this case and the
below seq_printf() would handle the printing of INVALID_CONFIG_VALUE without
issue anyway.
> + break;
>
> - seq_printf(s, "%d=0x%02x", dom->hdr.id, mon_info.mon_config);
> + seq_printf(s, "%d=0x%02x", dom->hdr.id, val);
> sep = true;
> }
> seq_puts(s, "\n");
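As an illustration of why the removed definitions remain useful as documented ABI, they can be used to compose the "count only reads" value 0x33 from the cover letter's mbm_total_bytes_config example. The bit names and descriptions are copied from the lines this patch deletes. evt_cfg_reads_only() and evt_cfg_valid() are hypothetical helpers, not proposed kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Event configuration bits, as defined before this patch. */
#define READS_TO_LOCAL_MEM		(1u << 0) /* Reads to Local DRAM Memory */
#define READS_TO_REMOTE_MEM		(1u << 1) /* Reads to Remote DRAM Memory */
#define NON_TEMP_WRITE_TO_LOCAL_MEM	(1u << 2) /* Non-Temporal Writes to Local Memory */
#define NON_TEMP_WRITE_TO_REMOTE_MEM	(1u << 3) /* Non-Temporal Writes to Remote Memory */
#define READS_TO_LOCAL_S_MEM		(1u << 4) /* Reads to Local "Slow Memory" */
#define READS_TO_REMOTE_S_MEM		(1u << 5) /* Reads to Remote "Slow Memory" */
#define DIRTY_VICTIMS_TO_ALL_MEM	(1u << 6) /* Dirty Victims to All Types of Memory */
#define MAX_EVT_CONFIG_BITS		0x7Fu     /* GENMASK(6, 0) */

/* Hypothetical helper: every read source, i.e. "count only reads". */
static uint32_t evt_cfg_reads_only(void)
{
	return READS_TO_LOCAL_MEM | READS_TO_REMOTE_MEM |
	       READS_TO_LOCAL_S_MEM | READS_TO_REMOTE_S_MEM;
}

/* Hypothetical check: reject bits outside the supported range. */
static bool evt_cfg_valid(uint32_t val)
{
	return (val & ~MAX_EVT_CONFIG_BITS) == 0;
}
```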
Reinette
* Re: [PATCH v6 12/22] x86/resctrl: Introduce mbm_cntr_map to track counters at domain
2024-08-06 22:00 ` [PATCH v6 12/22] x86/resctrl: Introduce mbm_cntr_map to track counters at domain Babu Moger
@ 2024-08-16 21:37 ` Reinette Chatre
2024-08-20 18:24 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:37 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> The MBM counters are allocated at resctrl group level. It is tracked by
Are they not allocated globally? (But maybe that is about to change?)
> mbm_cntrs_free_map. Then it is assigned to the domain based on the user
> input. It needs to be tracked at domain level also.
Please elaborate why it needs to be tracked at domain level.
>
> Add the mbm_cntr_map bitmap in rdt_mon_domain structure to keep track of
"rdt_mon_domain structure" -> "struct rdt_mon_domain"
> assignment at domain level. The global counter at mbm_cntrs_free_map can
> be released when assignment at all the domain are cleared.
"all the domain" -> "all the domains"?
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: New patch to add domain level assignment.
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 ++++++++++
> include/linux/resctrl.h | 2 ++
> 2 files changed, 12 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 0047b4eb0ff5..1a90c671a027 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -4127,6 +4127,7 @@ static void __init rdtgroup_setup_default(void)
>
> static void domain_destroy_mon_state(struct rdt_mon_domain *d)
> {
> + bitmap_free(d->mbm_cntr_map);
> bitmap_free(d->rmid_busy_llc);
> kfree(d->mbm_total);
> kfree(d->mbm_local);
> @@ -4200,6 +4201,15 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
> return -ENOMEM;
> }
> }
> + if (is_mbm_enabled()) {
This should also depend on whether the resource supports counter assignment and
whether it is enabled, to ensure that r->mon.num_mbm_cntrs is valid.
> + d->mbm_cntr_map = bitmap_zalloc(r->mon.num_mbm_cntrs, GFP_KERNEL);
> + if (!d->mbm_cntr_map) {
> + bitmap_free(d->rmid_busy_llc);
> + kfree(d->mbm_total);
> + kfree(d->mbm_local);
> + return -ENOMEM;
> + }
> + }
>
> return 0;
> }
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index ef08f75191f2..034fa994e84f 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -105,6 +105,7 @@ struct rdt_ctrl_domain {
> * @cqm_limbo: worker to periodically read CQM h/w counters
> * @mbm_work_cpu: worker CPU for MBM h/w counters
> * @cqm_work_cpu: worker CPU for CQM h/w counters
> + * @mbm_cntr_map: bitmap to track domain counter assignment
> */
> struct rdt_mon_domain {
> struct rdt_domain_hdr hdr;
> @@ -116,6 +117,7 @@ struct rdt_mon_domain {
> struct delayed_work cqm_limbo;
> int mbm_work_cpu;
> int cqm_work_cpu;
> + unsigned long *mbm_cntr_map;
> };
>
> /**
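A userspace sketch of the gating suggested above, assuming hypothetical flags for "assignment supported" and "assignment enabled" (the series' actual field names may differ); calloc() stands in for bitmap_zalloc().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_LONG		(CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-ins for the relevant resource/domain fields. */
struct fake_resource {
	bool mbm_enabled;		/* is_mbm_enabled() */
	bool cntr_assignable;		/* hardware supports counter assignment */
	bool cntr_assign_enabled;	/* counter assignment mode is enabled */
	unsigned int num_mbm_cntrs;	/* only meaningful when assignable */
};

struct fake_mon_domain {
	unsigned long *mbm_cntr_map;
};

/* Userspace analogue of the bitmap_zalloc() call in domain_setup_mon_state(). */
static int setup_cntr_map(const struct fake_resource *r, struct fake_mon_domain *d)
{
	assert(r && d);
	d->mbm_cntr_map = NULL;

	/* Skip the allocation unless num_mbm_cntrs can be trusted. */
	if (!r->mbm_enabled || !r->cntr_assignable || !r->cntr_assign_enabled)
		return 0;

	d->mbm_cntr_map = calloc(BITS_TO_LONGS(r->num_mbm_cntrs),
				 sizeof(unsigned long));
	return d->mbm_cntr_map ? 0 : -ENOMEM;
}
```

The sketch follows the comment literally. Since the series allows switching modes at runtime, another option is to allocate whenever the hardware supports assignment and only consult the map while the mode is enabled.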
Reinette
* Re: [PATCH v6 13/22] x86/resctrl: Add data structures and definitions for ABMC assignment
2024-08-06 22:00 ` [PATCH v6 13/22] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2024-08-16 21:38 ` Reinette Chatre
2024-08-20 20:56 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:38 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
This patch now only introduces one data structure so the subject could
be made more specific.
On 8/6/24 3:00 PM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID and monitor the bandwidth as long as the counter
> is assigned. The bandwidth events will be tracked by the hardware until
> the user changes the configuration. Each resctrl group can configure
> maximum two counters, one for total event and one for local event.
>
>
(extra empty line)
> The ABMC feature implements a pair of MSRs, L3_QOS_ABMC_CFG (C000_03FDh)
> and L3_QOS_ABMC_DSC (C000_03FEh). The counters are configured by writing
> to MSR L3_QOS_ABMC_CFG. Configuration is done by setting the counter id,
> bandwidth source (RMID) and bandwidth configuration supported by BMEC
> (Bandwidth Monitoring Event Configuration).
>
> L3_QOS_ABMC_DSC is a read-only MSR. Reading L3_QOS_ABMC_DSC returns the
> configuration of the counter id specified in L3_QOS_ABMC_CFG.cntr_id
> with rmid(bw_src) and event configuration(bw_type).
>
> Attempts to read or write these MSRs when ABMC is not enabled will result
> in a #GP(0) exception.
>
> Introduce data structures and definitions for ABMC MSRs.
>
> MSR L3_QOS_ABMC_CFG (0xC000_03FDh) and L3_QOS_ABMC_DSC (0xC000_03FEh)
> details.
The changelog and patch introduce L3_QOS_ABMC_DSC but I cannot see that it is
used in this series.
> =========================================================================
> Bits Mnemonic Description Access Reset
> Type Value
> =========================================================================
> 63 CfgEn Configuration Enable R/W 0
>
> 62 CtrEn Enable/disable Tracking R/W 0
>
> 61:53 – Reserved MBZ 0
>
> 52:48 CtrID Counter Identifier R/W 0
>
> 47 IsCOS BwSrc field is a CLOSID R/W 0
> (not an RMID)
>
> 46:44 – Reserved MBZ 0
>
> 43:32 BwSrc Bandwidth Source R/W 0
> (RMID or CLOSID)
>
> 31:0 BwType Bandwidth configuration R/W 0
> to track for this counter
> ==========================================================================
>
> Configuration and tracking:
> CfgEn=1,CtrEn=0 : Configure CtrID, but do not track events yet.
> CfgEn=1,CtrEn=1 : Configure CtrID and start tracking events.
Could you please add the above snippet noting field combinations to the
kernel-doc of the union?
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: Removed all the fs related changes.
> Added note on CfgEn,CtrEn.
> Removed the definitions which are not used.
> Removed cntr_id initialization.
>
> v5: Moved assignment flags here (patch 10/19 of v4).
> Added MON_CNTR_UNSET definition to initialize cntr_id's.
> More details in commit log.
> Renamed few fields in l3_qos_abmc_cfg for readability.
>
> v4: Added more descriptions.
> Changed the name abmc_ctr_id to ctr_id.
> Added L3_QOS_ABMC_DSC. Used for reading the configuration.
>
> v3: No changes.
>
> v2: No changes.
> ---
> arch/x86/include/asm/msr-index.h | 2 ++
> arch/x86/kernel/cpu/resctrl/internal.h | 26 ++++++++++++++++++++++++++
> 2 files changed, 28 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index d86469bf5d41..5b3931a59d5a 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1183,6 +1183,8 @@
> #define MSR_IA32_SMBA_BW_BASE 0xc0000280
> #define MSR_IA32_EVT_CFG_BASE 0xc0000400
> #define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
> +#define MSR_IA32_L3_QOS_ABMC_CFG 0xc00003fd
> +#define MSR_IA32_L3_QOS_ABMC_DSC 0xc00003fe
>
> /* MSR_IA32_VMX_MISC bits */
> #define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 1021227d8c7e..af3efa35a62e 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -589,6 +589,32 @@ union cpuid_0x10_x_edx {
> unsigned int full;
> };
>
> +/*
> + * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
> + * @bw_type : Bandwidth configuration(supported by BMEC)
> + * tracked by the @cntr_id.
> + * @bw_src : Bandwidth source (RMID or CLOSID).
> + * @reserved1 : Reserved.
> + * @is_clos : @bw_src field is a CLOSID (not an RMID).
> + * @cntr_id : Counter identifier.
> + * @reserved : Reserved.
> + * @cntr_en : Tracking enable bit.
> + * @cfg_en : Configuration enable bit.
> + */
> +union l3_qos_abmc_cfg {
> + struct {
> + unsigned long bw_type :32,
> + bw_src :12,
> + reserved1: 3,
> + is_clos : 1,
> + cntr_id : 5,
> + reserved : 9,
> + cntr_en : 1,
> + cfg_en : 1;
> + } split;
> + unsigned long full;
> +};
> +
This data structure still uses tabs that seem to have goal of aligning members
but the tabs are used inconsistently and members are not lining up either.
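A sketch of how the union could look with consistently aligned members and the CfgEn/CtrEn combinations from the changelog folded into the comment, plus an explicit-shift encoder to cross-check the bitfield against the bit positions in the table. It assumes a 64-bit unsigned long and GCC's LSB-first bitfield allocation on x86_64; abmc_cfg_encode() is a hypothetical helper, not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/**
 * union l3_qos_abmc_cfg - Layout of the L3_QOS_ABMC_CFG MSR
 * @bw_type:   Bandwidth configuration (supported by BMEC) tracked by @cntr_id.
 * @bw_src:    Bandwidth source (RMID or CLOSID).
 * @reserved1: Reserved.
 * @is_clos:   @bw_src field is a CLOSID (not an RMID).
 * @cntr_id:   Counter identifier.
 * @reserved:  Reserved.
 * @cntr_en:   Tracking enable bit.
 * @cfg_en:    Configuration enable bit.
 *
 * Configuration and tracking:
 * CfgEn=1, CtrEn=0: Configure CtrID, but do not track events yet.
 * CfgEn=1, CtrEn=1: Configure CtrID and start tracking events.
 */
union l3_qos_abmc_cfg {
	struct {
		unsigned long bw_type   : 32,
			      bw_src    : 12,
			      reserved1 : 3,
			      is_clos   : 1,
			      cntr_id   : 5,
			      reserved  : 9,
			      cntr_en   : 1,
			      cfg_en    : 1;
	} split;
	unsigned long full;
};

/* The bitfield only matches the MSR with a 64-bit unsigned long (x86_64). */
_Static_assert(sizeof(union l3_qos_abmc_cfg) == 8, "ABMC_CFG must be 64 bits");

/* Same encoding with explicit shifts, using the bit positions in the table. */
static uint64_t abmc_cfg_encode(unsigned int cfg_en, unsigned int cntr_en,
				unsigned int cntr_id, unsigned int bw_src,
				uint32_t bw_type)
{
	return ((uint64_t)(cfg_en & 0x1) << 63) |
	       ((uint64_t)(cntr_en & 0x1) << 62) |
	       ((uint64_t)(cntr_id & 0x1f) << 48) |
	       ((uint64_t)(bw_src & 0xfff) << 32) |
	       bw_type;
}
```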
> void rdt_last_cmd_clear(void);
> void rdt_last_cmd_puts(const char *s);
> __printf(1, 2)
Reinette
* Re: [PATCH v6 14/22] x86/resctrl: Introduce cntr_id in mongroup for assignments
2024-08-06 22:00 ` [PATCH v6 14/22] x86/resctrl: Introduce cntr_id in mongroup for assignments Babu Moger
@ 2024-08-16 21:38 ` Reinette Chatre
2024-08-20 22:42 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:38 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> mbm_cntr_assignable feature provides an option to the user to assign a
> hardware counter to an RMID and monitor the bandwidth as long as the
> counter is assigned. There can be two counters per monitor group, one
> for total event and another for local event.
>
> Introduce cntr_id to manage the assignments.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: New patch.
> Separated FS and arch bits.
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 7 +++++++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index af3efa35a62e..d93082b65d69 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -41,6 +41,11 @@
> /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
> #define ABMC_ENABLE_BIT 0
>
> +/* Maximum assignable counters per resctrl group */
> +#define MAX_CNTRS 2
> +
> +#define MON_CNTR_UNSET U32_MAX
> +
> /**
> * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
> * aren't marked nohz_full
> @@ -210,12 +215,14 @@ enum rdtgrp_mode {
> * @parent: parent rdtgrp
> * @crdtgrp_list: child rdtgroup node list
> * @rmid: rmid for this rdtgroup
> + * @cntr_id: Counter ids for assignment
Could this be:
"IDs of hardware counters assigned to monitor group"
Reinette
* Re: [PATCH v6 15/22] x86/resctrl: Add the interface to assign a hardware counter
2024-08-06 22:00 ` [PATCH v6 15/22] x86/resctrl: Add the interface to assign a hardware counter Babu Moger
2024-08-16 16:30 ` James Morse
@ 2024-08-16 21:41 ` Reinette Chatre
2024-08-21 15:04 ` Moger, Babu
1 sibling, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:41 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
This patch is a mix of resctrl fs and arch code, could each piece please
be described clearly?
> counter to an RMID and monitor the bandwidth as long as it is assigned.
> The assigned RMID will be tracked by the hardware until the user unassigns
> it manually.
>
> Counters are configured by writing to L3_QOS_ABMC_CFG MSR and
> specifying the counter id, bandwidth source, and bandwidth types.
>
> Provide the interface to assign the counter ids to RMID.
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: Removed mbm_cntr_alloc() from this patch to keep fs and arch code
> separate.
> Added code to update the counter assignment at domain level.
>
> v5: Few name changes to match cntr_id.
> Changed the function names to
> rdtgroup_assign_cntr
> resctrl_arch_assign_cntr
> More comments on commit log.
> Added function summary.
>
> v4: Commit message update.
> Used bitmap APIs where applicable.
> Changed the interfaces considering MPAM(arm).
> Added domain specific assignment.
>
> v3: Removed the static from the prototype of rdtgroup_assign_abmc.
> The function is not called directly from user anymore. These
> changes are related to global assignment interface.
>
> v2: Minor text changes in commit message.
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 4 ++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 97 ++++++++++++++++++++++++++
> 2 files changed, 101 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index d93082b65d69..4e8109dee174 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -685,6 +685,10 @@ int mbm_cntr_alloc(struct rdt_resource *r);
> void mbm_cntr_free(u32 cntr_id);
> void resctrl_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
> unsigned int mon_event_config_index_get(u32 evtid);
> +int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evtid,
> + u32 rmid, u32 cntr_id, u32 closid, bool assign);
> +int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
> +int rdtgroup_alloc_cntr(struct rdtgroup *rdtgrp, int index);
> void rdt_staged_configs_clear(void);
> bool closid_allocated(unsigned int closid);
> int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 60696b248b56..1ee91a7293a8 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1864,6 +1864,103 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
> return ret ?: nbytes;
> }
>
> +static void rdtgroup_abmc_cfg(void *info)
This has nothing to do with a resctrl group (arch code has no insight into the groups anyway).
Maybe an arch specific name like "resctrl_abmc_config_one_amd()" to match earlier
"resctrl_abmc_set_one_amd()"?
> +{
> + u64 *msrval = info;
> +
> + wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
> +}
> +
> +/*
> + * Send an IPI to the domain to assign the counter id to RMID.
> + */
> +int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evtid,
> + u32 rmid, u32 cntr_id, u32 closid, bool assign)
> +{
> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
> + struct arch_mbm_state *arch_mbm;
> +
> + abmc_cfg.split.cfg_en = 1;
> + abmc_cfg.split.cntr_en = assign ? 1 : 0;
> + abmc_cfg.split.cntr_id = cntr_id;
> + abmc_cfg.split.bw_src = rmid;
> +
> + /* Update the event configuration from the domain */
> + if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
> + abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
> + arch_mbm = &hw_dom->arch_mbm_total[rmid];
> + } else {
> + abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
> + arch_mbm = &hw_dom->arch_mbm_local[rmid];
> + }
> +
> + smp_call_function_any(&d->hdr.cpu_mask, rdtgroup_abmc_cfg, &abmc_cfg, 1);
> +
> + /*
> + * Reset the architectural state so that reading of hardware
> + * counter is not considered as an overflow in next update.
> + */
> + if (arch_mbm)
> + memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
> +
> + return 0;
> +}
> +
> +/* Allocate a new counter id if the event is unassigned */
> +int rdtgroup_alloc_cntr(struct rdtgroup *rdtgrp, int index)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + int cntr_id;
> +
> + /* Nothing to do if event has been assigned already */
> + if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET) {
> + rdt_last_cmd_puts("ABMC counter is assigned already\n");
This is resctrl fs code. Please replace the arch specific messages
("ABMC") with resctrl fs terms.
> + return 0;
> + }
> +
> + /*
> + * Allocate a new counter id and update domains
> + */
> + cntr_id = mbm_cntr_alloc(r);
> + if (cntr_id < 0) {
> + rdt_last_cmd_puts("Out of ABMC counters\n");
here also.
> + return -ENOSPC;
> + }
> +
> + rdtgrp->mon.cntr_id[index] = cntr_id;
> +
> + return 0;
> +}
> +
> +/*
> + * Assign a hardware counter to the group and assign the counter
> + * all the domains in the group. It will try to allocate the mbm
> + * counter if the counter is available.
> + */
> +int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + struct rdt_mon_domain *d;
> + int index;
> +
> + index = mon_event_config_index_get(evtid);
After going through the MPAM series this no longer looks correct. As the name of this
function implies, this is an index unique to the monitor event configuration feature
and, as the MPAM series highlights, it is unique to the architecture, not something
that is visible to resctrl fs. resctrl fs uses the event IDs and it is only when the
fs makes a request to the architecture that this translation comes into play.
With this change, the architecture specific "mon event config index" becomes
part of resctrl fs and is used for something totally different from mon event
configuration.
I think we should separate these to make sure we distinguish between an architectural
translation and a resctrl fs translation; the array index is not the same as the
architecture specific "mon event config index".
How about we start with something simple that is defined by resctrl fs? For example:
#define MBM_EVENT_ARRAY_INDEX(_event) (_event - 2)
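A sketch of how the suggested fs-level macro could be used instead of mon_event_config_index_get() for indexing mon.cntr_id[]. The enum values mirror the kernel's enum resctrl_event_id; mbm_cntr_index() and its range check are hypothetical. Unlike the one-line suggestion above, the macro argument is parenthesized.

```c
#include <assert.h>

/* Values mirror the kernel's enum resctrl_event_id. */
enum resctrl_event_id {
	QOS_L3_OCCUP_EVENT_ID		= 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
};

#define MAX_CNTRS	2

/* Argument parenthesized so that expressions expand safely. */
#define MBM_EVENT_ARRAY_INDEX(_event)	((_event) - 2)

/* Hypothetical fs-level helper: slot in mon.cntr_id[] for an MBM event, or -1. */
static int mbm_cntr_index(enum resctrl_event_id evtid)
{
	/* Cast first: the enum's underlying type may be unsigned. */
	int idx = MBM_EVENT_ARRAY_INDEX((int)evtid);

	return (idx >= 0 && idx < MAX_CNTRS) ? idx : -1;
}
```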
> + if (index == INVALID_CONFIG_INDEX)
> + return -EINVAL;
> +
> + if (rdtgroup_alloc_cntr(rdtgrp, index))
> + return -EINVAL;
> +
hmmm ... so rdtgroup_alloc_cntr() returns 0 if the counter is assigned already, and
in that case the configuration is done again even though the counter was already
assigned. Is this intended?
rdtgroup_assign_cntr() seems to be almost identical to rdtgroup_assign_update(),
which has protection against the above happening. It looks like these two
functions can be merged into one?
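One possible shape for a merged helper, as a userspace sketch: the counter is allocated and the domains are programmed only when the event has no counter yet, so a repeated request does not reconfigure the hardware. The counter pool, NUM_MBM_CNTRS and the hw_programmed counter are hypothetical stand-ins for mbm_cntr_alloc() and the per-domain loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MON_CNTR_UNSET	0xffffffffu	/* U32_MAX, as in the patch */
#define NUM_MBM_CNTRS	4		/* hypothetical number of counters */

static bool cntr_in_use[NUM_MBM_CNTRS];
static int hw_programmed;	/* counts simulated per-domain configuration passes */

/* Stand-in for mbm_cntr_alloc(): first free counter, or -ENOSPC. */
static int cntr_alloc(void)
{
	for (int i = 0; i < NUM_MBM_CNTRS; i++) {
		if (!cntr_in_use[i]) {
			cntr_in_use[i] = true;
			return i;
		}
	}
	return -ENOSPC;
}

/*
 * Hypothetical merged assign: allocate and program only when the event
 * has no counter yet. A repeated request is a no-op.
 */
static int group_assign_cntr(unsigned int *cntr_id)
{
	int id;

	if (*cntr_id != MON_CNTR_UNSET)
		return 0;

	id = cntr_alloc();
	if (id < 0)
		return id;

	*cntr_id = (unsigned int)id;
	hw_programmed++;	/* the per-domain resctrl_arch_assign_cntr() loop would run here */
	return 0;
}
```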
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
> + rdtgrp->mon.cntr_id[index],
There currently seems to be a mismatch between functions needing to
access this ID directly as above in some cases while also needing to
use helpers like rdtgroup_alloc_cntr().
Also, as James indicated, resctrl_arch_assign_cntr() may fail on Arm
so this needs error checking even though the x86 implementation always
returns success.
> + rdtgrp->closid, true);
> + set_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map);
> + }
> +
> + return 0;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
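A userspace sketch of the error checking asked for above, with one possible policy: if the architecture call fails in any domain, the domains already assigned are unassigned again and the error is returned. arch_assign_cntr(), NUM_DOMAINS and the fail_domain knob are hypothetical; whether to roll back or to leave a partial assignment and report it is a design choice for the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_DOMAINS	4	/* hypothetical */

static bool dom_assigned[NUM_DOMAINS];	/* stands in for test_bit(cntr_id, d->mbm_cntr_map) */
static int fail_domain = -1;		/* domain whose arch call fails, for testing */

/* Stand-in for resctrl_arch_assign_cntr(); an MPAM implementation may fail. */
static int arch_assign_cntr(int dom, bool assign)
{
	if (assign && dom == fail_domain)
		return -EIO;
	return 0;
}

/* Assign in every domain; on failure undo the domains already done. */
static int assign_all_domains(void)
{
	int dom, ret = 0;

	for (dom = 0; dom < NUM_DOMAINS; dom++) {
		ret = arch_assign_cntr(dom, true);
		if (ret)
			goto rollback;
		dom_assigned[dom] = true;
	}
	return 0;

rollback:
	/* Undo the domains that were assigned before the failure. */
	while (--dom >= 0) {
		arch_assign_cntr(dom, false);
		dom_assigned[dom] = false;
	}
	return ret;
}
```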
Reinette
* Re: [PATCH v6 16/22] x86/resctrl: Add the interface to unassign a MBM counter
2024-08-06 22:00 ` [PATCH v6 16/22] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
@ 2024-08-16 21:41 ` Reinette Chatre
2024-08-21 16:01 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:41 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
This is about resctrl fs so "The ABMC feature" -> "mbm_cntr_assign mode"
(please check whole series).
> counter to an RMID and monitor the bandwidth as long as it is assigned.
> The assigned RMID will be tracked by the hardware until the user unassigns
> it manually.
>
> Hardware provides only limited number of counters. If the system runs out
> of assignable counters, kernel will display an error when a new assignment
> is requested. Users need to unassign a already assigned counter to make
> space for new assignment.
>
> Provide the interface to unassign the counter ids from the group. Free the
> counter if it is not assigned in any of the domains.
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: Removed mbm_cntr_free from this patch.
> Added counter test in all the domains and free if it is not assigned to
> any domains.
>
> v5: Few name changes to match cntr_id.
> Changed the function names to
> rdtgroup_unassign_cntr
> More comments on commit log.
>
> v4: Added domain specific unassign feature.
> Few name changes.
>
> v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
> The function is not called directly from user anymore. These
> changes are related to global assignment interface.
>
> v2: No changes.
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 2 +
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 4e8109dee174..cc832955b787 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -689,6 +689,8 @@ int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evt
> u32 rmid, u32 cntr_id, u32 closid, bool assign);
> int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
> int rdtgroup_alloc_cntr(struct rdtgroup *rdtgrp, int index);
> +int rdtgroup_unassign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
> +void rdtgroup_free_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp, int index);
> void rdt_staged_configs_clear(void);
> bool closid_allocated(unsigned int closid);
> int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 1ee91a7293a8..0c2215dbd497 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1961,6 +1961,58 @@ int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> return 0;
> }
>
> +static int rdtgroup_mbm_cntr_test(struct rdt_resource *r, u32 cntr_id)
Could "test" be replaced with something more specific about what is tested?
For example, "rdtgroup_mbm_cntr_is_assigned()" or something better? The function
looks like a good candidate for returning a bool.
Is this function needed though? (more below)
> +{
> + struct rdt_mon_domain *d;
> +
> + list_for_each_entry(d, &r->mon_domains, hdr.list)
> + if (test_bit(cntr_id, d->mbm_cntr_map))
> + return 1;
> +
> + return 0;
> +}
> +
> +/* Free the counter id after the event is unassigned */
> +void rdtgroup_free_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> + int index)
> +{
> + /* Update the counter bitmap */
> + if (!rdtgroup_mbm_cntr_test(r, rdtgrp->mon.cntr_id[index])) {
> + mbm_cntr_free(rdtgrp->mon.cntr_id[index]);
> + rdtgrp->mon.cntr_id[index] = MON_CNTR_UNSET;
> + }
> +}
> +
> +/*
> + * Unassign a hardware counter from the group and update all the domains
> + * in the group.
> + */
> +int rdtgroup_unassign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + struct rdt_mon_domain *d;
> + int index;
> +
> + index = mon_event_config_index_get(evtid);
> + if (index == INVALID_CONFIG_INDEX)
> + return -EINVAL;
> +
> + if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET) {
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
> + rdtgrp->mon.cntr_id[index],
> + rdtgrp->closid, false);
> + clear_bit(rdtgrp->mon.cntr_id[index],
> + d->mbm_cntr_map);
> + }
> +
> + /* Free the counter at group level */
> + rdtgroup_free_cntr(r, rdtgrp, index);
rdtgroup_free_cntr() is called right after the counter has been unassigned
from all domains. Will rdtgroup_mbm_cntr_test() thus not always return 0?
It seems unnecessary to have rdtgroup_mbm_cntr_test() and considering that,
rdtgroup_free_cntr() can just be open coded here?
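A userspace sketch of the open-coded variant: the counter is cleared in every domain and then released globally, with no separate "still assigned anywhere?" scan because the loop has just cleared it everywhere. The arrays standing in for d->mbm_cntr_map and the global counter state are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MON_CNTR_UNSET	0xffffffffu	/* U32_MAX, as in the patch */
#define NUM_DOMAINS	2		/* hypothetical */
#define NUM_MBM_CNTRS	4		/* hypothetical */

static bool dom_cntr_map[NUM_DOMAINS][NUM_MBM_CNTRS];	/* per-domain d->mbm_cntr_map */
static bool cntr_in_use[NUM_MBM_CNTRS];			/* global counter allocation state */

static void group_unassign_cntr(unsigned int *cntr_id)
{
	if (*cntr_id == MON_CNTR_UNSET)
		return;

	for (int d = 0; d < NUM_DOMAINS; d++) {
		/* resctrl_arch_assign_cntr(..., false) would be called here. */
		dom_cntr_map[d][*cntr_id] = false;
	}

	/* Cleared in every domain above, so it can be freed unconditionally. */
	cntr_in_use[*cntr_id] = false;
	*cntr_id = MON_CNTR_UNSET;
}
```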
> + }
> +
> + return 0;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
Reinette
* Re: [PATCH v6 17/22] x86/resctrl: Assign/unassign counters by default when ABMC is enabled
2024-08-06 22:00 ` [PATCH v6 17/22] x86/resctrl: Assign/unassign counters by default when ABMC is enabled Babu Moger
@ 2024-08-16 21:42 ` Reinette Chatre
2024-08-21 17:20 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:42 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> Assign/unassign counters on resctrl group creation/deletion. Two counters
> are required per group, one for total event and one for local event.
>
> There are only limited number of counters for assignment. If the counters
> are exhausted, report the warnings and continue. It is not required to
Regarding "report the warnings and continue", which warnings are you referring to?
> fail group creation for assignment failures. Users have the option to
> modify the assignments later.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: Removed the redundant comments on all the calls of
> rdtgroup_assign_cntrs. Updated the commit message.
> Dropped printing error message on every call of rdtgroup_assign_cntrs.
>
> v5: Removed the code to enable/disable ABMC during the mount.
> That will be another patch.
> Added arch callers to get the arch specific data.
> Renamed functions to match the other ABMC functions.
> Added code comments for assignment failures.
>
> v4: Few name changes based on the upstream discussion.
> Commit message update.
>
> v3: This is a new patch. Patch addresses the upstream comment to enable
> ABMC feature by default if the feature is available.
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 55 ++++++++++++++++++++++++++
> 1 file changed, 55 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 0c2215dbd497..d93c1d784b91 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2908,6 +2908,46 @@ static void schemata_list_destroy(void)
> }
> }
>
> +/*
> + * Called when new group is created. Assign the counters if ABMC is
Please replace ABMC with resctrl fs generic terms.
> + * already enabled. Two counters are required per group, one for total
> + * event and one for local event. With limited number of counters,
> + * the assignments can fail in some cases. But, it is not required to
> + * fail the group creation. Users have the option to modify the
> + * assignments after the group creation.
> + */
> +static int rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
> +{
> + int ret = 0;
> +
> + if (!resctrl_arch_get_abmc_enabled())
> + return 0;
> +
> + if (is_mbm_total_enabled())
> + ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> + if (!ret && is_mbm_local_enabled())
> + ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
> +
> + return ret;
> +}
> +
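To make the failure reporting concrete, a userspace sketch of one possible behavior (not what the patch does): each enabled event is tried independently, a message is recorded for each failure, and group creation is never failed. Note that the patch skips the local event once the total event fails. The counter pool, the last_cmd buffer (standing in for the rdt_last_cmd_*() helpers) and the message text are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum { EVT_TOTAL, EVT_LOCAL, NUM_EVTS };	/* hypothetical event slots */

static int free_cntrs;		/* hypothetical number of counters left */
static char last_cmd[128];	/* stands in for the rdt_last_cmd_*() buffer */

/* Stand-in for rdtgroup_assign_cntr(). */
static int assign_cntr(int evt)
{
	(void)evt;
	if (free_cntrs == 0)
		return -ENOSPC;
	free_cntrs--;
	return 0;
}

static void group_assign_cntrs(bool total_enabled, bool local_enabled)
{
	static const char *const name[NUM_EVTS] = {
		"mbm_total_bytes", "mbm_local_bytes"
	};
	const bool enabled[NUM_EVTS] = { total_enabled, local_enabled };
	size_t off = 0;

	last_cmd[0] = '\0';
	for (int evt = 0; evt < NUM_EVTS; evt++) {
		int n;

		if (!enabled[evt] || assign_cntr(evt) == 0)
			continue;
		if (off >= sizeof(last_cmd))
			continue;
		n = snprintf(last_cmd + off, sizeof(last_cmd) - off,
			     "Failed to assign counter for %s\n", name[evt]);
		if (n > 0)
			off += (size_t)n;
	}
	/* Failures are only reported; group creation is not failed. */
}
```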
Reinette
* Re: [PATCH v6 18/22] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode
2024-08-06 22:00 ` [PATCH v6 18/22] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode Babu Moger
@ 2024-08-16 21:42 ` Reinette Chatre
2024-08-21 17:30 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:42 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> In ABMC mode, the hardware counter should be assigned to read the MBM
> events.
>
> Report "Unassigned" in case the user attempts to read the events without
> assigning the counter.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: Added more explanation in the resctrl.rst
> Added checks to detect "Unassigned" before reading RMID.
>
> v5: New patch.
> ---
> Documentation/arch/x86/resctrl.rst | 11 +++++++++++
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 ++++++++++++-
> 2 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index fe9f10766c4f..aea440ee6107 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -294,6 +294,17 @@ with the following files:
> "num_mbm_cntrs":
> The number of monitoring counters available for assignment.
>
> + The resctrl subsystem provides an interface to count a maximum of two
> + MBM events per group, from a combination of total and local events.
> + With the current interface, users can assign at most two monitoring
> + counters per group. Users also have the option to assign only one
> + counter to a group.
> +
> + With a limited number of counters, the system can run out of assignable
> + counters. In mbm_cntr_assign mode, reading an MBM event returns
> + "Unassigned" if no counter is assigned to that event. Users need to
> + assign a counter manually to read the event.
This seems more appropriate for the "mon_data" section.
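For reference, the read-side rule that the quoted documentation describes can be modelled in a few lines. This is a simplified userspace sketch, not the patch's ctrlmondata.c change: MON_CNTR_UNSET mirrors the sentinel from the series' internal.h, while struct mon_evt_state and format_mbm_read() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors the series' sentinel for "no counter assigned". */
#define MON_CNTR_UNSET UINT32_MAX

/* Hypothetical per-event state of one group in one domain. */
struct mon_evt_state {
	uint32_t cntr_id;	/* hardware counter ID or MON_CNTR_UNSET */
	uint64_t bytes;		/* last value read for the event */
};

/*
 * Format the contents of an mbm_{total,local}_bytes file. In counter
 * assignment mode an event without a counter reports "Unassigned"
 * instead of attempting to read the RMID.
 */
static void format_mbm_read(const struct mon_evt_state *st, bool cntr_assign_mode,
			    char *buf, size_t len)
{
	if (cntr_assign_mode && st->cntr_id == MON_CNTR_UNSET) {
		snprintf(buf, len, "Unassigned\n");
		return;
	}
	snprintf(buf, len, "%llu\n", (unsigned long long)st->bytes);
}
```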
Reinette
* Re: [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes
2024-08-06 22:00 ` [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
2024-08-16 16:31 ` James Morse
@ 2024-08-16 21:42 ` Reinette Chatre
2024-08-21 18:08 ` Moger, Babu
1 sibling, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 21:42 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> +static ssize_t rdtgroup_mbm_mode_write(struct kernfs_open_file *of,
> + char *buf, size_t nbytes,
> + loff_t off)
> +{
> + int mbm_cntr_assign = resctrl_arch_get_abmc_enabled();
This needs to be protected by the mutex.
> + struct rdt_resource *r = of->kn->parent->priv;
> + int ret = 0;
> +
> + /* Valid input requires a trailing newline */
> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
> + return -EINVAL;
> +
> + buf[nbytes - 1] = '\0';
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdt_last_cmd_clear();
> +
> + if (!strcmp(buf, "legacy")) {
> + if (mbm_cntr_assign)
> + resctrl_arch_mbm_cntr_assign_disable();
> + } else if (!strcmp(buf, "mbm_cntr_assign")) {
> + if (!mbm_cntr_assign) {
> + rdtgroup_mbm_cntr_reset(r);
> + ret = resctrl_arch_mbm_cntr_assign_enable();
> + }
> + } else {
> + ret = -EINVAL;
> + }
> +
> + mutex_unlock(&rdtgroup_mutex);
> + cpus_read_unlock();
> +
> + return ret ?: nbytes;
> +}
> +
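A minimal userspace sketch of the suggested fix, assuming a pthread mutex as a stand-in for rdtgroup_mutex and a plain flag in place of resctrl_arch_get_abmc_enabled() (both hypothetical here). The point is only the ordering: the current mode is sampled after the mutex is taken, so a concurrent writer cannot switch modes between the check and the action. cpus_read_lock() handling is omitted.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Stand-ins for rdtgroup_mutex and the architecture's enabled state. */
static pthread_mutex_t rdtgroup_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool abmc_enabled;

/* Returns 0 on success or -EINVAL for an unknown mode string. */
static int mbm_mode_write(const char *mode)
{
	bool enabled;
	int ret = 0;

	pthread_mutex_lock(&rdtgroup_mutex);

	/* Sample the current mode only after taking the mutex. */
	enabled = abmc_enabled;

	if (!strcmp(mode, "legacy")) {
		if (enabled)
			abmc_enabled = false;	/* ..._mbm_cntr_assign_disable() */
	} else if (!strcmp(mode, "mbm_cntr_assign")) {
		if (!enabled)
			abmc_enabled = true;	/* reset counters, then ..._enable() */
	} else {
		ret = -EINVAL;
	}

	pthread_mutex_unlock(&rdtgroup_mutex);
	return ret;
}
```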
Reinette
* Re: [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported
2024-08-06 22:00 ` [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported Babu Moger
2024-08-16 16:32 ` James Morse
@ 2024-08-16 22:33 ` Reinette Chatre
2024-08-19 18:18 ` Moger, Babu
1 sibling, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 22:33 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> Enable ABMC by default during boot when it is supported.
>
> Users will not see any difference in behavior when resctrl is
> mounted. With automatic counter assignment, everything works as it
> does in the legacy monitor mode.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6 : Keeping the default enablement in arch init code for now.
> This may need some discussion.
> Renamed resctrl_arch_configure_abmc to resctrl_arch_mbm_cntr_assign_configure.
>
> v5: New patch to enable ABMC by default.
> ---
> arch/x86/kernel/cpu/resctrl/core.c | 2 ++
> arch/x86/kernel/cpu/resctrl/internal.h | 1 +
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 17 +++++++++++++++++
> 3 files changed, 20 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 6fb0cfdb5529..a7980f84c487 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -599,6 +599,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> d = container_of(hdr, struct rdt_mon_domain, hdr);
>
> cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
> + resctrl_arch_mbm_cntr_assign_configure();
> return;
> }
>
> @@ -620,6 +621,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> arch_mon_domain_online(r, d);
>
> resctrl_mbm_evt_config_init(hw_dom);
> + resctrl_arch_mbm_cntr_assign_configure();
>
> if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
> mon_domain_free(hw_dom);
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index cc832955b787..ba3012f8f940 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -685,6 +685,7 @@ int mbm_cntr_alloc(struct rdt_resource *r);
> void mbm_cntr_free(u32 cntr_id);
> void resctrl_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
> unsigned int mon_event_config_index_get(u32 evtid);
> +void resctrl_arch_mbm_cntr_assign_configure(void);
> int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evtid,
> u32 rmid, u32 cntr_id, u32 closid, bool assign);
> int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 66febff2a3d3..d15fd1bde5f4 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2756,6 +2756,23 @@ void resctrl_arch_mbm_cntr_assign_disable(void)
> }
> }
>
> +void resctrl_arch_mbm_cntr_assign_configure(void)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> + bool enable = true;
> +
> + mutex_lock(&rdtgroup_mutex);
> +
> + if (r->mon.mbm_cntr_assignable) {
> + if (!hw_res->mbm_cntr_assign_enabled)
> + hw_res->mbm_cntr_assign_enabled = true;
> + resctrl_abmc_set_one_amd(&enable);
Earlier changelogs mentioned that counters are reset when ABMC is enabled.
How does that behave here when one CPU comes online? Consider the scenario where
a system is booted without all CPUs online. ABMC is initially enabled on all online
CPUs with this flow ... user space could start using resctrl fs and create
monitor groups that start accumulating architectural state. If the remaining
CPUs come online at this point and this snippet enables ABMC, would it reset
all counters? Should the architectural state be cleared?
Also, it still does not look right that the architecture decides the policy.
Could this enabling be moved to resctrl_online_cpu() for resctrl fs to
request architecture to enable assignable counters if it is supported?
> + }
> +
> + mutex_unlock(&rdtgroup_mutex);
> +}
> +
> /*
> * We don't allow rdtgroup directories to be created anywhere
> * except the root directory. Thus when looking for the rdtgroup
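One possible shape of the split suggested above, sketched in userspace under stated assumptions: resctrl fs owns the policy (whether counter assignment mode is selected) and, from its CPU online hook, asks the architecture to apply that mode to the new CPU. struct arch_state, struct fs_state, arch_cntr_assign_apply(), fs_init_mode() and fs_online_cpu() are hypothetical names. The sketch does not answer the open question of whether enabling ABMC on a late CPU resets counters; that depends on hardware behaviour.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 8

/* Hypothetical architecture side: capability plus a per-CPU mechanism. */
struct arch_state {
	bool cntr_assign_supported;
	bool cpu_cntr_assign[NR_CPUS_SKETCH];	/* per-CPU hardware enable state */
};

/* Hypothetical resctrl fs side: owns the policy (the selected mode). */
struct fs_state {
	bool cntr_assign_mode;
};

/* Mechanism only: program the requested mode on one CPU. */
static void arch_cntr_assign_apply(struct arch_state *arch, int cpu, bool enable)
{
	arch->cpu_cntr_assign[cpu] = enable;
}

/* fs chooses the default policy based on what the architecture supports. */
static void fs_init_mode(struct fs_state *fs, const struct arch_state *arch)
{
	fs->cntr_assign_mode = arch->cntr_assign_supported;
}

/* fs CPU online hook: apply the fs-selected mode to the new CPU. */
static void fs_online_cpu(const struct fs_state *fs, struct arch_state *arch, int cpu)
{
	if (arch->cntr_assign_supported)
		arch_cntr_assign_apply(arch, cpu, fs->cntr_assign_mode);
}
```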
Reinette
* Re: [PATCH v6 22/22] x86/resctrl: Introduce interface to modify assignment states of the groups
2024-08-06 22:00 ` [PATCH v6 22/22] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
@ 2024-08-16 22:33 ` Reinette Chatre
2024-08-21 20:11 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-16 22:33 UTC (permalink / raw)
To: Babu Moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/6/24 3:00 PM, Babu Moger wrote:
> Introduce the interface to assign MBM events in ABMC mode.
>
> Events can be enabled or disabled by writing to the file
> /sys/fs/resctrl/info/L3_MON/mbm_control
>
> The format is similar to the list format, with the addition of an opcode
> for the assignment operation.
> "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>
> Format for specific type of groups:
>
> * Default CTRL_MON group:
> "//<domain_id><opcode><flags>"
>
> * Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id><opcode><flags>"
>
> * Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id><opcode><flags>"
>
> * Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>
> Domain_id '*' will apply the flags on all the domains.
>
> Opcode can be one of the following:
>
> = Update the assignment to match the flags
> + assign an MBM event
> - unassign an MBM event
>
> Assignment flags can be one of the following:
> t MBM total event
> l MBM local event
> tl Both total and local MBM events
> _ None of the MBM events. Valid only with '=' opcode.
(please note comments in this area of cover letter)
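To make the opcode semantics concrete, here is a small sketch of how '=', '+' and '-' transform one group's assignment state in one domain. The flag values follow the patch's BIT(evtid) definitions with QOS_L3_MBM_TOTAL_EVENT_ID = 0x02 and QOS_L3_MBM_LOCAL_EVENT_ID = 0x03; apply_opcode() itself is a hypothetical helper, not code from the series.

```c
#include <stdbool.h>

#define ASSIGN_NONE	0u
#define ASSIGN_TOTAL	(1u << 2)	/* BIT(QOS_L3_MBM_TOTAL_EVENT_ID) */
#define ASSIGN_LOCAL	(1u << 3)	/* BIT(QOS_L3_MBM_LOCAL_EVENT_ID) */
#define ASSIGN_ALL	(ASSIGN_TOTAL | ASSIGN_LOCAL)

/*
 * Compute the new assignment state of one group in one domain.
 * Returns false for combinations the interface rejects ("+_" and "-_").
 */
static bool apply_opcode(unsigned int cur, char op, unsigned int flags,
			 unsigned int *new_state)
{
	switch (op) {
	case '=':	/* make the assignment match @flags exactly */
		*new_state = flags;
		return true;
	case '+':	/* add events, keep existing assignments */
		if (flags == ASSIGN_NONE)
			return false;
		*new_state = cur | flags;
		return true;
	case '-':	/* remove events, keep the others */
		if (flags == ASSIGN_NONE)
			return false;
		*new_state = cur & ~flags;
		return true;
	default:
		return false;
	}
}
```

The test cases mirror the documentation examples in this patch, e.g. "//0=t" turning "0=tl" into "0=t", and "//0+l" turning "0=t" back into "0=tl".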
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v6: Added support to assign all domains if the domain id is '*'.
> Fixed the allocation of the counter id if it is not already assigned.
>
> v5: Interface name changed from mbm_assign_control to mbm_control.
> Fixed opcode and flags combination.
> "=_" is valid.
> "-_" and "+_" are not valid.
> Minor message update.
> Renamed the function with prefix - rdtgroup_.
> Corrected few documentation mistakes.
> Rebase related changes after SNC support.
>
> v4: Added domain specific assignments. Fixed the opcode parsing.
>
> v3: New patch.
> Addresses the feedback to provide the global assignment interface.
> https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
> ---
> Documentation/arch/x86/resctrl.rst | 94 +++++++-
> arch/x86/kernel/cpu/resctrl/internal.h | 7 +
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 315 ++++++++++++++++++++++++-
> 3 files changed, 414 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 113c22ba6db3..ae3b17b7cefe 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -346,7 +346,7 @@ with the following files:
> t MBM total event is enabled.
> l MBM local event is enabled.
> tl Both total and local MBM events are enabled.
> - _ None of the MBM events are enabled.
> + _ None of the MBM events are enabled. Only works with opcode '=' for write.
>
> Examples:
> ::
> @@ -365,6 +365,98 @@ with the following files:
> enabled on domain 0 and 1.
>
>
> + Assignment state can be updated by writing to the interface.
> +
> + Format is similar to the list format with addition of opcode for the
> + assignment operation.
> +
> + "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> +
> + Format for each type of group:
> +
> + * Default CTRL_MON group:
> + "//<domain_id><opcode><flags>"
> +
> + * Non-default CTRL_MON group:
> + "<CTRL_MON group>//<domain_id><opcode><flags>"
> +
> + * Child MON group of default CTRL_MON group:
> + "/<MON group>/<domain_id><opcode><flags>"
> +
> + * Child MON group of non-default CTRL_MON group:
> + "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> +
> + Domain_id '*' will apply the flags to all the domains.
> +
> + Opcode can be one of the following:
> + ::
> +
> + = Update the assignment to match the MBM event.
> + + Assign an MBM event.
"Assign a new MBM event without impacting existing assignments."?
> + - Unassign an MBM event.
(similar)
> +
> + Examples:
> + ::
> +
> + Initial group status:
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> + non_default_ctrl_mon_grp//0=tl;1=tl;
> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> + //0=tl;1=tl;
> + /child_default_mon_grp/0=tl;1=tl;
> +
> + To update the default group to assign only total MBM event on domain 0:
> + # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
> +
> + Assignment status after the update:
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> + non_default_ctrl_mon_grp//0=tl;1=tl;
> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> + //0=t;1=tl;
> + /child_default_mon_grp/0=tl;1=tl;
> +
> + To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
> + # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
> +
> + Assignment status after the update:
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> + non_default_ctrl_mon_grp//0=tl;1=tl;
> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> + //0=t;1=tl;
> + /child_default_mon_grp/0=tl;1=l;
> +
> + To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
> + unassign both local and total MBM events on domain 1:
> + # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
> +
> + Assignment status after the update:
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> + non_default_ctrl_mon_grp//0=tl;1=tl;
> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> + //0=t;1=tl;
> + /child_default_mon_grp/0=tl;1=l;
> +
> + To update the default group to add a local MBM event on domain 0:
> + # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
> +
> + Assignment status after the update:
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> + non_default_ctrl_mon_grp//0=tl;1=tl;
> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> + //0=tl;1=tl;
> + /child_default_mon_grp/0=tl;1=l;
> +
> + To update the non-default CTRL_MON group non_default_ctrl_mon_grp to unassign all
> + the MBM events on all the domains:
> + # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
> +
> + Assignment status after the update:
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_control
> + non_default_ctrl_mon_grp//0=_;1=_;
> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> + //0=tl;1=tl;
> + /child_default_mon_grp/0=tl;1=l;
> +
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index ba3012f8f940..5af225b4a497 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -46,6 +46,13 @@
>
> #define MON_CNTR_UNSET U32_MAX
>
> +/*
> + * Assignment flags for ABMC feature
(this is resctrl fs code)
> + */
> +#define ASSIGN_NONE 0
> +#define ASSIGN_TOTAL BIT(QOS_L3_MBM_TOTAL_EVENT_ID)
> +#define ASSIGN_LOCAL BIT(QOS_L3_MBM_LOCAL_EVENT_ID)
> +
> /**
> * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
> * aren't marked nohz_full
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index d7aadca5e4ab..8567fb3a6274 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1034,6 +1034,318 @@ static int rdtgroup_mbm_control_show(struct kernfs_open_file *of,
> return 0;
> }
>
> +/*
> + * Update the assign states for the domain.
> + *
> + * If this is a new assignment for the group then allocate a counter and update
> + * the assignment else just update the assign state
> + */
> +static int rdtgroup_assign_update(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid,
> + struct rdt_mon_domain *d)
> +{
> + int ret, index;
> +
> + index = mon_event_config_index_get(evtid);
> + if (index == INVALID_CONFIG_INDEX)
> + return -EINVAL;
(wrong spacing ... see checkpatch.pl)
> +
> + if (rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET) {
> + ret = rdtgroup_alloc_cntr(rdtgrp, index);
> + if (ret < 0)
> + goto out_done;
> + }
> +
> + /* Update the state on all domains if d == NULL */
> + if (d == NULL) {
if (!d) ... (checkpatch)
> + ret = rdtgroup_assign_cntr(rdtgrp, evtid);
> + } else {
> + ret = resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
> + rdtgrp->mon.cntr_id[index],
> + rdtgrp->closid, 1);
> + if (!ret)
> + set_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map);
> + }
> +
> +out_done:
> + return ret;
> +}
Please merge this with almost identical rdtgroup_assign_cntr()
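A sketch of what the merged helper could look like, modelled in userspace with hypothetical simplified types (struct dom, struct grp) and a bitmap pool standing in for the series' mbm_cntr_alloc()/mbm_cntr_free(). A NULL domain means "all domains", as in the patch, so a single function covers both the per-domain and the '*' case.

```c
#include <stdint.h>

#define MON_CNTR_UNSET	UINT32_MAX
#define NUM_CNTRS	32
#define NUM_DOMS	2

/* Hypothetical simplified types; the real ones are rdt_mon_domain/rdtgroup. */
struct dom {
	uint32_t cntr_map;	/* bit n set: counter n assigned in this domain */
};

struct grp {
	uint32_t cntr_id[2];	/* per event index: counter or MON_CNTR_UNSET */
};

static struct dom doms[NUM_DOMS];
static uint32_t free_cntrs = UINT32_MAX;	/* all NUM_CNTRS counters free */

/* Take the lowest free counter; -1 stands in for -ENOSPC. */
static int cntr_alloc(void)
{
	for (int i = 0; i < NUM_CNTRS; i++) {
		if (free_cntrs & (1u << i)) {
			free_cntrs &= ~(1u << i);
			return i;
		}
	}
	return -1;
}

/* Program one domain; stands in for resctrl_arch_assign_cntr(). */
static void assign_one(struct dom *d, uint32_t cntr)
{
	d->cntr_map |= 1u << cntr;
}

/*
 * Allocate a counter for the group's event if it has none yet, then
 * assign it to @d, or to every domain when @d is NULL.
 */
static int grp_assign_cntr(struct grp *g, int index, struct dom *d)
{
	if (g->cntr_id[index] == MON_CNTR_UNSET) {
		int c = cntr_alloc();

		if (c < 0)
			return c;
		g->cntr_id[index] = (uint32_t)c;
	}

	if (!d) {
		for (int i = 0; i < NUM_DOMS; i++)
			assign_one(&doms[i], g->cntr_id[index]);
	} else {
		assign_one(d, g->cntr_id[index]);
	}
	return 0;
}
```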
> +
> +/*
> + * Update the unassign state for the domain.
> + *
> + * Free the counter if it is unassigned on all the domains else just
> + * update the unassign state
> + */
> +static int rdtgroup_unassign_update(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid,
> + struct rdt_mon_domain *d)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + int ret = 0, index;
> +
> + index = mon_event_config_index_get(evtid);
> + if (index == INVALID_CONFIG_INDEX)
> + return -EINVAL;
(wrong spacing ... see checkpatch.pl)
> +
> + if (rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET)
> + goto out_done;
> +
> + if (d == NULL) {
if (!d)
> + ret = rdtgroup_unassign_cntr(rdtgrp, evtid);
> + } else {
> + ret = resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
> + rdtgrp->mon.cntr_id[index],
> + rdtgrp->closid, 0);
> + if (!ret) {
> + clear_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map);
> + rdtgroup_free_cntr(r, rdtgrp, index);
> + }
> + }
> +
> +out_done:
> + return ret;
> +}
Please merge this with almost identical rdtgroup_unassign_cntr()
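Similarly for the unassign side, a userspace sketch with the same kind of hypothetical simplified types. It follows the rule stated in the quoted function comment: the group's counter goes back to the free pool only once no domain still has it assigned, whether the request names one domain or all of them (d == NULL).

```c
#include <stdbool.h>
#include <stdint.h>

#define MON_CNTR_UNSET	UINT32_MAX
#define NUM_DOMS	2

/* Hypothetical simplified types for a userspace model. */
struct dom { uint32_t cntr_map; };	/* bit n: counter n assigned here */
struct grp { uint32_t cntr_id[2]; };	/* per event index */

static struct dom doms[NUM_DOMS];
static uint32_t free_cntrs;		/* bit n: counter n is free */

/* True if any domain still has @cntr assigned. */
static bool cntr_in_use(uint32_t cntr)
{
	for (int i = 0; i < NUM_DOMS; i++)
		if (doms[i].cntr_map & (1u << cntr))
			return true;
	return false;
}

/*
 * Unassign the group's counter from @d (all domains if @d is NULL) and
 * return it to the free pool only once no domain still uses it.
 */
static void grp_unassign_cntr(struct grp *g, int index, struct dom *d)
{
	uint32_t cntr = g->cntr_id[index];

	if (cntr == MON_CNTR_UNSET)
		return;

	if (!d) {
		for (int i = 0; i < NUM_DOMS; i++)
			doms[i].cntr_map &= ~(1u << cntr);
	} else {
		d->cntr_map &= ~(1u << cntr);
	}

	if (!cntr_in_use(cntr)) {
		free_cntrs |= 1u << cntr;
		g->cntr_id[index] = MON_CNTR_UNSET;
	}
}
```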
> +
> +static int rdtgroup_str_to_mon_state(char *flag)
> +{
> + int i, mon_state = 0;
> +
> + for (i = 0; i < strlen(flag); i++) {
> + switch (*(flag + i)) {
> + case 't':
> + mon_state |= ASSIGN_TOTAL;
> + break;
> + case 'l':
> + mon_state |= ASSIGN_LOCAL;
> + break;
> + case '_':
> + mon_state = ASSIGN_NONE;
> + break;
It looks like this supports flags like "_lt", treating it as assigning
both local and total. I expect this should remove all flags instead?
> + default:
> + break;
> + }
> + }
> +
> + return mon_state;
> +}
hmmm ... so you removed assigning mon_state to ASSIGN_NONE from default,
but that did not change what this function returns since ASSIGN_NONE is 0
and mon_state is initialized to 0. Unknown flags should cause error so
that it is possible to add flags in the future. Above prevents us from
ever adding new flags.
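A sketch of a stricter parser along the lines suggested: unknown characters are rejected so that new flag letters can be added later, and '_' is accepted only on its own. Rejecting '_' combined with other flags and rejecting an empty flag string are design choices made for this illustration (the alternative raised above is to let '_' clear everything); str_to_mon_state() here is a standalone model, not the patch's function.

```c
#include <errno.h>

#define ASSIGN_NONE	0
#define ASSIGN_TOTAL	(1 << 2)	/* BIT(QOS_L3_MBM_TOTAL_EVENT_ID) */
#define ASSIGN_LOCAL	(1 << 3)	/* BIT(QOS_L3_MBM_LOCAL_EVENT_ID) */

/*
 * Parse assignment flags. Returns the flag mask, or -EINVAL for an empty
 * string, an unknown character, or '_' combined with any other flag.
 */
static int str_to_mon_state(const char *flag)
{
	int mon_state = ASSIGN_NONE;
	int saw_none = 0;

	if (!*flag)
		return -EINVAL;

	for (; *flag; flag++) {
		switch (*flag) {
		case 't':
			mon_state |= ASSIGN_TOTAL;
			break;
		case 'l':
			mon_state |= ASSIGN_LOCAL;
			break;
		case '_':
			saw_none = 1;
			break;
		default:
			/* Keeps unknown letters free for future flags. */
			return -EINVAL;
		}
	}

	if (saw_none && mon_state != ASSIGN_NONE)
		return -EINVAL;

	return mon_state;
}
```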
> +
> +static struct rdtgroup *rdtgroup_find_grp(enum rdt_group_type rtype, char *p_grp, char *c_grp)
rdtgroup_find_grp() -> rdtgroup_find_grp_by_name()?
> +{
> + struct rdtgroup *rdtg, *crg;
> +
> + if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
> + return &rdtgroup_default;
> + } else if (rtype == RDTCTRL_GROUP) {
> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
> + if (!strcmp(p_grp, rdtg->kn->name))
> + return rdtg;
> + } else if (rtype == RDTMON_GROUP) {
> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
> + if (!strcmp(p_grp, rdtg->kn->name)) {
> + list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
> + mon.crdtgrp_list) {
> + if (!strcmp(c_grp, crg->kn->name))
> + return crg;
> + }
> + }
> + }
> + }
> +
> + return NULL;
> +}
> +
> +static int rdtgroup_process_flags(struct rdt_resource *r,
> + enum rdt_group_type rtype,
> + char *p_grp, char *c_grp, char *tok)
> +{
> + int op, mon_state, assign_state, unassign_state;
> + char *dom_str, *id_str, *op_str;
> + struct rdt_mon_domain *d;
> + struct rdtgroup *rdtgrp;
> + unsigned long dom_id;
> + int ret, found = 0;
> +
> + rdtgrp = rdtgroup_find_grp(rtype, p_grp, c_grp);
> +
> + if (!rdtgrp) {
> + rdt_last_cmd_puts("Not a valid resctrl group\n");
> + return -EINVAL;
> + }
> +
> +next:
> + if (!tok || tok[0] == '\0')
> + return 0;
> +
> + /* Start processing the strings for each domain */
> + dom_str = strim(strsep(&tok, ";"));
> +
> + op_str = strpbrk(dom_str, "=+-");
> +
> + if (op_str) {
> + op = *op_str;
> + } else {
> + rdt_last_cmd_puts("Missing operation =, +, -, _ character\n");
> + return -EINVAL;
> + }
> +
> + id_str = strsep(&dom_str, "=+-");
> +
> + /* Check for domain id '*' which means all domains */
> + if (id_str && *id_str == '*') {
> + d = NULL;
> + goto check_state;
> + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
> + rdt_last_cmd_puts("Missing domain id\n");
> + return -EINVAL;
> + }
> +
> + /* Verify if the dom_id is valid */
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + if (d->hdr.id == dom_id) {
> + found = 1;
> + break;
> + }
> + }
> +
> + if (!found) {
> + rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
> + return -EINVAL;
> + }
> +
> +check_state:
> + mon_state = rdtgroup_str_to_mon_state(dom_str);
Function should return error and exit here.
> +
> + assign_state = 0;
> + unassign_state = 0;
> +
> + switch (op) {
> + case '+':
> + if (mon_state == ASSIGN_NONE) {
> + rdt_last_cmd_puts("Invalid assign opcode\n");
> + goto out_fail;
> + }
> + assign_state = mon_state;
> + break;
> + case '-':
> + if (mon_state == ASSIGN_NONE) {
> + rdt_last_cmd_puts("Invalid assign opcode\n");
> + goto out_fail;
> + }
> + unassign_state = mon_state;
> + break;
> + case '=':
> + assign_state = mon_state;
> + unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
> + break;
> + default:
> + break;
> + }
> +
> + if (assign_state & ASSIGN_TOTAL) {
> + ret = rdtgroup_assign_update(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID, d);
> + if (ret)
> + goto out_fail;
> + }
Should unassign occur before assign so that unassign can make counters available for
assign that follows?
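A tiny worked model of why the order matters, assuming a hypothetical per-domain pool in which every counter is already in use and a group that holds a counter only for the total event. Applying "=l" needs one counter and releases one; it can only succeed if the release happens first. struct pool, pool_get(), pool_put() and switch_total_to_local() are illustrative names only.

```c
#include <stdbool.h>

/* Hypothetical pool: number of free hardware counters in a domain. */
struct pool {
	int free;
};

static bool pool_get(struct pool *p)
{
	if (p->free == 0)
		return false;
	p->free--;
	return true;
}

static void pool_put(struct pool *p)
{
	p->free++;
}

/*
 * Apply "=l" to a group that currently holds a counter for the total
 * event only. Returns true if the local event ended up with a counter.
 */
static bool switch_total_to_local(struct pool *p, bool unassign_first)
{
	bool ok;

	if (unassign_first) {
		pool_put(p);		/* release the total event's counter */
		ok = pool_get(p);	/* ...which the local event can now use */
	} else {
		ok = pool_get(p);	/* fails while the pool is exhausted */
		pool_put(p);		/* total's counter is released too late */
	}
	return ok;
}
```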
> +
> + if (assign_state & ASSIGN_LOCAL) {
> + ret = rdtgroup_assign_update(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID, d);
> + if (ret)
> + goto out_fail;
> + }
> +
> + if (unassign_state & ASSIGN_TOTAL) {
> + ret = rdtgroup_unassign_update(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID, d);
> + if (ret)
> + goto out_fail;
> + }
> +
> + if (unassign_state & ASSIGN_LOCAL) {
> + ret = rdtgroup_unassign_update(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID, d);
> + if (ret)
> + goto out_fail;
> + }
> +
> + goto next;
> +
> +out_fail:
> +
> + return -EINVAL;
> +}
> +
> +static ssize_t rdtgroup_mbm_control_write(struct kernfs_open_file *of,
> + char *buf, size_t nbytes,
> + loff_t off)
> +{
> + struct rdt_resource *r = of->kn->parent->priv;
> + char *token, *cmon_grp, *mon_grp;
> + int ret;
> +
> + if (!resctrl_arch_get_abmc_enabled())
> + return -EINVAL;
This needs to be protected by mutex.
> +
> + /* Valid input requires a trailing newline */
> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
> + return -EINVAL;
> +
> + buf[nbytes - 1] = '\0';
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> + rdt_last_cmd_clear();
> +
> + while ((token = strsep(&buf, "\n")) != NULL) {
> + if (strstr(token, "//")) {
> + /*
> + * The CTRL_MON group processing:
> + * default CTRL_MON group: "//<flags>"
> + * non-default CTRL_MON group: "<CTRL_MON group>//flags"
> + * The CTRL_MON group will be empty string if it is a
> + * default group.
> + */
> + cmon_grp = strsep(&token, "//");
> +
> + /*
> + * strsep returns empty string for contiguous delimiters.
> + * Make sure check for two consecutive delimiters and
> + * advance the token.
> + */
> + mon_grp = strsep(&token, "//");
> + if (*mon_grp != '\0') {
> + rdt_last_cmd_printf("Invalid CTRL_MON group format %s\n", token);
> + ret = -EINVAL;
> + break;
> + }
> +
> + ret = rdtgroup_process_flags(r, RDTCTRL_GROUP, cmon_grp, mon_grp, token);
> + if (ret)
> + break;
> + } else if (strstr(token, "/")) {
> + /*
> + * MON group processing:
> + * MON_GROUP inside default CTRL_MON group: "/<MON group>/<flags>"
> + * MON_GROUP within CTRL_MON group: "<CTRL_MON group>/<MON group>/<flags>"
> + */
> + cmon_grp = strsep(&token, "/");
Isn't strsep(&token, "//") the same as strsep(&token, "/")? It looks like these two big branches
can be merged.
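strsep() treats its second argument as a set of delimiter bytes rather than a string to match, so "//" and "/" split identically. A sketch of a single code path for every group type, where an empty MON group name identifies a CTRL_MON group; split_line() and enum grp_type are hypothetical names, and _DEFAULT_SOURCE is defined because glibc declares strsep() only when it is set.

```c
#define _DEFAULT_SOURCE 1	/* glibc: declare strsep() */
#include <string.h>

enum grp_type { CTRL_MON_GROUP, MON_GROUP };

/*
 * Split "<CTRL_MON group>/<MON group>/<domain list>" in place.
 * Returns 0 on success, -1 if a '/' separator is missing.
 */
static int split_line(char *line, char **ctrl, char **mon, char **doms,
		      enum grp_type *type)
{
	char *tok = line;

	*ctrl = strsep(&tok, "/");
	if (!tok)
		return -1;	/* no '/' at all */

	*mon = strsep(&tok, "/");
	if (!tok)
		return -1;	/* only one '/' */

	*doms = tok;

	/* An empty MON group name means the line targets a CTRL_MON group. */
	*type = (**mon == '\0') ? CTRL_MON_GROUP : MON_GROUP;
	return 0;
}
```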
> +
> + /* Extract the MON_GROUP. It cannot be empty string */
> + mon_grp = strsep(&token, "/");
> + if (*mon_grp == '\0') {
> + rdt_last_cmd_printf("Invalid MON_GROUP format %s\n", token);
> + ret = -EINVAL;
> + break;
> + }
> +
> + ret = rdtgroup_process_flags(r, RDTMON_GROUP, cmon_grp, mon_grp, token);
> + if (ret)
> + break;
> + }
> + }
> +
> + mutex_unlock(&rdtgroup_mutex);
> + cpus_read_unlock();
> +
> + return ret ?: nbytes;
> +}
> +
> #ifdef CONFIG_PROC_CPU_RESCTRL
>
> /*
> @@ -2277,9 +2589,10 @@ static struct rftype res_common_files[] = {
> },
> {
> .name = "mbm_control",
> - .mode = 0444,
> + .mode = 0644,
> .kf_ops = &rdtgroup_kf_single_ops,
> .seq_show = rdtgroup_mbm_control_show,
> + .write = rdtgroup_mbm_control_write,
> },
> {
> .name = "cpus_list",
Reinette
* Re: [PATCH v6 03/22] x86/resctrl: Consolidate monitoring related data from rdt_resource
2024-08-16 21:29 ` Reinette Chatre
@ 2024-08-19 14:46 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-19 14:46 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:29, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> The cache allocation and memory bandwidth allocation feature properties
>> are consolidated into cache and membw structures respectively.
>
> "are consolidated into cache and membw structures respectively" ->
> "are consolidated into struct resctrl_cache and struct resctrl_membw
> respectively"
Sure.
>
>>
>> In preparation for more monitoring properties that will clutter the
>> existing resource struct even more, re-organize the monitoring specific
>> properties to also be in a separate structure.
>>
>> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>
>> @@ -182,12 +182,21 @@ enum resctrl_scope {
>> RESCTRL_L3_NODE,
>> };
>> +/**
>> + * struct resctrl_mon - Monitoring related data
>
> To capture that this is not global monitoring data but instead
> resource specific:
> "Monitoring related data" -> "Monitoring related data of a resctrl resource"
Sure.
>
>> + * @num_rmid: Number of RMIDs available
>> + * @evt_list: List of monitoring events
>> + */
>> +struct resctrl_mon {
>> + int num_rmid;
>> + struct list_head evt_list;
>> +};
>> +
>> /**
>> * struct rdt_resource - attributes of a resctrl resource
>> * @rid: The index of the resource
>> * @alloc_capable: Is allocation available on this machine
>> * @mon_capable: Is monitor feature available on this machine
>> - * @num_rmid: Number of RMIDs available
>> * @ctrl_scope: Scope of this resource for control functions
>> * @mon_scope: Scope of this resource for monitor functions
>> * @cache: Cache allocation related data
>> @@ -199,7 +208,6 @@ enum resctrl_scope {
>> * @default_ctrl: Specifies default cache cbm or memory B/W percent.
>> * @format_str: Per resource format string to show domain value
>> * @parse_ctrlval: Per resource function pointer to parse control values
>> - * @evt_list: List of monitoring events
>> * @fflags: flags to choose base and info files
>> * @cdp_capable: Is the CDP feature available on this resource
>> */
>
> Please add a kernel-doc entry for the new member.
Yes. Missed it.
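For completeness, a compilable sketch of the kernel-doc addition being requested. The wording "@mon: Monitoring related data" is only a suggestion, the struct rdt_resource shown is abridged to a few members, and struct list_head is a stand-in so the fragment builds outside the kernel.

```c
#include <stdbool.h>

/* Stand-in so the fragment compiles outside the kernel. */
struct list_head {
	struct list_head *next, *prev;
};

/**
 * struct resctrl_mon - Monitoring related data of a resctrl resource
 * @num_rmid:	Number of RMIDs available
 * @evt_list:	List of monitoring events
 */
struct resctrl_mon {
	int			num_rmid;
	struct list_head	evt_list;
};

/**
 * struct rdt_resource - attributes of a resctrl resource (abridged)
 * @rid:		The index of the resource
 * @alloc_capable:	Is allocation available on this machine
 * @mon_capable:	Is monitor feature available on this machine
 * @mon:		Monitoring related data
 */
struct rdt_resource {
	int			rid;
	bool			alloc_capable;
	bool			mon_capable;
	struct resctrl_mon	mon;
};
```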
>
>> @@ -207,11 +215,11 @@ struct rdt_resource {
>> int rid;
>> bool alloc_capable;
>> bool mon_capable;
>> - int num_rmid;
>> enum resctrl_scope ctrl_scope;
>> enum resctrl_scope mon_scope;
>> struct resctrl_cache cache;
>> struct resctrl_membw membw;
>> + struct resctrl_mon mon;
>> struct list_head ctrl_domains;
>> struct list_head mon_domains;
>> char *name;
>> @@ -221,7 +229,6 @@ struct rdt_resource {
>> int (*parse_ctrlval)(struct rdt_parse_data *data,
>> struct resctrl_schema *s,
>> struct rdt_ctrl_domain *d);
>> - struct list_head evt_list;
>> unsigned long fflags;
>> bool cdp_capable;
>> };
>
> Reinette
>
--
Thanks
Babu Moger
* Re: [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes
2024-08-16 18:09 ` Reinette Chatre
@ 2024-08-19 14:52 ` Reinette Chatre
2024-08-19 18:27 ` Peter Newman
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-19 14:52 UTC (permalink / raw)
To: Peter Newman
Cc: James Morse, Babu Moger, x86, hpa, paulmck, rdunlap, tj, peterz,
yanjiewtw, kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao,
jpoimboe, rick.p.edgecombe, kirill.shutemov, jithu.joseph,
kai.huang, kan.liang, daniel.sneddon, pbonzini, sandipan.das,
ilpo.jarvinen, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, mingo, bp, corbet, dave.hansen, fenghua.yu, tglx
Hi Peter and James,
On 8/16/24 11:09 AM, Reinette Chatre wrote:
> Hi Peter,
>
> On 8/16/24 10:16 AM, Peter Newman wrote:
>> Hi Reinette,
>>
>> On Fri, Aug 16, 2024 at 10:01 AM Reinette Chatre
>> <reinette.chatre@intel.com> wrote:
>>>
>>> Hi James,
>>>
>>> On 8/16/24 9:31 AM, James Morse wrote:
>>>> Hi Babu,
>>>>
>>>> On 06/08/2024 23:00, Babu Moger wrote:
>>>>> Introduce interface to switch between ABMC and legacy modes.
>>>>>
>>>>> By default ABMC is enabled on boot if the feature is available.
>>>>> Provide the interface to go back to legacy mode if required.
>>>>
>>>> I may have missed it on an earlier version ... why would anyone want the non-ABMC
>>>> behaviour on hardware that requires it: counters randomly reset and randomly return
>>>> 'Unavailable'... is that actually useful?
>>>>
>>>> You default this to on, so there isn't a backward compatibility argument here.
>>>>
>>>> It seems like being able to disable this is a source of complexity - is it needed?
>>>
>>> The ability to go back to legacy was added while looking ahead to support the next
>>> "assignable counter" feature that is software based ("soft-RMID" .. "soft-ABMC"?).
>>>
>>> This series adds support for ABMC on recent AMD hardware to address the issue described
>>> in cover letter. This issue also exists on earlier AMD hardware that does not have the ABMC
>>> feature and Peter is working on a software solution to address the issue on non-ABMC hardware.
>>> This software solution is expected to have the same interface as the hardware solution but
>>> earlier discussions revealed that it may introduce extra latency that users may only want to
>>> accept during periods of active monitoring. Thus the option to disable the counter assignment
>>> mode.
>>
>> Sorry again for the soft-RMID/soft-ABMC confusion[1], it was soft-RMID
>> that impacted context switch latency. Soft-ABMC does not require any
>> additional work at context switch.
>
> No problem. I did read [1] but I do not think I've seen soft-ABMC yet so
> my understanding of what it does is vague.
>
>> The only disadvantage to soft-ABMC I can think of is that it also
>> limits reading llc_occupancy event counts to "assigned" groups,
>> whereas without it, llc_occupancy works reliably on all RMIDs on AMD
>> hardware.
>
> hmmm ... keeping original llc_occupancy behavior does seem useful enough
> as motivation to keep the "legacy"/"default" mbm_assign_mode? It does sound
> to me as though soft-ABMC may not be as accurate when it comes to llc_occupancy.
> As I understand the hardware may tag entries in cache with RMID and that has a longer
> lifetime than the tasks that allocated that data into the cache. If soft-ABMC
> permanently associates an RMID with a local and total counter pair but that
> RMID is dynamically assigned to resctrl groups then a group may not always
> get the same RMID ... and thus its llc_occupancy data would be a combination of
> its cache allocations and all the cache allocations of resource groups that had
> that RMID before it. This may need significantly enhanced "limbo" handling?
To expand on this we may have to rework the interface if the counters can be
assigned to events other than MBM.
James: could you please elaborate how you plan to use this feature and if this
interface works for the planned usage?
Peter: considering the previous example [1] where soft-ABMC was using the "mbm_control"
interface I do not think it is ideal to only use the "t" and "l" flags while
llc_occupancy is also enabled/disabled via this interface. We should consider
(a) renaming the control file to indicate larger scope than MBM, (b) adding flags
for llc_occupancy. What do you think? I believe this is in line with stated goal
from [1]: "I believe mbm_control should always accurately reflect which events
are being counted."
Reinette
[1] https://lore.kernel.org/lkml/CALPaoCi1CwLy_HbFNOxPfdReEJstd3c+DvOMJHb5P9jBP+iatw@mail.gmail.com/
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v6 04/22] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
2024-08-16 21:30 ` Reinette Chatre
@ 2024-08-19 15:37 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-19 15:37 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:30, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
>> Bits Description
>> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
>> Monitoring Counter ID + 1
>>
>> The feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Detect the feature and number of assignable counters supported.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: Commit message update.
>> Renamed abmc_capable to mbm_cntr_assignable.
>>
>> v5: Name change num_cntrs to num_mbm_cntrs.
>> Moved abmc_capable to resctrl_mon.
>>
>> v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We don't
>> need to separate this as arch code.
>>
>> v3: Removed changes related to mon_features.
>> Moved rdt_cpu_has to core.c and added new function
>> resctrl_arch_has_abmc.
>> Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
>> rdt_resource. (James)
>>
>> v2: Changed the field name to mbm_assign_capable from abmc_capable.
>> ---
>> arch/x86/kernel/cpu/resctrl/monitor.c | 12 ++++++++++++
>> include/linux/resctrl.h | 4 ++++
>> 2 files changed, 16 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 795fe91a8feb..88312b5f0069 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1229,6 +1229,18 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>> mbm_local_event.configurable = true;
>> mbm_config_rftype_init("mbm_local_bytes_config");
>> }
>> +
>> + if (rdt_cpu_has(X86_FEATURE_ABMC)) {
>> + r->mon.mbm_cntr_assignable = true;
>> + /*
>> + * Query CPUID_Fn80000020_EBX_x05 for number of
>> + * ABMC counters.
>> + */
>
> At this point this comment seems unnecessary. Not an issue, it can stay if you
> prefer.
>
>> + cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>> + r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
>> + if (WARN_ON(r->mon.num_mbm_cntrs > 64))
>
> Please document where this "64" limit comes from. This is potentially a problem
> since the resctrl fs managed bitmap is hardcoded to be of size 64 but the arch code
> sets how many counters are supported. Will comment more later on bitmap portions, but
> to handle this I expect resctrl fs should at least sanity check the number of counters
> before attempting to initialize its bitmap ... or better, as James suggests, make the
> bitmap creation dynamic.
Yes, agreed. It is better to allocate it dynamically; then we don't need the
WARN_ON here.
>
>> + r->mon.num_mbm_cntrs = 64;
>> + }
>> }
>> l3_mon_evt_init(r);
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 1097559f4987..72c498deeb5e 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -185,10 +185,14 @@ enum resctrl_scope {
>> /**
>> * struct resctrl_mon - Monitoring related data
>> * @num_rmid: Number of RMIDs available
>> + * @num_mbm_cntrs: Number of monitoring counters
>> + * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
>> * @evt_list: List of monitoring events
>> */
>> struct resctrl_mon {
>> int num_rmid;
>> + int num_mbm_cntrs;
>> + bool mbm_cntr_assignable;
>> struct list_head evt_list;
>> };
>>
>
> Reinette
>
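A minimal user-space sketch of the detection arithmetic discussed above, assuming (as the patch does) that bits 15:0 of CPUID Fn8000_0020_EBX_x5 hold the highest counter ID, so the counter count is that value plus one. The helper names and the mask macro are hypothetical; in the kernel the EBX value would come from cpuid_count(0x80000020, 5, ...). Since the 16-bit field can describe up to 65536 counters, the second helper shows how a dynamically sized bitmap would be dimensioned instead of capping the count at 64.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mask for bits 15:0 (MAX_ABMC) of CPUID Fn8000_0020_EBX_x5.
 * As in the patch, the field is treated as the highest counter ID, so the
 * number of counters is the field value plus one.
 */
#define ABMC_MAX_CNTR_ID_MASK 0xFFFFu

static unsigned int abmc_num_cntrs(uint32_t ebx)
{
	/* Bits 31:16 are not part of MAX_ABMC and are ignored. */
	return (ebx & ABMC_MAX_CNTR_ID_MASK) + 1;
}

/* Number of unsigned longs needed for a bitmap with one bit per counter. */
static size_t abmc_bitmap_longs(unsigned int num_cntrs)
{
	const size_t bits_per_long = CHAR_BIT * sizeof(unsigned long);

	assert(num_cntrs > 0);
	return (num_cntrs + bits_per_long - 1) / bits_per_long;
}
```

For example, an EBX value of 0x1F yields 32 counters, which fits in a single 64-bit word, while the maximum field value would need 1024 such words.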
--
Thanks
Babu Moger
* Re: [PATCH v6 09/22] x86/resctrl: Introduce MBM counters bitmap
2024-08-16 21:35 ` Reinette Chatre
@ 2024-08-19 15:49 ` Moger, Babu
2024-08-20 18:08 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-08-19 15:49 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:35, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> Hardware provides a set of counters when mbm_cntr_assignable feature is
>> supported. These counters are used for assigning the events in resctrl
>> group when the feature is enabled.
>
> "in resctrl group" -> "in a resctrl group"?
>
Sure.
>>
>> Introduce mbm_cntrs_free_map bitmap to track available and free counters
>
> What is the difference between an available and a free counter?
It is the same. Will correct the text here.
>
>> and set of routines to allocate and free the counters.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
>> ---
>> arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 33 ++++++++++++++++++++++++++
>> 2 files changed, 35 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 154983a67646..6263362496a3 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -662,6 +662,8 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free);
>> void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>> void __init resctrl_file_fflags_init(const char *config,
>> unsigned long fflags);
>> +int mbm_cntr_alloc(struct rdt_resource *r);
>> +void mbm_cntr_free(u32 cntr_id);
>> void rdt_staged_configs_clear(void);
>> bool closid_allocated(unsigned int closid);
>> int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index ab4fab3b7cf1..c818965e36c9 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -185,6 +185,37 @@ bool closid_allocated(unsigned int closid)
>> return !test_bit(closid, &closid_free_map);
>> }
>> +/*
>> + * Counter bitmap for tracking the available counters.
>> + * ABMC feature provides set of hardware counters for enabling events.
>
> "ABMC feature" -> "mbm_cntr_assign mode"
Sure.
>
>> + * Each event takes one hardware counter. Kernel needs to keep track
>
> "Each event takes one hardware counter" -> "Each RMID and event pair takes
> one hardware counter" ?
Sure.
>
>
>> + * of number of available counters.
>
> "of number of available counters" -> "of the number of available counters"?
Sure.
>
>> + */
>> +static DECLARE_BITMAP(mbm_cntrs_free_map, 64);
>> +
>> +static void mbm_cntrs_init(struct rdt_resource *r)
>> +{
>> + bitmap_fill(mbm_cntrs_free_map, r->mon.num_mbm_cntrs);
>
> Apart from what James mentioned about the different sizes, please also
> add checking that the resource actually supports monitoring and
> assignable counters before proceeding with the bitmap ops.
Sure.
>
>> +}
>> +
>> +int mbm_cntr_alloc(struct rdt_resource *r)
>> +{
>> + int cntr_id;
>> +
>> + cntr_id = find_first_bit(mbm_cntrs_free_map, r->mon.num_mbm_cntrs);
>> + if (cntr_id >= r->mon.num_mbm_cntrs)
>> + return -ENOSPC;
>> +
>> + __clear_bit(cntr_id, mbm_cntrs_free_map);
>> +
>> + return cntr_id;
>> +}
>> +
>> +void mbm_cntr_free(u32 cntr_id)
>> +{
>> + __set_bit(cntr_id, mbm_cntrs_free_map);
>> +}
>> +
>> /**
>> * rdtgroup_mode_by_closid - Return mode of resource group with closid
>> * @closid: closid if the resource group
>> @@ -2748,6 +2779,8 @@ static int rdt_get_tree(struct fs_context *fc)
>> closid_init();
>> + mbm_cntrs_init(&rdt_resources_all[RDT_RESOURCE_L3].r_resctrl);
>> +
>> if (resctrl_arch_mon_capable())
>> flags |= RFTYPE_MON;
>>
>
> This is also an example of what James mentioned elsewhere where there is an
> assumption that this feature applies to the L3 resource. This has a consequence
> that some code is global (like mbm_cntrs_free_map), assuming the L3 resource, while
> other code takes the resource as parameter (eg. mbm_cntr_alloc()). This results
> in inconsistent interface where, for example, allocating a counter needs resource
> as parameter but freeing a counter does not. James already proposed different
> treatment of the bitmap and L3 resource parameters, I expect with such guidance
> the interfaces will become more intuitive.
>
Yes. Will address it.
How about making "mbm_cntrs_free_map" part of struct resctrl_mon?
It would be a pointer, allocated dynamically based on the number of counters.
All the related information (num_mbm_cntrs and mbm_cntr_assignable) is
already part of this data structure.
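To illustrate the proposal, here is a user-space sketch in which the free-counter bitmap hangs off a stand-in for struct resctrl_mon, is sized from num_mbm_cntrs at init, is only set up when assignable counters are supported, and is passed explicitly to both allocation and free. All sim_* names are hypothetical; an in-kernel version would presumably use bitmap_zalloc(), find_first_bit(), __clear_bit(), __set_bit() and bitmap_free() rather than open-coded bit operations.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define SIM_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define SIM_BIT_WORD(nr)  ((nr) / SIM_BITS_PER_LONG)
#define SIM_BIT_MASK(nr)  (1UL << ((nr) % SIM_BITS_PER_LONG))

/* Hypothetical user-space stand-in for struct resctrl_mon. */
struct sim_resctrl_mon {
	int num_mbm_cntrs;
	bool mbm_cntr_assignable;
	unsigned long *mbm_cntr_free_map;	/* set bit => counter is free */
};

static int sim_mbm_cntrs_init(struct sim_resctrl_mon *mon)
{
	size_t nlongs;

	/* Nothing to do unless assignable counters are supported. */
	if (!mon->mbm_cntr_assignable || mon->num_mbm_cntrs <= 0)
		return 0;

	nlongs = ((size_t)mon->num_mbm_cntrs + SIM_BITS_PER_LONG - 1) /
		 SIM_BITS_PER_LONG;
	mon->mbm_cntr_free_map = calloc(nlongs, sizeof(unsigned long));
	if (!mon->mbm_cntr_free_map)
		return -ENOMEM;

	/* Mark every counter as free. */
	for (int i = 0; i < mon->num_mbm_cntrs; i++)
		mon->mbm_cntr_free_map[SIM_BIT_WORD(i)] |= SIM_BIT_MASK(i);
	return 0;
}

/* Allocate the lowest free counter ID, or return a negative errno. */
static int sim_mbm_cntr_alloc(struct sim_resctrl_mon *mon)
{
	if (!mon->mbm_cntr_free_map)
		return -EINVAL;

	for (int i = 0; i < mon->num_mbm_cntrs; i++) {
		if (mon->mbm_cntr_free_map[SIM_BIT_WORD(i)] & SIM_BIT_MASK(i)) {
			mon->mbm_cntr_free_map[SIM_BIT_WORD(i)] &= ~SIM_BIT_MASK(i);
			return i;
		}
	}
	return -ENOSPC;
}

/* Return a counter to the pool; takes the same argument as allocation. */
static void sim_mbm_cntr_free(struct sim_resctrl_mon *mon, int cntr_id)
{
	assert(mon->mbm_cntr_free_map);
	assert(cntr_id >= 0 && cntr_id < mon->num_mbm_cntrs);
	mon->mbm_cntr_free_map[SIM_BIT_WORD(cntr_id)] |= SIM_BIT_MASK(cntr_id);
}

static void sim_mbm_cntrs_exit(struct sim_resctrl_mon *mon)
{
	free(mon->mbm_cntr_free_map);
	mon->mbm_cntr_free_map = NULL;
}
```

Passing the same structure to both sim_mbm_cntr_alloc() and sim_mbm_cntr_free() removes the asymmetry pointed out above, where only allocation took the resource.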
--
Thanks
Babu Moger
* Re: [PATCH v6 06/22] x86/resctrl: Add support to enable/disable AMD ABMC feature
2024-08-16 21:31 ` Reinette Chatre
@ 2024-08-19 18:07 ` Moger, Babu
2024-08-20 18:17 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-08-19 18:07 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:31, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> Add the functionality to enable/disable AMD ABMC feature.
>>
>> The AMD ABMC feature is enabled by setting the enable bit (bit 0) in MSR
>> L3_QOS_EXT_CFG. When the state of ABMC is changed, the MSR needs
>> to be updated on all the logical processors in the QOS Domain.
>>
>> Hardware counters will reset when ABMC state is changed. Reset the
>
> Could you please clarify how this works when ABMC state is changed on
> one CPU in a domain vs all (as done in this patch) CPUs of a domain? In this
> patch it is clear that all hardware counters are reset and consequently
> the architectural state maintained by resctrl is reset also. Later, when
> the code is added to handle CPU online I see that ABMC state is changed
> on a new online CPU but I do not see matching reset of architectural state.
> (more in that patch later)
Yes, I missed testing this scenario.
When a new CPU comes online, it should inherit the ABMC state that is already
set; it should not force it either way. In that case, there is no need to
reset the architectural state.
I need to make a few changes to make this work properly.
We need to set the ABMC state to "enabled" during init, when ABMC is
detected: resctrl_late_init -> .. -> rdt_get_mon_l3_config
This happens only once, during init.
Then, during hotplug, just apply the ABMC state that was already set
during init. This should work fine.
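The plan above can be modeled with a small user-space simulation: a single enable flag is the source of truth, a mode change writes bit 0 (ABMC_ENABLE_BIT) of a simulated per-CPU L3_QOS_EXT_CFG on every online CPU and resets resctrl's architectural state, and a CPU that comes online later only inherits the current setting. The sim_* names are hypothetical, and the sketch assumes, as stated in the reply, that applying an unchanged state to a newly onlined CPU does not require resetting the architectural state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ABMC_ENABLE_BIT	0	/* bit 0 of L3_QOS_EXT_CFG, as in the patch */
#define SIM_NR_CPUS	4

/* Simulated per-CPU copies of MSR L3_QOS_EXT_CFG (0xc00003ff). */
static uint64_t sim_l3_qos_ext_cfg[SIM_NR_CPUS];
static bool sim_cpu_online[SIM_NR_CPUS];

/* Single source of truth for the mode, set at init or by a mode switch. */
static bool sim_mbm_cntr_assign_enabled;
/* How many times resctrl's architectural MBM state was reset. */
static int sim_arch_state_resets;

static void sim_abmc_set_one(int cpu, bool enable)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	if (enable)
		sim_l3_qos_ext_cfg[cpu] |= 1ULL << ABMC_ENABLE_BIT;
	else
		sim_l3_qos_ext_cfg[cpu] &= ~(1ULL << ABMC_ENABLE_BIT);
}

/*
 * Mode change: update every online CPU, then reset the architectural
 * state, because the hardware counters reset when the ABMC state changes.
 */
static void sim_abmc_set_all(bool enable)
{
	if (enable == sim_mbm_cntr_assign_enabled)
		return;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (sim_cpu_online[cpu])
			sim_abmc_set_one(cpu, enable);

	sim_mbm_cntr_assign_enabled = enable;
	sim_arch_state_resets++;
}

/* CPU hotplug: inherit the current mode; do not force it or reset state. */
static void sim_abmc_cpu_online(int cpu)
{
	sim_cpu_online[cpu] = true;
	sim_abmc_set_one(cpu, sim_mbm_cntr_assign_enabled);
}
```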
>
>> architectural state so that reading of hardware counter is not considered
>
> "architectural state" -> "architectural state maintained by resctrl"
Sure.
>
>> as an overflow in next update.
>
> "so that reading of hardware counter" -> "so that reading of the/a(?)
> hardware counter"
Sure.
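Why the reset matters can be seen with a width-aware delta computation, sketched here in the spirit of mbm_overflow_count() in arch/x86/kernel/cpu/resctrl/monitor.c (the sim_mbm_delta() name and the 24-bit width used below are illustrative assumptions). Shifting both reads to the top of a 64-bit word makes the subtraction wrap at 2^width, so a genuine rollover is handled, but a counter that was reset by hardware while the saved previous value was kept looks like a rollover too.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Delta between two reads of a width-bit hardware counter. Shifting both
 * values to the top of a 64-bit word makes the unsigned subtraction wrap
 * at 2^width, so a single rollover between reads is accounted for.
 */
static uint64_t sim_mbm_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift;
	uint64_t chunks;

	assert(width > 0 && width <= 64);
	shift = 64 - width;
	chunks = (cur << shift) - (prev << shift);

	return chunks >> shift;
}
```

With prev still at 1000 after a hardware reset and the counter now reading 10, the 24-bit delta comes out as 2^24 - 990 rather than 10; clearing the saved value alongside the hardware reset avoids this.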
>>
>> The ABMC feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: Renamed abmc_enabled to mbm_cntr_assign_enabled.
>> Used msr_set_bit and msr_clear_bit for msr updates.
>> Renamed resctrl_arch_abmc_enable() to
>> resctrl_arch_mbm_cntr_assign_enable().
>> Renamed resctrl_arch_abmc_disable() to
>> resctrl_arch_mbm_cntr_assign_disable().
>> Made _resctrl_abmc_enable() return void.
>>
>> v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
>> Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
>> Introduced resctrl_arch_get_abmc_enabled to get abmc state from
>> non-arch code.
>> Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
>> Modified commit log to make it clear about AMD ABMC feature.
>>
>> v3: No changes.
>>
>> v2: Few text changes in commit message.
>> ---
>> arch/x86/include/asm/msr-index.h | 1 +
>> arch/x86/kernel/cpu/resctrl/internal.h | 13 ++++++
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 57 ++++++++++++++++++++++++++
>> 3 files changed, 71 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index 82c6a4d350e0..d86469bf5d41 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1182,6 +1182,7 @@
>> #define MSR_IA32_MBA_BW_BASE 0xc0000200
>> #define MSR_IA32_SMBA_BW_BASE 0xc0000280
>> #define MSR_IA32_EVT_CFG_BASE 0xc0000400
>> +#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
>> /* MSR_IA32_VMX_MISC bits */
>> #define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 2bd207624eec..154983a67646 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -56,6 +56,9 @@
>> /* Max event bits supported */
>> #define MAX_EVT_CONFIG_BITS GENMASK(6, 0)
>> +/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
>> +#define ABMC_ENABLE_BIT 0
>> +
>> /**
>> * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>> * aren't marked nohz_full
>> @@ -477,6 +480,7 @@ struct rdt_parse_data {
>> * @mbm_cfg_mask: Bandwidth sources that can be tracked when Bandwidth
>> * Monitoring Event Configuration (BMEC) is supported.
>> * @cdp_enabled: CDP state of this resource
>> + * @mbm_cntr_assign_enabled: ABMC feature is enabled
>> *
>> * Members of this structure are either private to the architecture
>> * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
>> @@ -491,6 +495,7 @@ struct rdt_hw_resource {
>> unsigned int mbm_width;
>> unsigned int mbm_cfg_mask;
>> bool cdp_enabled;
>> + bool mbm_cntr_assign_enabled;
>> };
>> static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
>> @@ -536,6 +541,14 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
>> void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d);
>> +static inline bool resctrl_arch_get_abmc_enabled(void)
>
> This function will be called by resctrl fs code. Please contain the "abmc" naming to the
> x86 architecture code and let resctrl fs just refer to it as "mbm_assign"/"mbm_cntr_assign".
Sure.
>
>> +{
>> + return rdt_resources_all[RDT_RESOURCE_L3].mbm_cntr_assign_enabled;
>> +}
>> +
>> +int resctrl_arch_mbm_cntr_assign_enable(void);
>> +void resctrl_arch_mbm_cntr_assign_disable(void);
>> +
>> /*
>> * To return the common struct rdt_resource, which is contained in struct
>> * rdt_hw_resource, walk the resctrl member of struct rdt_hw_resource.
>
> Reinette
>
--
Thanks
Babu Moger
* Re: [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported
2024-08-16 22:33 ` Reinette Chatre
@ 2024-08-19 18:18 ` Moger, Babu
2024-08-20 18:12 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-08-19 18:18 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 17:33, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> Enable ABMC by default during boot when it is supported.
>>
>> Users will not see any difference in the behavior when resctrl is
>> mounted. With automatic assignment everything will work as running
>> in the legacy monitor mode.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6 : Keeping the default enablement in arch init code for now.
>> This may need some discussion.
>> Renamed resctrl_arch_configure_abmc to
>> resctrl_arch_mbm_cntr_assign_configure.
>>
>> v5: New patch to enable ABMC by default.
>> ---
>> arch/x86/kernel/cpu/resctrl/core.c | 2 ++
>> arch/x86/kernel/cpu/resctrl/internal.h | 1 +
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 17 +++++++++++++++++
>> 3 files changed, 20 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 6fb0cfdb5529..a7980f84c487 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -599,6 +599,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>> d = container_of(hdr, struct rdt_mon_domain, hdr);
>> cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
>> + resctrl_arch_mbm_cntr_assign_configure();
>> return;
>> }
>> @@ -620,6 +621,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
>> arch_mon_domain_online(r, d);
>> resctrl_mbm_evt_config_init(hw_dom);
>> + resctrl_arch_mbm_cntr_assign_configure();
>> if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
>> mon_domain_free(hw_dom);
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index cc832955b787..ba3012f8f940 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -685,6 +685,7 @@ int mbm_cntr_alloc(struct rdt_resource *r);
>> void mbm_cntr_free(u32 cntr_id);
>> void resctrl_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
>> unsigned int mon_event_config_index_get(u32 evtid);
>> +void resctrl_arch_mbm_cntr_assign_configure(void);
>> int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evtid,
>> u32 rmid, u32 cntr_id, u32 closid, bool assign);
>> int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 66febff2a3d3..d15fd1bde5f4 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2756,6 +2756,23 @@ void resctrl_arch_mbm_cntr_assign_disable(void)
>> }
>> }
>> +void resctrl_arch_mbm_cntr_assign_configure(void)
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> + bool enable = true;
>> +
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + if (r->mon.mbm_cntr_assignable) {
>> + if (!hw_res->mbm_cntr_assign_enabled)
>> + hw_res->mbm_cntr_assign_enabled = true;
>> + resctrl_abmc_set_one_amd(&enable);
>
> Earlier changelogs mentioned that counters are reset when ABMC is enabled.
> How does that behave here when one CPU comes online? Consider the scenario where
> a system is booted without all CPUs online. ABMC is initially enabled on all online
> CPUs with this flow ... user space could start using resctrl fs and create
> monitor groups that start accumulating architectural state. If the remaining
> CPUs come online at this point and this snippet enables ABMC, would it reset
> all counters? Should the architectural state be cleared?
When a new CPU comes online, it should inherit the ABMC state that is already
set; it should not force it either way. In that case, there is no need to
reset the architectural state.
I responded to your earlier comment:
https://lore.kernel.org/lkml/0256b457-175d-4923-aa49-00e8e52b865b@amd.com/
>
> Also, it still does not look right that the architecture decides the policy.
> Could this enabling be moved to resctrl_online_cpu() for resctrl fs to
> request architecture to enable assignable counters if it is supported?
Sure. I will move resctrl_arch_mbm_cntr_assign_configure() into
resctrl_online_cpu(), changed to only apply the ABMC state that was set
during init.
>
>> + }
>> +
>> + mutex_unlock(&rdtgroup_mutex);
>> +}
>> +
>> /*
>> * We don't allow rdtgroup directories to be created anywhere
>> * except the root directory. Thus when looking for the rdtgroup
>
> Reinette
>
--
Thanks
Babu Moger
* Re: [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes
2024-08-19 14:52 ` Reinette Chatre
@ 2024-08-19 18:27 ` Peter Newman
2024-08-20 18:11 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Peter Newman @ 2024-08-19 18:27 UTC (permalink / raw)
To: Reinette Chatre
Cc: James Morse, Babu Moger, x86, hpa, paulmck, rdunlap, tj, peterz,
yanjiewtw, kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao,
jpoimboe, rick.p.edgecombe, kirill.shutemov, jithu.joseph,
kai.huang, kan.liang, daniel.sneddon, pbonzini, sandipan.das,
ilpo.jarvinen, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, mingo, bp, corbet, dave.hansen, fenghua.yu, tglx
Hi Reinette,
On Mon, Aug 19, 2024 at 7:53 AM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter and James,
>
> On 8/16/24 11:09 AM, Reinette Chatre wrote:
> > Hi Peter,
> >
> > On 8/16/24 10:16 AM, Peter Newman wrote:
> >> Hi Reinette,
> >>
> >> On Fri, Aug 16, 2024 at 10:01 AM Reinette Chatre
> >> <reinette.chatre@intel.com> wrote:
> >>>
> >>> Hi James,
> >>>
> >>> On 8/16/24 9:31 AM, James Morse wrote:
> >>>> Hi Babu,
> >>>>
> >>>> On 06/08/2024 23:00, Babu Moger wrote:
> >>>>> Introduce interface to switch between ABMC and legacy modes.
> >>>>>
> >>>>> By default ABMC is enabled on boot if the feature is available.
> >>>>> Provide the interface to go back to legacy mode if required.
> >>>>
> >>>> I may have missed it on an earlier version ... why would anyone want the non-ABMC
> >>>> behaviour on hardware that requires it: counters randomly reset and randomly return
> >>>> 'Unavailable'... is that actually useful?
> >>>>
> >>>> You default this to on, so there isn't a backward compatibility argument here.
> >>>>
> >>>> It seems like being able to disable this is a source of complexity - is it needed?
> >>>
> >>> The ability to go back to legacy was added while looking ahead to support the next
> >>> "assignable counter" feature that is software based ("soft-RMID" .. "soft-ABMC"?).
> >>>
> >>> This series adds support for ABMC on recent AMD hardware to address the issue described
> >>> in cover letter. This issue also exists on earlier AMD hardware that does not have the ABMC
> >>> feature and Peter is working on a software solution to address the issue on non-ABMC hardware.
> >>> This software solution is expected to have the same interface as the hardware solution but
> >>> earlier discussions revealed that it may introduce extra latency that users may only want to
> >>> accept during periods of active monitoring. Thus the option to disable the counter assignment
> >>> mode.
> >>
> >> Sorry again for the soft-RMID/soft-ABMC confusion[1], it was soft-RMID
> >> that impacted context switch latency. Soft-ABMC does not require any
> >> additional work at context switch.
> >
> > No problem. I did read [1] but I do not think I've seen soft-ABMC yet so
> > my understanding of what it does is vague.
> >
> >> The only disadvantage to soft-ABMC I can think of is that it also
> >> limits reading llc_occupancy event counts to "assigned" groups,
> >> whereas without it, llc_occupancy works reliably on all RMIDs on AMD
> >> hardware.
> >
> > hmmm ... keeping original llc_occupancy behavior does seem useful enough
> > as motivation to keep the "legacy"/"default" mbm_assign_mode? It does sound
> > to me as though soft-ABMC may not be as accurate when it comes to llc_occupancy.
> > As I understand the hardware may tag entries in cache with RMID and that has a longer
> > lifetime than the tasks that allocated that data into the cache. If soft-ABMC
> > permanently associates an RMID with a local and total counter pair but that
> > RMID is dynamically assigned to resctrl groups then a group may not always
> > get the same RMID ... and thus its llc_occupancy data would be a combination of
> > its cache allocations and all the cache allocations of resource groups that had
> > that RMID before it. This may need significantly enhanced "limbo" handling?
>
For the use case of soft-ABMC that I'm aware of, it would be better to
disable llc_occupancy events and accept it as a limitation as we're
not using this feature. I don't want to slow down the rate at which
MBM counters could be reassigned. Over the course of a multiple-second
bandwidth measurement window on a bandwidth-saturated host, a previous
group's initial cache occupancy isn't significant enough to justify a
limbo period, especially when padded out to 1 second.
I would feel differently if my users were more interested in
llc_occupancy counts and it was possible for the LLC to immediately
notify when the occupancy threshold for any of a set of groups has
been crossed.
> To expand on this we may have to rework the interface if the counters can be
> assigned to events other than MBM.
>
> James: could you please elaborate how you plan to use this feature and if this
> interface works for the planned usage?
>
> Peter: considering the previous example [1] where soft-ABMC was using the "mbm_control"
> interface I do not think it is ideal to only use the "t" and "l" flags while
> llc_occupancy is also enabled/disabled via this interface. We should consider
> (a) renaming the control file to indicate a larger scope than MBM, (b) adding flags
> for llc_occupancy. What do you think? I believe this is in line with stated goal
> from [1]: "I believe mbm_control should always accurately reflect which events
> are being counted."
I should have said, "I believe mbm_control should always accurately
reflect which _MBM_ events are being counted."
In general, MBM requires maintaining cumulative, running counts, while
llc_occupancy is only a snapshot of cache usage. This is why MBM
results in contended resources (counters) which must be managed by the
user. In the MPAM implementations I've seen so far, a small number
(relative to the number of monitoring groups supported) of occupancy
monitors is sufficient for a large number of groups, because it only
limits the number of monitoring groups' occupancy counts which can be
read in parallel and can be adequately managed within the MPAM driver
without user interaction.
Because of this, broadening the scope of mbm_control to include
occupancy would only serve to remind the user whether occupancy is
supported, but would provide no new information beyond what's already
provided by mon_features.
-Peter
* Re: [PATCH v6 07/22] x86/resctrl: Introduce the interface to display monitor mode
2024-08-16 21:32 ` Reinette Chatre
@ 2024-08-19 19:27 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-19 19:27 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:32, Reinette Chatre wrote:
> Hi Babu,
>
> (expanding on what James said)
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> The mbm_mode file displays the list of supported monitor modes.
>>
>> The mbm_cntr_assign is one of the currently supported modes. It is also
>> called ABMC (Assignable Bandwidth Monitoring Counters) feature. ABMC
>> feature provides the option to assign a hardware counter to an RMID and
>> monitor the bandwidth as long as it is assigned. ABMC mode is enabled
>> by default when supported.
>>
>> Legacy mode works without the assignment option.
>>
>> Provide an interface to display the monitor mode on the system.
>> $cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>> [mbm_cntr_assign]
>> legacy
>>
>> Switching the mbm_mode will reset all the mbm counters of all resctrl
>> groups.
>
> The changelog also needs to be clear to distinguish the resctrl fs
> "mbm_cntr_assign" mode from how it is backed by ABMC on AMD hardware.
>
> for example (please improve):
>
> Introduce "mbm_cntr_assign" mode that provides the option to assign a
> hardware counter to an RMID and monitor the bandwidth as long as it is
> assigned. On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
> Bandwidth Monitoring Counters) hardware feature. "mbm_cntr_assign" mode
> is enabled by default when supported.
>
> "default" mode is the existing monitoring mode that works without the
> explicit counter assignment, instead relying on dynamic counter assignment
> by hardware that may result in hardware not dedicating a counter, resulting in
> monitoring data reads returning "Unavailable".
>
> Provide an interface to display the monitor mode on the system.
> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_cntr_assign]
> default
>
> Switching the mbm_assign_mode will reset all the MBM counters of all resctrl
> groups.
Looks good, thanks.
>
>
> ....
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 6075b1e5bb77..d8f85b20ab8f 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -845,6 +845,26 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>> return ret;
>> }
>> +static int rdtgroup_mbm_mode_show(struct kernfs_open_file *of,
>> + struct seq_file *s, void *v)
>> +{
>> + struct rdt_resource *r = of->kn->parent->priv;
>> +
>> + if (r->mon.mbm_cntr_assignable) {
>> + if (resctrl_arch_get_abmc_enabled()) {
>
> Since this state can change during runtime this access needs to be protected.
Sure. Will do.
>
>> + seq_puts(s, "[mbm_cntr_assign]\n");
>> + seq_puts(s, "legacy\n");
>> + } else {
>> + seq_puts(s, "mbm_cntr_assign\n");
>> + seq_puts(s, "[legacy]\n");
>> + }
>> + } else {
>> + seq_puts(s, "[legacy]\n");
>> + }
>> +
>> + return 0;
>> +}
>> +
>> #ifdef CONFIG_PROC_CPU_RESCTRL
>> /*
>
> Reinette
>
>
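A user-space sketch of the protected read requested above, with a pthread mutex standing in for rdtgroup_mutex and the "mbm_assign_mode"/"default" naming from the suggested changelog rather than "mbm_mode"/"legacy". The names are hypothetical; in the kernel the show function would take rdtgroup_mutex and emit lines with seq_puts() instead of formatting into a buffer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for rdtgroup_mutex and the arch-private enable flag. */
static pthread_mutex_t sim_rdtgroup_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool sim_mbm_cntr_assign_enabled;

/* Mode switch path: updates the flag while holding the mutex. */
static void sim_set_mbm_cntr_assign(bool enable)
{
	pthread_mutex_lock(&sim_rdtgroup_mutex);
	sim_mbm_cntr_assign_enabled = enable;
	pthread_mutex_unlock(&sim_rdtgroup_mutex);
}

/*
 * Formats what a hypothetical info/L3_MON/mbm_assign_mode file would show.
 * The flag is read under the same mutex as the mode switch, so the output
 * is serialized with any in-progress switch. Returns the snprintf() result.
 */
static int sim_mbm_assign_mode_show(bool assignable, char *buf, size_t len)
{
	int n;

	pthread_mutex_lock(&sim_rdtgroup_mutex);
	if (!assignable)
		n = snprintf(buf, len, "[default]\n");
	else if (sim_mbm_cntr_assign_enabled)
		n = snprintf(buf, len, "[mbm_cntr_assign]\ndefault\n");
	else
		n = snprintf(buf, len, "mbm_cntr_assign\n[default]\n");
	pthread_mutex_unlock(&sim_rdtgroup_mutex);

	return n;
}
```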
--
Thanks
Babu Moger
* Re: [PATCH v6 08/22] x86/resctrl: Introduce interface to display number of monitoring counters
2024-08-16 21:34 ` Reinette Chatre
@ 2024-08-20 15:56 ` Moger, Babu
2024-08-20 18:08 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-08-20 15:56 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:34, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>
> Here and in all patches, when referring to resctrl fs please use the more
> generic "mbm_cntr_assign" mode to distinguish it from the
> hardware/architecture specific code that involves ABMC. Something like
>
> "The ABMC feature provides" -> ""mbm_cntr_assign" mode provides"
Sure.
>
> I also think that being explicit with this separation will help us to see
> gaps in interface between resctrl fs and arch.
Yes.
>
>> counter to an RMID and monitor the bandwidth as long as the counter is
>
> Please clarify the scope of this feature. Above mentions that a counter is
> assigned to an RMID but later it is mentioned that the counter is assigned
> to an event. Perhaps consistently mention that a counter is assigned to
> a RMID,event pair?
Yes.
"a counter is assigned to an RMID,event pair" gives more clarity.
Will change it.
>
>> assigned. The number of assignments depends on the number of monitoring
>> counters available.
>>
>> Provide the interface to display the number of monitoring counters
>> supported.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: No changes.
>>
>> v5: Changed the display name from num_cntrs to num_mbm_cntrs.
>> Updated the commit message.
>> Moved the patch after mbm_mode is introduced.
>>
>> v4: Changed the counter name to num_cntrs. And few text changes.
>>
>> v3: Changed the field name to mbm_assign_cntrs.
>>
>> v2: Changed the field name to mbm_assignable_counters from abmc_counte
>> ---
>> Documentation/arch/x86/resctrl.rst | 3 +++
>> arch/x86/kernel/cpu/resctrl/monitor.c | 2 ++
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
>> 3 files changed, 21 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index d4ec605b200a..fe9f10766c4f 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -291,6 +291,9 @@ with the following files:
>> as long as there are enough RMID counters available to support number
>> of monitoring groups.
>> +"num_mbm_cntrs":
>> + The number of monitoring counters available for assignment.
>> +
>> "max_threshold_occupancy":
>> Read/write file provides the largest value (in
>> bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 5e8706ab6361..83329cefebf7 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1242,6 +1242,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>> r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
>> if (WARN_ON(r->mon.num_mbm_cntrs > 64))
>> r->mon.num_mbm_cntrs = 64;
>> +
>> + resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
>
> The arch code should not access the resctrl file flags. This should be
> moved to make the MPAM support easier. With the arch code setting
> r->mon.mbm_cntr_assignable the fs code can use that to set the flags.
> Something similar to below patch is needed:
> https://lore.kernel.org/lkml/20240802172853.22529-27-james.morse@arm.com/
It is just a matter of moving the resctrl_file_fflags_init() calls to
resctrl_init(). The rdt_resource fields are already set up here. Something like
https://lore.kernel.org/lkml/20240802172853.22529-20-james.morse@arm.com/
I feel it is better done as part of the MPAM fs/arch separation.
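The split being discussed is that the arch code only reports a capability (r->mon.mbm_cntr_assignable) and the fs code decides which info files exist. A minimal user-space sketch of that division of labour; struct info_file, struct mon_caps, FFLAG_MON_INFO and fs_init_mon_files() are hypothetical stand-ins, not the resctrl_file_fflags_init()/RFTYPE_MON_INFO machinery itself:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an RFTYPE_MON_INFO-style visibility flag. */
#define FFLAG_MON_INFO 0x1u

struct info_file {
	const char *name;
	unsigned int fflags;	/* 0 means the file is not created */
};

/* Capabilities the arch code reports; it never touches file flags. */
struct mon_caps {
	bool mbm_cntr_assignable;
	unsigned int num_mbm_cntrs;
};

/* fs-side policy: expose num_mbm_cntrs only when counters are assignable. */
static void fs_init_mon_files(const struct mon_caps *caps,
			      struct info_file *files, int nfiles)
{
	for (int i = 0; i < nfiles; i++) {
		if (caps->mbm_cntr_assignable &&
		    strcmp(files[i].name, "num_mbm_cntrs") == 0)
			files[i].fflags |= FFLAG_MON_INFO;
	}
}
```

Keeping the flag decision in fs code means an MPAM backend only has to fill in the same capability fields.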
--
Thanks
Babu Moger
* Re: [PATCH v6 11/22] x86/resctrl: Remove MSR reading of event configuration value
2024-08-16 21:36 ` Reinette Chatre
@ 2024-08-20 16:19 ` Moger, Babu
2024-08-20 18:09 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-08-20 16:19 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:36, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> The event configuration is domain specific and initialized during domain
>> initialization. The values is stored in rdt_hw_mon_domain.
>
> "The values is stored in rdt_hw_mon_domain." -> "The values are stored
> in struct rdt_hw_mon_domain."
Sure.
>
>>
>> It is not required to read the configuration register every time user asks
>> for it. Use the value stored in rdt_hw_mon_domain instead.
>
> "rdt_hw_mon_domain" -> "struct rdt_hw_mon_domain"
Sure.
>
>>
>> Introduce resctrl_arch_event_config_get() and
>> resctrl_arch_event_config_set() to get/set architecture domain specific
>> mbm_total_cfg/mbm_local_cfg values. Also, remove unused config value
>> definitions.
>
> hmmm ... while the config values are not used they are now established
> ABI and any other architecture that wants to support configurable events
> will need to follow these definitions. It is thus required to keep them
> documented in the kernel in support of future changes. I
> understand that they are documented in user docs, but could we keep them
> in the kernel code also? Since they are unused they could perhaps be moved
> to comments as a compromise?
How about just keeping them as is? I will simply not remove them.
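Keeping the definitions makes the user-visible bit layout of mbm_total_bytes_config/mbm_local_bytes_config readable from the code. As an illustration of that ABI, a minimal sketch that decodes a config value into the named sources, using the bit definitions from the internal.h hunk below; describe_evt_cfg() is a hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Event source bits, as defined in the internal.h hunk. */
#define READS_TO_LOCAL_MEM		(1u << 0)
#define READS_TO_REMOTE_MEM		(1u << 1)
#define NON_TEMP_WRITE_TO_LOCAL_MEM	(1u << 2)
#define NON_TEMP_WRITE_TO_REMOTE_MEM	(1u << 3)
#define READS_TO_LOCAL_S_MEM		(1u << 4)
#define READS_TO_REMOTE_S_MEM		(1u << 5)
#define DIRTY_VICTIMS_TO_ALL_MEM	(1u << 6)
#define MAX_EVT_CONFIG_BITS		0x7fu	/* GENMASK(6, 0) */

static const char *const evt_cfg_names[] = {
	"READS_TO_LOCAL_MEM", "READS_TO_REMOTE_MEM",
	"NON_TEMP_WRITE_TO_LOCAL_MEM", "NON_TEMP_WRITE_TO_REMOTE_MEM",
	"READS_TO_LOCAL_S_MEM", "READS_TO_REMOTE_S_MEM",
	"DIRTY_VICTIMS_TO_ALL_MEM",
};

/*
 * Write the names of the sources set in @cfg, joined by '|', into @buf
 * (@len must be at least 1). Bits above MAX_EVT_CONFIG_BITS are ignored.
 * Stops at the first name that does not fit. Returns the string length.
 */
static size_t describe_evt_cfg(uint32_t cfg, char *buf, size_t len)
{
	size_t used = 0;

	cfg &= MAX_EVT_CONFIG_BITS;
	for (unsigned int bit = 0; bit < 7; bit++) {
		size_t nlen;

		if (!(cfg & (1u << bit)))
			continue;
		nlen = strlen(evt_cfg_names[bit]);
		if (used + (used ? 1 : 0) + nlen + 1 > len)
			break;
		if (used)
			buf[used++] = '|';
		memcpy(buf + used, evt_cfg_names[bit], nlen);
		used += nlen;
	}
	buf[used] = '\0';
	return used;
}
```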
>
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: Fixed inconsistency with types. Made all the config value types u32.
>> Removed a few rdt_last_cmd_puts() calls as they are not necessary.
>> Removed unused config value definitions.
>> Few more updates to commit message.
>>
>> v5: Introduced resctrl_arch_event_config_get() and
>> resctrl_arch_event_config_set() based on our discussion.
>>
>> https://lore.kernel.org/lkml/68e861f9-245d-4496-a72e-46fc57d19c62@amd.com/
>>
>> v4: New patch.
>> ---
>> arch/x86/kernel/cpu/resctrl/internal.h | 21 -----
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 104 ++++++++++++++-----------
>> include/linux/resctrl.h | 4 +
>> 3 files changed, 64 insertions(+), 65 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 4d8cc36a8d79..1021227d8c7e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -32,27 +32,6 @@
>> */
>> #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>> -/* Reads to Local DRAM Memory */
>> -#define READS_TO_LOCAL_MEM BIT(0)
>> -
>> -/* Reads to Remote DRAM Memory */
>> -#define READS_TO_REMOTE_MEM BIT(1)
>> -
>> -/* Non-Temporal Writes to Local Memory */
>> -#define NON_TEMP_WRITE_TO_LOCAL_MEM BIT(2)
>> -
>> -/* Non-Temporal Writes to Remote Memory */
>> -#define NON_TEMP_WRITE_TO_REMOTE_MEM BIT(3)
>> -
>> -/* Reads to Local Memory the system identifies as "Slow Memory" */
>> -#define READS_TO_LOCAL_S_MEM BIT(4)
>> -
>> -/* Reads to Remote Memory the system identifies as "Slow Memory" */
>> -#define READS_TO_REMOTE_S_MEM BIT(5)
>> -
>> -/* Dirty Victims to All Types of Memory */
>> -#define DIRTY_VICTIMS_TO_ALL_MEM BIT(6)
>> -
>> /* Max event bits supported */
>> #define MAX_EVT_CONFIG_BITS GENMASK(6, 0)
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 02afd3442876..0047b4eb0ff5 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1605,10 +1605,57 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
>> }
>> struct mon_config_info {
>> + struct rdt_mon_domain *d;
>> u32 evtid;
>> u32 mon_config;
>> };
>> +u32 resctrl_arch_event_config_get(struct rdt_mon_domain *d,
>> + enum resctrl_event_id eventid)
>> +{
>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>> +
>> + switch (eventid) {
>> + case QOS_L3_OCCUP_EVENT_ID:
>> + break;
>> + case QOS_L3_MBM_TOTAL_EVENT_ID:
>> + return hw_dom->mbm_total_cfg;
>> + case QOS_L3_MBM_LOCAL_EVENT_ID:
>> + return hw_dom->mbm_local_cfg;
>> + }
>> +
>> + /* Never expect to get here */
>> + WARN_ON_ONCE(1);
>> +
>> + return INVALID_CONFIG_VALUE;
>> +}
>> +
>> +void resctrl_arch_event_config_set(void *info)
>> +{
>> + struct mon_config_info *mon_info = info;
>> + struct rdt_hw_mon_domain *hw_dom;
>> + unsigned int index;
>> +
>> + index = mon_event_config_index_get(mon_info->evtid);
>> + if (index == INVALID_CONFIG_INDEX)
>> + return;
>> +
>> + wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
>> +
>> + hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
>> +
>> + switch (mon_info->evtid) {
>> + case QOS_L3_OCCUP_EVENT_ID:
>> + break;
>> + case QOS_L3_MBM_TOTAL_EVENT_ID:
>> + hw_dom->mbm_total_cfg = mon_info->mon_config;
>> + break;
>> + case QOS_L3_MBM_LOCAL_EVENT_ID:
>> + hw_dom->mbm_local_cfg = mon_info->mon_config;
>> + break;
>> + }
>> +}
>> +
>> /**
>> * mon_event_config_index_get - get the hardware index for the
>> * configurable event
>> @@ -1631,33 +1678,11 @@ unsigned int mon_event_config_index_get(u32 evtid)
>> }
>> }
>> -static void mon_event_config_read(void *info)
>> -{
>> - struct mon_config_info *mon_info = info;
>> - unsigned int index;
>> - u64 msrval;
>> -
>> - index = mon_event_config_index_get(mon_info->evtid);
>> - if (index == INVALID_CONFIG_INDEX) {
>> - pr_warn_once("Invalid event id %d\n", mon_info->evtid);
>> - return;
>> - }
>> - rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
>> -
>> - /* Report only the valid event configuration bits */
>> - mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
>> -}
>> -
>> -static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
>> -{
>> - smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
>> -}
>> -
>> static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
>> {
>> - struct mon_config_info mon_info = {0};
>> struct rdt_mon_domain *dom;
>> bool sep = false;
>> + u32 val;
>> cpus_read_lock();
>> mutex_lock(&rdtgroup_mutex);
>> @@ -1666,11 +1691,11 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
>> if (sep)
>> seq_puts(s, ";");
>> - memset(&mon_info, 0, sizeof(struct mon_config_info));
>> - mon_info.evtid = evtid;
>> - mondata_config_read(dom, &mon_info);
>> + val = resctrl_arch_event_config_get(dom, evtid);
>> + if (val == INVALID_CONFIG_VALUE)
>
> Can this check and the "break" that follows be dropped? val being
> INVALID_CONFIG_VALUE would be a kernel bug and
> resctrl_arch_event_config_get() would already have printed the WARN.
> In this unlikely scenario I find it unexpected that mbm_config_show()
> will return success in this case and the below seq_printf() would
> handle the printing of INVALID_CONFIG_VALUE without issue anyway.
Sure. I will drop the check and break.
>
>> + break;
>> - seq_printf(s, "%d=0x%02x", dom->hdr.id, mon_info.mon_config);
>> + seq_printf(s, "%d=0x%02x", dom->hdr.id, val);
>> sep = true;
>> }
>> seq_puts(s, "\n");
>
> Reinette
>
>
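With the check dropped, the loop simply prints one "<id>=0x<cfg>" entry per domain, separated by ';'. A minimal user-space sketch of that formatting, assuming a hypothetical struct dom_cfg array in place of the domain list and cached values that resctrl_arch_event_config_get() would return:

```c
#include <stdint.h>
#include <stdio.h>

struct dom_cfg {
	int id;		/* domain id, as dom->hdr.id */
	uint32_t cfg;	/* cached value, as from resctrl_arch_event_config_get() */
};

/*
 * Format "<id>=0x<cfg>" entries separated by ';', as the mbm_config_show()
 * loop does (it then appends "\n"). Returns the length, or -1 if @buf is
 * too small.
 */
static int format_mbm_config(const struct dom_cfg *doms, int n,
			     char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (int i = 0; i < n; i++) {
		int w = snprintf(buf + used, len - used, "%s%d=0x%02x",
				 i ? ";" : "", doms[i].id,
				 (unsigned int)doms[i].cfg);

		if (w < 0 || (size_t)w >= len - used)
			return -1;
		used += (size_t)w;
	}
	return (int)used;
}
```

For two domains with values 0x7f and 0x15 this yields "0=0x7f;1=0x15".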
--
Thanks
Babu Moger
* Re: [PATCH v6 08/22] x86/resctrl: Introduce interface to display number of monitoring counters
2024-08-20 15:56 ` Moger, Babu
@ 2024-08-20 18:08 ` Reinette Chatre
0 siblings, 0 replies; 96+ messages in thread
From: Reinette Chatre @ 2024-08-20 18:08 UTC (permalink / raw)
To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/20/24 8:56 AM, Moger, Babu wrote:
> On 8/16/24 16:34, Reinette Chatre wrote:
>> On 8/6/24 3:00 PM, Babu Moger wrote:
>>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>>> index d4ec605b200a..fe9f10766c4f 100644
>>> --- a/Documentation/arch/x86/resctrl.rst
>>> +++ b/Documentation/arch/x86/resctrl.rst
>>> @@ -291,6 +291,9 @@ with the following files:
>>> as long as there are enough RMID counters available to support number
>>> of monitoring groups.
>>> +"num_mbm_cntrs":
>>> + The number of monitoring counters available for assignment.
>>> +
>>> "max_threshold_occupancy":
>>> Read/write file provides the largest value (in
>>> bytes) at which a previously used LLC_occupancy
>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> index 5e8706ab6361..83329cefebf7 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> @@ -1242,6 +1242,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>> r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1;
>>> if (WARN_ON(r->mon.num_mbm_cntrs > 64))
>>> r->mon.num_mbm_cntrs = 64;
>>> +
>>> + resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
>>
>> The arch code should not access the resctrl file flags. This should be
>> moved to make the MPAM support easier. With the arch code setting
>> r->mon.mbm_cntr_assignable the fs code can use that to set the flags.
>> Something similar to below patch is needed:
>> https://lore.kernel.org/lkml/20240802172853.22529-27-james.morse@arm.com/
>
> It is just a matter of moving the resctrl_file_fflags_init() calls to
> resctrl_init(). The rdt_resource fields are already set up here. Something like
> https://lore.kernel.org/lkml/20240802172853.22529-20-james.morse@arm.com/
>
> I feel it is better done as part of the MPAM fs/arch separation.
Indeed, it belongs with the rest of the mon state setup that is organized
as part of the MPAM work.
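Part of that mon state is num_mbm_cntrs itself, which the v6 hunk above derives from CPUID EBX in rdt_get_mon_l3_config(): the low 16 bits plus one, clamped to 64. A minimal sketch of just that arithmetic, assuming @ebx has already been read from the relevant CPUID leaf (not shown); MAX_MBM_CNTRS is a name introduced here:

```c
#include <stdint.h>

/* Upper bound the v6 patch enforces (with a WARN_ON) on the counter count. */
#define MAX_MBM_CNTRS 64

/*
 * Mirror of "r->mon.num_mbm_cntrs = (ebx & 0xFFFF) + 1" followed by the
 * clamp: EBX[15:0] is treated as "number of counters minus one", as the
 * "+ 1" in the patch implies.
 */
static unsigned int num_mbm_cntrs_from_ebx(uint32_t ebx)
{
	unsigned int n = (ebx & 0xFFFF) + 1;

	return n > MAX_MBM_CNTRS ? MAX_MBM_CNTRS : n;
}
```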
Reinette
* Re: [PATCH v6 09/22] x86/resctrl: Introduce MBM counters bitmap
2024-08-19 15:49 ` Moger, Babu
@ 2024-08-20 18:08 ` Reinette Chatre
0 siblings, 0 replies; 96+ messages in thread
From: Reinette Chatre @ 2024-08-20 18:08 UTC (permalink / raw)
To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/19/24 8:49 AM, Moger, Babu wrote:
> On 8/16/24 16:35, Reinette Chatre wrote:
>> On 8/6/24 3:00 PM, Babu Moger wrote:
>> resource as parameter but freeing a counter does not. James already proposed
>> different treatment of the bitmap and L3 resource parameters, I expect with
>> such guidance the interfaces will become more intuitive.
>>
>
> How about making "mbm_cntrs_free_map" as part of struct resctrl_mon?
> It will be a pointer, allocated dynamically based on the number of counters.
> All the related information (num_mbm_cntrs and mbm_cntr_assignable) is
> already part of this data structure.
That sounds good.
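The agreed layout puts the free map next to num_mbm_cntrs and mbm_cntr_assignable and sizes it at runtime. A minimal user-space sketch of that shape; struct mon_model and the helper names are hypothetical, and in this sketch a set bit means "counter is free". Kernel code would naturally use the bitmap helpers (bitmap_zalloc(), find_first_bit() and friends) instead of the open-coded loops:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model of the proposed struct resctrl_mon fields. */
struct mon_model {
	unsigned int num_mbm_cntrs;
	bool mbm_cntr_assignable;
	unsigned long *mbm_cntrs_free_map;	/* set bit: counter is free */
};

static int mon_model_init(struct mon_model *m, unsigned int num_cntrs)
{
	size_t words = (num_cntrs + BITS_PER_WORD - 1) / BITS_PER_WORD;

	m->num_mbm_cntrs = num_cntrs;
	m->mbm_cntr_assignable = num_cntrs > 0;
	m->mbm_cntrs_free_map = calloc(words ? words : 1, sizeof(unsigned long));
	if (!m->mbm_cntrs_free_map)
		return -1;
	for (unsigned int i = 0; i < num_cntrs; i++)	/* all counters start free */
		m->mbm_cntrs_free_map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
	return 0;
}

/* Take the lowest-numbered free counter; returns its index or -1. */
static int mbm_cntr_alloc(struct mon_model *m)
{
	for (unsigned int i = 0; i < m->num_mbm_cntrs; i++) {
		unsigned long *w = &m->mbm_cntrs_free_map[i / BITS_PER_WORD];
		unsigned long bit = 1UL << (i % BITS_PER_WORD);

		if (*w & bit) {
			*w &= ~bit;
			return (int)i;
		}
	}
	return -1;
}

/* Return a counter to the free map. */
static void mbm_cntr_free(struct mon_model *m, unsigned int cntr)
{
	if (cntr < m->num_mbm_cntrs)
		m->mbm_cntrs_free_map[cntr / BITS_PER_WORD] |=
			1UL << (cntr % BITS_PER_WORD);
}

static void mon_model_exit(struct mon_model *m)
{
	free(m->mbm_cntrs_free_map);
	m->mbm_cntrs_free_map = NULL;
}
```

Each counter handed out this way is then assigned to one RMID,event pair.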
Reinette
* Re: [PATCH v6 11/22] x86/resctrl: Remove MSR reading of event configuration value
2024-08-20 16:19 ` Moger, Babu
@ 2024-08-20 18:09 ` Reinette Chatre
0 siblings, 0 replies; 96+ messages in thread
From: Reinette Chatre @ 2024-08-20 18:09 UTC (permalink / raw)
To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/20/24 9:19 AM, Moger, Babu wrote:
> On 8/16/24 16:36, Reinette Chatre wrote:
>> On 8/6/24 3:00 PM, Babu Moger wrote:
>>> Introduce resctrl_arch_event_config_get() and
>>> resctrl_arch_event_config_set() to get/set architecture domain specific
>>> mbm_total_cfg/mbm_local_cfg values. Also, remove unused config value
>>> definitions.
>>
>> hmmm ... while the config values are not used they are now established
>> ABI and any other architecture that wants to support configurable events
>> will need to follow these definitions. It is thus required to keep them
>> documented in the kernel in support of future changes. I
>> understand that they are documented in user docs, but could we keep them
>> in the kernel code also? Since they are unused they could perhaps be moved
>> to comments as a compromise?
>
> How about just keeping them as is? I will simply not remove them.
>
I am not aware of any policy here. I'm ok with keeping them as is. I do not
know if there are any static checkers that may complain, if there are then
the defines can be moved to comments.
Reinette
* Re: [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes
2024-08-19 18:27 ` Peter Newman
@ 2024-08-20 18:11 ` Reinette Chatre
0 siblings, 0 replies; 96+ messages in thread
From: Reinette Chatre @ 2024-08-20 18:11 UTC (permalink / raw)
To: Peter Newman
Cc: James Morse, Babu Moger, x86, hpa, paulmck, rdunlap, tj, peterz,
yanjiewtw, kim.phillips, lukas.bulwahn, seanjc, jmattson, leitao,
jpoimboe, rick.p.edgecombe, kirill.shutemov, jithu.joseph,
kai.huang, kan.liang, daniel.sneddon, pbonzini, sandipan.das,
ilpo.jarvinen, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, mingo, bp, corbet, dave.hansen, fenghua.yu, tglx
Hi Peter,
On 8/19/24 11:27 AM, Peter Newman wrote:
> On Mon, Aug 19, 2024 at 7:53 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 8/16/24 11:09 AM, Reinette Chatre wrote:
>>> On 8/16/24 10:16 AM, Peter Newman wrote:
>>>> On Fri, Aug 16, 2024 at 10:01 AM Reinette Chatre
>>>> <reinette.chatre@intel.com> wrote:
>>>>> On 8/16/24 9:31 AM, James Morse wrote:
>>>>>> On 06/08/2024 23:00, Babu Moger wrote:
>>>>>>> Introduce interface to switch between ABMC and legacy modes.
>>>>>>>
>>>>>>> By default ABMC is enabled on boot if the feature is available.
>>>>>>> Provide the interface to go back to legacy mode if required.
>>>>>>
>>>>>> I may have missed it on an earlier version ... why would anyone want the non-ABMC
>>>>>> behaviour on hardware that requires it: counters randomly reset and randomly return
>>>>>> 'Unavailable'... is that actually useful?
>>>>>>
>>>>>> You default this to on, so there isn't a backward compatibility argument here.
>>>>>>
>>>>>> It seems like being able to disable this is a source of complexity - is it needed?
>>>>>
>>>>> The ability to go back to legacy was added while looking ahead to support the next
>>>>> "assignable counter" feature that is software based ("soft-RMID" .. "soft-ABMC"?).
>>>>>
>>>>> This series adds support for ABMC on recent AMD hardware to address the issue described
>>>>> in cover letter. This issue also exists on earlier AMD hardware that does not have the ABMC
>>>>> feature and Peter is working on a software solution to address the issue on non-ABMC hardware.
>>>>> This software solution is expected to have the same interface as the hardware solution but
>>>>> earlier discussions revealed that it may introduce extra latency that users may only want to
>>>>> accept during periods of active monitoring. Thus the option to disable the counter assignment
>>>>> mode.
>>>>
>>>> Sorry again for the soft-RMID/soft-ABMC confusion[1], it was soft-RMID
>>>> that impacted context switch latency. Soft-ABMC does not require any
>>>> additional work at context switch.
>>>
>>> No problem. I did read [1] but I do not think I've seen soft-ABMC yet so
>>> my understanding of what it does is vague.
>>>
>>>> The only disadvantage to soft-ABMC I can think of is that it also
>>>> limits reading llc_occupancy event counts to "assigned" groups,
>>>> whereas without it, llc_occupancy works reliably on all RMIDs on AMD
>>>> hardware.
>>>
>>> hmmm ... keeping original llc_occupancy behavior does seem useful enough
>>> as motivation to keep the "legacy"/"default" mbm_assign_mode? It does sound
>>> to me as though soft-ABMC may not be as accurate when it comes to llc_occupancy.
>>> As I understand the hardware may tag entries in cache with RMID and that has a longer
>>> lifetime than the tasks that allocated that data into the cache. If soft-ABMC
>>> permanently associates an RMID with a local and total counter pair but that
>>> RMID is dynamically assigned to resctrl groups then a group may not always
>>> get the same RMID ... and thus its llc_occupancy data would be a combination of
>>> its cache allocations and all the cache allocations of resource groups that had
>>> that RMID before it. This may need significantly enhanced "limbo" handling?
>>
>
> For the use case of soft-ABMC that I'm aware of, it would be better to
> disable llc_occupancy events and accept it as a limitation as we're
> not using this feature. I don't want to slow down the rate at which
> MBM counters could be reassigned. Over the course of a multiple-second
> bandwidth measurement window on a bandwidth-saturated host, a previous
> group's initial cache occupancy isn't significant enough to justify a
> limbo period, especially when padded out to 1 second.
This sounds fair. It also sounds like a motivation for user space to
be able to enable/disable soft-ABMC to be able to disable/enable
llc_occupancy.
>
> I would feel differently if my users were more interested in
> llc_occupancy counts and it was possible for the LLC to immediately
> notify when the occupancy threshold for any of a set of groups has
> been crossed.
>
>> To expand on this we may have to rework the interface if the counters can be
>> assigned to events other than MBM.
>>
>> James: could you please elaborate how you plan to use this feature and if this
>> interface works for the planned usage?
>>
>> Peter: considering the previous example [1] where soft-ABMC was using the "mbm_control"
>> interface I do not think it is ideal to only use the "t" and "l" flags while
>> llc_occupancy is also enabled/disabled via this interface. We should consider
>> (a) renaming the control file to indicate larger scope than MBM, (b) adding flags
>> for llc_occupancy. What do you think? I believe this is in line with stated goal
>> from [1]: "I believe mbm_control should always accurately reflect which events
>> are being counted."
>
> I should have said, "I believe mbm_control should always accurately
> reflect which _MBM_ events are being counted."
>
> In general, MBM requires maintaining cumulative, running counts, while
> llc_occupancy is only a snapshot of cache usage. This is why MBM
> results in contended resources (counters) which must be managed by the
> user. In the MPAM implementations I've seen so far, a small number
> (relative to the number of monitoring groups supported) of occupancy
> monitors is sufficient for a large number of groups, because it only
> limits the number of monitoring groups' occupancy counts which can be
> read in parallel and can be adequately managed within the MPAM driver
> without user interaction.
Thank you very much for keeping an eye on the MPAM requirements.
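The distinction Peter draws is that an occupancy read is used as-is, while an MBM value is only meaningful as the difference between two reads of a running, fixed-width counter that may wrap. A minimal sketch of that difference, using the same shift trick as the existing monitor code's overflow handling; mbm_delta() is a name introduced here and the result is in hardware counting units, before any scaling to bytes:

```c
#include <stdint.h>

/*
 * Units counted between two reads of a @width-bit cumulative counter
 * (1 <= width <= 64), tolerating at most one wraparound between reads.
 * Shifting both values to the top of a u64 lets ordinary unsigned
 * subtraction do the modular arithmetic.
 */
static uint64_t mbm_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift = 64 - width;

	return ((cur << shift) - (prev << shift)) >> shift;
}
```

This is why losing a counter mid-window hurts MBM: the previous read no longer pairs with the next one.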
>
> Because of this, broadening the scope of mbm_control to include
> occupancy would only serve to remind the user whether occupancy is
> supported, but would provide no new information beyond what's already
> provided by mon_features.
Thank you.
Reinette
* Re: [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported
2024-08-19 18:18 ` Moger, Babu
@ 2024-08-20 18:12 ` Reinette Chatre
2024-08-20 20:04 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-20 18:12 UTC (permalink / raw)
To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/19/24 11:18 AM, Moger, Babu wrote:
> On 8/16/24 17:33, Reinette Chatre wrote:
>> On 8/6/24 3:00 PM, Babu Moger wrote:
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 66febff2a3d3..d15fd1bde5f4 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -2756,6 +2756,23 @@ void resctrl_arch_mbm_cntr_assign_disable(void)
>>> }
>>> }
>>> +void resctrl_arch_mbm_cntr_assign_configure(void)
>>> +{
>>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>> + bool enable = true;
>>> +
>>> + mutex_lock(&rdtgroup_mutex);
>>> +
>>> + if (r->mon.mbm_cntr_assignable) {
>>> + if (!hw_res->mbm_cntr_assign_enabled)
>>> + hw_res->mbm_cntr_assign_enabled = true;
>>> + resctrl_abmc_set_one_amd(&enable);
>>
>> Earlier changelogs mentioned that counters are reset when ABMC is enabled.
>> How does that behave here when one CPU comes online? Consider the scenario
>> where a system is booted without all CPUs online. ABMC is initially enabled
>> on all online CPUs with this flow ... user space could start using resctrl
>> fs and create monitor groups that start accumulating architectural state.
>> If the remaining CPUs come online at this point and this snippet enables
>> ABMC, would it reset all counters? Should the architectural state be cleared?
>
> When a new CPU comes online, it should inherit the ABMC state that is
> already set. It should not force it either way. In that case, it is not
> required to reset the architectural state.
>
> Responded to your earlier comment.
> https://lore.kernel.org/lkml/0256b457-175d-4923-aa49-00e8e52b865b@amd.com/
>
>
>>
>> Also, it still does not look right that the architecture decides the policy.
>> Could this enabling be moved to resctrl_online_cpu() for resctrl fs to
>> request architecture to enable assignable counters if it is supported?
>
> Sure. Will move the resctrl_arch_mbm_cntr_assign_configure() here with
> changes to only update the ABMC state that is set during init.
>
I do not think we are seeing it the same way. In your earlier comment you mention:
> We need to set abmc state to "enabled" during the init when abmc is
> detected. resctrl_late_init -> .. -> rdt_get_mon_l3_config
>
> This only happens once during the init.
I do not think that the ABMC state can be set during init since that runs
before the fs code and thus the arch code cannot be aware of the fs policy
that "mbm_assign_mode" is the default. This may become clear when you move
resctrl_arch_mbm_cntr_assign_configure() to resctrl_online_cpu() though
since I expect that the r->mon.mbm_cntr_assignable check will move
into the fs resctrl_online_cpu() that will call the arch helper to
set the state to enabled.
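The flow proposed here keeps the policy in fs code: the fs online hook checks the capability and asks the arch helper to apply the current mode, so a CPU onlined after a user mode switch inherits that choice. A minimal user-space sketch with hypothetical names (struct fs_model, struct cpu_model, fs_online_cpu(), arch_cntr_assign_set_one()); whether programming a newly onlined CPU resets any hardware counters is the open question above and is not modelled:

```c
#include <stdbool.h>

/* Hypothetical per-CPU view of the counter-assignment enable state. */
struct cpu_model {
	bool online;
	bool cntr_assign_enabled;
};

/* Hypothetical fs-side state. */
struct fs_model {
	bool mbm_cntr_assignable;	/* capability reported by the arch code */
	bool mbm_cntr_assign_on;	/* current fs mode; default on if assignable */
};

/* Hypothetical arch helper: program one CPU, no policy decisions. */
static void arch_cntr_assign_set_one(struct cpu_model *cpu, bool enable)
{
	cpu->cntr_assign_enabled = enable;
}

/* fs-side CPU online hook: apply whatever mode is current. */
static void fs_online_cpu(const struct fs_model *fs, struct cpu_model *cpu)
{
	cpu->online = true;
	if (fs->mbm_cntr_assignable)
		arch_cntr_assign_set_one(cpu, fs->mbm_cntr_assign_on);
}
```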
Reinette
* Re: [PATCH v6 06/22] x86/resctrl: Add support to enable/disable AMD ABMC feature
2024-08-19 18:07 ` Moger, Babu
@ 2024-08-20 18:17 ` Reinette Chatre
0 siblings, 0 replies; 96+ messages in thread
From: Reinette Chatre @ 2024-08-20 18:17 UTC (permalink / raw)
To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/19/24 11:07 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 8/16/24 16:31, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 8/6/24 3:00 PM, Babu Moger wrote:
>>> Add the functionality to enable/disable AMD ABMC feature.
>>>
>>> AMD ABMC feature is enabled by setting enabled bit(0) in MSR
>>> L3_QOS_EXT_CFG. When the state of ABMC is changed, the MSR needs
>>> to be updated on all the logical processors in the QOS Domain.
>>>
>>> Hardware counters will reset when ABMC state is changed. Reset the
>>
>> Could you please clarify how this works when ABMC state is changed on
>> one CPU in a domain vs all (as done in this patch) CPUs of a domain? In this
>> patch it is clear that all hardware counters are reset and consequently
>> the architectural state maintained by resctrl is reset also. Later, when
>> the code is added to handle CPU online I see that ABMC state is changed
>> on a new online CPU but I do not see matching reset of architectural state.
>> (more in that patch later)
>
> Yes. I missed testing this scenario.
> When a new CPU comes online, it should inherit the ABMC state that is
> already set. It should not force it either way. In that case, it is not
> required to reset the architectural state.
>
> I need to make few changes to make it work properly.
>
> We need to set abmc state to "enabled" during the init when abmc is
> detected. resctrl_late_init -> .. -> rdt_get_mon_l3_config
>
> This only happens once during the init.
>
> Then during the hotplug, just update the ABMC state which is already set
> during the init. This should work fine.
>
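Per the quoted commit message, the enable control is bit 0 of L3_QOS_EXT_CFG, and changing the state resets the hardware counters. "Update" on hotplug therefore amounts to a read-modify-write that preserves the other bits and can skip the write when the bit already matches. A minimal sketch of only the bit arithmetic; ABMC_ENABLE_BIT and both helpers are names introduced here, and the per-CPU MSR accesses (for example rdmsrl()/wrmsrl()) are not shown:

```c
#include <stdbool.h>
#include <stdint.h>

/* Enable bit of L3_QOS_EXT_CFG, per the commit message. */
#define ABMC_ENABLE_BIT (1ULL << 0)

/* New register value with the enable bit set or cleared, other bits kept. */
static uint64_t abmc_cfg_update(uint64_t cur, bool enable)
{
	return enable ? (cur | ABMC_ENABLE_BIT) : (cur & ~ABMC_ENABLE_BIT);
}

/* True if writing @enable would actually change the state. */
static bool abmc_state_changes(uint64_t cur, bool enable)
{
	return ((cur & ABMC_ENABLE_BIT) != 0) != enable;
}
```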
The discussion about this flow is now split between this thread and the
discussion of patch #20. I tried to merge the discussions in my response [1]
to patch #20.
Reinette
[1] https://lore.kernel.org/lkml/6b1ad4b2-d99f-44fb-afcc-b9f48e51df6e@intel.com/
* Re: [PATCH v6 12/22] x86/resctrl: Introduce mbm_cntr_map to track counters at domain
2024-08-16 21:37 ` Reinette Chatre
@ 2024-08-20 18:24 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-20 18:24 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:37, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> The MBM counters are allocated at resctrl group level. It is tracked by
>
> Are they not allocated globally? (but maybe that is about to change?)
No. It is not changing. It is allocated globally and assigned to an
RMID,event pair in a resctrl group.
>
>> mbm_cntrs_free_map. Then it is assigned to the domain based on the user
>> input. It needs to be tracked at domain level also.
>
> Please elaborate why it needs to be tracked at domain level.
The user can apply the counter assignment either to a specific domain
within a group or to all domains in the group. The mbm_cntr_map will be
used to track the domain-specific assignments.
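Put together with the commit message, the bookkeeping is: a globally allocated counter is marked in the mbm_cntr_map of each domain where it is assigned, and it can go back to mbm_cntrs_free_map only when no domain still has it set. A minimal user-space sketch; struct dom_assign, MAX_DOMS and the helpers are hypothetical, and a u64 per domain suffices because the patch caps num_mbm_cntrs at 64:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_DOMS 8	/* hypothetical bound for this sketch */

/* One assignment bitmap per domain, modelling d->mbm_cntr_map. */
struct dom_assign {
	uint64_t mbm_cntr_map[MAX_DOMS];
	int ndoms;
};

/* Assign @cntr in a single domain. */
static void dom_assign_one(struct dom_assign *a, int dom, unsigned int cntr)
{
	a->mbm_cntr_map[dom] |= 1ULL << cntr;
}

/* Assign @cntr in every domain of the group. */
static void dom_assign_all(struct dom_assign *a, unsigned int cntr)
{
	for (int d = 0; d < a->ndoms; d++)
		dom_assign_one(a, d, cntr);
}

/*
 * Clear @cntr in @dom. Returns true once no domain uses it any more,
 * i.e. when the global counter can be released.
 */
static bool dom_unassign(struct dom_assign *a, int dom, unsigned int cntr)
{
	a->mbm_cntr_map[dom] &= ~(1ULL << cntr);
	for (int d = 0; d < a->ndoms; d++)
		if (a->mbm_cntr_map[d] & (1ULL << cntr))
			return false;
	return true;
}
```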
>
>>
>> Add the mbm_cntr_map bitmap in rdt_mon_domain structure to keep track of
>
> "rdt_mon_domain structure" -> "struct rdt_mon_domain"
Sure.
>
>> assignment at domain level. The global counter at mbm_cntrs_free_map can
>> be released when assignment at all the domain are cleared.
>
> "all the domain" -> "all the domains"?
>
Sure.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: New patch to add domain level assignment.
>> ---
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 ++++++++++
>> include/linux/resctrl.h | 2 ++
>> 2 files changed, 12 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 0047b4eb0ff5..1a90c671a027 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -4127,6 +4127,7 @@ static void __init rdtgroup_setup_default(void)
>> static void domain_destroy_mon_state(struct rdt_mon_domain *d)
>> {
>> + bitmap_free(d->mbm_cntr_map);
>> bitmap_free(d->rmid_busy_llc);
>> kfree(d->mbm_total);
>> kfree(d->mbm_local);
>> @@ -4200,6 +4201,15 @@ static int domain_setup_mon_state(struct
>> rdt_resource *r, struct rdt_mon_domain
>> return -ENOMEM;
>> }
>> }
>> + if (is_mbm_enabled()) {
>
> This should also depend on whether the resource supports counter
> assignment, and that it is enabled to ensure that r->mon.num_mbm_cntrs
> is valid.
I can add the check if (r->mon.mbm_cntr_assignable).
>
>> + d->mbm_cntr_map = bitmap_zalloc(r->mon.num_mbm_cntrs, GFP_KERNEL);
>> + if (!d->mbm_cntr_map) {
>> + bitmap_free(d->rmid_busy_llc);
>> + kfree(d->mbm_total);
>> + kfree(d->mbm_local);
>> + return -ENOMEM;
>> + }
>> + }
>> return 0;
>> }
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index ef08f75191f2..034fa994e84f 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -105,6 +105,7 @@ struct rdt_ctrl_domain {
>> * @cqm_limbo: worker to periodically read CQM h/w counters
>> * @mbm_work_cpu: worker CPU for MBM h/w counters
>> * @cqm_work_cpu: worker CPU for CQM h/w counters
>> + * @mbm_cntr_map: bitmap to track domain counter assignment
>> */
>> struct rdt_mon_domain {
>> struct rdt_domain_hdr hdr;
>> @@ -116,6 +117,7 @@ struct rdt_mon_domain {
>> struct delayed_work cqm_limbo;
>> int mbm_work_cpu;
>> int cqm_work_cpu;
>> + unsigned long *mbm_cntr_map;
>> };
>> /**
>
> Reinette
>
--
Thanks
Babu Moger
* Re: [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported
2024-08-20 18:12 ` Reinette Chatre
@ 2024-08-20 20:04 ` Moger, Babu
2024-08-20 20:18 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-08-20 20:04 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/20/24 13:12, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/19/24 11:18 AM, Moger, Babu wrote:
>> On 8/16/24 17:33, Reinette Chatre wrote:
>>> On 8/6/24 3:00 PM, Babu Moger wrote:
>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> index 66febff2a3d3..d15fd1bde5f4 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> @@ -2756,6 +2756,23 @@ void resctrl_arch_mbm_cntr_assign_disable(void)
>>>> }
>>>> }
>>>> +void resctrl_arch_mbm_cntr_assign_configure(void)
>>>> +{
>>>> + struct rdt_resource *r =
>>>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>>> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>>> + bool enable = true;
>>>> +
>>>> + mutex_lock(&rdtgroup_mutex);
>>>> +
>>>> + if (r->mon.mbm_cntr_assignable) {
>>>> + if (!hw_res->mbm_cntr_assign_enabled)
>>>> + hw_res->mbm_cntr_assign_enabled = true;
>>>> + resctrl_abmc_set_one_amd(&enable);
>>>
>>> Earlier changelogs mentioned that counters are reset when ABMC is enabled.
>>> How does that behave here when one CPU comes online? Consider the scenario
>>> where a system is booted without all CPUs online. ABMC is initially enabled
>>> on all online CPUs with this flow ... user space could start using resctrl
>>> fs and create monitor groups that start accumulating architectural state.
>>> If the remaining CPUs come online at this point and this snippet enables
>>> ABMC, would it reset all counters? Should the architectural state be
>>> cleared?
>>
>> When a new CPU comes online, it should inherit the ABMC state that is
>> already set. It should not force it either way. In that case, it is not
>> required to reset the architectural state.
>>
>> Responded to your earlier comment.
>> https://lore.kernel.org/lkml/0256b457-175d-4923-aa49-00e8e52b865b@amd.com/
>>
>>
>>>
>>> Also, it still does not look right that the architecture decides the
>>> policy. Could this enabling be moved to resctrl_online_cpu() for resctrl
>>> fs to request architecture to enable assignable counters if it is
>>> supported?
>>
>> Sure. Will move the resctrl_arch_mbm_cntr_assign_configure() here with
>> changes just to update the abmc state which is set during the init.
>>
>
> I do not think we are seeing it the same way. In your earlier comment you
> mention:
>
>> We need to set abmc state to "enabled" during the init when abmc is
>> detected. resctrl_late_init -> .. -> rdt_get_mon_l3_config
>>
>> This only happens once during the init.
>
>
> I do not think that the ABMC state can be set during init since that runs
> before the fs code and thus the arch code cannot be aware of the fs policy
> that "mbm_assign_mode" is the default. This may become clear when you move
> resctrl_arch_mbm_cntr_assign_configure() to resctrl_online_cpu() though
> since I expect that the r->mon.mbm_cntr_assignable check will move
> into the fs resctrl_online_cpu() that will call the arch helper to
> set the state to enabled.
There are a couple of problems here.
1. Hotplug with ABMC enabled.
The system is running with ABMC enabled, and a new CPU comes online.
The function resctrl_arch_mbm_cntr_assign_configure() will set the MSR
MSR_IA32_L3_QOS_EXT_CFG to enable ABMC on the new CPU. This scenario works
fine.
2. Hotplug with ABMC disabled.
Current code will force the system to enable ABMC on the new CPU.
That is not correct.
We need to address both these cases.
I was thinking of splitting the functionality in
resctrl_arch_mbm_cntr_assign_configure() into two parts.
a. Only set mbm_cntr_assign_enabled to true during init.
if (r->mon.mbm_cntr_assignable)
hw_res->mbm_cntr_assign_enabled = true;
This is similar to rdtgroup_setup_default(), isn't it?
b. Change the functionality in resctrl_arch_mbm_cntr_assign_configure()
to update the MSR MSR_IA32_L3_QOS_EXT_CFG based on
hw_res->mbm_cntr_assign_enabled. Something like this.
void resctrl_arch_mbm_cntr_assign_configure(void)
{
	...
	if (r->mon.mbm_cntr_assignable && hw_res->mbm_cntr_assign_enabled)
		resctrl_abmc_set_one_amd(&enable);
	...
}
Yes, resctrl_arch_mbm_cntr_assign_configure() will then be called from
resctrl_online_cpu().
Does this make sense? Any other ideas?
--
Thanks
Babu Moger
* Re: [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported
2024-08-20 20:04 ` Moger, Babu
@ 2024-08-20 20:18 ` Moger, Babu
2024-08-20 20:37 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-08-20 20:18 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
On 8/20/24 15:04, Moger, Babu wrote:
> Hi Reinette,
>
> On 8/20/24 13:12, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 8/19/24 11:18 AM, Moger, Babu wrote:
>>> On 8/16/24 17:33, Reinette Chatre wrote:
>>>> On 8/6/24 3:00 PM, Babu Moger wrote:
>>
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> index 66febff2a3d3..d15fd1bde5f4 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> @@ -2756,6 +2756,23 @@ void resctrl_arch_mbm_cntr_assign_disable(void)
>>>>> }
>>>>> }
>>>>> +void resctrl_arch_mbm_cntr_assign_configure(void)
>>>>> +{
>>>>> + struct rdt_resource *r =
>>>>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>>>> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>>>> + bool enable = true;
>>>>> +
>>>>> + mutex_lock(&rdtgroup_mutex);
>>>>> +
>>>>> + if (r->mon.mbm_cntr_assignable) {
>>>>> + if (!hw_res->mbm_cntr_assign_enabled)
>>>>> + hw_res->mbm_cntr_assign_enabled = true;
>>>>> + resctrl_abmc_set_one_amd(&enable);
>>>>
>>>> Earlier changelogs mentioned that counters are reset when ABMC is enabled.
>>>> How does that behave here when one CPU comes online? Consider the scenario
>>>> where a system is booted without all CPUs online. ABMC is initially enabled
>>>> on all online CPUs with this flow ... user space could start using resctrl
>>>> fs and create monitor groups that start accumulating architectural state.
>>>> If the remaining CPUs come online at this point and this snippet enables
>>>> ABMC, would it reset all counters? Should the architectural state be
>>>> cleared?
>>>
>>> When a new CPU comes online, it should inherit the ABMC state that is
>>> already set. It should not force it either way. In that case, it is not
>>> required to reset the architectural state.
>>>
>>> Responded to your earlier comment.
>>> https://lore.kernel.org/lkml/0256b457-175d-4923-aa49-00e8e52b865b@amd.com/
>>>
>>>
>>>>
>>>> Also, it still does not look right that the architecture decides the
>>>> policy. Could this enabling be moved to resctrl_online_cpu() for resctrl
>>>> fs to request architecture to enable assignable counters if it is
>>>> supported?
>>>
>>> Sure. Will move the resctrl_arch_mbm_cntr_assign_configure() here with
>>> changes just to update the abmc state which is set during the init.
>>>
>>
>> I do not think we are seeing it the same way. In your earlier comment you
>> mention:
>>
>>> We need to set abmc state to "enabled" during the init when abmc is
>>> detected. resctrl_late_init -> .. -> rdt_get_mon_l3_config
>>>
>>> This only happens once during the init.
>>
>>
>> I do not think that the ABMC state can be set during init since that runs
>> before the fs code and thus the arch code cannot be aware of the fs policy
>> that "mbm_assign_mode" is the default. This may become clear when you move
>> resctrl_arch_mbm_cntr_assign_configure() to resctrl_online_cpu() though
>> since I expect that the r->mon.mbm_cntr_assignable check will move
>> into the fs resctrl_online_cpu() that will call the arch helper to
>> set the state to enabled.
>
> There are a couple of problems here.
>
> 1. Hotplug with ABMC enabled.
>
> The system is running with ABMC enabled, and a new CPU comes online.
> The function resctrl_arch_mbm_cntr_assign_configure() will set the MSR
> MSR_IA32_L3_QOS_EXT_CFG to enable ABMC on the new CPU. This scenario works
> fine.
>
>
> 2. Hotplug with ABMC disabled.
> Current code will force the system to enable ABMC on the new CPU.
> That is not correct.
>
>
> We need to address both these cases.
>
>
> I was thinking of separating the functionality in
> resctrl_arch_mbm_cntr_assign_configure() into two.
>
> a. Just set the mbm_cntr_assign_enabled to true during the init.
> if (r->mon.mbm_cntr_assignable)
> hw_res->mbm_cntr_assign_enabled = true;
>
> This is similar to rdtgroup_setup_default(). Isn't it?
I just noticed that I cannot access rdt_hw_resource here. I may have to
write another function resctrl_arch_mbm_cntr_assign_set_default() to do
this. What do you think?
>
>
> b. Change the functionality in resctrl_arch_mbm_cntr_assign_configure()
> to update the MSR MSR_IA32_L3_QOS_EXT_CFG based on
> hw_res->mbm_cntr_assign_enabled. Something like this.
>
>
> void resctrl_arch_mbm_cntr_assign_configure(void)
> {
> ---
> if (r->mon.mbm_cntr_assignable && hw_res->mbm_cntr_assign_enabled)
> abmc_set_one_amd(&enable);
> ---
> }
>
>
> Yes. The function resctrl_arch_mbm_cntr_assign_configure() will be called
> from resctrl_online_cpu().
>
> Does it make sense? Any other idea?
--
Thanks
Babu Moger
* Re: [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported
2024-08-20 20:18 ` Moger, Babu
@ 2024-08-20 20:37 ` Reinette Chatre
0 siblings, 0 replies; 96+ messages in thread
From: Reinette Chatre @ 2024-08-20 20:37 UTC (permalink / raw)
To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/20/24 1:18 PM, Moger, Babu wrote:
> On 8/20/24 15:04, Moger, Babu wrote:
>> On 8/20/24 13:12, Reinette Chatre wrote:
>>> On 8/19/24 11:18 AM, Moger, Babu wrote:
>>>> On 8/16/24 17:33, Reinette Chatre wrote:
>>>>> On 8/6/24 3:00 PM, Babu Moger wrote:
>>>
>>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>>> index 66febff2a3d3..d15fd1bde5f4 100644
>>>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>>> @@ -2756,6 +2756,23 @@ void resctrl_arch_mbm_cntr_assign_disable(void)
>>>>>> }
>>>>>> }
>>>>>> +void resctrl_arch_mbm_cntr_assign_configure(void)
>>>>>> +{
>>>>>> + struct rdt_resource *r =
>>>>>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>>>>> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>>>>> + bool enable = true;
>>>>>> +
>>>>>> + mutex_lock(&rdtgroup_mutex);
>>>>>> +
>>>>>> + if (r->mon.mbm_cntr_assignable) {
>>>>>> + if (!hw_res->mbm_cntr_assign_enabled)
>>>>>> + hw_res->mbm_cntr_assign_enabled = true;
>>>>>> + resctrl_abmc_set_one_amd(&enable);
>>>>>
>>>>> Earlier changelogs mentioned that counters are reset when ABMC is enabled.
>>>>> How does that behave here when one CPU comes online? Consider the scenario
>>>>> where a system is booted without all CPUs online. ABMC is initially enabled
>>>>> on all online CPUs with this flow ... user space could start using resctrl
>>>>> fs and create monitor groups that start accumulating architectural state.
>>>>> If the remaining CPUs come online at this point and this snippet enables
>>>>> ABMC, would it reset all counters? Should the architectural state be
>>>>> cleared?
>>>>
>>>> When a new CPU comes online, it should inherit the ABMC state that is
>>>> already set. It should not force it either way. In that case, it is not
>>>> required to reset the architectural state.
>>>>
>>>> Responded to your earlier comment.
>>>> https://lore.kernel.org/lkml/0256b457-175d-4923-aa49-00e8e52b865b@amd.com/
>>>>
>>>>
>>>>>
>>>>> Also, it still does not look right that the architecture decides the
>>>>> policy. Could this enabling be moved to resctrl_online_cpu() for resctrl
>>>>> fs to request architecture to enable assignable counters if it is
>>>>> supported?
>>>>
>>>> Sure. Will move the resctrl_arch_mbm_cntr_assign_configure() here with
>>>> changes just to update the abmc state which is set during the init.
>>>>
>>>
>>> I do not think we are seeing it the same way. In your earlier comment you
>>> mention:
>>>
>>>> We need to set abmc state to "enabled" during the init when abmc is
>>>> detected. resctrl_late_init -> .. -> rdt_get_mon_l3_config
>>>>
>>>> This only happens once during the init.
>>>
>>>
>>> I do not think that the ABMC state can be set during init since that runs
>>> before the fs code and thus the arch code cannot be aware of the fs policy
>>> that "mbm_assign_mode" is the default. This may become clear when you move
>>> resctrl_arch_mbm_cntr_assign_configure() to resctrl_online_cpu() though
>>> since I expect that the r->mon.mbm_cntr_assignable check will move
>>> into the fs resctrl_online_cpu() that will call the arch helper to
>>> set the state to enabled.
>>
>> There are a couple of problems here.
>>
>> 1. Hotplug with ABMC enabled.
>>
>> The system is running with ABMC enabled, and a new CPU comes online.
>> The function resctrl_arch_mbm_cntr_assign_configure() will set the MSR
>> MSR_IA32_L3_QOS_EXT_CFG to enable ABMC on the new CPU. This scenario works
>> fine.
>>
>>
>> 2. Hotplug with ABMC disabled.
>> Current code will force the system to enable ABMC on the new CPU.
>> That is not correct.
>>
>>
>> We need to address both these cases.
>>
>>
>> I was thinking of separating the functionality in
>> resctrl_arch_mbm_cntr_assign_configure() into two.
>>
>> a. Just set the mbm_cntr_assign_enabled to true during the init.
>> if (r->mon.mbm_cntr_assignable)
>> hw_res->mbm_cntr_assign_enabled = true;
>>
>> This is similar to rdtgroup_setup_default(). Isn't it?
Right. I (mis)understood from your earlier comment that this was planned to
be done from "resctrl_late_init -> .. -> rdt_get_mon_l3_config", but
handling it within rdtgroup_init() (after MPAM it will be resctrl_init())
seems appropriate to me since that is where the fs code can set its policy.
>
> I just noticed that I cannot access rdt_hw_resource here. I may have to
> write another function resctrl_arch_mbm_cntr_assign_set_default() to do
> this. What do you think?
Sounds good.
>
>
>>
>>
>> b. Change the functionality in resctrl_arch_mbm_cntr_assign_configure()
>> to update the MSR MSR_IA32_L3_QOS_EXT_CFG based on
>> hw_res->mbm_cntr_assign_enabled. Something like this.
>>
>>
>> void resctrl_arch_mbm_cntr_assign_configure(void)
>> {
>> ---
>> if (r->mon.mbm_cntr_assignable && hw_res->mbm_cntr_assign_enabled)
>> abmc_set_one_amd(&enable);
>> ---
>> }
>>
>>
>> Yes. The function resctrl_arch_mbm_cntr_assign_configure() will be called
>> from resctrl_online_cpu().
Checking r->mon.mbm_cntr_assignable may be redundant since I expect that
resctrl_online_cpu() will also do the check. It would do no harm, though.
>>
>> Does it make sense? Any other idea?
>
Yes, this makes sense to me. Thank you.
Reinette
* Re: [PATCH v6 13/22] x86/resctrl: Add data structures and definitions for ABMC assignment
2024-08-16 21:38 ` Reinette Chatre
@ 2024-08-20 20:56 ` Moger, Babu
2024-08-20 21:09 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-08-20 20:56 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:38, Reinette Chatre wrote:
> Hi Babu,
>
> This patch now only introduces one data structure so the subject could
> be made more specific.
How about:
x86/resctrl: Add data structures for L3_QOS_ABMC_CFG MSR
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID and monitor the bandwidth as long as the counter
>> is assigned. The bandwidth events will be tracked by the hardware until
>> the user changes the configuration. Each resctrl group can configure
>> a maximum of two counters, one for the total event and one for the local event.
>>
>>
>
> (extra empty line)
Sure.
>
>> The ABMC feature implements a pair of MSRs, L3_QOS_ABMC_CFG (C000_03FDh)
>> and L3_QOS_ABMC_DSC (C000_03FEh). The counters are configured by writing
>> to MSR L3_QOS_ABMC_CFG. Configuration is done by setting the counter id,
>> bandwidth source (RMID) and bandwidth configuration supported by BMEC
>> (Bandwidth Monitoring Event Configuration).
>>
>> L3_QOS_ABMC_DSC is a read-only MSR. Reading L3_QOS_ABMC_DSC returns the
>> configuration of the counter id specified in L3_QOS_ABMC_CFG.cntr_id
>> with rmid(bw_src) and event configuration(bw_type).
>>
>> Attempts to read or write these MSRs when ABMC is not enabled will result
>> in a #GP(0) exception.
>>
>> Introduce data structures and definitions for ABMC MSRs.
>>
>> MSR L3_QOS_ABMC_CFG (0xC000_03FDh) and L3_QOS_ABMC_DSC (0xC000_03FEh)
>> details.
>
> The changelog and patch introduce L3_QOS_ABMC_DSC but I cannot see that it is
> used in this series.
Yes. I was using it in v5 to read the configuration back. It is not
required anymore. I will remove it.
>
>> =========================================================================
>> Bits Mnemonic Description Access Reset
>> Type Value
>> =========================================================================
>> 63 CfgEn Configuration Enable R/W 0
>>
>> 62 CtrEn Enable/disable Tracking R/W 0
>>
>> 61:53 – Reserved MBZ 0
>>
>> 52:48 CtrID Counter Identifier R/W 0
>>
>> 47 IsCOS BwSrc field is a CLOSID R/W 0
>> (not an RMID)
>>
>> 46:44 – Reserved MBZ 0
>>
>> 43:32 BwSrc Bandwidth Source R/W 0
>> (RMID or CLOSID)
>>
>> 31:0 BwType Bandwidth configuration R/W 0
>> to track for this counter
>> ==========================================================================
>>
>> Configuration and tracking:
>> CfgEn=1,CtrEn=0 : Configure CtrID but do not track events yet.
>> CfgEn=1,CtrEn=1 : Configure CtrID and start tracking events.
>
> Could you please add the above snippet noting field combinations to the
> kernel-doc of the union?
Sure.
>
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: Removed all the fs related changes.
>> Added note on CfgEn,CtrEn.
>> Removed the definitions which are not used.
>> Removed cntr_id initialization.
>>
>> v5: Moved assignment flags here (patch 10/19 of v4).
>> Added MON_CNTR_UNSET definition to initialize cntr_id's.
>> More details in commit log.
>> Renamed few fields in l3_qos_abmc_cfg for readability.
>>
>> v4: Added more descriptions.
>> Changed the name abmc_ctr_id to ctr_id.
>> Added L3_QOS_ABMC_DSC. Used for reading the configuration.
>>
>> v3: No changes.
>>
>> v2: No changes.
>> ---
>> arch/x86/include/asm/msr-index.h | 2 ++
>> arch/x86/kernel/cpu/resctrl/internal.h | 26 ++++++++++++++++++++++++++
>> 2 files changed, 28 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/msr-index.h
>> b/arch/x86/include/asm/msr-index.h
>> index d86469bf5d41..5b3931a59d5a 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1183,6 +1183,8 @@
>> #define MSR_IA32_SMBA_BW_BASE 0xc0000280
>> #define MSR_IA32_EVT_CFG_BASE 0xc0000400
>> #define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
>> +#define MSR_IA32_L3_QOS_ABMC_CFG 0xc00003fd
>> +#define MSR_IA32_L3_QOS_ABMC_DSC 0xc00003fe
>> /* MSR_IA32_VMX_MISC bits */
>> #define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 1021227d8c7e..af3efa35a62e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -589,6 +589,32 @@ union cpuid_0x10_x_edx {
>> unsigned int full;
>> };
>> +/*
>> + * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
>> + * @bw_type : Bandwidth configuration(supported by BMEC)
>> + * tracked by the @cntr_id.
>> + * @bw_src : Bandwidth source (RMID or CLOSID).
>> + * @reserved1 : Reserved.
>> + * @is_clos : @bw_src field is a CLOSID (not an RMID).
>> + * @cntr_id : Counter identifier.
>> + * @reserved : Reserved.
>> + * @cntr_en : Tracking enable bit.
>> + * @cfg_en : Configuration enable bit.
>> + */
>> +union l3_qos_abmc_cfg {
>> + struct {
>> + unsigned long bw_type :32,
>> + bw_src :12,
>> + reserved1: 3,
>> + is_clos : 1,
>> + cntr_id : 5,
>> + reserved : 9,
>> + cntr_en : 1,
>> + cfg_en : 1;
>> + } split;
>> + unsigned long full;
>> +};
>> +
>
> This data structure still uses tabs that seem to have the goal of aligning
> members, but the tabs are used inconsistently and members are not lining
> up either.
Sorry. I always have issues with these tabs. Will address it in the next revision.
>
>> void rdt_last_cmd_clear(void);
>> void rdt_last_cmd_puts(const char *s);
>> __printf(1, 2)
>
> Reinette
>
--
Thanks
Babu Moger
* Re: [PATCH v6 13/22] x86/resctrl: Add data structures and definitions for ABMC assignment
2024-08-20 20:56 ` Moger, Babu
@ 2024-08-20 21:09 ` Reinette Chatre
0 siblings, 0 replies; 96+ messages in thread
From: Reinette Chatre @ 2024-08-20 21:09 UTC (permalink / raw)
To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/20/24 1:56 PM, Moger, Babu wrote:
> On 8/16/24 16:38, Reinette Chatre wrote:
>> This patch now only introduces one data structure so the subject could
>> be made more specific.
>
> How about?
>
> x86/resctrl: Add data structures for L3_QOS_ABMC_CFG MSR
>
Looks good to me, thank you.
Reinette
* Re: [PATCH v6 14/22] x86/resctrl: Introduce cntr_id in mongroup for assignments
2024-08-16 21:38 ` Reinette Chatre
@ 2024-08-20 22:42 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-20 22:42 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/2024 4:38 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> mbm_cntr_assignable feature provides an option to the user to assign a
>> hardware counter to an RMID and monitor the bandwidth as long as the
>> counter is assigned. There can be two counters per monitor group, one
>> for total event and another for local event.
>>
>> Introduce cntr_id to manage the assignments.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: New patch.
>> Separated FS and arch bits.
>> ---
>> arch/x86/kernel/cpu/resctrl/internal.h | 7 +++++++
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++++++
>> 2 files changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index af3efa35a62e..d93082b65d69 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -41,6 +41,11 @@
>> /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
>> #define ABMC_ENABLE_BIT 0
>> +/* Maximum assignable counters per resctrl group */
>> +#define MAX_CNTRS 2
>> +
>> +#define MON_CNTR_UNSET U32_MAX
>> +
>> /**
>> * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring
>> those that
>> * aren't marked nohz_full
>> @@ -210,12 +215,14 @@ enum rdtgrp_mode {
>> * @parent: parent rdtgrp
>> * @crdtgrp_list: child rdtgroup node list
>> * @rmid: rmid for this rdtgroup
>> + * @cntr_id: Counter ids for assignment
>
> Could this be:
> "IDs of hardware counters assigned to monitor group"
>
Sure. Will do.
--
- Babu Moger
* Re: [PATCH v6 15/22] x86/resctrl: Add the interface to assign a hardware counter
2024-08-16 21:41 ` Reinette Chatre
@ 2024-08-21 15:04 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-21 15:04 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, fenghua.yu, tglx, mingo, bp,
dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/2024 4:41 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>
> This patch is a mix of resctrl fs and arch code, could each piece please
> be described clearly?
I will separate them. That is probably better.
>
>> counter to an RMID and monitor the bandwidth as long as it is assigned.
>> The assigned RMID will be tracked by the hardware until the user
>> unassigns it manually.
>>
>> Counters are configured by writing to L3_QOS_ABMC_CFG MSR and
>> specifying the counter id, bandwidth source, and bandwidth types.
>>
>> Provide the interface to assign the counter ids to RMID.
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: Removed mbm_cntr_alloc() from this patch to keep fs and arch code
>> separate.
>> Added code to update the counter assignment at domain level.
>>
>> v5: Few name changes to match cntr_id.
>> Changed the function names to
>> rdtgroup_assign_cntr
>> resctrl_arch_assign_cntr
>> More comments on commit log.
>> Added function summary.
>>
>> v4: Commit message update.
>> Use bitmap APIs where applicable.
>> Changed the interfaces considering MPAM(arm).
>> Added domain specific assignment.
>>
>> v3: Removed the static from the prototype of rdtgroup_assign_abmc.
>> The function is not called directly from user anymore. These
>> changes are related to global assignment interface.
>>
>> v2: Minor text changes in commit message.
>> ---
>> arch/x86/kernel/cpu/resctrl/internal.h | 4 ++
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 97 ++++++++++++++++++++++++++
>> 2 files changed, 101 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index d93082b65d69..4e8109dee174 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -685,6 +685,10 @@ int mbm_cntr_alloc(struct rdt_resource *r);
>> void mbm_cntr_free(u32 cntr_id);
>> void resctrl_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
>> unsigned int mon_event_config_index_get(u32 evtid);
>> +int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum
>> resctrl_event_id evtid,
>> + u32 rmid, u32 cntr_id, u32 closid, bool assign);
>> +int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum
>> resctrl_event_id evtid);
>> +int rdtgroup_alloc_cntr(struct rdtgroup *rdtgrp, int index);
>> void rdt_staged_configs_clear(void);
>> bool closid_allocated(unsigned int closid);
>> int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 60696b248b56..1ee91a7293a8 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1864,6 +1864,103 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>> return ret ?: nbytes;
>> }
>> +static void rdtgroup_abmc_cfg(void *info)
>
> This has nothing to do with a resctrl group (arch code has no insight
> into the groups anyway).
> Maybe an arch specific name like "resctrl_abmc_config_one_amd()" to match
> earlier "resctrl_abmc_set_one_amd()"?
>
Sure.
>
>> +{
>> + u64 *msrval = info;
>> +
>> + wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
>> +}
>> +
>> +/*
>> + * Send an IPI to the domain to assign the counter id to RMID.
>> + */
>> +int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evtid,
>> + u32 rmid, u32 cntr_id, u32 closid, bool assign)
>> +{
>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
>> + struct arch_mbm_state *arch_mbm;
>> +
>> + abmc_cfg.split.cfg_en = 1;
>> + abmc_cfg.split.cntr_en = assign ? 1 : 0;
>> + abmc_cfg.split.cntr_id = cntr_id;
>> + abmc_cfg.split.bw_src = rmid;
>> +
>> + /* Update the event configuration from the domain */
>> + if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
>> + abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
>> + arch_mbm = &hw_dom->arch_mbm_total[rmid];
>> + } else {
>> + abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
>> + arch_mbm = &hw_dom->arch_mbm_local[rmid];
>> + }
>> +
>> + smp_call_function_any(&d->hdr.cpu_mask, rdtgroup_abmc_cfg, &abmc_cfg, 1);
>> +
>> + /*
>> + * Reset the architectural state so that reading of hardware
>> + * counter is not considered as an overflow in next update.
>> + */
>> + if (arch_mbm)
>> + memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
>> +
>> + return 0;
>> +}
>> +
>> +/* Allocate a new counter id if the event is unassigned */
>> +int rdtgroup_alloc_cntr(struct rdtgroup *rdtgrp, int index)
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> + int cntr_id;
>> +
>> + /* Nothing to do if event has been assigned already */
>> + if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET) {
>> + rdt_last_cmd_puts("ABMC counter is assigned already\n");
>
> This is resctrl fs code. Please replace the arch specific messages
> ("ABMC") with resctrl fs terms.
Sure.
>
>> + return 0;
>> + }
>> +
>> + /*
>> + * Allocate a new counter id and update domains
>> + */
>> + cntr_id = mbm_cntr_alloc(r);
>> + if (cntr_id < 0) {
>> + rdt_last_cmd_puts("Out of ABMC counters\n");
>
> here also.
Sure.
>
>> + return -ENOSPC;
>> + }
>> +
>> + rdtgrp->mon.cntr_id[index] = cntr_id;
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Assign a hardware counter to the group and assign the counter to
>> + * all the domains in the group. It will try to allocate the MBM
>> + * counter if a counter is available.
>> + */
>> +int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> + struct rdt_mon_domain *d;
>> + int index;
>> +
>> + index = mon_event_config_index_get(evtid);
>
> After going through the MPAM series this no longer looks correct. As the name
> of this function implies this is an index unique to the monitor event
> configuration feature and as the MPAM series highlights, it is unique to the
> architecture, not something that is visible to resctrl fs. resctrl fs uses the
> event IDs and it is only when the fs makes a request to the architecture that
> this translation comes into play.
>
> With this change, what is the architecture specific "mon event config index"
> now becomes part of resctrl fs used for something totally different from mon
> event configuration.
>
> I think we should separate this to make sure we distinguish between an
> architectural translation and a resctrl fs translation, the array index is not
> the same as the architecture specific "mon event config index".
>
> How about we start with something simple that is defined by resctrl fs?
> for example:
> #define MBM_EVENT_ARRAY_INDEX(_event) (_event - 2)
Yes. Good point. We can do that.
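A minimal userspace sketch of the resctrl fs index suggested above. It assumes the event ID values of the kernel's `enum resctrl_event_id` (occupancy = 0x01, total = 0x02, local = 0x03), parenthesizes the macro argument so compound expressions expand correctly, and adds a checked wrapper, `mbm_event_array_index()`, which is hypothetical and not part of the series:

```c
#include <assert.h>
#include <errno.h>

/* Event IDs, assumed to match the kernel's enum resctrl_event_id. */
enum resctrl_event_id {
	QOS_L3_OCCUP_EVENT_ID     = 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
};

/* Number of MBM events tracked per group (total + local). */
#define MBM_NUM_EVENTS 2

/*
 * resctrl fs translation from event ID to per-group array index.
 * The argument is parenthesized so that e.g.
 * MBM_EVENT_ARRAY_INDEX(c ? a : b) subtracts from the whole expression.
 */
#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)

static_assert(MBM_EVENT_ARRAY_INDEX(QOS_L3_MBM_LOCAL_EVENT_ID) == MBM_NUM_EVENTS - 1,
	      "local event must map to the last array slot");

/* Hypothetical checked wrapper: reject events that have no MBM slot. */
static int mbm_event_array_index(enum resctrl_event_id evtid)
{
	if (evtid != QOS_L3_MBM_TOTAL_EVENT_ID &&
	    evtid != QOS_L3_MBM_LOCAL_EVENT_ID)
		return -EINVAL;
	return MBM_EVENT_ARRAY_INDEX(evtid);
}
```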
>
>
>> + if (index == INVALID_CONFIG_INDEX)
>> + return -EINVAL;
>> +
>> + if (rdtgroup_alloc_cntr(rdtgrp, index))
>> + return -EINVAL;
>> +
>
> hmmm ... so rdtgroup_alloc_cntr() returns 0 if the counter is assigned
> already, and in this case the configuration is done again even if counter was
> already assigned. Is this intended?
I didn't think this through. Yes, it is not required as far as I can see.
Will address it.
>
> rdtgroup_assign_cntr() seems to be almost identical to rdtgroup_assign_update()
> that has protection against the above from happening. It looks like these two
> functions can be merged into one?
Yes. We can do that.
>
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
>> + rdtgrp->mon.cntr_id[index],
>
> There currently seems to be a mismatch between functions needing to
> access this ID directly as above in some cases while also needing to
> use helpers like rdtgroup_alloc_cntr().
I think I need to merge rdtgroup_assign_cntr(), rdtgroup_assign_update()
and rdtgroup_alloc_cntr(). That will probably make it clearer.
>
> Also, as James indicated, resctrl_arch_assign_cntr() may fail on Arm
> so this needs error checking even though the x86 implementation always
> returns success.
Sure. Will do.
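A userspace model of how the merged assign path might look, covering the two review points: domains that already hold the counter are skipped (so repeating the call does not reconfigure hardware), and a failure from the arch hook is propagated. `arch_assign_cntr()`, `group_assign_cntr()`, the array-based domain state and the `fail_next` test hook are hypothetical stand-ins for the kernel structures; rolling back a freshly allocated counter on failure is left out:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CNTRS      4  /* hypothetical number of hardware counters */
#define NUM_DOMAINS    2  /* hypothetical number of L3 monitor domains */
#define MBM_NUM_EVENTS 2
#define MON_CNTR_UNSET UINT32_MAX

struct mon_domain {
	bool cntr_assigned[NUM_CNTRS]; /* stands in for d->mbm_cntr_map */
	bool fail_next;                /* test hook: make the arch call fail */
};

struct group {
	uint32_t rmid;
	uint32_t cntr_id[MBM_NUM_EVENTS];
};

static bool cntr_in_use[NUM_CNTRS];   /* stands in for the global free map */
static struct mon_domain domains[NUM_DOMAINS];
static int arch_calls;                /* counts simulated MSR writes */

/* Stand-in for resctrl_arch_assign_cntr(); may fail, as on Arm. */
static int arch_assign_cntr(struct mon_domain *d, uint32_t rmid,
			    uint32_t cntr_id, bool assign)
{
	(void)rmid; (void)cntr_id; (void)assign;
	arch_calls++;
	if (d->fail_next) {
		d->fail_next = false;
		return -EIO;
	}
	return 0;
}

static int cntr_alloc(void)
{
	for (int i = 0; i < NUM_CNTRS; i++) {
		if (!cntr_in_use[i]) {
			cntr_in_use[i] = true;
			return i;
		}
	}
	return -ENOSPC;
}

/*
 * Merged allocate + assign. @d == NULL means all domains. Domains that
 * already have the counter are skipped.
 */
static int group_assign_cntr(struct group *g, int index, struct mon_domain *d)
{
	assert(index >= 0 && index < MBM_NUM_EVENTS);

	if (g->cntr_id[index] == MON_CNTR_UNSET) {
		int new_id = cntr_alloc();

		if (new_id < 0)
			return new_id;
		g->cntr_id[index] = (uint32_t)new_id;
	}

	uint32_t id = g->cntr_id[index];

	for (int i = 0; i < NUM_DOMAINS; i++) {
		struct mon_domain *dom = &domains[i];

		if ((d && d != dom) || dom->cntr_assigned[id])
			continue;
		int ret = arch_assign_cntr(dom, g->rmid, id, true);
		if (ret)
			return ret;
		dom->cntr_assigned[id] = true;
	}
	return 0;
}
```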
>
>> + rdtgrp->closid, true);
>> + set_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> /* rdtgroup information files for one cache resource. */
>> static struct rftype res_common_files[] = {
>> {
>
> Reinette
>
Thanks
- Babu Moger
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v6 16/22] x86/resctrl: Add the interface to unassign a MBM counter
2024-08-16 21:41 ` Reinette Chatre
@ 2024-08-21 16:01 ` Moger, Babu
2024-08-23 20:18 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-08-21 16:01 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:41, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>
> This is about resctrl fs so "The ABMC feature" -> "mbm_cntr_assign mode"
Sure.
> (please check whole series).
Sure.
>
>> counter to an RMID and monitor the bandwidth as long as it is assigned.
>> The assigned RMID will be tracked by the hardware until the user unassigns
>> it manually.
>>
>> Hardware provides only a limited number of counters. If the system runs out
>> of assignable counters, the kernel will display an error when a new assignment
>> is requested. Users need to unassign an already assigned counter to make
>> space for a new assignment.
>>
>> Provide the interface to unassign the counter ids from the group. Free the
>> counter if it is not assigned in any of the domains.
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: Removed mbm_cntr_free from this patch.
>> Added counter test in all the domains and free it if it is not assigned to
>> any domain.
>>
>> v5: Few name changes to match cntr_id.
>> Changed the function names to
>> rdtgroup_unassign_cntr
>> More comments on commit log.
>>
>> v4: Added domain specific unassign feature.
>> Few name changes.
>>
>> v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
>> The function is not called directly from user anymore. These
>> changes are related to global assignment interface.
>>
>> v2: No changes.
>> ---
>> arch/x86/kernel/cpu/resctrl/internal.h | 2 +
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++++++++++++++++++++++
>> 2 files changed, 54 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 4e8109dee174..cc832955b787 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -689,6 +689,8 @@ int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evt
>> u32 rmid, u32 cntr_id, u32 closid, bool assign);
>> int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>> int rdtgroup_alloc_cntr(struct rdtgroup *rdtgrp, int index);
>> +int rdtgroup_unassign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>> +void rdtgroup_free_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp, int index);
>> void rdt_staged_configs_clear(void);
>> bool closid_allocated(unsigned int closid);
>> int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 1ee91a7293a8..0c2215dbd497 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1961,6 +1961,58 @@ int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> return 0;
>> }
>> +static int rdtgroup_mbm_cntr_test(struct rdt_resource *r, u32 cntr_id)
>
> Could "test" be replaced with something more specific about what is tested?
> for example, "rdtgroup_mbm_cntr_is_assigned()" or something better?
Yes. We can do that.
> The function looks like a good candidate for returning a bool.
Sure.
>
> Is this function needed though? (more below)
Yes. It is required. It is called from two places
(rdtgroup_unassign_update() and rdtgroup_unassign_cntr()).
We can open code it in rdtgroup_unassign_cntr(), but we can't do that in
rdtgroup_unassign_update(). I will check again to be sure.
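A userspace sketch of the bool-returning helper for the per-domain unassign path, with plain arrays standing in for `d->mbm_cntr_map` and the global free-counter map. The helper name follows Reinette's suggestion; the array layout and `rdtgroup_unassign_one_domain()` are hypothetical, and the arch hardware update is only marked by a comment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CNTRS      4  /* hypothetical */
#define NUM_DOMAINS    3  /* hypothetical */
#define MON_CNTR_UNSET UINT32_MAX

/* Per-domain assignment state; stands in for d->mbm_cntr_map. */
static bool dom_cntr_map[NUM_DOMAINS][NUM_CNTRS];
/* Global allocation state; stands in for the free-counter bitmap. */
static bool cntr_in_use[NUM_CNTRS];

/* True if any domain still has @cntr_id assigned. */
static bool rdtgroup_mbm_cntr_is_assigned(uint32_t cntr_id)
{
	for (int d = 0; d < NUM_DOMAINS; d++)
		if (dom_cntr_map[d][cntr_id])
			return true;
	return false;
}

/*
 * Unassign *@cntr_id from domain @d only. The counter goes back to the
 * global pool once no domain references it any more.
 */
static void rdtgroup_unassign_one_domain(uint32_t *cntr_id, int d)
{
	assert(d >= 0 && d < NUM_DOMAINS);
	if (*cntr_id == MON_CNTR_UNSET)
		return;

	/* resctrl_arch_assign_cntr(..., false) would update hardware here. */
	dom_cntr_map[d][*cntr_id] = false;

	if (!rdtgroup_mbm_cntr_is_assigned(*cntr_id)) {
		cntr_in_use[*cntr_id] = false;
		*cntr_id = MON_CNTR_UNSET;
	}
}
```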
>
>> +{
>> + struct rdt_mon_domain *d;
>> +
>> + list_for_each_entry(d, &r->mon_domains, hdr.list)
>> + if (test_bit(cntr_id, d->mbm_cntr_map))
>> + return 1;
>> +
>> + return 0;
>> +}
>> +
>> +/* Free the counter id after the event is unassigned */
>> +void rdtgroup_free_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> + int index)
>> +{
>> + /* Update the counter bitmap */
>> + if (!rdtgroup_mbm_cntr_test(r, rdtgrp->mon.cntr_id[index])) {
>> + mbm_cntr_free(rdtgrp->mon.cntr_id[index]);
>> + rdtgrp->mon.cntr_id[index] = MON_CNTR_UNSET;
>> + }
>> +}
>> +
>> +/*
>> + * Unassign a hardware counter from the group and update all the domains
>> + * in the group.
>> + */
>> +int rdtgroup_unassign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> + struct rdt_mon_domain *d;
>> + int index;
>> +
>> + index = mon_event_config_index_get(evtid);
>> + if (index == INVALID_CONFIG_INDEX)
>> + return -EINVAL;
>> +
>> + if (rdtgrp->mon.cntr_id[index] != MON_CNTR_UNSET) {
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
>> + rdtgrp->mon.cntr_id[index],
>> + rdtgrp->closid, false);
>> + clear_bit(rdtgrp->mon.cntr_id[index],
>> + d->mbm_cntr_map);
>> + }
>> +
>> + /* Free the counter at group level */
>> + rdtgroup_free_cntr(r, rdtgrp, index);
>
> rdtgroup_free_cntr() is called right after the counter has been unassigned
> from all domains. Will rdtgroup_mbm_cntr_test() thus not always return 0?
> It seems unnecessary to have rdtgroup_mbm_cntr_test() and considering that,
> rdtgroup_free_cntr() can just be open coded here?
Yes. We can open code here.
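For the unassign-all path the scan is indeed unnecessary: once every domain's bit has been cleared, no domain can reference the counter, so it can be freed directly. A userspace model under the same hypothetical array layout (`unassign_all_domains()` is illustrative, and errors from the arch hook are ignored here for brevity):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CNTRS      4  /* hypothetical */
#define NUM_DOMAINS    2  /* hypothetical */
#define MON_CNTR_UNSET UINT32_MAX

static bool dom_cntr_map[NUM_DOMAINS][NUM_CNTRS]; /* per-domain state */
static bool cntr_in_use[NUM_CNTRS];               /* global free map */

/* Unassign from every domain, then free the counter unconditionally. */
static void unassign_all_domains(uint32_t *cntr_id)
{
	if (*cntr_id == MON_CNTR_UNSET)
		return;
	assert(*cntr_id < NUM_CNTRS);

	for (int d = 0; d < NUM_DOMAINS; d++)
		dom_cntr_map[d][*cntr_id] = false; /* plus the arch unassign */

	cntr_in_use[*cntr_id] = false;
	*cntr_id = MON_CNTR_UNSET;
}
```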
>
>> + }
>> +
>> + return 0;
>> +}
>> +
>> /* rdtgroup information files for one cache resource. */
>> static struct rftype res_common_files[] = {
>> {
>
> Reinette
>
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v6 17/22] x86/resctrl: Assign/unassign counters by default when ABMC is enabled
2024-08-16 21:42 ` Reinette Chatre
@ 2024-08-21 17:20 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-21 17:20 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:42, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> Assign/unassign counters on resctrl group creation/deletion. Two counters
>> are required per group, one for total event and one for local event.
>>
>> There is only a limited number of counters for assignment. If the counters
>> are exhausted, report the warnings and continue. It is not required to
>
> Regarding "report the warnings and continue", which warnings are you
> referring to?
I was referring to "rdt_last_cmd_puts("Out of ABMC counters\n");"
I will make that clear here.
>
>> fail group creation for assignment failures. Users have the option to
>> modify the assignments later.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: Removed the redundant comments on all the calls of
>> rdtgroup_assign_cntrs. Updated the commit message.
>> Dropped printing error message on every call of rdtgroup_assign_cntrs.
>>
>> v5: Removed the code to enable/disable ABMC during the mount.
>> That will be another patch.
>> Added arch callers to get the arch specific data.
>> Renamed functions to match the other abmc functions.
>> Added code comments for assignment failures.
>>
>> v4: Few name changes based on the upstream discussion.
>> Commit message update.
>>
>> v3: This is a new patch. Patch addresses the upstream comment to enable
>> ABMC feature by default if the feature is available.
>> ---
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 55 ++++++++++++++++++++++++++
>> 1 file changed, 55 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 0c2215dbd497..d93c1d784b91 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2908,6 +2908,46 @@ static void schemata_list_destroy(void)
>> }
>> }
>> +/*
>> + * Called when a new group is created. Assign the counters if ABMC is
>
> Please replace ABMC with resctrl fs generic terms.
Sure.
>
>> + * already enabled. Two counters are required per group, one for total
>> + * event and one for local event. With a limited number of counters,
>> + * the assignments can fail in some cases. But, it is not required to
>> + * fail the group creation. Users have the option to modify the
>> + * assignments after the group creation.
>> + */
>> +static int rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>> +{
>> + int ret = 0;
>> +
>> + if (!resctrl_arch_get_abmc_enabled())
>> + return 0;
>> +
>> + if (is_mbm_total_enabled())
>> + ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> + if (!ret && is_mbm_local_enabled())
>> + ret = rdtgroup_assign_cntr(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +
>> + return ret;
>> +}
>> +
>
> Reinette
>
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v6 18/22] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode
2024-08-16 21:42 ` Reinette Chatre
@ 2024-08-21 17:30 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-21 17:30 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:42, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> In ABMC mode, a hardware counter must be assigned before the MBM
>> events can be read.
>>
>> Report "Unassigned" in case the user attempts to read the events without
>> assigning the counter.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: Added more explanation in the resctrl.rst
>> Added checks to detect "Unassigned" before reading RMID.
>>
>> v5: New patch.
>> ---
>> Documentation/arch/x86/resctrl.rst | 11 +++++++++++
>> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 ++++++++++++-
>> 2 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index fe9f10766c4f..aea440ee6107 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -294,6 +294,17 @@ with the following files:
>> "num_mbm_cntrs":
>> The number of monitoring counters available for assignment.
>> + The resctrl subsystem provides an interface to count a maximum of two
>> + MBM events per group, from a combination of total and local events.
>> + Keeping the current interface, users can assign a maximum of two
>> + monitoring counters per group. Users also have the option to
>> + enable only one counter for the group.
>> +
>> + With a limited number of counters, the system can run out of assignable counters.
>> + In mbm_cntr_assign mode, the MBM event counters will return "Unassigned" if
>> + the counter is not assigned to the event when read. Users need to assign a
>> + counter manually to read the events.
>
> This seems more appropriate for the "mon_data" section.
Sure. Will do.
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes
2024-08-16 21:42 ` Reinette Chatre
@ 2024-08-21 18:08 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-21 18:08 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:42, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> +static ssize_t rdtgroup_mbm_mode_write(struct kernfs_open_file *of,
>> + char *buf, size_t nbytes,
>> + loff_t off)
>> +{
>> + int mbm_cntr_assign = resctrl_arch_get_abmc_enabled();
>
> This needs to be protected by the mutex.
Agreed. Will do.
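A userspace model of that fix, with a `pthread_mutex_t` standing in for `rdtgroup_mutex` and a plain `bool` standing in for the arch mode state (`cpus_read_lock()` and the arch enable/disable/reset calls are not modeled). The only point it illustrates is that the current mode is sampled after the lock is taken, so two concurrent writers cannot both act on a stale value; `mbm_mode_write()` is a hypothetical name:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

static pthread_mutex_t rdtgroup_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool mbm_cntr_assign_enabled; /* stands in for the arch state */

static ssize_t mbm_mode_write(char *buf, size_t nbytes)
{
	ssize_t ret = (ssize_t)nbytes;

	/* Valid input requires a trailing newline. */
	if (nbytes == 0 || buf[nbytes - 1] != '\n')
		return -EINVAL;
	buf[nbytes - 1] = '\0';

	pthread_mutex_lock(&rdtgroup_mutex);
	bool enabled = mbm_cntr_assign_enabled; /* read under the lock */

	if (!strcmp(buf, "legacy")) {
		if (enabled)
			mbm_cntr_assign_enabled = false;
	} else if (!strcmp(buf, "mbm_cntr_assign")) {
		if (!enabled)
			mbm_cntr_assign_enabled = true;
	} else {
		ret = -EINVAL;
	}
	pthread_mutex_unlock(&rdtgroup_mutex);

	return ret;
}
```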
>
>> + struct rdt_resource *r = of->kn->parent->priv;
>> + int ret = 0;
>> +
>> + /* Valid input requires a trailing newline */
>> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> + return -EINVAL;
>> +
>> + buf[nbytes - 1] = '\0';
>> +
>> + cpus_read_lock();
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + rdt_last_cmd_clear();
>> +
>> + if (!strcmp(buf, "legacy")) {
>> + if (mbm_cntr_assign)
>> + resctrl_arch_mbm_cntr_assign_disable();
>> + } else if (!strcmp(buf, "mbm_cntr_assign")) {
>> + if (!mbm_cntr_assign) {
>> + rdtgroup_mbm_cntr_reset(r);
>> + ret = resctrl_arch_mbm_cntr_assign_enable();
>> + }
>> + } else {
>> + ret = -EINVAL;
>> + }
>> +
>> + mutex_unlock(&rdtgroup_mutex);
>> + cpus_read_unlock();
>> +
>> + return ret ?: nbytes;
>> +}
>> +
>
> Reinette
>
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v6 22/22] x86/resctrl: Introduce interface to modify assignment states of the groups
2024-08-16 22:33 ` Reinette Chatre
@ 2024-08-21 20:11 ` Moger, Babu
2024-08-23 20:18 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-08-21 20:11 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 17:33, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>> Introduce the interface to assign MBM events in ABMC mode.
>>
>> Events can be enabled or disabled by writing to the file
>> /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Format is similar to the list format with the addition of an opcode for the
>> assignment operation.
>> "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>> Format for specific type of groups:
>>
>> * Default CTRL_MON group:
>> "//<domain_id><opcode><flags>"
>>
>> * Non-default CTRL_MON group:
>> "<CTRL_MON group>//<domain_id><opcode><flags>"
>>
>> * Child MON group of default CTRL_MON group:
>> "/<MON group>/<domain_id><opcode><flags>"
>>
>> * Child MON group of non-default CTRL_MON group:
>> "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>> Domain_id '*' will apply the flags on all the domains.
>>
>> Opcode can be one of the following:
>>
>> = Update the assignment to match the flags
>> + assign an MBM event
>> - unassign an MBM event
>>
>> Assignment flags can be one of the following:
>> t MBM total event
>> l MBM local event
>> tl Both total and local MBM events
>> _ None of the MBM events. Valid only with '=' opcode.
>
> (please note comments in this area of cover letter)
Ok. Yes. Will test different combinations of flags (like _lt etc.).
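A sketch of splitting one write into the parts described in the commit message, assuming the line has already lost its trailing newline and that the `;`-separated domain list is split elsewhere (the patch uses `strsep()` for that). `split_groups()`, `parse_dom_op()` and `struct dom_op` are hypothetical helpers, not code from the series:

```c
#include <assert.h>
#include <string.h>

/* One "<domain_id><opcode><flags>" entry. */
struct dom_op {
	char id[16];    /* domain id string, or "*" for all domains */
	char op;        /* '=', '+' or '-' */
	char flags[8];  /* e.g. "tl", "t", "_" */
};

/*
 * Split "<CTRL_MON group>/<MON group>/<rest>" in place. Empty group
 * names (the default group) are allowed. Returns 0 on success, -1 if
 * either '/' separator is missing.
 */
static int split_groups(char *line, char **ctrl, char **mon, char **rest)
{
	char *s1 = strchr(line, '/');
	if (!s1)
		return -1;
	char *s2 = strchr(s1 + 1, '/');
	if (!s2)
		return -1;

	*s1 = '\0';
	*s2 = '\0';
	*ctrl = line;
	*mon = s1 + 1;
	*rest = s2 + 1;
	return 0;
}

/* Parse one entry such as "1-t" or "*=_" (no ';' inside). */
static int parse_dom_op(const char *tok, struct dom_op *out)
{
	const char *op = strpbrk(tok, "=+-");
	size_t id_len, flags_len;

	if (!op || op == tok)
		return -1;              /* missing opcode or domain id */
	id_len = (size_t)(op - tok);
	flags_len = strlen(op + 1);
	if (id_len >= sizeof(out->id) || flags_len == 0 ||
	    flags_len >= sizeof(out->flags))
		return -1;

	memcpy(out->id, tok, id_len);
	out->id[id_len] = '\0';
	out->op = *op;
	memcpy(out->flags, op + 1, flags_len + 1); /* includes the NUL */
	return 0;
}
```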
>
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v6: Added support to assign all domains if domain id is '*'
>> Fixed the allocation of counter id if it is not assigned already.
>>
>> v5: Interface name changed from mbm_assign_control to mbm_control.
>> Fixed opcode and flags combination.
>> "=_" is valid.
>> "-_" and "+_" are not valid.
>> Minor message update.
>> Renamed the function with prefix - rdtgroup_.
>> Corrected few documentation mistakes.
>> Rebase related changes after SNC support.
>>
>> v4: Added domain specific assignments. Fixed the opcode parsing.
>>
>> v3: New patch.
>> Addresses the feedback to provide the global assignment interface.
>>
>> https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
>> ---
>> Documentation/arch/x86/resctrl.rst | 94 +++++++-
>> arch/x86/kernel/cpu/resctrl/internal.h | 7 +
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 315 ++++++++++++++++++++++++-
>> 3 files changed, 414 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 113c22ba6db3..ae3b17b7cefe 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -346,7 +346,7 @@ with the following files:
>> t MBM total event is enabled.
>> l MBM local event is enabled.
>> tl Both total and local MBM events are enabled.
>> - _ None of the MBM events are enabled.
>> + _ None of the MBM events are enabled. Only works with opcode '=' for write.
>> Examples:
>> ::
>> @@ -365,6 +365,98 @@ with the following files:
>> enabled on domain 0 and 1.
>> + Assignment state can be updated by writing to the interface.
>> +
>> + Format is similar to the list format with the addition of an opcode for the
>> + assignment operation.
>> +
>> + "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>> +
>> + Format for each type of groups:
>> +
>> + * Default CTRL_MON group:
>> + "//<domain_id><opcode><flags>"
>> +
>> + * Non-default CTRL_MON group:
>> + "<CTRL_MON group>//<domain_id><opcode><flags>"
>> +
>> + * Child MON group of default CTRL_MON group:
>> + "/<MON group>/<domain_id><opcode><flags>"
>> +
>> + * Child MON group of non-default CTRL_MON group:
>> + "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>> +
>> + Domain_id '*' will apply the flags on all the domains.
>> +
>> + Opcode can be one of the following:
>> + ::
>> +
>> + = Update the assignment to match the MBM event.
>> + + Assign an MBM event.
>
> "Assign a new MBM event without impacting existing assignments."?
Sure.
>
>> + - Unassign an MBM event.
>
> (similar)
Sure.
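The opcode semantics above, including the "without impacting existing assignments" wording Reinette suggests for '+' and '-', reduce to simple mask operations. A sketch with flag values assumed to mirror `BIT(QOS_L3_MBM_TOTAL_EVENT_ID)` and `BIT(QOS_L3_MBM_LOCAL_EVENT_ID)` from the patch; `apply_assign_op()` is illustrative only, and the test cases follow the documentation examples:

```c
#include <assert.h>

/* Assumed values: BIT(2) and BIT(3), as in the patch's ASSIGN_* macros. */
#define ASSIGN_NONE  0u
#define ASSIGN_TOTAL (1u << 2)
#define ASSIGN_LOCAL (1u << 3)

/*
 * Apply one opcode to a domain's current assignment state:
 *   '='  replace the state with @flags
 *   '+'  add @flags, leaving other assignments untouched
 *   '-'  remove @flags, leaving other assignments untouched
 * Returns the new state, or -1 for an unknown opcode.
 */
static int apply_assign_op(unsigned int cur, char op, unsigned int flags)
{
	assert((flags & ~(ASSIGN_TOTAL | ASSIGN_LOCAL)) == 0u);

	switch (op) {
	case '=':
		return (int)flags;
	case '+':
		return (int)(cur | flags);
	case '-':
		return (int)(cur & ~flags);
	default:
		return -1;
	}
}
```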
>
>> +
>> + Examples:
>> + ::
>> +
>> + Initial group status:
>> + # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> + non_default_ctrl_mon_grp//0=tl;1=tl;
>> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> + //0=tl;1=tl;
>> + /child_default_mon_grp/0=tl;1=tl;
>> +
>> + To update the default group to assign only total MBM event on domain 0:
>> + # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>> +
>> + Assignment status after the update:
>> + # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> + non_default_ctrl_mon_grp//0=tl;1=tl;
>> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> + //0=t;1=tl;
>> + /child_default_mon_grp/0=tl;1=tl;
>> +
>> + To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
>> + # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>> +
>> + Assignment status after the update:
>> + # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> + non_default_ctrl_mon_grp//0=tl;1=tl;
>> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> + //0=t;1=tl;
>> + /child_default_mon_grp/0=tl;1=l;
>> +
>> + To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
>> + unassign both local and total MBM events on domain 1:
>> + # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
>> +
>> + Assignment status after the update:
>> + non_default_ctrl_mon_grp//0=tl;1=tl;
>> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> + //0=t;1=tl;
>> + /child_default_mon_grp/0=tl;1=l;
>> +
>> + To update the default group to add a local MBM event on domain 0:
>> + # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
>> +
>> + Assignment status after the update:
>> + # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> + non_default_ctrl_mon_grp//0=tl;1=tl;
>> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> + //0=tl;1=tl;
>> + /child_default_mon_grp/0=tl;1=l;
>> +
>> + To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all
>> + the MBM events on all the domains.
>> + # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
>> +
>> + Assignment status after the update:
>> + # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> + non_default_ctrl_mon_grp//0=_;1=_;
>> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> + //0=tl;1=tl;
>> + /child_default_mon_grp/0=tl;1=l;
>> +
>> "max_threshold_occupancy":
>> Read/write file provides the largest value (in
>> bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index ba3012f8f940..5af225b4a497 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -46,6 +46,13 @@
>> #define MON_CNTR_UNSET U32_MAX
>> +/*
>> + * Assignment flags for ABMC feature
>
> (this is resctrl fs code)
Ok.
>
>> + */
>> +#define ASSIGN_NONE 0
>> +#define ASSIGN_TOTAL BIT(QOS_L3_MBM_TOTAL_EVENT_ID)
>> +#define ASSIGN_LOCAL BIT(QOS_L3_MBM_LOCAL_EVENT_ID)
>> +
>> /**
>> * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>> * aren't marked nohz_full
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index d7aadca5e4ab..8567fb3a6274 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1034,6 +1034,318 @@ static int rdtgroup_mbm_control_show(struct kernfs_open_file *of,
>> return 0;
>> }
>> +/*
>> + * Update the assign states for the domain.
>> + *
>> + * If this is a new assignment for the group then allocate a counter and update
>> + * the assignment else just update the assign state
>> + */
>> +static int rdtgroup_assign_update(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid,
>> + struct rdt_mon_domain *d)
>> +{
>> + int ret, index;
>> +
>> + index = mon_event_config_index_get(evtid);
>> + if (index == INVALID_CONFIG_INDEX)
>> + return -EINVAL;
>
> (wrong spacing ... see checkpatch.pl)
ok
>
>> +
>> + if (rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET) {
>> + ret = rdtgroup_alloc_cntr(rdtgrp, index);
>> + if (ret < 0)
>> + goto out_done;
>> + }
>> +
>> + /* Update the state on all domains if d == NULL */
>> + if (d == NULL) {
>
> if (!d) ... (checkpatch)
ok.
>
>> + ret = rdtgroup_assign_cntr(rdtgrp, evtid);
>> + } else {
>> + ret = resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
>> + rdtgrp->mon.cntr_id[index],
>> + rdtgrp->closid, 1);
>> + if (!ret)
>> + set_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map);
>> + }
>> +
>> +out_done:
>> + return ret;
>> +}
>
> Please merge this with almost identical rdtgroup_assign_cntr()
Yes. Sure.
>
>> +
>> +/*
>> + * Update the unassign state for the domain.
>> + *
>> + * Free the counter if it is unassigned on all the domains else just
>> + * update the unassign state
>> + */
>> +static int rdtgroup_unassign_update(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid,
>> + struct rdt_mon_domain *d)
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> + int ret = 0, index;
>> +
>> + index = mon_event_config_index_get(evtid);
>> + if (index == INVALID_CONFIG_INDEX)
>> + return -EINVAL;
>
> (wrong spacing ... see checkpatch.pl)
Sure.
>
>> +
>> + if (rdtgrp->mon.cntr_id[index] == MON_CNTR_UNSET)
>> + goto out_done;
>> +
>> + if (d == NULL) {
>
> if (!d)
>
>> + ret = rdtgroup_unassign_cntr(rdtgrp, evtid);
>> + } else {
>> + ret = resctrl_arch_assign_cntr(d, evtid, rdtgrp->mon.rmid,
>> + rdtgrp->mon.cntr_id[index],
>> + rdtgrp->closid, 0);
>> + if (!ret) {
>> + clear_bit(rdtgrp->mon.cntr_id[index], d->mbm_cntr_map);
>> + rdtgroup_free_cntr(r, rdtgrp, index);
>> + }
>> + }
>> +
>> +out_done:
>> + return ret;
>> +}
>
> Please merge this with almost identical rdtgroup_unassign_cntr()
Sure.
>
>> +
>> +static int rdtgroup_str_to_mon_state(char *flag)
>> +{
>> + int i, mon_state = 0;
>> +
>> + for (i = 0; i < strlen(flag); i++) {
>> + switch (*(flag + i)) {
>> + case 't':
>> + mon_state |= ASSIGN_TOTAL;
>> + break;
>> + case 'l':
>> + mon_state |= ASSIGN_LOCAL;
>> + break;
>> + case '_':
>> + mon_state = ASSIGN_NONE;
>> + break;
>
> It looks like this supports flags like "_lt", treating it as assigning
> both local and total. I expect this should remove all flags instead?
This is a combination of flags.
"_lt" will assign both local and total.
"lt_" will remove both the flags.
It seems alright to me. Do you want me to change the behaviour here?
>
>> + default:
>> + break;
>> + }
>> + }
>> +
>> + return mon_state;
>> +}
>
> hmmm ... so you removed assigning mon_state to ASSIGN_NONE from default,
> but that did not change what this function returns since ASSIGN_NONE is 0
> and mon_state is initialized to 0. Unknown flags should cause error so
> that it is possible to add flags in the future. Above prevents us from
> ever adding new flags.
Maybe I am missing something here. How about this?
enum {
	ASSIGN_NONE = 0,
	ASSIGN_TOTAL,
	ASSIGN_LOCAL,
	ASSIGN_INVALID,
};

static int rdtgroup_str_to_mon_state(char *flag)
{
	int i, mon_state = ASSIGN_NONE;

	for (i = 0; i < strlen(flag); i++) {
		switch (*(flag + i)) {
		case 't':
			mon_state |= ASSIGN_TOTAL;
			break;
		case 'l':
			mon_state |= ASSIGN_LOCAL;
			break;
		case '_':
			mon_state = ASSIGN_NONE;
			break;
		default:
			mon_state = ASSIGN_INVALID;
			goto out_done;
		}
	}

out_done:
	return mon_state;
}
Then handle the ASSIGN_INVALID from the caller. Is that what you think?
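One detail of the enum proposed above is worth checking: enumerators count up from 0, so ASSIGN_TOTAL is 1, ASSIGN_LOCAL is 2 and ASSIGN_INVALID is 3, which equals ASSIGN_TOTAL | ASSIGN_LOCAL. A minimal userspace sketch of the proposed logic (renamed proposed_str_to_mon_state() here for illustration; not kernel code) shows that a caller could not tell the valid flag string "tl" apart from an invalid one:

```c
#include <assert.h>
#include <string.h>

/* Values as proposed: enumerators count up from 0. */
enum {
	ASSIGN_NONE = 0,
	ASSIGN_TOTAL,		/* 1 */
	ASSIGN_LOCAL,		/* 2 */
	ASSIGN_INVALID,		/* 3, the same value as ASSIGN_TOTAL | ASSIGN_LOCAL */
};

/* Copy of the proposed parsing logic, for illustration only. */
int proposed_str_to_mon_state(const char *flag)
{
	int mon_state = ASSIGN_NONE;
	size_t i;

	for (i = 0; i < strlen(flag); i++) {
		switch (flag[i]) {
		case 't':
			mon_state |= ASSIGN_TOTAL;
			break;
		case 'l':
			mon_state |= ASSIGN_LOCAL;
			break;
		case '_':
			mon_state = ASSIGN_NONE;
			break;
		default:
			return ASSIGN_INVALID;
		}
	}
	return mon_state;
}
```

Keeping the event flags as distinct bits and reporting errors through a negative errno, rather than through an extra enumerator, sidesteps the collision.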
>
>> +
>> +static struct rdtgroup *rdtgroup_find_grp(enum rdt_group_type rtype, char *p_grp, char *c_grp)
>
> rdtgroup_find_grp() -> rdtgroup_find_grp_by_name()?
>
>> +{
>> + struct rdtgroup *rdtg, *crg;
>> +
>> + if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
>> + return &rdtgroup_default;
>> + } else if (rtype == RDTCTRL_GROUP) {
>> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
>> + if (!strcmp(p_grp, rdtg->kn->name))
>> + return rdtg;
>> + } else if (rtype == RDTMON_GROUP) {
>> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>> + if (!strcmp(p_grp, rdtg->kn->name)) {
>> + list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
>> + mon.crdtgrp_list) {
>> + if (!strcmp(c_grp, crg->kn->name))
>> + return crg;
>> + }
>> + }
>> + }
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +static int rdtgroup_process_flags(struct rdt_resource *r,
>> + enum rdt_group_type rtype,
>> + char *p_grp, char *c_grp, char *tok)
>> +{
>> + int op, mon_state, assign_state, unassign_state;
>> + char *dom_str, *id_str, *op_str;
>> + struct rdt_mon_domain *d;
>> + struct rdtgroup *rdtgrp;
>> + unsigned long dom_id;
>> + int ret, found = 0;
>> +
>> + rdtgrp = rdtgroup_find_grp(rtype, p_grp, c_grp);
>> +
>> + if (!rdtgrp) {
>> + rdt_last_cmd_puts("Not a valid resctrl group\n");
>> + return -EINVAL;
>> + }
>> +
>> +next:
>> + if (!tok || tok[0] == '\0')
>> + return 0;
>> +
>> + /* Start processing the strings for each domain */
>> + dom_str = strim(strsep(&tok, ";"));
>> +
>> + op_str = strpbrk(dom_str, "=+-");
>> +
>> + if (op_str) {
>> + op = *op_str;
>> + } else {
>> + rdt_last_cmd_puts("Missing operation =, +, -, _ character\n");
>> + return -EINVAL;
>> + }
>> +
>> + id_str = strsep(&dom_str, "=+-");
>> +
>> + /* Check for domain id '*' which means all domains */
>> + if (id_str && *id_str == '*') {
>> + d = NULL;
>> + goto check_state;
>> + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>> + rdt_last_cmd_puts("Missing domain id\n");
>> + return -EINVAL;
>> + }
>> +
>> + /* Verify if the dom_id is valid */
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + if (d->hdr.id == dom_id) {
>> + found = 1;
>> + break;
>> + }
>> + }
>> +
>> + if (!found) {
>> + rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
>> + return -EINVAL;
>> + }
>> +
>> +check_state:
>> + mon_state = rdtgroup_str_to_mon_state(dom_str);
>
> Function should return error and exit here.
No. This is the case that skips the domain check when '*' is passed to
apply the assignment to all the domains.
>
>> +
>> + assign_state = 0;
>> + unassign_state = 0;
>> +
>> + switch (op) {
>> + case '+':
>> + if (mon_state == ASSIGN_NONE) {
>> + rdt_last_cmd_puts("Invalid assign opcode\n");
>> + goto out_fail;
>> + }
>> + assign_state = mon_state;
>> + break;
>> + case '-':
>> + if (mon_state == ASSIGN_NONE) {
>> + rdt_last_cmd_puts("Invalid assign opcode\n");
>> + goto out_fail;
>> + }
>> + unassign_state = mon_state;
>> + break;
>> + case '=':
>> + assign_state = mon_state;
>> + unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> + if (assign_state & ASSIGN_TOTAL) {
>> + ret = rdtgroup_assign_update(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID, d);
>> + if (ret)
>> + goto out_fail;
>> + }
>
> Should unassign occur before assign so that unassign can make counters
> available for assign that follows?
Yes. It works. Just tested it.
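Why the order matters can be seen with a '=' request that swaps one event for another while every counter is in use. This is a simplified userspace model under stated assumptions: a hypothetical pool of NUM_CNTRS counters (set to 1), hypothetical cntr_alloc()/cntr_free()/eq_masks()/apply_update() helpers, and no per-domain state. The masks follow the patch's '=' case (assign the requested flags, unassign the rest), and apply_update() releases counters before allocating new ones:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model; all names here are hypothetical. */
#define ASSIGN_TOTAL (1 << 0)
#define ASSIGN_LOCAL (1 << 1)
#define NUM_CNTRS 1		/* a single counter, to force contention */

static bool cntr_used[NUM_CNTRS];

/* Take the first free counter, or fail like an exhausted hardware pool. */
int cntr_alloc(void)
{
	for (int i = 0; i < NUM_CNTRS; i++) {
		if (!cntr_used[i]) {
			cntr_used[i] = true;
			return i;
		}
	}
	return -ENOSPC;
}

void cntr_free(int id)
{
	cntr_used[id] = false;
}

/* Counter id per event for one group: [0] total, [1] local; -1 = none. */
struct group {
	int cntr[2];
};

/* Masks for '=': assign what was requested, unassign everything else. */
void eq_masks(int flags, int *assign, int *unassign)
{
	*assign = flags;
	*unassign = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~flags;
}

/* Release counters first so the assignments that follow can reuse them. */
int apply_update(struct group *g, int assign, int unassign)
{
	for (int e = 0; e < 2; e++) {
		if ((unassign & (1 << e)) && g->cntr[e] >= 0) {
			cntr_free(g->cntr[e]);
			g->cntr[e] = -1;
		}
	}
	for (int e = 0; e < 2; e++) {
		if ((assign & (1 << e)) && g->cntr[e] < 0) {
			int id = cntr_alloc();

			if (id < 0)
				return id;
			g->cntr[e] = id;
		}
	}
	return 0;
}
```

With NUM_CNTRS at 1, assigning first would fail with -ENOSPC on the swap even though the request does not change how many counters are in use.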
>
>> +
>> + if (assign_state & ASSIGN_LOCAL) {
>> + ret = rdtgroup_assign_update(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID, d);
>> + if (ret)
>> + goto out_fail;
>> + }
>> +
>> + if (unassign_state & ASSIGN_TOTAL) {
>> + ret = rdtgroup_unassign_update(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID, d);
>> + if (ret)
>> + goto out_fail;
>> + }
>> +
>> + if (unassign_state & ASSIGN_LOCAL) {
>> + ret = rdtgroup_unassign_update(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID, d);
>> + if (ret)
>> + goto out_fail;
>> + }
>> +
>> + goto next;
>> +
>> +out_fail:
>> +
>> + return -EINVAL;
>> +}
>> +
>> +static ssize_t rdtgroup_mbm_control_write(struct kernfs_open_file *of,
>> + char *buf, size_t nbytes,
>> + loff_t off)
>> +{
>> + struct rdt_resource *r = of->kn->parent->priv;
>> + char *token, *cmon_grp, *mon_grp;
>> + int ret;
>> +
>> + if (!resctrl_arch_get_abmc_enabled())
>> + return -EINVAL;
>
> This needs to be protected by mutex.
Sure.
>
>> +
>> + /* Valid input requires a trailing newline */
>> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> + return -EINVAL;
>> +
>> + buf[nbytes - 1] = '\0';
>> +
>> + cpus_read_lock();
>> + mutex_lock(&rdtgroup_mutex);
>> + rdt_last_cmd_clear();
>> +
>> + while ((token = strsep(&buf, "\n")) != NULL) {
>> + if (strstr(token, "//")) {
>> + /*
>> + * The CTRL_MON group processing:
>> + * default CTRL_MON group: "//<flags>"
>> + * non-default CTRL_MON group: "<CTRL_MON group>//flags"
>> + * The CTRL_MON group will be empty string if it is a
>> + * default group.
>> + */
>> + cmon_grp = strsep(&token, "//");
>> +
>> + /*
>> + * strsep returns empty string for contiguous delimiters.
>> + * Make sure check for two consecutive delimiters and
>> + * advance the token.
>> + */
>> + mon_grp = strsep(&token, "//");
>> + if (*mon_grp != '\0') {
>> + rdt_last_cmd_printf("Invalid CTRL_MON group format %s\n", token);
>> + ret = -EINVAL;
>> + break;
>> + }
>> +
>> + ret = rdtgroup_process_flags(r, RDTCTRL_GROUP, cmon_grp, mon_grp, token);
>> + if (ret)
>> + break;
>> + } else if (strstr(token, "/")) {
>> + /*
>> + * MON group processing:
>> + * MON_GROUP inside default CTRL_MON group: "/<MON group>/<flags>"
>> + * MON_GROUP within CTRL_MON group: "<CTRL_MON group>/<MON group>/<flags>"
>> + */
>> + cmon_grp = strsep(&token, "/");
>
> Isn't strsep(&token, "//") the same as strsep(&token, "/")? It looks like
> these two big branches can be merged.
Sure. Will check this.
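strsep() treats its second argument as a set of single-character delimiters, so strsep(&token, "//") splits at each '/' exactly as strsep(&token, "/") does, and "//0=tl" yields two empty group names. A minimal userspace sketch of a single parse path based on that observation (split_line() is a hypothetical helper, not from the patch); an empty MON name would then select RDTCTRL_GROUP and a non-empty one RDTMON_GROUP:

```c
#define _DEFAULT_SOURCE		/* expose strsep() on glibc */
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: split "<CTRL_MON group>/<MON group>/<domain part>"
 * in place. An empty *mon means the line targets the CTRL_MON group.
 */
int split_line(char *line, char **ctrl, char **mon, char **rest)
{
	*ctrl = strsep(&line, "/");
	if (!line)
		return -EINVAL;		/* no '/' at all */
	*mon = strsep(&line, "/");
	if (!line)
		return -EINVAL;		/* only one '/' */
	*rest = line;
	return 0;
}
```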
>
>> +
>> + /* Extract the MON_GROUP. It cannot be empty string */
>> + mon_grp = strsep(&token, "/");
>> + if (*mon_grp == '\0') {
>> + rdt_last_cmd_printf("Invalid MON_GROUP format %s\n", token);
>> + ret = -EINVAL;
>> + break;
>> + }
>> +
>> + ret = rdtgroup_process_flags(r, RDTMON_GROUP, cmon_grp, mon_grp, token);
>> + if (ret)
>> + break;
>> + }
>> + }
>> +
>> + mutex_unlock(&rdtgroup_mutex);
>> + cpus_read_unlock();
>> +
>> + return ret ?: nbytes;
>> +}
>> +
>> #ifdef CONFIG_PROC_CPU_RESCTRL
>> /*
>> @@ -2277,9 +2589,10 @@ static struct rftype res_common_files[] = {
>> },
>> {
>> .name = "mbm_control",
>> - .mode = 0444,
>> + .mode = 0644,
>> .kf_ops = &rdtgroup_kf_single_ops,
>> .seq_show = rdtgroup_mbm_control_show,
>> + .write = rdtgroup_mbm_control_write,
>> },
>> {
>> .name = "cpus_list",
>
> Reinette
>
--
Thanks
Babu Moger
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
2024-08-16 21:28 ` [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
@ 2024-08-22 1:31 ` Moger, Babu
2024-08-23 20:29 ` Reinette Chatre
0 siblings, 1 reply; 96+ messages in thread
From: Moger, Babu @ 2024-08-22 1:31 UTC (permalink / raw)
To: Reinette Chatre, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/16/24 16:28, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/6/24 3:00 PM, Babu Moger wrote:
>>
>> Feature adds following interface files:
>>
>> /sys/fs/resctrl/info/L3_MON/mbm_mode: Reports the list of assignable
>> monitoring features supported. The enclosed brackets indicate which
>> feature is enabled.
>
> I've been considering this file as a generic file where all future
> "MBM modes" can be captured, while this series treats it as specific to
> "assignable monitoring features" (btw, should this be "assignable
> monitoring modes" to match the name?). Looking closer at this
> implementation it does make things easier that "mbm_mode" is specific
> to "assignable monitoring features" but when doing so I think it should
> have a less generic name to avoid the obstacles we have with the
> existing "mon_features". Apologies that this goes back to be close to
> what you had earlier ... maybe "mbm_assign_mode"?
Let's see:
#cat /sys/fs/resctrl/info/L3_MON/mbm_mode
[mbm_cntr_assign] <- This already says 'assign'. Isn't that enough?
default <- Default mode is not related to assignable features.
I would think mbm_mode is fine. Let me know.
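The bracket convention described in the cover letter marks the active entry in a list of choices. A minimal userspace sketch of producing such a listing, assuming a hypothetical array of mode names and an index for the active one (format_modes() is illustrative and is not the kernel's show function):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write one mode name per line into buf, enclosing
 * the active one in brackets, e.g. "[mbm_cntr_assign]\nlegacy\n".
 * Returns the length written, or -1 if buf is too small.
 */
int format_modes(char *buf, size_t len, const char *const *modes,
		 int nmodes, int active)
{
	size_t off = 0;

	if (len)
		buf[0] = '\0';
	for (int i = 0; i < nmodes; i++) {
		int n = snprintf(buf + off, len - off,
				 i == active ? "[%s]\n" : "%s\n", modes[i]);

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```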
>>
>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>> counters available for assignment.
>>
>> /sys/fs/resctrl/info/L3_MON/mbm_control: Reports the resctrl group and
>> monitor
>> status of each group. Assignment state can be updated by writing to the
>> interface.
>>
>> # Examples
>>
>> a. Check if ABMC support is available
>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>> [mbm_cntr_assign]
>> legacy
>>
>> The ABMC feature is detected and enabled.
>>
>> b. Check how many ABMC counters are available.
>>
>> #cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>> 32
>>
>> c. Create a few resctrl groups.
>>
>> # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>> # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>> # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>
>>
>> d. This series adds a new interface file
>> /sys/fs/resctrl/info/L3_MON/mbm_control to list and modify the group's
>> monitoring states. The file provides a single place to list the
>> monitoring states of all the resctrl groups. It makes it easier for
>> user space to learn about the counters are used without needing to
>> traverse
>
> "to learn about the counters are used" -> "to learn the counters that are
> used" or "to learn about the used counters" or ...?
Sure.
>
>> all the groups thus reducing the number of file system calls.
>>
>> The list follows the following format:
>>
>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Format for specific type of groups:
>>
>> * Default CTRL_MON group:
>> "//<domain_id>=<flags>"
>>
>> * Non-default CTRL_MON group:
>> "<CTRL_MON group>//<domain_id>=<flags>"
>>
>> * Child MON group of default CTRL_MON group:
>> "/<MON group>/<domain_id>=<flags>"
>>
>> * Child MON group of non-default CTRL_MON group:
>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Flags can be one of the following:
>>
>> t MBM total event is enabled.
>> l MBM local event is enabled.
>> tl Both total and local MBM events are enabled.
>> _ None of the MBM events are enabled
>>
>> Examples:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=tl;1=tl;
>> /child_default_mon_grp/0=tl;1=tl;
>>
>> There are four groups, and all of them have both the local and total
>> events enabled on domains 0 and 1.
>>
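The per-domain part of each listed line can be generated from a simple event bitmask. A minimal userspace sketch, assuming hypothetical ASSIGN_TOTAL/ASSIGN_LOCAL bit values and plain arrays in place of the kernel's domain list (format_domains() is not from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit values for the two MBM events. */
#define ASSIGN_TOTAL (1 << 0)
#define ASSIGN_LOCAL (1 << 1)

/*
 * Render "<id>=<flags>;" for each domain, e.g. "0=tl;1=_;".
 * Returns the string length, or -1 if buf is too small.
 */
int format_domains(char *buf, size_t len, const int *dom_ids,
		   const int *states, int ndoms)
{
	size_t off = 0;

	if (len)
		buf[0] = '\0';
	for (int i = 0; i < ndoms; i++) {
		const char *flags;
		int n;

		if ((states[i] & ASSIGN_TOTAL) && (states[i] & ASSIGN_LOCAL))
			flags = "tl";
		else if (states[i] & ASSIGN_TOTAL)
			flags = "t";
		else if (states[i] & ASSIGN_LOCAL)
			flags = "l";
		else
			flags = "_";	/* no event assigned in this domain */

		n = snprintf(buf + off, len - off, "%d=%s;", dom_ids[i], flags);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```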
>> e. Update the group assignment states using the interface file
>> /sys/fs/resctrl/info/L3_MON/mbm_control.
>>
>> The write format is similar to the above list format, with the addition
>> of an opcode for the assignment operation.
>> "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>>
>> * Default CTRL_MON group:
>> "//<domain_id><opcode><flags>"
>>
>> * Non-default CTRL_MON group:
>> "<CTRL_MON group>//<domain_id><opcode><flags>"
>>
>> * Child MON group of default CTRL_MON group:
>> "/<MON group>/<domain_id><opcode><flags>"
>>
>> * Child MON group of non-default CTRL_MON group:
>> "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>> Opcode can be one of the following:
>>
>> = Update the assignment to match the flag.
>> + Assign a new event.
>> - Unassign an existing event.
>
> Since user space can provide more than one flag the text could be more
> accurate noting this. Eg. "Update the assignment to match the flag" ->
> "Update the assignment to match the flags.".
Sure.
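Read together, the opcodes describe how a domain's new state follows from its current state and the requested flags. A minimal sketch of that reading, assuming the flags have already been parsed into a bitmask (apply_op() and the ASSIGN_* bit values are hypothetical); like the patch, it rejects '_' with '+' and '-':

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit values for the two MBM events. */
#define ASSIGN_NONE  0
#define ASSIGN_TOTAL (1 << 0)
#define ASSIGN_LOCAL (1 << 1)

/* Return the new per-domain state after "<opcode><flags>", or -EINVAL. */
int apply_op(int old, char op, int flags)
{
	switch (op) {
	case '=':
		return flags;			/* replace the whole state */
	case '+':
		if (flags == ASSIGN_NONE)
			return -EINVAL;		/* "+_" requests nothing */
		return old | flags;		/* add the listed events */
	case '-':
		if (flags == ASSIGN_NONE)
			return -EINVAL;		/* "-_" requests nothing */
		return old & ~flags;		/* remove the listed events */
	default:
		return -EINVAL;
	}
}
```

Under this reading, "//0=t" turns "tl" into "t", "/child_default_mon_grp/1-t" turns "tl" into "l", and "//0+l" turns "t" back into "tl", matching the status listings in the cover letter.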
>
>>
>> Flags can be one of the following:
>>
>> t MBM total event.
>> l MBM local event.
>> tl Both total and local MBM events.
>> _ None of the MBM events. Only works with '=' opcode.
>
> Please take care with the implementation that seems to support a variety of
> combinations. If I understand correctly the implementation supports flags
> like, for example, "tttt", "llll", "ltlt" ... those may not be an issue but
> of most concern is, for example, a pattern like "_lt" that (unexpectedly)
> appears to result in setting both total and local.
Yes. Should we not allow flag combinations with "_"?
I am not very sure about how to go about this.
>
>>
>> Initial group status:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=tl;1=tl;
>> /child_default_mon_grp/0=tl;1=tl;
>>
>> To update the default group to enable only the total event on domain 0:
>> # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=t;1=tl;
>> /child_default_mon_grp/0=tl;1=tl;
>>
>> To update the MON group child_default_mon_grp to remove the total event
>> on domain 1:
>> # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=t;1=tl;
>> /child_default_mon_grp/0=tl;1=l;
>>
>> To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp
>> to remove both local and total events on domain 1:
>> # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> //0=t;1=tl;
>> /child_default_mon_grp/0=tl;1=l;
>>
>> To update the default group to add the local event on domain 0:
>> # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> //0=tl;1=tl;
>> /child_default_mon_grp/0=tl;1=l;
>>
>> To update the non-default CTRL_MON group non_default_ctrl_mon_grp to
>> unassign all the MBM events on all the domains:
>> # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=_;1=_;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> //0=tl;1=tl;
>> /child_default_mon_grp/0=tl;1=l;
>>
>>
>> f. Read the events mbm_total_bytes and mbm_local_bytes of the default
>> group. There is no change in reading the events with ABMC. If the event
>> is unassigned when reading, then the read will come back as "Unassigned".
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 779247936
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> 765207488
>>
>> g. Check the bandwidth configuration for the group. Note that bandwidth
>> configuration has a domain scope. The total event defaults to 0x7F (to
>> count all the events) and the local event defaults to 0x15 (to count all
>> the local NUMA events). The event bitmap decoding is available at
>> https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>> in the sections "mbm_total_bytes_config" and "mbm_local_bytes_config":
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> 0=0x7f;1=0x7f
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>> 0=0x15;1=0x15
>>
>> h. Change the bandwidth source for domain 0 for the total event to count
>> only reads. Note that this change affects total events on domain 0.
>>
>> #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> 0=0x33;1=0x7F
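The default values and the 0x33 example can be checked against the event-bit table in the resctrl documentation: bit 0 counts reads to local NUMA memory, bit 1 reads to non-local memory, bits 2 and 3 non-temporal writes to local and non-local memory, bits 4 and 5 reads to slow memory in the local and non-local domain, and bit 6 dirty victims. A small userspace sketch (the EVT_* names and config_is_reads_only() are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Event-source bits as listed in the resctrl documentation. */
enum {
	EVT_LOCAL_READS       = 1 << 0,
	EVT_REMOTE_READS      = 1 << 1,
	EVT_LOCAL_NT_WRITES   = 1 << 2,
	EVT_REMOTE_NT_WRITES  = 1 << 3,
	EVT_LOCAL_SLOW_READS  = 1 << 4,
	EVT_REMOTE_SLOW_READS = 1 << 5,
	EVT_DIRTY_VICTIMS     = 1 << 6,
};

/* Every read source, local and non-local, regular and slow memory. */
#define EVT_ALL_READS (EVT_LOCAL_READS | EVT_REMOTE_READS | \
		       EVT_LOCAL_SLOW_READS | EVT_REMOTE_SLOW_READS)

/* Every source that stays within the local NUMA domain. */
#define EVT_ALL_LOCAL (EVT_LOCAL_READS | EVT_LOCAL_NT_WRITES | \
		       EVT_LOCAL_SLOW_READS)

/* True if the configuration counts reads only (no writes or victims). */
bool config_is_reads_only(unsigned int cfg)
{
	return cfg != 0 && (cfg & ~(unsigned int)EVT_ALL_READS) == 0;
}
```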
>>
>> i. Now read the total event again. The first read will come back with
>> "Unavailable" status. The subsequent read of mbm_total_bytes will display
>> only the read events.
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> Unavailable
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 314101
>>
>> j. Users will have the option to go back to legacy mbm_mode if required.
>> This can be done using the following command. Note that switching the
>> mbm_mode will reset all the mbm counters of all resctrl groups.
>
> "reset all the mbm counters" -> "reset all the MBM counters"
Sure.
>
>>
>> # echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>> mbm_cntr_assign
>> [legacy]
>>
>>
>> k. Unmount the resctrl filesystem
>>
>> #umount /sys/fs/resctrl/
>> ---
>> v6:
>> We still need to finalize a few interface details on mbm_mode and
>> mbm_control in case of ABMC and Soft-ABMC. We can continue the
>> discussion with this series.
>
> Could you please list the details that need to be finalized?
1. mbm_mode display
# cat /sys/fs/resctrl/info/L3_MON/mbm_mode
mbm_cntr_assign
[legacy]
"mbm_cntr_assign"
Are we sticking with "mbm_cntr_assign" for ABMC?
What should we name the soft-ABMC mode?
2. We also had some concerns about individual event assignment (ABMC)
and group assignment (soft-ABMC).
Are the flags 't' and 'l' suitable for both of these modes?
>
> Thank you
>
> Reinette
>
--
Thanks
Babu Moger
* Re: [PATCH v6 22/22] x86/resctrl: Introduce interface to modify assignment states of the groups
2024-08-21 20:11 ` Moger, Babu
@ 2024-08-23 20:18 ` Reinette Chatre
2024-08-23 22:04 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-23 20:18 UTC (permalink / raw)
To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/21/24 1:11 PM, Moger, Babu wrote:
> On 8/16/24 17:33, Reinette Chatre wrote:
>> On 8/6/24 3:00 PM, Babu Moger wrote:
...
>>> +
>>> +static int rdtgroup_str_to_mon_state(char *flag)
>>> +{
>>> + int i, mon_state = 0;
>>> +
>>> + for (i = 0; i < strlen(flag); i++) {
>>> + switch (*(flag + i)) {
>>> + case 't':
>>> + mon_state |= ASSIGN_TOTAL;
>>> + break;
>>> + case 'l':
>>> + mon_state |= ASSIGN_LOCAL;
>>> + break;
>>> + case '_':
>>> + mon_state = ASSIGN_NONE;
>>> + break;
>>
>> It looks like this supports flags like "_lt", treating it as assigning
>> both local and total. I expect this should remove all flags instead?
>
> This is a combination of flags.
> "_lt" will assign both local and total.
> "lt_" will remove both flags.
>
> It seems alright to me. Do you want me to change the behaviour here?
This looks like undefined behavior to me. A request to set individual flags
and also clear all flags looks like a contradiction to me.
>
>
>>
>>> + default:
>>> + break;
>>> + }
>>> + }
>>> +
>>> + return mon_state;
>>> +}
>>
>> hmmm ... so you removed assigning mon_state to ASSIGN_NONE from default,
>> but that did not change what this function returns since ASSIGN_NONE is 0
>> and mon_state is initialized to 0. Unknown flags should cause error so
>> that it is possible to add flags in the future. Above prevents us from
>> ever adding new flags.
>
> Maybe I am missing something here. How about this?
>
> enum {
> ASSIGN_NONE = 0,
> ASSIGN_TOTAL,
> ASSIGN_LOCAL,
> ASSIGN_INVALID,
> };
>
>
> static int rdtgroup_str_to_mon_state(char *flag)
> {
> int i, mon_state = ASSIGN_NONE;
>
> for (i = 0; i < strlen(flag); i++) {
> switch (*(flag + i)) {
> case 't':
> mon_state |= ASSIGN_TOTAL;
> break;
> case 'l':
> mon_state |= ASSIGN_LOCAL;
> break;
> case '_':
> mon_state = ASSIGN_NONE;
> break;
> default:
> mon_state = ASSIGN_INVALID;
> goto out_done;
> }
> }
>
> out_done:
> return mon_state;
> }
>
> Then handle the ASSIGN_INVALID from the caller. Is that what you think?
Why not return an error?
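A minimal userspace sketch of that suggestion, assuming bit-valued ASSIGN_* flags (str_to_mon_state() is a hypothetical stand-in for rdtgroup_str_to_mon_state()). Anything the parser does not understand, including '_' combined with 't' or 'l' and an empty string, yields -EINVAL, so the caller only checks for a negative return and new flag letters remain available for later. Repeated letters such as "tt" are still tolerated here; whether to reject them is a separate choice.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit values for the two MBM events. */
#define ASSIGN_NONE  0
#define ASSIGN_TOTAL (1 << 0)
#define ASSIGN_LOCAL (1 << 1)

/*
 * Parse a flag string into an event bitmask. Accepts "_" on its own or
 * any combination of 't' and 'l'; everything else returns -EINVAL.
 */
int str_to_mon_state(const char *flag)
{
	int mon_state = ASSIGN_NONE;

	if (flag[0] == '_')
		return flag[1] == '\0' ? ASSIGN_NONE : -EINVAL;
	if (flag[0] == '\0')
		return -EINVAL;

	for (; *flag; flag++) {
		switch (*flag) {
		case 't':
			mon_state |= ASSIGN_TOTAL;
			break;
		case 'l':
			mon_state |= ASSIGN_LOCAL;
			break;
		default:
			return -EINVAL;	/* unknown flag or misplaced '_' */
		}
	}
	return mon_state;
}
```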
>
>>
>>> +
>>> +static struct rdtgroup *rdtgroup_find_grp(enum rdt_group_type rtype, char *p_grp, char *c_grp)
>>
>> rdtgroup_find_grp() -> rdtgroup_find_grp_by_name()?
>>
>>> +{
>>> + struct rdtgroup *rdtg, *crg;
>>> +
>>> + if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
>>> + return &rdtgroup_default;
>>> + } else if (rtype == RDTCTRL_GROUP) {
>>> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
>>> + if (!strcmp(p_grp, rdtg->kn->name))
>>> + return rdtg;
>>> + } else if (rtype == RDTMON_GROUP) {
>>> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>>> + if (!strcmp(p_grp, rdtg->kn->name)) {
>>> + list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
>>> + mon.crdtgrp_list) {
>>> + if (!strcmp(c_grp, crg->kn->name))
>>> + return crg;
>>> + }
>>> + }
>>> + }
>>> + }
>>> +
>>> + return NULL;
>>> +}
>>> +
>>> +static int rdtgroup_process_flags(struct rdt_resource *r,
>>> + enum rdt_group_type rtype,
>>> + char *p_grp, char *c_grp, char *tok)
>>> +{
>>> + int op, mon_state, assign_state, unassign_state;
>>> + char *dom_str, *id_str, *op_str;
>>> + struct rdt_mon_domain *d;
>>> + struct rdtgroup *rdtgrp;
>>> + unsigned long dom_id;
>>> + int ret, found = 0;
>>> +
>>> + rdtgrp = rdtgroup_find_grp(rtype, p_grp, c_grp);
>>> +
>>> + if (!rdtgrp) {
>>> + rdt_last_cmd_puts("Not a valid resctrl group\n");
>>> + return -EINVAL;
>>> + }
>>> +
>>> +next:
>>> + if (!tok || tok[0] == '\0')
>>> + return 0;
>>> +
>>> + /* Start processing the strings for each domain */
>>> + dom_str = strim(strsep(&tok, ";"));
>>> +
>>> + op_str = strpbrk(dom_str, "=+-");
>>> +
>>> + if (op_str) {
>>> + op = *op_str;
>>> + } else {
>>> + rdt_last_cmd_puts("Missing operation =, +, -, _ character\n");
>>> + return -EINVAL;
>>> + }
>>> +
>>> + id_str = strsep(&dom_str, "=+-");
>>> +
>>> + /* Check for domain id '*' which means all domains */
>>> + if (id_str && *id_str == '*') {
>>> + d = NULL;
>>> + goto check_state;
>>> + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>>> + rdt_last_cmd_puts("Missing domain id\n");
>>> + return -EINVAL;
>>> + }
>>> +
>>> + /* Verify if the dom_id is valid */
>>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>> + if (d->hdr.id == dom_id) {
>>> + found = 1;
>>> + break;
>>> + }
>>> + }
>>> +
>>> + if (!found) {
>>> + rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
>>> + return -EINVAL;
>>> + }
>>> +
>>> +check_state:
>>> + mon_state = rdtgroup_str_to_mon_state(dom_str);
>>
>> Function should return error and exit here.
>
> No. This is the case that skips the domain check when '*' is passed to
> apply the assignment to all the domains.
Using "*" for a domain still requires valid flags, no?
Reinette
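Put differently, '*' only changes which domains are targeted; the flags after the opcode need the same validation either way. A userspace sketch of a per-element parser built on that rule, assuming a hypothetical struct dom_req and helpers, and leaving out the check that a numeric id names an existing domain:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bit values and request type, for illustration only. */
#define ASSIGN_TOTAL (1 << 0)
#define ASSIGN_LOCAL (1 << 1)

struct dom_req {
	bool all_domains;	/* domain id was '*' */
	unsigned long dom_id;	/* valid only when !all_domains */
	char op;		/* '=', '+' or '-' */
	int flags;		/* parsed event bitmask */
};

/* Accept "_" alone or any mix of 't'/'l'; reject everything else. */
static int parse_flags(const char *s)
{
	int state = 0;

	if (!strcmp(s, "_"))
		return 0;
	if (!*s)
		return -EINVAL;
	for (; *s; s++) {
		if (*s == 't')
			state |= ASSIGN_TOTAL;
		else if (*s == 'l')
			state |= ASSIGN_LOCAL;
		else
			return -EINVAL;
	}
	return state;
}

/* Parse one "<id><op><flags>" element such as "1-t" or "*=_". */
int parse_dom_req(const char *tok, struct dom_req *req)
{
	const char *op = strpbrk(tok, "=+-");
	char *end;
	int flags;

	if (!op || op == tok)
		return -EINVAL;		/* missing opcode or domain id */

	if (op - tok == 1 && tok[0] == '*') {
		req->all_domains = true;
	} else {
		req->all_domains = false;
		req->dom_id = strtoul(tok, &end, 10);
		if (end != op)
			return -EINVAL;	/* non-numeric domain id */
	}

	/* The flags are validated on both paths, including '*'. */
	flags = parse_flags(op + 1);
	if (flags < 0)
		return flags;

	req->op = *op;
	req->flags = flags;
	return 0;
}
```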
* Re: [PATCH v6 16/22] x86/resctrl: Add the interface to unassign a MBM counter
2024-08-21 16:01 ` Moger, Babu
@ 2024-08-23 20:18 ` Reinette Chatre
2024-08-23 22:05 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-23 20:18 UTC (permalink / raw)
To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/21/24 9:01 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 8/16/24 16:41, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 8/6/24 3:00 PM, Babu Moger wrote:
>>> The ABMC feature provides an option to the user to assign a hardware
>>
>> This is about resctrl fs so "The ABMC feature" -> "mbm_cntr_assign mode"
>
> Sure.
>
>> (please check whole series).
>
> Sure.
>
>>
>>> counter to an RMID and monitor the bandwidth as long as it is assigned.
>>> The assigned RMID will be tracked by the hardware until the user unassigns
>>> it manually.
>>>
>>> Hardware provides only a limited number of counters. If the system runs
>>> out of assignable counters, the kernel will display an error when a new
>>> assignment is requested. Users need to unassign an already assigned
>>> counter to make space for a new assignment.
>>>
>>> Provide the interface to unassign the counter ids from the group. Free the
>>> counter if it is not assigned in any of the domains.
>>>
>>> The feature details are documented in the APM listed below [1].
>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>> Monitoring (ABMC).
>>>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>> ---
>>> v6: Removed mbm_cntr_free from this patch.
>>> Added counter test in all the domains and free if it is not assigned to
>>> any domains.
>>>
>>> v5: Few name changes to match cntr_id.
>>> Changed the function names to rdtgroup_unassign_cntr
>>> More comments on commit log.
>>>
>>> v4: Added domain specific unassign feature.
>>> Few name changes.
>>>
>>> v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
>>> The function is not called directly from user anymore. These
>>> changes are related to global assignment interface.
>>>
>>> v2: No changes.
>>> ---
>>> arch/x86/kernel/cpu/resctrl/internal.h | 2 +
>>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++++++++++++++++++++++
>>> 2 files changed, 54 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>>> index 4e8109dee174..cc832955b787 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>>> @@ -689,6 +689,8 @@ int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evt
>>> u32 rmid, u32 cntr_id, u32 closid, bool assign);
>>> int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>>> int rdtgroup_alloc_cntr(struct rdtgroup *rdtgrp, int index);
>>> +int rdtgroup_unassign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>>> +void rdtgroup_free_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp, int index);
>>> void rdt_staged_configs_clear(void);
>>> bool closid_allocated(unsigned int closid);
>>> int resctrl_find_cleanest_closid(void);
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 1ee91a7293a8..0c2215dbd497 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -1961,6 +1961,58 @@ int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>>> return 0;
>>> }
>>> +static int rdtgroup_mbm_cntr_test(struct rdt_resource *r, u32 cntr_id)
>>
>> Could "test" be replaced with something more specific about what is tested?
>> for example, "rdtgroup_mbm_cntr_is_assigned()" or something better? The
>
> Yes. We can do that.
>
>> function
>> looks like a good candidate for returning a bool.
>
> Sure.
>>
>> Is this function needed though? (more below)
>
> Yes. It is required. It is called from two places
> (rdtgroup_unassign_update and rdtgroup_unassign_cntr).
>
> We can open-code it in rdtgroup_unassign_cntr(), but we can't do that in
> rdtgroup_unassign_update(). I will check again to be sure.
Similar to rdtgroup_assign_cntr() and rdtgroup_assign_update() discussed
in previous patch, it also looks like rdtgroup_unassign_cntr() and
rdtgroup_unassign_update() can be merged.
Reinette
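A sketch of how the merged path could look, with the check returning bool as suggested. It is a userspace model under stated assumptions: NUM_DOMAINS, dom_cntr_map[] (standing in for each domain's mbm_cntr_map) and both helper names are hypothetical, counter ids are assumed to be below 64, and dom < 0 plays the role of the NULL domain ("all domains") in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_DOMAINS 2	/* hypothetical number of monitoring domains */

/* Stand-in for each domain's bitmap of assigned counters. */
uint64_t dom_cntr_map[NUM_DOMAINS];

/* True if cntr_id (assumed < 64) is still assigned in any domain. */
bool mbm_cntr_is_assigned(unsigned int cntr_id)
{
	for (int d = 0; d < NUM_DOMAINS; d++) {
		if (dom_cntr_map[d] & (UINT64_C(1) << cntr_id))
			return true;
	}
	return false;
}

/*
 * Single unassign path: dom >= 0 clears one domain, dom < 0 clears all.
 * Returns true once the counter is unused in every domain and can be freed.
 */
bool unassign_cntr(int dom, unsigned int cntr_id)
{
	for (int d = 0; d < NUM_DOMAINS; d++) {
		if (dom < 0 || d == dom)
			dom_cntr_map[d] &= ~(UINT64_C(1) << cntr_id);
	}
	return !mbm_cntr_is_assigned(cntr_id);
}
```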
* Re: [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
2024-08-22 1:31 ` Moger, Babu
@ 2024-08-23 20:29 ` Reinette Chatre
2024-08-23 22:14 ` Moger, Babu
0 siblings, 1 reply; 96+ messages in thread
From: Reinette Chatre @ 2024-08-23 20:29 UTC (permalink / raw)
To: babu.moger, corbet, fenghua.yu, tglx, mingo, bp, dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Babu,
On 8/21/24 6:31 PM, Moger, Babu wrote:
> Hi Reinette,
>
> On 8/16/24 16:28, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 8/6/24 3:00 PM, Babu Moger wrote:
>>>
>>> Feature adds following interface files:
>>>
>>> /sys/fs/resctrl/info/L3_MON/mbm_mode: Reports the list of assignable
>>> monitoring features supported. The enclosed brackets indicate which
>>> feature is enabled.
>>
>> I've been considering this file as a generic file where all future
>> "MBM modes" can be captured, while this series treats it as specific to
>> "assignable monitoring features" (btw, should this be "assignable
>> monitoring modes" to match the name?). Looking closer at this
>> implementation it does make things easier that "mbm_mode" is specific
>> to "assignable monitoring features" but when doing so I think it should
>> have a less generic name to avoid the obstacles we have with the
>> existing "mon_features". Apologies that this goes back to be close to
>> what you had earlier ... maybe "mbm_assign_mode"?
>
> Let's see:
> #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> [mbm_cntr_assign] <- This already says 'assign'. Isn't that enough?
It will be enough if "mbm_mode" is intended to be used for all current
and future MBM modes/features but this series instead dedicates this file
to just "assignable monitoring counters" feature. Doing so prevents us
from, in the future, expanding this file to, for example, contain
a new entry representing a new feature.
>
> default <- Default mode is not related to assignable features.
If not assignable features, what is it related to? "default" being the
absence of assignable features still seems related to me.
>
> I would think mbm_mode is fine. Let me know.
If this work is reworded that it is intended to support any MBM mode then
it is fine, if this work remains to dedicate this file just to assignable
features then I think its name should be changed.
...
>>>
>>> Flags can be one of the following:
>>>
>>> t MBM total event.
>>> l MBM local event.
>>> tl Both total and local MBM events.
>>> _ None of the MBM events. Only works with '=' opcode.
>>
>> Please take care with the implementation that seems to support a variety of
>> combinations. If I understand correctly the implementation supports flags
>> like, for example, "tttt", "llll", "ltlt" ... those may not be an issue but
>> of most concern is, for example, a pattern like "_lt" that (unexpectedly)
>> appears to result in setting both total and local.
>
> Yes. Should we not allow flag combinations with "_"?
> I am not very sure about how to go about this.
>
This topic seems to have moved to patch #22.
...
>>> # echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>> mbm_cntr_assign
>>> [legacy]
>>>
>>> k. Unmount the resctrl
>>> #umount /sys/fs/resctrl/
>>> ---
>>> v6:
>>> We still need to finalize a few interface details on mbm_mode and
>>> mbm_control in case of ABMC and Soft-ABMC. We can continue the
>>> discussion with this series.
>>
>> Could you please list the details that need to be finalized?
>
> 1. mbm_mode display
> # cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> mbm_cntr_assign
> [legacy]
>
> "mbm_cntr_assign"
> Are we sticking with "mbm_cntr_assign" for ABMC?
> What name should we use for soft-ABMC?
>
> 2. We also had some concerns about individual event assignment (ABMC)
> and group assignment (soft-ABMC).
> Are the flags 't' and 'l' good for both of these modes?
If I remember correctly the previous discussion ended with the need for
"modes" that indicate to user space what to expect when interacting with the
MBM flags in the "mbm_control" file. The term used by ABMC should reflect that
each MBM flag/event can be set independently, while the term used by soft-ABMC
should reflect that setting one flag/event makes the same change to the other
MBM flag/event.
Reinette
* Re: [PATCH v6 22/22] x86/resctrl: Introduce interface to modify assignment states of the groups
2024-08-23 20:18 ` Reinette Chatre
@ 2024-08-23 22:04 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-23 22:04 UTC (permalink / raw)
To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/23/2024 3:18 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/21/24 1:11 PM, Moger, Babu wrote:
>> On 8/16/24 17:33, Reinette Chatre wrote:
>>> On 8/6/24 3:00 PM, Babu Moger wrote:
>
> ...
>
>>>> +
>>>> +static int rdtgroup_str_to_mon_state(char *flag)
>>>> +{
>>>> + int i, mon_state = 0;
>>>> +
>>>> + for (i = 0; i < strlen(flag); i++) {
>>>> + switch (*(flag + i)) {
>>>> + case 't':
>>>> + mon_state |= ASSIGN_TOTAL;
>>>> + break;
>>>> + case 'l':
>>>> + mon_state |= ASSIGN_LOCAL;
>>>> + break;
>>>> + case '_':
>>>> + mon_state = ASSIGN_NONE;
>>>> + break;
>>>
>>> It looks like this supports flags like "_lt", treating it as assigning
>>> both local and total. I expect this should remove all flags instead?
>>
>> This is a combination of flags.
>> "_lt" This will assign both local and total.
>> "lt_" This will remove both the flags.
>>
>> It seems alright to me. Do you want me to change the behavior here?
>
> This looks like undefined behavior to me. A request to set individual flags
> and also clear all flags looks like a contradiction to me.
Ok. Will address this in v7.
>
>>
>>
>>>
>>>> + default:
>>>> + break;
>>>> + }
>>>> + }
>>>> +
>>>> + return mon_state;
>>>> +}
>>>
>>> hmmm ... so you removed assigning mon_state to ASSIGN_NONE from default,
>>> but that did not change what this function returns since ASSIGN_NONE is 0
>>> and mon_state is initialized to 0. Unknown flags should cause an error so
>>> that it is possible to add flags in the future. Above prevents us from
>>> ever adding new flags.
>>
>> Maybe I am missing something here. How about this?
>>
>> enum {
>> ASSIGN_NONE = 0,
>> ASSIGN_TOTAL,
>> ASSIGN_LOCAL,
>> ASSIGN_INVALID,
>> };
>>
>>
>> static int rdtgroup_str_to_mon_state(char *flag)
>> {
>> int i, mon_state = ASSIGN_NONE;
>>
>> for (i = 0; i < strlen(flag); i++) {
>> switch (*(flag + i)) {
>> case 't':
>> mon_state |= ASSIGN_TOTAL;
>> break;
>> case 'l':
>> mon_state |= ASSIGN_LOCAL;
>> break;
>> case '_':
>> mon_state = ASSIGN_NONE;
>> break;
>> default:
>> mon_state = ASSIGN_INVALID;
>> goto out_done;
>> }
>> }
>>
>> out_done:
>> return mon_state;
>> }
>>
>> Then handle the ASSIGN_INVALID from the caller. Is that what you think?
>
> Why not return an error?
Sure.
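A minimal userspace sketch of the stricter parsing discussed here, assuming hypothetical names and bit values rather than the actual v7 code: unknown characters, and any '_' combined with other flags, return -EINVAL. The flags are defined as distinct bits because, in the enum proposed above, ASSIGN_INVALID (3) equals ASSIGN_TOTAL | ASSIGN_LOCAL, so "tl" would be indistinguishable from an invalid string.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical values: distinct bits, so t|l never aliases a sentinel. */
#define ASSIGN_NONE   0
#define ASSIGN_TOTAL  (1 << 0)
#define ASSIGN_LOCAL  (1 << 1)

/*
 * Parse an mbm_control flag string into an assignment mask.
 * Returns the mask (>= 0) on success, or -EINVAL when the string is
 * empty, contains an unknown character, or mixes '_' with other flags.
 */
static int str_to_mon_state(const char *flag)
{
	int mon_state = ASSIGN_NONE;
	size_t i, len = strlen(flag);

	if (len == 0)
		return -EINVAL;

	/* '_' means "no MBM events" and is only valid on its own. */
	if (strchr(flag, '_'))
		return len == 1 ? ASSIGN_NONE : -EINVAL;

	for (i = 0; i < len; i++) {
		switch (flag[i]) {
		case 't':
			mon_state |= ASSIGN_TOTAL;
			break;
		case 'l':
			mon_state |= ASSIGN_LOCAL;
			break;
		default:
			/* Reject unknown flags so new ones can be added later. */
			return -EINVAL;
		}
	}

	return mon_state;
}
```

Repeated flags such as "tt" or "ltlt" are still accepted in this sketch, since they were not considered a problem in the review; rejecting an empty string is an added assumption.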
>
>>
>>>
>>>> +
>>>> +static struct rdtgroup *rdtgroup_find_grp(enum rdt_group_type rtype, char *p_grp, char *c_grp)
>>>
>>> rdtgroup_find_grp() -> rdtgroup_find_grp_by_name()?
>>>
>>>> +{
>>>> + struct rdtgroup *rdtg, *crg;
>>>> +
>>>> + if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
>>>> + return &rdtgroup_default;
>>>> + } else if (rtype == RDTCTRL_GROUP) {
>>>> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
>>>> + if (!strcmp(p_grp, rdtg->kn->name))
>>>> + return rdtg;
>>>> + } else if (rtype == RDTMON_GROUP) {
>>>> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>>>> + if (!strcmp(p_grp, rdtg->kn->name)) {
>>>> + list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
>>>> + mon.crdtgrp_list) {
>>>> + if (!strcmp(c_grp, crg->kn->name))
>>>> + return crg;
>>>> + }
>>>> + }
>>>> + }
>>>> + }
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +static int rdtgroup_process_flags(struct rdt_resource *r,
>>>> + enum rdt_group_type rtype,
>>>> + char *p_grp, char *c_grp, char *tok)
>>>> +{
>>>> + int op, mon_state, assign_state, unassign_state;
>>>> + char *dom_str, *id_str, *op_str;
>>>> + struct rdt_mon_domain *d;
>>>> + struct rdtgroup *rdtgrp;
>>>> + unsigned long dom_id;
>>>> + int ret, found = 0;
>>>> +
>>>> + rdtgrp = rdtgroup_find_grp(rtype, p_grp, c_grp);
>>>> +
>>>> + if (!rdtgrp) {
>>>> + rdt_last_cmd_puts("Not a valid resctrl group\n");
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> +next:
>>>> + if (!tok || tok[0] == '\0')
>>>> + return 0;
>>>> +
>>>> + /* Start processing the strings for each domain */
>>>> + dom_str = strim(strsep(&tok, ";"));
>>>> +
>>>> + op_str = strpbrk(dom_str, "=+-");
>>>> +
>>>> + if (op_str) {
>>>> + op = *op_str;
>>>> + } else {
>>>> + rdt_last_cmd_puts("Missing operation =, +, - character\n");
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + id_str = strsep(&dom_str, "=+-");
>>>> +
>>>> + /* Check for domain id '*' which means all domains */
>>>> + if (id_str && *id_str == '*') {
>>>> + d = NULL;
>>>> + goto check_state;
>>>> + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>>>> + rdt_last_cmd_puts("Missing domain id\n");
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + /* Verify if the dom_id is valid */
>>>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>>> + if (d->hdr.id == dom_id) {
>>>> + found = 1;
>>>> + break;
>>>> + }
>>>> + }
>>>> +
>>>> + if (!found) {
>>>> + rdt_last_cmd_printf("Invalid domain id %lu\n", dom_id);
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> +check_state:
>>>> + mon_state = rdtgroup_str_to_mon_state(dom_str);
>>>
>>> Function should return an error and exit here.
>>
>> No. This is the case that skips checking for a domain when '*' is passed to
>> apply the assignment to all the domains.
>
> Using "*" for a domain still requires valid flags, no?
Yes. Flags will be processed as usual.
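The point that '*' still requires valid flags can be sketched as a token parser that validates the flags identically whatever the domain selector is. This is a userspace sketch with hypothetical names (struct dom_req, parse_flags(), parse_dom_token()), not the resctrl code; it also enforces that '_' is only accepted with the '=' opcode, as stated in the flag description of this series.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define ASSIGN_TOTAL (1 << 0)
#define ASSIGN_LOCAL (1 << 1)

/* Hypothetical result of parsing one "<id|*><op><flags>" token. */
struct dom_req {
	bool all_domains;	/* true when the domain id was '*' */
	unsigned long id;	/* valid only when !all_domains */
	char op;		/* '=', '+' or '-' */
	int mon_state;		/* parsed flag mask */
};

/* Strict flags: '_' on its own, otherwise only 't' and 'l'. */
static int parse_flags(const char *s)
{
	int state = 0;

	if (!*s)
		return -EINVAL;
	if (strchr(s, '_'))
		return strcmp(s, "_") ? -EINVAL : 0;
	for (; *s; s++) {
		if (*s == 't')
			state |= ASSIGN_TOTAL;
		else if (*s == 'l')
			state |= ASSIGN_LOCAL;
		else
			return -EINVAL;
	}
	return state;
}

/* Parse e.g. "0=tl" or "*+l"; flags are checked for '*' as well. */
static int parse_dom_token(const char *tok, struct dom_req *req)
{
	const char *op = strpbrk(tok, "=+-");
	size_t id_len;
	char *end;

	if (!op)
		return -EINVAL;		/* missing '=', '+' or '-' */

	id_len = (size_t)(op - tok);
	if (id_len == 1 && tok[0] == '*') {
		req->all_domains = true;
		req->id = 0;
	} else {
		if (id_len == 0)
			return -EINVAL;	/* missing domain id */
		req->all_domains = false;
		req->id = strtoul(tok, &end, 10);
		if (end != op)
			return -EINVAL;	/* id is not a decimal number */
	}

	req->op = *op;
	req->mon_state = parse_flags(op + 1);
	if (req->mon_state < 0)
		return -EINVAL;

	/* '_' (no events) is only meaningful with '='. */
	if (req->mon_state == 0 && req->op != '=')
		return -EINVAL;

	return 0;
}
```

Checking the domain id against the list of monitor domains is left out here; in the patch that lookup happens only for a numeric id, while the flag check runs in both cases.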
>
> Reinette
>
--
- Babu Moger
* Re: [PATCH v6 16/22] x86/resctrl: Add the interface to unassign a MBM counter
2024-08-23 20:18 ` Reinette Chatre
@ 2024-08-23 22:05 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-23 22:05 UTC (permalink / raw)
To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/23/2024 3:18 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/21/24 9:01 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 8/16/24 16:41, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 8/6/24 3:00 PM, Babu Moger wrote:
>>>> The ABMC feature provides an option to the user to assign a hardware
>>>
>>> This is about resctrl fs so "The ABMC feature" -> "mbm_cntr_assign mode"
>>
>> Sure.
>>
>>> (please check whole series).
>>
>> Sure.
>>
>>>
>>>> counter to an RMID and monitor the bandwidth as long as it is assigned.
>>>> The assigned RMID will be tracked by the hardware until the user unassigns
>>>> it manually.
>>>>
>>>> Hardware provides only a limited number of counters. If the system runs
>>>> out of assignable counters, the kernel will display an error when a new
>>>> assignment is requested. Users need to unassign an already assigned
>>>> counter to make space for a new assignment.
>>>>
>>>> Provide the interface to unassign the counter IDs from the group. Free the
>>>> counter if it is not assigned in any of the domains.
>>>>
>>>> The feature details are documented in the APM listed below [1].
>>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>>> Monitoring (ABMC).
>>>>
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>>> ---
>>>> v6: Removed mbm_cntr_free from this patch.
>>>> Added counter test in all the domains and free if it is not assigned to
>>>> any domains.
>>>>
>>>> v5: Few name changes to match cntr_id.
>>>> Changed the function names to
>>>> rdtgroup_unassign_cntr
>>>> More comments on commit log.
>>>>
>>>> v4: Added domain specific unassign feature.
>>>> Few name changes.
>>>>
>>>> v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
>>>> The function is not called directly from user anymore. These
>>>> changes are related to global assignment interface.
>>>>
>>>> v2: No changes.
>>>> ---
>>>> arch/x86/kernel/cpu/resctrl/internal.h | 2 +
>>>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++++++++++++++++++++++
>>>> 2 files changed, 54 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>>>> index 4e8109dee174..cc832955b787 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>>>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>>>> @@ -689,6 +689,8 @@ int resctrl_arch_assign_cntr(struct rdt_mon_domain *d, enum resctrl_event_id evt
>>>> u32 rmid, u32 cntr_id, u32 closid, bool assign);
>>>> int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>>>> int rdtgroup_alloc_cntr(struct rdtgroup *rdtgrp, int index);
>>>> +int rdtgroup_unassign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
>>>> +void rdtgroup_free_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp, int index);
>>>> void rdt_staged_configs_clear(void);
>>>> bool closid_allocated(unsigned int closid);
>>>> int resctrl_find_cleanest_closid(void);
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> index 1ee91a7293a8..0c2215dbd497 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> @@ -1961,6 +1961,58 @@ int rdtgroup_assign_cntr(struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>>>> return 0;
>>>> }
>>>> +static int rdtgroup_mbm_cntr_test(struct rdt_resource *r, u32 cntr_id)
>>>
>>> Could "test" be replaced with something more specific about what is tested?
>>> For example, "rdtgroup_mbm_cntr_is_assigned()" or something better? The
>>
>> Yes. We can do that.
>>
>>> function looks like a good candidate for returning a bool.
>>
>> Sure.
>>>
>>> Is this function needed though? (more below)
>>
>> Yes. It is required. It is called from two places
>> (rdtgroup_unassign_update and rdtgroup_unassign_cntr).
>>
>> We can open code it in rdtgroup_unassign_cntr, but we can't do that in
>> rdtgroup_unassign_update. I will check again to be sure.
>
> Similar to rdtgroup_assign_cntr() and rdtgroup_assign_update() discussed
> in previous patch, it also looks like rdtgroup_unassign_cntr() and
> rdtgroup_unassign_update() can be merged.
Yes. We can do that. Thanks
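A sketch of what a merged unassign path with a bool helper could look like, assuming a hypothetical bitmap layout (one 32-bit map per domain plus one global allocation map) instead of the real resctrl structures; a NULL domain stands for the '*' (all domains) case. It follows the commit message: the counter is freed only once no domain still has it assigned.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-domain state: bit N set => counter N assigned here. */
struct mon_domain {
	unsigned int cntr_map;
};

/* Hypothetical global state: bit N set => counter N is allocated. */
struct cntr_state {
	unsigned int used;
	struct mon_domain *doms;
	size_t ndoms;
};

/* Return true if @cntr_id (assumed < 32) is assigned in any domain. */
static bool mbm_cntr_is_assigned(const struct cntr_state *s,
				 unsigned int cntr_id)
{
	size_t i;

	for (i = 0; i < s->ndoms; i++)
		if (s->doms[i].cntr_map & (1u << cntr_id))
			return true;
	return false;
}

/*
 * Unassign @cntr_id from @dom, or from every domain when @dom is NULL,
 * then release the counter globally once no domain uses it anymore.
 * One function serves both the per-domain and all-domain paths.
 */
static void unassign_cntr(struct cntr_state *s, struct mon_domain *dom,
			  unsigned int cntr_id)
{
	unsigned int bit = 1u << cntr_id;
	size_t i;

	if (dom) {
		dom->cntr_map &= ~bit;
	} else {
		for (i = 0; i < s->ndoms; i++)
			s->doms[i].cntr_map &= ~bit;
	}

	if (!mbm_cntr_is_assigned(s, cntr_id))
		s->used &= ~bit;
}
```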
--
- Babu Moger
* Re: [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
2024-08-23 20:29 ` Reinette Chatre
@ 2024-08-23 22:14 ` Moger, Babu
0 siblings, 0 replies; 96+ messages in thread
From: Moger, Babu @ 2024-08-23 22:14 UTC (permalink / raw)
To: Reinette Chatre, babu.moger, corbet, fenghua.yu, tglx, mingo, bp,
dave.hansen
Cc: x86, hpa, paulmck, rdunlap, tj, peterz, yanjiewtw, kim.phillips,
lukas.bulwahn, seanjc, jmattson, leitao, jpoimboe,
rick.p.edgecombe, kirill.shutemov, jithu.joseph, kai.huang,
kan.liang, daniel.sneddon, pbonzini, sandipan.das, ilpo.jarvinen,
peternewman, maciej.wieczor-retman, linux-doc, linux-kernel,
eranian, james.morse
Hi Reinette,
On 8/23/2024 3:29 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/21/24 6:31 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 8/16/24 16:28, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 8/6/24 3:00 PM, Babu Moger wrote:
>>>>
>>>> The feature adds the following interface files:
>>>>
>>>> /sys/fs/resctrl/info/L3_MON/mbm_mode: Reports the list of assignable
>>>> monitoring features supported. The enclosed brackets indicate which
>>>> feature is enabled.
>>>
>>> I've been considering this file as a generic file where all future "MBM
>>> modes" can be captured, while this series treats it as specific to
>>> "assignable monitoring features" (btw, should this be "assignable
>>> monitoring modes" to match the name?).
>>> Looking closer at this implementation it does make things easier that
>>> "mbm_mode" is specific to "assignable monitoring features" but when doing
>>> so I think it should have a less generic name to avoid the obstacles we
>>> have with the existing "mon_features".
>>> Apologies that this goes back to be close to what you had earlier ...
>>> maybe "mbm_assign_mode"?
>>
>> Let's see:
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>> [mbm_cntr_assign] <- This already says 'assign'. Isn't that enough?
>
> It will be enough if "mbm_mode" is intended to be used for all current
> and future MBM modes/features but this series instead dedicates this file
> to just "assignable monitoring counters" feature. Doing so prevents us
> from, in the future, expanding this file to, for example, contain
> a new entry representing a new feature.
>
>>
>> default <- Default mode is not related to assignable features.
>
> If not assignable features, what is it related to? "default" being the
> absence of assignable features still seems related to me.
>
>>
>> I would think mbm_mode is fine. Let me know.
>
> If this work is reworded so that it is intended to support any MBM mode then
> it is fine; if this work continues to dedicate this file to just assignable
> features then I think its name should be changed.
Ok. Will change it to "mbm_assign_mode".
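A sketch of how the renamed file could render its contents, with the active mode enclosed in brackets as described for this interface. The mode table and show_assign_mode() are hypothetical; the thread uses both "legacy" and "default" for the non-assignable mode, and "default" is only a placeholder here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mode table; final user-visible names are not settled. */
static const char *const assign_modes[] = { "mbm_cntr_assign", "default" };

/*
 * Write one mode per line into @buf, bracketing the entry at index
 * @active. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if @buf is too small.
 */
static int show_assign_mode(char *buf, size_t size, size_t active)
{
	size_t i, off = 0;
	int n;

	for (i = 0; i < sizeof(assign_modes) / sizeof(assign_modes[0]); i++) {
		n = snprintf(buf + off, size - off,
			     i == active ? "[%s]\n" : "%s\n", assign_modes[i]);
		if (n < 0 || (size_t)n >= size - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```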
>
> ...
>
>>>>
>>>> Flags can be one of the following:
>>>>
>>>> t MBM total event.
>>>> l MBM local event.
>>>> tl Both total and local MBM events.
>>>> _ None of the MBM events. Only works with '=' opcode.
>>>
>>> Please take care with the implementation that seems to support a variety
>>> of combinations. If I understand correctly the implementation supports
>>> flags like, for example, "tttt", "llll", "ltlt" ... those may not be an
>>> issue but of most concern is, for example, a pattern like "_lt" that
>>> (unexpectedly) appears to result in setting both total and local.
>>
>> Yes. Should we not allow flag combinations with "_"?
>> I am not very sure about how to go about this.
>>
>
> This topic seems to have moved to patch #22.
Yes, got it.
>
> ...
>
>>>> # echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>>> mbm_cntr_assign
>>>> [legacy]
>>>>
>>>> k. Unmount the resctrl
>>>> #umount /sys/fs/resctrl/
>>>> ---
>>>> v6:
>>>> We still need to finalize a few interface details on mbm_mode and
>>>> mbm_control in case of ABMC and Soft-ABMC. We can continue the
>>>> discussion with this series.
>>>
>>> Could you please list the details that need to be finalized?
>>
>> 1. mbm_mode display
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>> mbm_cntr_assign
>> [legacy]
>>
>> "mbm_cntr_assign"
>> Are we sticking with "mbm_cntr_assign" for ABMC?
>> What name should we use for soft-ABMC?
>>
>> 2. We also had some concerns about individual event assignment (ABMC)
>> and group assignment (soft-ABMC).
>> Are the flags 't' and 'l' good for both of these modes?
>
> If I remember correctly the previous discussion ended with the need for
> "modes" that indicate to user space what to expect when interacting with
> the MBM flags in the "mbm_control" file. The term used by ABMC should
> reflect that each MBM flag/event can be set independently, while the term
> used by soft-ABMC should reflect that setting one flag/event makes the
> same change to the other MBM flag/event.
>
Will add text to resctrl.rst to make this clear.
thanks
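The two semantics Reinette describes for mbm_control can be sketched as follows. The mode identifiers and apply_op() are hypothetical, since the user-visible mode names are still open: in per-event (ABMC) mode each flag is applied independently, while in per-group (soft-ABMC) mode any event flag is widened to both events, so changing one changes the other.

```c
#define ASSIGN_TOTAL (1 << 0)
#define ASSIGN_LOCAL (1 << 1)
#define ASSIGN_BOTH  (ASSIGN_TOTAL | ASSIGN_LOCAL)

/* Hypothetical mode identifiers. */
enum assign_mode {
	MODE_PER_EVENT,	/* ABMC: each MBM event assigned independently */
	MODE_PER_GROUP,	/* soft-ABMC: total and local change together */
};

/*
 * Apply an mbm_control operation @op ('=', '+' or '-') with parsed
 * @flags to the current state @cur. Returns the new state, or -1 for
 * an unknown operation.
 */
static int apply_op(enum assign_mode mode, int cur, char op, int flags)
{
	/* In per-group mode any event flag stands for both events. */
	if (mode == MODE_PER_GROUP && flags)
		flags = ASSIGN_BOTH;

	switch (op) {
	case '=':
		return flags;
	case '+':
		return cur | flags;
	case '-':
		return cur & ~flags;
	default:
		return -1;
	}
}
```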
--
- Babu Moger
end of thread
Thread overview: 96+ messages
2024-08-06 22:00 [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2024-08-06 22:00 ` [PATCH v6 01/22] x86/cpufeatures: Add support for " Babu Moger
2024-08-07 16:32 ` Thomas Gleixner
2024-08-08 14:46 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 02/22] x86/resctrl: Add ABMC feature in the command line options Babu Moger
2024-08-06 22:00 ` [PATCH v6 03/22] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
2024-08-16 21:29 ` Reinette Chatre
2024-08-19 14:46 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 04/22] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
2024-08-07 16:33 ` Thomas Gleixner
2024-08-16 21:30 ` Reinette Chatre
2024-08-19 15:37 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 05/22] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
2024-08-06 22:00 ` [PATCH v6 06/22] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
2024-08-16 16:29 ` James Morse
2024-08-16 20:38 ` Moger, Babu
2024-08-16 21:31 ` Reinette Chatre
2024-08-19 18:07 ` Moger, Babu
2024-08-20 18:17 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 07/22] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
2024-08-16 16:56 ` James Morse
2024-08-16 20:38 ` Moger, Babu
2024-08-16 21:32 ` Reinette Chatre
2024-08-19 19:27 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 08/22] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
2024-08-16 21:34 ` Reinette Chatre
2024-08-20 15:56 ` Moger, Babu
2024-08-20 18:08 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 09/22] x86/resctrl: Introduce MBM counters bitmap Babu Moger
2024-08-16 16:29 ` James Morse
2024-08-16 20:39 ` Moger, Babu
2024-08-16 21:35 ` Reinette Chatre
2024-08-19 15:49 ` Moger, Babu
2024-08-20 18:08 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 10/22] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg Babu Moger
2024-08-06 22:00 ` [PATCH v6 11/22] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
2024-08-16 21:36 ` Reinette Chatre
2024-08-20 16:19 ` Moger, Babu
2024-08-20 18:09 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 12/22] x86/resctrl: Introduce mbm_cntr_map to track counters at domain Babu Moger
2024-08-16 21:37 ` Reinette Chatre
2024-08-20 18:24 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 13/22] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
2024-08-16 21:38 ` Reinette Chatre
2024-08-20 20:56 ` Moger, Babu
2024-08-20 21:09 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 14/22] x86/resctrl: Introduce cntr_id in mongroup for assignments Babu Moger
2024-08-16 21:38 ` Reinette Chatre
2024-08-20 22:42 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 15/22] x86/resctrl: Add the interface to assign a hardware counter Babu Moger
2024-08-16 16:30 ` James Morse
2024-08-16 20:39 ` Moger, Babu
2024-08-16 21:41 ` Reinette Chatre
2024-08-21 15:04 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 16/22] x86/resctrl: Add the interface to unassign a MBM counter Babu Moger
2024-08-16 21:41 ` Reinette Chatre
2024-08-21 16:01 ` Moger, Babu
2024-08-23 20:18 ` Reinette Chatre
2024-08-23 22:05 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 17/22] x86/resctrl: Assign/unassign counters by default when ABMC is enabled Babu Moger
2024-08-16 21:42 ` Reinette Chatre
2024-08-21 17:20 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 18/22] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode Babu Moger
2024-08-16 21:42 ` Reinette Chatre
2024-08-21 17:30 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 19/22] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
2024-08-16 16:31 ` James Morse
2024-08-16 17:01 ` Reinette Chatre
2024-08-16 17:16 ` Peter Newman
2024-08-16 18:09 ` Reinette Chatre
2024-08-19 14:52 ` Reinette Chatre
2024-08-19 18:27 ` Peter Newman
2024-08-20 18:11 ` Reinette Chatre
2024-08-16 21:42 ` Reinette Chatre
2024-08-21 18:08 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 20/22] x86/resctrl: Enable AMD ABMC feature by default when supported Babu Moger
2024-08-16 16:32 ` James Morse
2024-08-16 20:40 ` Moger, Babu
2024-08-16 22:33 ` Reinette Chatre
2024-08-19 18:18 ` Moger, Babu
2024-08-20 18:12 ` Reinette Chatre
2024-08-20 20:04 ` Moger, Babu
2024-08-20 20:18 ` Moger, Babu
2024-08-20 20:37 ` Reinette Chatre
2024-08-06 22:00 ` [PATCH v6 21/22] x86/resctrl: Introduce interface to list monitor states of all the groups Babu Moger
2024-08-16 16:28 ` James Morse
2024-08-16 20:40 ` Moger, Babu
2024-08-06 22:00 ` [PATCH v6 22/22] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
2024-08-16 22:33 ` Reinette Chatre
2024-08-21 20:11 ` Moger, Babu
2024-08-23 20:18 ` Reinette Chatre
2024-08-23 22:04 ` Moger, Babu
2024-08-16 21:28 ` [PATCH v6 00/22] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
2024-08-22 1:31 ` Moger, Babu
2024-08-23 20:29 ` Reinette Chatre
2024-08-23 22:14 ` Moger, Babu