* [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
@ 2024-12-12 20:15 Babu Moger
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
This series adds support for Assignable Bandwidth Monitoring Counters
(ABMC), also called the QoS RMID Pinning feature.
The series is written so that other assignable features from different
vendors are easier to support.
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
The patches are based on top of commit
d803bbbb55f4f (tip/master) Merge branch into tip/master: 'x86/tdx'
# Introduction
Users can create as many monitor groups as there are RMIDs supported by the
hardware. However, the bandwidth monitoring feature on AMD systems only
guarantees that RMIDs currently assigned to a processor will be tracked by
hardware. The counters of any other RMIDs which are no longer being tracked
will be reset to zero. The MBM event counters return "Unavailable" for the
RMIDs that are not tracked by hardware. So, only a limited number of groups
can give guaranteed monitoring numbers. With ever-changing configurations
there is no way to know definitively which of these groups are being tracked
at a given point in time. Users do not have the option to monitor a group or
set of groups for a certain period of time without worrying about counters
being reset in between.
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as it is
assigned. The assigned RMID will be tracked by the hardware until the user
unassigns it manually. There is no need to worry about counters being reset
during this period. Additionally, the user can specify a bitmask identifying
the specific bandwidth types from the given source to track with the counter.
Without ABMC enabled, monitoring will work in the current 'default' mode
without the assignment option.
# Linux Implementation
Create a generic interface to support user space assignment
of scarce counters used for monitoring. The first user of the interface
is ABMC, with the option to extend it to "soft-ABMC" and MPAM
counters in the future.
The feature adds the following interface files:
/sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
monitoring features supported. The enclosed brackets indicate which
feature is enabled.
/sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
counters available for assignment.
/sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
counters free in each domain.
/sys/fs/resctrl/info/L3_MON/mbm_assign_control: Reports the resctrl group and monitor
status of each group. Assignment state can be updated by writing to the
interface.
# Examples
a. Check if ABMC support is available
# mount -t resctrl resctrl /sys/fs/resctrl/
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_assign]
default
ABMC feature is detected and it is enabled.
b. Check how many ABMC counters are available.
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
32
c. Check how many ABMC counters are available in each domain.
# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
0=30;1=30;
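As a rough illustration of how user space might consume the two counter files, the sketch below parses the per-domain string from available_mbm_cntrs and derives how many counters are in use. The helper names are hypothetical and not part of this series, and the sketch assumes that num_mbm_cntrs is the total for each domain; the sample values are the ones shown in steps b and c.

```python
def parse_domain_counts(text):
    """Parse a string such as "0=30;1=30;" into {domain_id: count}."""
    counts = {}
    for item in text.strip().split(";"):
        if not item:  # skip the empty piece after the trailing ';'
            continue
        dom, _, value = item.partition("=")
        counts[int(dom)] = int(value)
    return counts


def counters_in_use(total, available):
    """Return {domain_id: used}, assuming 'total' counters exist per domain."""
    return {dom: total - free for dom, free in available.items()}


free = parse_domain_counts("0=30;1=30;")
print(counters_in_use(32, free))  # prints {0: 2, 1: 2}
```

With only the default group present, two counters in use per domain would be consistent with that group's total and local events each holding a counter.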
d. Create a few resctrl groups.
# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
to list and modify any group's monitoring states. The file provides a single place
to list the monitoring states of all the resctrl groups. It makes it easier for
user space to learn about the used counters without needing to traverse all
the groups, thus reducing the number of file system calls.
The list follows the following format:
"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
Format for specific type of groups:
* Default CTRL_MON group:
"//<domain_id>=<flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id>=<flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id>=<flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
Flags can be one of the following:
t MBM total event is enabled.
l MBM local event is enabled.
tl Both total and local MBM events are enabled.
_ None of the MBM events are enabled
Examples:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=tl;
There are four groups and all the groups have local and total
events enabled on domains 0 and 1.
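The list format above can be split mechanically. The following is a minimal user-space sketch, not code from this series; the function name and the returned tuple layout are assumptions made for illustration.

```python
def parse_assign_line(line):
    """Split one mbm_assign_control line into (ctrl_group, mon_group, domains).

    An empty ctrl_group means the default CTRL_MON group and an empty
    mon_group means "no child MON group". 'domains' maps each domain id
    to the set of enabled flags; the "_" flag becomes an empty set.
    """
    # Group names cannot contain '/', so the first two '/' are separators.
    ctrl_group, mon_group, domain_part = line.strip().split("/", 2)
    domains = {}
    for item in domain_part.split(";"):
        if not item:  # skip the empty piece after the trailing ';'
            continue
        dom, _, flags = item.partition("=")
        domains[int(dom)] = set() if flags == "_" else set(flags)
    return ctrl_group, mon_group, domains


for sample in ("//0=tl;1=tl;", "/child_default_mon_grp/0=tl;1=tl;"):
    print(parse_assign_line(sample))
```

Returning sets keeps the flag order irrelevant, so "tl" and "lt" compare equal in this model.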
f. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
The write format is similar to the above list format, with the addition
of an opcode for the assignment operation.
"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
* Default CTRL_MON group:
"//<domain_id><opcode><flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id><opcode><flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id><opcode><flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
Opcode can be one of the following:
= Update the assignment to match the flags.
+ Assign a new MBM event without impacting existing assignments.
- Unassign an MBM event from currently assigned events.
Flags can be one of the following:
t MBM total event.
l MBM local event.
tl Both total and local MBM events.
_ None of the MBM events. Only works with '=' opcode. This flag cannot be combined with other flags.
Initial group status:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=tl;
To update the default group to enable only total event on domain 0:
# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=t;1=tl;
/child_default_mon_grp/0=tl;1=tl;
To update the MON group child_default_mon_grp to remove total event on domain 1:
# echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=t;1=tl;
/child_default_mon_grp/0=tl;1=l;
To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
remove both local and total events on domain 1:
# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
//0=t;1=tl;
/child_default_mon_grp/0=tl;1=l;
To update the default group to add a local event on domain 0:
# echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=l;
To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all
the MBM events on all the domains.
# echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=_;1=_;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=l;
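The effect of each opcode in the write examples above can be modeled with plain set operations. This is only a sketch of the documented semantics, not the kernel implementation; the function name is hypothetical, and rejecting an empty flag string is a choice made for this sketch rather than documented behavior.

```python
def apply_assignment(current, opcode, flags):
    """Return the new set of assigned events after one <opcode><flags> update.

    'current' is the set of flags assigned before the write,
    e.g. {"t", "l"} for a domain that reads back as "tl".
    """
    if flags == "_":
        # "_" means "no events" and is only accepted with '='.
        if opcode != "=":
            raise ValueError("'_' is only valid with the '=' opcode")
        return set()
    # "_" cannot be combined with other flags; only 't' and 'l' exist.
    if not flags or not set(flags) <= {"t", "l"}:
        raise ValueError(f"invalid flags: {flags!r}")
    requested = set(flags)
    if opcode == "=":
        return requested            # replace the assignment
    if opcode == "+":
        return current | requested  # add without touching existing events
    if opcode == "-":
        return current - requested  # remove from the assigned events
    raise ValueError(f"unknown opcode: {opcode!r}")


# Replays the "//0=t" and "//0+l" writes from the examples.
state = apply_assignment({"t", "l"}, "=", "t")
state = apply_assignment(state, "+", "l")
print(sorted(state))  # prints ['l', 't']
```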
g. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
There is no change in reading the events with ABMC. If the event is unassigned
when reading, then the read will come back as "Unassigned".
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
779247936
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
765207488
h. Check the bandwidth configuration for the group. Note that bandwidth
configuration has a domain scope. The total event defaults to 0x7F (to
count all the events) and the local event defaults to 0x15 (to count all
the local NUMA events). The event bitmap decoding is available at
https://www.kernel.org/doc/Documentation/x86/resctrl.rst
in section "mbm_total_bytes_config", "mbm_local_bytes_config":
# cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
0=0x7f;1=0x7f
# cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
0=0x15;1=0x15
i. Change the bandwidth source for domain 0 for the total event to count only reads.
Note that this change affects total events on domain 0.
# echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
# cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
0=0x33;1=0x7F
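To see why 0x33 selects only reads, the configuration values can be decoded bit by bit. The sketch below is illustrative only: the bit descriptions are reproduced from the "mbm_total_bytes_config" section of resctrl.rst as best recalled and should be checked against the kernel documentation, and the function name is hypothetical.

```python
# Bit meanings for mbm_total_bytes_config / mbm_local_bytes_config.
# Assumption: copied from resctrl.rst from memory; verify before relying on it.
EVENT_BITS = {
    0: "Reads to memory in the local NUMA domain",
    1: "Reads to memory in the non-local NUMA domain",
    2: "Non-temporal writes to local NUMA domain",
    3: "Non-temporal writes to non-local NUMA domain",
    4: "Reads to slow memory in the local NUMA domain",
    5: "Reads to slow memory in the non-local NUMA domain",
    6: "Dirty Victims from the QOS domain to all types of memory",
}


def decode_event_config(value):
    """Return the descriptions of all bits set in a config value."""
    return [name for bit, name in sorted(EVENT_BITS.items())
            if value & (1 << bit)]


for cfg in (0x7F, 0x15, 0x33):
    print(hex(cfg))
    for name in decode_event_config(cfg):
        print("   ", name)
```

Under this table, 0x15 sets bits 0, 2 and 4 (the local NUMA events) and 0x33 sets bits 0, 1, 4 and 5 (all read events).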
j. Now read the total event again. The first read will come back with "Unavailable"
status. The subsequent read of mbm_total_bytes will display only the read events.
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
Unavailable
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
314101
k. Users will have the option to go back to 'default' mbm_assign_mode if required.
This can be done using the following command. Note that switching the
mbm_assign_mode will reset all the MBM counters of all resctrl groups.
# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
mbm_cntr_assign
[default]
l. Unmount resctrl.
# umount /sys/fs/resctrl/
---
v10:
Major change is related to domain specific assignment.
Added struct mbm_cntr_cfg inside mon domains. This will handle
the domain specific assignments as discussed below.
https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
I did not see the need to add cntr_id in mbm_state structure. Not used in the code.
Following patches take care of these changes.
Patch 12, 13, 15, 16, 17, 18.
Added __init attribute to cache_alloc_hsw_probe(). Followed function
prototype rules (preferred order is storage class before return type).
Moved the mon_config_info structure definition to resctrl.h
Added call resctrl_arch_reset_rmid() to reset the RMID in the domain inside IPI call
resctrl_abmc_config_one_amd.
SMP and non-SMP call support is not required in resctrl_arch_config_cntr with new
domain specific assign approach/data structure.
Assigned the counter before exposing the event files.
Moved the call rdtgroup_assign_cntrs() inside mkdir_rdt_prepare_rmid_alloc().
This is called for both CTRL_MON and MON group creation.
Call mbm_cntr_reset() when unmounted to clear all the assignments.
Fixed the issue with finding the domain in multiple iterations in rdtgroup_process_flags().
Printed full error message with domain information when assign fails.
Taken care of other text comments in all the patches. Patch specific changes are in each patch.
If I missed something, please point it out; it is not intentional.
v9:
Patch 14 is a new addition.
Major change in patch 24.
Moved the fix patch to address __init attribute to the beginning of the series.
Fixed all the call sequences. Added additional Fixes tags.
Added Reviewed-by where applicable.
Took care of a couple of minor merge conflicts with latest code.
Re-ordered the MSR in a couple of instances.
Added available_mbm_cntrs (patch 14) to print the number of counters in a domain.
Used MBM_EVENT_ARRAY_INDEX macro to get the event index.
Introduced rdtgroup_cntr_id_init() to initialize the cntr_id
Introduced new function resctrl_config_cntr to assign the counter, update
the bitmap and reset the architectural state.
Taken care of error handling (freeing the counter) when assignment fails.
Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to return void.
Updated couple of rdtgroup_unassign_cntrs() calls properly.
Fixed a problem with changing the mode to mbm_cntr_assign mode when it is
not supported. Added extra checks to detect if the system supports it.
https://lore.kernel.org/lkml/03b278b5-6c15-4d09-9ab7-3317e84a409e@intel.com/
As discussed in the above comment, introduced resctrl_mon_event_config_set to
handle IPI. But sending another IPI inside IPI causes problem. Kernel
reports SMP warning. So, introduced resctrl_arch_update_cntr() to send the
command directly.
Fixed handling of special cases '//0=' and '//'.
Removed extra strstr() call in rdtgroup_mbm_assign_control_write().
Added generic failure text when assignment operation fails.
Corrected user documentation format texts.
v8:
Patches are getting into final stages.
Couple of changes in Patch 8, Patch 19 and Patch 23.
Most of the other changes are related to rename and text message updates.
Details are in each patch. Here is the summary.
Added __init attribute to dom_data_init() in patch 8/25.
Moved the mbm_cntrs_init() and mbm_cntrs_exit() functionality inside
dom_data_init() and dom_data_exit() respectively.
Renamed resctrl_mbm_evt_config_init() to arch_mbm_evt_config_init()
Renamed resctrl_arch_event_config_get() to resctrl_arch_mon_event_config_get().
resctrl_arch_event_config_set() to resctrl_arch_mon_event_config_set().
Rename resctrl_arch_assign_cntr to resctrl_arch_config_cntr.
Renamed rdtgroup_assign_cntr() to rdtgroup_assign_cntr_event().
Added the code to return the error if rdtgroup_assign_cntr_event fails.
Moved definition of MBM_EVENT_ARRAY_INDEX to resctrl/internal.h.
Renamed rdtgroup_mbm_cntr_is_assigned to mbm_cntr_assigned_to_domain
Added return error handling in resctrl_arch_config_cntr().
Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
Fixed the problem with unassigning the child MON groups of CTRL_MON group.
Reset the internal counters after mbm_cntr_assign mode is changed.
Renamed rdtgroup_mbm_cntr_reset() to mbm_cntr_reset()
Renamed resctrl_arch_mbm_cntr_assign_configure to
resctrl_arch_mbm_cntr_assign_set_one.
Used the same IPI as event update to modify the assignment.
Could not do the way we discussed in the thread.
https://lore.kernel.org/lkml/f77737ac-d3f6-3e4b-3565-564f79c86ca8@amd.com/
Needed to figure out event type to update the configuration.
Moved unassign first and assign during the assign modification.
Assign none "_" takes priority. Cannot be mixed with other flags.
Updated the documentation and .rst file format. htmldoc looks ok.
v7:
Major changes are related to FS and arch codes separation.
Changed few interface names based on feedback.
Here is the summary; each patch contains the changes specific to that patch.
Removed WARN_ON for num_mbm_cntrs. Decided to dynamically allocate the bitmap.
WARN_ON is not required anymore.
Renamed the function resctrl_arch_get_abmc_enabled() to resctrl_arch_mbm_cntr_assign_enabled().
Merged resctrl_arch_mbm_cntr_assign_enable, resctrl_arch_mbm_cntr_assign_disable
and renamed to resctrl_arch_mbm_cntr_assign_set(). Passed the struct rdt_resource
to these functions.
Removed resctrl_arch_reset_rmid_all() from arch code. This will be done by the FS caller.
Updated the descriptions/commit log in resctrl.rst to generic text. Removed ABMC references.
Renamed mbm_mode to mbm_assign_mode.
Renamed mbm_control to mbm_assign_control.
Introduced mutex lock in rdtgroup_mbm_mode_show().
The 'legacy' mode is called 'default' mode.
Removed the static allocation and now allocating bitmap mbm_cntr_free_map dynamically.
Merged rdtgroup_assign_cntr(), rdtgroup_alloc_cntr() into one.
Merged rdtgroup_unassign_cntr(), rdtgroup_free_cntr() into one.
Added struct rdt_resource to the interface functions resctrl_arch_assign_cntr ()
and resctrl_arch_unassign_cntr().
Rename rdtgroup_abmc_cfg() to resctrl_abmc_config_one_amd().
Added a new patch to fix counter assignment on event config changes.
Removed the references of ABMC from user interfaces.
Simplified the parsing (strsep(&token, "//")) in rdtgroup_mbm_assign_control_write().
Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
Thomas Gleixner asked us to update https://gitlab.com/x86-cpuid.org/x86-cpuid-db.
It needs internal approval. We are working on it.
v6:
We still need to finalize few interface details on mbm_assign_mode and mbm_assign_control
in case of ABMC and Soft-ABMC. We can continue the discussion with this series.
Added support for domain-id '*' to update all the domains at once.
Fixed assign interface to allocate the counter if counter is
not assigned.
Fixed unassign interface to free the counter if the counter is not
assigned in any of the domains.
Renamed abmc_capable to mbm_cntr_assignable.
Renamed abmc_enabled to mbm_cntr_assign_enabled.
Used msr_set_bit and msr_clear_bit for msr updates.
Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().
Changed the display name from num_cntrs to num_mbm_cntrs.
Removed the variable mbm_cntrs_free_map_len. This is not required.
Removed the call mbm_cntrs_init() in arch code. This needs to be done at higher level.
Used DECLARE_BITMAP to initialize mbm_cntrs_free_map.
Removed unused config value definitions.
Introduced mbm_cntr_map to track counters at domain level. With this
we don't need to send MSR read to read the counter configuration.
Separated all the counter id management to upper level in FS code.
Added checks to detect "Unassigned" before reading the RMID.
More details in each patch.
v5:
Rebase changes (because of SNC support)
Interface changes.
/sys/fs/resctrl/mbm_assign to /sys/fs/resctrl/mbm_assign_mode.
/sys/fs/resctrl/mbm_assign_control to /sys/fs/resctrl/mbm_assign_control.
Added few arch specific routines.
resctrl_arch_get_abmc_enabled.
resctrl_arch_abmc_enable.
resctrl_arch_abmc_disable.
Few renames
num_cntrs_free_map -> mbm_cntrs_free_map
num_cntrs_init -> mbm_cntrs_init
arch_domain_mbm_evt_config -> resctrl_arch_mbm_evt_config
Introduced resctrl_arch_event_config_get and
resctrl_arch_event_config_set() to update event configuration.
Removed mon_state field mongroup. Added MON_CNTR_UNSET to initialize counters.
Renamed ctr_id to cntr_id for the hardware counter.
Report "Unassigned" in case the user attempts to read the events without assigning the counter.
ABMC is enabled during the boot up. Can be enabled or disabled later.
Fixed opcode and flags combination.
"=_" is valid.
"-_" and "+_" are not valid.
Added all the comments as far as I know. If I missed something, it is not intentional.
v4:
Main change is domain specific event assignment.
Kept the ABMC feature as a default.
Dynamic switching between ABMC and mbm_legacy is still allowed.
We are still not clear about mount option.
Moved the monitoring related data in resctrl_mon structure from rdt_resource.
Fixed the display of legacy and ABMC mode.
Used bitmap APIs when possible.
Removed event configuration read from MSRs. We can use the
internal saved data.(patch 12)
Added more comments about L3_QOS_ABMC_CFG MSR.
Added IPIs to read the assignment status for each domain (patch 18 and 19)
More details in each patch.
v3:
This series adds the support for global assignment mode discussed in
the thread. https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
Removed the individual assignment mode and included the global assignment interface.
Added following interface files.
a. /sys/fs/resctrl/info/L3_MON/mbm_assign
Used for displaying the current assignment mode and switch between
ABMC and legacy mode.
b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Used for listing the groups' assignment mode and modifying the assignment states.
c. Most of the changes are related to the new interface.
d. Addressed the comments from Reinette, James and Peter.
e. Hope I have addressed most of the major feedbacks discussed. If I missed
something then it is not intentional. Please feel free to comment.
f. Sending this as an RFC as per Reinette's comment. So, this is still open
for discussion.
v2:
a. Major change is the way ABMC is enabled. Earlier, user needed to remount
with -o abmc to enable ABMC feature. Removed that option now.
Now users can enable ABMC by "$echo 1 to /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
b. Added new word 21 to x86/cpufeatures.h.
c. Display unsupported if user attempts to read the events when ABMC is enabled
and event is not assigned.
d. Display monitor_state as "Unsupported" when ABMC is disabled.
e. Text updates and rebase to latest tip tree (as of Jan 18).
f. This series is still work in progress. I am yet to hear from ARM developers.
v9: https://lore.kernel.org/lkml/cover.1730244116.git.babu.moger@amd.com/
v8: https://lore.kernel.org/lkml/cover.1728495588.git.babu.moger@amd.com/
v7: https://lore.kernel.org/lkml/cover.1725488488.git.babu.moger@amd.com/
v6: https://lore.kernel.org/lkml/cover.1722981659.git.babu.moger@amd.com/
v5: https://lore.kernel.org/lkml/cover.1720043311.git.babu.moger@amd.com/
v4: https://lore.kernel.org/lkml/cover.1716552602.git.babu.moger@amd.com/
v3: https://lore.kernel.org/lkml/cover.1711674410.git.babu.moger@amd.com/
v2: https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
v1: https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
Babu Moger (24):
x86/resctrl: Add __init attribute to functions called from
resctrl_late_init()
x86/cpufeatures: Add support for Assignable Bandwidth Monitoring
Counters (ABMC)
x86/resctrl: Add ABMC feature in the command line options
x86/resctrl: Consolidate monitoring related data from rdt_resource
x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
x86/resctrl: Add support to enable/disable AMD ABMC feature
x86/resctrl: Introduce the interface to display monitor mode
x86/resctrl: Introduce interface to display number of monitoring
counters
x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct
rdt_hw_mon_domain
x86/resctrl: Remove MSR reading of event configuration value
x86/resctrl: Introduce cntr_cfg to track assignable counters at domain
x86/resctrl: Introduce interface to display number of free counters
x86/resctrl: Add data structures and definitions for ABMC assignment
x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter
with ABMC
x86/resctrl: Add interface to the assign counter
x86/resctrl: Add the interface to unassign a counter
x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is
enabled
x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign
mode
x86/resctrl: Introduce the interface to switch between monitor modes
x86/resctrl: Configure mbm_cntr_assign mode if supported
x86/resctrl: Update assignments on event configuration changes
x86/resctrl: Introduce interface to list assignment states of all the
groups
x86/resctrl: Introduce interface to modify assignment states of the
groups
.../admin-guide/kernel-parameters.txt | 2 +-
Documentation/arch/x86/resctrl.rst | 233 +++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/kernel/cpu/cpuid-deps.c | 3 +
arch/x86/kernel/cpu/resctrl/core.c | 27 +-
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 8 +
arch/x86/kernel/cpu/resctrl/internal.h | 78 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 68 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 971 ++++++++++++++++--
arch/x86/kernel/cpu/scattered.c | 1 +
include/linux/resctrl.h | 53 +-
12 files changed, 1355 insertions(+), 92 deletions(-)
--
2.34.1
* [PATCH v10 01/24] x86/resctrl: Add __init attribute to functions called from resctrl_late_init()
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
resctrl_late_init() has the __init attribute, but some of the functions
called from it do not have the __init attribute.
Add the __init attribute to all the functions in the call sequences to
maintain consistency throughout.
Fixes: 6a445edce657 ("x86/intel_rdt/cqm: Add RDT monitoring initialization")
Fixes: def10853930a ("x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP)")
Fixes: bd334c86b5d7 ("x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Text changes.
Added __init attribute to cache_alloc_hsw_probe()
Followed function prototype rules (preferred order is storage
class before return type).
v9: Moved the patch to the beginning of the series.
Fixed all the call sequences. Added additional Fixes tags.
v8: New patch.
---
arch/x86/kernel/cpu/resctrl/core.c | 10 +++++-----
arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 4 ++--
3 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b681c2e07dbf..62f2c5bbe2c6 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -145,7 +145,7 @@ u32 resctrl_arch_system_num_rmid_idx(void)
* is always 20 on hsw server parts. The minimum cache bitmask length
* allowed for HSW server is always 2 bits. Hardcode all of them.
*/
-static inline void cache_alloc_hsw_probe(void)
+static inline __init void cache_alloc_hsw_probe(void)
{
struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_L3];
struct rdt_resource *r = &hw_res->r_resctrl;
@@ -275,7 +275,7 @@ static __init bool __rdt_get_mem_config_amd(struct rdt_resource *r)
return true;
}
-static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
+static __init void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
union cpuid_0x10_1_eax eax;
@@ -294,7 +294,7 @@ static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
r->alloc_capable = true;
}
-static void rdt_get_cdp_config(int level)
+static __init void rdt_get_cdp_config(int level)
{
/*
* By default, CDP is disabled. CDP can be enabled by mount parameter
@@ -304,12 +304,12 @@ static void rdt_get_cdp_config(int level)
rdt_resources_all[level].r_resctrl.cdp_capable = true;
}
-static void rdt_get_cdp_l3_config(void)
+static __init void rdt_get_cdp_l3_config(void)
{
rdt_get_cdp_config(RDT_RESOURCE_L3);
}
-static void rdt_get_cdp_l2_config(void)
+static __init void rdt_get_cdp_l2_config(void)
{
rdt_get_cdp_config(RDT_RESOURCE_L2);
}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 955999aecfca..16181b90159a 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -627,7 +627,7 @@ int closids_supported(void);
void closid_free(int closid);
int alloc_rmid(u32 closid);
void free_rmid(u32 closid, u32 rmid);
-int rdt_get_mon_l3_config(struct rdt_resource *r);
+int __init rdt_get_mon_l3_config(struct rdt_resource *r);
void __exit rdt_put_mon_l3_config(void);
bool __init rdt_cpu_has(int flag);
void mon_event_count(void *info);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 5fcb3d635d91..daf2e8c03d86 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -983,7 +983,7 @@ void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_
schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
}
-static int dom_data_init(struct rdt_resource *r)
+static __init int dom_data_init(struct rdt_resource *r)
{
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
u32 num_closid = resctrl_arch_get_num_closid(r);
@@ -1081,7 +1081,7 @@ static struct mon_evt mbm_local_event = {
* because as per the SDM the total and local memory bandwidth
* are enumerated as part of L3 monitoring.
*/
-static void l3_mon_evt_init(struct rdt_resource *r)
+static __init void l3_mon_evt_init(struct rdt_resource *r)
{
INIT_LIST_HEAD(&r->evt_list);
--
2.34.1
* [PATCH v10 02/24] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
Users can create as many monitor groups as there are RMIDs supported by the
hardware. However, the bandwidth monitoring feature on AMD systems only
guarantees that RMIDs currently assigned to a processor will be tracked by
hardware. The counters of any other RMIDs which are no longer being tracked
will be reset to zero. The MBM event counters return "Unavailable" for the
RMIDs that are not tracked by hardware. So, only a limited number of groups
can give guaranteed monitoring numbers. With ever-changing configurations
there is no way to know definitively which of these groups are being tracked
at a given point in time. Users do not have the option to monitor a group or
set of groups for a certain period of time without worrying about the RMID
being reset in between.
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as it is
assigned. The assigned RMID will be tracked by the hardware until the user
unassigns it manually. There is no need to worry about counters being reset
during this period. Additionally, the user can specify a bitmask
identifying the specific bandwidth types from the given source to track
with the counter.
When ABMC is not enabled, monitoring works in the current mode, without the
assignment option.
The Linux resctrl subsystem provides the interface to count a maximum of
two memory bandwidth events per group, from a combination of available
total and local events. Keeping the current interface, users can enable a
maximum of 2 ABMC counters per group. Users will also have the option to
enable only one counter for the group. If the system runs out of assignable
ABMC counters, the kernel will display an error. Users need to disable an
already enabled counter to make space for new assignments.
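The assign/unassign behaviour described above can be modelled in user space
roughly as follows. This is only a sketch and not the code from this series:
the pool size, struct cntr_slot, cntr_assign() and cntr_unassign() are
hypothetical names chosen for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical pool size; real hardware reports its own counter count. */
#define SKETCH_NUM_CNTRS 2

/* One assignable counter, bound to an (RMID, event) pair while in use. */
struct cntr_slot {
	int in_use;
	uint32_t rmid;
	int evtid;
};

static struct cntr_slot cntr_pool[SKETCH_NUM_CNTRS];

/* Bind a free counter to (rmid, evtid); return its index or -ENOSPC. */
static int cntr_assign(uint32_t rmid, int evtid)
{
	for (int i = 0; i < SKETCH_NUM_CNTRS; i++) {
		if (!cntr_pool[i].in_use) {
			cntr_pool[i].in_use = 1;
			cntr_pool[i].rmid = rmid;
			cntr_pool[i].evtid = evtid;
			return i;
		}
	}
	/* Pool exhausted: the caller must unassign something first. */
	return -ENOSPC;
}

/* Release the counter bound to (rmid, evtid); return 0 or -ENOENT. */
static int cntr_unassign(uint32_t rmid, int evtid)
{
	for (int i = 0; i < SKETCH_NUM_CNTRS; i++) {
		if (cntr_pool[i].in_use && cntr_pool[i].rmid == rmid &&
		    cntr_pool[i].evtid == evtid) {
			cntr_pool[i].in_use = 0;
			return 0;
		}
	}
	return -ENOENT;
}
```

With a pool of two, a third cntr_assign() fails with -ENOSPC until one of
the earlier pairs is released with cntr_unassign().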
The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
Bits    Description
5       ABMC (Assignable Bandwidth Monitoring Counters)
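As an illustration of the bit layout, the check can be written against a raw
EBX value. This is a sketch under stated assumptions: the helper names are
hypothetical, the BMEC bit position (3) is taken from the scattered.c hunk
in this patch, and reading EBX itself (for example with __get_cpuid_count()
from the compiler's <cpuid.h>, leaf 0x80000020, subleaf 0) is left to the
caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID Fn8000_0020 EBX, subleaf 0. */
#define SKETCH_BMEC_EBX_BIT 3	/* from the scattered.c table in this patch */
#define SKETCH_ABMC_EBX_BIT 5	/* from the commit message */

/* True if the raw EBX value advertises ABMC. */
static bool ebx_has_abmc(uint32_t ebx)
{
	return (ebx >> SKETCH_ABMC_EBX_BIT) & 1u;
}

/*
 * Partial model of the cpuid-deps.c entries added here: ABMC is only
 * treated as usable when BMEC is advertised as well. The MBM total and
 * local dependencies live in a different CPUID leaf and are not modelled.
 */
static bool ebx_abmc_with_bmec(uint32_t ebx)
{
	bool bmec = (ebx >> SKETCH_BMEC_EBX_BIT) & 1u;

	return ebx_has_abmc(ebx) && bmec;
}
```

Keeping the decode separate from the CPUID read makes the bit logic easy to
check with fixed values on any machine.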
The feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
Note: Checkpatch checks/warnings are ignored to maintain coding style.
v10: No changes.
v9: Took care of a couple of minor merge conflicts. No other changes.
v8: No changes.
v7: Removed "" from feature flags. Not required anymore.
https://lore.kernel.org/lkml/20240817145058.GCZsC40neU4wkPXeVR@fat_crate.local/
v6: Added Reinette's Reviewed-by. Moved the Checkpatch note below ---.
v5: Minor rebase change and subject line update.
v4: Changes because of rebase. Feature word 21 has few more additions now.
Changed the text to "tracked by hardware" instead of active.
v3: Change because of rebase. Actual patch did not change.
v2: Added dependency on X86_FEATURE_BMEC.
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 3 +++
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 5 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f725ccc77b01..662c334662ba 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -479,6 +479,7 @@
#define X86_FEATURE_AMD_FAST_CPPC (21*32 + 5) /* Fast CPPC */
#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
+#define X86_FEATURE_ABMC (21*32 + 8) /* Assignable Bandwidth Monitoring Counters */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 8bd84114c2d9..7e4d63b381d6 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -70,6 +70,9 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
{ X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
{ X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
+ { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_TOTAL },
+ { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_LOCAL },
+ { X86_FEATURE_ABMC, X86_FEATURE_BMEC },
{ X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL },
{ X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
{ X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..3b72b72270f1 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_MBA, CPUID_EBX, 6, 0x80000008, 0 },
{ X86_FEATURE_SMBA, CPUID_EBX, 2, 0x80000020, 0 },
{ X86_FEATURE_BMEC, CPUID_EBX, 3, 0x80000020, 0 },
+ { X86_FEATURE_ABMC, CPUID_EBX, 5, 0x80000020, 0 },
{ X86_FEATURE_AMD_WORKLOAD_CLASS, CPUID_EAX, 22, 0x80000021, 0 },
{ X86_FEATURE_PERFMON_V2, CPUID_EAX, 0, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
--
2.34.1
* [PATCH v10 03/24] x86/resctrl: Add ABMC feature in the command line options
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2024-12-12 20:15 ` [PATCH v10 01/24] x86/resctrl: Add __init attribute to functions called from resctrl_late_init() Babu Moger
2024-12-12 20:15 ` [PATCH v10 02/24] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-12 20:15 ` [PATCH v10 04/24] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
` (20 subsequent siblings)
23 siblings, 0 replies; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
Add the command line option to enable or disable exposing the ABMC
(Assignable Bandwidth Monitoring Counters) hardware feature to resctrl.
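For readers unfamiliar with the rdt= syntax, the on/off handling can be
sketched in user space as below. This is not the kernel's parser: the
struct, the table and parse_rdt_sketch() are hypothetical, and only the
option names and the "!" prefix convention come from the documentation
touched by this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-option state, loosely modelled on an options table. */
struct rdt_opt_sketch {
	const char *name;
	bool force_on;
	bool force_off;
};

static struct rdt_opt_sketch rdt_opts_sketch[] = {
	{ "bmec", false, false },
	{ "abmc", false, false },
	{ "mba",  false, false },
};

/* Parse a comma-separated list; a leading '!' turns an option off. */
static void parse_rdt_sketch(const char *str)
{
	size_t n = sizeof(rdt_opts_sketch) / sizeof(rdt_opts_sketch[0]);

	while (*str) {
		size_t len = strcspn(str, ",");	/* length of this token */
		bool off = (*str == '!');
		const char *tok = off ? str + 1 : str;
		size_t toklen = off ? len - 1 : len;

		for (size_t i = 0; i < n; i++) {
			struct rdt_opt_sketch *o = &rdt_opts_sketch[i];

			if (strlen(o->name) == toklen &&
			    strncmp(tok, o->name, toklen) == 0) {
				if (off)
					o->force_off = true;
				else
					o->force_on = true;
				break;
			}
		}

		str += len;
		if (*str == ',')
			str++;	/* skip the separator */
	}
}
```

Passing "abmc,!mba" to parse_rdt_sketch() marks abmc as forced on and mba as
forced off, and leaves bmec untouched.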
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v10: No changes.
v9: No code changes. Added Reviewed-by.
v8: Commit message update.
v7: No changes
v6: No changes
v5: No changes
v4: No changes
v3: No changes
v2: No changes
---
Documentation/admin-guide/kernel-parameters.txt | 2 +-
Documentation/arch/x86/resctrl.rst | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7d427d0a4a1a..f22d367290aa 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5854,7 +5854,7 @@
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
- mba, smba, bmec.
+ mba, smba, bmec, abmc.
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a824affd741d..30586728a4cd 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -26,6 +26,7 @@ MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
MBA (Memory Bandwidth Allocation) "mba"
SMBA (Slow Memory Bandwidth Allocation) ""
BMEC (Bandwidth Monitoring Event Configuration) ""
+ABMC (Assignable Bandwidth Monitoring Counters) ""
=============================================== ================================
Historically, new features were made visible by default in /proc/cpuinfo. This
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 62f2c5bbe2c6..b3861f0e5857 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -809,6 +809,7 @@ enum {
RDT_FLAG_MBA,
RDT_FLAG_SMBA,
RDT_FLAG_BMEC,
+ RDT_FLAG_ABMC,
};
#define RDT_OPT(idx, n, f) \
@@ -834,6 +835,7 @@ static struct rdt_options rdt_options[] __initdata = {
RDT_OPT(RDT_FLAG_MBA, "mba", X86_FEATURE_MBA),
RDT_OPT(RDT_FLAG_SMBA, "smba", X86_FEATURE_SMBA),
RDT_OPT(RDT_FLAG_BMEC, "bmec", X86_FEATURE_BMEC),
+ RDT_OPT(RDT_FLAG_ABMC, "abmc", X86_FEATURE_ABMC),
};
#define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
--
2.34.1
* [PATCH v10 04/24] x86/resctrl: Consolidate monitoring related data from rdt_resource
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (2 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 03/24] x86/resctrl: Add ABMC feature in the command line options Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-12 20:15 ` [PATCH v10 05/24] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
` (19 subsequent siblings)
23 siblings, 0 replies; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
The cache allocation and memory bandwidth allocation feature properties
are consolidated into struct resctrl_cache and struct resctrl_membw
respectively.
In preparation for more monitoring properties that will clobber the
existing resource struct more, re-organize the monitoring specific
properties to also be in a separate structure.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v10: No changes.
v9: No changes.
v8: Added Reviewed-by from Reinette. No other changes.
v7: Added kernel doc for data structure. Minor text update.
v6: Update commit message and update kernel doc for rdt_resource.
v5: Commit message update.
Also changes related to data structure updates due to SNC support.
v4: New patch.
---
arch/x86/kernel/cpu/resctrl/core.c | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 18 +++++++++---------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
include/linux/resctrl.h | 16 ++++++++++++----
4 files changed, 27 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b3861f0e5857..ff4dc649e35c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -124,7 +124,7 @@ u32 resctrl_arch_system_num_rmid_idx(void)
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
- return r->num_rmid;
+ return r->mon.num_rmid;
}
/*
@@ -625,7 +625,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
arch_mon_domain_online(r, d);
- if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
mon_domain_free(hw_dom);
return;
}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index daf2e8c03d86..ec2a237321dc 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -222,7 +222,7 @@ static int logical_rmid_to_physical_rmid(int cpu, int lrmid)
if (snc_nodes_per_l3_cache == 1)
return lrmid;
- return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
+ return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->mon.num_rmid;
}
static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
@@ -297,11 +297,11 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
if (is_mbm_total_enabled())
memset(hw_dom->arch_mbm_total, 0,
- sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
+ sizeof(*hw_dom->arch_mbm_total) * r->mon.num_rmid);
if (is_mbm_local_enabled())
memset(hw_dom->arch_mbm_local, 0,
- sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+ sizeof(*hw_dom->arch_mbm_local) * r->mon.num_rmid);
}
static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
@@ -1083,14 +1083,14 @@ static struct mon_evt mbm_local_event = {
*/
static __init void l3_mon_evt_init(struct rdt_resource *r)
{
- INIT_LIST_HEAD(&r->evt_list);
+ INIT_LIST_HEAD(&r->mon.evt_list);
if (is_llc_occupancy_enabled())
- list_add_tail(&llc_occupancy_event.list, &r->evt_list);
+ list_add_tail(&llc_occupancy_event.list, &r->mon.evt_list);
if (is_mbm_total_enabled())
- list_add_tail(&mbm_total_event.list, &r->evt_list);
+ list_add_tail(&mbm_total_event.list, &r->mon.evt_list);
if (is_mbm_local_enabled())
- list_add_tail(&mbm_local_event.list, &r->evt_list);
+ list_add_tail(&mbm_local_event.list, &r->mon.evt_list);
}
/*
@@ -1187,7 +1187,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
- r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
+ r->mon.num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
@@ -1202,7 +1202,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
*
* For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
*/
- threshold = resctrl_rmid_realloc_limit / r->num_rmid;
+ threshold = resctrl_rmid_realloc_limit / r->mon.num_rmid;
/*
* Because num_rmid may not be a power of two, round the value
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d906a1cd8491..1647ad9145ef 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1097,7 +1097,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
{
struct rdt_resource *r = of->kn->parent->priv;
- seq_printf(seq, "%d\n", r->num_rmid);
+ seq_printf(seq, "%d\n", r->mon.num_rmid);
return 0;
}
@@ -1108,7 +1108,7 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
struct rdt_resource *r = of->kn->parent->priv;
struct mon_evt *mevt;
- list_for_each_entry(mevt, &r->evt_list, list) {
+ list_for_each_entry(mevt, &r->mon.evt_list, list) {
seq_printf(seq, "%s\n", mevt->name);
if (mevt->configurable)
seq_printf(seq, "%s_config\n", mevt->name);
@@ -3057,13 +3057,13 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
struct mon_evt *mevt;
int ret;
- if (WARN_ON(list_empty(&r->evt_list)))
+ if (WARN_ON(list_empty(&r->mon.evt_list)))
return -EPERM;
priv.u.rid = r->rid;
priv.u.domid = do_sum ? d->ci->id : d->hdr.id;
priv.u.sum = do_sum;
- list_for_each_entry(mevt, &r->evt_list, list) {
+ list_for_each_entry(mevt, &r->mon.evt_list, list) {
priv.u.evtid = mevt->evtid;
ret = mon_addfile(kn, mevt->name, priv.priv);
if (ret)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d94abba1c716..3c2307c7c106 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -182,16 +182,26 @@ enum resctrl_scope {
RESCTRL_L3_NODE,
};
+/**
+ * struct resctrl_mon - Monitoring related data of a resctrl resource
+ * @num_rmid: Number of RMIDs available
+ * @evt_list: List of monitoring events
+ */
+struct resctrl_mon {
+ int num_rmid;
+ struct list_head evt_list;
+};
+
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
- * @num_rmid: Number of RMIDs available
* @ctrl_scope: Scope of this resource for control functions
* @mon_scope: Scope of this resource for monitor functions
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
+ * @mon: Monitoring related data.
* @ctrl_domains: RCU list of all control domains for this resource
* @mon_domains: RCU list of all monitor domains for this resource
* @name: Name to use in "schemata" file.
@@ -199,7 +209,6 @@ enum resctrl_scope {
* @default_ctrl: Specifies default cache cbm or memory B/W percent.
* @format_str: Per resource format string to show domain value
* @parse_ctrlval: Per resource function pointer to parse control values
- * @evt_list: List of monitoring events
* @fflags: flags to choose base and info files
* @cdp_capable: Is the CDP feature available on this resource
*/
@@ -207,11 +216,11 @@ struct rdt_resource {
int rid;
bool alloc_capable;
bool mon_capable;
- int num_rmid;
enum resctrl_scope ctrl_scope;
enum resctrl_scope mon_scope;
struct resctrl_cache cache;
struct resctrl_membw membw;
+ struct resctrl_mon mon;
struct list_head ctrl_domains;
struct list_head mon_domains;
char *name;
@@ -221,7 +230,6 @@ struct rdt_resource {
int (*parse_ctrlval)(struct rdt_parse_data *data,
struct resctrl_schema *s,
struct rdt_ctrl_domain *d);
- struct list_head evt_list;
unsigned long fflags;
bool cdp_capable;
};
--
2.34.1
* [PATCH v10 05/24] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (3 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 04/24] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-12 20:15 ` [PATCH v10 06/24] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
` (18 subsequent siblings)
23 siblings, 0 replies; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
Bits    Description
15:0    MAX_ABMC Maximum Supported Assignable Bandwidth
        Monitoring Counter ID + 1
The feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Detect the feature and number of assignable monitoring counters supported.
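The counter count derivation used in the hunk below can be checked in
isolation. A minimal sketch, assuming the raw EBX value of CPUID
Fn8000_0020 subleaf 5 has already been read; abmc_num_cntrs() is a
hypothetical helper name, and the mask and the "+ 1" simply mirror what
this patch does.

```c
#include <stdint.h>

/*
 * Mirror of the computation in rdt_get_mon_l3_config() from this patch:
 * take bits 15:0 of EBX and add one to get the number of counters.
 */
static unsigned int abmc_num_cntrs(uint32_t ebx)
{
	return (ebx & 0xffffu) + 1;
}
```

Bits above 15 are ignored, so an EBX value of 0x1f yields 32 counters
regardless of what the upper half of the register contains.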
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v10: No changes.
v9: Added Reviewed-by tag. No code changes
v8: Used GENMASK for the mask.
v7: Removed WARN_ON for num_mbm_cntrs. Decided to dynamically allocate the
bitmap. WARN_ON is not required anymore.
Removed redundant comments.
v6: Commit message update.
Renamed abmc_capable to mbm_cntr_assignable.
v5: Name change num_cntrs to num_mbm_cntrs.
Moved abmc_capable to resctrl_mon.
v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We don't
need to separate this as arch code.
v3: Removed changes related to mon_features.
Moved rdt_cpu_has to core.c and added new function resctrl_arch_has_abmc.
Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
rdt_resource. (James)
v2: Changed the field name to mbm_assign_capable from abmc_capable.
---
arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++++++
include/linux/resctrl.h | 4 ++++
2 files changed, 10 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index ec2a237321dc..36df69da99f2 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1230,6 +1230,12 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
mbm_local_event.configurable = true;
mbm_config_rftype_init("mbm_local_bytes_config");
}
+
+ if (rdt_cpu_has(X86_FEATURE_ABMC)) {
+ r->mon.mbm_cntr_assignable = true;
+ cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
+ r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
+ }
}
l3_mon_evt_init(r);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 3c2307c7c106..511cfce8fc21 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -185,10 +185,14 @@ enum resctrl_scope {
/**
* struct resctrl_mon - Monitoring related data of a resctrl resource
* @num_rmid: Number of RMIDs available
+ * @num_mbm_cntrs: Number of assignable monitoring counters
+ * @mbm_cntr_assignable:Is system capable of supporting monitor assignment?
* @evt_list: List of monitoring events
*/
struct resctrl_mon {
int num_rmid;
+ int num_mbm_cntrs;
+ bool mbm_cntr_assignable;
struct list_head evt_list;
};
--
2.34.1
* [PATCH v10 06/24] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (4 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 05/24] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-12 20:15 ` [PATCH v10 07/24] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
` (17 subsequent siblings)
23 siblings, 0 replies; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
fflags for resctrl files.
Adding new files will involve adding another function to initialize
the fflags. This can be simplified by adding a new function
resctrl_file_fflags_init() and passing the file name and flags
to be initialized.
Consolidate fflags initialization into resctrl_file_fflags_init() and
remove thread_throttle_mode_init() and mbm_config_rftype_init().
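The consolidation amounts to a lookup by file name followed by a flag
store. A user-space sketch of that pattern is shown here; the struct, the
table, the flag values and the function names are hypothetical stand-ins,
not the kernel definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values; the kernel's RFTYPE_* values differ. */
#define SK_MON_INFO	0x1UL
#define SK_RES_CACHE	0x2UL

/* Hypothetical stand-in for a resctrl file description. */
struct rftype_sketch {
	const char *name;
	unsigned long fflags;
};

static struct rftype_sketch files_sketch[] = {
	{ "thread_throttle_mode",   0 },
	{ "mbm_total_bytes_config", 0 },
	{ "mbm_local_bytes_config", 0 },
};

/* Find a file entry by name, or return NULL if there is none. */
static struct rftype_sketch *file_by_name_sketch(const char *name)
{
	size_t n = sizeof(files_sketch) / sizeof(files_sketch[0]);

	for (size_t i = 0; i < n; i++) {
		if (!strcmp(files_sketch[i].name, name))
			return &files_sketch[i];
	}
	return NULL;
}

/* One helper replaces the per-file init functions; unknown names are ignored. */
static void file_fflags_init_sketch(const char *name, unsigned long fflags)
{
	struct rftype_sketch *rft = file_by_name_sketch(name);

	if (rft)
		rft->fflags = fflags;
}
```

Because the flags are a parameter, adding a new file only needs one more
call instead of one more init function.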
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v10: No changes.
Tony added this patch in his series. Will remove it after it is merged.
https://lore.kernel.org/lkml/20241122235832.27498-2-tony.luck@intel.com/
v9: No changes.
v8: No changes.
v7: No changes.
v6: Added Reviewed-by from Reinette.
v5: Commit message update.
v4: Commit message update.
v3: New patch to display ABMC capability.
---
arch/x86/kernel/cpu/resctrl/core.c | 4 +++-
arch/x86/kernel/cpu/resctrl/internal.h | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 +++-------------
4 files changed, 12 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index ff4dc649e35c..45f74d57de84 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -234,7 +234,9 @@ static __init bool __get_mem_config_intel(struct rdt_resource *r)
r->membw.throttle_mode = THREAD_THROTTLE_PER_THREAD;
else
r->membw.throttle_mode = THREAD_THROTTLE_MAX;
- thread_throttle_mode_init();
+
+ resctrl_file_fflags_init("thread_throttle_mode",
+ RFTYPE_CTRL_INFO | RFTYPE_RES_MB);
r->alloc_capable = true;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 16181b90159a..9dd1799adba3 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -647,8 +647,8 @@ void cqm_handle_limbo(struct work_struct *work);
bool has_busy_rmid(struct rdt_mon_domain *d);
void __check_limbo(struct rdt_mon_domain *d, bool force_free);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
-void __init thread_throttle_mode_init(void);
-void __init mbm_config_rftype_init(const char *config);
+void __init resctrl_file_fflags_init(const char *config,
+ unsigned long fflags);
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 36df69da99f2..80be91671dc1 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1224,11 +1224,13 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
mbm_total_event.configurable = true;
- mbm_config_rftype_init("mbm_total_bytes_config");
+ resctrl_file_fflags_init("mbm_total_bytes_config",
+ RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
mbm_local_event.configurable = true;
- mbm_config_rftype_init("mbm_local_bytes_config");
+ resctrl_file_fflags_init("mbm_local_bytes_config",
+ RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
if (rdt_cpu_has(X86_FEATURE_ABMC)) {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1647ad9145ef..687d9d8d82a4 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2020,24 +2020,14 @@ static struct rftype *rdtgroup_get_rftype_by_name(const char *name)
return NULL;
}
-void __init thread_throttle_mode_init(void)
-{
- struct rftype *rft;
-
- rft = rdtgroup_get_rftype_by_name("thread_throttle_mode");
- if (!rft)
- return;
-
- rft->fflags = RFTYPE_CTRL_INFO | RFTYPE_RES_MB;
-}
-
-void __init mbm_config_rftype_init(const char *config)
+void __init resctrl_file_fflags_init(const char *config,
+ unsigned long fflags)
{
struct rftype *rft;
rft = rdtgroup_get_rftype_by_name(config);
if (rft)
- rft->fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE;
+ rft->fflags = fflags;
}
/**
--
2.34.1
* [PATCH v10 07/24] x86/resctrl: Add support to enable/disable AMD ABMC feature
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (5 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 06/24] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-19 21:48 ` Reinette Chatre
2024-12-12 20:15 ` [PATCH v10 08/24] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
` (16 subsequent siblings)
23 siblings, 1 reply; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
Add the functionality to enable/disable the AMD ABMC feature.
The AMD ABMC feature is enabled by setting the enable bit (bit 0) in the
L3_QOS_EXT_CFG MSR. When the state of ABMC is changed, the MSR needs to be
updated on all the logical processors in the QOS domain.
Hardware counters are reset when the ABMC state is changed.
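The per-CPU update can be pictured with a small user-space model. This is a
sketch only: the array stands in for the L3_QOS_EXT_CFG MSR on each logical
processor of one domain, and SKETCH_NUM_CPUS, fake_msr[], abmc_msr_value()
and abmc_set_all() are hypothetical names. Only the bit position comes from
the text above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ABMC_ENABLE_BIT	0	/* enable bit position, per the commit message */
#define SKETCH_NUM_CPUS		4	/* hypothetical CPUs in one domain */

/* Stand-in for the MSR contents on each logical processor. */
static uint64_t fake_msr[SKETCH_NUM_CPUS];

/* Compute the new MSR value, leaving all other bits as they were. */
static uint64_t abmc_msr_value(uint64_t old, bool enable)
{
	uint64_t bit = 1ULL << SKETCH_ABMC_ENABLE_BIT;

	return enable ? (old | bit) : (old & ~bit);
}

/* Apply the same state to every CPU in the modelled domain. */
static void abmc_set_all(bool enable)
{
	for (int cpu = 0; cpu < SKETCH_NUM_CPUS; cpu++)
		fake_msr[cpu] = abmc_msr_value(fake_msr[cpu], enable);
}
```

The read-modify-write keeps unrelated bits intact, which is the same effect
the patch gets from msr_set_bit() and msr_clear_bit().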
The ABMC feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v10: No changes.
v9: Re-ordered the MSR and added Reviewed-by tag.
v8: Commit message update and moved around the comments about L3_QOS_EXT_CFG
to _resctrl_abmc_enable.
v7: Renamed the function
resctrl_arch_get_abmc_enabled() to resctrl_arch_mbm_cntr_assign_enabled().
Merged resctrl_arch_mbm_cntr_assign_enable, resctrl_arch_mbm_cntr_assign_disable
and renamed to resctrl_arch_mbm_cntr_assign_set().
Moved the function definition to linux/resctrl.h.
Passed the struct rdt_resource to these functions.
Removed resctrl_arch_reset_rmid_all() from arch code. This will be done
from the caller.
v6: Renamed abmc_enabled to mbm_cntr_assign_enabled.
Used msr_set_bit and msr_clear_bit for msr updates.
Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().
Made _resctrl_abmc_enable to return void.
v5: Renamed resctrl_abmc_enable to resctrl_arch_abmc_enable.
Renamed resctrl_abmc_disable to resctrl_arch_abmc_disable.
Introduced resctrl_arch_get_abmc_enabled to get abmc state from
non-arch code.
Renamed resctrl_abmc_set_all to _resctrl_abmc_enable().
Modified commit log to make it clear about AMD ABMC feature.
v3: No changes.
v2: Few text changes in commit message.
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 5 ++++
arch/x86/kernel/cpu/resctrl/internal.h | 5 ++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 36 ++++++++++++++++++++++++++
include/linux/resctrl.h | 3 +++
5 files changed, 50 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3ae84c3b8e6d..bdc95b7cd1b0 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1194,6 +1194,7 @@
/* - AMD: */
#define MSR_IA32_MBA_BW_BASE 0xc0000200
#define MSR_IA32_SMBA_BW_BASE 0xc0000280
+#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
#define MSR_IA32_EVT_CFG_BASE 0xc0000400
/* AMD-V MSRs */
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 45f74d57de84..407a80454ae1 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -405,6 +405,11 @@ void rdt_ctrl_update(void *arg)
hw_res->msr_update(m);
}
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+ return resctrl_to_arch_res(r)->mbm_cntr_assign_enabled;
+}
+
/*
* rdt_find_domain - Search for a domain id in a resource domain list.
*
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 9dd1799adba3..c07a93da31cc 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -56,6 +56,9 @@
/* Max event bits supported */
#define MAX_EVT_CONFIG_BITS GENMASK(6, 0)
+/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
+#define ABMC_ENABLE_BIT 0
+
/**
* cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
* aren't marked nohz_full
@@ -477,6 +480,7 @@ struct rdt_parse_data {
* @mbm_cfg_mask: Bandwidth sources that can be tracked when Bandwidth
* Monitoring Event Configuration (BMEC) is supported.
* @cdp_enabled: CDP state of this resource
+ * @mbm_cntr_assign_enabled: ABMC feature is enabled
*
* Members of this structure are either private to the architecture
* e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
@@ -491,6 +495,7 @@ struct rdt_hw_resource {
unsigned int mbm_width;
unsigned int mbm_cfg_mask;
bool cdp_enabled;
+ bool mbm_cntr_assign_enabled;
};
static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 687d9d8d82a4..d54c2701c09c 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2402,6 +2402,42 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
return 0;
}
+static void resctrl_abmc_set_one_amd(void *arg)
+{
+ bool *enable = arg;
+
+ if (*enable)
+ msr_set_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+ else
+ msr_clear_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
+}
+
+/*
+ * Update L3_QOS_EXT_CFG MSR on all the CPUs associated with the monitor
+ * domain.
+ */
+static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
+{
+ struct rdt_mon_domain *d;
+
+ list_for_each_entry(d, &r->mon_domains, hdr.list)
+ on_each_cpu_mask(&d->hdr.cpu_mask,
+ resctrl_abmc_set_one_amd, &enable, 1);
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+ if (r->mon.mbm_cntr_assignable &&
+ hw_res->mbm_cntr_assign_enabled != enable) {
+ _resctrl_abmc_enable(r, enable);
+ hw_res->mbm_cntr_assign_enabled = enable;
+ }
+
+ return 0;
+}
+
/*
* We don't allow rdtgroup directories to be created anywhere
* except the root directory. Thus when looking for the rdtgroup
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 511cfce8fc21..f11d6fdfd977 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -355,4 +355,7 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
extern unsigned int resctrl_rmid_realloc_threshold;
extern unsigned int resctrl_rmid_realloc_limit;
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r);
+
#endif /* _RESCTRL_H */
--
2.34.1
* [PATCH v10 08/24] x86/resctrl: Introduce the interface to display monitor mode
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
Introduce the interface file "mbm_assign_mode" to list the supported
monitor modes.
The "mbm_cntr_assign" mode provides the option to assign a counter to
an RMID, event pair and monitor the bandwidth as long as it is assigned.
On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
Bandwidth Monitoring Counters) hardware feature and is enabled by default.
The "default" mode is the existing monitoring mode, which works without
explicit counter assignment and instead relies on dynamic counter
assignment by hardware. Hardware may not dedicate a counter to an event,
in which case reads of the monitoring data return "Unavailable".
Provide an interface to display the monitor mode on the system.
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_assign]
default
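For illustration only, a user-space tool could pick out the enabled mode
by looking for the bracketed entry in the output above. The sketch below
is not part of this patch; the helper name mbm_enabled_mode() is
hypothetical and it assumes the file contents have already been read into
a string.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space helper (not part of this series).
 * Copy the bracketed, i.e. enabled, mode name found in the contents of
 * mbm_assign_mode into @out. Returns 0 on success, -1 if no well-formed
 * "[mode]" entry exists or @out is too small.
 */
int mbm_enabled_mode(const char *text, char *out, size_t outlen)
{
	const char *open = strchr(text, '[');
	const char *close;
	size_t len;

	if (!open)
		return -1;

	close = strchr(open + 1, ']');
	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```

With the output shown above, the helper would yield "mbm_cntr_assign".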
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Added a little more text to the user documentation to clarify the default mode.
v9: Updated user documentation based on comments.
v8: Commit message update.
v7: Updated the descriptions/commit log in resctrl.rst to generic text.
Thanks to James and Reinette.
Rename mbm_mode to mbm_assign_mode.
Introduced mutex lock in rdtgroup_mbm_mode_show().
v6: Added documentation for mbm_cntr_assign and legacy mode.
Moved mbm_mode fflags initialization to static initialization.
v5: Changed interface name to mbm_mode.
It will be always available even if ABMC feature is not supported.
Added description in resctrl.rst about ABMC mode.
Fixed display of abmc and legacy consistently.
v4: Fixed the checks for legacy and abmc mode. Default is ABMC.
v3: New patch to display ABMC capability.
---
Documentation/arch/x86/resctrl.rst | 33 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 31 ++++++++++++++++++++++++
2 files changed, 64 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 30586728a4cd..1e4a1f615496 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -257,6 +257,39 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
0=0x30;1=0x30;3=0x15;4=0x15
+"mbm_assign_mode":
+ Reports the list of monitoring modes supported. The enclosed brackets
+ indicate which mode is enabled.
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+ [mbm_cntr_assign]
+ default
+
+ "mbm_cntr_assign":
+
+ In mbm_cntr_assign mode user-space is able to specify which of the
+ events in CTRL_MON or MON groups should have a counter assigned using the
+ "mbm_assign_control" file. The number of counters available is described
+ in the "num_mbm_cntrs" file. Changing the mode may cause all counters on
+ a resource to reset.
+
+ The mode is useful on AMD platforms which support more CTRL_MON and MON
+ groups than hardware counters, meaning 'unassigned' events on CTRL_MON or
+ MON groups will report 'Unavailable'.
+
+ AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
+ enable this mode by default so that counters remain assigned even when the
+ corresponding RMID is not in use by any processor.
+
+ "default":
+
+ In default mode, resctrl assumes there is a hardware counter for each
+ event within every CTRL_MON and MON group. On AMD platforms, it is
+ recommended to use mbm_cntr_assign mode if supported, because reading
+ "mbm_total_bytes" or "mbm_local_bytes" will report 'Unavailable' if
+ there is no counter associated with that event.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d54c2701c09c..f25ff1430014 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -845,6 +845,30 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
return ret;
}
+static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+
+ mutex_lock(&rdtgroup_mutex);
+
+ if (r->mon.mbm_cntr_assignable) {
+ if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ seq_puts(s, "[mbm_cntr_assign]\n");
+ seq_puts(s, "default\n");
+ } else {
+ seq_puts(s, "mbm_cntr_assign\n");
+ seq_puts(s, "[default]\n");
+ }
+ } else {
+ seq_puts(s, "[default]\n");
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+
+ return 0;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -1901,6 +1925,13 @@ static struct rftype res_common_files[] = {
.seq_show = mbm_local_bytes_config_show,
.write = mbm_local_bytes_config_write,
},
+ {
+ .name = "mbm_assign_mode",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_mbm_assign_mode_show,
+ .fflags = RFTYPE_MON_INFO,
+ },
{
.name = "cpus",
.mode = 0644,
--
2.34.1
* [PATCH v10 09/24] x86/resctrl: Introduce interface to display number of monitoring counters
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
The mbm_cntr_assign mode provides an option to the user to assign a
counter to an RMID, event pair and monitor the bandwidth as long as
the counter is assigned. The number of assignments depends on the number
of monitoring counters available.
Provide the interface to display the number of monitoring counters
supported. The interface file 'num_mbm_cntrs' is available when an
architecture supports mbm_cntr_assign mode.
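As a rough illustration of what the reported number means, the sketch
below mirrors the decoding used in rdt_get_mon_l3_config() (EBX[15:0] + 1
from CPUID leaf 0x80000020, subleaf 5) and estimates how many monitoring
groups can be covered per domain when each group uses one or two counters.
The function names are hypothetical and the code is a user-space model,
not kernel code.

```c
#include <stdint.h>

/* Counter count as decoded in the series: EBX[15:0] + 1. */
unsigned int abmc_num_cntrs(uint32_t ebx)
{
	return (ebx & 0xffffu) + 1;
}

/*
 * Number of monitoring groups that can hold counters at the same time
 * when every group uses @events_per_group counters (1 or 2).
 */
unsigned int groups_with_counters(unsigned int num_cntrs,
				  unsigned int events_per_group)
{
	if (events_per_group == 0)
		return 0;

	return num_cntrs / events_per_group;
}
```

For example, with a hypothetical EBX[15:0] value of 31 there are 32
counters, which covers 16 groups tracking both events or 32 groups
tracking a single event.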
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: No changes.
v9: Updated user document based on the comments.
Will add a new file available_mbm_cntrs later in the series.
v8: Commit message update and documentation update.
v7: Minor commit log text changes.
v6: No changes.
v5: Changed the display name from num_cntrs to num_mbm_cntrs.
Updated the commit message.
Moved the patch after mbm_mode is introduced.
v4: Changed the counter name to num_cntrs. And few text changes.
v3: Changed the field name to mbm_assign_cntrs.
v2: Changed the field name to mbm_assignable_counters from abmc_counter.
---
Documentation/arch/x86/resctrl.rst | 12 ++++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
3 files changed, 29 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 1e4a1f615496..43a861adeada 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -290,6 +290,18 @@ with the following files:
"mbm_total_bytes" or "mbm_local_bytes" will report 'Unavailable' if
there is no counter associated with that event.
+"num_mbm_cntrs":
+ The number of monitoring counters available for assignment when the
+ architecture supports mbm_cntr_assign mode.
+
+ The resctrl file system supports tracking up to two memory bandwidth
+ events per monitoring group: mbm_total_bytes and/or mbm_local_bytes.
+ Up to two counters can be assigned per monitoring group, one for each
+ memory bandwidth event. More monitoring groups can be tracked by
+ assigning one counter per monitoring group. However, doing so limits
+ memory bandwidth tracking to a single memory bandwidth event per
+ monitoring group.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 80be91671dc1..c23e94fa6852 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1237,6 +1237,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
r->mon.mbm_cntr_assignable = true;
cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
+ resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f25ff1430014..339bb0b09a82 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -869,6 +869,16 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
return 0;
}
+static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+
+ seq_printf(s, "%d\n", r->mon.num_mbm_cntrs);
+
+ return 0;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -1940,6 +1950,12 @@ static struct rftype res_common_files[] = {
.seq_show = rdtgroup_cpus_show,
.fflags = RFTYPE_BASE,
},
+ {
+ .name = "num_mbm_cntrs",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_num_mbm_cntrs_show,
+ },
{
.name = "cpus_list",
.mode = 0644,
--
2.34.1
* [PATCH v10 10/24] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
If the BMEC (Bandwidth Monitoring Event Configuration) feature is
supported, the bandwidth events can be configured to track specific
events. The event configuration is domain specific. The ABMC (Assignable
Bandwidth Monitoring Counters) feature needs event configuration
information to assign a hardware counter to an RMID. Event configurations
are not stored in resctrl but instead always read from or written to
hardware directly when prompted by user space.
Read the event configuration from the hardware during the domain
initialization. Save the configuration value in struct rdt_hw_mon_domain,
so it can be used for counter assignment.
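The value saved per event can be modelled as below. This is a user-space
sketch only: evt_cfg_from_msr() is a hypothetical name, EVT_CONFIG_MASK
stands in for MAX_EVT_CONFIG_BITS (GENMASK(6, 0)) and INVALID_CONFIG for
INVALID_CONFIG_VALUE; the MSR read itself is replaced by a plain argument.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVT_CONFIG_MASK	0x7fu		/* stand-in for GENMASK(6, 0) */
#define INVALID_CONFIG	UINT32_MAX	/* stand-in for INVALID_CONFIG_VALUE */

/*
 * Model of the value cached at domain initialization: the valid
 * configuration bits of the register when the event is configurable,
 * otherwise the invalid marker.
 */
uint32_t evt_cfg_from_msr(bool configurable, uint64_t msrval)
{
	if (!configurable)
		return INVALID_CONFIG;

	return (uint32_t)(msrval & EVT_CONFIG_MASK);
}
```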
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v10: Conflicts due to code displacement. Actual code didn't change.
v9: Added Reviewed-by tag. No other changes.
v8: Renamed resctrl_mbm_evt_config_init() to arch_mbm_evt_config_init()
Minor commit message update.
v7: Fixed initializing INVALID_CONFIG_VALUE to mbm_local_cfg in case of error.
v6: Renamed resctrl_arch_mbm_evt_config -> resctrl_mbm_evt_config_init
Initialized value to INVALID_CONFIG_VALUE if it is not configurable.
Minor commit message update.
v5: Exported mon_event_config_index_get.
Renamed arch_domain_mbm_evt_config to resctrl_arch_mbm_evt_config.
v4: Read the configuration information from the hardware to initialize.
Added few commit messages.
Fixed the tab spaces.
v3: Minor changes related to rebase in mbm_config_write_domain.
v2: No changes.
---
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
arch/x86/kernel/cpu/resctrl/internal.h | 9 +++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 26 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 +---
4 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 407a80454ae1..136b081ed04b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -632,6 +632,8 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
arch_mon_domain_online(r, d);
+ arch_mbm_evt_config_init(hw_dom);
+
if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
mon_domain_free(hw_dom);
return;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c07a93da31cc..f864550ddd42 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -56,6 +56,9 @@
/* Max event bits supported */
#define MAX_EVT_CONFIG_BITS GENMASK(6, 0)
+#define INVALID_CONFIG_VALUE U32_MAX
+#define INVALID_CONFIG_INDEX UINT_MAX
+
/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
#define ABMC_ENABLE_BIT 0
@@ -401,6 +404,8 @@ struct rdt_hw_ctrl_domain {
* @d_resctrl: Properties exposed to the resctrl file system
* @arch_mbm_total: arch private state for MBM total bandwidth
* @arch_mbm_local: arch private state for MBM local bandwidth
+ * @mbm_total_cfg: MBM total bandwidth configuration
+ * @mbm_local_cfg: MBM local bandwidth configuration
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
@@ -408,6 +413,8 @@ struct rdt_hw_mon_domain {
struct rdt_mon_domain d_resctrl;
struct arch_mbm_state *arch_mbm_total;
struct arch_mbm_state *arch_mbm_local;
+ u32 mbm_total_cfg;
+ u32 mbm_local_cfg;
};
static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
@@ -657,5 +664,7 @@ void __init resctrl_file_fflags_init(const char *config,
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
+void arch_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
+unsigned int mon_event_config_index_get(u32 evtid);
#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c23e94fa6852..b07d60fabf1c 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1248,6 +1248,32 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
return 0;
}
+void arch_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom)
+{
+ unsigned int index;
+ u64 msrval;
+
+ /*
+ * Read the configuration registers QOS_EVT_CFG_n, where <n> is
+ * the BMEC event number (EvtID).
+ */
+ if (mbm_total_event.configurable) {
+ index = mon_event_config_index_get(QOS_L3_MBM_TOTAL_EVENT_ID);
+ rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
+ hw_dom->mbm_total_cfg = msrval & MAX_EVT_CONFIG_BITS;
+ } else {
+ hw_dom->mbm_total_cfg = INVALID_CONFIG_VALUE;
+ }
+
+ if (mbm_local_event.configurable) {
+ index = mon_event_config_index_get(QOS_L3_MBM_LOCAL_EVENT_ID);
+ rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
+ hw_dom->mbm_local_cfg = msrval & MAX_EVT_CONFIG_BITS;
+ } else {
+ hw_dom->mbm_local_cfg = INVALID_CONFIG_VALUE;
+ }
+}
+
void __exit rdt_put_mon_l3_config(void)
{
dom_data_exit();
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 339bb0b09a82..541bd353c567 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1582,8 +1582,6 @@ struct mon_config_info {
u32 mon_config;
};
-#define INVALID_CONFIG_INDEX UINT_MAX
-
/**
* mon_event_config_index_get - get the hardware index for the
* configurable event
@@ -1593,7 +1591,7 @@ struct mon_config_info {
* 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
* INVALID_CONFIG_INDEX for invalid evtid
*/
-static inline unsigned int mon_event_config_index_get(u32 evtid)
+unsigned int mon_event_config_index_get(u32 evtid)
{
switch (evtid) {
case QOS_L3_MBM_TOTAL_EVENT_ID:
--
2.34.1
* [PATCH v10 11/24] x86/resctrl: Remove MSR reading of event configuration value
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
The event configuration is domain specific and initialized during domain
initialization. The values are stored in struct rdt_hw_mon_domain.
It is not required to read the configuration register every time the user
asks for it. Use the value stored in struct rdt_hw_mon_domain instead.
Introduce resctrl_arch_mon_event_config_get() and
resctrl_arch_mon_event_config_set() to get/set architecture domain specific
mbm_total_cfg/mbm_local_cfg values.
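The intended get/set behaviour can be sketched as a small user-space
model: reads come from the cached value, and a write is skipped when the
cached value is invalid or already equal to the requested one. The struct
and function names below are hypothetical; hw_reg stands in for the
QOS_EVT_CFG_n register and hw_writes only exists to make the skipped
writes observable.

```c
#include <stdint.h>

#define INVALID_CONFIG	UINT32_MAX	/* stand-in for INVALID_CONFIG_VALUE */

struct dom_model {
	uint32_t cached_cfg;	/* models mbm_total_cfg or mbm_local_cfg */
	uint32_t hw_reg;	/* models the event configuration register */
	unsigned int hw_writes;	/* number of modelled register writes */
};

/* Return the cached configuration without touching "hardware". */
uint32_t model_config_get(const struct dom_model *d)
{
	return d->cached_cfg;
}

/* Returns 1 if the register was written, 0 if the write was skipped. */
int model_config_set(struct dom_model *d, uint32_t val)
{
	uint32_t cur = model_config_get(d);

	if (cur == INVALID_CONFIG || cur == val)
		return 0;

	d->hw_reg = val;
	d->hw_writes++;
	d->cached_cfg = val;	/* keep the cache in sync with the register */
	return 1;
}
```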
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Moved the mon_config_info structure definition to resctrl.h.
v9: Removed QOS_L3_OCCUP_EVENT_ID switch case in resctrl_arch_mon_event_config_set.
Fixed an unnecessary space.
v8: Renamed
resctrl_arch_event_config_get() to resctrl_arch_mon_event_config_get().
resctrl_arch_event_config_set() to resctrl_arch_mon_event_config_set().
v7: Removed check if (val == INVALID_CONFIG_VALUE) as resctrl_arch_event_config_get
already prints warning.
Kept the Event config value definitions as is.
v6: Fixed inconsistency with types. Changed all the config value types
to u32.
Removed few rdt_last_cmd_puts as it is not necessary.
Removed unused config value definitions.
Few more updates to commit message.
v5: Introduced resctrl_arch_event_config_get() and
resctrl_arch_event_config_set() based on our discussion.
https://lore.kernel.org/lkml/68e861f9-245d-4496-a72e-46fc57d19c62@amd.com/
v4: New patch.
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 105 +++++++++++++------------
include/linux/resctrl.h | 16 ++++
2 files changed, 72 insertions(+), 49 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 541bd353c567..682f47e0beb1 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1577,10 +1577,51 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
return ret;
}
-struct mon_config_info {
- u32 evtid;
- u32 mon_config;
-};
+u32 resctrl_arch_mon_event_config_get(struct rdt_mon_domain *d,
+ enum resctrl_event_id eventid)
+{
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+
+ switch (eventid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ break;
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ return hw_dom->mbm_total_cfg;
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ return hw_dom->mbm_local_cfg;
+ }
+
+ /* Never expect to get here */
+ WARN_ON_ONCE(1);
+
+ return INVALID_CONFIG_VALUE;
+}
+
+void resctrl_arch_mon_event_config_set(void *info)
+{
+ struct mon_config_info *mon_info = info;
+ struct rdt_hw_mon_domain *hw_dom;
+ unsigned int index;
+
+ index = mon_event_config_index_get(mon_info->evtid);
+ if (index == INVALID_CONFIG_INDEX)
+ return;
+
+ wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
+
+ hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
+
+ switch (mon_info->evtid) {
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ hw_dom->mbm_total_cfg = mon_info->mon_config;
+ break;
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ hw_dom->mbm_local_cfg = mon_info->mon_config;
+ break;
+ default:
+ break;
+ }
+}
/**
* mon_event_config_index_get - get the hardware index for the
@@ -1604,33 +1645,11 @@ unsigned int mon_event_config_index_get(u32 evtid)
}
}
-static void mon_event_config_read(void *info)
-{
- struct mon_config_info *mon_info = info;
- unsigned int index;
- u64 msrval;
-
- index = mon_event_config_index_get(mon_info->evtid);
- if (index == INVALID_CONFIG_INDEX) {
- pr_warn_once("Invalid event id %d\n", mon_info->evtid);
- return;
- }
- rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
-
- /* Report only the valid event configuration bits */
- mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
-}
-
-static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
-{
- smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
-}
-
static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
{
- struct mon_config_info mon_info;
struct rdt_mon_domain *dom;
bool sep = false;
+ u32 val;
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
@@ -1639,11 +1658,8 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
if (sep)
seq_puts(s, ";");
- memset(&mon_info, 0, sizeof(struct mon_config_info));
- mon_info.evtid = evtid;
- mondata_config_read(dom, &mon_info);
-
- seq_printf(s, "%d=0x%02x", dom->hdr.id, mon_info.mon_config);
+ val = resctrl_arch_mon_event_config_get(dom, evtid);
+ seq_printf(s, "%d=0x%02x", dom->hdr.id, val);
sep = true;
}
seq_puts(s, "\n");
@@ -1674,33 +1690,23 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
return 0;
}
-static void mon_event_config_write(void *info)
-{
- struct mon_config_info *mon_info = info;
- unsigned int index;
-
- index = mon_event_config_index_get(mon_info->evtid);
- if (index == INVALID_CONFIG_INDEX) {
- pr_warn_once("Invalid event id %d\n", mon_info->evtid);
- return;
- }
- wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
-}
static void mbm_config_write_domain(struct rdt_resource *r,
struct rdt_mon_domain *d, u32 evtid, u32 val)
{
struct mon_config_info mon_info = {0};
+ u32 config_val;
/*
- * Read the current config value first. If both are the same then
+ * Check the current config value first. If both are the same then
* no need to write it again.
*/
- mon_info.evtid = evtid;
- mondata_config_read(d, &mon_info);
- if (mon_info.mon_config == val)
+ config_val = resctrl_arch_mon_event_config_get(d, evtid);
+ if (config_val == INVALID_CONFIG_VALUE || config_val == val)
return;
+ mon_info.d = d;
+ mon_info.evtid = evtid;
mon_info.mon_config = val;
/*
@@ -1709,7 +1715,8 @@ static void mbm_config_write_domain(struct rdt_resource *r,
* are scoped at the domain level. Writing any of these MSRs
* on one CPU is observed by all the CPUs in the domain.
*/
- smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_write,
+ smp_call_function_any(&d->hdr.cpu_mask,
+ resctrl_arch_mon_event_config_set,
&mon_info, 1);
/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f11d6fdfd977..c8ab3d7a0dab 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -118,6 +118,18 @@ struct rdt_mon_domain {
int cqm_work_cpu;
};
+/**
+ * struct mon_config_info - Monitoring event configuration details
+ * @d: Domain for the event
+ * @evtid: Event type
+ * @mon_config: Event configuration value
+ */
+struct mon_config_info {
+ struct rdt_mon_domain *d;
+ enum resctrl_event_id evtid;
+ u32 mon_config;
+};
+
/**
* struct resctrl_cache - Cache allocation related data
* @cbm_len: Length of the cache bit mask
@@ -352,6 +364,10 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
*/
void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_arch_mon_event_config_set(void *info);
+u32 resctrl_arch_mon_event_config_get(struct rdt_mon_domain *d,
+ enum resctrl_event_id eventid);
+
extern unsigned int resctrl_rmid_realloc_threshold;
extern unsigned int resctrl_rmid_realloc_limit;
--
2.34.1
* [PATCH v10 12/24] x86/resctrl: Introduce cntr_cfg to track assignable counters at domain
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
In mbm_cntr_assign mode, the MBM counters are assigned/unassigned to an
RMID, event pair in a resctrl group and monitor the bandwidth as long as
they are assigned. Counters are assigned/unassigned at domain level and
need to be tracked at domain level.
Add the mbm_cntr_cfg data structure to struct rdt_mon_domain to
manage and track MBM counter assignments at the domain level.
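One possible way to use such a per-domain table is sketched below as a
user-space model. It is not the allocation logic of this series: struct
group is a stand-in for struct rdtgroup, and cntr_table_new(),
cntr_assign() and cntr_unassign() are hypothetical helpers. The only
assumption taken from the patch is that each entry records an event and
the owning group, with a NULL group meaning the counter is free.

```c
#include <stddef.h>
#include <stdlib.h>

struct group {			/* stand-in for struct rdtgroup */
	int id;
};

struct cntr_cfg {		/* models struct mbm_cntr_cfg */
	int evtid;
	struct group *grp;	/* NULL means the counter is free */
};

/* Allocate a zeroed table with one entry per hardware counter. */
struct cntr_cfg *cntr_table_new(unsigned int num_cntrs)
{
	return calloc(num_cntrs, sizeof(struct cntr_cfg));
}

/* Claim the first free counter; returns its index or -1 if none is free. */
int cntr_assign(struct cntr_cfg *tbl, unsigned int n,
		struct group *grp, int evtid)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (!tbl[i].grp) {
			tbl[i].grp = grp;
			tbl[i].evtid = evtid;
			return (int)i;
		}
	}

	return -1;
}

/* Release counter @id; out-of-range ids are ignored. */
void cntr_unassign(struct cntr_cfg *tbl, unsigned int n, int id)
{
	if (id < 0 || (unsigned int)id >= n)
		return;

	tbl[id].grp = NULL;
	tbl[id].evtid = 0;
}
```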
Suggested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Patch changed completely to handle the counters at domain level.
https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
Removed Reviewed-by tag.
Did not see the need to add cntr_id in mbm_state structure. Not used in the code.
v9: Added Reviewed-by tag. No other changes.
v8: Minor commit message changes.
v7: Added check mbm_cntr_assignable for allocating bitmap mbm_cntr_map
v6: New patch to add domain level assignment.
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
include/linux/resctrl.h | 12 ++++++++++++
2 files changed, 23 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 682f47e0beb1..1ee008a63d8b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4068,6 +4068,7 @@ static void __init rdtgroup_setup_default(void)
static void domain_destroy_mon_state(struct rdt_mon_domain *d)
{
+ kfree(d->cntr_cfg);
bitmap_free(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
@@ -4141,6 +4142,16 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
return -ENOMEM;
}
}
+ if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
+ tsize = sizeof(*d->cntr_cfg);
+ d->cntr_cfg = kcalloc(r->mon.num_mbm_cntrs, tsize, GFP_KERNEL);
+ if (!d->cntr_cfg) {
+ bitmap_free(d->rmid_busy_llc);
+ kfree(d->mbm_total);
+ kfree(d->mbm_local);
+ return -ENOMEM;
+ }
+ }
return 0;
}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index c8ab3d7a0dab..03c67d9156f3 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -94,6 +94,16 @@ struct rdt_ctrl_domain {
u32 *mbps_val;
};
+/**
+ * struct mbm_cntr_cfg - Assignable counter configuration
+ * @evtid: Event type
+ * @rdtgrp: Resctrl group assigned to the counter
+ */
+struct mbm_cntr_cfg {
+ enum resctrl_event_id evtid;
+ struct rdtgroup *rdtgrp;
+};
+
/**
* struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
* @hdr: common header for different domain types
@@ -105,6 +115,7 @@ struct rdt_ctrl_domain {
* @cqm_limbo: worker to periodically read CQM h/w counters
* @mbm_work_cpu: worker CPU for MBM h/w counters
* @cqm_work_cpu: worker CPU for CQM h/w counters
+ * @cntr_cfg: Assignable counters configuration
*/
struct rdt_mon_domain {
struct rdt_domain_hdr hdr;
@@ -116,6 +127,7 @@ struct rdt_mon_domain {
struct delayed_work cqm_limbo;
int mbm_work_cpu;
int cqm_work_cpu;
+ struct mbm_cntr_cfg *cntr_cfg;
};
/**
--
2.34.1
* [PATCH v10 13/24] x86/resctrl: Introduce interface to display number of free counters
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
Provide the interface to display the number of monitoring counters
available for assignment in each domain when mbm_cntr_assign is supported.
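The per-domain count can be modelled in user space as below. The sketch
counts unassigned entries for each domain and formats them as
"<domain id>=<count>" pairs separated by ';'. That output format is an
assumption made for illustration, borrowed from the existing
mbm_total_bytes_config display; struct dom_view and both function names
are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

struct dom_view {
	int id;			/* domain id */
	unsigned int num_cntrs;	/* counters in this domain */
	const int *in_use;	/* in_use[i] != 0 when counter i is assigned */
};

/* Count the counters of one domain that are not assigned. */
unsigned int dom_free_cntrs(const struct dom_view *d)
{
	unsigned int i, nfree = 0;

	for (i = 0; i < d->num_cntrs; i++) {
		if (!d->in_use[i])
			nfree++;
	}

	return nfree;
}

/*
 * Format the free counts of all domains into @buf.
 * Returns the string length, or -1 if @buf is too small.
 */
int format_available(const struct dom_view *doms, size_t ndoms,
		     char *buf, size_t len)
{
	size_t i, off = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < ndoms; i++) {
		int n = snprintf(buf + off, len - off, "%s%d=%u",
				 i ? ";" : "", doms[i].id,
				 dom_free_cntrs(&doms[i]));

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}

	return (int)off;
}
```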
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Patch changed to handle the counters at domain level.
https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
So, display logic also changed now.
v9: New patch
---
Documentation/arch/x86/resctrl.rst | 4 +++
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 47 ++++++++++++++++++++++++++
3 files changed, 52 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 43a861adeada..c075fcee96b7 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -302,6 +302,10 @@ with the following files:
memory bandwidth tracking to a single memory bandwidth event per
monitoring group.
+"available_mbm_cntrs":
+ The number of monitoring counters available for assignment in each
+ domain when the architecture supports mbm_cntr_assign mode.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b07d60fabf1c..f857af361af1 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1238,6 +1238,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
+ resctrl_file_fflags_init("available_mbm_cntrs", RFTYPE_MON_INFO);
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1ee008a63d8b..72518e0ec2ec 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -879,6 +879,47 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
return 0;
}
+static int rdtgroup_available_mbm_cntrs_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ struct rdt_mon_domain *dom;
+ bool sep = false;
+ u32 cntrs, i;
+ int ret = 0;
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+ ret = -EINVAL;
+ goto unlock_cntrs_show;
+ }
+
+
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+ if (sep)
+ seq_puts(s, ";");
+
+ cntrs = 0;
+ for (i = 0; i < r->mon.num_mbm_cntrs; i++) {
+ if (!dom->cntr_cfg[i].rdtgrp)
+ cntrs++;
+ }
+
+ seq_printf(s, "%d=%u", dom->hdr.id, cntrs);
+ sep = true;
+ }
+ seq_puts(s, "\n");
+
+unlock_cntrs_show:
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
+ return ret;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -1961,6 +2002,12 @@ static struct rftype res_common_files[] = {
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = rdtgroup_num_mbm_cntrs_show,
},
+ {
+ .name = "available_mbm_cntrs",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_available_mbm_cntrs_show,
+ },
{
.name = "cpus_list",
.mode = 0644,
--
2.34.1
* [PATCH v10 14/24] x86/resctrl: Add data structures and definitions for ABMC assignment
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (12 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 13/24] x86/resctrl: Introduce interface to display number of free counters Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-12 20:15 ` [PATCH v10 15/24] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
` (9 subsequent siblings)
23 siblings, 0 replies; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
The ABMC feature allows the user to assign a hardware counter to an RMID,
event pair and monitor the bandwidth for as long as the counter is assigned.
The hardware tracks the bandwidth events until the user changes the
configuration. Each resctrl group can configure a maximum of two counters,
one for the total event and one for the local event.
The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh).
Configuration is done by setting the counter id, bandwidth source (RMID)
and bandwidth configuration supported by BMEC (Bandwidth Monitoring Event
Configuration).
Attempts to read or write the MSR when ABMC is not enabled will result
in a #GP(0) exception.
Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG
(C000_03FDh):
=========================================================================
Bits    Mnemonic    Description                  Access  Reset
                                                 Type    Value
=========================================================================
63      CfgEn       Configuration Enable         R/W     0
62      CtrEn       Enable/disable counting      R/W     0
61:53   -           Reserved                     MBZ     0
52:48   CtrID       Counter Identifier           R/W     0
47      IsCOS       BwSrc field is a CLOSID      R/W     0
                    (not an RMID)
46:44   -           Reserved                     MBZ     0
43:32   BwSrc       Bandwidth Source             R/W     0
                    (RMID or CLOSID)
31:0    BwType      Bandwidth configuration      R/W     0
                    to track for this counter
=========================================================================
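To make the layout concrete, the following sketch packs the fields into a
64-bit value with explicit shifts and masks. It is only an illustration: the
patch itself uses a bitfield union, and the DEMO_* macros and
demo_abmc_cfg_pack() are hypothetical names. Explicit shifts are used here
because bitfield layout is implementation-defined in C. The sketch always
sets CfgEn, leaves IsCOS and the reserved bits at zero, and treats the
bandwidth source as an RMID.

```c
#include <stdint.h>

/* Field positions taken from the L3_QOS_ABMC_CFG layout described above. */
#define DEMO_ABMC_CFG_EN_SHIFT	63
#define DEMO_ABMC_CTR_EN_SHIFT	62
#define DEMO_ABMC_CTR_ID_SHIFT	48
#define DEMO_ABMC_CTR_ID_MASK	0x1fULL		/* 5 bits, 52:48 */
#define DEMO_ABMC_BW_SRC_SHIFT	32
#define DEMO_ABMC_BW_SRC_MASK	0xfffULL	/* 12 bits, 43:32 */

/*
 * Build an L3_QOS_ABMC_CFG value for counter @cntr_id tracking @rmid with
 * event configuration @bw_type. CfgEn is always set so the configuration is
 * applied; CtrEn is set only when @count is non-zero.
 */
static uint64_t demo_abmc_cfg_pack(unsigned int cntr_id, unsigned int rmid,
				   uint32_t bw_type, int count)
{
	uint64_t val = 0;

	val |= 1ULL << DEMO_ABMC_CFG_EN_SHIFT;
	if (count)
		val |= 1ULL << DEMO_ABMC_CTR_EN_SHIFT;
	/* Mask each field so out-of-range input cannot spill into MBZ bits. */
	val |= ((uint64_t)cntr_id & DEMO_ABMC_CTR_ID_MASK) << DEMO_ABMC_CTR_ID_SHIFT;
	val |= ((uint64_t)rmid & DEMO_ABMC_BW_SRC_MASK) << DEMO_ABMC_BW_SRC_SHIFT;
	val |= (uint64_t)bw_type;	/* BwType occupies bits 31:0 */

	return val;
}
```

Masking the counter ID and RMID keeps the must-be-zero ranges (61:53 and
46:44) clear even when a caller passes an oversized value.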
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
v10: No changes.
v9: Removed the references of L3_QOS_ABMC_DSC.
Text changes about configuration in kernel doc.
v8: Update the configuration notes in kernel_doc.
A few commit message updates.
v7: Removed the reference of L3_QOS_ABMC_DSC as it is not used anymore.
Moved the configuration notes to kernel_doc.
Adjusted the tabs for l3_qos_abmc_cfg and checkpatch seems happy.
v6: Removed all the fs related changes.
Added note on CfgEn,CtrEn.
Removed the definitions which are not used.
Removed cntr_id initialization.
v5: Moved assignment flags here (patch 10/19 of v4).
Added MON_CNTR_UNSET definition to initialize cntr_id's.
More details in commit log.
Renamed few fields in l3_qos_abmc_cfg for readability.
v4: Added more descriptions.
Changed the name abmc_ctr_id to ctr_id.
Added L3_QOS_ABMC_DSC. Used for reading the configuration.
v3: No changes.
v2: No changes.
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 35 ++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index bdc95b7cd1b0..d7dec2326cd8 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1194,6 +1194,7 @@
/* - AMD: */
#define MSR_IA32_MBA_BW_BASE 0xc0000200
#define MSR_IA32_SMBA_BW_BASE 0xc0000280
+#define MSR_IA32_L3_QOS_ABMC_CFG 0xc00003fd
#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
#define MSR_IA32_EVT_CFG_BASE 0xc0000400
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index f864550ddd42..35bcf0e5ba7e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -602,6 +602,41 @@ union cpuid_0x10_x_edx {
unsigned int full;
};
+/*
+ * ABMC counters are configured by writing to L3_QOS_ABMC_CFG.
+ * @bw_type : Bandwidth configuration (supported by BMEC)
+ * tracked by the @cntr_id.
+ * @bw_src : Bandwidth source (RMID or CLOSID).
+ * @reserved1 : Reserved.
+ * @is_clos : @bw_src field is a CLOSID (not an RMID).
+ * @cntr_id : Counter identifier.
+ * @reserved : Reserved.
+ * @cntr_en : Counting enable bit.
+ * @cfg_en : Configuration enable bit.
+ *
+ * Configuration and counting:
+ * Counter can be configured across multiple writes to MSR. Configuration
+ * is applied only when @cfg_en = 1. Counter @cntr_id is reset when the
+ * configuration is applied.
+ * @cfg_en = 1, @cntr_en = 0 : Apply @cntr_id configuration but do not
+ * count events.
+ * @cfg_en = 1, @cntr_en = 1 : Apply @cntr_id configuration and start
+ * counting events.
+ */
+union l3_qos_abmc_cfg {
+ struct {
+ unsigned long bw_type :32,
+ bw_src :12,
+ reserved1: 3,
+ is_clos : 1,
+ cntr_id : 5,
+ reserved : 9,
+ cntr_en : 1,
+ cfg_en : 1;
+ } split;
+ unsigned long full;
+};
+
void rdt_last_cmd_clear(void);
void rdt_last_cmd_puts(const char *s);
__printf(1, 2)
--
2.34.1
* [PATCH v10 15/24] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (13 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 14/24] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-19 23:04 ` Reinette Chatre
2024-12-12 20:15 ` [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter Babu Moger
` (8 subsequent siblings)
23 siblings, 1 reply; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
The ABMC feature allows the user to assign a hardware counter to an RMID,
event pair and monitor the bandwidth for as long as the counter is assigned.
The hardware tracks the assigned RMID, event pair until the user explicitly
unassigns the counter.
Configure the counters by writing to the L3_QOS_ABMC_CFG MSR and specifying
the counter ID, bandwidth source (RMID), and bandwidth event configuration.
Provide the interface to assign counter IDs to an RMID, event pair.
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Added call resctrl_arch_reset_rmid() to reset the RMID in the domain
inside IPI call.
SMP and non-SMP call support is not required in resctrl_arch_config_cntr
with new domain specific assign approach/data structure.
Commit message update.
v9: Removed the code to reset the architectural state. It will be done
in another patch.
v8: Rename resctrl_arch_assign_cntr to resctrl_arch_config_cntr.
v7: Separated arch and fs functions. This patch only has arch implementation.
Added struct rdt_resource to the interface resctrl_arch_assign_cntr.
Rename rdtgroup_abmc_cfg() to resctrl_abmc_config_one_amd().
v6: Removed mbm_cntr_alloc() from this patch to keep fs and arch code
separate.
Added code to update the counter assignment at domain level.
v5: Few name changes to match cntr_id.
Changed the function names to
rdtgroup_assign_cntr
resctrl_arch_assign_cntr
More comments on commit log.
Added function summary.
v4: Commit message update.
Used bitmap APIs where applicable.
Changed the interfaces considering MPAM(arm).
Added domain specific assignment.
v3: Removed the static from the prototype of rdtgroup_assign_abmc.
The function is not called directly from user anymore. These
changes are related to global assignment interface.
v2: Minor text changes in commit message.
---
arch/x86/kernel/cpu/resctrl/internal.h | 3 ++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 58 ++++++++++++++++++++++++++
2 files changed, 61 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 35bcf0e5ba7e..849bcfe4ea5b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -701,5 +701,8 @@ bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
void arch_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
unsigned int mon_event_config_index_get(u32 evtid);
+int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ enum resctrl_event_id evtid, u32 rmid, u32 closid,
+ u32 cntr_id, bool assign);
#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 72518e0ec2ec..e895d2415f22 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1686,6 +1686,34 @@ unsigned int mon_event_config_index_get(u32 evtid)
}
}
+struct cntr_config {
+ struct rdt_resource *r;
+ struct rdt_mon_domain *d;
+ enum resctrl_event_id evtid;
+ u32 rmid;
+ u32 closid;
+ u32 cntr_id;
+ u32 val;
+ bool assign;
+};
+
+static void resctrl_abmc_config_one_amd(void *info)
+{
+ struct cntr_config *config = info;
+ union l3_qos_abmc_cfg abmc_cfg = { 0 };
+
+ abmc_cfg.split.cfg_en = 1;
+ abmc_cfg.split.cntr_en = config->assign ? 1 : 0;
+ abmc_cfg.split.cntr_id = config->cntr_id;
+ abmc_cfg.split.bw_src = config->rmid;
+ abmc_cfg.split.bw_type = config->val;
+
+ wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg.full);
+
+ resctrl_arch_reset_rmid(config->r, config->d, config->closid,
+ config->rmid, config->evtid);
+}
+
static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
{
struct rdt_mon_domain *dom;
@@ -1869,6 +1897,36 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+/*
+ * Send an IPI to the domain to assign the counter to RMID, event pair.
+ */
+int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ enum resctrl_event_id evtid, u32 rmid, u32 closid,
+ u32 cntr_id, bool assign)
+{
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+ struct cntr_config config = { 0 };
+
+ config.r = r;
+ config.d = d;
+ config.evtid = evtid;
+ config.rmid = rmid;
+ config.closid = closid;
+ config.cntr_id = cntr_id;
+
+ /* Update the event configuration from the domain */
+ if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID)
+ config.val = hw_dom->mbm_total_cfg;
+ else
+ config.val = hw_dom->mbm_local_cfg;
+
+ config.assign = assign;
+
+ smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &config, 1);
+
+ return 0;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
--
2.34.1
* [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (14 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 15/24] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-12 23:37 ` Luck, Tony
2024-12-19 23:22 ` Reinette Chatre
2024-12-12 20:15 ` [PATCH v10 17/24] x86/resctrl: Add the interface to unassign a counter Babu Moger
` (7 subsequent siblings)
23 siblings, 2 replies; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
The mbm_cntr_assign mode offers several hardware counters, each of which can
be assigned to an RMID, event pair to monitor the bandwidth for as long as
it remains assigned.
Counters are managed at the domain level. Introduce the interface to
allocate/free/assign the counters.
If the user requests assignments across all domains, some domains may
fail if they run out of counters. Ensure assignments continue in other
domains wherever possible.
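The "continue in other domains" behaviour can be sketched in user space as
follows. This is a simplified model under stated assumptions, not the kernel
code: demo_cntr, demo_domain, DEMO_NUM_CNTRS and the demo_* helpers are
hypothetical, and the hardware programming step is left out entirely.

```c
#include <stddef.h>

#define DEMO_NUM_CNTRS 2	/* hypothetical per-domain counter count */

/* Hypothetical stand-ins for struct mbm_cntr_cfg and struct rdt_mon_domain. */
struct demo_cntr {
	const void *grp;	/* NULL means the counter is free */
	int evt;
};

struct demo_domain {
	struct demo_cntr cntr[DEMO_NUM_CNTRS];
};

/* Return the index of the counter held by (@grp, @evt) in @d, or -1.
 * @grp must be non-NULL. */
static int demo_cntr_find(const struct demo_domain *d, const void *grp, int evt)
{
	int i;

	for (i = 0; i < DEMO_NUM_CNTRS; i++) {
		if (d->cntr[i].grp == grp && d->cntr[i].evt == evt)
			return i;
	}
	return -1;
}

/* Claim the first free counter in @d for (@grp, @evt); -1 if none is free. */
static int demo_cntr_alloc(struct demo_domain *d, const void *grp, int evt)
{
	int i;

	for (i = 0; i < DEMO_NUM_CNTRS; i++) {
		if (!d->cntr[i].grp) {
			d->cntr[i].grp = grp;
			d->cntr[i].evt = evt;
			return i;
		}
	}
	return -1;
}

/*
 * Try to assign (@grp, @evt) in every domain. A domain that is out of
 * counters is skipped rather than aborting the whole request.
 * Returns the number of domains in which the pair holds a counter.
 */
static size_t demo_assign_all(struct demo_domain *doms, size_t n,
			      const void *grp, int evt)
{
	size_t i, assigned = 0;

	for (i = 0; i < n; i++) {
		/* Already assigned counts as success; otherwise try to allocate. */
		if (demo_cntr_find(&doms[i], grp, evt) >= 0 ||
		    demo_cntr_alloc(&doms[i], grp, evt) >= 0)
			assigned++;
	}
	return assigned;
}
```

Checking for an existing assignment before allocating makes a repeated
request harmless: it neither fails nor consumes a second counter.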
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Patch changed completely.
Counters are managed at the domain based on the discussion.
https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
Reset non-architectural MBM state.
Commit message update.
v9: Introduced new function resctrl_config_cntr to assign the counter, update
the bitmap and reset the architectural state.
Taken care of error handling(freeing the counter) when assignment fails.
Moved mbm_cntr_assigned_to_domain here as it is used in this patch.
Minor text changes.
v8: Renamed rdtgroup_assign_cntr() to rdtgroup_assign_cntr_event().
Added the code to return the error if rdtgroup_assign_cntr_event fails.
Moved definition of MBM_EVENT_ARRAY_INDEX to resctrl/internal.h.
Updated typo in the comments.
v7: New patch. Moved all the FS code here.
Merged rdtgroup_assign_cntr and rdtgroup_alloc_cntr.
Added a new #define MBM_EVENT_ARRAY_INDEX.
---
arch/x86/kernel/cpu/resctrl/internal.h | 5 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 4 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 110 +++++++++++++++++++++++++
3 files changed, 116 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 849bcfe4ea5b..70d2577fc377 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -704,5 +704,8 @@ unsigned int mon_event_config_index_get(u32 evtid);
int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
enum resctrl_event_id evtid, u32 rmid, u32 closid,
u32 cntr_id, bool assign);
-
+int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, enum resctrl_event_id evtid);
+struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
+ u32 rmid, enum resctrl_event_id evtid);
#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index f857af361af1..8823cd97ff1f 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -575,8 +575,8 @@ void free_rmid(u32 closid, u32 rmid)
list_add_tail(&entry->list, &rmid_free_lru);
}
-static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
- u32 rmid, enum resctrl_event_id evtid)
+struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
+ u32 rmid, enum resctrl_event_id evtid)
{
u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e895d2415f22..1c8694a68cf4 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1927,6 +1927,116 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
return 0;
}
+/*
+ * Configure the counter for the event, RMID pair for the domain.
+ */
+static int resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ enum resctrl_event_id evtid, u32 rmid, u32 closid,
+ u32 cntr_id, bool assign)
+{
+ struct mbm_state *m;
+ int ret;
+
+ ret = resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id, assign);
+ if (ret)
+ return ret;
+
+ m = get_mbm_state(d, closid, rmid, evtid);
+ if (m)
+ memset(m, 0, sizeof(struct mbm_state));
+
+ return ret;
+}
+
+static bool mbm_cntr_assigned(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+ int cntr_id;
+
+ for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
+ if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
+ d->cntr_cfg[cntr_id].evtid == evtid)
+ return true;
+ }
+
+ return false;
+}
+
+static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+ int cntr_id;
+
+ for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
+ if (!d->cntr_cfg[cntr_id].rdtgrp) {
+ d->cntr_cfg[cntr_id].rdtgrp = rdtgrp;
+ d->cntr_cfg[cntr_id].evtid = evtid;
+ return cntr_id;
+ }
+ }
+
+ return -EINVAL;
+}
+
+static void mbm_cntr_free(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+ int cntr_id;
+
+ for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
+ if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
+ d->cntr_cfg[cntr_id].evtid == evtid)
+ memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
+ }
+}
+
+/*
+ * Assign a hardware counter to event @evtid of group @rdtgrp.
+ * Counter will be assigned to all the domains if rdt_mon_domain is NULL
+ * else the counter will be assigned to specific domain.
+ */
+int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, enum resctrl_event_id evtid)
+{
+ int cntr_id, ret = 0;
+
+ if (!d) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ if (mbm_cntr_assigned(r, d, rdtgrp, evtid))
+ continue;
+
+ cntr_id = mbm_cntr_alloc(r, d, rdtgrp, evtid);
+ if (cntr_id < 0) {
+ rdt_last_cmd_puts("Domain Out of MBM assignable counters\n");
+ continue;
+ }
+
+ ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
+ rdtgrp->closid, cntr_id, true);
+ if (ret)
+ goto out_done_assign;
+ }
+ } else {
+ if (mbm_cntr_assigned(r, d, rdtgrp, evtid))
+ goto out_done_assign;
+
+ cntr_id = mbm_cntr_alloc(r, d, rdtgrp, evtid);
+ if (cntr_id < 0) {
+ rdt_last_cmd_puts("Domain Out of MBM assignable counters\n");
+ goto out_done_assign;
+ }
+
+ ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
+ rdtgrp->closid, cntr_id, true);
+ }
+
+out_done_assign:
+ if (ret)
+ mbm_cntr_free(r, d, rdtgrp, evtid);
+
+ return ret;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
--
2.34.1
* [PATCH v10 17/24] x86/resctrl: Add the interface to unassign a counter
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (15 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-19 23:32 ` Reinette Chatre
2024-12-12 20:15 ` [PATCH v10 18/24] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
` (6 subsequent siblings)
23 siblings, 1 reply; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
The mbm_cntr_assign mode provides a limited number of hardware counters
that can be assigned to an RMID, event pair to monitor bandwidth while
assigned. If all counters are in use, the kernel will show an error
message: "Out of MBM assignable counters" when a new assignment is
requested. To make space for a new assignment, users must unassign an
already assigned counter.
Introduce an interface that allows for the unassignment of counter IDs
from the domain.
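The "unassign to make space" flow can be modelled in a few lines of user-space
C. This is a sketch under assumptions: the demo_* names and DEMO_NUM_CNTRS are
hypothetical, and the step that reprograms the hardware counter is omitted.

```c
#include <string.h>

#define DEMO_NUM_CNTRS 2	/* hypothetical per-domain counter count */

/* Hypothetical stand-ins for struct mbm_cntr_cfg and struct rdt_mon_domain. */
struct demo_cntr {
	const void *grp;	/* NULL means the counter is free */
	int evt;
};

struct demo_domain {
	struct demo_cntr cntr[DEMO_NUM_CNTRS];
};

/* Claim the first free counter in @d for (@grp, @evt); -1 if none is free. */
static int demo_cntr_alloc(struct demo_domain *d, const void *grp, int evt)
{
	int i;

	for (i = 0; i < DEMO_NUM_CNTRS; i++) {
		if (!d->cntr[i].grp) {
			d->cntr[i].grp = grp;
			d->cntr[i].evt = evt;
			return i;
		}
	}
	return -1;
}

/*
 * Release the counter held by (@grp, @evt) in @d.
 * Returns the released counter index, or -1 if the pair held no counter.
 */
static int demo_cntr_unassign(struct demo_domain *d, const void *grp, int evt)
{
	int i;

	for (i = 0; i < DEMO_NUM_CNTRS; i++) {
		if (d->cntr[i].grp == grp && d->cntr[i].evt == evt) {
			/* Clearing the slot marks the counter as free again. */
			memset(&d->cntr[i], 0, sizeof(d->cntr[i]));
			return i;
		}
	}
	return -1;
}
```

Once every slot in a domain is owned, demo_cntr_alloc() returns -1 until a
call to demo_cntr_unassign() clears a slot, after which the freed index can be
handed to the new requester.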
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Patch changed again.
Counters are managed at the domain based on the discussion.
https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
Commit message update.
v9: Changes related to addition of new function resctrl_config_cntr().
Removed rdtgroup_mbm_cntr_is_assigned() as it was already introduced.
Text changes to address review comments.
v8: Renamed rdtgroup_mbm_cntr_is_assigned to mbm_cntr_assigned_to_domain
Added return error handling in resctrl_arch_config_cntr().
v7: Merged rdtgroup_unassign_cntr and rdtgroup_free_cntr functions.
Renamed rdtgroup_mbm_cntr_test() to rdtgroup_mbm_cntr_is_assigned().
Reworded the commit log little bit.
v6: Removed mbm_cntr_free from this patch.
Added counter test in all the domains and free if it is not assigned to
any domains.
v5: Few name changes to match cntr_id.
Changed the function names to rdtgroup_unassign_cntr
More comments on commit log.
v4: Added domain specific unassign feature.
Few name changes.
v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
The function is not called directly from user anymore. These
changes are related to global assignment interface.
v2: No changes.
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++++++++++++++++++++++
2 files changed, 54 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 70d2577fc377..f858098dbe4b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -706,6 +706,8 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 cntr_id, bool assign);
int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
struct rdt_mon_domain *d, enum resctrl_event_id evtid);
+int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, enum resctrl_event_id evtid);
struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
u32 rmid, enum resctrl_event_id evtid);
#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1c8694a68cf4..a71a8389b649 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1990,6 +1990,20 @@ static void mbm_cntr_free(struct rdt_resource *r, struct rdt_mon_domain *d,
}
}
+static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+{
+ int cntr_id;
+
+ for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
+ if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
+ d->cntr_cfg[cntr_id].evtid == evtid)
+ return cntr_id;
+ }
+
+ return -EINVAL;
+}
+
/*
* Assign a hardware counter to event @evtid of group @rdtgrp.
* Counter will be assigned to all the domains if rdt_mon_domain is NULL
@@ -2037,6 +2051,44 @@ int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
return ret;
}
+/*
+ * Unassign a hardware counter associated with @evtid from the domain and
+ * the group. Unassign the counters from all the domains if rdt_mon_domain
+ * is NULL else unassign from the specific domain.
+ */
+int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, enum resctrl_event_id evtid)
+{
+ int cntr_id, ret = 0;
+
+ if (!d) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ if (!mbm_cntr_assigned(r, d, rdtgrp, evtid))
+ continue;
+
+ cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
+
+ ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
+ rdtgrp->closid, cntr_id, false);
+ if (!ret)
+ mbm_cntr_free(r, d, rdtgrp, evtid);
+ }
+ } else {
+ if (!mbm_cntr_assigned(r, d, rdtgrp, evtid))
+ goto out_done_unassign;
+
+ cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
+
+ ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
+ rdtgrp->closid, cntr_id, false);
+ if (!ret)
+ mbm_cntr_free(r, d, rdtgrp, evtid);
+ }
+
+out_done_unassign:
+ return ret;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
--
2.34.1
* [PATCH v10 18/24] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (16 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 17/24] x86/resctrl: Add the interface to unassign a counter Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-19 23:39 ` Reinette Chatre
2024-12-12 20:15 ` [PATCH v10 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
` (5 subsequent siblings)
23 siblings, 1 reply; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
Assign/unassign counters on resctrl group creation/deletion. Two counters
are required per group, one for the MBM total event and one for the MBM
local event.
There are a limited number of counters available for assignment. If these
counters are exhausted, the kernel will display the error message: "Out of
MBM assignable counters". However, it is not necessary to fail the
creation of a group due to assignment failures. Users have the flexibility
to modify the assignments at a later time.
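The best-effort policy described above can be illustrated with a small
user-space model in which group creation always succeeds and simply records
which events obtained a counter. All names here (demo_pool, demo_group,
demo_group_create(), demo_group_delete()) are hypothetical and a single
shared pool is assumed for brevity, whereas the series tracks counters per
domain.

```c
#include <stdbool.h>

/* Hypothetical pool of assignable counters. */
struct demo_pool {
	unsigned int free_cntrs;
};

/* Hypothetical group state: which of its two events hold a counter. */
struct demo_group {
	bool total_assigned;
	bool local_assigned;
};

/* Take one counter from the pool if any is left. */
static bool demo_take(struct demo_pool *p)
{
	if (!p->free_cntrs)
		return false;
	p->free_cntrs--;
	return true;
}

/*
 * Create a group and try to assign counters for both events.
 * Running out of counters is not an error: the group is still created
 * and the caller can inspect which events are monitored.
 */
static int demo_group_create(struct demo_pool *p, struct demo_group *g)
{
	g->total_assigned = demo_take(p);
	g->local_assigned = demo_take(p);
	return 0;
}

/* Delete a group, returning any counters it held to the pool. */
static void demo_group_delete(struct demo_pool *p, struct demo_group *g)
{
	if (g->total_assigned)
		p->free_cntrs++;
	if (g->local_assigned)
		p->free_cntrs++;
	g->total_assigned = false;
	g->local_assigned = false;
}
```

A group created after the pool is exhausted ends up with neither event
assigned, which corresponds to the case where the user reassigns counters
later.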
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Assigned the counter before exposing the event files.
Moved the call rdtgroup_assign_cntrs() inside mkdir_rdt_prepare_rmid_alloc().
This is called for both CTRL_MON and MON group creation.
Call mbm_cntr_reset() when unmounted to clear all the assignments.
Taken care of few other feedback comments.
v9: Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to return void.
Updated couple of rdtgroup_unassign_cntrs() calls properly.
Updated function comments.
v8: Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
Fixed the problem with unassigning the child MON groups of CTRL_MON group.
v7: Reworded the commit message.
Removed the reference of ABMC with mbm_cntr_assign.
Renamed the function rdtgroup_assign_cntrs to rdtgroup_assign_grp.
v6: Removed the redundant comments on all the calls of
rdtgroup_assign_cntrs. Updated the commit message.
Dropped printing error message on every call of rdtgroup_assign_cntrs.
v5: Removed the code to enable/disable ABMC during the mount.
That will be another patch.
Added arch callers to get the arch specific data.
Renamed functions to match the other abmc function.
Added code comments for assignment failures.
v4: Few name changes based on the upstream discussion.
Commit message update.
v3: This is a new patch. Patch addresses the upstream comment to enable
ABMC feature by default if the feature is available.
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 81 +++++++++++++++++++++++++-
1 file changed, 79 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index a71a8389b649..5acae525881a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -920,6 +920,25 @@ static int rdtgroup_available_mbm_cntrs_show(struct kernfs_open_file *of,
return ret;
}
+static void mbm_cntr_reset(struct rdt_resource *r)
+{
+ struct rdt_mon_domain *dom;
+
+ /*
+ * Hardware counters will reset after switching the monitor mode.
+ * Reset the architectural state so that reading of hardware
+ * counter is not considered as an overflow in the next update.
+ * Also reset the domain counter bitmap.
+ */
+ if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+ memset(dom->cntr_cfg, 0,
+ sizeof(*dom->cntr_cfg) * r->mon.num_mbm_cntrs);
+ resctrl_arch_reset_rmid_all(r, dom);
+ }
+ }
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -2969,6 +2988,46 @@ static void schemata_list_destroy(void)
}
}
+/*
+ * Called when a new group is created. If "mbm_cntr_assign" mode is enabled,
+ * counters are automatically assigned. Each group can accommodate two counters:
+ * one for the total event and one for the local event. Assignments may fail
+ * due to the limited number of counters. However, it is not necessary to fail
+ * the group creation and thus no failure is returned. Users have the option
+ * to modify the counter assignments after the group has been created.
+ */
+static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r))
+ return;
+
+ if (is_mbm_total_enabled())
+ rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+ if (is_mbm_local_enabled())
+ rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
+}
+
+/*
+ * Called when a group is deleted. Counters are unassigned if it was in
+ * assigned state.
+ */
+static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r))
+ return;
+
+ if (is_mbm_total_enabled())
+ rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+ if (is_mbm_local_enabled())
+ rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
+}
+
static int rdt_get_tree(struct fs_context *fc)
{
struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -3023,6 +3082,8 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret < 0)
goto out_info;
+ rdtgroup_assign_cntrs(&rdtgroup_default);
+
ret = mkdir_mondata_all(rdtgroup_default.kn,
&rdtgroup_default, &kn_mondata);
if (ret < 0)
@@ -3058,8 +3119,10 @@ static int rdt_get_tree(struct fs_context *fc)
out_psl:
rdt_pseudo_lock_release();
out_mondata:
- if (resctrl_arch_mon_capable())
+ if (resctrl_arch_mon_capable()) {
kernfs_remove(kn_mondata);
+ rdtgroup_unassign_cntrs(&rdtgroup_default);
+ }
out_mongrp:
if (resctrl_arch_mon_capable())
kernfs_remove(kn_mongrp);
@@ -3238,6 +3301,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
head = &rdtgrp->mon.crdtgrp_list;
list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
+ rdtgroup_unassign_cntrs(sentry);
free_rmid(sentry->closid, sentry->mon.rmid);
list_del(&sentry->mon.crdtgrp_list);
@@ -3278,6 +3342,8 @@ static void rmdir_all_sub(void)
cpumask_or(&rdtgroup_default.cpu_mask,
&rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
+ rdtgroup_unassign_cntrs(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
kernfs_remove(rdtgrp->kn);
@@ -3309,6 +3375,8 @@ static void rdt_kill_sb(struct super_block *sb)
for_each_alloc_capable_rdt_resource(r)
reset_all_ctrls(r);
rmdir_all_sub();
+ rdtgroup_unassign_cntrs(&rdtgroup_default);
+ mbm_cntr_reset(&rdt_resources_all[RDT_RESOURCE_L3].r_resctrl);
rdt_pseudo_lock_release();
rdtgroup_default.mode = RDT_MODE_SHAREABLE;
schemata_list_destroy();
@@ -3772,6 +3840,8 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
}
rdtgrp->mon.rmid = ret;
+ rdtgroup_assign_cntrs(rdtgrp);
+
ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
if (ret) {
rdt_last_cmd_puts("kernfs subdir error\n");
@@ -3784,8 +3854,10 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
{
- if (resctrl_arch_mon_capable())
+ if (resctrl_arch_mon_capable()) {
+ rdtgroup_unassign_cntrs(rgrp);
free_rmid(rgrp->closid, rgrp->mon.rmid);
+ }
}
static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
@@ -4044,6 +4116,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
update_closid_rmid(tmpmask, NULL);
rdtgrp->flags = RDT_DELETED;
+
+ rdtgroup_unassign_cntrs(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
/*
@@ -4090,6 +4165,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
update_closid_rmid(tmpmask, NULL);
+ rdtgroup_unassign_cntrs(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
closid_free(rdtgrp->closid);
--
2.34.1
* [PATCH v10 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (17 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 18/24] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-19 23:59 ` Reinette Chatre
2024-12-12 20:15 ` [PATCH v10 20/24] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
` (4 subsequent siblings)
23 siblings, 1 reply; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
In mbm_cntr_assign mode, a hardware counter must be assigned to an event
before that MBM event can be read.
Report 'Unassigned' if the user attempts to read an event that has no
counter assigned.
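For illustration only, the resulting read behaviour can be sketched in userspace C. This is not the kernel code: mon_status_str() and format_mon_event() are hypothetical names, and the -EIO case is an assumption taken from the surrounding context of rdtgroup_mondata_show(), which the hunk below does not show in full.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical mirror of the error-to-text mapping: returns the status
 * string for an error code, or NULL when the counter value should be
 * printed instead.
 */
static const char *mon_status_str(int err)
{
	switch (err) {
	case -EIO:		/* assumption: not visible in the hunk */
		return "Error";
	case -EINVAL:
		return "Unavailable";
	case -ENOENT:		/* new in this patch: no counter assigned */
		return "Unassigned";
	default:
		return NULL;
	}
}

/* Format one mon_data read result into buf; returns snprintf()'s result. */
static int format_mon_event(char *buf, size_t len, int err,
			    unsigned long long val)
{
	const char *s = mon_status_str(err);

	assert(buf != NULL);
	if (s)
		return snprintf(buf, len, "%s\n", s);
	return snprintf(buf, len, "%llu\n", val);
}
```

With this mapping a reader of mbm_total_bytes or mbm_local_bytes can tell "Unavailable" (the hardware did not track the RMID) apart from "Unassigned" (the user has not assigned a counter yet).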
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Moved the code to check the assign state inside mon_event_read().
Fixed few text comments.
v9: Used is_mbm_event() to check the event type.
Minor user documentation update.
v8: Used MBM_EVENT_ARRAY_INDEX to get the index for the MBM event.
Documentation update to make the text generic.
v7: Moved the documentation under "mon_data".
Updated the text little bit.
v6: Added more explanation in the resctrl.rst
Added checks to detect "Unassigned" before reading RMID.
v5: New patch.
---
Documentation/arch/x86/resctrl.rst | 10 ++++++++++
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 8 ++++++++
arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 ++--
4 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index c075fcee96b7..3ec14c314606 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -430,6 +430,16 @@ When monitoring is enabled all MON groups will also contain:
for the L3 cache they occupy). These are named "mon_sub_L3_YY"
where "YY" is the node number.
+ When supported the mbm_cntr_assign mode allows users to assign a
+ counter to mon_hw_id, event pair enabling bandwidth monitoring for
+ as long as the counter remains assigned. The hardware will continue
+ tracking the assigned mon_hw_id until the user manually unassigns
+ it, ensuring that counters are not reset during this period. With
+ a limited number of counters, the system may run out of assignable
+ counters. In that case, MBM event counters will return 'Unassigned'
+ when the event is read. Users must manually assign a counter to read
+ the events.
+
"mon_hw_id":
Available only with debug option. The identifier used by hardware
for the monitor group. On x86 this is the RMID.
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 200d89a64027..8e265a86e524 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -527,6 +527,12 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();
+ if (resctrl_arch_mbm_cntr_assign_enabled(r) && is_mbm_event(evtid) &&
+ !mbm_cntr_assigned(r, d, rdtgrp, evtid)) {
+ rr->err = -ENOENT;
+ return;
+ }
+
/*
* Setup the parameters to pass to mon_event_count() to read the data.
*/
@@ -618,6 +624,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
seq_puts(m, "Error\n");
else if (rr.err == -EINVAL)
seq_puts(m, "Unavailable\n");
+ else if (rr.err == -ENOENT)
+ seq_puts(m, "Unassigned\n");
else
seq_printf(m, "%llu\n", rr.val);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index f858098dbe4b..bb3213a7993e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -710,4 +710,6 @@ int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp
struct rdt_mon_domain *d, enum resctrl_event_id evtid);
struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
u32 rmid, enum resctrl_event_id evtid);
+bool mbm_cntr_assigned(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5acae525881a..8d00b1689a80 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1967,8 +1967,8 @@ static int resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
return ret;
}
-static bool mbm_cntr_assigned(struct rdt_resource *r, struct rdt_mon_domain *d,
- struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+bool mbm_cntr_assigned(struct rdt_resource *r, struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
{
int cntr_id;
--
2.34.1
* [PATCH v10 20/24] x86/resctrl: Introduce the interface to switch between monitor modes
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (18 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-20 2:56 ` Reinette Chatre
2024-12-12 20:15 ` [PATCH v10 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
` (3 subsequent siblings)
23 siblings, 1 reply; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
Introduce an interface to switch between the mbm_cntr_assign and default modes.
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_assign]
default
To enable the "mbm_cntr_assign" mode:
$ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
To enable the default monitoring mode:
$ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
MBM event counters will reset when mbm_assign_mode is changed.
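As a rough sketch of the input handling described above (not the kernel implementation), the write path accepts only the two mode names and requires a trailing newline. The enum and the function name parse_mbm_assign_mode() are hypothetical and exist only for this example; locking, the hardware capability check and the counter reset are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this example only. */
enum mbm_mode_parse {
	MODE_INVALID = -1,
	MODE_DEFAULT = 0,
	MODE_CNTR_ASSIGN = 1,
};

/*
 * Parse a buffer as written to mbm_assign_mode. Like the handler in the
 * patch, valid input must end in a newline, which is replaced by a NUL
 * terminator before the string comparison.
 */
static int parse_mbm_assign_mode(char *buf, size_t nbytes)
{
	assert(buf != NULL);

	if (nbytes == 0 || buf[nbytes - 1] != '\n')
		return MODE_INVALID;

	buf[nbytes - 1] = '\0';

	if (!strcmp(buf, "default"))
		return MODE_DEFAULT;
	if (!strcmp(buf, "mbm_cntr_assign"))
		return MODE_CNTR_ASSIGN;

	return MODE_INVALID;
}
```

This is why the examples use echo, which appends the newline; a write without a trailing newline (for example with printf) would be rejected with -EINVAL by the handler in the patch.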
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: The call mbm_cntr_reset() has been moved to earlier patch.
Minor documentation update.
v9: Fixed extra spaces in user documentation.
Fixed problem changing the mode to mbm_cntr_assign mode when it is
not supported. Added extra checks to detect if the system supports it.
Used the rdtgroup_cntr_id_init to initialize cntr_id.
v8: Reset the internal counters after mbm_cntr_assign mode is changed.
Renamed rdtgroup_mbm_cntr_reset() to mbm_cntr_reset()
Updated the documentation to make text generic.
v7: Changed the interface name to mbm_assign_mode.
Removed the references of ABMC.
Added the changes to reset global and domain bitmaps.
Added the changes to reset rmid.
v6: Changed the mode name to mbm_cntr_assign.
Moved all the FS related code here.
Added changes to reset mbm_cntr_map and resctrl group counters.
v5: Change log and mode description text correction.
v4: Minor commit text changes. Keep the default to ABMC when supported.
Fixed comments to reflect changed interface "mbm_mode".
v3: New patch to address the review comments from upstream.
---
Documentation/arch/x86/resctrl.rst | 15 ++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 50 +++++++++++++++++++++++++-
2 files changed, 64 insertions(+), 1 deletion(-)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 3ec14c314606..d3a8a34cf629 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -290,6 +290,21 @@ with the following files:
"mbm_total_bytes" or "mbm_local_bytes" will report 'Unavailable' if
there is no counter associated with that event.
+ * To enable "mbm_cntr_assign" mode:
+ ::
+
+ # echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+
+ * To enable default monitoring mode:
+ ::
+
+ # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+
+ The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated with
+ counters may reset when "mbm_assign_mode" is changed. Moving to
+ mbm_cntr_assign mode requires users to assign the counters to the events.
+ Otherwise, the MBM event counters will return "Unassigned" when read.
+
"num_mbm_cntrs":
The number of monitoring counters available for assignment when the
architecture supports mbm_cntr_assign mode.
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 8d00b1689a80..eea534cce3d0 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -939,6 +939,53 @@ static void mbm_cntr_reset(struct rdt_resource *r)
}
}
+static ssize_t rdtgroup_mbm_assign_mode_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ int ret = 0;
+ bool enable;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+
+ buf[nbytes - 1] = '\0';
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ rdt_last_cmd_clear();
+
+ if (!strcmp(buf, "default")) {
+ enable = 0;
+ } else if (!strcmp(buf, "mbm_cntr_assign")) {
+ if (r->mon.mbm_cntr_assignable) {
+ enable = 1;
+ } else {
+ ret = -EINVAL;
+ rdt_last_cmd_puts("mbm_cntr_assign mode is not supported\n");
+ goto write_exit;
+ }
+ } else {
+ ret = -EINVAL;
+ rdt_last_cmd_puts("Unsupported assign mode\n");
+ goto write_exit;
+ }
+
+ if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
+ if (!ret)
+ mbm_cntr_reset(r);
+ }
+
+write_exit:
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
+ return ret ?: nbytes;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -2222,9 +2269,10 @@ static struct rftype res_common_files[] = {
},
{
.name = "mbm_assign_mode",
- .mode = 0444,
+ .mode = 0644,
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = rdtgroup_mbm_assign_mode_show,
+ .write = rdtgroup_mbm_assign_mode_write,
.fflags = RFTYPE_MON_INFO,
},
{
--
2.34.1
* [PATCH v10 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (19 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 20/24] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-20 3:03 ` Reinette Chatre
2024-12-12 20:15 ` [PATCH v10 22/24] x86/resctrl: Update assignments on event configuration changes Babu Moger
` (2 subsequent siblings)
23 siblings, 1 reply; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
Configure mbm_cntr_assign on AMD. On AMD, 'mbm_cntr_assign' mode is
implemented by ABMC (Assignable Bandwidth Monitoring Counters). It is
enabled by default when supported on the system.
Ensure that the ABMC state is updated on all logical processors in the
resctrl domain.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Commit text in imperative tone.
v9: Minor code change due to merge. Actual code did not change.
v8: Renamed resctrl_arch_mbm_cntr_assign_configure to
resctrl_arch_mbm_cntr_assign_set_one.
Adde r->mon_capable check.
Commit message update.
v7: Introduced resctrl_arch_mbm_cntr_assign_configure() to configure.
Moved the default settings to rdt_get_mon_l3_config(). It should be
done before the hotplug handler is called. It cannot be done at
rdtgroup_init().
v6: Keeping the default enablement in arch init code for now.
This may need some discussion.
Renamed resctrl_arch_configure_abmc to resctrl_arch_mbm_cntr_assign_configure.
v5: New patch to enable ABMC by default.
---
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
3 files changed, 13 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index bb3213a7993e..1ca51f68a523 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -712,4 +712,5 @@ struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
u32 rmid, enum resctrl_event_id evtid);
bool mbm_cntr_assigned(struct rdt_resource *r, struct rdt_mon_domain *d,
struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
+void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 8823cd97ff1f..845636a205bf 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1237,6 +1237,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
r->mon.mbm_cntr_assignable = true;
cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
+ hw_res->mbm_cntr_assign_enabled = true;
resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
resctrl_file_fflags_init("available_mbm_cntrs", RFTYPE_MON_INFO);
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index eea534cce3d0..65b3556978ad 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2824,6 +2824,13 @@ int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
return 0;
}
+void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r)
+{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+ resctrl_abmc_set_one_amd(&hw_res->mbm_cntr_assign_enabled);
+}
+
/*
* We don't allow rdtgroup directories to be created anywhere
* except the root directory. Thus when looking for the rdtgroup
@@ -4600,9 +4607,13 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
void resctrl_online_cpu(unsigned int cpu)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
mutex_lock(&rdtgroup_mutex);
/* The CPU is set in default rdtgroup after online. */
cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
+ if (r->mon_capable && r->mon.mbm_cntr_assignable)
+ resctrl_arch_mbm_cntr_assign_set_one(r);
mutex_unlock(&rdtgroup_mutex);
}
--
2.34.1
* [PATCH v10 22/24] x86/resctrl: Update assignments on event configuration changes
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (20 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-20 3:12 ` Reinette Chatre
2024-12-12 20:15 ` [PATCH v10 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
2024-12-12 20:15 ` [PATCH v10 24/24] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
23 siblings, 1 reply; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
Resctrl provides an option to configure events by writing to the interfaces
/sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config or
/sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config when BMEC (Bandwidth
Monitoring Event Configuration) is supported.
Whenever the event configuration is updated, the MBM counter assignments
must be updated across all monitor groups within the impacted domains.
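The update pass can be pictured with a simplified userspace model. The types and names below (struct cntr_slot, refresh_cntrs(), the EVT_* values) are hypothetical stand-ins for the per-domain cntr_cfg array and are not the kernel structures; the hardware reprogramming step is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event ids for this example only. */
enum { EVT_TOTAL = 1, EVT_LOCAL = 2 };

/* Simplified stand-in for one entry of a domain's counter table. */
struct cntr_slot {
	const char *grp;		/* owning group, NULL if the counter is free */
	int evtid;			/* event the counter is assigned to */
	unsigned long long prev;	/* software state kept for the event */
};

/*
 * Walk every counter in the domain. For each counter that is assigned and
 * tracks the reconfigured event, reprogram it (omitted here) and clear the
 * software state so the next read is not treated as an overflow.
 * Returns the number of counters that were updated.
 */
static unsigned int refresh_cntrs(struct cntr_slot *slots, size_t n, int evtid)
{
	unsigned int updated = 0;
	size_t i;

	assert(slots != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!slots[i].grp || slots[i].evtid != evtid)
			continue;
		/* The real code reprograms the hardware counter at this point. */
		slots[i].prev = 0;
		updated++;
	}

	return updated;
}
```

Free counters and counters assigned to the other event are left untouched, which matches the intent that only assignments affected by the configuration change are updated.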
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Code changed completely with domain specific counter assignment.
Rewrite the commit message.
Added few more code comments.
v9: Again patch changed completely based on the comment.
https://lore.kernel.org/lkml/03b278b5-6c15-4d09-9ab7-3317e84a409e@intel.com/
Introduced resctrl_mon_event_config_set to handle IPI.
But sending another IPI inside IPI causes problem. Kernel reports SMP
warning. So, introduced resctrl_arch_update_cntr() to send the command directly.
v8: Patch changed completely.
Updated the assignment on same IPI as the event is updated.
Could not do the way we discussed in the thread.
https://lore.kernel.org/lkml/f77737ac-d3f6-3e4b-3565-564f79c86ca8@amd.com/
Needed to figure out event type to update the configuration.
v7: New patch to update the assignments. Missed it earlier.
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 67 ++++++++++++++++++++++----
include/linux/resctrl.h | 4 +-
2 files changed, 61 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 65b3556978ad..6b5c886b7e99 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1704,26 +1704,26 @@ u32 resctrl_arch_mon_event_config_get(struct rdt_mon_domain *d,
return INVALID_CONFIG_VALUE;
}
-void resctrl_arch_mon_event_config_set(void *info)
+void resctrl_arch_mon_event_config_set(struct rdt_mon_domain *d,
+ enum resctrl_event_id eventid, u32 val)
{
- struct mon_config_info *mon_info = info;
struct rdt_hw_mon_domain *hw_dom;
unsigned int index;
- index = mon_event_config_index_get(mon_info->evtid);
+ index = mon_event_config_index_get(eventid);
if (index == INVALID_CONFIG_INDEX)
return;
- wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
+ wrmsr(MSR_IA32_EVT_CFG_BASE + index, val, 0);
- hw_dom = resctrl_to_arch_mon_dom(mon_info->d);
+ hw_dom = resctrl_to_arch_mon_dom(d);
- switch (mon_info->evtid) {
+ switch (eventid) {
case QOS_L3_MBM_TOTAL_EVENT_ID:
- hw_dom->mbm_total_cfg = mon_info->mon_config;
+ hw_dom->mbm_total_cfg = val;
break;
case QOS_L3_MBM_LOCAL_EVENT_ID:
- hw_dom->mbm_local_cfg = mon_info->mon_config;
+ hw_dom->mbm_local_cfg = val;
break;
default:
break;
@@ -1825,6 +1825,54 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
return 0;
}
+/*
+ * Review the cntr_cfg domain configuration. If a matching assignment is found,
+ * update the counter assignment accordingly. This is within the IPI Context,
+ * so call resctrl_abmc_config_one_amd directly.
+ */
+static void resctrl_arch_update_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ enum resctrl_event_id evtid, u32 val)
+{
+ struct cntr_config config;
+ struct rdtgroup *rdtgrp;
+ struct mbm_state *m;
+ u32 cntr_id;
+
+ for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
+ rdtgrp = d->cntr_cfg[cntr_id].rdtgrp;
+ if (rdtgrp && d->cntr_cfg[cntr_id].evtid == evtid) {
+ memset(&config, 0, sizeof(struct cntr_config));
+ config.r = r;
+ config.d = d;
+ config.evtid = evtid;
+ config.rmid = rdtgrp->mon.rmid;
+ config.closid = rdtgrp->closid;
+ config.cntr_id = cntr_id;
+ config.val = val;
+ config.assign = 1;
+
+ resctrl_abmc_config_one_amd(&config);
+
+ m = get_mbm_state(d, rdtgrp->closid, rdtgrp->mon.rmid, evtid);
+ if (m)
+ memset(m, 0, sizeof(struct mbm_state));
+ }
+ }
+}
+
+static void resctrl_mon_event_config_set(void *info)
+{
+ struct mon_config_info *mon_info = info;
+ struct rdt_mon_domain *d = mon_info->d;
+ struct rdt_resource *r = mon_info->r;
+
+ resctrl_arch_mon_event_config_set(d, mon_info->evtid, mon_info->mon_config);
+
+ /* Check if assignments need to be updated */
+ if (resctrl_arch_mbm_cntr_assign_enabled(r))
+ resctrl_arch_update_cntr(r, d, mon_info->evtid,
+ mon_info->mon_config);
+}
static void mbm_config_write_domain(struct rdt_resource *r,
struct rdt_mon_domain *d, u32 evtid, u32 val)
@@ -1840,6 +1888,7 @@ static void mbm_config_write_domain(struct rdt_resource *r,
if (config_val == INVALID_CONFIG_VALUE || config_val == val)
return;
+ mon_info.r = r;
mon_info.d = d;
mon_info.evtid = evtid;
mon_info.mon_config = val;
@@ -1851,7 +1900,7 @@ static void mbm_config_write_domain(struct rdt_resource *r,
* on one CPU is observed by all the CPUs in the domain.
*/
smp_call_function_any(&d->hdr.cpu_mask,
- resctrl_arch_mon_event_config_set,
+ resctrl_mon_event_config_set,
&mon_info, 1);
/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 03c67d9156f3..2bf461179680 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -137,6 +137,7 @@ struct rdt_mon_domain {
* @mon_config: Event configuration value
*/
struct mon_config_info {
+ struct rdt_resource *r;
struct rdt_mon_domain *d;
enum resctrl_event_id evtid;
u32 mon_config;
@@ -376,7 +377,8 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
*/
void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
-void resctrl_arch_mon_event_config_set(void *info);
+void resctrl_arch_mon_event_config_set(struct rdt_mon_domain *d,
+ enum resctrl_event_id eventid, u32 val);
u32 resctrl_arch_mon_event_config_get(struct rdt_mon_domain *d,
enum resctrl_event_id eventid);
--
2.34.1
* [PATCH v10 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (21 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 22/24] x86/resctrl: Update assignments on event configuration changes Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-12 22:57 ` Luck, Tony
2024-12-12 20:15 ` [PATCH v10 24/24] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
23 siblings, 1 reply; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
Provide the interface to list the assignment states of all the resctrl
groups in mbm_cntr_assign mode.
Example:
$ mount -t resctrl resctrl /sys/fs/resctrl/
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=tl;1=tl;
The list uses the following format:
"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
Format for specific type of groups:
- Default CTRL_MON group:
"//<domain_id>=<flags>"
- Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id>=<flags>"
- Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id>=<flags>"
- Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
Flags can be one of the following:
t MBM total event is assigned
l MBM local event is assigned
tl Both total and local MBM events are assigned
_ None of the MBM events are assigned
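A minimal userspace sketch of how one line of this output could be built, assuming the flag rules listed above. struct dom_state, mon_state_to_str() and format_group_line() are hypothetical names for this example; the kernel code queries the assignment state per domain instead of taking booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-domain assignment state for one group. */
struct dom_state {
	int id;
	bool total;
	bool local;
};

/* Build the flags string: "t", "l", "tl", or "_" when nothing is assigned. */
static char *mon_state_to_str(bool total, bool local, char *str)
{
	char *tmp = str;

	if (total)
		*tmp++ = 't';
	if (local)
		*tmp++ = 'l';
	if (tmp == str)
		*tmp++ = '_';
	*tmp = '\0';

	return str;
}

/*
 * Format "<CTRL_MON group>/<MON group>/<domain_id>=<flags>;..." into out.
 * Pass empty strings for the default CTRL_MON group or for no MON group.
 */
static int format_group_line(char *out, size_t len, const char *ctrl,
			     const char *mon, const struct dom_state *doms,
			     size_t ndoms)
{
	char flags[3];	/* at most "tl" plus the terminator */
	size_t i;
	int n;

	assert(out != NULL && len > 0);

	n = snprintf(out, len, "%s/%s/", ctrl, mon);
	for (i = 0; i < ndoms && n >= 0 && (size_t)n < len; i++)
		n += snprintf(out + n, len - (size_t)n, "%d=%s;", doms[i].id,
			      mon_state_to_str(doms[i].total, doms[i].local,
					       flags));
	return n;
}
```

Calling format_group_line() with empty group names and two domains that both have total and local assigned produces the "//0=tl;1=tl;" line shown in the example above.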
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Changes mostly due to domain specific counter assignment.
v9: Minor parameter update in resctrl_mbm_event_assigned().
v8: Moved resctrl_mbm_event_assigned() in here as it is first used here.
Moved rdt_last_cmd_clear() before making any call.
Updated the commit log.
Corrected the doc format.
v7: Renamed the interface name from 'mbm_control' to 'mbm_assign_control'
to match 'mbm_assign_mode'.
Removed Arch references from FS code.
Added rdt_last_cmd_clear() before the command processing.
Added rdtgroup_mutex before all the calls.
Removed references of ABMC from FS code.
v6: The domain specific assignment can be determined looking at mbm_cntr_map.
Removed rdtgroup_abmc_dom_cfg() and rdtgroup_abmc_dom_state().
Removed the switch statement for the domain_state detection.
Determined the flags incrementally.
Removed special handling of default group while printing.
v5: Replaced "assignment flags" with "flags".
Changes related to mon structure.
Changes related renaming the interface from mbm_assign_control to
mbm_control.
v4: Added functionality to query domain specific assignment in
rdtgroup_abmc_dom_state().
v3: New patch.
Addresses the feedback to provide the global assignment interface.
https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
---
Documentation/arch/x86/resctrl.rst | 44 ++++++++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 71 ++++++++++++++++++++++++++
3 files changed, 116 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index d3a8a34cf629..9ae2b8f335dd 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -321,6 +321,50 @@ with the following files:
The number of monitoring counters available for assignment in each
domain when the architecture supports mbm_cntr_assign mode.
+"mbm_assign_control":
+ Reports the resctrl group and monitor status of each group.
+
+ List follows the following format:
+ "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
+
+ Format for specific type of groups:
+
+ * Default CTRL_MON group:
+ "//<domain_id>=<flags>"
+
+ * Non-default CTRL_MON group:
+ "<CTRL_MON group>//<domain_id>=<flags>"
+
+ * Child MON group of default CTRL_MON group:
+ "/<MON group>/<domain_id>=<flags>"
+
+ * Child MON group of non-default CTRL_MON group:
+ "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
+
+ Flags can be one of the following:
+ ::
+
+ t MBM total event is assigned.
+ l MBM local event is assigned.
+ tl Both MBM total and local events are assigned.
+ _ None of the MBM events are assigned.
+
+ Examples:
+ ::
+
+ # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
+ # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
+ # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=tl;
+
+ There are four resctrl groups. All the groups have total and local MBM events
+ assigned on domain 0 and 1.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 845636a205bf..3bb4313df92e 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1240,6 +1240,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
hw_res->mbm_cntr_assign_enabled = true;
resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
resctrl_file_fflags_init("available_mbm_cntrs", RFTYPE_MON_INFO);
+ resctrl_file_fflags_init("mbm_assign_control", RFTYPE_MON_INFO);
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6b5c886b7e99..70bf590ded8a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -986,6 +986,71 @@ static ssize_t rdtgroup_mbm_assign_mode_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+static char *rdtgroup_mon_state_to_str(struct rdt_resource *r,
+ struct rdt_mon_domain *d,
+ struct rdtgroup *rdtgrp, char *str)
+{
+ char *tmp = str;
+
+ /* Query the total and local event flags for the domain */
+ if (mbm_cntr_assigned(r, d, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID))
+ *tmp++ = 't';
+
+ if (mbm_cntr_assigned(r, d, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID))
+ *tmp++ = 'l';
+
+ if (tmp == str)
+ *tmp++ = '_';
+
+ *tmp = '\0';
+ return str;
+}
+
+static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ struct rdt_mon_domain *dom;
+ struct rdtgroup *rdtg;
+ char str[10];
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+ rdt_last_cmd_clear();
+
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+ return -EINVAL;
+ }
+
+ list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+ struct rdtgroup *crg;
+
+ seq_printf(s, "%s//", rdtg->kn->name);
+
+ list_for_each_entry(dom, &r->mon_domains, hdr.list)
+ seq_printf(s, "%d=%s;", dom->hdr.id,
+ rdtgroup_mon_state_to_str(r, dom, rdtg, str));
+ seq_putc(s, '\n');
+
+ list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+ mon.crdtgrp_list) {
+ seq_printf(s, "%s/%s/", rdtg->kn->name, crg->kn->name);
+
+ list_for_each_entry(dom, &r->mon_domains, hdr.list)
+ seq_printf(s, "%d=%s;", dom->hdr.id,
+ rdtgroup_mon_state_to_str(r, dom, crg, str));
+ seq_putc(s, '\n');
+ }
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+ return 0;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -2316,6 +2381,12 @@ static struct rftype res_common_files[] = {
.seq_show = mbm_local_bytes_config_show,
.write = mbm_local_bytes_config_write,
},
+ {
+ .name = "mbm_assign_control",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_mbm_assign_control_show,
+ },
{
.name = "mbm_assign_mode",
.mode = 0644,
--
2.34.1
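
Editorial sketch (not part of the patch): the flag string produced by rdtgroup_mon_state_to_str() can be mimicked in userspace to see how "t", "l", "tl" and "_" arise. The bit values EV_TOTAL and EV_LOCAL and the helper name mon_state_to_str() are assumptions made for this example only; the kernel derives its state from mbm_cntr_assigned() rather than from a bitmask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits; the kernel uses its own event IDs. */
#define EV_TOTAL 0x1u
#define EV_LOCAL 0x2u

/*
 * Render an assignment mask as "t", "l", "tl" or "_".
 * buf must hold at least 3 bytes ("tl" plus the terminator).
 */
static char *mon_state_to_str(unsigned int state, char *buf, size_t len)
{
	char *p = buf;

	assert(buf != NULL && len >= 3);

	if (state & EV_TOTAL)
		*p++ = 't';
	if (state & EV_LOCAL)
		*p++ = 'l';
	if (p == buf)
		*p++ = '_';	/* no event assigned */
	*p = '\0';

	return buf;
}
```

Calling mon_state_to_str(EV_TOTAL | EV_LOCAL, buf, sizeof(buf)) with a 4-byte buffer yields "tl", and a mask of 0 yields "_", matching the flag table in the documentation hunk above.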
* [PATCH v10 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
2024-12-12 20:15 [PATCH v10 00/24] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
` (22 preceding siblings ...)
2024-12-12 20:15 ` [PATCH v10 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
@ 2024-12-12 20:15 ` Babu Moger
2024-12-20 3:23 ` Reinette Chatre
23 siblings, 1 reply; 76+ messages in thread
From: Babu Moger @ 2024-12-12 20:15 UTC (permalink / raw)
To: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc,
babu.moger, xin3.li, andrew.cooper3, ebiggers, mario.limonciello,
james.morse, tan.shaopeng, linux-doc, linux-kernel,
maciej.wieczor-retman, eranian
Introduce the interface to assign MBM events in mbm_cntr_assign mode.
Events can be enabled or disabled by writing to file
/sys/fs/resctrl/info/L3_MON/mbm_assign_control
Format is similar to the list format with addition of opcode for the
assignment operation.
"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
Format for specific type of groups:
* Default CTRL_MON group:
"//<domain_id><opcode><flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id><opcode><flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id><opcode><flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
Domain_id '*' will apply the flags on all the domains.
Opcode can be one of the following:
= Update the assignment to match the flags
+ Assign a new MBM event without impacting existing assignments.
- Unassign an MBM event from the currently assigned events.
Assignment flags can be one of the following:
t MBM total event
l MBM local event
tl Both total and local MBM events
_ None of the MBM events. Valid only with '=' opcode. This flag cannot
be combined with other flags.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Fixed the issue with finding the domain in multiple iterations.
Printed error message with domain information when assign fails.
Changed the variables to unsigned for processing assign state.
Taken care of few format corrections.
v9: Fixed handling of the special cases '//0=' and '//'.
Removed extra strstr() call.
Added generic failure text when assignment operation fails.
Corrected user documentation format texts.
v8: Moved unassign as the first action during the assign modification.
Assign none "_" takes priority. Cannot be mixed with other flags.
Updated the documentation and .rst file format. htmldoc looks ok.
v7: Simplified the parsing (strsep(&token, "//")) in rdtgroup_mbm_assign_control_write().
Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
Removed ABMC reference in FS code.
Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
Not sure if we need to change the behaviour here. They are processed sequentially right now.
Users have the liberty to pass the flags. Restricting it might be a problem later.
v6: Added support assign all if domain id is '*'
Fixed the allocation of counter id if it is not assigned already.
v5: Interface name changed from mbm_assign_control to mbm_control.
Fixed opcode and flags combination.
'=_' is valid.
'-_' and '+_' are not valid.
Minor message update.
Renamed the function with prefix - rdtgroup_.
Corrected few documentation mistakes.
Rebase related changes after SNC support.
v4: Added domain specific assignments. Fixed the opcode parsing.
v3: New patch.
Addresses the feedback to provide the global assignment interface.
---
Documentation/arch/x86/resctrl.rst | 116 +++++++++++-
arch/x86/kernel/cpu/resctrl/internal.h | 10 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 241 ++++++++++++++++++++++++-
3 files changed, 365 insertions(+), 2 deletions(-)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 9ae2b8f335dd..3bb7c11df59d 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -347,7 +347,8 @@ with the following files:
t MBM total event is assigned.
l MBM local event is assigned.
tl Both MBM total and local events are assigned.
- _ None of the MBM events are assigned.
+ _ None of the MBM events are assigned. Only works with opcode '=' for write
+ and cannot be combined with other flags.
Examples:
::
@@ -365,6 +366,119 @@ with the following files:
There are four resctrl groups. All the groups have total and local MBM events
assigned on domain 0 and 1.
+ Assignment state can be updated by writing to "mbm_assign_control".
+
+ Format is similar to the list format with addition of opcode for the
+ assignment operation.
+
+ "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+ Format for each type of group:
+
+ * Default CTRL_MON group:
+ "//<domain_id><opcode><flags>"
+
+ * Non-default CTRL_MON group:
+ "<CTRL_MON group>//<domain_id><opcode><flags>"
+
+ * Child MON group of default CTRL_MON group:
+ "/<MON group>/<domain_id><opcode><flags>"
+
+ * Child MON group of non-default CTRL_MON group:
+ "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+ Domain_id '*' will apply the flags to all the domains.
+
+ Opcode can be one of the following:
+ ::
+
+ = Update the assignment to match the flags.
+ + Assign a new MBM event without impacting existing assignments.
+ - Unassign an MBM event from currently assigned events.
+
+ Examples:
+ Initial group status:
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=tl;
+
+ To update the default group to assign only total MBM event on domain 0:
+ ::
+
+ # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=t;1=tl;
+ /child_default_mon_grp/0=tl;1=tl;
+
+ To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
+ ::
+
+ # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=t;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
+ To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to unassign
+ both local and total MBM events on domain 1:
+ ::
+
+ # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
+ /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+ //0=t;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
+ To update the default group to add a local MBM event on domain 0:
+ ::
+
+ # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
+ To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all the
+ MBM events on all the domains:
+ ::
+
+ # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=_;1=_;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1ca51f68a523..f433e31a5e87 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -62,6 +62,16 @@
/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
#define ABMC_ENABLE_BIT 0
+/*
+ * Assignment flags for mbm_cntr_assign mode
+ */
+enum {
+ ASSIGN_NONE = 0,
+ ASSIGN_TOTAL = BIT(QOS_L3_MBM_TOTAL_EVENT_ID),
+ ASSIGN_LOCAL = BIT(QOS_L3_MBM_LOCAL_EVENT_ID),
+ ASSIGN_INVALID,
+};
+
/**
* cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
* aren't marked nohz_full
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 70bf590ded8a..61a552974f13 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1051,6 +1051,244 @@ static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
return 0;
}
+static unsigned int rdtgroup_str_to_mon_state(char *flag)
+{
+ unsigned int i, mon_state = ASSIGN_NONE;
+
+ if (!strlen(flag))
+ return ASSIGN_INVALID;
+
+ for (i = 0; i < strlen(flag); i++) {
+ switch (*(flag + i)) {
+ case 't':
+ mon_state |= ASSIGN_TOTAL;
+ break;
+ case 'l':
+ mon_state |= ASSIGN_LOCAL;
+ break;
+ case '_':
+ return ASSIGN_NONE;
+ default:
+ return ASSIGN_INVALID;
+ }
+ }
+
+ return mon_state;
+}
+
+static struct rdtgroup *rdtgroup_find_grp_by_name(enum rdt_group_type rtype,
+ char *p_grp, char *c_grp)
+{
+ struct rdtgroup *rdtg, *crg;
+
+ if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
+ return &rdtgroup_default;
+ } else if (rtype == RDTCTRL_GROUP) {
+ list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
+ if (!strcmp(p_grp, rdtg->kn->name))
+ return rdtg;
+ } else if (rtype == RDTMON_GROUP) {
+ list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+ if (!strcmp(p_grp, rdtg->kn->name)) {
+ list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+ mon.crdtgrp_list) {
+ if (!strcmp(c_grp, crg->kn->name))
+ return crg;
+ }
+ }
+ }
+ }
+
+ return NULL;
+}
+
+static int rdtgroup_process_flags(struct rdt_resource *r,
+ enum rdt_group_type rtype,
+ char *p_grp, char *c_grp, char *tok)
+{
+ unsigned int op, mon_state, assign_state, unassign_state;
+ char *dom_str, *id_str, *op_str;
+ struct rdt_mon_domain *d;
+ struct rdtgroup *rdtgrp;
+ unsigned long dom_id;
+ char domain[10];
+ bool found;
+ int ret;
+
+ rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
+
+ if (!rdtgrp) {
+ rdt_last_cmd_puts("Not a valid resctrl group\n");
+ return -EINVAL;
+ }
+
+next:
+ if (!tok || tok[0] == '\0')
+ return 0;
+
+ /* Start processing the strings for each domain */
+ dom_str = strim(strsep(&tok, ";"));
+
+ op_str = strpbrk(dom_str, "=+-");
+
+ if (op_str) {
+ op = *op_str;
+ } else {
+ rdt_last_cmd_puts("Missing operation =, +, - character\n");
+ return -EINVAL;
+ }
+
+ id_str = strsep(&dom_str, "=+-");
+
+ /* Check for domain id '*' which means all domains */
+ if (id_str && *id_str == '*') {
+ d = NULL;
+ goto check_state;
+ } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+ rdt_last_cmd_puts("Missing domain id\n");
+ return -EINVAL;
+ }
+
+ /* Verify if the dom_id is valid */
+ found = false;
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ if (d->hdr.id == dom_id) {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found) {
+ rdt_last_cmd_printf("Invalid domain id %lu\n", dom_id);
+ return -EINVAL;
+ }
+
+check_state:
+ mon_state = rdtgroup_str_to_mon_state(dom_str);
+
+ if (mon_state == ASSIGN_INVALID) {
+ rdt_last_cmd_puts("Invalid assign flag\n");
+ goto out_fail;
+ }
+
+ assign_state = 0;
+ unassign_state = 0;
+
+ switch (op) {
+ case '+':
+ if (mon_state == ASSIGN_NONE) {
+ rdt_last_cmd_puts("Invalid assign opcode\n");
+ goto out_fail;
+ }
+ assign_state = mon_state;
+ break;
+ case '-':
+ if (mon_state == ASSIGN_NONE) {
+ rdt_last_cmd_puts("Invalid assign opcode\n");
+ goto out_fail;
+ }
+ unassign_state = mon_state;
+ break;
+ case '=':
+ assign_state = mon_state;
+ unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
+ break;
+ default:
+ break;
+ }
+
+ if (unassign_state & ASSIGN_TOTAL) {
+ ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
+ if (ret)
+ goto out_fail;
+ }
+
+ if (unassign_state & ASSIGN_LOCAL) {
+ ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
+ if (ret)
+ goto out_fail;
+ }
+
+ if (assign_state & ASSIGN_TOTAL) {
+ ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
+ if (ret)
+ goto out_fail;
+ }
+
+ if (assign_state & ASSIGN_LOCAL) {
+ ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
+ if (ret)
+ goto out_fail;
+ }
+
+ goto next;
+
+out_fail:
+ sprintf(domain, d ? "%lu" : "*", dom_id);
+
+ rdt_last_cmd_printf("Assign operation '%s%c%s' failed on the group %s/%s/\n",
+ domain, op, dom_str, p_grp, c_grp);
+
+ return -EINVAL;
+}
+
+static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ char *token, *cmon_grp, *mon_grp;
+ enum rdt_group_type rtype;
+ int ret;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+
+ buf[nbytes - 1] = '\0';
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ rdt_last_cmd_clear();
+
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+ return -EINVAL;
+ }
+
+ while ((token = strsep(&buf, "\n")) != NULL) {
+ /*
+ * The write command follows the following format:
+ * "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+ * Extract the CTRL_MON group.
+ */
+ cmon_grp = strsep(&token, "/");
+
+ /*
+ * Extract the MON_GROUP.
+ * strsep returns empty string for contiguous delimiters.
+ * Empty mon_grp here means it is a RDTCTRL_GROUP.
+ */
+ mon_grp = strsep(&token, "/");
+
+ if (*mon_grp == '\0')
+ rtype = RDTCTRL_GROUP;
+ else
+ rtype = RDTMON_GROUP;
+
+ ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token);
+ if (ret)
+ break;
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
+ return ret ?: nbytes;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -2383,9 +2621,10 @@ static struct rftype res_common_files[] = {
},
{
.name = "mbm_assign_control",
- .mode = 0444,
+ .mode = 0644,
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = rdtgroup_mbm_assign_control_show,
+ .write = rdtgroup_mbm_assign_control_write,
},
{
.name = "mbm_assign_mode",
--
2.34.1
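
Editorial sketch (not part of the patch): the opcode handling in rdtgroup_process_flags() boils down to turning one "<domain_id><opcode><flags>" token into a set of events to assign and a set to unassign. The standalone program fragment below illustrates that mapping in userspace. The names parse_assign_token() and struct assign_req, the bit values, and the use of -1 for '*' are assumptions made for the example. Unlike the posted code, this sketch rejects '_' when it is mixed with other flags, which is what the documentation text describes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bit values for the two MBM events. */
#define ASSIGN_TOTAL 0x1u
#define ASSIGN_LOCAL 0x2u

struct assign_req {
	long dom_id;		/* -1 stands for '*' (all domains) */
	unsigned int assign;	/* events to assign */
	unsigned int unassign;	/* events to unassign */
};

/* Parse one "<domain_id><opcode><flags>" token; 0 on success, -1 on error. */
static int parse_assign_token(const char *tok, struct assign_req *req)
{
	unsigned int state = 0;
	const char *op, *f;
	char *end;

	assert(tok != NULL && req != NULL);

	op = strpbrk(tok, "=+-");
	if (!op || op == tok)
		return -1;	/* missing opcode or missing domain id */

	if (tok[0] == '*' && op == tok + 1) {
		req->dom_id = -1;
	} else {
		req->dom_id = strtol(tok, &end, 10);
		if (end != op || req->dom_id < 0)
			return -1;
	}

	f = op + 1;
	if (*f == '\0')
		return -1;	/* no flags given */

	if (strcmp(f, "_") != 0) {
		for (; *f; f++) {
			if (*f == 't')
				state |= ASSIGN_TOTAL;
			else if (*f == 'l')
				state |= ASSIGN_LOCAL;
			else
				return -1;	/* unknown flag, or '_' mixed in */
		}
	}

	switch (*op) {
	case '+':
		if (!state)
			return -1;	/* "+_" is not valid */
		req->assign = state;
		req->unassign = 0;
		break;
	case '-':
		if (!state)
			return -1;	/* "-_" is not valid */
		req->assign = 0;
		req->unassign = state;
		break;
	default:	/* '=' sets exactly the listed events */
		req->assign = state;
		req->unassign = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~state;
		break;
	}

	return 0;
}
```

With this mapping, "0=t" assigns total and unassigns local on domain 0, "*=_" unassigns both events on every domain, and "1+_" is rejected.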
* Re: [PATCH v10 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups
2024-12-12 20:15 ` [PATCH v10 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
@ 2024-12-12 22:57 ` Luck, Tony
2024-12-13 15:23 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Luck, Tony @ 2024-12-12 22:57 UTC (permalink / raw)
To: Babu Moger
Cc: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen,
peternewman, fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
On Thu, Dec 12, 2024 at 02:15:26PM -0600, Babu Moger wrote:
> +static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
> + struct seq_file *s, void *v)
> +{
> + struct rdt_resource *r = of->kn->parent->priv;
> + struct rdt_mon_domain *dom;
> + struct rdtgroup *rdtg;
> + char str[10];
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> + rdt_last_cmd_clear();
> +
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
> + mutex_unlock(&rdtgroup_mutex);
> + cpus_read_unlock();
> + return -EINVAL;
> + }
> +
> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
> + struct rdtgroup *crg;
> +
> + seq_printf(s, "%s//", rdtg->kn->name);
> +
> + list_for_each_entry(dom, &r->mon_domains, hdr.list)
> + seq_printf(s, "%d=%s;", dom->hdr.id,
> + rdtgroup_mon_state_to_str(r, dom, rdtg, str));
> + seq_putc(s, '\n');
Other resctrl files with domain lists use ';' as a separator, not a
terminator. This code results in:
//0=tl;1=tl;
rather than
//0=tl;1=tl
> +
> + list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
> + mon.crdtgrp_list) {
> + seq_printf(s, "%s/%s/", rdtg->kn->name, crg->kn->name);
> +
> + list_for_each_entry(dom, &r->mon_domains, hdr.list)
> + seq_printf(s, "%d=%s;", dom->hdr.id,
> + rdtgroup_mon_state_to_str(r, dom, crg, str));
> + seq_putc(s, '\n');
Ditto.
> + }
> + }
> +
> + mutex_unlock(&rdtgroup_mutex);
> + cpus_read_unlock();
> + return 0;
> +}
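
Editorial sketch (not part of the thread): the separator-versus-terminator point above can be illustrated in userspace by emitting ';' before every entry except the first. The helper name format_domains() and its parameters are assumptions made for the example; kernel code would write to the seq_file rather than to a buffer.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format "id=flags" pairs joined by ';' with no trailing ';'.
 * Returns the number of characters written, or -1 if out is too small.
 */
static int format_domains(char *out, size_t len, const int *ids,
			  const char *const *flags, size_t n)
{
	size_t used = 0;
	size_t i;

	assert(out != NULL && len > 0);
	out[0] = '\0';

	for (i = 0; i < n; i++) {
		/* The separator goes in front of every entry but the first. */
		int w = snprintf(out + used, len - used, "%s%d=%s",
				 i ? ";" : "", ids[i], flags[i]);

		if (w < 0 || (size_t)w >= len - used)
			return -1;
		used += (size_t)w;
	}

	return (int)used;
}
```

For ids {0, 1} and flags {"tl", "_"} this produces "0=tl;1=_" with no trailing ';'.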
* Re: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-12 20:15 ` [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter Babu Moger
@ 2024-12-12 23:37 ` Luck, Tony
2024-12-13 15:57 ` Moger, Babu
2024-12-19 23:22 ` Reinette Chatre
1 sibling, 1 reply; 76+ messages in thread
From: Luck, Tony @ 2024-12-12 23:37 UTC (permalink / raw)
To: Babu Moger
Cc: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen,
peternewman, fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
On Thu, Dec 12, 2024 at 02:15:19PM -0600, Babu Moger wrote:
> +/*
> + * Assign a hardware counter to event @evtid of group @rdtgrp.
> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
> + * else the counter will be assigned to specific domain.
> + */
> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> + struct rdt_mon_domain *d, enum resctrl_event_id evtid)
> +{
> + int cntr_id, ret = 0;
> +
> + if (!d) {
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + if (mbm_cntr_assigned(r, d, rdtgrp, evtid))
> + continue;
> +
> + cntr_id = mbm_cntr_alloc(r, d, rdtgrp, evtid);
> + if (cntr_id < 0) {
> + rdt_last_cmd_puts("Domain Out of MBM assignable counters\n");
Message could be more helpful by including the domain number.
> + continue;
Not sure whether continuing is the right thing to do here. Sure the
other domains may have available counters, but now you may have a
confused status where some requested operations succeeded and others
failed.
> + }
> +
> + ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
> + rdtgrp->closid, cntr_id, true);
> + if (ret)
> + goto out_done_assign;
> + }
> + } else {
> + if (mbm_cntr_assigned(r, d, rdtgrp, evtid))
> + goto out_done_assign;
> +
> + cntr_id = mbm_cntr_alloc(r, d, rdtgrp, evtid);
> + if (cntr_id < 0) {
> + rdt_last_cmd_puts("Domain Out of MBM assignable counters\n");
Ditto helpful to include domain number.
> + goto out_done_assign;
When you run out of counters here, you still return 0 from this
function. This means that updating via write to the "mbm_assign_control"
file may return success, even though the operation failed.
E.g. with no counters available:
# cat available_mbm_cntrs
0=0;1=0
Try to set a monitor domain to record local bandwidth:
# echo 'c1/m94/0=l;1=_;' > mbm_assign_control
# echo $?
0
Looks like it worked!
But it didn't.
# cat ../last_cmd_status
Domain Out of MBM assignable counters
rdtgroup_assign_cntr_event() does say that it didn't if you think to
check here.
> + }
> +
> + ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
> + rdtgrp->closid, cntr_id, true);
> + }
> +
> +out_done_assign:
> + if (ret)
> + mbm_cntr_free(r, d, rdtgrp, evtid);
> +
> + return ret;
> +}
-Tony
* Re: [PATCH v10 23/24] x86/resctrl: Introduce interface to list assignment states of all the groups
2024-12-12 22:57 ` Luck, Tony
@ 2024-12-13 15:23 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-13 15:23 UTC (permalink / raw)
To: Luck, Tony, Babu Moger
Cc: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen,
peternewman, fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Tony,
On 12/12/2024 4:57 PM, Luck, Tony wrote:
> On Thu, Dec 12, 2024 at 02:15:26PM -0600, Babu Moger wrote:
>> +static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
>> + struct seq_file *s, void *v)
>> +{
>> + struct rdt_resource *r = of->kn->parent->priv;
>> + struct rdt_mon_domain *dom;
>> + struct rdtgroup *rdtg;
>> + char str[10];
>> +
>> + cpus_read_lock();
>> + mutex_lock(&rdtgroup_mutex);
>> + rdt_last_cmd_clear();
>> +
>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>> + mutex_unlock(&rdtgroup_mutex);
>> + cpus_read_unlock();
>> + return -EINVAL;
>> + }
>> +
>> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>> + struct rdtgroup *crg;
>> +
>> + seq_printf(s, "%s//", rdtg->kn->name);
>> +
>> + list_for_each_entry(dom, &r->mon_domains, hdr.list)
>> + seq_printf(s, "%d=%s;", dom->hdr.id,
>> + rdtgroup_mon_state_to_str(r, dom, rdtg, str));
>> + seq_putc(s, '\n');
>
> Other resctrl files with domain lists use ';' as a separator, not a
> terminator. This code results in:
>
> //0=tl;1=tl;
>
> rather than
>
> //0=tl;1=tl
Agree. Will correct it.
>> +
>> + list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
>> + mon.crdtgrp_list) {
>> + seq_printf(s, "%s/%s/", rdtg->kn->name, crg->kn->name);
>> +
>> + list_for_each_entry(dom, &r->mon_domains, hdr.list)
>> + seq_printf(s, "%d=%s;", dom->hdr.id,
>> + rdtgroup_mon_state_to_str(r, dom, crg, str));
>> + seq_putc(s, '\n');
>
> Ditto.
Sure.
>
>> + }
>> + }
>> +
>> + mutex_unlock(&rdtgroup_mutex);
>> + cpus_read_unlock();
>> + return 0;
>> +}
>
thanks
Babu
* Re: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-12 23:37 ` Luck, Tony
@ 2024-12-13 15:57 ` Moger, Babu
2024-12-13 16:24 ` Luck, Tony
0 siblings, 1 reply; 76+ messages in thread
From: Moger, Babu @ 2024-12-13 15:57 UTC (permalink / raw)
To: Luck, Tony, Babu Moger
Cc: corbet, reinette.chatre, tglx, mingo, bp, dave.hansen,
peternewman, fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Tony,
On 12/12/2024 5:37 PM, Luck, Tony wrote:
> On Thu, Dec 12, 2024 at 02:15:19PM -0600, Babu Moger wrote:
>> +/*
>> + * Assign a hardware counter to event @evtid of group @rdtgrp.
>> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
>> + * else the counter will be assigned to specific domain.
>> + */
>> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> + struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>> +{
>> + int cntr_id, ret = 0;
>> +
>> + if (!d) {
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + if (mbm_cntr_assigned(r, d, rdtgrp, evtid))
>> + continue;
>> +
>> + cntr_id = mbm_cntr_alloc(r, d, rdtgrp, evtid);
>> + if (cntr_id < 0) {
>> + rdt_last_cmd_puts("Domain Out of MBM assignable counters\n");
>
> Message could be more helpful by including the domain number.
Yes. We can do that. I will use rdt_last_cmd_printf().
>
>> + continue;
>
> Not sure whether continuing is the right thing to do here. Sure the
> other domains may have available counters, but now you may have a
> confused status where some requested operations succeeded and others
> failed.
>
>> + }
>> +
>> + ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>> + rdtgrp->closid, cntr_id, true);
>> + if (ret)
>> + goto out_done_assign;
>> + }
>> + } else {
>> + if (mbm_cntr_assigned(r, d, rdtgrp, evtid))
>> + goto out_done_assign;
>> +
>> + cntr_id = mbm_cntr_alloc(r, d, rdtgrp, evtid);
>> + if (cntr_id < 0) {
>> + rdt_last_cmd_puts("Domain Out of MBM assignable counters\n");
>
> Ditto helpful to include domain number.
Sure.
>
>> + goto out_done_assign;
>
> When you run out of counters here, you still return 0 from this
> function. This means that updating via write to the "mbm_assign_control"
> file may return success, even though the operation failed.
>
> E.g. with no counters available:
>
> # cat available_mbm_cntrs
> 0=0;1=0
>
> Try to set a monitor domain to record local bandwidth:
>
> # echo 'c1/m94/0=l;1=_;' > mbm_assign_control
> # echo $?
> 0
>
> Looks like it worked!
>
> But it didn't.
>
> # cat ../last_cmd_status
> Domain Out of MBM assignable counters
>
> rdtgroup_assign_cntr_event() does say that it didn't if you think to
> check here.
Yes. Agree.
It is the right thing to continue the assignment if one of the domains is
out of counters. In that case, how about we save the error (say error_domain)
and continue, and finally return success only if both ret and error_domain are zero?
return ret ? ret : error_domain;
>
>> + }
>> +
>> + ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>> + rdtgrp->closid, cntr_id, true);
>> + }
>> +
>> +out_done_assign:
>> + if (ret)
>> + mbm_cntr_free(r, d, rdtgrp, evtid);
>> +
>> + return ret;
>> +}
>
> -Tony
>
thanks
Babu
* RE: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-13 15:57 ` Moger, Babu
@ 2024-12-13 16:24 ` Luck, Tony
2024-12-13 16:54 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Luck, Tony @ 2024-12-13 16:24 UTC (permalink / raw)
To: Moger, Babu, Babu Moger
Cc: corbet@lwn.net, Chatre, Reinette, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
peternewman@google.com, Yu, Fenghua, x86@kernel.org,
hpa@zytor.com, paulmck@kernel.org, akpm@linux-foundation.org,
thuth@redhat.com, rostedt@goodmis.org,
xiongwei.song@windriver.com, pawan.kumar.gupta@linux.intel.com,
daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
perry.yuan@amd.com, Huang, Kai, Li, Xiaoyao, seanjc@google.com,
Li, Xin3, andrew.cooper3@citrix.com, ebiggers@google.com,
mario.limonciello@amd.com, james.morse@arm.com,
tan.shaopeng@fujitsu.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
Eranian, Stephane
> It is the right thing to continue the assignment if one of the domains is
> out of counters. In that case, how about we save the error (say error_domain)
> and continue, and finally return success only if both ret and error_domain are zero?
>
> return ret ? ret : error_domain;
If there are many domains, then you might have 3 succeed and 5 fail.
I think the best you can do is return success if everything succeeded
and an error if any failed.
You have the same issue if someone tries to update multiple things
with a single write to mbm_assign_control:
# cat > mbm_assign_control << EOF
c1/m78/0=t;1=l;
c1/m79/0=t;1=l
c1/m80/0=t;1=l;
c1/m81/0=t;1=l;
EOF
Those get processed in order, some may succeed, but once a domain
is out of counters the rest for that domain will fail.
Updates to schemata are handled in multiple passes to either have
all succeed or all fail. But the only problems that can occur are user
syntax/range issues. So it's a lot simpler.
For writes to mbm_assign_control I think it's okay to document that
some requests may have been applied even though the whole request
reports failure. The user can always read the file to check status.
-Tony
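
Editorial sketch (not part of the thread): the semantics discussed above, "keep going across domains, but report an error if any domain failed", can be modelled in userspace as follows. struct dom, assign_all_domains() and the choice of -ENOSPC are assumptions made for the example; the thread does not settle on an error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-domain state for one group and one event. */
struct dom {
	int id;
	int free_cntrs;	/* counters still available in this domain */
	int assigned;	/* non-zero if the group already holds a counter */
};

/*
 * Try to assign a counter in every domain. A domain without free
 * counters does not stop the loop, but the failure is remembered.
 * Returns 0 only if every domain ended up assigned, else -ENOSPC.
 */
static int assign_all_domains(struct dom *doms, size_t n)
{
	int err = 0;
	size_t i;

	assert(doms != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (doms[i].assigned)
			continue;

		if (doms[i].free_cntrs == 0) {
			err = -ENOSPC;	/* record the failure, keep going */
			continue;
		}

		doms[i].free_cntrs--;
		doms[i].assigned = 1;
	}

	return err;
}
```

A caller that gets an error back cannot assume nothing changed: some domains may have been assigned. That matches the suggestion to document partial application and let the user read the file back to check the resulting state.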
* Re: RE: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-13 16:24 ` Luck, Tony
@ 2024-12-13 16:54 ` Moger, Babu
2024-12-18 22:01 ` Reinette Chatre
0 siblings, 1 reply; 76+ messages in thread
From: Moger, Babu @ 2024-12-13 16:54 UTC (permalink / raw)
To: Luck, Tony, Babu Moger
Cc: corbet@lwn.net, Chatre, Reinette, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
peternewman@google.com, Yu, Fenghua, x86@kernel.org,
hpa@zytor.com, paulmck@kernel.org, akpm@linux-foundation.org,
thuth@redhat.com, rostedt@goodmis.org,
xiongwei.song@windriver.com, pawan.kumar.gupta@linux.intel.com,
daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
perry.yuan@amd.com, Huang, Kai, Li, Xiaoyao, seanjc@google.com,
Li, Xin3, andrew.cooper3@citrix.com, ebiggers@google.com,
mario.limonciello@amd.com, james.morse@arm.com,
tan.shaopeng@fujitsu.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
Eranian, Stephane
Hi Tony,
On 12/13/2024 10:24 AM, Luck, Tony wrote:
>> It is the right thing to continue assignment if one of the domains is out of
>> counters. In that case how about we save the error (say error_domain) and
>> continue. And finally return success if both ret and error_domain are zero.
>>
>> return ret ? ret : error_domain;
>
> If there are many domains, then you might have 3 succeed and 5 fail.
>
> I think the best you can do is return success if everything succeeded
> and an error if any failed.
Yes. The above check should take care of this case.
>
> You have the same issue if someone tries to update multiple things
> with a single write to mbm_assign_control:
>
> # cat > mbm_assign_control << EOF
> c1/m78/0=t;1=l;
> c1/m79/0=t;1=l
> c1/m80/0=t;1=l;
> c1/m81/0=t;1=l;
> EOF
>
> Those get processed in order, some may succeed, but once a domain
> is out of counters the rest for that domain will fail.
Yes. I see a similar type of processing for schemata.
It is processed sequentially. If one fails, it returns immediately.
	ret = rdtgroup_parse_resource(resname, tok, rdtgrp);
	if (ret)
		goto out;
I feel it is ok to keep the same level of processing.
>
> Updates to schemata are handled in multiple passes to either have
> all succeed or all fail. But the only problems that can occur are user
> syntax/range issues. So it's a lot simpler.
>
> For writes to mbm_assign_control I think it's okay to document that
> some requests may have been applied even though the whole request
> reports failure. The user can always read the file to check status.
Yes. We can document this.
>
> -Tony
* Re: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-13 16:54 ` Moger, Babu
@ 2024-12-18 22:01 ` Reinette Chatre
2024-12-19 19:45 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-18 22:01 UTC (permalink / raw)
To: Moger, Babu, Luck, Tony, Babu Moger
Cc: corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, peternewman@google.com,
Yu, Fenghua, x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
xiongwei.song@windriver.com, pawan.kumar.gupta@linux.intel.com,
daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
perry.yuan@amd.com, Huang, Kai, Li, Xiaoyao, seanjc@google.com,
Li, Xin3, andrew.cooper3@citrix.com, ebiggers@google.com,
mario.limonciello@amd.com, james.morse@arm.com,
tan.shaopeng@fujitsu.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
Eranian, Stephane
On 12/13/24 8:54 AM, Moger, Babu wrote:
> On 12/13/2024 10:24 AM, Luck, Tony wrote:
>>> It is the right thing to continue assignment if one of the domains is out of
>>> counters. In that case how about we save the error (say error_domain) and
>>> continue. And finally return success if both ret and error_domain are zero.
>>>
>>> return ret ? ret : error_domain;
>>
>> If there are many domains, then you might have 3 succeed and 5 fail.
>>
>> I think the best you can do is return success if everything succeeded
>> and an error if any failed.
>
> Yes. The above check should take care of this case.
>
If I understand correctly "error_domain" can capture the ID of
a single failing domain. If there are multiple failing domains like
in Tony's example then "error_domain" will not be accurate and thus
can never be trusted. Instead of a single check for failure, user
space is then forced to parse the more complex "mbm_assign_control"
file to learn what succeeded and failed.
Would it not be simpler to process sequentially and then fail on
the first error encountered with a detailed error message? With that
user space can determine exactly which portion of the request
succeeded and which portion failed.
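A minimal user-space sketch of "fail on first error with a detailed message", under the assumption that each request item names one group and one domain and needs one counter. It is not the resctrl code; `assign_item` and `apply_until_error` are hypothetical names, and the message text only mirrors the example string used elsewhere in this thread.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical request item: one counter for `group` in `domain`. */
struct assign_item {
	const char *group;	/* e.g. "c1/m78" */
	int domain;
};

/*
 * Apply items in order and stop at the first failure.
 * Returns 0 if all items were applied. Otherwise returns -1, sets
 * *failed_idx to the failing item and describes it in msg. Items before
 * *failed_idx were applied, items after it were not attempted.
 */
static int apply_until_error(int *free_cntrs, int ndomains,
			     const struct assign_item *items, size_t n,
			     size_t *failed_idx, char *msg, size_t msglen)
{
	for (size_t i = 0; i < n; i++) {
		int d = items[i].domain;

		if (d < 0 || d >= ndomains) {
			snprintf(msg, msglen, "Group %s, domain:%d Invalid domain",
				 items[i].group, d);
			*failed_idx = i;
			return -1;
		}
		if (free_cntrs[d] == 0) {
			snprintf(msg, msglen,
				 "Group %s, domain:%d Out of MBM counters",
				 items[i].group, d);
			*failed_idx = i;
			return -1;
		}
		free_cntrs[d]--;
	}
	return 0;
}
```

Because processing stops at the failing item, a single message is enough for the caller to know which part of the request took effect.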
>>
>> You have the same issue if someone tries to update multiple things
>> with a single write to mbm_assign_control:
>>
>> # cat > mbm_assign_control << EOF
>> c1/m78/0=t;1=l;
>> c1/m79/0=t;1=l
>> c1/m80/0=t;1=l;
>> c1/m81/0=t;1=l;
>> EOF
>>
>> Those get processed in order, some may succeed, but once a domain
>> is out of counters the rest for that domain will fail.
>
> Yes. I see the similar type of processing for schemata.
> It is processed sequentially. If one fails, it returns immediately.
>
> ret = rdtgroup_parse_resource(resname, tok, rdtgrp);
> if (ret)
> goto out;
>
> I feel it is ok to keep same level of processing.
>
resctrl also does sequential processing when, for example, the user requests
a move of several tasks. resctrl returns with failure right away, with an error
message containing the failing PID. This gives the user clear information about
what portion of the request succeeded without requiring user space to
do additional queries.
Reinette
* Re: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-18 22:01 ` Reinette Chatre
@ 2024-12-19 19:45 ` Moger, Babu
2024-12-19 21:12 ` Reinette Chatre
0 siblings, 1 reply; 76+ messages in thread
From: Moger, Babu @ 2024-12-19 19:45 UTC (permalink / raw)
To: Reinette Chatre, Luck, Tony, Babu Moger
Cc: corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, peternewman@google.com,
Yu, Fenghua, x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
xiongwei.song@windriver.com, pawan.kumar.gupta@linux.intel.com,
daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
perry.yuan@amd.com, Huang, Kai, Li, Xiaoyao, seanjc@google.com,
Li, Xin3, andrew.cooper3@citrix.com, ebiggers@google.com,
mario.limonciello@amd.com, james.morse@arm.com,
tan.shaopeng@fujitsu.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
Eranian, Stephane
Hi Reinette,
On 12/18/2024 4:01 PM, Reinette Chatre wrote:
>
>
> On 12/13/24 8:54 AM, Moger, Babu wrote:
>> On 12/13/2024 10:24 AM, Luck, Tony wrote:
>>>> It is the right thing to continue assignment if one of the domains is out of
>>>> counters. In that case how about we save the error (say error_domain) and
>>>> continue. And finally return success if both ret and error_domain are zero.
>>>>
>>>> return ret ? ret : error_domain;
>>>
>>> If there are many domains, then you might have 3 succeed and 5 fail.
>>>
>>> I think the best you can do is return success if everything succeeded
>>> and an error if any failed.
>>
>> Yes. The above check should take care of this case.
>>
>
> If I understand correctly "error_domain" can capture the ID of
> a single failing domain. If there are multiple failing domains like
> in Tony's example then "error_domain" will not be accurate and thus
> can never be trusted. Instead of a single check of a failure user
> space is then forced to parse the more complex "mbm_assign_control"
> file to learn what succeeded and failed.
>
> Would it not be simpler to process sequentially and then fail on
> first error encountered with detailed error message? With that
> user space can determine exactly which portion of request
> succeeded and which portion failed.
One more option is to print the error for each failure and continue, and
finally return an error.
"Group mon1, domain:1 Out of MBM counters"
We have the error information as well as the convenience of assignment
on domains where counters are available when the user is working with
"*" (all domains).
Note: I will be out of office starting next week until Jan 10.
>
>>>
>>> You have the same issue if someone tries to update multiple things
>>> with a single write to mbm_assign_control:
>>>
>>> # cat > mbm_assign_control << EOF
>>> c1/m78/0=t;1=l;
>>> c1/m79/0=t;1=l
>>> c1/m80/0=t;1=l;
>>> c1/m81/0=t;1=l;
>>> EOF
>>>
>>> Those get processed in order, some may succeed, but once a domain
>>> is out of counters the rest for that domain will fail.
>>
>> Yes. I see the similar type of processing for schemata.
>> It is processed sequentially. If one fails, it returns immediately.
>>
>> ret = rdtgroup_parse_resource(resname, tok, rdtgrp);
>> if (ret)
>> goto out;
>>
>> I feel it is ok to keep same level of processing.
>>
>
> resctrl also does sequential processing when, for example, the user requests
> move of several tasks. resctrl returns with failure right away with error message
> containing failing PID. This gives clear information to user what
> portion of request succeeded without requiring user space to
> do additional queries.
>
>
> Reinette
>
* Re: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-19 19:45 ` Moger, Babu
@ 2024-12-19 21:12 ` Reinette Chatre
2024-12-19 21:38 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 21:12 UTC (permalink / raw)
To: Moger, Babu, Luck, Tony, Babu Moger
Cc: corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, peternewman@google.com,
Yu, Fenghua, x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
xiongwei.song@windriver.com, pawan.kumar.gupta@linux.intel.com,
daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
perry.yuan@amd.com, Huang, Kai, Li, Xiaoyao, seanjc@google.com,
Li, Xin3, andrew.cooper3@citrix.com, ebiggers@google.com,
mario.limonciello@amd.com, james.morse@arm.com,
tan.shaopeng@fujitsu.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
Eranian, Stephane
Hi Babu,
On 12/19/24 11:45 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 12/18/2024 4:01 PM, Reinette Chatre wrote:
>>
>>
>> On 12/13/24 8:54 AM, Moger, Babu wrote:
>>> On 12/13/2024 10:24 AM, Luck, Tony wrote:
>>>>> It is the right thing to continue assignment if one of the domains is out of
>>>>> counters. In that case how about we save the error (say error_domain) and
>>>>> continue. And finally return success if both ret and error_domain are zero.
>>>>>
>>>>> return ret ? ret : error_domain;
>>>>
>>>> If there are many domains, then you might have 3 succeed and 5 fail.
>>>>
>>>> I think the best you can do is return success if everything succeeded
>>>> and an error if any failed.
>>>
>>> Yes. The above check should take care of this case.
>>>
>>
>> If I understand correctly "error_domain" can capture the ID of
>> a single failing domain. If there are multiple failing domains like
>> in Tony's example then "error_domain" will not be accurate and thus
>> can never be trusted. Instead of a single check of a failure user
>> space is then forced to parse the more complex "mbm_assign_control"
>> file to learn what succeeded and failed.
>>
>> Would it not be simpler to process sequentially and then fail on
>> first error encountered with detailed error message? With that
>> user space can determine exactly which portion of request
>> succeeded and which portion failed.
>
> One more option is to print the error for each failure and continue. And finally return error.
>
> "Group mon1, domain:1 Out of MBM counters"
>
> We have the error information as well as the convenience of assignment on domains where counters are available when user is working with "*"(all domains).
This may be possible. Please keep in mind that any errors have to be
easily consumed in an automated way to support the user space tools
that interact with resctrl. I do not think we have thus far focused
on the "last_cmd_status" buffer as part of the user space ABI so this opens
up more considerations.
At this time the error handling of "all domains" does not seem to be
consistent and obvious to user space. From what I can tell the
implementation continues on to the next domain if one domain is out
of counters but it exits immediately if a counter cannot be configured
on a particular domain.
>
> Note: I will be out of office starting next week Until Jan 10.
Thank you for letting me know. I am currently reviewing this series
and will post feedback by tomorrow.
Reinette
* Re: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-19 21:12 ` Reinette Chatre
@ 2024-12-19 21:38 ` Moger, Babu
2024-12-19 21:45 ` Luck, Tony
0 siblings, 1 reply; 76+ messages in thread
From: Moger, Babu @ 2024-12-19 21:38 UTC (permalink / raw)
To: Reinette Chatre, Luck, Tony, Babu Moger
Cc: corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, peternewman@google.com,
Yu, Fenghua, x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
xiongwei.song@windriver.com, pawan.kumar.gupta@linux.intel.com,
daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
perry.yuan@amd.com, Huang, Kai, Li, Xiaoyao, seanjc@google.com,
Li, Xin3, andrew.cooper3@citrix.com, ebiggers@google.com,
mario.limonciello@amd.com, james.morse@arm.com,
tan.shaopeng@fujitsu.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
Eranian, Stephane
Hi Reinette,
On 12/19/2024 3:12 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/19/24 11:45 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 12/18/2024 4:01 PM, Reinette Chatre wrote:
>>>
>>>
>>> On 12/13/24 8:54 AM, Moger, Babu wrote:
>>>> On 12/13/2024 10:24 AM, Luck, Tony wrote:
>>>>>> It is the right thing to continue assignment if one of the domains is out of
>>>>>> counters. In that case how about we save the error (say error_domain) and
>>>>>> continue. And finally return success if both ret and error_domain are zero.
>>>>>>
>>>>>> return ret ? ret : error_domain;
>>>>>
>>>>> If there are many domains, then you might have 3 succeed and 5 fail.
>>>>>
>>>>> I think the best you can do is return success if everything succeeded
>>>>> and an error if any failed.
>>>>
>>>> Yes. The above check should take care of this case.
>>>>
>>>
>>> If I understand correctly "error_domain" can capture the ID of
>>> a single failing domain. If there are multiple failing domains like
>>> in Tony's example then "error_domain" will not be accurate and thus
>>> can never be trusted. Instead of a single check of a failure user
>>> space is then forced to parse the more complex "mbm_assign_control"
>>> file to learn what succeeded and failed.
>>>
>>> Would it not be simpler to process sequentially and then fail on
>>> first error encountered with detailed error message? With that
>>> user space can determine exactly which portion of request
>>> succeeded and which portion failed.
>>
>> One more option is to print the error for each failure and continue. And finally return error.
>>
>> "Group mon1, domain:1 Out of MBM counters"
>>
>> We have the error information as well as the convenience of assignment on domains where counters are available when user is working with "*"(all domains).
>
> This may be possible. Please keep in mind that any errors have to be
> easily consumed in an automated way to support the user space tools
> that interact with resctrl. I do not think we have thus far focused
> on the "last_cmd_status" buffer as part of the user space ABI so this opens
> up more considerations.
>
> At this time the error handling of "all domains" does not seem to be
> consistent and obvious to user space. From what I can tell the
> implementation continues on to the next domain if one domain is out
> of counters but it exits immediately if a counter cannot be configured
> on a particular domain.
Yes. We can handle both errors in the same way.
>
>>
>> Note: I will be out of office starting next week Until Jan 10.
>
> Thank you for letting me know. I am currently reviewing this series
> and will post feedback by tomorrow.
Sure. Thanks. I will try to get to some of it at least. The review
comments which need investigation may have to wait. Let's see.
Thanks
Babu
* RE: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-19 21:38 ` Moger, Babu
@ 2024-12-19 21:45 ` Luck, Tony
2024-12-19 22:33 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Luck, Tony @ 2024-12-19 21:45 UTC (permalink / raw)
To: Moger, Babu, Chatre, Reinette, Babu Moger
Cc: corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, peternewman@google.com,
Yu, Fenghua, x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
xiongwei.song@windriver.com, pawan.kumar.gupta@linux.intel.com,
daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
perry.yuan@amd.com, Huang, Kai, Li, Xiaoyao, seanjc@google.com,
Li, Xin3, andrew.cooper3@citrix.com, ebiggers@google.com,
mario.limonciello@amd.com, james.morse@arm.com,
tan.shaopeng@fujitsu.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
Eranian, Stephane
> >>>>>> It is the right thing to continue assignment if one of the domains is out of
> >>>>>> counters. In that case how about we save the error (say error_domain) and
> >>>>>> continue. And finally return success if both ret and error_domain are zero.
> >>>>>>
> >>>>>> return ret ? ret : error_domain;
> >>>>>
> >>>>> If there are many domains, then you might have 3 succeed and 5 fail.
> >>>>>
> >>>>> I think the best you can do is return success if everything succeeded
> >>>>> and an error if any failed.
> >>>>
> >>>> Yes. The above check should take care of this case.
> >>>>
> >>>
> >>> If I understand correctly "error_domain" can capture the ID of
> >>> a single failing domain. If there are multiple failing domains like
> >>> in Tony's example then "error_domain" will not be accurate and thus
> >>> can never be trusted. Instead of a single check of a failure user
> >>> space is then forced to parse the more complex "mbm_assign_control"
> >>> file to learn what succeeded and failed.
> >>>
> >>> Would it not be simpler to process sequentially and then fail on
> >>> first error encountered with detailed error message? With that
> >>> user space can determine exactly which portion of request
> >>> succeeded and which portion failed.
> >>
> >> One more option is to print the error for each failure and continue. And finally return error.
There's limited space allocated for use by last_cmd_*() messages:
	static char last_cmd_status_buf[512];
	seq_buf_init(&last_cmd_status, last_cmd_status_buf,
		     sizeof(last_cmd_status_buf));
If you keep parsing and trying to apply changes from user input you will
quickly hit that limit.
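A rough user-space sketch of why the limit is hit quickly. It models a fixed 512-byte status buffer with a hypothetical `status_buf` type and `status_append()` helper (this is not the kernel's seq_buf API) and appends one "Out of MBM counters" line per failing domain until a line no longer fits.

```c
#include <stdio.h>
#include <string.h>

#define STATUS_BUF_SIZE 512	/* same size as last_cmd_status_buf above */

/* Hypothetical stand-in for the bounded status buffer. */
struct status_buf {
	char data[STATUS_BUF_SIZE];
	size_t len;
	int overflowed;
};

/* Append one line; returns 0 on success, -1 if it does not fit whole. */
static int status_append(struct status_buf *sb, const char *line)
{
	size_t n = strlen(line);

	/* Keep one byte for the terminating NUL. */
	if (sb->overflowed || n >= sizeof(sb->data) - sb->len) {
		sb->overflowed = 1;
		return -1;
	}
	memcpy(sb->data + sb->len, line, n + 1);
	sb->len += n;
	return 0;
}

/* Report the same failure for nfail domains; return how many lines fit. */
static int report_failures(struct status_buf *sb, int nfail)
{
	char line[64];
	int stored = 0;

	for (int d = 0; d < nfail; d++) {
		snprintf(line, sizeof(line),
			 "Group mon1, domain:%d Out of MBM counters\n", d);
		if (status_append(sb, line) == 0)
			stored++;
	}
	return stored;
}
```

Each line is a little over 40 bytes, so in this model only the first 12 of 32 failure reports are stored and the rest are dropped.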
> >>
> >> "Group mon1, domain:1 Out of MBM counters"
> >>
> >> We have the error information as well as the convenience of assignment on domains where counters are available when user is working with "*"(all domains).
> >
> > This may be possible. Please keep in mind that any errors have to be
> > easily consumed in an automated way to support the user space tools
> > that interact with resctrl. I do not think we have thus far focused
> > on the "last_cmd_status" buffer as part of the user space ABI so this opens
> > up more considerations.
> >
> > At this time the error handling of "all domains" does not seem to be
> > consistent and obvious to user space. From what I can tell the
> > implementation continues on to the next domain if one domain is out
> > of counters but it exits immediately if a counter cannot be configured
> > on a particular domain.
>
> Yes. We can handle both the errors in the same way.
I think it is simplest to make the "same way" be "fail on first error".
>
> >
> >>
> >> Note: I will be out of office starting next week Until Jan 10.
> >
> > Thank you for letting me know. I am currently reviewing this series
> > and will post feedback by tomorrow.
>
> Sure. Thanks. I will try to get to some of it at least. The review
> comments which needs investigation may have to wait. Lets see.
-Tony
* Re: [PATCH v10 07/24] x86/resctrl: Add support to enable/disable AMD ABMC feature
2024-12-12 20:15 ` [PATCH v10 07/24] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
@ 2024-12-19 21:48 ` Reinette Chatre
2024-12-20 15:14 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 21:48 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 687d9d8d82a4..d54c2701c09c 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
These functions are clearly monitoring related. Is there a reason why they are
in rdtgroup.c and not in monitor.c?
> @@ -2402,6 +2402,42 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
> return 0;
> }
>
> +static void resctrl_abmc_set_one_amd(void *arg)
> +{
> +	bool *enable = arg;
> +
> +	if (*enable)
> +		msr_set_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
> +	else
> +		msr_clear_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
> +}
> +
> +/*
> + * Update L3_QOS_EXT_CFG MSR on all the CPUs associated with the monitor
> + * domain.
> + */
> +static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
> +{
> +	struct rdt_mon_domain *d;
> +
> +	list_for_each_entry(d, &r->mon_domains, hdr.list)
> +		on_each_cpu_mask(&d->hdr.cpu_mask,
> +				 resctrl_abmc_set_one_amd, &enable, 1);
> +}
> +
> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
> +{
> +	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> +
> +	if (r->mon.mbm_cntr_assignable &&
> +	    hw_res->mbm_cntr_assign_enabled != enable) {
> +		_resctrl_abmc_enable(r, enable);
> +		hw_res->mbm_cntr_assign_enabled = enable;
> +	}
> +
> +	return 0;
> +}
> +
> /*
> * We don't allow rdtgroup directories to be created anywhere
> * except the root directory. Thus when looking for the rdtgroup
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 511cfce8fc21..f11d6fdfd977 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -355,4 +355,7 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
> extern unsigned int resctrl_rmid_realloc_threshold;
> extern unsigned int resctrl_rmid_realloc_limit;
>
> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
> +bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r);
> +
> #endif /* _RESCTRL_H */
During the software controller work Boris stated [1] that these APIs should
only appear in the main header file at the time they are used. This series
makes a few changes to include/linux/resctrl.h that, considering this
feedback, should rather be in arch/x86/kernel/cpu/resctrl/internal.h
until MPAM starts using them.
Reinette
[1] https://lore.kernel.org/all/20241209222047.GKZ1dtPxIu5_Hxs1fp@fat_crate.local/
* Re: [PATCH v10 08/24] x86/resctrl: Introduce the interface to display monitor mode
2024-12-12 20:15 ` [PATCH v10 08/24] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
@ 2024-12-19 21:59 ` Reinette Chatre
2024-12-20 15:31 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 21:59 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> Introduce the interface file "mbm_assign_mode" to list monitor modes
> supported.
>
> The "mbm_cntr_assign" mode provides the option to assign a counter to
> an RMID, event pair and monitor the bandwidth as long as it is assigned.
>
> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
> Bandwidth Monitoring Counters) hardware feature and is enabled by default.
>
> The "default" mode is the existing monitoring mode that works without the
> explicit counter assignment, instead relying on dynamic counter assignment
> by hardware that may result in hardware not dedicating a counter resulting
> in monitoring data reads returning "Unavailable".
>
> Provide an interface to display the monitor mode on the system.
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_cntr_assign]
> default
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v10: Added few more text to user documentation clarify on the default mode.
>
> v9: Updated user documentation based on comments.
>
> v8: Commit message update.
>
> v7: Updated the descriptions/commit log in resctrl.rst to generic text.
> Thanks to James and Reinette.
> Rename mbm_mode to mbm_assign_mode.
> Introduced mutex lock in rdtgroup_mbm_mode_show().
>
> v6: Added documentation for mbm_cntr_assign and legacy mode.
> Moved mbm_mode fflags initialization to static initialization.
>
> v5: Changed interface name to mbm_mode.
> It will be always available even if ABMC feature is not supported.
> Added description in resctrl.rst about ABMC mode.
> Fixed display abmc and legacy consistently.
>
> v4: Fixed the checks for legacy and abmc mode. Default is ABMC.
>
> v3: New patch to display ABMC capability.
> ---
> Documentation/arch/x86/resctrl.rst | 33 ++++++++++++++++++++++++++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 31 ++++++++++++++++++++++++
> 2 files changed, 64 insertions(+)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 30586728a4cd..1e4a1f615496 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -257,6 +257,39 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x30;1=0x30;3=0x15;4=0x15
>
> +"mbm_assign_mode":
> + Reports the list of monitoring modes supported. The enclosed brackets
> + indicate which mode is enabled.
> + ::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> + [mbm_cntr_assign]
> + default
> +
> + "mbm_cntr_assign":
> +
The text below jumps into usage with the assumption that user space already
understands the feature. How about starting with some context? Something like
"A monitoring event can only accumulate data while it is backed by a hardware
counter."
> + In mbm_cntr_assign mode user-space is able to specify which of the
> + events in CTRL_MON or MON groups should have a counter assigned using the
> + "mbm_assign_control" file. The number of counters available is described
> + in the "num_mbm_cntrs" file. Changing the mode may cause all counters on
> + a resource to reset.
> +
> + The mode is useful on AMD platforms which support more CTRL_MON and MON
> + groups than hardware counters, meaning 'unassigned' events on CTRL_MON or
> + MON groups will report 'Unavailable'.
The "meaning 'unassigned'" is not clear to me since in "mbm_cntr_assign" mode
these events will (at end of this series) actually return "Unassigned", no? Perhaps
this portion can be dropped for now and the text found in patch #20 about returning
"Unassigned" can be placed here instead ... but this should probably be done in
patch #19 that adds that capability.
> +
> + AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
> + enable this mode by default so that counters remain assigned even when the
> + corresponding RMID is not in use by any processor.
> +
> + "default":
> +
> + In default mode, resctrl assumes there is a hardware counter for each
> + event within every CTRL_MON and MON group. On AMD platforms, it is
> + recommended to use mbm_cntr_assign mode if supported, because reading
> + "mbm_total_bytes" or "mbm_local_bytes" will report 'Unavailable' if
> + there is no counter associated with that event.
> +
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index d54c2701c09c..f25ff1430014 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -845,6 +845,30 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
> return ret;
> }
>
> +static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
> +					 struct seq_file *s, void *v)
I remember this topic from an earlier version, yet I still see many instances
of the "rdtgroup_" namespace used for functions that do not interact with
resource groups. Could you please check this series and fix this?
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	if (r->mon.mbm_cntr_assignable) {
> +		if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
> +			seq_puts(s, "[mbm_cntr_assign]\n");
> +			seq_puts(s, "default\n");
> +		} else {
> +			seq_puts(s, "mbm_cntr_assign\n");
> +			seq_puts(s, "[default]\n");
> +		}
> +	} else {
> +		seq_puts(s, "[default]\n");
> +	}
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +
> +	return 0;
> +}
> +
> #ifdef CONFIG_PROC_CPU_RESCTRL
>
> /*
> @@ -1901,6 +1925,13 @@ static struct rftype res_common_files[] = {
> .seq_show = mbm_local_bytes_config_show,
> .write = mbm_local_bytes_config_write,
> },
> +	{
> +		.name = "mbm_assign_mode",
> +		.mode = 0444,
> +		.kf_ops = &rdtgroup_kf_single_ops,
> +		.seq_show = rdtgroup_mbm_assign_mode_show,
> +		.fflags = RFTYPE_MON_INFO,
> +	},
> {
> .name = "cpus",
> .mode = 0644,
Reinette
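The quoted documentation says the enabled mode in "mbm_assign_mode" is the one enclosed in brackets. A small user-space sketch of how a tool might pick that entry out of the file contents follows. The helper name `active_mbm_mode` is hypothetical, and the sketch assumes the contents have already been read into a NUL-terminated string.

```c
#include <string.h>

/*
 * Copy the bracketed mode name from text into out.
 * Returns 0 on success, -1 if no "[name]" entry is found or the name
 * does not fit in out.
 */
static int active_mbm_mode(const char *text, char *out, size_t outlen)
{
	const char *open = strchr(text, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t n;

	if (!close)
		return -1;

	n = (size_t)(close - open - 1);	/* characters between the brackets */
	if (n == 0 || n >= outlen)
		return -1;

	memcpy(out, open + 1, n);
	out[n] = '\0';
	return 0;
}
```

Given "[mbm_cntr_assign]\ndefault\n" the helper yields "mbm_cntr_assign", and given "mbm_cntr_assign\n[default]\n" it yields "default".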
* Re: [PATCH v10 09/24] x86/resctrl: Introduce interface to display number of monitoring counters
2024-12-12 20:15 ` [PATCH v10 09/24] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
@ 2024-12-19 22:03 ` Reinette Chatre
2024-12-20 15:41 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 22:03 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> The mbm_cntr_assign mode provides an option to the user to assign a
> counter to an RMID, event pair and monitor the bandwidth as long as
> the counter is assigned. Number of assignments depend on number of
> monitoring counters available.
>
> Provide the interface to display the number of monitoring counters
> supported. The interface file 'num_mbm_cntrs' is available when an
> architecture supports mbm_cntr_assign mode.
How about: "The resctrl file 'num_mbm_cntrs' is visible to user space
when the system supports mbm_cntr_assign mode." ?
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v10: No changes.
>
> v9: Updated user document based on the comments.
> Will add a new file available_mbm_cntrs later in the series.
>
> v8: Commit message update and documentation update.
>
> v7: Minor commit log text changes.
>
> v6: No changes.
>
> v5: Changed the display name from num_cntrs to num_mbm_cntrs.
> Updated the commit message.
> Moved the patch after mbm_mode is introduced.
>
> v4: Changed the counter name to num_cntrs. And few text changes.
>
> v3: Changed the field name to mbm_assign_cntrs.
>
> v2: Changed the field name to mbm_assignable_counters from abmc_counter.
> ---
> ---
> Documentation/arch/x86/resctrl.rst | 12 ++++++++++++
> arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
> 3 files changed, 29 insertions(+)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 1e4a1f615496..43a861adeada 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -290,6 +290,18 @@ with the following files:
> "mbm_total_bytes" or "mbm_local_bytes" will report 'Unavailable' if
> there is no counter associated with that event.
>
> +"num_mbm_cntrs":
> + The number of monitoring counters available for assignment when the
> + architecture supports mbm_cntr_assign mode.
"architecture supports" -> "system supports"
> +
> + The resctrl file system supports tracking up to two memory bandwidth
> + events per monitoring group: mbm_total_bytes and/or mbm_local_bytes.
> + Up to two counters can be assigned per monitoring group, one for each
> + memory bandwidth event. More monitoring groups can be tracked by
> + assigning one counter per monitoring group. However, doing so limits
> + memory bandwidth tracking to a single memory bandwidth event per
> + monitoring group.
> +
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 80be91671dc1..c23e94fa6852 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1237,6 +1237,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
> r->mon.mbm_cntr_assignable = true;
> cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
> r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
> + resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
> }
> }
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index f25ff1430014..339bb0b09a82 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -869,6 +869,16 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
> return 0;
> }
>
> +static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
> + struct seq_file *s, void *v)
No rdtgroup_ namespace, this can be resctrl_
> +{
> + struct rdt_resource *r = of->kn->parent->priv;
> +
> + seq_printf(s, "%d\n", r->mon.num_mbm_cntrs);
> +
> + return 0;
> +}
> +
> #ifdef CONFIG_PROC_CPU_RESCTRL
>
> /*
Reinette
^ permalink raw reply [flat|nested] 76+ messages in thread
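For illustration, a minimal user-space sketch of the counter-count decode in
the rdt_get_mon_l3_config() hunk quoted above: the low 16 bits of EBX hold the
number of counters minus one. GENMASK_U32() and abmc_num_cntrs() are
hypothetical stand-ins written for this note, not the kernel's definitions,
and the sketch takes EBX as a plain argument rather than executing CPUID.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's GENMASK(h, l), 32-bit only. */
#define GENMASK_U32(h, l) \
	((uint32_t)((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l))))

/*
 * Decode the counter count the way the quoted hunk does:
 * EBX[15:0] holds the number of counters minus one, so add one back.
 * Bits above 15 are masked off and do not affect the result.
 */
static inline uint32_t abmc_num_cntrs(uint32_t ebx)
{
	return (ebx & GENMASK_U32(15, 0)) + 1;
}
```

With this decode an EBX value of 31 would be reported as 32 counters in
'num_mbm_cntrs'.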
* Re: [PATCH v10 11/24] x86/resctrl: Remove MSR reading of event configuration value
2024-12-12 20:15 ` [PATCH v10 11/24] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
@ 2024-12-19 22:12 ` Reinette Chatre
2024-12-20 16:09 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 22:12 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> @@ -1604,33 +1645,11 @@ unsigned int mon_event_config_index_get(u32 evtid)
> }
> }
>
> -static void mon_event_config_read(void *info)
> -{
> - struct mon_config_info *mon_info = info;
> - unsigned int index;
> - u64 msrval;
> -
> - index = mon_event_config_index_get(mon_info->evtid);
> - if (index == INVALID_CONFIG_INDEX) {
> - pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> - return;
> - }
> - rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
> -
> - /* Report only the valid event configuration bits */
> - mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
> -}
> -
> -static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
> -{
> - smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
> -}
> -
> static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
> {
> - struct mon_config_info mon_info;
> struct rdt_mon_domain *dom;
> bool sep = false;
> + u32 val;
Could this variable name be more descriptive? For example, mon_config, or config_val as
used in mbm_config_write_domain()?
...
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index f11d6fdfd977..c8ab3d7a0dab 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -118,6 +118,18 @@ struct rdt_mon_domain {
> int cqm_work_cpu;
> };
>
> +/**
> + * struct mon_config_info - Monitoring event configuratiin details
configuratiin -> configuration
... but actually, the motivation for moving this struct here was
to make it available for an arch to interpret the data passed
via resctrl_arch_mon_event_config_set(). This patch passes data
in this struct but a later patch modifies
resctrl_arch_mon_event_config_set() to not use struct anymore ...
and then leaves struct mon_config_info here.
Even so, considering Boris's preference this is no longer needed.
> + * @d: Domain for the event
> + * @evtid: Event type
> + * @mon_config: Event configuration value
> + */
> +struct mon_config_info {
> + struct rdt_mon_domain *d;
> + enum resctrl_event_id evtid;
> + u32 mon_config;
> +};
> +
> /**
> * struct resctrl_cache - Cache allocation related data
> * @cbm_len: Length of the cache bit mask
> @@ -352,6 +364,10 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
> */
> void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
>
> +void resctrl_arch_mon_event_config_set(void *info);
> +u32 resctrl_arch_mon_event_config_get(struct rdt_mon_domain *d,
> + enum resctrl_event_id eventid);
> +
Please move to internal header file instead and consider this for
all changes to include/linux/resctrl.h
> extern unsigned int resctrl_rmid_realloc_threshold;
> extern unsigned int resctrl_rmid_realloc_limit;
>
Reinette
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 12/24] x86/resctrl: Introduce cntr_cfg to track assignable counters at domain
2024-12-12 20:15 ` [PATCH v10 12/24] x86/resctrl: Introduce cntr_cfg to track assignable counters at domain Babu Moger
@ 2024-12-19 22:33 ` Reinette Chatre
2024-12-20 17:33 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 22:33 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, andipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
Did subject intend to use name of new struct?
On 12/12/24 12:15 PM, Babu Moger wrote:
> In mbm_assign_mode, the MBM counters are assigned/unassigned to an RMID,
> event pair in a resctrl group and monitor the bandwidth as long as it is
> assigned. Counters are assigned/unassigned at domain level and needs to
> be tracked at domain level.
>
> Add the mbm_assign_cntr_cfg data structure to struct rdt_ctrl_domain to
"mbm_assign_cntr_cfg" -> "mbm_cntr_cfg"
> manage and track MBM counter assignments at the domain level.
This can really use some more information about this data structure. I think
it will be helpful to provide more information about how the data structure
looks ... for example, that it is an array indexed by counter ID where the
assignment details of each counter are stored. I also think it will be helpful
to describe how interactions with this data structure work: that a NULL
rdtgrp means that the counter is free, and that it is not possible to find
a counter from a resource group, so arrays need to be searched instead, and doing
so is ok for $REASON (when considering the number of RMID and domain combinations
possible on AMD). A lot is left for the reader to figure out.
>
> Suggested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v10: Patch changed completely to handle the counters at domain level.
> https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
> Removed Reviewed-by tag.
> Did not see the need to add cntr_id in mbm_state structure. Not used in the code.
>
> v9: Added Reviewed-by tag. No other changes.
>
> v8: Minor commit message changes.
>
> v7: Added check mbm_cntr_assignable for allocating bitmap mbm_cntr_map
>
> v6: New patch to add domain level assignment.
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
> include/linux/resctrl.h | 12 ++++++++++++
> 2 files changed, 23 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 682f47e0beb1..1ee008a63d8b 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -4068,6 +4068,7 @@ static void __init rdtgroup_setup_default(void)
>
> static void domain_destroy_mon_state(struct rdt_mon_domain *d)
> {
> + kfree(d->cntr_cfg);
> bitmap_free(d->rmid_busy_llc);
> kfree(d->mbm_total);
> kfree(d->mbm_local);
> @@ -4141,6 +4142,16 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
> return -ENOMEM;
> }
> }
> + if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
> + tsize = sizeof(*d->cntr_cfg);
> + d->cntr_cfg = kcalloc(r->mon.num_mbm_cntrs, tsize, GFP_KERNEL);
> + if (!d->cntr_cfg) {
> + bitmap_free(d->rmid_busy_llc);
> + kfree(d->mbm_total);
> + kfree(d->mbm_local);
> + return -ENOMEM;
> + }
> + }
>
> return 0;
> }
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index c8ab3d7a0dab..03c67d9156f3 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -94,6 +94,16 @@ struct rdt_ctrl_domain {
> u32 *mbps_val;
> };
>
> +/**
> + * struct mbm_cntr_cfg -Assignable counter configuration
Please compare with the style used in the rest of the file. For example,
"-Assignable" -> "- assignable"
> + * @evtid: Event type
This description is not useful. Consider: "MBM event to which
the counter is assigned. Only valid if @rdtgroup is not NULL."
(This was the first thing that came to my mind, please improve)
> + * @rdtgroup: Resctrl group assigned to the counter
Can add "NULL if counter is free"
> + */
> +struct mbm_cntr_cfg {
> + enum resctrl_event_id evtid;
> + struct rdtgroup *rdtgrp;
> +};
> +
> /**
> * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
> * @hdr: common header for different domain types
> @@ -105,6 +115,7 @@ struct rdt_ctrl_domain {
> * @cqm_limbo: worker to periodically read CQM h/w counters
> * @mbm_work_cpu: worker CPU for MBM h/w counters
> * @cqm_work_cpu: worker CPU for CQM h/w counters
> + * @cntr_cfg: Assignable counters configuration
Match capitalization of surrounding text.
Will be helpful to add that this is an array indexed by counter ID.
> */
> struct rdt_mon_domain {
> struct rdt_domain_hdr hdr;
> @@ -116,6 +127,7 @@ struct rdt_mon_domain {
> struct delayed_work cqm_limbo;
> int mbm_work_cpu;
> int cqm_work_cpu;
> + struct mbm_cntr_cfg *cntr_cfg;
> };
>
> /**
Reinette
^ permalink raw reply [flat|nested] 76+ messages in thread
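The layout asked for in the review above (an array indexed by counter ID, with
a NULL rdtgrp marking a free counter) can be sketched in user space as below.
This is only a sketch under stated assumptions: the types are trimmed-down
stand-ins for the kernel ones, and mbm_cntr_find() is a hypothetical helper
that does not appear in the patch.

```c
#include <stddef.h>

/* Stand-ins for the kernel types; only what the sketch needs. */
enum resctrl_event_id {
	QOS_L3_MBM_TOTAL_EVENT_ID = 2,
	QOS_L3_MBM_LOCAL_EVENT_ID = 3,
};
struct rdtgroup { int dummy; };

/* One entry per hardware counter; the array index is the counter ID. */
struct mbm_cntr_cfg {
	enum resctrl_event_id evtid;	/* only meaningful if rdtgrp != NULL */
	struct rdtgroup *rdtgrp;	/* NULL means the counter is free */
};

/*
 * A group does not record which counters it holds, so finding the counter
 * behind a (group, event) pair means scanning the domain's array.
 * Returns the counter ID, or -1 if the pair has no counter.
 */
static int mbm_cntr_find(const struct mbm_cntr_cfg *cfg, size_t num_cntrs,
			 const struct rdtgroup *rdtgrp,
			 enum resctrl_event_id evtid)
{
	if (!rdtgrp)
		return -1;	/* NULL never identifies an owner */

	for (size_t i = 0; i < num_cntrs; i++) {
		if (cfg[i].rdtgrp == rdtgrp && cfg[i].evtid == evtid)
			return (int)i;
	}
	return -1;
}
```

The scan is linear in the number of counters per domain, which is the cost the
review asks the changelog to justify.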
* Re: RE: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-19 21:45 ` Luck, Tony
@ 2024-12-19 22:33 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-19 22:33 UTC (permalink / raw)
To: Luck, Tony, Chatre, Reinette, Babu Moger
Cc: corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, peternewman@google.com,
Yu, Fenghua, x86@kernel.org, hpa@zytor.com, paulmck@kernel.org,
akpm@linux-foundation.org, thuth@redhat.com, rostedt@goodmis.org,
xiongwei.song@windriver.com, pawan.kumar.gupta@linux.intel.com,
daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
perry.yuan@amd.com, Huang, Kai, Li, Xiaoyao, seanjc@google.com,
Li, Xin3, andrew.cooper3@citrix.com, ebiggers@google.com,
mario.limonciello@amd.com, james.morse@arm.com,
tan.shaopeng@fujitsu.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, Wieczor-Retman, Maciej,
Eranian, Stephane
Hi Tony,
On 12/19/2024 3:45 PM, Luck, Tony wrote:
>>>>>>>> It is right thing to continue assignment if one of the domain is out of
>>>>>>>> counters. In that case how about we save the error(say error_domain) and
>>>>>>>> continue. And finally return success if both ret and error_domain are zeros.
>>>>>>>>
>>>>>>>> return ret ? ret : error_domain;
>>>>>>>
>>>>>>> If there are many domains, then you might have 3 succeed and 5 fail.
>>>>>>>
>>>>>>> I think the best you can do is return success if everything succeeded
>>>>>>> and an error if any failed.
>>>>>>
>>>>>> Yes. The above check should take care of this case.
>>>>>>
>>>>>
>>>>> If I understand correctly "error_domain" can capture the ID of
>>>>> a single failing domain. If there are multiple failing domains like
>>>>> in Tony's example then "error_domain" will not be accurate and thus
>>>>> can never be trusted. Instead of a single check of a failure user
>>>>> space is then forced to parse the more complex "mbm_assign_control"
>>>>> file to learn what succeeded and failed.
>>>>>
>>>>> Would it not be simpler to process sequentially and then fail on
>>>>> first error encountered with detailed error message? With that
>>>>> user space can determine exactly which portion of request
>>>>> succeeded and which portion failed.
>>>>
>>>> One more option is to print the error for each failure and continue. And finally return error.
>
> There's limited space allocated for use by last_cmd_*() messages:
>
> static char last_cmd_status_buf[512];
>
> seq_buf_init(&last_cmd_status, last_cmd_status_buf,
> sizeof(last_cmd_status_buf));
>
> If you keep parsing and trying to apply changes from user input you will
> quickly hit that limit.
oh. ok. Good to know.
>
>
>>>>
>>>> "Group mon1, domain:1 Out of MBM counters"
>>>>
>>>> We have the error information as well as the convenience of assignment on domains where counters are available when user is working with "*"(all domains).
>>>
>>> This may be possible. Please keep in mind that any errors have to be
>>> easily consumed in an automated way to support the user space tools
>>> that interact with resctrl. I do not think we have thus far focused
>>> on the "last_cmd_status" buffer as part of the user space ABI so this opens
>>> up more considerations.
>>>
>>> At this time the error handling of "all domains" does not seem to be
>>> consistent and obvious to user space. From what I can tell the
>>> implementation continues on to the next domain if one domain is out
>>> of counters but it exits immediately if a counter cannot be configured
>>> on a particular domain.
>>
>> Yes. We can handle both the errors in the same way.
>
> I think it is simplest to make the "same way" be "fail on first error".
Ok. Sure. Will do thanks.
-Babu
^ permalink raw reply [flat|nested] 76+ messages in thread
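The "fail on first error" behaviour agreed on above can be sketched in user
space as follows. Everything here is an assumption made for illustration:
struct domain, assign_all_domains() and the -ENOSPC return value are
hypothetical, and only the 512-byte status buffer size and the message text
are taken from the thread.

```c
#include <errno.h>
#include <stdio.h>

#define LAST_CMD_STATUS_LEN 512	/* same size as the buffer quoted above */

/* Hypothetical per-domain state: how many counters are still free. */
struct domain {
	int id;
	int free_cntrs;
};

/*
 * Take one counter in each domain, in order, and stop at the first
 * failure. Domains handled before the failing one keep their assignment,
 * and the failing domain is named in @status, so a single short message
 * tells the caller exactly where processing stopped.
 */
static int assign_all_domains(struct domain *doms, int ndoms,
			      const char *group, char *status, size_t len)
{
	for (int i = 0; i < ndoms; i++) {
		if (doms[i].free_cntrs == 0) {
			snprintf(status, len,
				 "Group %s, domain:%d Out of MBM counters\n",
				 group, doms[i].id);
			return -ENOSPC;
		}
		doms[i].free_cntrs--;
	}
	snprintf(status, len, "ok\n");
	return 0;
}
```

Because at most one message is written per request, the status buffer cannot
overflow no matter how many domains the request names.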
* Re: [PATCH v10 13/24] x86/resctrl: Introduce interface to display number of free counters
2024-12-12 20:15 ` [PATCH v10 13/24] x86/resctrl: Introduce interface to display number of free counters Babu Moger
@ 2024-12-19 22:50 ` Reinette Chatre
2024-12-20 18:05 ` Moger, Babu
2024-12-20 18:32 ` Moger, Babu
0 siblings, 2 replies; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 22:50 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
(andipan.das@amd.com -> sandipan.das@amd.com to stop sending undeliverable emails)
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> Provide the interface to display the number of monitoring counters
> available for assignment in each domain when mbm_cntr_assign is supported.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v10: Patch changed to handle the counters at domain level.
> https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
> So, display logic also changed now.
>
> v9: New patch
> ---
> Documentation/arch/x86/resctrl.rst | 4 +++
> arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 47 ++++++++++++++++++++++++++
> 3 files changed, 52 insertions(+)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 43a861adeada..c075fcee96b7 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -302,6 +302,10 @@ with the following files:
> memory bandwidth tracking to a single memory bandwidth event per
> monitoring group.
>
> +"available_mbm_cntrs":
> + The number of monitoring counters available for assignment in each
> + domain when the architecture supports mbm_cntr_assign mode.
"architecture supports" -> "system supports"
It looks to me as though more than just support is required, the mode
is also required to be enabled?
> +
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index b07d60fabf1c..f857af361af1 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1238,6 +1238,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
> cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
> r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
> resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
> + resctrl_file_fflags_init("available_mbm_cntrs", RFTYPE_MON_INFO);
> }
> }
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 1ee008a63d8b..72518e0ec2ec 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -879,6 +879,47 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
> return 0;
> }
>
> +static int rdtgroup_available_mbm_cntrs_show(struct kernfs_open_file *of,
> + struct seq_file *s, void *v)
rdtgroup_
> +{
> + struct rdt_resource *r = of->kn->parent->priv;
> + struct rdt_mon_domain *dom;
> + bool sep = false;
> + u32 cntrs, i;
> + int ret = 0;
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> +
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
> + ret = -EINVAL;
> + goto unlock_cntrs_show;
> + }
> +
> +
unnecessary empty line
> + list_for_each_entry(dom, &r->mon_domains, hdr.list) {
> + if (sep)
> + seq_puts(s, ";");
> +
> + cntrs = 0;
> + for (i = 0; i < r->mon.num_mbm_cntrs; i++) {
> + if (!dom->cntr_cfg[i].rdtgrp)
> + cntrs++;
> + }
> +
> + seq_printf(s, "%d=%d", dom->hdr.id, cntrs);
> + sep = true;
> + }
> + seq_puts(s, "\n");
> +
> +unlock_cntrs_show:
> + mutex_unlock(&rdtgroup_mutex);
> + cpus_read_unlock();
> +
> + return ret;
> +}
> +
> #ifdef CONFIG_PROC_CPU_RESCTRL
>
> /*
> @@ -1961,6 +2002,12 @@ static struct rftype res_common_files[] = {
> .kf_ops = &rdtgroup_kf_single_ops,
> .seq_show = rdtgroup_num_mbm_cntrs_show,
> },
> + {
> + .name = "available_mbm_cntrs",
> + .mode = 0444,
> + .kf_ops = &rdtgroup_kf_single_ops,
> + .seq_show = rdtgroup_available_mbm_cntrs_show,
> + },
> {
> .name = "cpus_list",
> .mode = 0644,
Reinette
^ permalink raw reply [flat|nested] 76+ messages in thread
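A minimal user-space sketch of the "<domain id>=<free counters>" output
produced by the show function quoted above, with ';' between domains and a
trailing newline. The trimmed-down types and the helpers free_cntrs() and
format_available() are hypothetical; the kernel code writes to a seq_file
rather than a caller-supplied buffer.

```c
#include <stddef.h>
#include <stdio.h>

struct rdtgroup { int dummy; };				/* stand-in */
struct mbm_cntr_cfg { struct rdtgroup *rdtgrp; };	/* NULL: counter is free */

struct mon_domain {					/* trimmed-down stand-in */
	int id;
	const struct mbm_cntr_cfg *cntr_cfg;
};

/* Count the free (unowned) counters in one domain's array. */
static unsigned int free_cntrs(const struct mbm_cntr_cfg *cfg, size_t num)
{
	unsigned int n = 0;

	for (size_t i = 0; i < num; i++) {
		if (!cfg[i].rdtgrp)
			n++;
	}
	return n;
}

/*
 * Write "<id>=<free>" for each domain, separated by ';' and ended by a
 * newline. Returns the string length, or -1 if @buf is too small.
 */
static int format_available(char *buf, size_t len,
			    const struct mon_domain *doms, size_t ndoms,
			    size_t num_cntrs)
{
	size_t off = 0;

	for (size_t i = 0; i < ndoms; i++) {
		int n = snprintf(buf + off, len - off, "%s%d=%u",
				 i ? ";" : "", doms[i].id,
				 free_cntrs(doms[i].cntr_cfg, num_cntrs));
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	if (len - off < 2)
		return -1;
	buf[off++] = '\n';
	buf[off] = '\0';
	return (int)off;
}
```

For two domains with four counters each, where domain 0 has two counters in
use, the buffer would hold "0=2;1=4" followed by a newline.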
* Re: [PATCH v10 15/24] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2024-12-12 20:15 ` [PATCH v10 15/24] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
@ 2024-12-19 23:04 ` Reinette Chatre
2024-12-20 19:22 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 23:04 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
(andipan.das@amd.com -> sandipan.das@amd.com to stop sending undeliverable emails)
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned. The assigned RMID will be tracked by the hardware until the user
> unassigns it manually.
>
> Configure the counters by writing to the L3_QOS_ABMC_CFG MSR and specifying
> the counter ID, bandwidth source (RMID), and bandwidth event configuration.
>
> Provide the interface to assign the counter ids to RMID.
Until now in this series many patches "introduced interface X" and every
time it was some new resctrl file that user space interacts with. This
changelog starts with a context about "user to assign a hardware counter"
and ends with "Provide the interface", but there is no new user interface
in this patch. Can this be more specific about what this patch does?
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
....
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 3 ++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 58 ++++++++++++++++++++++++++
> 2 files changed, 61 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 35bcf0e5ba7e..849bcfe4ea5b 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -701,5 +701,8 @@ bool closid_allocated(unsigned int closid);
> int resctrl_find_cleanest_closid(void);
> void arch_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
> unsigned int mon_event_config_index_get(u32 evtid);
> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
> + u32 cntr_id, bool assign);
>
> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 72518e0ec2ec..e895d2415f22 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1686,6 +1686,34 @@ unsigned int mon_event_config_index_get(u32 evtid)
> }
> }
>
> +struct cntr_config {
> + struct rdt_resource *r;
> + struct rdt_mon_domain *d;
> + enum resctrl_event_id evtid;
> + u32 rmid;
> + u32 closid;
> + u32 cntr_id;
> + u32 val;
> + bool assign;
> +};
I think I am missing something because it is not clear to me why this
new struct is needed. Why not just use union l3_qos_abmc_cfg?
If it is indeed needed it needs better formatting and clear descriptions,
a member like "val" is very generic.
> +
> +static void resctrl_abmc_config_one_amd(void *info)
> +{
> + struct cntr_config *config = info;
> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
> +
reverse fir tree order
> + abmc_cfg.split.cfg_en = 1;
> + abmc_cfg.split.cntr_en = config->assign ? 1 : 0;
> + abmc_cfg.split.cntr_id = config->cntr_id;
> + abmc_cfg.split.bw_src = config->rmid;
> + abmc_cfg.split.bw_type = config->val;
> +
> + wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg.full);
> +
> + resctrl_arch_reset_rmid(config->r, config->d, config->closid,
> + config->rmid, config->evtid);
> +}
> +
> static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
> {
> struct rdt_mon_domain *dom;
> @@ -1869,6 +1897,36 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
> return ret ?: nbytes;
> }
>
> +/*
> + * Send an IPI to the domain to assign the counter to RMID, event pair.
> + */
> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
> + u32 cntr_id, bool assign)
> +{
> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
> + struct cntr_config config = { 0 };
Please see 29eaa7958367 ("x86/resctrl: Slightly clean-up mbm_config_show()")
> +
> + config.r = r;
> + config.d = d;
> + config.evtid = evtid;
> + config.rmid = rmid;
> + config.closid = closid;
> + config.cntr_id = cntr_id;
> +
> + /* Update the event configuration from the domain */
> + if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID)
> + config.val = hw_dom->mbm_total_cfg;
> + else
> + config.val = hw_dom->mbm_local_cfg;
> +
> + config.assign = assign;
> +
> + smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &config, 1);
> +
> + return 0;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
Reinette
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-12 20:15 ` [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter Babu Moger
2024-12-12 23:37 ` Luck, Tony
@ 2024-12-19 23:22 ` Reinette Chatre
2024-12-20 20:34 ` Moger, Babu
1 sibling, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 23:22 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> The mbm_cntr_assign mode offers several counters that can be assigned
> to an RMID, event pair and monitor the bandwidth as long as it is
> assigned.
>
> Counters are managed at the domain level. Introduce the interface to
> allocate/free/assign the counters.
Changelog of previous patch also claimed to "Provide the interface to assign the
counter ids to RMID." Please let changelogs describe the change more accurately.
(This still does not provide a user interface so what is meant by interface is
unclear)
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 849bcfe4ea5b..70d2577fc377 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -704,5 +704,8 @@ unsigned int mon_event_config_index_get(u32 evtid);
> int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> enum resctrl_event_id evtid, u32 rmid, u32 closid,
> u32 cntr_id, bool assign);
> -
> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> + struct rdt_mon_domain *d, enum resctrl_event_id evtid);
Could you please be consistent in the ordering of parameters?
int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
> +struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
> + u32 rmid, enum resctrl_event_id evtid);
> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index f857af361af1..8823cd97ff1f 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -575,8 +575,8 @@ void free_rmid(u32 closid, u32 rmid)
> list_add_tail(&entry->list, &rmid_free_lru);
> }
>
> -static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
> - u32 rmid, enum resctrl_event_id evtid)
> +struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
> + u32 rmid, enum resctrl_event_id evtid)
> {
> u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e895d2415f22..1c8694a68cf4 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1927,6 +1927,116 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> return 0;
> }
>
> +/*
> + * Configure the counter for the event, RMID pair for the domain.
This description can be more helpful ... it essentially just restates the function
header.
> + */
> +static int resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
> + u32 cntr_id, bool assign)
> +{
> + struct mbm_state *m;
> + int ret;
> +
> + ret = resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id, assign);
> + if (ret)
> + return ret;
> +
> + m = get_mbm_state(d, closid, rmid, evtid);
> + if (m)
> + memset(m, 0, sizeof(struct mbm_state));
> +
> + return ret;
> +}
> +
> +static bool mbm_cntr_assigned(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> + int cntr_id;
> +
> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
> + if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
> + d->cntr_cfg[cntr_id].evtid == evtid)
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> + int cntr_id;
> +
> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
> + if (!d->cntr_cfg[cntr_id].rdtgrp) {
> + d->cntr_cfg[cntr_id].rdtgrp = rdtgrp;
> + d->cntr_cfg[cntr_id].evtid = evtid;
> + return cntr_id;
> + }
> + }
> +
> + return -EINVAL;
This can be -ENOSPC
> +}
> +
> +static void mbm_cntr_free(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> + int cntr_id;
> +
> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
> + if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
> + d->cntr_cfg[cntr_id].evtid == evtid)
> + memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
> + }
> +}
From what I can tell the counter ID is always available when the counter is freed so
it can just be freed directly without looping over array?
> +
> +/*
> + * Assign a hardware counter to event @evtid of group @rdtgrp.
> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
(to be consistent) "if rdt_mon_domain is NULL" -> "if @d is NULL"
> + * else the counter will be assigned to specific domain.
"will be assigned to specific domain" -> "will be assigned to @d"
Reinette
^ permalink raw reply [flat|nested] 76+ messages in thread
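Two of the review points above (returning -ENOSPC when no counter is free, and
freeing by counter ID instead of searching) can be sketched in user space as
below. This is a sketch under assumptions, not the patch: the types are
trimmed-down stand-ins, and both functions take the per-domain array directly,
which differs from the signatures in the quoted diff.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the kernel types; only what the sketch needs. */
enum resctrl_event_id {
	QOS_L3_MBM_TOTAL_EVENT_ID = 2,
	QOS_L3_MBM_LOCAL_EVENT_ID = 3,
};
struct rdtgroup { int dummy; };

struct mbm_cntr_cfg {
	enum resctrl_event_id evtid;
	struct rdtgroup *rdtgrp;	/* NULL means the counter is free */
};

/* Claim the first free counter; -ENOSPC if every counter is in use. */
static int mbm_cntr_alloc(struct mbm_cntr_cfg *cfg, int num_cntrs,
			  struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
{
	for (int id = 0; id < num_cntrs; id++) {
		if (!cfg[id].rdtgrp) {
			cfg[id].rdtgrp = rdtgrp;
			cfg[id].evtid = evtid;
			return id;
		}
	}
	return -ENOSPC;
}

/* Free by counter ID: no scan is needed when the caller already has the ID. */
static void mbm_cntr_free(struct mbm_cntr_cfg *cfg, int cntr_id)
{
	memset(&cfg[cntr_id], 0, sizeof(cfg[cntr_id]));
}
```

A caller that kept the ID returned by mbm_cntr_alloc() can pass it straight to
mbm_cntr_free(), which is the simplification the review suggests.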
* Re: [PATCH v10 17/24] x86/resctrl: Add the interface to unassign a counter
2024-12-12 20:15 ` [PATCH v10 17/24] x86/resctrl: Add the interface to unassign a counter Babu Moger
@ 2024-12-19 23:32 ` Reinette Chatre
2024-12-20 21:38 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 23:32 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> The mbm_cntr_assign mode provides a limited number of hardware counters
> that can be assigned to an RMID, event pair to monitor bandwidth while
> assigned. If all counters are in use, the kernel will show an error
> message: "Out of MBM assignable counters" when a new assignment is
> requested. To make space for a new assignment, users must unassign an
> already assigned counter.
>
> Introduce an interface that allows for the unassignment of counter IDs
> from the domain.
The subject and changelog claim that this introduces an interface, but no new
resctrl interface is introduced here. Can this be more specific?
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 2 +
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 70d2577fc377..f858098dbe4b 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -706,6 +706,8 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> u32 cntr_id, bool assign);
> int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> struct rdt_mon_domain *d, enum resctrl_event_id evtid);
> +int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> + struct rdt_mon_domain *d, enum resctrl_event_id evtid);
(please use consistent parameter ordering)
> struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
> u32 rmid, enum resctrl_event_id evtid);
> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 1c8694a68cf4..a71a8389b649 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1990,6 +1990,20 @@ static void mbm_cntr_free(struct rdt_resource *r, struct rdt_mon_domain *d,
> }
> }
>
> +static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
> +{
> + int cntr_id;
> +
> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
> + if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
> + d->cntr_cfg[cntr_id].evtid == evtid)
> + return cntr_id;
> + }
> +
> + return -EINVAL;
This could be -ENOENT?
> +}
mbm_cntr_get() seems to be essentially a duplicate of mbm_cntr_assigned() that returns
the actual counter ID instead of true/false. Could only one be used?
> +
> /*
> * Assign a hardware counter to event @evtid of group @rdtgrp.
> * Counter will be assigned to all the domains if rdt_mon_domain is NULL
> @@ -2037,6 +2051,44 @@ int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> return ret;
> }
>
> +/*
> + * Unassign a hardware counter associated with @evtid from the domain and
> + * the group. Unassign the counters from all the domains if rdt_mon_domain
> + * is NULL else unassign from the specific domain.
(same comment as previous patch about consistency in referring to function
parameters)
> + */
> +int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> + struct rdt_mon_domain *d, enum resctrl_event_id evtid)
> +{
> + int cntr_id, ret = 0;
> +
> + if (!d) {
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + if (!mbm_cntr_assigned(r, d, rdtgrp, evtid))
> + continue;
> +
> + cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
> +
It seems unnecessary to loop over array twice here. mbm_cntr_assigned() seems
unnecessary. Return value of mbm_cntr_get() can be used to determine if it
is assigned or not?
> + ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
> + rdtgrp->closid, cntr_id, false);
> + if (!ret)
> + mbm_cntr_free(r, d, rdtgrp, evtid);
... and by providing cntr_id to mbm_cntr_free() another unnecessary loop can be avoided.
> + }
> + } else {
> + if (!mbm_cntr_assigned(r, d, rdtgrp, evtid))
> + goto out_done_unassign;
> +
> + cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
> +
> + ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
> + rdtgrp->closid, cntr_id, false);
> + if (!ret)
> + mbm_cntr_free(r, d, rdtgrp, evtid);
> + }
> +
> +out_done_unassign:
> + return ret;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
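Putting the comments on this patch together, the per-domain unassign could
look roughly like the userspace sketch below. The types are simplified
stand-ins, and cntr_cfg_sketch, NUM_CNTRS, config_cntr_sketch() and
unassign_one_sketch() are hypothetical names, not code from this series. One
lookup decides whether a counter is assigned, and the same ID is then used to
free it:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NUM_CNTRS 4	/* hypothetical stand-in for r->mon.num_mbm_cntrs */

/* Simplified stand-in for struct mbm_cntr_cfg. */
struct cntr_cfg_sketch {
	const void *grp;	/* owning group, NULL while the counter is free */
	int evtid;
};

/*
 * Stand-in for resctrl_config_cntr(). @hw_ret lets the caller simulate
 * the hardware configuration succeeding (0) or failing (negative errno).
 */
static int config_cntr_sketch(int cntr_id, int hw_ret)
{
	(void)cntr_id;
	return hw_ret;
}

/* Unassign the counter backing (@grp, @evtid) in one domain, if any. */
static int unassign_one_sketch(struct cntr_cfg_sketch *cfg, const void *grp,
			       int evtid, int hw_ret)
{
	int id, ret;

	assert(cfg && grp);

	/* Single pass over the array: find the counter, if there is one. */
	for (id = 0; id < NUM_CNTRS; id++) {
		if (cfg[id].grp == grp && cfg[id].evtid == evtid)
			break;
	}

	if (id == NUM_CNTRS)
		return 0;	/* nothing assigned, nothing to do */

	ret = config_cntr_sketch(id, hw_ret);
	if (!ret)
		memset(&cfg[id], 0, sizeof(cfg[id]));	/* free by ID */

	return ret;
}
```

With a helper shaped like this, both the "all domains" loop and the single
domain branch could presumably call the same function.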
Reinette
* Re: [PATCH v10 18/24] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
2024-12-12 20:15 ` [PATCH v10 18/24] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
@ 2024-12-19 23:39 ` Reinette Chatre
2024-12-21 13:45 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 23:39 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> Assign/unassign counters on resctrl group creation/deletion. Two counters
> are required per group, one for MBM total event and one for MBM local
> event.
>
> There are a limited number of counters available for assignment. If these
> counters are exhausted, the kernel will display the error message: "Out of
> MBM assignable counters". However, it is not necessary to fail the
> creation of a group due to assignment failures. Users have the flexibility
> to modify the assignments at a later time.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 81 +++++++++++++++++++++++++-
> 1 file changed, 79 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index a71a8389b649..5acae525881a 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -920,6 +920,25 @@ static int rdtgroup_available_mbm_cntrs_show(struct kernfs_open_file *of,
> return ret;
> }
>
> +static void mbm_cntr_reset(struct rdt_resource *r)
> +{
> + struct rdt_mon_domain *dom;
> +
> + /*
> + * Hardware counters will reset after switching the monitor mode.
> + * Reset the architectural state so that reading of hardware
> + * counter is not considered as an overflow in the next update.
> + * Also reset the domain counter bitmap.
> + */
> + if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
> + list_for_each_entry(dom, &r->mon_domains, hdr.list) {
> + memset(dom->cntr_cfg, 0,
> + sizeof(*dom->cntr_cfg) * r->mon.num_mbm_cntrs);
> + resctrl_arch_reset_rmid_all(r, dom);
This looks to be missing reset of resctrl monitor state (from get_mbm_state()).
...
> static int rdt_get_tree(struct fs_context *fc)
> {
> struct rdt_fs_context *ctx = rdt_fc2context(fc);
> @@ -3023,6 +3082,8 @@ static int rdt_get_tree(struct fs_context *fc)
> if (ret < 0)
> goto out_info;
>
> + rdtgroup_assign_cntrs(&rdtgroup_default);
> +
> ret = mkdir_mondata_all(rdtgroup_default.kn,
> &rdtgroup_default, &kn_mondata);
> if (ret < 0)
If this mkdir_mondata_all() fails it calls "goto out_mongrp" ...
> @@ -3058,8 +3119,10 @@ static int rdt_get_tree(struct fs_context *fc)
> out_psl:
> rdt_pseudo_lock_release();
> out_mondata:
> - if (resctrl_arch_mon_capable())
> + if (resctrl_arch_mon_capable()) {
> kernfs_remove(kn_mondata);
> + rdtgroup_unassign_cntrs(&rdtgroup_default);
> + }
> out_mongrp:
> if (resctrl_arch_mon_capable())
> kernfs_remove(kn_mongrp);
Looks like this will miss counter cleanup on failure of mkdir_mondata_all().
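As a rough illustration of the unwind ordering I have in mind, here is a
userspace sketch. The struct, the function and the out_cntrs label are
hypothetical, and booleans stand in for the real setup steps. Each label
undoes one step and falls through to the labels for the earlier steps, so a
failure right after counter assignment still releases the counters:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical record of which setup steps are currently in effect. */
struct setup_state_sketch {
	bool mongrp_created;
	bool cntrs_assigned;
	bool mondata_created;
};

/*
 * Stand-in for the monitoring part of rdt_get_tree(). @fail_mondata
 * simulates mkdir_mondata_all() failing, @fail_later simulates a failure
 * in any step that follows it.
 */
static int setup_sketch(struct setup_state_sketch *s, bool fail_mondata,
			bool fail_later)
{
	int ret;

	assert(s);

	s->mongrp_created = true;	/* mon_groups directory created */
	s->cntrs_assigned = true;	/* rdtgroup_assign_cntrs() */

	if (fail_mondata) {		/* mkdir_mondata_all() failed */
		ret = -ENOMEM;
		goto out_cntrs;
	}
	s->mondata_created = true;

	if (fail_later) {
		ret = -ENOMEM;
		goto out_mondata;
	}

	return 0;

out_mondata:
	s->mondata_created = false;	/* kernfs_remove(kn_mondata) */
out_cntrs:
	s->cntrs_assigned = false;	/* rdtgroup_unassign_cntrs() */
	s->mongrp_created = false;	/* existing out_mongrp cleanup */
	return ret;
}
```

Whether a separate label or moving the unassign call under out_mongrp fits
better is a judgment call; the sketch only shows that both failure paths end
up with no counters left assigned.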
> @@ -3238,6 +3301,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
>
> head = &rdtgrp->mon.crdtgrp_list;
> list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
> + rdtgroup_unassign_cntrs(sentry);
> free_rmid(sentry->closid, sentry->mon.rmid);
> list_del(&sentry->mon.crdtgrp_list);
>
> @@ -3278,6 +3342,8 @@ static void rmdir_all_sub(void)
> cpumask_or(&rdtgroup_default.cpu_mask,
> &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
>
> + rdtgroup_unassign_cntrs(rdtgrp);
> +
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>
> kernfs_remove(rdtgrp->kn);
> @@ -3309,6 +3375,8 @@ static void rdt_kill_sb(struct super_block *sb)
> for_each_alloc_capable_rdt_resource(r)
> reset_all_ctrls(r);
> rmdir_all_sub();
> + rdtgroup_unassign_cntrs(&rdtgroup_default);
> + mbm_cntr_reset(&rdt_resources_all[RDT_RESOURCE_L3].r_resctrl);
> rdt_pseudo_lock_release();
> rdtgroup_default.mode = RDT_MODE_SHAREABLE;
> schemata_list_destroy();
> @@ -3772,6 +3840,8 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
> }
> rdtgrp->mon.rmid = ret;
>
> + rdtgroup_assign_cntrs(rdtgrp);
> +
> ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
> if (ret) {
> rdt_last_cmd_puts("kernfs subdir error\n");
Cleanup of assigned counters if mkdir_mondata_all() fails seems to be missing here also.
> @@ -3784,8 +3854,10 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
>
> static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
> {
> - if (resctrl_arch_mon_capable())
> + if (resctrl_arch_mon_capable()) {
> + rdtgroup_unassign_cntrs(rgrp);
> free_rmid(rgrp->closid, rgrp->mon.rmid);
> + }
> }
>
> static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
> @@ -4044,6 +4116,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> update_closid_rmid(tmpmask, NULL);
>
> rdtgrp->flags = RDT_DELETED;
> +
> + rdtgroup_unassign_cntrs(rdtgrp);
> +
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>
> /*
> @@ -4090,6 +4165,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
> update_closid_rmid(tmpmask, NULL);
>
> + rdtgroup_unassign_cntrs(rdtgrp);
> +
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
> closid_free(rdtgrp->closid);
>
Reinette
* Re: [PATCH v10 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
2024-12-12 20:15 ` [PATCH v10 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
@ 2024-12-19 23:59 ` Reinette Chatre
2024-12-21 14:04 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-19 23:59 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> In mbm_cntr_assign mode, the hardware counter should be assigned to read
> the MBM events.
>
> Report 'Unassigned' in case the user attempts to read the events without
> assigning the counter.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
..
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index c075fcee96b7..3ec14c314606 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -430,6 +430,16 @@ When monitoring is enabled all MON groups will also contain:
> for the L3 cache they occupy). These are named "mon_sub_L3_YY"
> where "YY" is the node number.
>
> + When supported the mbm_cntr_assign mode allows users to assign a
"When supported" -> "When enabled"? Or perhaps just drop that and start with
"mbm_cntr_assign mode allows users ..."
> + counter to mon_hw_id, event pair enabling bandwidth monitoring for
> + as long as the counter remains assigned. The hardware will continue
> + tracking the assigned mon_hw_id until the user manually unassigns
> + it, ensuring that counters are not reset during this period. With
> + a limited number of counters, the system may run out of assignable
> + counters. In that case, MBM event counters will return 'Unassigned'
> + when the event is read. Users must manually assign a counter to read
> + the events.
> +
> "mon_hw_id":
> Available only with debug option. The identifier used by hardware
> for the monitor group. On x86 this is the RMID.
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 200d89a64027..8e265a86e524 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -527,6 +527,12 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
> /* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
> lockdep_assert_cpus_held();
>
> + if (resctrl_arch_mbm_cntr_assign_enabled(r) && is_mbm_event(evtid) &&
> + !mbm_cntr_assigned(r, d, rdtgrp, evtid)) {
> + rr->err = -ENOENT;
> + return;
> + }
> +
hmmm ... d can be NULL here after the SNC support. Since the file that needs a
sum is essentially software backed I do not think assigning counters would
apply to it (but it may theoretically apply to the domains it consists of).
I think it may be safer to just move this check into rdtgroup_mondata_show()
where it reads data for a single domain.
I am not sure if we need to change the documentation because of this. One option
could be a rewording to "MBM event counters may return 'Unassigned' or
'Unavailable' when the event is read".
Reinette
* Re: [PATCH v10 20/24] x86/resctrl: Introduce the interface to switch between monitor modes
2024-12-12 20:15 ` [PATCH v10 20/24] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
@ 2024-12-20 2:56 ` Reinette Chatre
2024-12-21 14:20 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-20 2:56 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> Introduce interface to switch between mbm_cntr_assign and default modes.
>
This changelog needs context.
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_cntr_assign]
> default
>
> To enable the "mbm_cntr_assign" mode:
> $ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>
> To enable the default monitoring mode:
> $ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>
> MBM event counters will reset when mbm_assign_mode is changed.
I think it will help to elaborate on this.
I understand this as two parts. As stated, the hardware counters
are reset since that is what ABMC does. In this patch
there is a mbm_cntr_reset() but that does not actually reset the counters as
the above implies.
Instead, the counters are automatically reset as part of changing the mode.
resctrl triggers reset of architectural and non-architectural
state of the events because of the hardware counter reset.
The changelog can really do more to explain what this patch does.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> ---
> Documentation/arch/x86/resctrl.rst | 15 ++++++++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 50 +++++++++++++++++++++++++-
> 2 files changed, 64 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 3ec14c314606..d3a8a34cf629 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -290,6 +290,21 @@ with the following files:
> "mbm_total_bytes" or "mbm_local_bytes" will report 'Unavailable' if
> there is no counter associated with that event.
>
> + * To enable "mbm_cntr_assign" mode:
> + ::
> +
> + # echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> +
> + * To enable default monitoring mode:
> + ::
> +
> + # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> +
> + The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated with
> + counters may reset when "mbm_assign_mode" is changed. Moving to
After looking at the final documentation it seems more appropriate to move this to
the top of the "mbm_assign_mode" section. The top already shows how to read from the
file using cat so it seems like a good match to document write to the file in the
same area.
> + mbm_cntr_assign mode require users to assign the counters to the events.
> + Otherwise, the MBM event counters will return "Unassigned" when read.
This portion can move to the mode it applies to.
> +
> "num_mbm_cntrs":
> The number of monitoring counters available for assignment when the
> architecture supports mbm_cntr_assign mode.
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 8d00b1689a80..eea534cce3d0 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -939,6 +939,53 @@ static void mbm_cntr_reset(struct rdt_resource *r)
> }
> }
>
> +static ssize_t rdtgroup_mbm_assign_mode_write(struct kernfs_open_file *of,
> + char *buf, size_t nbytes, loff_t off)
rdtgroup_ namespace is not appropriate
> +{
> + struct rdt_resource *r = of->kn->parent->priv;
> + int ret = 0;
> + bool enable;
> +
> + /* Valid input requires a trailing newline */
> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
> + return -EINVAL;
> +
> + buf[nbytes - 1] = '\0';
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdt_last_cmd_clear();
> +
> + if (!strcmp(buf, "default")) {
> + enable = 0;
> + } else if (!strcmp(buf, "mbm_cntr_assign")) {
> + if (r->mon.mbm_cntr_assignable) {
> + enable = 1;
> + } else {
> + ret = -EINVAL;
> + rdt_last_cmd_puts("mbm_cntr_assign mode is not supported\n");
> + goto write_exit;
> + }
> + } else {
> + ret = -EINVAL;
> + rdt_last_cmd_puts("Unsupported assign mode\n");
> + goto write_exit;
> + }
> +
> + if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
> + if (!ret)
> + mbm_cntr_reset(r);
> + }
> +
> +write_exit:
> + mutex_unlock(&rdtgroup_mutex);
> + cpus_read_unlock();
> +
> + return ret ?: nbytes;
> +}
> +
> #ifdef CONFIG_PROC_CPU_RESCTRL
>
> /*
> @@ -2222,9 +2269,10 @@ static struct rftype res_common_files[] = {
> },
> {
> .name = "mbm_assign_mode",
> - .mode = 0444,
> + .mode = 0644,
> .kf_ops = &rdtgroup_kf_single_ops,
> .seq_show = rdtgroup_mbm_assign_mode_show,
> + .write = rdtgroup_mbm_assign_mode_write,
> .fflags = RFTYPE_MON_INFO,
> },
> {
Reinette
* Re: [PATCH v10 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported
2024-12-12 20:15 ` [PATCH v10 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
@ 2024-12-20 3:03 ` Reinette Chatre
2024-12-21 14:33 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-20 3:03 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> Configure mbm_cntr_assign on AMD. 'mbm_cntr_assign' mode in AMD is ABMC
> (Assignable Bandwidth Monitoring Counters). It is enabled by default when
> supported on the system.
Needs imperative "Enable mbm_cntr_assign mode ..."
>
> Ensure that the ABMC is updated on all logical processors in the resctrl
> domain.
Needs imperative (for example) "Update the assignable counter mode .."
Please make clear that it is the architecture that decides what the
default mode should be. resctrl's part is to ensure that the architecture
gets the opportunity to configure every logical processor as it comes online.
Reinette
* Re: [PATCH v10 22/24] x86/resctrl: Update assignments on event configuration changes
2024-12-12 20:15 ` [PATCH v10 22/24] x86/resctrl: Update assignments on event configuration changes Babu Moger
@ 2024-12-20 3:12 ` Reinette Chatre
2024-12-21 14:59 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-20 3:12 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> Resctrl provides option to configure events by writing to the interfaces
> /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config or
> /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config when BMEC (Bandwidth
> Monitoring Event Configuration) is supported.
>
> Whenever the event configuration is updated, MBM assignments must be
> revised across all monitor groups within the impacted domains.
This needs imperative tone description of what this patch does.
...
> @@ -1825,6 +1825,54 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
> return 0;
> }
>
> +/*
> + * Review the cntr_cfg domain configuration. If a matching assignment is found,
> + * update the counter assignment accordingly. This is within the IPI Context,
This "Review the cntr_cfg domain configuration. If a matching assignment is found,"
is too vague for me to make sense of what it is trying to do. Can this be made more specific?
> + * so call resctrl_abmc_config_one_amd directly.
> + */
> +static void resctrl_arch_update_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> + enum resctrl_event_id evtid, u32 val)
> +{
> + struct cntr_config config;
> + struct rdtgroup *rdtgrp;
> + struct mbm_state *m;
> + u32 cntr_id;
> +
> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
> + rdtgrp = d->cntr_cfg[cntr_id].rdtgrp;
> + if (rdtgrp && d->cntr_cfg[cntr_id].evtid == evtid) {
> + memset(&config, 0, sizeof(struct cntr_config));
> + config.r = r;
> + config.d = d;
> + config.evtid = evtid;
> + config.rmid = rdtgrp->mon.rmid;
> + config.closid = rdtgrp->closid;
> + config.cntr_id = cntr_id;
> + config.val = val;
> + config.assign = 1;
> +
> + resctrl_abmc_config_one_amd(&config);
> +
> + m = get_mbm_state(d, rdtgrp->closid, rdtgrp->mon.rmid, evtid);
> + if (m)
> + memset(m, 0, sizeof(struct mbm_state));
> + }
> + }
> +}
> +
> +static void resctrl_mon_event_config_set(void *info)
> +{
> + struct mon_config_info *mon_info = info;
> + struct rdt_mon_domain *d = mon_info->d;
> + struct rdt_resource *r = mon_info->r;
> +
> + resctrl_arch_mon_event_config_set(d, mon_info->evtid, mon_info->mon_config);
> +
> + /* Check if assignments needs to be updated */
> + if (resctrl_arch_mbm_cntr_assign_enabled(r))
> + resctrl_arch_update_cntr(r, d, mon_info->evtid,
> + mon_info->mon_config);
> +}
>
> static void mbm_config_write_domain(struct rdt_resource *r,
> struct rdt_mon_domain *d, u32 evtid, u32 val)
> @@ -1840,6 +1888,7 @@ static void mbm_config_write_domain(struct rdt_resource *r,
> if (config_val == INVALID_CONFIG_VALUE || config_val == val)
> return;
>
> + mon_info.r = r;
> mon_info.d = d;
> mon_info.evtid = evtid;
> mon_info.mon_config = val;
> @@ -1851,7 +1900,7 @@ static void mbm_config_write_domain(struct rdt_resource *r,
> * on one CPU is observed by all the CPUs in the domain.
> */
> smp_call_function_any(&d->hdr.cpu_mask,
> - resctrl_arch_mon_event_config_set,
> + resctrl_mon_event_config_set,
> &mon_info, 1);
>
> /*
Reinette
* Re: [PATCH v10 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
2024-12-12 20:15 ` [PATCH v10 24/24] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
@ 2024-12-20 3:23 ` Reinette Chatre
2024-12-21 15:28 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-20 3:23 UTC (permalink / raw)
To: Babu Moger, corbet, tglx, mingo, bp, dave.hansen, tony.luck,
peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/12/24 12:15 PM, Babu Moger wrote:
> Introduce the interface to assign MBM events in mbm_cntr_assign mode.
Seems like something is missing ... there is no mention about what
MBM events are assigned "to".
...
> + if (assign_state & ASSIGN_LOCAL) {
> + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
> + if (ret)
> + goto out_fail;
> + }
> +
> + goto next;
> +
> +out_fail:
> + sprintf(domain, d ? "%ld" : "*", dom_id);
> +
The static checker I tried complains that dom_id can be used uninitialized.
Reinette
* Re: [PATCH v10 07/24] x86/resctrl: Add support to enable/disable AMD ABMC feature
2024-12-19 21:48 ` Reinette Chatre
@ 2024-12-20 15:14 ` Moger, Babu
2024-12-20 17:16 ` Reinette Chatre
0 siblings, 1 reply; 76+ messages in thread
From: Moger, Babu @ 2024-12-20 15:14 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 3:48 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>
>> static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 687d9d8d82a4..d54c2701c09c 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>
> These functions are clearly monitoring related. Is there a reason why they are
> in rdtgroup.c and not in monitor.c?
There is no specific reason. Most of these functions are called from
user interface. User interface handlers are defined in rdtgroup.c.
All the code in this series is related to monitoring. We can move
everything to monitor.c if you are ok with it.
>
>> @@ -2402,6 +2402,42 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
>> return 0;
>> }
>>
>> +static void resctrl_abmc_set_one_amd(void *arg)
>> +{
>> + bool *enable = arg;
>> +
>> + if (*enable)
>> + msr_set_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
>> + else
>> + msr_clear_bit(MSR_IA32_L3_QOS_EXT_CFG, ABMC_ENABLE_BIT);
>> +}
>> +
>> +/*
>> + * Update L3_QOS_EXT_CFG MSR on all the CPUs associated with the monitor
>> + * domain.
>> + */
>> +static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
>> +{
>> + struct rdt_mon_domain *d;
>> +
>> + list_for_each_entry(d, &r->mon_domains, hdr.list)
>> + on_each_cpu_mask(&d->hdr.cpu_mask,
>> + resctrl_abmc_set_one_amd, &enable, 1);
>> +}
>> +
>> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
>> +{
>> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> +
>> + if (r->mon.mbm_cntr_assignable &&
>> + hw_res->mbm_cntr_assign_enabled != enable) {
>> + _resctrl_abmc_enable(r, enable);
>> + hw_res->mbm_cntr_assign_enabled = enable;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> /*
>> * We don't allow rdtgroup directories to be created anywhere
>> * except the root directory. Thus when looking for the rdtgroup
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 511cfce8fc21..f11d6fdfd977 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -355,4 +355,7 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
>> extern unsigned int resctrl_rmid_realloc_threshold;
>> extern unsigned int resctrl_rmid_realloc_limit;
>>
>> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
>> +bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r);
>> +
>> #endif /* _RESCTRL_H */
>
> During the software controller work Boris stated [1] that these APIs should
> only appear in the main header file at the time they are used. This series
> makes a few changes to include/linux/resctrl.h that, considering this
> feedback, should rather be in arch/x86/kernel/cpu/resctrl/internal.h
> until MPAM starts using them.
Sure. We can do that.
>
> Reinette
>
> [1] https://lore.kernel.org/all/20241209222047.GKZ1dtPxIu5_Hxs1fp@fat_crate.local/
>
* Re: [PATCH v10 08/24] x86/resctrl: Introduce the interface to display monitor mode
2024-12-19 21:59 ` Reinette Chatre
@ 2024-12-20 15:31 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-20 15:31 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette.
On 12/19/2024 3:59 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> Introduce the interface file "mbm_assign_mode" to list monitor modes
>> supported.
>>
>> The "mbm_cntr_assign" mode provides the option to assign a counter to
>> an RMID, event pair and monitor the bandwidth as long as it is assigned.
>>
>> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
>> Bandwidth Monitoring Counters) hardware feature and is enabled by default.
>>
>> The "default" mode is the existing monitoring mode that works without the
>> explicit counter assignment, instead relying on dynamic counter assignment
>> by hardware that may result in hardware not dedicating a counter resulting
>> in monitoring data reads returning "Unavailable".
>>
>> Provide an interface to display the monitor mode on the system.
>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> [mbm_cntr_assign]
>> default
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v10: Added few more text to user documentation clarify on the default mode.
>>
>> v9: Updated user documentation based on comments.
>>
>> v8: Commit message update.
>>
>> v7: Updated the descriptions/commit log in resctrl.rst to generic text.
>> Thanks to James and Reinette.
>> Rename mbm_mode to mbm_assign_mode.
>> Introduced mutex lock in rdtgroup_mbm_mode_show().
>>
>> v6: Added documentation for mbm_cntr_assign and legacy mode.
>> Moved mbm_mode fflags initialization to static initialization.
>>
>> v5: Changed interface name to mbm_mode.
>> It will be always available even if ABMC feature is not supported.
>> Added description in resctrl.rst about ABMC mode.
>> Fixed display abmc and legacy consistantly.
>>
>> v4: Fixed the checks for legacy and abmc mode. Default it ABMC.
>>
>> v3: New patch to display ABMC capability.
>> ---
>> Documentation/arch/x86/resctrl.rst | 33 ++++++++++++++++++++++++++
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 31 ++++++++++++++++++++++++
>> 2 files changed, 64 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 30586728a4cd..1e4a1f615496 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -257,6 +257,39 @@ with the following files:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>> 0=0x30;1=0x30;3=0x15;4=0x15
>>
>> +"mbm_assign_mode":
>> + Reports the list of monitoring modes supported. The enclosed brackets
>> + indicate which mode is enabled.
>> + ::
>> +
>> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> + [mbm_cntr_assign]
>> + default
>> +
>> + "mbm_cntr_assign":
>> +
>
> The text below jumps into usage with assumption that user space already
> understands the feature. How about starting with some context? Something like
> "A monitoring event can only accumulate data while it is backed by a hardware
> counter."
sure.
>
>> + In mbm_cntr_assign mode user-space is able to specify which of the
>> + events in CTRL_MON or MON groups should have a counter assigned using the
>> + "mbm_assign_control" file. The number of counters available is described
>> + in the "num_mbm_cntrs" file. Changing the mode may cause all counters on
>> + a resource to reset.
>> +
>> + The mode is useful on AMD platforms which support more CTRL_MON and MON
>> + groups than hardware counters, meaning 'unassigned' events on CTRL_MON or
>> + MON groups will report 'Unavailable'.
>
> The "meaning 'unassigned'" is not clear to me since in "mbm_cntr_assign" mode
> these events will (at end of this series) actually return "Unassigned", no? Perhaps
Yes. It will report "Unassigned".
> this portion can be dropped for now and the text found in patch #20 about returning
> "Unassigned" can be placed here instead ... but this should probably be done in
> patch #19 that adds that capability.
Sure. We can do that.
>
>> +
>> + AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
>> + enable this mode by default so that counters remain assigned even when the
>> + corresponding RMID is not in use by any processor.
>> +
>> + "default":
>> +
>> + In default mode, resctrl assumes there is a hardware counter for each
>> + event within every CTRL_MON and MON group. On AMD platforms, it is
>> + recommended to use mbm_cntr_assign mode if supported, because reading
>> + "mbm_total_bytes" or "mbm_local_bytes" will report 'Unavailable' if
>> + there is no counter associated with that event.
>> +
>> "max_threshold_occupancy":
>> Read/write file provides the largest value (in
>> bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index d54c2701c09c..f25ff1430014 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -845,6 +845,30 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
>> return ret;
>> }
>>
>> +static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>> + struct seq_file *s, void *v)
>
> I remember this topic from earlier version yet I still see many instances
> of the "rdtgroup_" namespace used for functions that do not interact with
> resource groups. Could you please check this series and fix this?
Yes. Sure. I will check on this.
>
>> +{
>> + struct rdt_resource *r = of->kn->parent->priv;
>> +
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + if (r->mon.mbm_cntr_assignable) {
>> + if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> + seq_puts(s, "[mbm_cntr_assign]\n");
>> + seq_puts(s, "default\n");
>> + } else {
>> + seq_puts(s, "mbm_cntr_assign\n");
>> + seq_puts(s, "[default]\n");
>> + }
>> + } else {
>> + seq_puts(s, "[default]\n");
>> + }
>> +
>> + mutex_unlock(&rdtgroup_mutex);
>> +
>> + return 0;
>> +}
>> +
>> #ifdef CONFIG_PROC_CPU_RESCTRL
>>
>> /*
>> @@ -1901,6 +1925,13 @@ static struct rftype res_common_files[] = {
>> .seq_show = mbm_local_bytes_config_show,
>> .write = mbm_local_bytes_config_write,
>> },
>> + {
>> + .name = "mbm_assign_mode",
>> + .mode = 0444,
>> + .kf_ops = &rdtgroup_kf_single_ops,
>> + .seq_show = rdtgroup_mbm_assign_mode_show,
>> + .fflags = RFTYPE_MON_INFO,
>> + },
>> {
>> .name = "cpus",
>> .mode = 0644,
>
>
> Reinette
>
>
* Re: [PATCH v10 09/24] x86/resctrl: Introduce interface to display number of monitoring counters
2024-12-19 22:03 ` Reinette Chatre
@ 2024-12-20 15:41 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-20 15:41 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 4:03 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> The mbm_cntr_assign mode provides an option to the user to assign a
>> counter to an RMID, event pair and monitor the bandwidth as long as
>> the counter is assigned. Number of assignments depend on number of
>> monitoring counters available.
>>
>> Provide the interface to display the number of monitoring counters
>> supported. The interface file 'num_mbm_cntrs' is available when an
>> architecture supports mbm_cntr_assign mode.
>
> How about: "The resctrl file 'num_mbm_cntrs' is visible to user space
> when the system supports mbm_cntr_assign mode." ?
Sure.
>
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v10: No changes.
>>
>> v9: Updated user document based on the comments.
>> Will add a new file available_mbm_cntrs later in the series.
>>
>> v8: Commit message update and documentation update.
>>
>> v7: Minor commit log text changes.
>>
>> v6: No changes.
>>
>> v5: Changed the display name from num_cntrs to num_mbm_cntrs.
>> Updated the commit message.
>> Moved the patch after mbm_mode is introduced.
>>
>> v4: Changed the counter name to num_cntrs. And few text changes.
>>
>> v3: Changed the field name to mbm_assign_cntrs.
>>
>> v2: Changed the field name to mbm_assignable_counters from abmc_counter.
>> ---
>> ---
>> Documentation/arch/x86/resctrl.rst | 12 ++++++++++++
>> arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
>> 3 files changed, 29 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 1e4a1f615496..43a861adeada 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -290,6 +290,18 @@ with the following files:
>> "mbm_total_bytes" or "mbm_local_bytes" will report 'Unavailable' if
>> there is no counter associated with that event.
>>
>> +"num_mbm_cntrs":
>> + The number of monitoring counters available for assignment when the
>> + architecture supports mbm_cntr_assign mode.
>
> "architecture supports" -> "system supports"
Sure.
>
>> +
>> + The resctrl file system supports tracking up to two memory bandwidth
>> + events per monitoring group: mbm_total_bytes and/or mbm_local_bytes.
>> + Up to two counters can be assigned per monitoring group, one for each
>> + memory bandwidth event. More monitoring groups can be tracked by
>> + assigning one counter per monitoring group. However, doing so limits
>> + memory bandwidth tracking to a single memory bandwidth event per
>> + monitoring group.
>> +
>> "max_threshold_occupancy":
>> Read/write file provides the largest value (in
>> bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 80be91671dc1..c23e94fa6852 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1237,6 +1237,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>> r->mon.mbm_cntr_assignable = true;
>> cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>> r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>> + resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
>> }
>> }
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index f25ff1430014..339bb0b09a82 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -869,6 +869,16 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
>> return 0;
>> }
>>
>> +static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>> + struct seq_file *s, void *v)
>
> No rdtgroup_ namespace, this can be resctrl_
Yes. Sure.
thanks
Babu
>
>> +{
>> + struct rdt_resource *r = of->kn->parent->priv;
>> +
>> + seq_printf(s, "%d\n", r->mon.num_mbm_cntrs);
>> +
>> + return 0;
>> +}
>> +
>> #ifdef CONFIG_PROC_CPU_RESCTRL
>>
>> /*
>
> Reinette
>
>
* Re: [PATCH v10 11/24] x86/resctrl: Remove MSR reading of event configuration value
2024-12-19 22:12 ` Reinette Chatre
@ 2024-12-20 16:09 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-20 16:09 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 4:12 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>
>> @@ -1604,33 +1645,11 @@ unsigned int mon_event_config_index_get(u32 evtid)
>> }
>> }
>>
>> -static void mon_event_config_read(void *info)
>> -{
>> - struct mon_config_info *mon_info = info;
>> - unsigned int index;
>> - u64 msrval;
>> -
>> - index = mon_event_config_index_get(mon_info->evtid);
>> - if (index == INVALID_CONFIG_INDEX) {
>> - pr_warn_once("Invalid event id %d\n", mon_info->evtid);
>> - return;
>> - }
>> - rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
>> -
>> - /* Report only the valid event configuration bits */
>> - mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
>> -}
>> -
>> -static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
>> -{
>> - smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
>> -}
>> -
>> static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
>> {
>> - struct mon_config_info mon_info;
>> struct rdt_mon_domain *dom;
>> bool sep = false;
>> + u32 val;
>
> Could this variable name be more descriptive? For example, mon_config, or config_val as
> used in mbm_config_write_domain()?
Will change it to config_val.
>
> ...
>
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index f11d6fdfd977..c8ab3d7a0dab 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -118,6 +118,18 @@ struct rdt_mon_domain {
>> int cqm_work_cpu;
>> };
>>
>> +/**
>> + * struct mon_config_info - Monitoring event configuratiin details
>
> configuratiin -> configuration
Sure.
>
> ... but actually, the motivation for moving this struct here was
> to make it available for an arch to interpret the data passed
> via resctrl_arch_mon_event_config_set(). This patch passes data
> in this struct but a later patch modifies
> resctrl_arch_mon_event_config_set() to not use struct anymore ...
> and then leaves struct mon_config_info here.
>
> Even so, considering Boris's preference this is no longer needed.
OK. I will move the "struct mon_config_info" definition to where it is
used (rdtgroup.c).
>
>> + * @d: Domain for the event
>> + * @evtid: Event type
>> + * @mon_config: Event configuration value
>> + */
>> +struct mon_config_info {
>> + struct rdt_mon_domain *d;
>> + enum resctrl_event_id evtid;
>> + u32 mon_config;
>> +};
>> +
>> /**
>> * struct resctrl_cache - Cache allocation related data
>> * @cbm_len: Length of the cache bit mask
>> @@ -352,6 +364,10 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
>> */
>> void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
>>
>> +void resctrl_arch_mon_event_config_set(void *info);
>> +u32 resctrl_arch_mon_event_config_get(struct rdt_mon_domain *d,
>> + enum resctrl_event_id eventid);
>> +
>
> Please move to internal header file instead and consider this for
> all changes to include/linux/resctrl.h
Sure. Thanks
Babu
* Re: [PATCH v10 07/24] x86/resctrl: Add support to enable/disable AMD ABMC feature
2024-12-20 15:14 ` Moger, Babu
@ 2024-12-20 17:16 ` Reinette Chatre
0 siblings, 0 replies; 76+ messages in thread
From: Reinette Chatre @ 2024-12-20 17:16 UTC (permalink / raw)
To: Moger, Babu, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/20/24 7:14 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 12/19/2024 3:48 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 12/12/24 12:15 PM, Babu Moger wrote:
>>
>>> static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 687d9d8d82a4..d54c2701c09c 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>
>> These functions are clearly monitoring related. Is there a reason why they are
>> in rdtgroup.c and not in monitor.c?
>
> There is no specific reason. Most of these functions are called from user interface. User interface handlers are defined in rdtgroup.c.
>
Most, but not all of them, are, yes. With most operations triggered via user
interface we'll end up with most code in the same file if trying to keep all code
triggered by user space together.
> All the code in this series is related to monitoring. We can move everything to monitor.c if you are ok with it.
The read/write callbacks could stay with res_common_files[] to make their definition
simpler. I think it would make things clear if these callback functions call into monitoring
code located in monitor.c. Since you have been staring at this much longer, please let me know
if you find this to actually make things harder to follow and find.
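As a rough illustration of the split I have in mind, here is a user-space C sketch rather than kernel code. The struct mon_sketch type and both function names are made up for this example, and a plain buffer stands in for the seq_file; only the output strings are taken from the mbm_assign_mode patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the monitoring state kept per resource (hypothetical). */
struct mon_sketch {
	bool mbm_cntr_assignable;
	bool mbm_cntr_assign_enabled;
};

/* Would live in monitor.c: monitoring logic only, no file-system types. */
const char *resctrl_mbm_assign_mode_str(const struct mon_sketch *m)
{
	if (!m->mbm_cntr_assignable)
		return "[default]\n";

	return m->mbm_cntr_assign_enabled ?
		"[mbm_cntr_assign]\ndefault\n" :
		"mbm_cntr_assign\n[default]\n";
}

/*
 * Would stay in rdtgroup.c next to res_common_files[]: a thin callback
 * that only forwards to the monitoring code and emits the result.
 */
int resctrl_mbm_assign_mode_show(const struct mon_sketch *m, char *buf,
				 size_t len)
{
	int n;

	assert(m && buf);
	n = snprintf(buf, len, "%s", resctrl_mbm_assign_mode_str(m));

	/* Treat truncation as an error so callers never see partial output. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this shape the file table only references small callbacks, while anything that inspects counter state can be read in one place.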
Reinette
* Re: [PATCH v10 12/24] x86/resctrl: Introduce cntr_cfg to track assignable counters at domain
2024-12-19 22:33 ` Reinette Chatre
@ 2024-12-20 17:33 ` Moger, Babu
2024-12-20 20:58 ` Reinette Chatre
0 siblings, 1 reply; 76+ messages in thread
From: Moger, Babu @ 2024-12-20 17:33 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 4:33 PM, Reinette Chatre wrote:
> Hi Babu,
>
> Did subject intend to use name of new struct?
Yes. Will change it to "mbm_cntr_cfg".
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> In mbm_assign_mode, the MBM counters are assigned/unassigned to an RMID,
>> event pair in a resctrl group and monitor the bandwidth as long as it is
>> assigned. Counters are assigned/unassigned at domain level and needs to
>> be tracked at domain level.
>>
>> Add the mbm_assign_cntr_cfg data structure to struct rdt_ctrl_domain to
>
> "mbm_assign_cntr_cfg" -> "mbm_cntr_cfg"
Sure.
>
>> manage and track MBM counter assignments at the domain level.
>
> This can really use some more information about this data structure. I think
> it will be helpful to provide more information about how the data structure
> looks ... for example, that it is an array indexed by counter ID where the
> assignment details of each counter is stored. I also think it will be helpful
> to describe how interactions with this data structure works, that a NULL
> rdtgrp means that the counter is free and that it is not possible to find
> a counter from a resource group and arrays need to be searched instead and doing
> so is ok for $REASON (when considering the number of RMID and domain combinations
> possible on AMD). A lot is left for the reader to figure out.
How about this?
In mbm_cntr_assign mode, MBM counters are assigned to (or unassigned from)
an RMID, event pair in a resctrl group, and the bandwidth is monitored as
long as the counter is assigned. Counters are assigned/unassigned at domain
level and need to be tracked at domain level.
Add the mbm_cntr_cfg data structure to struct rdt_mon_domain to
manage and track MBM counter assignments at the domain level.
Each domain will contain num_mbm_cntrs entries, indexed by cntr_id.
During initialization, all entries will be set to zero. When a counter
is allocated, its corresponding entry will be populated with the
assigned struct rdtgroup pointer and enum resctrl_event_id. When the
counter is released, its entry will be reset to zero.
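To make the layout concrete, here is a rough user-space sketch of the bookkeeping described above. It is not the patch code: struct mon_domain_sketch and the helpers cntr_cfg_init(), mbm_cntr_alloc(), mbm_cntr_get() and mbm_cntr_free() are names invented for this illustration, and struct rdtgroup is reduced to a stand-in. It assumes a zeroed entry (NULL rdtgrp) means the counter is free, and that finding the counter for a group requires a linear search of the array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum resctrl_event_id {
	QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
};

/* Stand-in for the kernel's struct rdtgroup. */
struct rdtgroup {
	int closid;
	int rmid;
};

struct mbm_cntr_cfg {
	enum resctrl_event_id evtid;	/* only valid if rdtgrp is not NULL */
	struct rdtgroup *rdtgrp;	/* NULL if the counter is free */
};

/* Hypothetical container for the per-domain array, indexed by cntr_id. */
struct mon_domain_sketch {
	unsigned int num_mbm_cntrs;
	struct mbm_cntr_cfg *cntr_cfg;
};

/* Allocate the array with every entry zeroed, i.e. every counter free. */
int cntr_cfg_init(struct mon_domain_sketch *d, unsigned int num_mbm_cntrs)
{
	d->cntr_cfg = calloc(num_mbm_cntrs, sizeof(*d->cntr_cfg));
	if (!d->cntr_cfg)
		return -1;
	d->num_mbm_cntrs = num_mbm_cntrs;
	return 0;
}

/* Linear search: return the cntr_id assigned to (rdtgrp, evtid), or -1. */
int mbm_cntr_get(const struct mon_domain_sketch *d,
		 const struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
{
	unsigned int i;

	for (i = 0; i < d->num_mbm_cntrs; i++) {
		if (d->cntr_cfg[i].rdtgrp == rdtgrp &&
		    d->cntr_cfg[i].evtid == evtid)
			return (int)i;
	}
	return -1;
}

/* Claim the first free entry; return its cntr_id, or -1 if none is free. */
int mbm_cntr_alloc(struct mon_domain_sketch *d, struct rdtgroup *rdtgrp,
		   enum resctrl_event_id evtid)
{
	unsigned int i;

	for (i = 0; i < d->num_mbm_cntrs; i++) {
		if (!d->cntr_cfg[i].rdtgrp) {
			d->cntr_cfg[i].rdtgrp = rdtgrp;
			d->cntr_cfg[i].evtid = evtid;
			return (int)i;
		}
	}
	return -1;
}

/* Release a counter by resetting its entry to zero. */
void mbm_cntr_free(struct mon_domain_sketch *d, unsigned int cntr_id)
{
	assert(cntr_id < d->num_mbm_cntrs);
	memset(&d->cntr_cfg[cntr_id], 0, sizeof(d->cntr_cfg[cntr_id]));
}
```

The searches are linear in num_mbm_cntrs, which keeps the structure to a single array per domain at the cost of a scan on every lookup.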
>>
>> Suggested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v10: Patch changed completely to handle the counters at domain level.
>> https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
>> Removed Reviewed-by tag.
>> Did not see the need to add cntr_id in mbm_state structure. Not used in the code.
>>
>> v9: Added Reviewed-by tag. No other changes.
>>
>> v8: Minor commit message changes.
>>
>> v7: Added check mbm_cntr_assignable for allocating bitmap mbm_cntr_map
>>
>> v6: New patch to add domain level assignment.
>> ---
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
>> include/linux/resctrl.h | 12 ++++++++++++
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 682f47e0beb1..1ee008a63d8b 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -4068,6 +4068,7 @@ static void __init rdtgroup_setup_default(void)
>>
>> static void domain_destroy_mon_state(struct rdt_mon_domain *d)
>> {
>> + kfree(d->cntr_cfg);
>> bitmap_free(d->rmid_busy_llc);
>> kfree(d->mbm_total);
>> kfree(d->mbm_local);
>> @@ -4141,6 +4142,16 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
>> return -ENOMEM;
>> }
>> }
>> + if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
>> + tsize = sizeof(*d->cntr_cfg);
>> + d->cntr_cfg = kcalloc(r->mon.num_mbm_cntrs, tsize, GFP_KERNEL);
>> + if (!d->cntr_cfg) {
>> + bitmap_free(d->rmid_busy_llc);
>> + kfree(d->mbm_total);
>> + kfree(d->mbm_local);
>> + return -ENOMEM;
>> + }
>> + }
>>
>> return 0;
>> }
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index c8ab3d7a0dab..03c67d9156f3 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -94,6 +94,16 @@ struct rdt_ctrl_domain {
>> u32 *mbps_val;
>> };
>>
>> +/**
>> + * struct mbm_cntr_cfg -Assignable counter configuration
>
> Please compare with style use in rest of the file. For example,
> "-Assignable" -> "- assignable"
Sure.
>
>> + * @evtid: Event type
>
> This description is not useful. Consider: "MBM event to which
> the counter is assigned. Only valid if @rdtgroup is not NULL."
> (This was the first thing that came to my mind, please improve)
>
>> + * @rdtgroup: Resctrl group assigned to the counter
>
> Can add "NULL if counter is free"
Sure.
>
>> + */
>> +struct mbm_cntr_cfg {
>> + enum resctrl_event_id evtid;
>> + struct rdtgroup *rdtgrp;
>> +};
>> +
>> /**
>> * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
>> * @hdr: common header for different domain types
>> @@ -105,6 +115,7 @@ struct rdt_ctrl_domain {
>> * @cqm_limbo: worker to periodically read CQM h/w counters
>> * @mbm_work_cpu: worker CPU for MBM h/w counters
>> * @cqm_work_cpu: worker CPU for CQM h/w counters
>> + * @cntr_cfg: Assignable counters configuration
>
> Match capitalization of surrounding text.
> Will be helpful to add that this is an array indexed by counter ID.
ok. Sure.
>
>> */
>> struct rdt_mon_domain {
>> struct rdt_domain_hdr hdr;
>> @@ -116,6 +127,7 @@ struct rdt_mon_domain {
>> struct delayed_work cqm_limbo;
>> int mbm_work_cpu;
>> int cqm_work_cpu;
>> + struct mbm_cntr_cfg *cntr_cfg;
>> };
>>
>> /**
>
> Reinette
>
* Re: [PATCH v10 13/24] x86/resctrl: Introduce interface to display number of free counters
2024-12-19 22:50 ` Reinette Chatre
@ 2024-12-20 18:05 ` Moger, Babu
2024-12-20 18:32 ` Moger, Babu
1 sibling, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-20 18:05 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 4:50 PM, Reinette Chatre wrote:
> (andipan.das@amd.com -> sandipan.das@amd.com to stop sending undeliverable emails)
Yes, I know. My mistake when I grabbed the get_maintainer list.
>
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> Provide the interface to display the number of monitoring counters
>> available for assignment in each domain when mbm_cntr_assign is supported.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v10: Patch changed to handle the counters at domain level.
>> https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
>> So, display logic also changed now.
>>
>> v9: New patch
>> ---
>> Documentation/arch/x86/resctrl.rst | 4 +++
>> arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 47 ++++++++++++++++++++++++++
>> 3 files changed, 52 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 43a861adeada..c075fcee96b7 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -302,6 +302,10 @@ with the following files:
>> memory bandwidth tracking to a single memory bandwidth event per
>> monitoring group.
>>
>> +"available_mbm_cntrs":
>> + The number of monitoring counters available for assignment in each
>> + domain when the architecture supports mbm_cntr_assign mode.
>
> "architecture supports" -> "system supports"
>
> It looks to me as though more than just support is required, the mode
> is also required to be enabled?
sure.
>
>> +
>> "max_threshold_occupancy":
>> Read/write file provides the largest value (in
>> bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index b07d60fabf1c..f857af361af1 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1238,6 +1238,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>> cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>> r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>> resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
>> + resctrl_file_fflags_init("available_mbm_cntrs", RFTYPE_MON_INFO);
>> }
>> }
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 1ee008a63d8b..72518e0ec2ec 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -879,6 +879,47 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>> return 0;
>> }
>>
>> +static int rdtgroup_available_mbm_cntrs_show(struct kernfs_open_file *of,
>> + struct seq_file *s, void *v)
>
> rdtgroup_
Will change it to resctrl_
>
>> +{
>> + struct rdt_resource *r = of->kn->parent->priv;
>> + struct rdt_mon_domain *dom;
>> + bool sep = false;
>> + u32 cntrs, i;
>> + int ret = 0;
>> +
>> + cpus_read_lock();
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>> + ret = -EINVAL;
>> + goto unlock_cntrs_show;
>> + }
>> +
>> +
>
> unnecessary empty line
>
ok.
thanks
Babu
>> + list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>> + if (sep)
>> + seq_puts(s, ";");
>> +
>> + cntrs = 0;
>> + for (i = 0; i < r->mon.num_mbm_cntrs; i++) {
>> + if (!dom->cntr_cfg[i].rdtgrp)
>> + cntrs++;
>> + }
>> +
>> + seq_printf(s, "%d=%d", dom->hdr.id, cntrs);
>> + sep = true;
>> + }
>> + seq_puts(s, "\n");
>> +
>> +unlock_cntrs_show:
>> + mutex_unlock(&rdtgroup_mutex);
>> + cpus_read_unlock();
>> +
>> + return ret;
>> +}
>> +
>> #ifdef CONFIG_PROC_CPU_RESCTRL
>>
>> /*
>> @@ -1961,6 +2002,12 @@ static struct rftype res_common_files[] = {
>> .kf_ops = &rdtgroup_kf_single_ops,
>> .seq_show = rdtgroup_num_mbm_cntrs_show,
>> },
>> + {
>> + .name = "available_mbm_cntrs",
>> + .mode = 0444,
>> + .kf_ops = &rdtgroup_kf_single_ops,
>> + .seq_show = rdtgroup_available_mbm_cntrs_show,
>> + },
>> {
>> .name = "cpus_list",
>> .mode = 0644,
>
> Reinette
>
>
* Re: [PATCH v10 13/24] x86/resctrl: Introduce interface to display number of free counters
2024-12-19 22:50 ` Reinette Chatre
2024-12-20 18:05 ` Moger, Babu
@ 2024-12-20 18:32 ` Moger, Babu
1 sibling, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-20 18:32 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 4:50 PM, Reinette Chatre wrote:
> (andipan.das@amd.com -> sandipan.das@amd.com to stop sending undeliverable emails)
>
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> Provide the interface to display the number of monitoring counters
>> available for assignment in each domain when mbm_cntr_assign is supported.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v10: Patch changed to handle the counters at domain level.
>> https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
>> So, display logic also changed now.
>>
>> v9: New patch
>> ---
>> Documentation/arch/x86/resctrl.rst | 4 +++
>> arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 47 ++++++++++++++++++++++++++
>> 3 files changed, 52 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 43a861adeada..c075fcee96b7 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -302,6 +302,10 @@ with the following files:
>> memory bandwidth tracking to a single memory bandwidth event per
>> monitoring group.
>>
>> +"available_mbm_cntrs":
>> + The number of monitoring counters available for assignment in each
>> + domain when the architecture supports mbm_cntr_assign mode.
>
> "architecture supports" -> "system supports"
>
> It looks to me as though more than just support is required, the mode
> is also required to be enabled?
Yes. It needs to be enabled.
The number of monitoring counters available for assignment in each
domain when mbm_cntr_assign mode is enabled on the system.
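For anyone consuming this file from user space, a minimal parsing sketch follows. It assumes the "<domain id>=<free counters>" pairs separated by ";" that the show function in this patch prints; struct dom_cntrs and parse_available_mbm_cntrs() are hypothetical names, not part of resctrl or any library.

```c
#include <assert.h>
#include <stdlib.h>

/* One "<domain id>=<free counters>" pair (hypothetical helper type). */
struct dom_cntrs {
	int dom_id;
	int free_cntrs;
};

/*
 * Parse one line read from available_mbm_cntrs.
 * Returns the number of pairs stored in @out, or -1 if the line is
 * malformed or holds more than @max pairs.
 */
int parse_available_mbm_cntrs(const char *line, struct dom_cntrs *out,
			      int max)
{
	const char *p = line;
	char *end;
	int n = 0;

	assert(line && out);

	while (*p && *p != '\n') {
		long id, cnt;

		if (n == max)
			return -1;

		id = strtol(p, &end, 10);
		if (end == p || *end != '=')
			return -1;
		p = end + 1;

		cnt = strtol(p, &end, 10);
		if (end == p)
			return -1;

		out[n].dom_id = (int)id;
		out[n].free_cntrs = (int)cnt;
		n++;

		/* A pair is followed by ';', the newline, or end of string. */
		p = end;
		if (*p == ';')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}
	return n;
}
```

The domain IDs and counts in any test input are made-up values; real numbers depend on the system.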
>
>> +
>> "max_threshold_occupancy":
>> Read/write file provides the largest value (in
>> bytes) at which a previously used LLC_occupancy
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index b07d60fabf1c..f857af361af1 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1238,6 +1238,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>> cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>> r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>> resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
>> + resctrl_file_fflags_init("available_mbm_cntrs", RFTYPE_MON_INFO);
>> }
>> }
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 1ee008a63d8b..72518e0ec2ec 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -879,6 +879,47 @@ static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
>> return 0;
>> }
>>
>> +static int rdtgroup_available_mbm_cntrs_show(struct kernfs_open_file *of,
>> + struct seq_file *s, void *v)
>
> rdtgroup_
>
>> +{
>> + struct rdt_resource *r = of->kn->parent->priv;
>> + struct rdt_mon_domain *dom;
>> + bool sep = false;
>> + u32 cntrs, i;
>> + int ret = 0;
>> +
>> + cpus_read_lock();
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>> + ret = -EINVAL;
>> + goto unlock_cntrs_show;
>> + }
>> +
>> +
>
> unnecessary empty line
>
>> + list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>> + if (sep)
>> + seq_puts(s, ";");
>> +
>> + cntrs = 0;
>> + for (i = 0; i < r->mon.num_mbm_cntrs; i++) {
>> + if (!dom->cntr_cfg[i].rdtgrp)
>> + cntrs++;
>> + }
>> +
>> + seq_printf(s, "%d=%d", dom->hdr.id, cntrs);
>> + sep = true;
>> + }
>> + seq_puts(s, "\n");
>> +
>> +unlock_cntrs_show:
>> + mutex_unlock(&rdtgroup_mutex);
>> + cpus_read_unlock();
>> +
>> + return ret;
>> +}
>> +
>> #ifdef CONFIG_PROC_CPU_RESCTRL
>>
>> /*
>> @@ -1961,6 +2002,12 @@ static struct rftype res_common_files[] = {
>> .kf_ops = &rdtgroup_kf_single_ops,
>> .seq_show = rdtgroup_num_mbm_cntrs_show,
>> },
>> + {
>> + .name = "available_mbm_cntrs",
>> + .mode = 0444,
>> + .kf_ops = &rdtgroup_kf_single_ops,
>> + .seq_show = rdtgroup_available_mbm_cntrs_show,
>> + },
>> {
>> .name = "cpus_list",
>> .mode = 0644,
>
> Reinette
>
>
* Re: [PATCH v10 15/24] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2024-12-19 23:04 ` Reinette Chatre
@ 2024-12-20 19:22 ` Moger, Babu
2024-12-20 21:41 ` Reinette Chatre
0 siblings, 1 reply; 76+ messages in thread
From: Moger, Babu @ 2024-12-20 19:22 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 5:04 PM, Reinette Chatre wrote:
> (andipan.das@amd.com -> sandipan.das@amd.com to stop sending undeliverable emails)
Yes.
>
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>> assigned. The assigned RMID will be tracked by the hardware until the user
>> unassigns it manually.
>>
>> Configure the counters by writing to the L3_QOS_ABMC_CFG MSR and specifying
>> the counter ID, bandwidth source (RMID), and bandwidth event configuration.
>>
>> Provide the interface to assign the counter ids to RMID.
>
> Until now in this series many patches "introduced interface X" and every
> time it was some new resctrl file that user space interacts with. This
> changelog starts with a context about "user to assign a hardware counter"
> and ends with "Provide the interface", but there is no new user interface
> in this patch. Can this be more specific about what this patch does?
Yes. This should be about resctrl_arch_config_cntr(). How about this?
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as it
is assigned. The assigned RMID will be tracked by the hardware until the
user unassigns it manually.
Provide the architecture specific handler to to assign/unassign the
counter. Counters are configured by writing to the L3_QOS_ABMC_CFG MSR
and specifying the counter ID, bandwidth source (RMID), and bandwidth
event configuration.
>
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ....
>
>> ---
>> arch/x86/kernel/cpu/resctrl/internal.h | 3 ++
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 58 ++++++++++++++++++++++++++
>> 2 files changed, 61 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 35bcf0e5ba7e..849bcfe4ea5b 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -701,5 +701,8 @@ bool closid_allocated(unsigned int closid);
>> int resctrl_find_cleanest_closid(void);
>> void arch_mbm_evt_config_init(struct rdt_hw_mon_domain *hw_dom);
>> unsigned int mon_event_config_index_get(u32 evtid);
>> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> + u32 cntr_id, bool assign);
>>
>> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 72518e0ec2ec..e895d2415f22 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1686,6 +1686,34 @@ unsigned int mon_event_config_index_get(u32 evtid)
>> }
>> }
>>
>> +struct cntr_config {
>> + struct rdt_resource *r;
>> + struct rdt_mon_domain *d;
>> + enum resctrl_event_id evtid;
>> + u32 rmid;
>> + u32 closid;
>> + u32 cntr_id;
>> + u32 val;
>> + bool assign;
>> +};
>
> I think I am missing something because it is not clear to me why this
> new struct is needed. Why not just use union l3_qos_abmc_cfg?
The new struct is needed because we need to call resctrl_arch_reset_rmid()
inside the IPI. It requires all these parameters:
void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 closid, u32 rmid, enum resctrl_event_id eventid);
>
> If it is indeed needed it needs better formatting and clear descriptions,
> a member like "val" is very generic.
Sure. Will change it.
>
>> +
>> +static void resctrl_abmc_config_one_amd(void *info)
>> +{
>> + struct cntr_config *config = info;
>> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
>> +
>
> reverse fir
Sure.
>
>> + abmc_cfg.split.cfg_en = 1;
>> + abmc_cfg.split.cntr_en = config->assign ? 1 : 0;
>> + abmc_cfg.split.cntr_id = config->cntr_id;
>> + abmc_cfg.split.bw_src = config->rmid;
>> + abmc_cfg.split.bw_type = config->val;
>> +
>> + wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg.full);
>> +
>> + resctrl_arch_reset_rmid(config->r, config->d, config->closid,
>> + config->rmid, config->evtid);
>> +}
>> +
>> static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
>> {
>> struct rdt_mon_domain *dom;
>> @@ -1869,6 +1897,36 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>> return ret ?: nbytes;
>> }
>>
>> +/*
>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>> + */
>> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> + u32 cntr_id, bool assign)
>> +{
>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>> + struct cntr_config config = { 0 };
>
> Please see 29eaa7958367 ("x86/resctrl: Slightly clean-up mbm_config_show()")
This may not apply here.
x86/resctrl: Slightly clean-up mbm_config_show()
'mon_info' is already zeroed in the list_for_each_entry() loop below.
There is no need to explicitly initialize it here. It just wastes some
space and cycles.
In our case we are not doing memset again.
Thanks
Babu
>
>> +
>> + config.r = r;
>> + config.d = d;
>> + config.evtid = evtid;
>> + config.rmid = rmid;
>> + config.closid = closid;
>> + config.cntr_id = cntr_id;
>> +
>> + /* Update the event configuration from the domain */
>> + if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID)
>> + config.val = hw_dom->mbm_total_cfg;
>> + else
>> + config.val = hw_dom->mbm_local_cfg;
>> +
>> + config.assign = assign;
>> +
>> + smp_call_function_any(&d->hdr.cpu_mask, resctrl_abmc_config_one_amd, &config, 1);
>> +
>> + return 0;
>> +}
>> +
>> /* rdtgroup information files for one cache resource. */
>> static struct rftype res_common_files[] = {
>> {
>
> Reinette
>
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 16/24] x86/resctrl: Add interface to the assign counter
2024-12-19 23:22 ` Reinette Chatre
@ 2024-12-20 20:34 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-20 20:34 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 5:22 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> The mbm_cntr_assign mode offers several counters that can be assigned
>> to an RMID, event pair and monitor the bandwidth as long as it is
>> assigned.
>>
>> Counters are managed at the domain level. Introduce the interface to
>> allocate/free/assign the counters.
>
> Changelog of previous patch also claimed to "Provide the interface to assign the
> counter ids to RMID." Please let changelogs describe the change more accurately.
>
> (This still does not provide a user interface so what is meant by interface is
> unclear)
How about this?
The mbm_cntr_assign mode offers several counters that can be assigned
to an RMID, event pair and monitor the bandwidth as long as it is
assigned.
Counters are managed at the domain level. Introduce the functionality to
allocate/free/assign the counters.
If the user requests assignments across all domains, assignments will
abort on first failure. The error will be logged in
/sys/fs/resctrl/info/last_cmd_status.
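For illustration only, the "abort on first failure" behaviour described
above can be sketched in user space as below. This is not code from the
series: struct domain, assign_in_domain(), assign_event(), NUM_DOMAINS and
the exact message text are hypothetical stand-ins, and the sketch assumes
that domains handled before the failing one keep their assignment (no
rollback), which the changelog text does not state either way.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define NUM_DOMAINS 3	/* assumed; real systems enumerate domains at runtime */

struct domain {		/* hypothetical stand-in for struct rdt_mon_domain */
	int id;
	int free_cntrs;
};

/* Stand-in for the text reported through info/last_cmd_status. */
static char last_cmd_status[64];

/* Take one counter in @d, or record an error if none is left. */
static int assign_in_domain(struct domain *d)
{
	if (d->free_cntrs == 0) {
		snprintf(last_cmd_status, sizeof(last_cmd_status),
			 "Out of MBM assignable counters in domain %d\n", d->id);
		return -ENOSPC;
	}
	d->free_cntrs--;
	return 0;
}

/* @target == NULL means "all domains"; stop at the first failure. */
static int assign_event(struct domain *doms, struct domain *target)
{
	int ret;

	if (target)
		return assign_in_domain(target);

	for (int i = 0; i < NUM_DOMAINS; i++) {
		ret = assign_in_domain(&doms[i]);
		if (ret)
			return ret;	/* later domains are not attempted */
	}
	return 0;
}
```

With three domains where only the second has no free counter, a request for
all domains returns -ENOSPC, the first domain has consumed a counter and the
third is left untouched.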
>
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 849bcfe4ea5b..70d2577fc377 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -704,5 +704,8 @@ unsigned int mon_event_config_index_get(u32 evtid);
>> int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> u32 cntr_id, bool assign);
>> -
>> +int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> + struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>
> Could you please be consistent in the ordering of parameters?
>
> int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
Sure.
>
>> +struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
>> + u32 rmid, enum resctrl_event_id evtid);
>> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index f857af361af1..8823cd97ff1f 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -575,8 +575,8 @@ void free_rmid(u32 closid, u32 rmid)
>> list_add_tail(&entry->list, &rmid_free_lru);
>> }
>>
>> -static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
>> - u32 rmid, enum resctrl_event_id evtid)
>> +struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
>> + u32 rmid, enum resctrl_event_id evtid)
>> {
>> u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index e895d2415f22..1c8694a68cf4 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1927,6 +1927,116 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> return 0;
>> }
>>
>> +/*
>> + * Configure the counter for the event, RMID pair for the domain.
>
> This description can be more helpful ... it essentially just re-writes function
> header.
I can drop this. There is not much to explain here. Code seems easy to
follow.
>
>> + */
>> +static int resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>> + u32 cntr_id, bool assign)
>> +{
>> + struct mbm_state *m;
>> + int ret;
>> +
>> + ret = resctrl_arch_config_cntr(r, d, evtid, rmid, closid, cntr_id, assign);
>> + if (ret)
>> + return ret;
>> +
>> + m = get_mbm_state(d, closid, rmid, evtid);
>> + if (m)
>> + memset(m, 0, sizeof(struct mbm_state));
>> +
>> + return ret;
>> +}
>> +
>> +static bool mbm_cntr_assigned(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> + int cntr_id;
>> +
>> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
>> + if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
>> + d->cntr_cfg[cntr_id].evtid == evtid)
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>> +
>> +static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> + int cntr_id;
>> +
>> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
>> + if (!d->cntr_cfg[cntr_id].rdtgrp) {
>> + d->cntr_cfg[cntr_id].rdtgrp = rdtgrp;
>> + d->cntr_cfg[cntr_id].evtid = evtid;
>> + return cntr_id;
>> + }
>> + }
>> +
>> + return -EINVAL;
>
> This can be -ENOSPC
Sure.
>
>> +}
>> +
>> +static void mbm_cntr_free(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> + int cntr_id;
>> +
>> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
>> + if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
>> + d->cntr_cfg[cntr_id].evtid == evtid)
>> + memset(&d->cntr_cfg[cntr_id], 0, sizeof(struct mbm_cntr_cfg));
>> + }
>> +}
>
> From what I can tell the counter ID is always available when the counter is freed so
> it can just be freed directly without looping over array?
Yes. With immediate return on every individual failure, we can do this
easily.
>
>> +
>> +/*
>> + * Assign a hardware counter to event @evtid of group @rdtgrp.
>> + * Counter will be assigned to all the domains if rdt_mon_domain is NULL
>
> (to be consistent) "if rdt_mon_domain is NULL" -> "if @d is NULL"
Sure.
>
>> + * else the counter will be assigned to specific domain.
>
> "will be assigned to specific domain" -> "will be assigned to @d"
Sure.
>
> Reinette
>
>
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 12/24] x86/resctrl: Introduce cntr_cfg to track assignable counters at domain
2024-12-20 17:33 ` Moger, Babu
@ 2024-12-20 20:58 ` Reinette Chatre
0 siblings, 0 replies; 76+ messages in thread
From: Reinette Chatre @ 2024-12-20 20:58 UTC (permalink / raw)
To: Moger, Babu, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/20/24 9:33 AM, Moger, Babu wrote:
> On 12/19/2024 4:33 PM, Reinette Chatre wrote:
>> On 12/12/24 12:15 PM, Babu Moger wrote:
>>> In mbm_assign_mode, the MBM counters are assigned/unassigned to an RMID,
>>> event pair in a resctrl group and monitor the bandwidth as long as it is
>>> assigned. Counters are assigned/unassigned at domain level and needs to
>>> be tracked at domain level.
>>>
>>> Add the mbm_assign_cntr_cfg data structure to struct rdt_ctrl_domain to
>>
>> "mbm_assign_cntr_cfg" -> "mbm_cntr_cfg"
>
> Sure.
>
>>
>>> manage and track MBM counter assignments at the domain level.
>>
>> This can really use some more information about this data structure. I think
>> it will be helpful to provide more information about how the data structure
>> looks ... for example, that it is an array indexed by counter ID where the
>> assignment details of each counter is stored. I also think it will be helpful
>> to describe how interactions with this data structure works, that a NULL
>> rdtgrp means that the counter is free and that it is not possible to find
>> a counter from a resource group and arrays need to be searched instead and doing
>> so is ok for $REASON (when considering the number of RMID and domain combinations
>> possible on AMD). A lot is left for the reader to figure out.
>
> How about this?
>
>
> In mbm_assign_mode, the MBM counters are assigned/unassigned to an RMID,
> event pair in a resctrl group and monitor the bandwidth as long as it is
> assigned. Counters are assigned/unassigned at domain level and needs to
> be tracked at domain level.
>
> Add the mbm_cntr_cfg data structure to struct rdt_ctrl_domain to
> manage and track MBM counter assignments at the domain level.
>
> Each domain will contain num_mbm_cntrs entries, indexed by cntr_id. During initialization, all entries will be set to zero. When a counter is allocated, its corresponding entry will be populated with the assigned struct rdtgroup and enum resctrl_event_id. When the counter is released, its entry will be reset to zero.
It will be better if you take a step back and create a coherent changelog
instead of appending independent text snippets. What you present has the
same mistake as before (mbm_assign_mode vs mbm_cntr_assign mode) and does
not address all the points raised.
Consider something like below (please check, improve, and complete):
In mbm_cntr_assign mode hardware counters are assigned/unassigned
to an MBM event of a monitor group. Hardware counters are
assigned/unassigned at monitoring domain level.
Manage a monitoring domain's hardware counters using a per monitoring
domain array of struct mbm_cntr_cfg that is indexed by the hardware
counter ID. A hardware counter's configuration contains the MBM event
ID and points to the monitoring group that it is assigned to, with a
NULL pointer meaning that the hardware counter is available for assignment.
There is no direct way to determine which hardware counters are assigned
to a particular monitoring group. Check every entry of every hardware
counter configuration array in every monitoring domain to query which
MBM events of a monitoring group are tracked by hardware. Such queries
are acceptable because <insert reason here>.
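To make the data structure described above concrete, here is a minimal
user-space sketch (an illustration only, not the kernel code). The names
struct group, struct cntr_cfg, struct domain, cntr_alloc(),
cntrs_available() and the value of NUM_MBM_CNTRS are hypothetical stand-ins
for struct rdtgroup, struct mbm_cntr_cfg, struct rdt_mon_domain and the
helpers in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_MBM_CNTRS 4	/* assumed; the real count is reported by hardware */

struct group { int rmid; };	/* stand-in for struct rdtgroup */

struct cntr_cfg {		/* stand-in for struct mbm_cntr_cfg */
	int evtid;
	struct group *grp;	/* NULL means the counter is free */
};

struct domain {			/* stand-in for struct rdt_mon_domain */
	struct cntr_cfg cntr_cfg[NUM_MBM_CNTRS];	/* indexed by counter ID */
};

/* Claim the first free counter; return its ID or -ENOSPC if none is left. */
static int cntr_alloc(struct domain *d, struct group *grp, int evtid)
{
	assert(grp != NULL);	/* a NULL owner would mark the entry as free */

	for (int id = 0; id < NUM_MBM_CNTRS; id++) {
		if (!d->cntr_cfg[id].grp) {
			d->cntr_cfg[id].grp = grp;
			d->cntr_cfg[id].evtid = evtid;
			return id;
		}
	}
	return -ENOSPC;
}

/* Count the free entries, the per-domain value available_mbm_cntrs reports. */
static int cntrs_available(const struct domain *d)
{
	int n = 0;

	for (int id = 0; id < NUM_MBM_CNTRS; id++) {
		if (!d->cntr_cfg[id].grp)
			n++;
	}
	return n;
}
```

Both helpers walk the whole array, which is the linear search the changelog
needs to justify; the array is small because it is bounded by the number of
hardware counters.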
Please work on creating good changelogs. The requirements should be clear to you.
Reinette
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 17/24] x86/resctrl: Add the interface to unassign a counter
2024-12-19 23:32 ` Reinette Chatre
@ 2024-12-20 21:38 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-20 21:38 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 5:32 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> The mbm_cntr_assign mode provides a limited number of hardware counters
>> that can be assigned to an RMID, event pair to monitor bandwidth while
>> assigned. If all counters are in use, the kernel will show an error
>> message: "Out of MBM assignable counters" when a new assignment is
>> requested. To make space for a new assignment, users must unassign an
>> already assigned counter.
>>
>> Introduce an interface that allows for the unassignment of counter IDs
>> from the domain.
>
> Subject and changelog claims this introduces an interface, there is no new
> resctrl interface introduced here. Can this be more specific?
Sure. Let me rewrite the subject and description.
>
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> ---
>> arch/x86/kernel/cpu/resctrl/internal.h | 2 +
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++++++++++++++++++++++
>> 2 files changed, 54 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 70d2577fc377..f858098dbe4b 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -706,6 +706,8 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> u32 cntr_id, bool assign);
>> int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>> +int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> + struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>
> (please use consistent parameter ordering)
Sure.
>
>> struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
>> u32 rmid, enum resctrl_event_id evtid);
>> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 1c8694a68cf4..a71a8389b649 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1990,6 +1990,20 @@ static void mbm_cntr_free(struct rdt_resource *r, struct rdt_mon_domain *d,
>> }
>> }
>>
>> +static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
>> + struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
>> +{
>> + int cntr_id;
>> +
>> + for (cntr_id = 0; cntr_id < r->mon.num_mbm_cntrs; cntr_id++) {
>> + if (d->cntr_cfg[cntr_id].rdtgrp == rdtgrp &&
>> + d->cntr_cfg[cntr_id].evtid == evtid)
>> + return cntr_id;
>> + }
>> +
>> + return -EINVAL;
>
> This could be -ENOENT?
Sure.
>
>> +}
>
> mbm_cntr_get() seems to be essentially a duplicate of mbm_cntr_assigned() that returns
> actual counter ID instead of true/false. Could only one be used?
Yes. We can use mbm_cntr_get() alone.
>
>> +
>> /*
>> * Assign a hardware counter to event @evtid of group @rdtgrp.
>> * Counter will be assigned to all the domains if rdt_mon_domain is NULL
>> @@ -2037,6 +2051,44 @@ int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> return ret;
>> }
>>
>> +/*
>> + * Unassign a hardware counter associated with @evtid from the domain and
>> + * the group. Unassign the counters from all the domains if rdt_mon_domain
>> + * is NULL else unassign from the specific domain.
>
> (same comment as previous patch about consistency in referring to function
> parameters)
>
Sure.
>> + */
>> +int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> + struct rdt_mon_domain *d, enum resctrl_event_id evtid)
>> +{
>> + int cntr_id, ret = 0;
>> +
>> + if (!d) {
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + if (!mbm_cntr_assigned(r, d, rdtgrp, evtid))
>> + continue;
>> +
>> + cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
>> +
>
> It seems unnecessary to loop over array twice here. mbm_cntr_assigned() seems
> unnecessary. Return value of mbm_cntr_get() can be used to determine if it
> is assigned or not?
Yes. Sure.
>
>> + ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>> + rdtgrp->closid, cntr_id, false);
>> + if (!ret)
>> + mbm_cntr_free(r, d, rdtgrp, evtid);
>
> ... and by providing cntr_id to mbm_cntr_free() another unnecessary loop can be avoided.
Sure.
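A rough user-space sketch of what the three review comments above add up to
(one lookup, its return value used as the "is it assigned" test, and the
free done by counter ID) could look like this. It is an illustration under
assumptions, not the reworked patch: struct group, struct domain,
cntr_get(), cntr_free(), config_cntr() and unassign_one() are hypothetical
names, and config_cntr() is a stub that always succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define NUM_MBM_CNTRS 4	/* assumed counter count */

struct group { int rmid; };
struct cntr_cfg { int evtid; struct group *grp; };
struct domain { struct cntr_cfg cntr_cfg[NUM_MBM_CNTRS]; };

/* Return the counter ID assigned to (grp, evtid), or -ENOENT if none. */
static int cntr_get(const struct domain *d, const struct group *grp, int evtid)
{
	for (int id = 0; id < NUM_MBM_CNTRS; id++) {
		if (d->cntr_cfg[id].grp == grp &&
		    d->cntr_cfg[id].evtid == evtid)
			return id;
	}
	return -ENOENT;
}

/* Free by ID: no search is needed once the caller knows the counter ID. */
static void cntr_free(struct domain *d, int id)
{
	assert(id >= 0 && id < NUM_MBM_CNTRS);
	memset(&d->cntr_cfg[id], 0, sizeof(d->cntr_cfg[id]));
}

/* Hypothetical stub for the hardware configuration step. */
static int config_cntr(struct domain *d, int id, int assign)
{
	(void)d;
	(void)id;
	(void)assign;
	return 0;
}

/* A single lookup answers both "is it assigned?" and "which counter?". */
static int unassign_one(struct domain *d, struct group *grp, int evtid)
{
	int id = cntr_get(d, grp, evtid);
	int ret;

	if (id < 0)
		return 0;	/* nothing assigned; treated as success, as in the posted patch */

	ret = config_cntr(d, id, 0);
	if (!ret)
		cntr_free(d, id);
	return ret;
}
```

The array is walked once per unassign instead of three times
(mbm_cntr_assigned(), mbm_cntr_get() and mbm_cntr_free() in the posted
version).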
>
>> + }
>> + } else {
>> + if (!mbm_cntr_assigned(r, d, rdtgrp, evtid))
>> + goto out_done_unassign;
>> +
>> + cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
>> +
>> + ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
>> + rdtgrp->closid, cntr_id, false);
>> + if (!ret)
>> + mbm_cntr_free(r, d, rdtgrp, evtid);
>> + }
>> +
>> +out_done_unassign:
>> + return ret;
>> +}
>> +
>> /* rdtgroup information files for one cache resource. */
>> static struct rftype res_common_files[] = {
>> {
>
> Reinette
>
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 15/24] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2024-12-20 19:22 ` Moger, Babu
@ 2024-12-20 21:41 ` Reinette Chatre
2024-12-20 22:28 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-20 21:41 UTC (permalink / raw)
To: Moger, Babu, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/20/24 11:22 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 12/19/2024 5:04 PM, Reinette Chatre wrote:
>> (andipan.das@amd.com -> sandipan.das@amd.com to stop sending undeliverable emails)
>
> Yes.
>
>>
>> Hi Babu,
>>
>> On 12/12/24 12:15 PM, Babu Moger wrote:
>>> The ABMC feature provides an option to the user to assign a hardware
>>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>>> assigned. The assigned RMID will be tracked by the hardware until the user
>>> unassigns it manually.
>>>
>>> Configure the counters by writing to the L3_QOS_ABMC_CFG MSR and specifying
>>> the counter ID, bandwidth source (RMID), and bandwidth event configuration.
>>>
>>> Provide the interface to assign the counter ids to RMID.
>>
>> Until now in this series many patches "introduced interface X" and every
>> time it was some new resctrl file that user space interacts with. This
>> changelog starts with a context about "user to assign a hardware counter"
>> and ends with "Provide the interface", but there is no new user interface
>> in this patch. Can this be more specific about what this patch does?
>
> Yes. This should be about resctrl_arch_config_cntr(). How about this?
>
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is assigned. The assigned RMID will be tracked by the hardware until the user unassigns it manually.
>
> Provide the architecture specific handler to to assign/unassign the counter. Counters are configured by writing to the L3_QOS_ABMC_CFG MSR and specifying the counter ID, bandwidth source (RMID), and bandwidth event configuration.
Again just one sentence appended. The "to to" demonstrates it is another
example of something typed quickly to see if it sticks.
>>> @@ -1686,6 +1686,34 @@ unsigned int mon_event_config_index_get(u32 evtid)
>>> }
>>> }
>>> +struct cntr_config {
>>> + struct rdt_resource *r;
>>> + struct rdt_mon_domain *d;
>>> + enum resctrl_event_id evtid;
>>> + u32 rmid;
>>> + u32 closid;
>>> + u32 cntr_id;
>>> + u32 val;
>>> + bool assign;
>>> +};
>>
>> I think I am missing something because it is not clear to me why this
>> new struct is needed. Why not just use union l3_qos_abmc_cfg?
>
> New struct is needed because we need to call resctrl_arch_reset_rmid() inside IPI. It requires all these parameters.
Could you please answer my question?
>>> @@ -1869,6 +1897,36 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>>> return ret ?: nbytes;
>>> }
>>> +/*
>>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>>> + */
>>> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>>> + u32 cntr_id, bool assign)
>>> +{
>>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>>> + struct cntr_config config = { 0 };
>>
>> Please see 29eaa7958367 ("x86/resctrl: Slightly clean-up mbm_config_show()")
>
> This may not apply here.
>
> x86/resctrl: Slightly clean-up mbm_config_show()
>
> 'mon_info' is already zeroed in the list_for_each_entry() loop below. There is no need to explicitly initialize it here. It just wastes some space and cycles.
>
> In our case we are not doing memset again.
No, but every member is explicitly initialized instead. It may be needed if
union l3_qos_abmc_cfg is used as I asked about earlier where it will be important
to ensure reserved bits are initialized.
Reinette
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 15/24] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2024-12-20 21:41 ` Reinette Chatre
@ 2024-12-20 22:28 ` Moger, Babu
2024-12-20 23:47 ` Reinette Chatre
0 siblings, 1 reply; 76+ messages in thread
From: Moger, Babu @ 2024-12-20 22:28 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/20/2024 3:41 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/20/24 11:22 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 12/19/2024 5:04 PM, Reinette Chatre wrote:
>>> (andipan.das@amd.com -> sandipan.das@amd.com to stop sending undeliverable emails)
>>
>> Yes.
>>
>>>
>>> Hi Babu,
>>>
>>> On 12/12/24 12:15 PM, Babu Moger wrote:
>>>> The ABMC feature provides an option to the user to assign a hardware
>>>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>>>> assigned. The assigned RMID will be tracked by the hardware until the user
>>>> unassigns it manually.
>>>>
>>>> Configure the counters by writing to the L3_QOS_ABMC_CFG MSR and specifying
>>>> the counter ID, bandwidth source (RMID), and bandwidth event configuration.
>>>>
>>>> Provide the interface to assign the counter ids to RMID.
>>>
>>> Until now in this series many patches "introduced interface X" and every
>>> time it was some new resctrl file that user space interacts with. This
>>> changelog starts with a context about "user to assign a hardware counter"
>>> and ends with "Provide the interface", but there is no new user interface
>>> in this patch. Can this be more specific about what this patch does?
>>
>> Yes. This should be about resctrl_arch_config_cntr(). How about this?
>>
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it is assigned. The assigned RMID will be tracked by the hardware until the user unassigns it manually.
>>
>> Provide the architecture specific handler to to assign/unassign the counter. Counters are configured by writing to the L3_QOS_ABMC_CFG MSR and specifying the counter ID, bandwidth source (RMID), and bandwidth event configuration.
>
> Again just one sentence appended. The "to to" demonstrates it is another
> example of something typed quickly to see if it sticks.
My bad. Will rewrite this.
>
>
>>>> @@ -1686,6 +1686,34 @@ unsigned int mon_event_config_index_get(u32 evtid)
>>>> }
>>>> }
>>>> +struct cntr_config {
>>>> + struct rdt_resource *r;
>>>> + struct rdt_mon_domain *d;
>>>> + enum resctrl_event_id evtid;
>>>> + u32 rmid;
>>>> + u32 closid;
>>>> + u32 cntr_id;
>>>> + u32 val;
>>>> + bool assign;
>>>> +};
>>>
>>> I think I am missing something because it is not clear to me why this
>>> new struct is needed. Why not just use union l3_qos_abmc_cfg?
>>
>> New struct is needed because we need to call resctrl_arch_reset_rmid() inside IPI. It requires all these parameters.
>
> Could you please answer my question?
Maybe I did not understand your question here.
We need to do a couple of things in the IPI.
1. Configure the counter. This requires the cntr_id, rmid, event config
value and the assign (or unassign) flag. These populate l3_qos_abmc_cfg,
which is then written to the MSR.
2. Reset the RMID. This requires rdt_resource, rdt_mon_domain, RMID, CLOSID
and event.
So, I packed all of these into a new structure and passed it to the IPI
handler so that both actions can be done in the IPI.
Can this be simplified?
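The pattern described above (bundle everything into one structure because
the cross-CPU callback receives a single void pointer) can be mocked in user
space as follows. This is only a sketch: call_on_domain() is a hypothetical
stand-in for smp_call_function_any() that simply runs the callback locally,
struct mock_hw replaces the MSR write and the RMID reset, and domain_id
replaces the rdt_resource and rdt_mon_domain pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Everything the handler needs, bundled because the callback takes one void *. */
struct cntr_config {
	int domain_id;		/* stand-in for the resource/domain pointers */
	int evtid;
	uint32_t rmid;
	uint32_t closid;
	uint32_t cntr_id;
	uint32_t evt_cfg;	/* the member named "val" in the posted patch */
	bool assign;
};

/* Records what the mock handler did so that the effect can be inspected. */
struct mock_hw {
	uint32_t last_cntr_id;
	uint32_t last_rmid;
	bool cntr_enabled;
	int resets;
};

static struct mock_hw hw;

/* Step 1: program the counter. Step 2: reset the state kept for the RMID. */
static void config_one(void *info)
{
	struct cntr_config *c = info;

	hw.last_cntr_id = c->cntr_id;	/* stands in for the MSR write */
	hw.last_rmid = c->rmid;
	hw.cntr_enabled = c->assign;
	hw.resets++;			/* stands in for resctrl_arch_reset_rmid() */
}

/* Mock of the cross-CPU call: runs the callback on the calling CPU. */
static void call_on_domain(void (*fn)(void *), void *info)
{
	fn(info);
}

static int config_cntr(int domain_id, int evtid, uint32_t rmid, uint32_t closid,
		       uint32_t cntr_id, uint32_t evt_cfg, bool assign)
{
	struct cntr_config c = {
		.domain_id = domain_id, .evtid = evtid, .rmid = rmid,
		.closid = closid, .cntr_id = cntr_id, .evt_cfg = evt_cfg,
		.assign = assign,
	};

	call_on_domain(config_one, &c);
	return 0;
}
```

The structure lives on the caller's stack, which is only safe because the
call waits for the handler to finish (the last argument of
smp_call_function_any() is 1 in the posted patch).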
>
>>>> @@ -1869,6 +1897,36 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>>>> return ret ?: nbytes;
>>>> }
>>>> +/*
>>>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>>>> + */
>>>> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>>>> + u32 cntr_id, bool assign)
>>>> +{
>>>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>>>> + struct cntr_config config = { 0 };
>>>
>>> Please see 29eaa7958367 ("x86/resctrl: Slightly clean-up mbm_config_show()")
>>
>> This may not apply here.
>>
>> x86/resctrl: Slightly clean-up mbm_config_show()
>>
>> 'mon_info' is already zeroed in the list_for_each_entry() loop below. There is no need to explicitly initialize it here. It just wastes some space and cycles.
>>
>> In our case we are not doing memset again.
>
> No, but every member is explicitly initialized instead. It may be needed if
> union l3_qos_abmc_cfg is used as I asked about earlier where it will be important
> to ensure reserved bits are initialized.
I missed your comment on reserved bits (I searched this series). The general
rule is that reserved bits should be written as zeros.
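As a sketch of why the zero initializer matters for a register image, the
example below builds a 64-bit value from a bitfield union. The field layout
is an assumption made for illustration (the real widths and positions have
to come from the APM), and union abmc_cfg and abmc_cfg_build() are
hypothetical names rather than the definitions in this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout for illustration only; check the APM for the real one.
 * The fields add up to 64 bits so that "split" and "full" overlap exactly.
 */
union abmc_cfg {
	struct {
		uint64_t bw_type   : 32;
		uint64_t bw_src    : 12;
		uint64_t reserved1 : 4;
		uint64_t cntr_id   : 5;
		uint64_t reserved2 : 9;
		uint64_t cntr_en   : 1;
		uint64_t cfg_en    : 1;
	} split;
	uint64_t full;
};

/* Build the value to write; "= { 0 }" leaves the reserved fields at zero. */
static uint64_t abmc_cfg_build(uint32_t cntr_id, uint32_t rmid,
			       uint32_t evt_cfg, int assign)
{
	union abmc_cfg cfg = { 0 };

	cfg.split.cfg_en = 1;
	cfg.split.cntr_en = assign ? 1 : 0;
	cfg.split.cntr_id = cntr_id;
	cfg.split.bw_src = rmid;
	cfg.split.bw_type = evt_cfg;

	return cfg.full;
}
```

Without the initializer the reserved fields would hold whatever was on the
stack, because only the named fields are assigned afterwards.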
Thanks
Babu
>
> Reinette
>
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 15/24] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2024-12-20 22:28 ` Moger, Babu
@ 2024-12-20 23:47 ` Reinette Chatre
2024-12-21 13:40 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-20 23:47 UTC (permalink / raw)
To: Moger, Babu, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/20/24 2:28 PM, Moger, Babu wrote:
> On 12/20/2024 3:41 PM, Reinette Chatre wrote:
>> On 12/20/24 11:22 AM, Moger, Babu wrote:
>>> On 12/19/2024 5:04 PM, Reinette Chatre wrote:
>>>>> @@ -1686,6 +1686,34 @@ unsigned int mon_event_config_index_get(u32 evtid)
>>>>> }
>>>>> }
>>>>> +struct cntr_config {
>>>>> + struct rdt_resource *r;
>>>>> + struct rdt_mon_domain *d;
>>>>> + enum resctrl_event_id evtid;
>>>>> + u32 rmid;
>>>>> + u32 closid;
>>>>> + u32 cntr_id;
>>>>> + u32 val;
>>>>> + bool assign;
>>>>> +};
>>>>
>>>> I think I am missing something because it is not clear to me why this
>>>> new struct is needed. Why not just use union l3_qos_abmc_cfg?
>>>
>>> New struct is needed because we need to call resctrl_arch_reset_rmid() inside IPI. It requires all these parameters.
>>
>> Could you please answer my question?
>
> May be I did not understand your question here.
>
> We need to do couple of things here in the IPI.
>
> 1. Configure the counter. This requires the cntr_id, rmid, event config value and assign(or unassign). This is to populate l3_qos_abmc_cfg and write the MSR.
>
> 2. Reset RMID. This requires rdt_resource, rdt_mon_domain, RMID, CLOSID and event.
>
> So, I packed all these in a new structure and sent to IPI handler so that both these actions can be done in IPI.
>
> Can this be simplified?
This is all architecture specific code so I think l3_qos_abmc_cfg can be
initialized once and then passed around. Bouncing the individual members of
l3_qos_abmc_cfg through struct cntr_config seems unnecessary to me. More specifically,
would it not make things simpler to make l3_qos_abmc_cfg a member of cntr_config?
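
One possible reading of this suggestion, as a user-space sketch rather than
the eventual patch: the union is embedded in cntr_config and filled in once,
so cntr_id, val and assign no longer travel as separate members. The bit-field
layout of union l3_qos_abmc_cfg below is an assumption (only .split.bw_type
and .full appear in this thread), and make_cntr_config() and the enum values
are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opaque stand-ins for the kernel types; only pointers are needed here. */
struct rdt_resource;
struct rdt_mon_domain;

/* Stand-in for the kernel enum; the values are illustrative only. */
enum resctrl_event_id { EVT_MBM_TOTAL = 2, EVT_MBM_LOCAL = 3 };

/* ASSUMED layout: only .split.bw_type and .full are visible in the thread. */
union l3_qos_abmc_cfg {
	struct {
		uint64_t bw_type   : 32,
			 bw_src    : 12,
			 reserved  : 3,
			 is_clos   : 1,
			 cntr_id   : 5,
			 reserved1 : 9,
			 cfg_en    : 1,
			 cntr_en   : 1;
	} split;
	uint64_t full;
};
static_assert(sizeof(union l3_qos_abmc_cfg) == sizeof(uint64_t),
	      "the configuration must fit in one 64-bit MSR value");

/* The IPI payload: reset parameters plus the ready-made MSR value. */
struct cntr_config {
	struct rdt_resource	*r;
	struct rdt_mon_domain	*d;
	enum resctrl_event_id	evtid;
	uint32_t		rmid;
	uint32_t		closid;
	union l3_qos_abmc_cfg	abmc_cfg;
};

/* Hypothetical helper: build the payload once, at the definition. */
static struct cntr_config make_cntr_config(struct rdt_resource *r,
					   struct rdt_mon_domain *d,
					   enum resctrl_event_id evtid,
					   uint32_t rmid, uint32_t closid,
					   uint32_t cntr_id, uint32_t val,
					   bool assign)
{
	struct cntr_config config = {
		.r	= r,
		.d	= d,
		.evtid	= evtid,
		.rmid	= rmid,
		.closid	= closid,
		/* Bit-fields not named here, including reserved bits, start at 0. */
		.abmc_cfg.split = {
			.cfg_en	 = 1,
			.cntr_en = assign,
			.cntr_id = cntr_id,
			.bw_src	 = rmid,
			.bw_type = val,
		},
	};

	return config;
}
```

With this shape the IPI handler would only need to write config->abmc_cfg.full
to the MSR and then use the remaining members for the RMID reset.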
>>>>> @@ -1869,6 +1897,36 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>>>>> return ret ?: nbytes;
>>>>> }
>>>>> +/*
>>>>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>>>>> + */
>>>>> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>>>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>>>>> + u32 cntr_id, bool assign)
>>>>> +{
>>>>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>>>>> + struct cntr_config config = { 0 };
>>>>
>>>> Please see 29eaa7958367 ("x86/resctrl: Slightly clean-up mbm_config_show()")
>>>
>>> This may not apply here.
>>>
>>> x86/resctrl: Slightly clean-up mbm_config_show()
>>>
>>> 'mon_info' is already zeroed in the list_for_each_entry() loop below. There is no need to explicitly initialize it here. It just wastes some space and cycles.
>>>
>>> In our case we are not doing memset again.
>>
>> No, but every member is explicitly initialized instead. It may be needed if
>> union l3_qos_abmc_cfg is used as I asked about earlier where it will be important
>> to ensure reserve bits are initialized.
>
> I missed your comment on reserve bits(Searched in this series). General rule is reserve bits should be written as zeros.
I do not think I am being clear.
Back to original comment: resctrl_arch_config_cntr() zeroes the entire struct and then
initializes every member. I do not think it is necessary to zero the struct if
every member is initialized. If you want to be explicit about the zero initialization
you can do so while initializing the struct only once where it is defined.
See for example, rdtgroup_kn_set_ugid()
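
A minimal user-space sketch of initializing the struct only once where it is
defined. It relies on the C rule that members left out of a designated
initializer are zero-initialized, so no separate "= { 0 }" followed by
member-by-member assignment is needed. struct cntr_config_demo and
demo_config() are hypothetical stand-ins, reduced to the scalar members quoted
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scalar members of struct cntr_config. */
struct cntr_config_demo {
	uint32_t rmid;
	uint32_t closid;
	uint32_t cntr_id;
	uint32_t val;
	bool	 assign;
};

/* Initialize once, at the definition; members not named are zeroed. */
static struct cntr_config_demo demo_config(uint32_t rmid, uint32_t closid,
					   uint32_t cntr_id, bool assign)
{
	struct cntr_config_demo config = {
		.rmid	 = rmid,
		.closid	 = closid,
		.cntr_id = cntr_id,
		.assign	 = assign,
		/* .val is deliberately omitted here. */
	};

	/* Guaranteed by the C initialization rules, checked for illustration. */
	assert(config.val == 0);

	return config;
}
```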
Reinette
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 15/24] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
2024-12-20 23:47 ` Reinette Chatre
@ 2024-12-21 13:40 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-21 13:40 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/20/2024 5:47 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/20/24 2:28 PM, Moger, Babu wrote:
>> On 12/20/2024 3:41 PM, Reinette Chatre wrote:
>>> On 12/20/24 11:22 AM, Moger, Babu wrote:
>>>> On 12/19/2024 5:04 PM, Reinette Chatre wrote:
>
>>>>>> @@ -1686,6 +1686,34 @@ unsigned int mon_event_config_index_get(u32 evtid)
>>>>>> }
>>>>>> }
>>>>>> +struct cntr_config {
>>>>>> + struct rdt_resource *r;
>>>>>> + struct rdt_mon_domain *d;
>>>>>> + enum resctrl_event_id evtid;
>>>>>> + u32 rmid;
>>>>>> + u32 closid;
>>>>>> + u32 cntr_id;
>>>>>> + u32 val;
>>>>>> + bool assign;
>>>>>> +};
>>>>>
>>>>> I think I am missing something because it is not clear to me why this
>>>>> new struct is needed. Why not just use union l3_qos_abmc_cfg?
>>>>
>>>> New struct is needed because we need to call resctrl_arch_reset_rmid() inside IPI. It requires all these parameters.
>>>
>>> Could you please answer my question?
>>
>> May be I did not understand your question here.
>>
>> We need to do couple of things here in the IPI.
>>
>> 1. Configure the counter. This requires the cntr_id, rmid, event config value and assign(or unassign). This is to populate l3_qos_abmc_cfg and write the MSR.
>>
>> 2. Reset RMID. This requires rdt_resource, rdt_mon_domain, RMID, CLOSID and event.
>>
>> So, I packed all these in a new structure and sent to IPI handler so that both these actions can be done in IPI.
>>
>> Can this be simplified?
>
> This is all architecture specific code so I think l3_qos_abmc_cfg can be
> initialized once and then passed around. Bouncing the individual members of
> l3_qos_abmc_cfg through struct cntr_config seems unnecessary to me. More specifically,
> would it not make things simpler to make l3_qos_abmc_cfg a member of cntr_config?
Yes. It can be done.
>
>>>>>> @@ -1869,6 +1897,36 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>>>>>> return ret ?: nbytes;
>>>>>> }
>>>>>> +/*
>>>>>> + * Send an IPI to the domain to assign the counter to RMID, event pair.
>>>>>> + */
>>>>>> +int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>>>>>> + enum resctrl_event_id evtid, u32 rmid, u32 closid,
>>>>>> + u32 cntr_id, bool assign)
>>>>>> +{
>>>>>> + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
>>>>>> + struct cntr_config config = { 0 };
>>>>>
>>>>> Please see 29eaa7958367 ("x86/resctrl: Slightly clean-up mbm_config_show()")
>>>>
>>>> This may not apply here.
>>>>
>>>> x86/resctrl: Slightly clean-up mbm_config_show()
>>>>
>>>> 'mon_info' is already zeroed in the list_for_each_entry() loop below. There is no need to explicitly initialize it here. It just wastes some space and cycles.
>>>>
>>>> In our case we are not doing memset again.
>>>
>>> No, but every member is explicitly initialized instead. It may be needed if
>>> union l3_qos_abmc_cfg is used as I asked about earlier where it will be important
>>> to ensure reserve bits are initialized.
>>
>> I missed your comment on reserve bits(Searched in this series). General rule is reserve bits should be written as zeros.
>
>
> I do not think I am being clear.
>
> Back to original comment: resctrl_arch_config_cntr() zeroes the entire struct and then
> initializes every member. I do not think it is necessary to zero the struct if
> every member is initialized. If you want to be explicit about the zero initialization
> you can do so while initializing the struct only once where it is defined.
> See for example, rdtgroup_kn_set_ugid()
Yes, I got it. The zero initialization was not required since we are
initializing all the members of config here.
Once l3_qos_abmc_cfg is added inside cntr_config, we may still have to
keep the zero initialization.
Thanks
Babu
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 18/24] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
2024-12-19 23:39 ` Reinette Chatre
@ 2024-12-21 13:45 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-21 13:45 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 5:39 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> Assign/unassign counters on resctrl group creation/deletion. Two counters
>> are required per group, one for MBM total event and one for MBM local
>> event.
>>
>> There are a limited number of counters available for assignment. If these
>> counters are exhausted, the kernel will display the error message: "Out of
>> MBM assignable counters". However, it is not necessary to fail the
>> creation of a group due to assignment failures. Users have the flexibility
>> to modify the assignments at a later time.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> ---
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 81 +++++++++++++++++++++++++-
>> 1 file changed, 79 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index a71a8389b649..5acae525881a 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -920,6 +920,25 @@ static int rdtgroup_available_mbm_cntrs_show(struct kernfs_open_file *of,
>> return ret;
>> }
>>
>> +static void mbm_cntr_reset(struct rdt_resource *r)
>> +{
>> + struct rdt_mon_domain *dom;
>> +
>> + /*
>> + * Hardware counters will reset after switching the monitor mode.
>> + * Reset the architectural state so that reading of hardware
>> + * counter is not considered as an overflow in the next update.
>> + * Also reset the domain counter bitmap.
>> + */
>> + if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
>> + list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>> + memset(dom->cntr_cfg, 0,
>> + sizeof(*dom->cntr_cfg) * r->mon.num_mbm_cntrs);
>> + resctrl_arch_reset_rmid_all(r, dom);
>
> This looks to be missing reset of resctrl monitor state (from get_mbm_state()).
Yes. Will do.
>
> ...
>
>> static int rdt_get_tree(struct fs_context *fc)
>> {
>> struct rdt_fs_context *ctx = rdt_fc2context(fc);
>> @@ -3023,6 +3082,8 @@ static int rdt_get_tree(struct fs_context *fc)
>> if (ret < 0)
>> goto out_info;
>>
>> + rdtgroup_assign_cntrs(&rdtgroup_default);
>> +
>> ret = mkdir_mondata_all(rdtgroup_default.kn,
>> &rdtgroup_default, &kn_mondata);
>> if (ret < 0)
>
> If this mkdir_mondata_all() fails it calls "goto out_mongrp" ...
Sure.
>
>> @@ -3058,8 +3119,10 @@ static int rdt_get_tree(struct fs_context *fc)
>> out_psl:
>> rdt_pseudo_lock_release();
>> out_mondata:
>> - if (resctrl_arch_mon_capable())
>> + if (resctrl_arch_mon_capable()) {
>> kernfs_remove(kn_mondata);
>> + rdtgroup_unassign_cntrs(&rdtgroup_default);
>> + }
>> out_mongrp:
>> if (resctrl_arch_mon_capable())
>> kernfs_remove(kn_mongrp);
>
> Looks like this will miss counter cleanup on failure of mkdir_mondata_all().
Sure.
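
The unwind problem pointed out above can be illustrated with a small
user-space model. It is a sketch under assumptions, not the fix that was
eventually posted: the helpers only record state, and the out_cntrs label is
a hypothetical name. The point is that a failure in the step right after the
assignment must jump to a label that still releases the counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins that only record state for this illustration. */
static bool cntrs_assigned;

static void assign_cntrs(void)       { cntrs_assigned = true; }
static void unassign_cntrs(void)     { cntrs_assigned = false; }
static int  mkdir_mondata(bool fail) { return fail ? -1 : 0; }
static int  later_step(bool fail)    { return fail ? -1 : 0; }

static int get_tree_sketch(bool mondata_fails, bool later_fails)
{
	int ret;

	/* Precondition of this model: nothing is assigned on entry. */
	assert(!cntrs_assigned);

	assign_cntrs();

	ret = mkdir_mondata(mondata_fails);
	if (ret < 0)
		goto out_cntrs;		/* mondata was never created */

	ret = later_step(later_fails);
	if (ret < 0)
		goto out_mondata;

	return 0;

out_mondata:
	/* kernfs_remove(kn_mondata) would go here. */
out_cntrs:
	unassign_cntrs();
	/* The existing out_mongrp cleanup would follow here. */
	return ret;
}
```

Placing the counter release under its own label, below the mondata removal,
lets both failure paths reach it.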
>
>> @@ -3238,6 +3301,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
>>
>> head = &rdtgrp->mon.crdtgrp_list;
>> list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
>> + rdtgroup_unassign_cntrs(sentry);
>> free_rmid(sentry->closid, sentry->mon.rmid);
>> list_del(&sentry->mon.crdtgrp_list);
>>
>> @@ -3278,6 +3342,8 @@ static void rmdir_all_sub(void)
>> cpumask_or(&rdtgroup_default.cpu_mask,
>> &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
>>
>> + rdtgroup_unassign_cntrs(rdtgrp);
>> +
>> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>
>> kernfs_remove(rdtgrp->kn);
>> @@ -3309,6 +3375,8 @@ static void rdt_kill_sb(struct super_block *sb)
>> for_each_alloc_capable_rdt_resource(r)
>> reset_all_ctrls(r);
>> rmdir_all_sub();
>> + rdtgroup_unassign_cntrs(&rdtgroup_default);
>> + mbm_cntr_reset(&rdt_resources_all[RDT_RESOURCE_L3].r_resctrl);
>> rdt_pseudo_lock_release();
>> rdtgroup_default.mode = RDT_MODE_SHAREABLE;
>> schemata_list_destroy();
>> @@ -3772,6 +3840,8 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
>> }
>> rdtgrp->mon.rmid = ret;
>>
>> + rdtgroup_assign_cntrs(rdtgrp);
>> +
>> ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
>> if (ret) {
>> rdt_last_cmd_puts("kernfs subdir error\n");
>
> Cleanup of assigned counters if mkdir_mondata_all() fails seems to be missing here also.
Sure.
Thanks
Babu
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
2024-12-19 23:59 ` Reinette Chatre
@ 2024-12-21 14:04 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-21 14:04 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 5:59 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> In mbm_cntr_assign mode, the hardware counter should be assigned to read
>> the MBM events.
>>
>> Report 'Unassigned' in case the user attempts to read the events without
>> assigning the counter.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ..
>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index c075fcee96b7..3ec14c314606 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -430,6 +430,16 @@ When monitoring is enabled all MON groups will also contain:
>> for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>> where "YY" is the node number.
>>
>> + When supported the mbm_cntr_assign mode allows users to assign a
>
> "When supported" -> "When enabled"? Or perhaps just drop that and start with
> "mbm_cntr_assign mode allows users ..."
ok.
>
>
>> + counter to mon_hw_id, event pair enabling bandwidth monitoring for
>> + as long as the counter remains assigned. The hardware will continue
>> + tracking the assigned mon_hw_id until the user manually unassigns
>> + it, ensuring that counters are not reset during this period. With
>> + a limited number of counters, the system may run out of assignable
>> + counters. In that case, MBM event counters will return 'Unassigned'
>> + when the event is read. Users must manually assign a counter to read
>> + the events.
>> +
>> "mon_hw_id":
>> Available only with debug option. The identifier used by hardware
>> for the monitor group. On x86 this is the RMID.
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 200d89a64027..8e265a86e524 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -527,6 +527,12 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>> /* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
>> lockdep_assert_cpus_held();
>>
>> + if (resctrl_arch_mbm_cntr_assign_enabled(r) && is_mbm_event(evtid) &&
>> + !mbm_cntr_assigned(r, d, rdtgrp, evtid)) {
>> + rr->err = -ENOENT;
>> + return;
>> + }
>> +
>
> hmmm ... d can be NULL here after the SNC support. Since the file that needs a
> sum is essentially software backed I do not think assigning counters would
> apply to it (but it may theoretically apply to the domains it consists of).
> I think it may be safer to just move this check into rdtgroup_mondata_show()
> where it reads data for a single domain.
Sure.
>
> I am not sure if we need to change the documentation because of this. One option
> could be a rewording to "MBM event counters may return 'Unassigned' or
> 'Unavailable' when the event is read".
ok.
Thanks
Babu
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 20/24] x86/resctrl: Introduce the interface to switch between monitor modes
2024-12-20 2:56 ` Reinette Chatre
@ 2024-12-21 14:20 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-21 14:20 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 8:56 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> Introduce interface to switch between mbm_cntr_assign and default modes.
>>
>
> This changelog needs context.
Sure.
>
>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> [mbm_cntr_assign]
>> default
>>
>> To enable the "mbm_cntr_assign" mode:
>> $ echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>
>> To enable the default monitoring mode:
>> $ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>
>> MBM event counters will reset when mbm_assign_mode is changed.
>
> I think it will help to elaborate on this.
>
> I understand this as two parts. As stated, the hardware counters
> are reset since that is what ABMC does. In this patch
> there is a mbm_cntr_reset() but that does not actually reset the counters as
> the above implies.
> Instead, the counters are automatically reset as part of changing the mode.
> resctrl triggers reset of architectural and non-architectural
> state of the events because of the hardware counter reset.
>
> The changelog can really do more to explain what this patch does.
Ok. Will do.
>
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
>> ---
>> Documentation/arch/x86/resctrl.rst | 15 ++++++++
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 50 +++++++++++++++++++++++++-
>> 2 files changed, 64 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 3ec14c314606..d3a8a34cf629 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -290,6 +290,21 @@ with the following files:
>> "mbm_total_bytes" or "mbm_local_bytes" will report 'Unavailable' if
>> there is no counter associated with that event.
>>
>> + * To enable "mbm_cntr_assign" mode:
>> + ::
>> +
>> + # echo "mbm_cntr_assign" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> +
>> + * To enable default monitoring mode:
>> + ::
>> +
>> + # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> +
>> + The MBM events (mbm_total_bytes and/or mbm_local_bytes) associated with
>> + counters may reset when "mbm_assign_mode" is changed. Moving to
>
> After looking at the final documentation it seems more appropriate to move this to
> the top of the "mbm_assign_mode" section. The top already shows how to read from the
> file using cat so it seems like a good match to document write to the file in the
> same area.
ok.
>
>> + mbm_cntr_assign mode require users to assign the counters to the events.
>> + Otherwise, the MBM event counters will return "Unassigned" when read.
>
> This portion can move to the mode it applies to.
ok.
>
>> +
>> "num_mbm_cntrs":
>> The number of monitoring counters available for assignment when the
>> architecture supports mbm_cntr_assign mode.
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 8d00b1689a80..eea534cce3d0 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -939,6 +939,53 @@ static void mbm_cntr_reset(struct rdt_resource *r)
>> }
>> }
>>
>> +static ssize_t rdtgroup_mbm_assign_mode_write(struct kernfs_open_file *of,
>> + char *buf, size_t nbytes, loff_t off)
>
> rdtgroup_ namespace is not appropriate
Will rename as resctrl_
thanks
Babu
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 21/24] x86/resctrl: Configure mbm_cntr_assign mode if supported
2024-12-20 3:03 ` Reinette Chatre
@ 2024-12-21 14:33 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-21 14:33 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 9:03 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> Configure mbm_cntr_assign on AMD. 'mbm_cntr_assign' mode in AMD is ABMC
>> (Assignable Bandwidth Monitoring Counters). It is enabled by default when
>> supported on the system.
>
> Needs imperative "Enable mbm_cntr_assign mode ..."
Sure.
>
>>
>> Ensure that the ABMC is updated on all logical processors in the resctrl
>> domain.
>
> Needs imperative (for example) "Update the assignable counter mode .."
>
ok. Sure.
>
> Please distinguish how it is the architecture that decides what the
> default mode should be. resctrl's part is to ensure that architecture
> gets opportunity to configure every logical processor as it comes online.
>
Yes. Got it.
thanks
Babu
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 22/24] x86/resctrl: Update assignments on event configuration changes
2024-12-20 3:12 ` Reinette Chatre
@ 2024-12-21 14:59 ` Moger, Babu
2024-12-23 16:20 ` Reinette Chatre
0 siblings, 1 reply; 76+ messages in thread
From: Moger, Babu @ 2024-12-21 14:59 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 9:12 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> Resctrl provides option to configure events by writing to the interfaces
>> /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config or
>> /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config when BMEC (Bandwidth
>> Monitoring Event Configuration) is supported.
>>
>> Whenever the event configuration is updated, MBM assignments must be
>> revised across all monitor groups within the impacted domains.
>
> This needs imperative tone description of what this patch does.
Sure.
>
>
> ...
>
>> @@ -1825,6 +1825,54 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>> return 0;
>> }
>>
>> +/*
>> + * Review the cntr_cfg domain configuration. If a matching assignment is found,
>> + * update the counter assignment accordingly. This is within the IPI Context,
>
> This "Review the cntr_cfg domain configuration. If a matching assignment is found,"
> is too vague for me to make sense of what it is trying to do. Can this be made more specific?
Does this look ok?
Check the counter configuration in the domain. If the specific event is
configured, then update the assignment with the new event configuration
value. This is within the IPI context, so call
resctrl_abmc_config_one_amd() directly.
Thanks,
Babu
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 24/24] x86/resctrl: Introduce interface to modify assignment states of the groups
2024-12-20 3:23 ` Reinette Chatre
@ 2024-12-21 15:28 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2024-12-21 15:28 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/19/2024 9:23 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> Introduce the interface to assign MBM events in mbm_cntr_assign mode.
>
> Seems like something is missing ... there is no mention about what
> MBM events are assigned "to".
Sure. Will add some context.
>
> ...
>
>> + if (assign_state & ASSIGN_LOCAL) {
>> + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
>> + if (ret)
>> + goto out_fail;
>> + }
>> +
>> + goto next;
>> +
>> +out_fail:
>> + sprintf(domain, d ? "%ld" : "*", dom_id);
>> +
>
> The static checker I tried complains that dom_id can be used uninitialized.
Interesting.
dom_id can be uninitialized. That is why we have the "d ?" check.
Initializing it at the declaration might help:
unsigned long dom_id = 0;
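
For illustration only, a user-space sketch of a slightly stricter variant.
Initializing dom_id at its declaration should be enough to quiet the checker;
the sketch additionally branches on whether a domain is known, so the id is
only formatted when it is meaningful, and it uses "%lu" to match the
unsigned long type. format_domain() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the domain part of an error message: the id when a specific
 * domain is known, "*" when the request applied to all domains.
 */
static void format_domain(char *buf, size_t len, int have_dom,
			  unsigned long dom_id)
{
	assert(buf && len > 1);

	if (have_dom)
		snprintf(buf, len, "%lu", dom_id);
	else
		snprintf(buf, len, "*");
}
```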
Thanks
Babu
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 22/24] x86/resctrl: Update assignments on event configuration changes
2024-12-21 14:59 ` Moger, Babu
@ 2024-12-23 16:20 ` Reinette Chatre
2025-01-13 20:03 ` Moger, Babu
0 siblings, 1 reply; 76+ messages in thread
From: Reinette Chatre @ 2024-12-23 16:20 UTC (permalink / raw)
To: Moger, Babu, Babu Moger, corbet, tglx, mingo, bp, dave.hansen,
tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Babu,
On 12/21/24 6:59 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 12/19/2024 9:12 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 12/12/24 12:15 PM, Babu Moger wrote:
>>> Resctrl provides option to configure events by writing to the interfaces
>>> /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config or
>>> /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config when BMEC (Bandwidth
>>> Monitoring Event Configuration) is supported.
>>>
>>> Whenever the event configuration is updated, MBM assignments must be
>>> revised across all monitor groups within the impacted domains.
>>
>> This needs imperative tone description of what this patch does.
>
> Sure.
>
>>
>> ...
>>
>>> @@ -1825,6 +1825,54 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>>> return 0;
>>> }
>>> +/*
>>> + * Review the cntr_cfg domain configuration. If a matching assignment is found,
>>> + * update the counter assignment accordingly. This is within the IPI Context,
>>
>> This "Review the cntr_cfg domain configuration. If a matching assignment is found,"
>> is too vague for me to make sense of what it is trying to do. Can this be made more specific?
>
> Does this look ok?
>
> Check the counter configuration in the domain. If the specific event is configured, then update the assignment with the new event configuration value. This is within the IPI Context, so call resctrl_abmc_config_one_amd directly"
I think it will be easier to understand what this function does if the
comment is made more specific. For example:
Update hardware counter configuration after event configuration change.
Walk the hardware counters of domain @d to reconfigure all assigned
counters that are monitoring @evtid with the event's new configuration
@mon_config (or @config_val).
This is run on a CPU belonging to domain @d so call
resctrl_abmc_config_one_amd() directly.
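
The walk described in that comment could look roughly like the following
user-space model. It is a sketch under assumptions: struct cntr_slot is a
hypothetical stand-in for the per-domain counter bookkeeping (the layout of
dom->cntr_cfg is not shown in this thread), and config_one() only records the
value where the kernel would call resctrl_abmc_config_one_amd().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-counter bookkeeping for one domain. */
struct cntr_slot {
	bool	 assigned;	/* is the counter assigned to a group? */
	int	 evtid;		/* event the counter monitors */
	uint32_t rmid;		/* RMID the counter tracks */
	uint32_t hw_cfg;	/* last event configuration programmed */
};

/* Stand-in for the MSR write: record the value instead. */
static void config_one(struct cntr_slot *slot, uint32_t config_val)
{
	slot->hw_cfg = config_val;
}

/*
 * Walk the counters of one domain and reprogram every assigned counter
 * that monitors @evtid with @config_val. Returns the number updated.
 */
static unsigned int update_cntrs(struct cntr_slot *slots, size_t num_cntrs,
				 int evtid, uint32_t config_val)
{
	unsigned int updated = 0;
	size_t i;

	assert(slots || num_cntrs == 0);

	for (i = 0; i < num_cntrs; i++) {
		if (!slots[i].assigned || slots[i].evtid != evtid)
			continue;
		config_one(&slots[i], config_val);
		updated++;
	}

	return updated;
}
```

Unassigned counters and counters monitoring the other event are skipped, so
only the slots affected by the configuration change are touched.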
Looking closer at architecture specific resctrl_arch_update_cntr() the
reset of non-arch state (get_mbm_state()->memset()) seems out of place.
There is a resctrl_arch_reset_rmid_all() within mbm_config_write_domain() that
resets all architectural state after the event configuration is changed,
should the non-architectural state not also be reset at that time? It looks
to me like it is something that may be needed for existing event
configuration (but not an issue until Peter's new feature lands) and when done,
the reset done within resctrl_arch_update_cntr() will no longer be necessary.
Something else to consider is the resctrl_arch_reset_rmid() within
resctrl_abmc_config_one_amd() seems redundant on this call path since
it is followed by resctrl_arch_reset_rmid_all(). resctrl_arch_reset_rmid()
does one MSR write and one MSR read for every counter that needs to be
reconfigured and if that is unnecessary it may be worthwhile to optimize
out?
Reinette
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v10 22/24] x86/resctrl: Update assignments on event configuration changes
2024-12-23 16:20 ` Reinette Chatre
@ 2025-01-13 20:03 ` Moger, Babu
0 siblings, 0 replies; 76+ messages in thread
From: Moger, Babu @ 2025-01-13 20:03 UTC (permalink / raw)
To: Reinette Chatre, Moger, Babu, corbet, tglx, mingo, bp,
dave.hansen, tony.luck, peternewman
Cc: fenghua.yu, x86, hpa, paulmck, akpm, thuth, rostedt,
xiongwei.song, pawan.kumar.gupta, daniel.sneddon, jpoimboe,
perry.yuan, sandipan.das, kai.huang, xiaoyao.li, seanjc, xin3.li,
andrew.cooper3, ebiggers, mario.limonciello, james.morse,
tan.shaopeng, linux-doc, linux-kernel, maciej.wieczor-retman,
eranian
Hi Reinette,
On 12/23/24 10:20, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/21/24 6:59 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 12/19/2024 9:12 PM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 12/12/24 12:15 PM, Babu Moger wrote:
>>>> Resctrl provides option to configure events by writing to the interfaces
>>>> /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config or
>>>> /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config when BMEC (Bandwidth
>>>> Monitoring Event Configuration) is supported.
>>>>
>>>> Whenever the event configuration is updated, MBM assignments must be
>>>> revised across all monitor groups within the impacted domains.
>>>
>>> This needs imperative tone description of what this patch does.
>>
>> Sure.
>>
>>>
>>> ...
>>>
>>>> @@ -1825,6 +1825,54 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>>>> return 0;
>>>> }
>>>> +/*
>>>> + * Review the cntr_cfg domain configuration. If a matching assignment is found,
>>>> + * update the counter assignment accordingly. This is within the IPI Context,
>>>
>>> This "Review the cntr_cfg domain configuration. If a matching assignment is found,"
>>> is too vague for me to make sense of what it is trying to do. Can this be made more specific?
>>
>> Does this look ok?
>>
>> Check the counter configuration in the domain. If the specific event is configured, then update the assignment with the new event configuration value. This is within the IPI Context, so call resctrl_abmc_config_one_amd directly"
>
> I think it will be easier to understand what this function does if the
> comment is made more specific. For example:
> Update hardware counter configuration after event configuration change.
>
> Walk the hardware counters of domain @d to reconfigure all assigned
> counters that are monitoring @evtid with the event's new configuration
> @mon_config (or @config_val).
>
> This is run on a CPU belonging to domain @d so call
> resctrl_abmc_config_one_amd() directly.
Looks good. Thanks
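For readers following along, the walk described in the proposed comment can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: struct cntr_cfg, struct mon_domain, NUM_CNTRS and update_cntrs() are simplified, hypothetical stand-ins rather than the definitions from the series, and the store to bw_type stands in for the MSR write done by resctrl_abmc_config_one_amd().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
#define NUM_CNTRS 4

struct cntr_cfg {
	int assigned;      /* non-zero when the counter is in use */
	uint32_t rmid;     /* RMID monitored by this counter */
	uint32_t evtid;    /* event the counter tracks */
	uint32_t bw_type;  /* event configuration currently programmed */
};

struct mon_domain {
	struct cntr_cfg cntr[NUM_CNTRS];
};

/*
 * Reprogram every assigned counter in @d that monitors @evtid with
 * @config_val. Returns the number of counters that were updated.
 */
static int update_cntrs(struct mon_domain *d, uint32_t evtid,
			uint32_t config_val)
{
	int updated = 0;

	assert(d != NULL);
	for (size_t i = 0; i < NUM_CNTRS; i++) {
		struct cntr_cfg *c = &d->cntr[i];

		/* Skip free counters and counters tracking another event. */
		if (!c->assigned || c->evtid != evtid)
			continue;

		c->bw_type = config_val; /* models the MSR write */
		updated++;
	}
	return updated;
}
```

Counters that are free, or that track a different event, are left untouched, so only the assignments affected by the configuration change are rewritten.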
>
> Looking closer at architecture specific resctrl_arch_update_cntr() the
> reset of non-arch state (get_mbm_state()->memset()) seems out of place.
> There is a resctrl_arch_reset_rmid_all() within mbm_config_write_domain() that
> resets all architectural state after the event configuration is changed,
> should the non-architectural state not also be reset at that time? It looks
I moved the reset of the non-arch state into mbm_config_write_domain(), and
it seems to work fine. I can also simplify the IPI code further.
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5f5cf9b3a053..ce08fb718e2e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2076,9 +2076,6 @@ static void resctrl_abmc_config_one_amd(void *info)
abmc_cfg.split.bw_type = config->val;
wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, abmc_cfg.full);
-
- resctrl_arch_reset_rmid(config->r, config->d, config->closid,
- config->rmid, config->evtid);
}
static int mbm_config_show(struct seq_file *s, struct rdt_resource *r,
u32 evtid)
@@ -2153,10 +2150,6 @@ static void resctrl_arch_update_cntr(struct rdt_resource *r, struct rdt_mon_doma
config.assign = 1;
resctrl_abmc_config_one_amd(&config);
-
- m = get_mbm_state(d, rdtgrp->closid, rdtgrp->mon.rmid, evtid);
- if (m)
- memset(m, 0, sizeof(struct mbm_state));
}
}
}
@@ -2178,6 +2171,7 @@ static void resctrl_mon_event_config_set(void *info)
static void mbm_config_write_domain(struct rdt_resource *r,
struct rdt_mon_domain *d, u32 evtid,
u32 val)
{
+ u32 idx_limit = resctrl_arch_system_num_rmid_idx();
struct mon_config_info mon_info = {0};
u32 config_val;
@@ -2214,6 +2208,12 @@ static void mbm_config_write_domain(struct rdt_resource *r,
* mbm_local and mbm_total counts for all the RMIDs.
*/
resctrl_arch_reset_rmid_all(r, d);
+
+ if (is_mbm_total_enabled())
+ memset(d->mbm_total, 0, sizeof(struct mbm_state) * idx_limit);
+
+ if (is_mbm_local_enabled())
+ memset(d->mbm_local, 0, sizeof(struct mbm_state) * idx_limit);
}
static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
> to me like it is something that may be needed for existing event
> configuration (but not an issue until Peter's new feature lands) and when done,
> the reset done within resctrl_arch_update_cntr() will no longer be necessary.
>
> Something else to consider is the resctrl_arch_reset_rmid() within
> resctrl_abmc_config_one_amd() seems redundant on this call path since
> it is followed by resctrl_arch_reset_rmid_all(). resctrl_arch_reset_rmid()
> does one MSR write and one MSR read for every counter that needs to be
> reconfigured and if that is unnecessary it may be worthwhile to optimize
> out?
Yes. I removed the resctrl_arch_reset_rmid() call from
resctrl_abmc_config_one_amd().
I tested the change and it seems to work fine.
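To illustrate the bulk reset that the diff adds to mbm_config_write_domain(), here is a minimal userspace sketch. It rests on assumptions: struct mbm_state and struct mon_domain are hypothetical, cut-down stand-ins, reset_all_mbm_state() is an illustrative name, and a NULL array pointer is used in place of the is_mbm_total_enabled() and is_mbm_local_enabled() checks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct mbm_state. */
struct mbm_state {
	uint64_t prev_bw_bytes;
	uint32_t prev_bw;
};

/* Hypothetical stand-in for the per-domain monitoring state. */
struct mon_domain {
	struct mbm_state *mbm_total; /* NULL models "event not enabled" */
	struct mbm_state *mbm_local;
};

/*
 * Clear the software (non-architectural) state of every RMID index in
 * one pass, instead of clearing one entry per reconfigured counter.
 */
static void reset_all_mbm_state(struct mon_domain *d, uint32_t idx_limit)
{
	assert(d != NULL);

	if (d->mbm_total)
		memset(d->mbm_total, 0, sizeof(*d->mbm_total) * idx_limit);

	if (d->mbm_local)
		memset(d->mbm_local, 0, sizeof(*d->mbm_local) * idx_limit);
}
```

Doing the reset once per domain mirrors resctrl_arch_reset_rmid_all() for the architectural state, which is why the per-counter memset() in resctrl_arch_update_cntr() is no longer needed.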
--
Thanks
Babu Moger