From: Reinette Chatre <reinette.chatre@intel.com>
To: Tony Luck <tony.luck@intel.com>,
Fenghua Yu <fenghua.yu@intel.com>,
"Maciej Wieczor-Retman" <maciej.wieczor-retman@intel.com>,
Peter Newman <peternewman@google.com>,
James Morse <james.morse@arm.com>,
Babu Moger <babu.moger@amd.com>,
Drew Fustini <dfustini@baylibre.com>,
Dave Martin <Dave.Martin@arm.com>
Cc: <x86@kernel.org>, <linux-kernel@vger.kernel.org>,
<patches@lists.linux.dev>
Subject: Re: [PATCH v19 06/20] x86/resctrl: Introduce snc_nodes_per_l3_cache
Date: Thu, 30 May 2024 13:20:39 -0700 [thread overview]
Message-ID: <b0e17f5e-210d-4aa3-9410-f1829b570c4b@intel.com> (raw)
In-Reply-To: <20240528222006.58283-7-tony.luck@intel.com>
Hi Tony,
On 5/28/24 3:19 PM, Tony Luck wrote:
> Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
> and memory controllers on a socket into two or more groups. These are
> presented to the operating system as NUMA nodes.
>
> This may enable some workloads to have slightly lower latency to memory
> as the memory controller(s) in an SNC node are electrically closer to the
> CPU cores on that SNC node. This cost may be offset by lower bandwidth
> since the memory accesses for each core can only be interleaved between
> the memory controllers on the same SNC node.
>
> Resctrl monitoring on an Intel system depends upon attaching RMIDs to tasks
> to track L3 cache occupancy and memory bandwidth. There is an MSR that
> controls how the RMIDs are shared between SNC nodes.
>
> The default mode divides them numerically. E.g. when there are two SNC
> nodes on a socket the lower number half of the RMIDs are given to the
> first node, the remainder to the second node. This would be difficult
> to use with the Linux resctrl interface as specific RMID values assigned
> to resctrl groups are not visible to users.
>
> The other mode divides the RMIDs and renumbers the ones on the second
> SNC node to start from zero.
>
> Even with this renumbering SNC mode requires several changes in resctrl
> behavior for correct operation.
>
> Add a static global to arch/x86/kernel/cpu/resctrl/monitor.c to indicate
> how many SNC domains share an L3 cache instance. Initialize this to
> "1". Runtime detection of SNC mode will adjust this value.
>
> Update all places to take appropriate action when SNC mode is enabled:
> 1) The number of logical RMIDs per L3 cache available for use is the
> number of physical RMIDs divided by the number of SNC nodes.
> 2) Likewise the "mon_scale" value must be divided by the number of SNC
> nodes.
> 3) Add a function to convert from logical RMID values (assigned to
> tasks and loaded into the IA32_PQR_ASSOC MSR on context switch)
> to physical RMID values to load into IA32_QM_EVTSEL MSR when
> reading counters on each SNC node.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 37 ++++++++++++++++++++++++---
> 1 file changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 89d7e6fcbaa1..b9b4d2b5ca82 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -97,6 +97,8 @@ unsigned int resctrl_rmid_realloc_limit;
>
> #define CF(cf) ((unsigned long)(1048576 * (cf) + 0.5))
>
> +static int snc_nodes_per_l3_cache = 1;
> +
> /*
> * The correction factor table is documented in Documentation/arch/x86/resctrl.rst.
> * If rmid > rmid threshold, MBM total and local values should be multiplied
> @@ -185,10 +187,37 @@ static inline struct rmid_entry *__rmid_entry(u32 idx)
> return entry;
> }
>
> -static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
> +/*
> + * When Sub-NUMA Cluster (SNC) mode is not enabled, the physical RMID
> + * is the same as the logical RMID.
> + *
> + * When SNC mode is enabled the physical RMIDs are distributed across
> + * the SNC nodes. E.g. with two SNC nodes per L3 cache and 200 physical
> + * RMIDs are divided with 0..99 on the first node and 100..199 on
> + * the second node. Compute the value of the physical RMID to pass to
> + * resctrl_arch_rmid_read().
Please stop rushing version after version. I do not think you read the
above after you wrote it. The sentences run into each other.
Could this be specific about what is meant by "physical" and "logical" RMID?
To me "physical RMID" implies the RMID used by hardware and "logical RMID"
is the RMID used by software ... but when it comes to SNC it is actually:
"physical RMID" - RMID used by MSR_IA32_QM_EVTSEL
"logical RMID" - RMID used by software and the MSR_IA32_PQR_ASSOC register
> + *
> + * Caller is responsible to make sure execution running on a CPU in
"is responsible" and "make sure" means the same, no?
"make sure execution running"?
(Looking ahead in this series and coming back to this, this looks like
rushed work that you in turn expect folks spend quality time reviewing.)
> + * the domain to be read.
> + */
> +static int logical_rmid_to_physical_rmid(int lrmid)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + int cpu = smp_processor_id();
> +
> + if (snc_nodes_per_l3_cache == 1)
> + return lrmid;
> +
> + return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
> +}
> +
> +static int __rmid_read(u32 lrmid,
> + enum resctrl_event_id eventid, u64 *val)
This line does not need to be split.
> {
> u64 msr_val;
> + int prmid;
>
> + prmid = logical_rmid_to_physical_rmid(lrmid);
> /*
> * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
> * with a valid event code for supported resource type and the bits
> @@ -197,7 +226,7 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
> * IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
> * are error bits.
> */
> - wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
> + wrmsr(MSR_IA32_QM_EVTSEL, eventid, prmid);
> rdmsrl(MSR_IA32_QM_CTR, msr_val);
>
> if (msr_val & RMID_VAL_ERROR)
> @@ -1022,8 +1051,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
> int ret;
>
> resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
> - hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
> - r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
> + hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
> + r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
> hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
>
> if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
Reinette
next prev parent reply other threads:[~2024-05-30 20:20 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-28 22:19 [PATCH v19 00/20] Add support for Sub-NUMA cluster (SNC) systems Tony Luck
2024-05-28 22:19 ` [PATCH v19 01/20] x86/resctrl: Prepare for new domain scope Tony Luck
2024-05-28 22:19 ` [PATCH v19 02/20] x86/resctrl: Prepare to split rdt_domain structure Tony Luck
2024-05-28 22:19 ` [PATCH v19 03/20] x86/resctrl: Prepare for different scope for control/monitor operations Tony Luck
2024-05-28 22:19 ` [PATCH v19 04/20] x86/resctrl: Split the rdt_domain and rdt_hw_domain structures Tony Luck
2024-05-28 22:19 ` [PATCH v19 05/20] x86/resctrl: Add node-scope to the options for feature scope Tony Luck
2024-05-28 22:19 ` [PATCH v19 06/20] x86/resctrl: Introduce snc_nodes_per_l3_cache Tony Luck
2024-05-30 20:20 ` Reinette Chatre [this message]
2024-05-31 18:17 ` Tony Luck
2024-06-07 16:49 ` Reinette Chatre
2024-05-28 22:19 ` [PATCH v19 07/20] x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster (SNC) systems Tony Luck
2024-05-28 22:19 ` [PATCH v19 08/20] x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files Tony Luck
2024-05-30 20:21 ` Reinette Chatre
2024-05-31 0:26 ` Tony Luck
2024-05-31 16:01 ` Reinette Chatre
2024-05-28 22:19 ` [PATCH v19 09/20] x86/resctrl: Add new fields to struct rmid_read for summation of domains Tony Luck
2024-05-30 20:21 ` Reinette Chatre
2024-05-28 22:19 ` [PATCH v19 10/20] x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function Tony Luck
2024-05-28 22:19 ` [PATCH v19 11/20] x86/resctrl: Allocate a new bit in union mon_data_bits Tony Luck
2024-05-30 20:21 ` Reinette Chatre
2024-05-28 22:19 ` [PATCH v19 12/20] x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files Tony Luck
2024-05-30 20:22 ` Reinette Chatre
2024-05-28 22:19 ` [PATCH v19 13/20] x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC) mode Tony Luck
2024-05-28 22:19 ` [PATCH v19 14/20] x86/resctrl: Fill out rmid_read structure for smp_call*() to read a counter Tony Luck
2024-05-30 20:23 ` Reinette Chatre
2024-05-28 22:20 ` [PATCH v19 15/20] x86/resctrl: Pass two extra arguments to resctrl_arch_rmid_read() Tony Luck
2024-05-30 20:24 ` Reinette Chatre
2024-05-28 22:20 ` [PATCH v19 16/20] x86/resctrl: Make resctrl_arch_rmid_read() handle sum over domains Tony Luck
2024-05-30 20:24 ` Reinette Chatre
2024-06-03 23:15 ` Tony Luck
2024-06-07 16:49 ` Reinette Chatre
2024-06-07 19:51 ` Luck, Tony
2024-06-07 21:08 ` Reinette Chatre
2024-05-28 22:20 ` [PATCH v19 17/20] x86/resctrl: Update CPU sanity checks when reading RMID counters Tony Luck
2024-05-28 22:20 ` [PATCH v19 18/20] x86/resctrl: Enable RMID shared RMID mode on Sub-NUMA Cluster (SNC) systems Tony Luck
2024-05-30 20:27 ` Reinette Chatre
2024-05-28 22:20 ` [PATCH v19 19/20] x86/resctrl: Sub-NUMA Cluster (SNC) detection and enabling Tony Luck
2024-05-30 20:28 ` Reinette Chatre
2024-05-28 22:20 ` [PATCH v19 20/20] x86/resctrl: Update documentation with Sub-NUMA cluster changes Tony Luck
2024-05-30 20:29 ` Reinette Chatre
2024-05-28 22:55 ` [PATCH v19 00/20] Add support for Sub-NUMA cluster (SNC) systems Reinette Chatre
2024-05-29 20:20 ` Tony Luck
2024-05-30 2:46 ` Reinette Chatre
2024-05-30 16:36 ` Tony Luck
2024-05-30 17:55 ` Reinette Chatre
2024-05-30 22:49 ` Luck, Tony
2024-05-30 23:10 ` Reinette Chatre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b0e17f5e-210d-4aa3-9410-f1829b570c4b@intel.com \
--to=reinette.chatre@intel.com \
--cc=Dave.Martin@arm.com \
--cc=babu.moger@amd.com \
--cc=dfustini@baylibre.com \
--cc=fenghua.yu@intel.com \
--cc=james.morse@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maciej.wieczor-retman@intel.com \
--cc=patches@lists.linux.dev \
--cc=peternewman@google.com \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).