From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S935053AbdBQUAZ (ORCPT ); Fri, 17 Feb 2017 15:00:25 -0500
Received: from mga01.intel.com ([192.55.52.88]:19241 "EHLO mga01.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S934920AbdBQT6X (ORCPT ); Fri, 17 Feb 2017 14:58:23 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.35,173,1484035200"; d="scan'208";a="67174462"
From: Vikas Shivappa
To: vikas.shivappa@intel.com
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, hpa@zytor.com,
	tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org,
	ravi.v.shankar@intel.com, tony.luck@intel.com, fenghua.yu@intel.com,
	andi.kleen@intel.com, vikas.shivappa@linux.intel.com
Subject: [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation
Date: Fri, 17 Feb 2017 11:58:48 -0800
Message-Id: <1487361535-9727-2-git-send-email-vikas.shivappa@linux.intel.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1487361535-9727-1-git-send-email-vikas.shivappa@linux.intel.com>
References: <1487361535-9727-1-git-send-email-vikas.shivappa@linux.intel.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Update the intel_rdt_ui documentation to describe usage of the Memory
bandwidth (b/w) allocation interface.

Signed-off-by: Vikas Shivappa
---
 Documentation/x86/intel_rdt_ui.txt | 74 ++++++++++++++++++++++++++++++++++----
 1 file changed, 67 insertions(+), 7 deletions(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index d918d26..2f679e2 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -4,6 +4,7 @@ Copyright (C) 2016 Intel Corporation
 
 Fenghua Yu
 Tony Luck
+Vikas Shivappa
 
 This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
 X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
@@ -22,8 +23,8 @@ Info directory
 
 The 'info' directory contains information about the enabled
 resources. Each resource has its own subdirectory. The subdirectory
-names reflect the resource names. Each subdirectory contains the
-following files:
+names reflect the resource names.
+Each cache resource (L3/L2) subdirectory contains the following files:
 
 "num_closids": The number of CLOSIDs which are valid for this
 		resource. The kernel uses the smallest number of
@@ -35,6 +36,16 @@ following files:
 "min_cbm_bits": The minimum number of consecutive bits which must be
 		set when writing a mask.
 
+The memory bandwidth (MB) subdirectory contains the following files:
+
+"min_bw":	The minimum memory bandwidth percentage that a user can
+		request.
+
+"bw_gran":	The granularity in which the user can request the memory
+		bandwidth percentage.
+
+"scale_linear":	Indicates if the bandwidth scale is linear or
+		non-linear.
 
 Resource groups
 ---------------
@@ -107,6 +118,28 @@ and 0xA are not. On a system with a 20-bit mask each bit represents 5%
 of the capacity of the cache. You could partition the cache into four
 equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
 
+Memory bandwidth(b/w) throttle
+------------------------------
+For the memory b/w resource, the user controls the resource by specifying
+the percentage of total memory b/w.
+
+The minimum bandwidth percentage value for each cpu model is predefined
+and can be looked up through "info/MB/min_bw". The bandwidth granularity
+that can be requested is also dependent on the cpu model and can be
+looked up at "info/MB/bw_gran".
+
+The bandwidth percentage values are mapped to hardware delay values and
+programmed in the QOS_MSRs. The delay values may be on a linear or a
+non-linear scale. On a linear scale the programmed values directly
+correspond to a delay value (b/w percentage = 100 - delay). However, on a
+non-linear scale, the percentage values correspond to pre-calibrated
+delay values. The delay values on a non-linear scale have a granularity
+of a power of 2.
+
+Bandwidth throttling is a core-specific mechanism on some Intel
+SKUs. Using a high bandwidth and a low bandwidth setting on two threads
+sharing a core will result in both threads being throttled to use the
+low bandwidth.
 
 L3 details (code and data prioritization disabled)
 --------------------------------------------------
@@ -129,16 +162,24 @@ schemata format is always:
 
 	L2:=;=;...
 
+Memory b/w Allocation details
+-----------------------------
+
+Memory b/w domain is L3 cache.
+
+	MB:=bandwidth0;=bandwidth1;...
+
 Example 1
 ---------
 On a two socket machine (one L3 cache per socket) with just four bits
-for cache bit masks
+for cache bit masks, a minimum b/w of 10% and a memory bandwidth
+granularity of 10%
 
 # mount -t resctrl resctrl /sys/fs/resctrl
 # cd /sys/fs/resctrl
 # mkdir p0 p1
-# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
-# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
+# echo -e "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
+# echo -e "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata
 
 The default resource group is unmodified, so we have access to all parts
 of all caches (its schemata file reads "L3:0=f;1=f").
@@ -147,6 +188,14 @@ Tasks that are under the control of group "p0" may only allocate from the
 "lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
 Tasks in group "p1" use the "lower" 50% of cache on both sockets.
 
+Similarly, tasks that are under the control of group "p0" may use a
+maximum memory b/w of 50% on socket 0, and 50% on socket 1.
+Tasks in group "p1" may use the remaining 50% of memory b/w on both sockets.
+Note that unlike cache masks, memory b/w cannot specify whether these
+allocations can overlap or not. The allocations specify the maximum
+b/w that the group may be able to use and the system admin can configure
+the b/w accordingly.
+
 Example 2
 ---------
 Again two sockets, but this time with a more realistic 20-bit mask.
@@ -185,6 +234,16 @@ Ditto for the second real time task (with the remaining 25% of cache):
 # echo 5678 > p1/tasks
 # taskset -cp 2 5678
 
+For the same 2 socket system with memory b/w resource and CAT L3 the
+schemata would look like:
+
+Assume min_bw is 10 and bw_gran is 10.
+
+# echo -e "L3:0=f8000;1=fffff\nMB:0=10;1=30" > p0/schemata
+
+This would request 10% memory b/w on socket 0 and 30% memory b/w on
+socket 1.
+
 Example 3
 ---------
 
@@ -203,10 +262,11 @@ First we reset the schemata for the default group so that the "upper"
 # echo "L3:0=3ff" > schemata
 
 Next we make a resource group for our real time cores and give
-it access to the "top" 50% of the cache on socket 0.
+it access to the "top" 50% of the cache on socket 0 and 50% of memory
+bandwidth on socket 0.
 
 # mkdir p0
-# echo "L3:0=ffc00;" > p0/schemata
+# echo -e "L3:0=ffc00;\nMB:0=50" > p0/schemata
 
 Finally we move core 4-7 over to the new group and make sure that the
 kernel and the tasks running there get 50% of the cache.
-- 
1.9.1
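
As an illustration of the linear scale described in the patch (b/w
percentage = 100 - delay), the following is a minimal shell sketch, not
part of the patch. The function name bw_to_linear_delay and its argument
order are hypothetical. It also assumes that a valid request lies between
min_bw and 100 and is a whole multiple of bw_gran; the patch does not say
whether the kernel rejects or rounds other values.

```shell
#!/bin/sh
# Hypothetical helper: map a requested bandwidth percentage to the delay
# value used on a linear scale. Arguments: <bw> <min_bw> <bw_gran>.
bw_to_linear_delay() {
	bw=$1
	min_bw=$2
	bw_gran=$3

	# Reject requests outside the range [min_bw, 100].
	if [ "$bw" -lt "$min_bw" ] || [ "$bw" -gt 100 ]; then
		echo "bandwidth $bw is out of range" >&2
		return 1
	fi

	# Assumption: requests must be whole multiples of bw_gran.
	if [ $((bw % bw_gran)) -ne 0 ]; then
		echo "bandwidth $bw is not a multiple of $bw_gran" >&2
		return 1
	fi

	# Linear scale from the text: b/w percentage = 100 - delay.
	echo $((100 - bw))
}
```

With min_bw and bw_gran both 10, as assumed in the examples above,
"bw_to_linear_delay 30 10 10" prints 70. On a real system the two limits
would be read from info/MB/min_bw and info/MB/bw_gran.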