public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Reinette Chatre <reinette.chatre@intel.com>
To: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
Cc: fenghua.yu@intel.com, shuah@kernel.org, tony.luck@intel.com,
	peternewman@google.com, babu.moger@amd.com,
	"Maciej Wieczór-Retman" <maciej.wieczor-retman@intel.com>,
	linux-kselftest@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
Date: Fri, 6 Sep 2024 17:05:29 -0700	[thread overview]
Message-ID: <a2c7baf1-1c69-45a5-8755-38d152d33fae@intel.com> (raw)
In-Reply-To: <c3aa4289-40aa-c158-8fde-8a35a6002783@linux.intel.com>

Hi Ilpo,

On 9/6/24 1:44 AM, Ilpo Järvinen wrote:
> On Thu, 5 Sep 2024, Reinette Chatre wrote:
>> On 9/5/24 4:45 AM, Ilpo Järvinen wrote:
>>> On Wed, 4 Sep 2024, Reinette Chatre wrote:
>>>> On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
>>>>> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>>>>>> On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
>>>>>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>>>>>>>
>>>>>>>> The MBA test incrementally throttles memory bandwidth, each time
>>>>>>>> followed by a comparison between the memory bandwidth observed
>>>>>>>> by the performance counters and resctrl respectively.
>>>>>>>>
>>>>>>>> While a comparison between performance counters and resctrl is
>>>>>>>> generally appropriate, they do not have an identical view of
>>>>>>>> memory bandwidth. For example RAS features or memory performance
>>>>>>>> features that generate memory traffic may drive accesses that are
>>>>>>>> counted differently by performance counters and MBM respectively,
>>>>>>>> for instance generating "overhead" traffic which is not counted
>>>>>>>> against any specific RMID. As a ratio, this different view of
>>>>>>>> memory
>>>>>>>> bandwidth becomes more apparent at low memory bandwidths.
>>>>>>>
>>>>>>> Interesting.
>>>>>>>
>>>>>>> I did some time back prototype with a change to MBM test such that
>>>>>>> instead
>>>>>>> of using once=false I changed fill_buf to be able to run N passes
>>>>>>> through
>>>>>>> the buffer which allowed me to know how many reads were performed by
>>>>>>> the
>>>>>>> benchmark. This yielded numerical difference between all those 3
>>>>>>> values
>>>>>>> (# of reads, MBM, perf) which also varied from arch to another so it
>>>>>>> didn't end up making an usable test.
>>>>>>>
>>>>>>> I guess I now have an explanation for at least a part of the
>>>>>>> differences.
>>>>>>>
>>>>>>>> It is not practical to enable/disable the various features that
>>>>>>>> may generate memory bandwidth to give performance counters and
>>>>>>>> resctrl an identical view. Instead, do not compare performance
>>>>>>>> counters and resctrl view of memory bandwidth when the memory
>>>>>>>> bandwidth is low.
>>>>>>>>
>>>>>>>> Bandwidth throttling behaves differently across platforms
>>>>>>>> so it is not appropriate to drop measurement data simply based
>>>>>>>> on the throttling level. Instead, use a threshold of 750MiB
>>>>>>>> that has been observed to support adequate comparison between
>>>>>>>> performance counters and resctrl.
>>>>>>>>
>>>>>>>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>>>>>>>> ---
>>>>>>>>      tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
>>>>>>>>      tools/testing/selftests/resctrl/resctrl.h  | 6 ++++++
>>>>>>>>      2 files changed, 13 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> b/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> index cad473b81a64..204b9ac4b108 100644
>>>>>>>> --- a/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> +++ b/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long
>>>>>>>> *bw_imc,
>>>>>>>> unsigned long *bw_resc)
>>>>>>>>        		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
>>>>>>>>      		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
>>>>>>>> +		if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
>>>>>>>> THROTTLE_THRESHOLD) {
>>>>>>>> +			ksft_print_msg("Bandwidth below threshold (%d
>>>>>>>> MiB).
>>>>>>>> Dropping results from MBA schemata %u.\n",
>>>>>>>> +					THROTTLE_THRESHOLD,
>>>>>>>> +					ALLOCATION_MAX -
>>>>>>>> ALLOCATION_STEP *
>>>>>>>> allocation);
>>>>>>>
>>>>>>> The second one too should be %d.
>>>>>>>
>>>>>>
>>>>>> hmmm ... I intended to have it be consistent with the ksft_print_msg()
>>>>>> that
>>>>>> follows. Perhaps allocation can be made unsigned instead?
>>>>>
>>>>> If you go that way, then it would be good to make the related defines
>>>>> and
>>>>> allocation in mba_setup() unsigned too, although the latter is a bit
>>>>> scary
>>>>
>>>> Sure, will look into that.
>>>>
>>>>> because it does allocation -= ALLOCATION_STEP which could underflow if
>>>>> the
>>>>> defines are ever changed.
>>>>>
>>>>
>>>> Is this not already covered in the following check:
>>>> 	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
>>>> 		return END_OF_TESTS;
>>>>
>>>> We are talking about hardcoded constants though.
>>>
>>> Borderline yes. It is "covered" by the allocation > ALLOCATION_MAX but
>>> it's also very non-intuitive to let the value underflow and then check for
>>> that with the > operator.
>>
>> My understanding is that this is the traditional way of checking overflow
>> (or more accurately wraparound). There are several examples of this pattern
>> in the kernel and it is also the pattern that I understand Linus referred
>> to as "traditional" in [1]. Even the compiler's intrinsic overflow checkers
>> do checking in this way (perform the subtraction and then check if it
>> overflowed) [2].
> 
> Fair enough. I've never come across that kind of claim before.
> 
>>> You're correct that they're constants so one would need to tweak the
>>> source to end up into this condition in the first place.
>>>
>>> Perhaps I'm being overly pendantic here but I in general don't like
>>> leaving trappy and non-obvious logic like that lying around because one
>>> day one of such will come back biting.
>>
>> It is not clear to me what is "trappy" about this. The current checks will
>> catch the wraparound if somebody changes ALLOCATION_STEP in your scenario, no?
>>
>>> So, if a variable is unsigned and we ever do subtraction (or adding
>>> negative numbers to it), I'd prefer additional check to prevent ever
>>> underflowing it unexpectedly. Or leave them signed.
>>
>> To support checking at the time of subtraction we either need to change the
>> flow of that function since it cannot exit with failure if that subtraction
>> fails because of overflow/wraparound, or we need to introduce more state that
>> will be an additional check that the existing
>> "if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)"
>> would have caught anyway.
>>
>> In either case, to do this checking at the time of subtraction the ideal way
>> would be to use check_sub_overflow() ... but it again does exactly what
>> you find to be non-intuitive since the implementation in
>> tools/include/linux/overflow.h uses the gcc intrinsics that does subtraction
>> first and then checks if overflow occurred.
> 
> It's trappy because by glance, that check looks unnecessary since
> allocation starts from max and goes downwards. Also worth to note,
> the check is not immediately after the decrement but done on the next
> iteration.

Right. This is probably what causes most confusion.

Considering that, what do you think of below that avoids wraparound entirely:

---8<---
From: Reinette Chatre <reinette.chatre@intel.com>
Subject: [PATCH] selftests/resctrl: Make wraparound handling obvious

Within mba_setup() the programmed bandwidth delay value starts
at the maximum (100, or rather ALLOCATION_MAX) and progresses
towards ALLOCATION_MIN by decrementing with ALLOCATION_STEP.

The programmed bandwidth delay should never be negative, so
representing it with an unsigned int is most appropriate. This
may introduce confusion because of the "allocation > ALLOCATION_MAX"
check used to check wraparound of the subtraction.

Modify the mba_setup() flow to start at the minimum, ALLOCATION_MIN,
and incrementally, with ALLOCATION_STEP steps, adjust the
bandwidth delay value. This avoids wraparound while making the purpose
of "allocation > ALLOCATION_MAX" clear and eliminates the
need for the "allocation < ALLOCATION_MIN" check.

Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- New patch
---
  tools/testing/selftests/resctrl/mba_test.c | 12 +++++++-----
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index ab8496a4925b..947d5699f0c8 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -39,7 +39,8 @@ static int mba_setup(const struct resctrl_test *test,
  		     const struct user_params *uparams,
  		     struct resctrl_val_param *p)
  {
-	static int runs_per_allocation, allocation = 100;
+	static unsigned int allocation = ALLOCATION_MIN;
+	static int runs_per_allocation = 0;
  	char allocation_str[64];
  	int ret;
  
@@ -50,7 +51,7 @@ static int mba_setup(const struct resctrl_test *test,
  	if (runs_per_allocation++ != 0)
  		return 0;
  
-	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
+	if (allocation > ALLOCATION_MAX)
  		return END_OF_TESTS;
  
  	sprintf(allocation_str, "%d", allocation);
@@ -59,7 +60,7 @@ static int mba_setup(const struct resctrl_test *test,
  	if (ret < 0)
  		return ret;
  
-	allocation -= ALLOCATION_STEP;
+	allocation += ALLOCATION_STEP;
  
  	return 0;
  }
@@ -72,8 +73,9 @@ static int mba_measure(const struct user_params *uparams,
  
  static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
  {
-	int allocation, runs;
+	unsigned int allocation;
  	bool ret = false;
+	int runs;
  
  	ksft_print_msg("Results are displayed in (MB)\n");
  	/* Memory bandwidth from 100% down to 10% */
@@ -103,7 +105,7 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
  			       avg_diff_per > MAX_DIFF_PERCENT ?
  			       "Fail:" : "Pass:",
  			       MAX_DIFF_PERCENT,
-			       ALLOCATION_MAX - ALLOCATION_STEP * allocation);
+			       ALLOCATION_MIN + ALLOCATION_STEP * allocation);
  
  		ksft_print_msg("avg_diff_per: %d%%\n", avg_diff_per);
  		ksft_print_msg("avg_bw_imc: %lu\n", avg_bw_imc);


> 
> The risk here is that somebody removes allocation > ALLOCATION_MAX check.
> 
> Something called check_sub_overflow() is not subject to a similar risk
> regardless of what operations occur inside it.

Reinette

  reply	other threads:[~2024-09-07  0:05 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-29 22:52 [PATCH 0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests Reinette Chatre
2024-08-29 22:52 ` [PATCH 1/6] selftests/resctrl: Fix sparse warnings Reinette Chatre
2024-08-30 10:29   ` Ilpo Järvinen
2024-08-29 22:52 ` [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark Reinette Chatre
2024-08-30 10:56   ` Ilpo Järvinen
2024-08-30 16:00     ` Reinette Chatre
2024-09-04 11:57       ` Ilpo Järvinen
2024-09-04 21:15         ` Reinette Chatre
2024-09-05 12:10           ` Ilpo Järvinen
2024-09-05 18:08             ` Reinette Chatre
2024-09-06 10:00               ` Ilpo Järvinen
2024-09-07  0:05                 ` Reinette Chatre
2024-09-09 12:52                   ` Ilpo Järvinen
2024-08-29 22:52 ` [PATCH 3/6] selftests/resctrl: Simplify benchmark parameter passing Reinette Chatre
2024-08-30 11:13   ` Ilpo Järvinen
2024-08-30 16:01     ` Reinette Chatre
2024-08-29 22:52 ` [PATCH 4/6] selftests/resctrl: Use cache size to determine "fill_buf" buffer size Reinette Chatre
2024-08-30 11:25   ` Ilpo Järvinen
2024-08-30 16:00     ` Reinette Chatre
2024-08-29 22:52 ` [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth Reinette Chatre
2024-08-30 11:42   ` Ilpo Järvinen
2024-08-30 16:00     ` Reinette Chatre
2024-09-04 11:43       ` Ilpo Järvinen
2024-09-04 21:15         ` Reinette Chatre
2024-09-05 11:45           ` Ilpo Järvinen
2024-09-05 18:08             ` Reinette Chatre
2024-09-06  8:44               ` Ilpo Järvinen
2024-09-07  0:05                 ` Reinette Chatre [this message]
2024-09-09  8:13                   ` Ilpo Järvinen
2024-08-29 22:52 ` [PATCH 6/6] selftests/resctrl: Keep results from first test run Reinette Chatre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a2c7baf1-1c69-45a5-8755-38d152d33fae@intel.com \
    --to=reinette.chatre@intel.com \
    --cc=babu.moger@amd.com \
    --cc=fenghua.yu@intel.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=maciej.wieczor-retman@intel.com \
    --cc=peternewman@google.com \
    --cc=shuah@kernel.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox