From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: "Shaopeng Tan (Fujitsu)" <tan.shaopeng@fujitsu.com>
Cc: "linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Fenghua Yu <fenghua.yu@intel.com>, Shuah Khan <shuah@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH v2 21/24] selftests/resctrl: Read in less obvious order to defeat prefetch optimizations
Date: Wed, 14 Jun 2023 16:02:43 +0300 (EEST)
Message-ID: <b7dfc9b-74da-5fe2-9060-fd36eb636c6@linux.intel.com>
In-Reply-To: <TYAPR01MB6330025B5E6537F94DA49ACB8B499@TYAPR01MB6330.jpnprd01.prod.outlook.com>

On Thu, 1 Jun 2023, Shaopeng Tan (Fujitsu) wrote:
> > > > When reading memory in order, HW prefetching optimizations will
> > > > interfere with measuring how caches and memory are being accessed.
> > > > This adds noise into the results.
> > > >
> > > > Change the fill_buf reading loop to not use an obvious in-order
> > > > access using multiply by a prime and modulo.
> > > >
> > > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > > > ---
> > > >  tools/testing/selftests/resctrl/fill_buf.c | 17 ++++++++++-------
> > > >  1 file changed, 10 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
> > > > index 7e0d3a1ea555..049a520498a9 100644
> > > > --- a/tools/testing/selftests/resctrl/fill_buf.c
> > > > +++ b/tools/testing/selftests/resctrl/fill_buf.c
> > > > @@ -88,14 +88,17 @@ static void *malloc_and_init_memory(size_t s)
> > > >
> > > >  static int fill_one_span_read(unsigned char *start_ptr, unsigned char *end_ptr)
> > > >  {
> > > > -	unsigned char sum, *p;
> > > > -
> > > > +	unsigned int size = (end_ptr - start_ptr) / (CL_SIZE / 2);
> > > > +	unsigned int count = size;
> > > > +	unsigned char sum;
> > > > +
> > > > +	/*
> > > > +	 * Read the buffer in an order that is unexpected by HW prefetching
> > > > +	 * optimizations to prevent them interfering with the caching pattern.
> > > > +	 */
> > > >  	sum = 0;
> > > > -	p = start_ptr;
> > > > -	while (p < end_ptr) {
> > > > -		sum += *p;
> > > > -		p += (CL_SIZE / 2);
> > > > -	}
> > > > +	while (count--)
> > > > +		sum += start_ptr[((count * 59) % size) * CL_SIZE / 2];
> > >
> > > Could you please elaborate why 59 is used?
> > 
> > The main reason is that it's a prime number ensuring the whole buffer gets read.
> > I picked something that doesn't make it to wrap on almost every iteration.
> 
> Thanks for your explanation. It seems there is no problem.
> 
> Perhaps you have already tested this patch in your environment and got a test result of "ok". 
> Because HW prefetching does not work well,
> the IMC counter fluctuates a lot in my environment,
> and the test result is "not ok". 
> 
> In order to ensure this test set runs in any environments and gets "ok",
> would you consider changing the value of MAX_DIFF_PERCENT of each test?
> or changing something else?
> 
> ```
> Environment:
>  Kernel: 6.4.0-rc2
>  CPU: Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz
> 
> Test result(MBM as an example):
> # # Starting MBM BW change ...
> # # Mounting resctrl to "/sys/fs/resctrl"
> # # Benchmark PID: 8671
> # # Writing benchmark parameters to resctrl FS
> # # Write schema "MB:0=100" to resctrl FS
> # # Checking for pass/fail
> # # Fail: Check MBM diff within 5%
> # # avg_diff_per: 9%
> # # Span in bytes: 262144000
> # # avg_bw_imc: 6202
> # # avg_bw_resc: 5585
> # not ok 1 MBM: bw change
> ```

Could you check whether the approach below works better? I think it should
apply cleanly on top of the fixes+cleanups v3 series which you recently
tested; the other CAT test changes are not needed.

The biggest differences in result stability in my tests come from these
factors:
- Removed reversed index order.
- Open-coded the modulo in the loop to subtraction.

In addition, I changed the prime to one which works slightly better than 
59. The MBM/MBA results were already <5% with 59 too due to the other two 
changes, but using 23 lowered them further in my tests (with Platinum 
8260L).

---
From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[PATCH] selftests/resctrl: Read in less obvious order to defeat prefetch optimizations

When reading memory in order, HW prefetching optimizations will
interfere with measuring how caches and memory are being accessed. This
adds noise into the results.

Change the fill_buf reading loop to not use an obvious in-order access
using multiply by a prime and modulo.

Using a prime multiplier with modulo ensures the entire buffer is
eventually read, provided the number of indexes is not a multiple of
the prime. 23 is small enough that wrapping does not occur very
frequently while the reads are still spread out (wrapping too often can
trigger L2 hits more frequently, which adds noise to the test because
the data then does not have to be fetched from LLC).

It was discovered that not all primes work equally well and some can
cause wildly unstable results (e.g., in an earlier version of this
patch, the reads were done in reversed order and 59 was used as the
prime resulting in unacceptably high and unstable results in MBA and
MBM test on some architectures).

Link: https://lore.kernel.org/linux-kselftest/TYAPR01MB6330025B5E6537F94DA49ACB8B499@TYAPR01MB6330.jpnprd01.prod.outlook.com/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

---
 tools/testing/selftests/resctrl/fill_buf.c | 38 +++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index f9893edda869..afde37d3fca0 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -74,16 +74,38 @@ static void *malloc_and_init_memory(size_t buf_size)
 	return p;
 }
 
+/*
+ * Buffer index step advance to work around HW prefetching interfering with
+ * the measurements.
+ *
+ * Must be a prime to step through all indexes of the buffer.
+ *
+ * Some primes work better than others on some architectures (from MBA/MBM
+ * result stability point of view).
+ */
+#define FILL_IDX_MULT	23
+
 static int fill_one_span_read(unsigned char *buf, size_t buf_size)
 {
-	unsigned char *end_ptr = buf + buf_size;
-	unsigned char sum, *p;
-
-	sum = 0;
-	p = buf;
-	while (p < end_ptr) {
-		sum += *p;
-		p += (CL_SIZE / 2);
+	unsigned int size = buf_size / (CL_SIZE / 2);
+	unsigned int i, idx = 0;
+	unsigned char sum = 0;
+
+	/*
+	 * Read the buffer in an order that is unexpected by HW prefetching
+	 * optimizations to prevent them interfering with the caching pattern.
+	 *
+	 * The read order is (in terms of halves of cachelines):
+	 *	i * FILL_IDX_MULT % size
+	 * The formula is open-coded below to avoid modulo inside the loop
+	 * as it improves MBA/MBM result stability on some architectures.
+	 */
+	for (i = 0; i < size; i++) {
+		sum += buf[idx * (CL_SIZE / 2)];
+
+		idx += FILL_IDX_MULT;
+		while (idx >= size)
+			idx -= size;
 	}
 
 	return sum;

-- 
tg: (68d2d0512b91..) refactor/read-fuzzing (depends on: refactor/remove-test-globals)
