public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: fenghua.yu@intel.com, shuah@kernel.org, tony.luck@intel.com,
	peternewman@google.com, babu.moger@amd.com,
	"Maciej Wieczór-Retman" <maciej.wieczor-retman@intel.com>,
	linux-kselftest@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark
Date: Mon, 9 Sep 2024 15:52:49 +0300 (EEST)	[thread overview]
Message-ID: <ccc7579f-2633-adda-2a8e-235021c5c785@linux.intel.com> (raw)
In-Reply-To: <3325de04-6e59-4c7a-ae44-c245c8edb93f@intel.com>

[-- Attachment #1: Type: text/plain, Size: 9702 bytes --]

On Fri, 6 Sep 2024, Reinette Chatre wrote:
> On 9/6/24 3:00 AM, Ilpo Järvinen wrote:
> > On Thu, 5 Sep 2024, Reinette Chatre wrote:
> > > On 9/5/24 5:10 AM, Ilpo Järvinen wrote:
> > > > On Wed, 4 Sep 2024, Reinette Chatre wrote:
> > > > > On 9/4/24 4:57 AM, Ilpo Järvinen wrote:
> > > > > > On Fri, 30 Aug 2024, Reinette Chatre wrote:
> > > > > > > On 8/30/24 3:56 AM, Ilpo Järvinen wrote:
> > > > > > > > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> > > > 
> > > > > > > > > @@ -699,111 +639,80 @@ int resctrl_val(const struct
> > > > > > > > > resctrl_test
> > > > > > > > > *test,
> > > > > > > > >      		return ret;
> > > > > > > > >      	}
> > > > > > > > >      -	/*
> > > > > > > > > -	 * If benchmark wasn't successfully started by child,
> > > > > > > > > then
> > > > > > > > > child
> > > > > > > > > should
> > > > > > > > > -	 * kill parent, so save parent's pid
> > > > > > > > > -	 */
> > > > > > > > >      	ppid = getpid();
> > > > > > > > >      -	if (pipe(pipefd)) {
> > > > > > > > > -		ksft_perror("Unable to create pipe");
> > > > > > > > > +	/* Taskset test to specified CPU. */
> > > > > > > > > +	ret = taskset_benchmark(ppid, uparams->cpu,
> > > > > > > > > &old_affinity);
> > > > > > > > 
> > > > > > > > Previously only CPU affinity for bm_pid was set but now it's set
> > > > > > > > before
> > > > > > > > fork(). Quickly checking the Internet, it seems that CPU
> > > > > > > > affinity
> > > > > > > > gets
> > > > > > > > inherited on fork() so now both processes will have the same
> > > > > > > > affinity
> > > > > > > > which might make the other process to interfere with the
> > > > > > > > measurement.
> > > > > > > 
> > > > > > > Setting the affinity is intended to ensure that the buffer
> > > > > > > preparation
> > > > > > > occurs in the same topology as where the runtime portion will run.
> > > > > > > This preparation is done before the work to be measured starts.
> > > > > > > 
> > > > > > > This does tie in with the association with the resctrl group and I
> > > > > > > will elaborate more below ...
> > > > > > 
> > > > > > Okay, that's useful to retain but thinking this further, now we're
> > > > > > also
> > > > > > going do non-trivial amount of work in between the setup and the
> > > > > > test by
> > > > > 
> > > > > Could you please elaborate how the amount of work during setup can be
> > > > > an
> > > > > issue? I have been focused on the measurements that are done
> > > > > afterwards
> > > > > that do have clear boundaries from what I can tell.
> > > > 
> > > > Well, you used it as a justification: "Setting the affinity is intended
> > > > to ensure that the buffer preparation occurs in the same topology as
> > > > where
> > > > the runtime portion will run." So I assumed you had some expectations
> > > > about
> > > > "preparations" done outside of those "clear boundaries" but now you seem
> > > > to take entirely opposite stance?
> > > 
> > > I do not follow you here. With the "clear boundaries" I meant the
> > > measurements and associated variables that have  a clear start/init and
> > > stop/read while the other task sleeps. Yes, preparations are done outside
> > > of this since that should not be measured.
> > 
> > Of course the preparations are not measured (at least not after this
> > patch :-)).
> > 
> > But that's not what I meant. You said the preparations are to be done
> > using the same topology as the test but if it's outside of the measurement
> > period, the topology during preparations only matters if you make some
> > hidden assumption that some state carries from preparations to the actual
> > test. If no state carry-over is assumed, the topology during preparations
> > is not important. So which way it is? Is the topology during preparations
> > important or not?
> 
> Topology during preparations is important.
> 
> In the CMT test this is more relevant with the transition to using
> memflush = false. The preparation phase and measure phase uses the same
> cache alloc configuration and just as importantly, the same monitoring
> configuration. During preparation the cache portion that will be
> used during measurement will be filled with the contents of the buffer.
> During measurement it will be the same cache portion into which any new reads
> will be allocated and measured. In fact, the preparation phase will thus form
> part of the measurement. If preparation was done with different
> configuration, then I see a problem whether memflush = true as well as when
> memflush = false. In the case of memflush = true it will have the familiar
> issue of the test needing to "guess" the workload settle time. In the case
> of memflush = false the buffer will remain allocated into the cache portion
> used during preparation phase, when the workload runs it will read the
> data from a cache portion that does not belong to it and since it does
> not need to allocate into its own cache portion its LLC occupancy counts will
> not be accurate (the LLC occupancy associated with the buffer will be
> attributed
> to prepare portion).
> 
> I am not familiar with the details of memory allocation but as I understand
> Linux does attempt to satisfy memory requests from the node to which the
> requesting CPU is assigned. For the MBM and MBA tests I thus believe it is
> important to allocate the memory from where it will be used. I have
> encountered
> systems where CPU0 and CPU1 are on different sockets and by default the
> workload
> is set to run on CPU1. If the preparation phase runs on CPU0 then it may be
> that memory could be allocated from a different NUMA node than where the
> workload will
> be running. By doing preparation within the same topology as what the
> workload will be running I believe that memory will be allocated appropriate
> to workload and thus result in more reliable measurements.
> 
> > 
> > You used the topology argument to justify why both parent and child are
> > now in the same resctrl group unlike before when only the actual test was.
> > 
> > > You stated "now we're also going
> > > do non-trivial amount of work in between the setup and the test" ...
> > > could you clarify what the problem is with this? Before this work
> > > the "non-trivial amount of work" (for "fill_buf") was done as part of the
> > > test with (wrong) guesses about how long it takes. This work aims to
> > > improve
> > > this.
> > 
> > I understand why you're trying to do with this change.
> > 
> > However, I was trying to say that before preparations occurred right
> > before the test which is no longer the case because there's fork() in
> > between.
> 
> If by "test" you mean the measurement phase then in the case of "fill_buf"
> preparations only now reliably occur before the measurement. Original behavior
> is maintained with user provided benchmark.
> 
> > 
> > Does that extra work impact the state carry-over from preparations to the
> > test?
> 
> It is not clear to me what extra work or state you are referring to.
> 
> > 
> > > > fork() quite heavy operation as it has to copy various things including
> > > > the address space which you just made to contain a huge mem blob. :-)
> > > 
> > > As highlighted in a comment found in the patch, the kernel uses
> > > copy-on-write
> > > and all tests only read from the buffer after fork().
> > 
> > Write test is possible using -b fill_buf ... as mentioned in comments to
> > the other patch.
> 
> Yes, it is theoretically possible, but looking closer it is not supported.
> Note
> how measure_mem_bw() is always called with hardcoded "reads". Reading through
> the history of commits I do not believe modifying fill_buf parameters was
> ever intended to be a use case for the "-b" parameter. It seems intended
> to provide an alternate benchmark to fill_buf.
> 
> I am not interested in adding support for the write test because I do not see
> how it helps us to improve resctrl subsystem health. Considering that
> nobody has ever complained about the write test being broken I think all
> that dead code should just be removed instead.
> 
> I prefer the focus be on the health of the resctrl subsystem instead of
> additional
> hardware performance testing. I do not think it is energy well spent to
> further
> tune these performance tests unless it benefits resctrl subsystem health.
> These
> performance tests are difficult to maintain and I have not yet seen how they
> benefit the resctrl subsystem.
> 
> > > > BTW, perhaps we could use some lighter weighted fork variant in the
> > > > resctrl selftests, the processes don't really need to be that separated
> > > > to justify using full fork() (and custom benchmarks will do execvp()).
> > > 
> > > Are you thinking about pthreads? It is not clear to me that this is
> > > necessary. It is also not clear to me what problem you are describing that
> > > needs this solution. When I understand that better it will be easier to
> > > discuss solutions.
> > 
> > I was trying to say I see advantage of retaining the address space which
> > is not possible with fork(), so perhaps using clone() with CLONE_VM would
> > be useful and maybe some other flags too. I think this would solve the
> > write test case.
> 
> clone() brings with it the complexity of needing to manage the child's
> stack. This again aims for a performance improvement. What does this fix?
> What is the benefit to resctrl health? I would prefer that our energy
> instead be focused on improving resctrl subsystem health.

Fair enough.

-- 
 i.

  reply	other threads:[~2024-09-09 12:52 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-29 22:52 [PATCH 0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests Reinette Chatre
2024-08-29 22:52 ` [PATCH 1/6] selftests/resctrl: Fix sparse warnings Reinette Chatre
2024-08-30 10:29   ` Ilpo Järvinen
2024-08-29 22:52 ` [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark Reinette Chatre
2024-08-30 10:56   ` Ilpo Järvinen
2024-08-30 16:00     ` Reinette Chatre
2024-09-04 11:57       ` Ilpo Järvinen
2024-09-04 21:15         ` Reinette Chatre
2024-09-05 12:10           ` Ilpo Järvinen
2024-09-05 18:08             ` Reinette Chatre
2024-09-06 10:00               ` Ilpo Järvinen
2024-09-07  0:05                 ` Reinette Chatre
2024-09-09 12:52                   ` Ilpo Järvinen [this message]
2024-08-29 22:52 ` [PATCH 3/6] selftests/resctrl: Simplify benchmark parameter passing Reinette Chatre
2024-08-30 11:13   ` Ilpo Järvinen
2024-08-30 16:01     ` Reinette Chatre
2024-08-29 22:52 ` [PATCH 4/6] selftests/resctrl: Use cache size to determine "fill_buf" buffer size Reinette Chatre
2024-08-30 11:25   ` Ilpo Järvinen
2024-08-30 16:00     ` Reinette Chatre
2024-08-29 22:52 ` [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth Reinette Chatre
2024-08-30 11:42   ` Ilpo Järvinen
2024-08-30 16:00     ` Reinette Chatre
2024-09-04 11:43       ` Ilpo Järvinen
2024-09-04 21:15         ` Reinette Chatre
2024-09-05 11:45           ` Ilpo Järvinen
2024-09-05 18:08             ` Reinette Chatre
2024-09-06  8:44               ` Ilpo Järvinen
2024-09-07  0:05                 ` Reinette Chatre
2024-09-09  8:13                   ` Ilpo Järvinen
2024-08-29 22:52 ` [PATCH 6/6] selftests/resctrl: Keep results from first test run Reinette Chatre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ccc7579f-2633-adda-2a8e-235021c5c785@linux.intel.com \
    --to=ilpo.jarvinen@linux.intel.com \
    --cc=babu.moger@amd.com \
    --cc=fenghua.yu@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=maciej.wieczor-retman@intel.com \
    --cc=peternewman@google.com \
    --cc=reinette.chatre@intel.com \
    --cc=shuah@kernel.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox