From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Nitin Tekchandani <nitin.tekchandani@intel.com>,
	Waiman Long <longman@redhat.com>, Yu Chen <yu.c.chen@intel.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Make tg->load_avg per node
Date: Thu, 20 Apr 2023 16:52:01 -0400
Message-ID: <20230420205201.36fphk5g3aolryjh@parnassus.localdomain>
In-Reply-To: <20230412120736.GD628377@hirez.programming.kicks-ass.net>

On Wed, Apr 12, 2023 at 02:07:36PM +0200, Peter Zijlstra wrote:
> On Thu, Mar 30, 2023 at 01:45:57PM -0400, Daniel Jordan wrote:
> 
> > The topology of my machine is different from yours, but it's the biggest
> > I have, and I'm assuming cpu count is more important than topology when
> > reproducing the remote accesses.  I also tried on
> 
> Core count definitely matters some, but the thing that really hurts is
> the cross-node (and cross-cache, which for intel happens to be the same
> set) atomics.
> 
> I suppose the thing to measure is where this cost rises most sharply on
> the AMD platforms -- is that cross LLC or cross Node?
> 
> I mean, setting up the split at boot time is fairly straight forward and
> we could equally well split at LLC.
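
For reference, the atomics in question live in update_tg_load_avg(),
which every cfs_rq on every CPU uses to funnel its load delta into the
single shared tg->load_avg, and in calc_group_shares(), which
update_cfs_group() uses to read the total back.  Roughly, from
mainline fair.c (trimmed):

    static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
    {
            long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;

            /* the /64 filter limits, but doesn't eliminate, the updates */
            if (abs(delta) > cfs_rq->tg_load_avg_contrib / 64) {
                    atomic_long_add(delta, &cfs_rq->tg->load_avg);
                    cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
            }
    }

With enough task churn, that one cacheline ping-pongs between all the
writers and readers, and the cost grows with the distance between them
(cross-LLC, then cross-node).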

To check the cross-LLC case, I bound all postgres and sysbench tasks
to a single node.  Neither function is free then on either AMD or
Intel, whether or not the node has multiple LLCs, but the pain is a
bit greater in the cross-node (unbound) case.

The read side (update_cfs_group) gets more expensive with per-node tg
load_avg on AMD, especially cross node (the unbound rows); those are
the biggest diffs.
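
That tracks with the shape of the change: with per-node counters the
write side only has to touch the local node's counter, but the read
side now has to sum over every node to get the total, so reads get
slower as writes get cheaper.  A minimal sketch of the idea (the
helper and field names here are mine, not necessarily the RFC's):

    /* write side: the delta lands in the caller's local node counter */
    static inline void tg_add_load_avg(struct task_group *tg, long delta)
    {
            atomic_long_add(delta, &tg->node_load_avg[numa_node_id()]);
    }

    /* read side: the total is now a sum over all nodes' counters */
    static inline long tg_load_avg(struct task_group *tg)
    {
            long load = 0;
            int node;

            for_each_node(node)
                    load += atomic_long_read(&tg->node_load_avg[node]);

            return load;
    }

where tg->node_load_avg is an assumed per-node array of atomic_long_t.
On a two-node box every read of the total touches one remote
cacheline, which lines up with update_cfs_group showing its biggest
regressions in the unbound rows.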

These are more containerized sysbench runs, just the same as before.
Base is 6.2, test is 6.2 plus this RFC.  Each number under base or
test is the average over ten runs of the function's profile
percentage, sampled for 5 seconds starting 60 seconds into the run.  I
ran the whole experiment a second time, and the numbers were fairly
similar to what's below.

AMD EPYC 7J13 64-Core Processor (NPS1)
    2 sockets * 64 cores * 2 threads = 256 CPUs

                      update_load_avg profile%    update_cfs_group profile%        
affinity  nr_threads          base  test  diff             base  test  diff
 unbound          96           0.7   0.6  -0.1              0.3   0.6   0.4
 unbound         128           0.8   0.7   0.0              0.3   0.7   0.4
 unbound         160           2.4   1.7  -0.7              1.2   2.3   1.1
 unbound         192           2.3   1.7  -0.6              0.9   2.4   1.5
 unbound         224           0.9   0.9   0.0              0.3   0.6   0.3
 unbound         256           0.4   0.4   0.0              0.1   0.2   0.1
   node0          48           0.7   0.6  -0.1              0.3   0.6   0.3
   node0          64           0.7   0.7  -0.1              0.3   0.6   0.3
   node0          80           1.4   1.3  -0.1              0.3   0.6   0.3
   node0          96           1.5   1.4  -0.1              0.3   0.6   0.3
   node0         112           0.8   0.8   0.0              0.2   0.4   0.2
   node0         128           0.4   0.4   0.0              0.1   0.2   0.1
   node1          48           0.7   0.6  -0.1              0.3   0.6   0.3
   node1          64           0.7   0.6  -0.1              0.3   0.6   0.3
   node1          80           1.4   1.2  -0.1              0.3   0.6   0.3
   node1          96           1.4   1.3  -0.2              0.3   0.6   0.3
   node1         112           0.8   0.7  -0.1              0.2   0.3   0.2
   node1         128           0.4   0.4   0.0              0.1   0.2   0.1
                                                                                             
Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
    2 sockets * 32 cores * 2 threads = 128 CPUs
                                                                                             
                      update_load_avg profile%    update_cfs_group profile%        
affinity  nr_threads          base  test  diff             base  test  diff
 unbound          48           0.4   0.4   0.0              0.4   0.5   0.1
 unbound          64           0.5   0.5   0.0              0.5   0.6   0.1
 unbound          80           2.0   1.8  -0.2              2.7   2.4  -0.3
 unbound          96           3.3   2.8  -0.5              3.6   3.3  -0.3
 unbound         112           2.8   2.6  -0.2              4.1   3.3  -0.8
 unbound         128           0.4   0.4   0.0              0.4   0.4   0.1
   node0          24           0.4   0.4   0.0              0.3   0.5   0.2
   node0          32           0.5   0.5   0.0              0.3   0.4   0.2
   node0          40           1.0   1.1   0.1              0.7   0.8   0.1
   node0          48           1.5   1.6   0.1              0.8   0.9   0.1
   node0          56           1.8   1.9   0.1              0.8   0.9   0.1
   node0          64           0.4   0.4   0.0              0.2   0.4   0.1
   node1          24           0.4   0.5   0.0              0.3   0.5   0.2
   node1          32           0.4   0.5   0.0              0.3   0.5   0.2
   node1          40           1.0   1.1   0.0              0.7   0.8   0.1
   node1          48           1.6   1.6   0.1              0.8   0.9   0.1
   node1          56           1.8   1.9   0.1              0.8   0.9   0.1
   node1          64           0.4   0.4   0.0              0.2   0.4   0.1

Thread overview: 33+ messages
2023-03-27  5:39 [RFC PATCH] sched/fair: Make tg->load_avg per node Aaron Lu
2023-03-27 14:45 ` Chen Yu
2023-03-28  6:42   ` Aaron Lu
2023-03-28 12:09 ` Dietmar Eggemann
2023-03-28 12:56   ` Aaron Lu
2023-03-29 12:36     ` Dietmar Eggemann
2023-03-29 13:54       ` Aaron Lu
2023-03-30 17:45         ` Daniel Jordan
2023-03-30 19:51           ` Daniel Jordan
2023-03-31  4:06             ` Aaron Lu
2023-03-31 15:48               ` Dietmar Eggemann
2023-04-03  7:53                 ` Aaron Lu
2023-04-05 21:04               ` Daniel Jordan
2023-04-12 12:07           ` Peter Zijlstra
2023-04-20 20:52             ` Daniel Jordan [this message]
2023-04-21 15:05               ` Aaron Lu
2023-05-03 19:41                 ` Daniel Jordan
2023-05-04 10:27                   ` Aaron Lu
2023-05-16  7:50                     ` Aaron Lu
2023-05-16  8:57                       ` Chen Yu
2023-05-16 11:32                         ` Aaron Lu
2023-03-29 14:55       ` Chen Yu
2023-04-04  8:25 ` Chen Yu
2023-04-04 13:33   ` Aaron Lu
2023-04-04 15:15 ` Aaron Lu
2023-04-04 15:37   ` Chen Yu
2023-04-05 21:31   ` Daniel Jordan
2023-04-12 11:59 ` Peter Zijlstra
2023-04-12 13:58   ` Peter Zijlstra
2023-04-12 14:11     ` Aaron Lu
2023-04-12 14:01   ` Aaron Lu
2023-04-22  4:01 ` Chen Yu
2023-04-22  6:04   ` Aaron Lu
