From: Chen Yu <yu.c.chen@intel.com>
To: Aaron Lu <aaron.lu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Nitin Tekchandani <nitin.tekchandani@intel.com>,
	Waiman Long <longman@redhat.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] sched/fair: Make tg->load_avg per node
Date: Tue, 4 Apr 2023 16:25:04 +0800
Message-ID: <ZCve4JaH8EhxBcwQ@chenyu5-mobl1>
In-Reply-To: <20230327053955.GA570404@ziqianlu-desk2>

On 2023-03-27 at 13:39:55 +0800, Aaron Lu wrote:
> When using sysbench to benchmark Postgres in a single docker instance
> with sysbench's nr_threads set to nr_cpu, it is observed that at times
> update_cfs_group() and update_load_avg() show noticeable overhead on
> cpus of one node of a 2 sockets/112 cores/224 cpus Intel Sapphire Rapids:
> 
>     10.01%     9.86%  [kernel.vmlinux]        [k] update_cfs_group
>      7.84%     7.43%  [kernel.vmlinux]        [k] update_load_avg
> 
> While cpus of the other node normally see a lower cycle percentage:
> 
>      4.46%     4.36%  [kernel.vmlinux]        [k] update_cfs_group
>      4.02%     3.40%  [kernel.vmlinux]        [k] update_load_avg
> 
> perf annotate shows the cycles are mostly spent on accessing
> tg->load_avg, with update_load_avg() being the write side and
> update_cfs_group() being the read side.
> 
> The reason why only cpus of one node have bigger overhead is: task_group
> is allocated on demand from a slab, and whichever cpu happens to do the
> allocation, the allocated tg will be located on that cpu's node. Accessing
> tg->load_avg thus has a lower cost for cpus on the same node and a
> higher cost for cpus on the remote node.
> 
> Tim Chen told me that PeterZ once mentioned a way to solve a similar
> problem by making a counter per node, so do the same for tg->load_avg.
> After this change, the worst numbers I saw during a 5-minute run on
> both nodes are:
> 
>      2.77%     2.11%  [kernel.vmlinux]        [k] update_load_avg
>      2.72%     2.59%  [kernel.vmlinux]        [k] update_cfs_group
>
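For reference, below is a minimal sketch of the per-node counter idea
described above. The struct and the helper names tg_load_add() and
tg_load_sum() are made up for illustration; the actual patch splits
tg->load_avg itself into per-node entries in task_group:

#include <linux/atomic.h>
#include <linux/cache.h>
#include <linux/nodemask.h>
#include <linux/topology.h>

/*
 * Illustrative only: one cacheline-aligned accumulator per NUMA node,
 * e.g. allocated with kcalloc(nr_node_ids, sizeof(struct tg_load),
 * GFP_KERNEL), so writers only bounce a cacheline within their own node.
 */
struct tg_load {
	atomic_long_t load_avg ____cacheline_aligned_in_smp;
};

/* write side (cf. update_load_avg()): only touch this node's counter */
static inline void tg_load_add(struct tg_load *tl, long delta)
{
	atomic_long_add(delta, &tl[numa_node_id()].load_avg);
}

/* read side (cf. update_cfs_group()): sum the per-node counters */
static inline long tg_load_sum(struct tg_load *tl)
{
	long sum = 0;
	int node;

	for (node = 0; node < nr_node_ids; node++)
		sum += atomic_long_read(&tl[node].load_avg);

	return sum;
}

The tradeoff is that the read side now touches nr_node_ids cachelines
instead of one, which pays off when cross-node write traffic dominates,
as in the profiles above.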
The same issue was found when running netperf on this platform.
According to the perf profile:

11.90%    11.84%  swapper          [kernel.kallsyms]   [k] update_cfs_group
 9.79%     9.43%  swapper          [kernel.kallsyms]   [k] update_load_avg

these two functions took quite some cycles.

The test setup:
1. cpufreq governor set to performance, turbo disabled, C6 disabled
2. launch 224 instances of netperf, where each instance is:
   netperf -4 -H 127.0.0.1 -t UDP_RR/TCP_RR -c -C -l 100 &
3. perf record -ag sleep 4

The test script can also be downloaded from
https://github.com/yu-chen-surf/schedtests.git


thanks,
Chenyu

