public inbox for linux-kernel@vger.kernel.org
From: Chen Yu <yu.c.chen@intel.com>
To: Aaron Lu <aaron.lu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	"Valentin Schneider" <vschneid@redhat.com>,
	Tim Chen <tim.c.chen@intel.com>,
	"Nitin Tekchandani" <nitin.tekchandani@intel.com>,
	Waiman Long <longman@redhat.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] sched/fair: Make tg->load_avg per node
Date: Tue, 4 Apr 2023 23:37:29 +0800
Message-ID: <ZCxEOVBO3kPGF4FU@chenyu5-mobl1>
In-Reply-To: <20230404151540.GA51499@ziqianlu-desk2>

On 2023-04-04 at 23:15:40 +0800, Aaron Lu wrote:
> On Mon, Mar 27, 2023 at 01:39:55PM +0800, Aaron Lu wrote:
> [...]
> > Another observation about this workload: it has a lot of wakeup-time
> > task migrations, and that is the reason why update_load_avg() and
> > update_cfs_group() show noticeable cost. Running this workload in an
> > N-instance setup, where N >= 2 and sysbench's nr_threads is set to
> > 1/N of nr_cpu, greatly reduces task migrations at wakeup time, and the
> > overhead from the two functions mentioned above also drops a lot. It's
> > not yet clear to me why running multiple instances reduces task
> > migrations on the wakeup path.
> 
> Regarding this observation, I have some findings. The TL;DR is: the
> 1-instance setup's overall CPU utilization is lower than that of the
> N >= 2 instance setups. As a result, under the 1-instance setup, sis()
> (select_idle_sibling()) is more likely to find idle CPUs, and that is
> the reason why the 1-instance setup has more migrations.
> 
> More details:
> 
> For the 1-instance setup with nr_thread=nr_cpu=224, during a 5s window,
> there are 10 million calls of select_idle_sibling() and 6.1 million
> migrations. Of these migrations, 4.6 million come from select_idle_cpu()
> and 1.3 million come from recent_cpu.
> mpstat of this time window:
> Average:    NODE    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> Average:     all   45.15    0.00   18.59    0.00    0.00   17.29    0.00    0.00    0.00   18.98
> Average:       0   38.14    0.00   17.29    0.00    0.00   14.77    0.00    0.00    0.00   29.80
> Average:       1   52.07    0.00   19.88    0.00    0.00   19.78    0.00    0.00    0.00    8.28
> 
> 
> For the 4-instance setup with nr_thread=56, during a 5s window, there
> are 15 million calls of select_idle_sibling() and only 30k migrations.
> select_idle_cpu() is called 15 million times, but only 23k of those
> calls passed the sd_share->nr_idle_scan != 0 test.
> mpstat of this time window:
> Average:    NODE    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> Average:     all   68.54    0.00   21.54    0.00    0.00    8.35    0.00    0.00    0.00    1.58
> Average:       0   70.05    0.00   20.92    0.00    0.00    8.17    0.00    0.00    0.00    0.87
> Average:       1   67.03    0.00   22.16    0.00    0.00    8.53    0.00    0.00    0.00    2.29
> 
> For the 8-instance setup with nr_thread=28, during a 5s window, there
> are 16 million calls of select_idle_sibling() and 9.6k migrations.
> select_idle_cpu() is called 16 million times, but none of those calls
> passed the sd_share->nr_idle_scan != 0 test.
> mpstat of this time window:
> Average:    NODE    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> Average:     all   70.29    0.00   20.99    0.00    0.00    8.28    0.00    0.00    0.00    0.43
> Average:       0   71.58    0.00   19.98    0.00    0.00    8.04    0.00    0.00    0.00    0.40
> Average:       1   69.00    0.00   22.01    0.00    0.00    8.52    0.00    0.00    0.00    0.47
> 
> On a side note: when sd_share->nr_idle_scan > 0 and has_idle_core is
> true, sd_share->nr_idle_scan is not actually respected. Is this intended?
> It seems to say: if there is an idle core, then let's try hard and ignore
> SIS_UTIL to find that idle core, right?
Yes, SIS_UTIL inherits the logic of SIS_PROP, which honors has_idle_core and
scans at any cost. Abel previously proposed a patch to make this more aggressive
by not allowing SIS_UTIL to take effect even when the system is overloaded.
https://lore.kernel.org/lkml/20221019122859.18399-3-wuyun.abel@bytedance.com/
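
To make the behavior discussed above concrete, here is a rough user-space
sketch of the scan-depth decision as it is described in this thread. It is
only a model under stated assumptions: model_scan_depth() and llc_weight are
hypothetical names introduced for illustration, and this is not the kernel's
select_idle_cpu() code.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the scan-depth decision discussed in
 * this thread. Names are illustrative only; this is not kernel code.
 *
 * Returns how many CPUs the wakeup path may examine in an LLC domain
 * that contains llc_weight CPUs.
 */
static int model_scan_depth(bool has_idle_core, int nr_idle_scan,
			    int llc_weight)
{
	/* nr_idle_scan == 0: the LLC is treated as overloaded, no scan. */
	if (nr_idle_scan == 0)
		return 0;

	/* An idle core is believed to exist: the budget is ignored. */
	if (has_idle_core)
		return llc_weight;

	/* Otherwise the budget is honored, capped at the LLC size. */
	return nr_idle_scan < llc_weight ? nr_idle_scan : llc_weight;
}
```

With a budget of 4 and a hypothetical 112-CPU LLC, the model scans all 112
CPUs when has_idle_core is true and only 4 when it is false. With a budget
of 0 it skips the scan in both cases, which is the case the linked patch
proposes to change.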

thanks,
Chenyu

