Re: [RFC PATCH] sched/fair: Make tg->load_avg per node

From: Chen Yu <yu.c.chen@intel.com>
To: Aaron Lu <aaron.lu@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	"Daniel Bristot de Oliveira" <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Nitin Tekchandani <nitin.tekchandani@intel.com>,
	Waiman Long <longman@redhat.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] sched/fair: Make tg->load_avg per node
Date: Tue, 16 May 2023 16:57:52 +0800
Message-ID: <ZGNFkDkyipat5J8v@chenyu5-mobl1>
In-Reply-To: <20230516075011.GA780@ziqianlu-desk2>

On 2023-05-16 at 15:50:11 +0800, Aaron Lu wrote:
> On Thu, May 04, 2023 at 06:27:46PM +0800, Aaron Lu wrote:
> > Based on my current understanding, the summary is:
> > - Running this workload with nr_thread=224 on SPR, the ingress queue
> >   will overflow and that will slow things down. This patch helps
> >   performance mainly because it transforms the "many cpus accessing the
> >   same cacheline" scenario into "many cpus accessing two cachelines" and
> >   that can reduce the likelihood of ingress queue overflow and thus
> >   helps performance;
> > - On Icelake with nr_threads high, but not so high as to cause
> >   100% cpu utilization, the two functions' cost will drop a little but
> >   performance did not improve (it actually regressed a little);
> > - On SPR when there is no ingress queue overflow, it's similar to
> >   Icelake: the two functions' cost will drop but performance did not
> >   improve.
> 
> More results when running hackbench and netperf on Sapphire Rapids as
> well as on 2 sockets Icelake and 2 sockets Cascade Lake.
> 
> The summary is:
> - on SPR, hackbench time reduced ~8% and netperf(UDP_RR/nr_thread=100%)
>   performance increased ~50%;
> - on Icelake, performance regressed about 1%-2% for postgres_sysbench
>   and hackbench, netperf has no performance change;
> - on Cascade Lake, netperf/UDP_RR/nr_thread=50% sees performance drop
>   ~3%; others have no performance change.
> 
> Together with results kindly collected by Daniel, it looks like this patch
> helps most for SPR while for other machines, it is either flat or
> regressed 1%-3% for some workloads. With these results, I'm thinking of an
> alternative solution to reduce the cost of accessing tg->load_avg.
> 
> There are two main reasons to access tg->load_avg. One is driven by
> pelt decay, which has a fixed frequency and is not a concern; the other
> is enqueue_entity/dequeue_entity triggered by task migration. The
> number of migrations can be unbounded, so the number of accesses to
> tg->load_avg can be huge. This frequent task migration is the problem for
> tg->load_avg. One thing I noticed is, on task migration, the load is
> carried from the old per-cpu cfs_rq to the new per-cpu cfs_rq. While
> the cfs_rq's load_avg and tg_load_avg_contrib should change accordingly
> to reflect this so that its corresponding sched entity can get a correct
> weight, the task group's load_avg should stay unchanged. So instead of
> removing a delta from tg->load_avg by the src cfs_rq and then adding the
> same delta to tg->load_avg by the target cfs_rq, the two updates to tg's
> load_avg could be avoided. With this change, the updates to tg->load_avg
> will be greatly reduced, the problem should be solved, and it is
> likely to be a win for most machines/workloads. Not sure if I understand
> this correctly? I'm going to pursue a solution based on this, feel free
> to let me know if you see anything wrong here, thanks.

Sounds good, but maybe I misunderstand it: if the task has been dequeued
for a long time and not enqueued yet, will tg->load_avg become out of date
because we do not update it? Or do you mean the task migration is a
frequent sleep-wakeup sequence?
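
To make the question concrete, here is a rough user-space sketch of the
two ways to account for a migration as described in the quoted text. It is
a toy model, not kernel code: every toy_* name, the shared_updates counter
and the simplified update rule are assumptions made up for illustration,
and the real update path in kernel/sched/fair.c is more involved.

```c
/* Toy user-space model, NOT kernel code: all toy_* names are illustrative. */
#include <assert.h>
#include <stdatomic.h>

struct toy_tg {
	atomic_long load_avg;     /* shared by all CPUs: the contended cacheline */
	long shared_updates;      /* number of writes made to load_avg */
};

struct toy_cfs_rq {
	struct toy_tg *tg;
	long load_avg;            /* per-cpu load of this cfs_rq */
	long tg_load_avg_contrib; /* load this cfs_rq last reported to tg */
};

/* Fold the cfs_rq's current load into tg->load_avg: one shared write. */
static void toy_update_tg_load_avg(struct toy_cfs_rq *cfs_rq)
{
	long delta = cfs_rq->load_avg - cfs_rq->tg_load_avg_contrib;

	if (delta == 0)
		return;
	atomic_fetch_add(&cfs_rq->tg->load_avg, delta);
	cfs_rq->tg->shared_updates++;
	cfs_rq->tg_load_avg_contrib = cfs_rq->load_avg;
}

/* Migration with two shared writes: src removes the delta, dst adds it. */
static void toy_migrate_two_updates(struct toy_cfs_rq *src,
				    struct toy_cfs_rq *dst, long load)
{
	assert(src != dst && src->tg == dst->tg);
	src->load_avg -= load;
	toy_update_tg_load_avg(src);
	dst->load_avg += load;
	toy_update_tg_load_avg(dst);
}

/*
 * Migration with no shared write: the load and the contribution move
 * between the two cfs_rqs together, so their sum in tg->load_avg is
 * already correct and tg->load_avg is left untouched.
 */
static void toy_migrate_no_tg_update(struct toy_cfs_rq *src,
				     struct toy_cfs_rq *dst, long load)
{
	assert(src != dst && src->tg == dst->tg);
	src->load_avg -= load;
	src->tg_load_avg_contrib -= load;
	dst->load_avg += load;
	dst->tg_load_avg_contrib += load;
}
```

In this model both variants end with the same tg->load_avg, but only the
first one writes to the shared counter. It also shows where my question
comes from: the second variant is only balanced if the dequeue on the src
cfs_rq and the enqueue on the target cfs_rq happen as a pair. If only the
dequeue half has happened, tg->load_avg still includes the departed load
until something else updates it.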

thanks,
Chenyu


