From: Jason Low <jason.low2@hp.com>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Paul Turner <pjt@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>, Ben Segall <bsegall@google.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] sched: Reduce contention in update_cfs_rq_blocked_load
Date: Thu, 28 Aug 2014 12:46:36 -0700 [thread overview]
Message-ID: <1409255196.4945.7.camel@j-VirtualBox> (raw)
In-Reply-To: <1409182369.27939.9.camel@schen9-desk2.jf.intel.com>
On Wed, 2014-08-27 at 16:32 -0700, Tim Chen wrote:
> On Wed, 2014-08-27 at 10:34 -0700, Jason Low wrote:
> > On Tue, 2014-08-26 at 16:24 -0700, Paul Turner wrote:
> > > On Tue, Aug 26, 2014 at 4:11 PM, Jason Low <jason.low2@hp.com> wrote:
> > > > Based on perf profiles, the update_cfs_rq_blocked_load function constantly
> > > > shows up as taking up a noticeable % of system run time. This is especially
> > > > apparent on larger numa systems.
> > > >
> > > > Much of the contention is in __update_cfs_rq_tg_load_contrib when we're
> > > > updating the tg load contribution stats. However, it was noticed that the
> > > > values often don't get modified by much. In fact, much of the time, they
> > > > don't get modified at all. However, the update can always get attempted due
> > > > to force_update.
> > > >
> > > > In this patch, we remove the force_update in only the
> > > > __update_cfs_rq_tg_load_contrib. Thus the tg load contrib stats now get
> > > > modified only if the delta is large enough (in the current code, they get
> > > > updated when the delta is larger than 12.5%). This is a way to rate-limit
> > > > the updates while largely keeping the values accurate.
> > > >
> > > > When testing this change with AIM7 workloads, we found that it was able to
> > > > reduce the overhead of the function by up to a factor of 20x.
> > >
> > > Looks reasonable.
> > >
> > > >
> > > > Cc: Yuyang Du <yuyang.du@intel.com>
> > > > Cc: Waiman Long <Waiman.Long@hp.com>
> > > > Cc: Mel Gorman <mgorman@suse.de>
> > > > Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
> > > > Cc: Rik van Riel <riel@redhat.com>
> > > > Cc: Aswin Chandramouleeswaran <aswin@hp.com>
> > > > Cc: Chegu Vinod <chegu_vinod@hp.com>
> > > > Cc: Scott J Norton <scott.norton@hp.com>
> > > > Signed-off-by: Jason Low <jason.low2@hp.com>
> > > > ---
> > > > kernel/sched/fair.c | 10 ++++------
> > > > 1 files changed, 4 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index fea7d33..7a6e18b 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -2352,8 +2352,7 @@ static inline u64 __synchronize_entity_decay(struct sched_entity *se)
> > > > }
> > > >
> > > > #ifdef CONFIG_FAIR_GROUP_SCHED
> > > > -static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> > > > - int force_update)
> > > > +static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq)
> > > > {
> > > > struct task_group *tg = cfs_rq->tg;
> > > > long tg_contrib;
> > > > @@ -2361,7 +2360,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> > > > tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
> > > > tg_contrib -= cfs_rq->tg_load_contrib;
> > > >
> > > > - if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
> > >
> > > Another option with slightly higher accuracy would be to increase the
> > > sensitivity here when force_update == 1.
> > >
> > > E.g.:
> > > abs(tg_contrib) > cfs_rq->tg_load_contrib / (8 * (1 + force_update))) { ...
> > >
> > > Alternatively we could bound total inaccuracy:
> > > int divisor = force_update ? NR_CPUS : 8;
> > > if (abs(tg_contrib) > cfs_rq->tg_load_contrib / divisor) { ...
> > >
> > >
> > > [ And probably rename force_update to want_update ]
> >
> > Out of the 2 additional options, I think the first one is better. The
> > other option of using NR_CPUS looks like we're increasing the update
> > rate as the system gets larger, and its the larger systems that are
> > typically more affected by the contention.
>
> Probably num_present_cpus() will be better than NR_CPUS, which can
> be much larger than the actual cpus present.
Yeah, num_present_cpus(), though the same issue would still be there.
> >
> > Do you prefer either of the 2 other options over this v2 patch? If so, I
> > can test and send out a new patch, otherwise, I'll keep this current v2
> > patch.
>
> If there are multiple non-forced updates, option 1's error seems to
> accumulate and non-bounded as we do not actually update?
> Is this a concern?
It should be fine. Once the delta is large enough, we will end up doing
the update anyway.
Thanks.
next prev parent reply other threads:[~2014-08-28 19:46 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-26 23:11 [PATCH v2] sched: Reduce contention in update_cfs_rq_blocked_load Jason Low
2014-08-26 23:24 ` Paul Turner
2014-08-27 17:34 ` Jason Low
2014-08-27 23:32 ` Tim Chen
2014-08-28 19:46 ` Jason Low [this message]
2014-09-01 12:55 ` Peter Zijlstra
2014-09-02 7:41 ` Jason Low
2014-09-03 11:32 ` Peter Zijlstra
2014-09-09 14:52 ` [tip:sched/core] sched: Reduce contention in update_cfs_rq_blocked_load() tip-bot for Jason Low
2014-08-27 19:18 ` [PATCH RESULT] sched: Rewrite per entity runnable load average tracking v5 Yuyang Du
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1409255196.4945.7.camel@j-VirtualBox \
--to=jason.low2@hp.com \
--cc=bsegall@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=tim.c.chen@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.