From: Mel Gorman <mgorman@techsingularity.net>
To: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Matt Fleming <matt@codeblueprint.co.uk>,
Mike Galbraith <mgalbraith@suse.de>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/1] sched: Make schedstats a runtime tunable that is disabled by default v4
Date: Wed, 3 Feb 2016 13:32:46 +0000
Message-ID: <20160203133246.GQ8337@techsingularity.net>
In-Reply-To: <20160203124921.GA28953@gmail.com>
On Wed, Feb 03, 2016 at 01:49:21PM +0100, Ingo Molnar wrote:
>
> * Mel Gorman <mgorman@techsingularity.net> wrote:
>
> > On Wed, Feb 03, 2016 at 12:28:49PM +0100, Ingo Molnar wrote:
> > >
> > > * Mel Gorman <mgorman@techsingularity.net> wrote:
> > >
> > > > Changelog since v3
> > > > o Force enable stats during profiling and latencytop
> > > >
> > > > Changelog since V2
> > > > o Print stats that are not related to schedstat
> > > > o Reintroduce a static inline for update_stats_dequeue
> > > >
> > > > Changelog since V1
> > > > o Introduce schedstat_enabled and address Ingo's feedback
> > > > o More schedstat-only paths eliminated, particularly ttwu_stat
> > > >
> > > > schedstats is very useful during debugging and performance tuning but it
> > > > incurs overhead. As such, even though it can be disabled at build time,
> > > > it is often enabled as the information is useful. This patch adds a
> > > > kernel command-line and sysctl tunable to enable or disable schedstats on
> > > > demand. It is disabled by default as someone who knows they need it can
> > > > also learn to enable it when necessary.
> > > >
> > > > The benefits are workload-dependent but when it gets down to it, the
> > > > difference will be whether cache misses are incurred updating the shared
> > > > stats or not. [...]
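
For reference, a minimal sketch of flipping the tunable, assuming the
schedstats= boot parameter and kernel.sched_schedstats sysctl names from
the patch (treat the names as hypothetical if your kernel differs):

```shell
# Boot time: enable from startup by appending to the kernel command line:
#   schedstats=enable
# Runtime: toggle via sysctl (requires root and a kernel with the patch):
sysctl kernel.sched_schedstats=1       # enable collection for profiling
cat /proc/sys/kernel/sched_schedstats  # confirm the current state
sysctl kernel.sched_schedstats=0       # disable again when finished
```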
> > >
> > > Hm, which shared stats are those?
> >
> > Extremely poor phrasing on my part. The stats share cache lines, and the
> > impact partly depends on whether unrelated stats happen to share a line
> > during updates.
>
> Yes, but the question is, are there true cross-CPU cache-misses? I.e. are there
> any 'global' (or per node) counters that we keep touching and which keep
> generating cache-misses?
>
I haven't specifically identified them, as I consider the calculations for
some of them to be expensive in their own right, even without accounting for
cache misses. Moving to per-cpu counters would not eliminate all cache
misses: a stat updated on one CPU for a task that wakes on a separate CPU
still triggers a cache miss. Even if such counters were identified and moved
to separate cache lines, the calculation overhead would remain.
> > > I think we should really fix those as well: those shared stats should be
> > > percpu collected as well, with no extra cache misses in any scheduler fast
> > > path.
> >
> > I looked into that but converting those stats to per-cpu counters would incur
> > sizable memory overhead. There are a *lot* of them and the basic structure for
> > the generic percpu-counter is
> >
> >     struct percpu_counter {
> >             raw_spinlock_t lock;
> >             s64 count;
> >     #ifdef CONFIG_HOTPLUG_CPU
> >             struct list_head list;  /* All percpu_counters are on a list */
> >     #endif
> >             s32 __percpu *counters;
> >     };
>
> We don't have to reuse percpu_counter().
>
No, but rolling a specialised solution for a debugging feature is overkill,
and the calculation overhead would remain. It would be specialised code with
very little upside.
The main gain from the patch is that the calculation overhead is avoided.
Avoiding any potential cache misses is a bonus.
> > That's not taking the associated runtime overhead such as synchronising them.
>
> Why do we have to synchronize them in the kernel?
Because some stats simply require it, or are not suitable for moving to
per-cpu counters at all. sleep_start is an obvious one: it is recorded when
the task sleeps on one CPU, but the task can wake on another.
> User-space can recover them on a
> percpu basis and add them up if it wishes to. We can update the schedstat utility
> to handle the more spread out fields as well.
>
Any user of /proc/pid/sched would also need updating, including latencytop,
and all of them would need to handle CPU hotplug or else deal with output
from all possible CPUs instead of only the currently online ones.
--
Mel Gorman
SUSE Labs
Thread overview: 7+ messages
2016-02-03 11:07 [PATCH 1/1] sched: Make schedstats a runtime tunable that is disabled by default v4 Mel Gorman
2016-02-03 11:28 ` Ingo Molnar
2016-02-03 11:39 ` Mel Gorman
2016-02-03 12:49 ` Ingo Molnar
2016-02-03 13:32 ` Mel Gorman [this message]
2016-02-03 14:56 ` Mel Gorman
2016-02-03 11:51 ` Srikar Dronamraju