From: Peter Zijlstra <peterz@infradead.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ivan Babrou <ivan@cloudflare.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
kernel-team <kernel-team@cloudflare.com>,
Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>
Subject: Re: Lower than expected CPU pressure in PSI
Date: Sat, 8 Feb 2020 11:19:57 +0100
Message-ID: <20200208101957.GU14946@hirez.programming.kicks-ass.net>
In-Reply-To: <20200207130829.GG14897@hirez.programming.kicks-ass.net>

On Fri, Feb 07, 2020 at 02:08:29PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 09, 2020 at 11:16:32AM -0500, Johannes Weiner wrote:
> > On Wed, Jan 08, 2020 at 11:47:10AM -0800, Ivan Babrou wrote:
> > > We added reporting for PSI in cgroups and results are somewhat surprising.
> > >
> > > My test setup consists of 3 services:
> > >
> > > * stress-cpu1-no-contention.service : taskset -c 1 stress --cpu 1
> > > * stress-cpu2-first-half.service : taskset -c 2 stress --cpu 1
> > > * stress-cpu2-second-half.service : taskset -c 2 stress --cpu 1
> > >
> > > First service runs unconstrained, the other two compete for CPU.
> > >
> > > As expected, I can see 500ms/s sched delay for the latter two and
> > > aggregated 1000ms/s delay for /system.slice, no surprises here.
> > >
> > > However, CPU pressure reported by PSI says that none of my services
> > > have any pressure on them. I can see around 434ms/s pressure on
> > > /unified/system.slice and 425ms/s pressure on /unified cgroup, which
> > > is surprising for three reasons:
> > >
> > > * Pressure is absent for my services (I expect it to match sched delay)
> > > * Pressure on /unified/system.slice is lower than both 500ms/s and 1000ms/s
> > > * Pressure on root cgroup is lower than on system.slice
> >
> > CPU pressure is currently implemented based only on the number of
> > *runnable* tasks, not on who gets to actively use the CPU. This works
> > for contention within cgroups or at the global scope, but it doesn't
> > correctly reflect competition between cgroups. It also doesn't show
> > the effects of e.g. cpu cycle limiting through cpu.max where there
> > might *be* only one runnable task, but it's not getting the CPU.
> >
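To make the distinction above concrete, here is a minimal userspace sketch, not the kernel code. The struct and both predicates are hypothetical names made up for illustration, and the two rules are only an assumed model of "runnable-only" versus "on-CPU-aware" accounting for one cgroup on one CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cgroup, per-CPU task counters (illustrative names). */
struct group_cpu_counts {
	unsigned int nr_running;	/* runnable tasks, including one on the CPU */
	unsigned int nr_oncpu;		/* tasks of this group currently executing */
};

/* Runnable-only model: pressure needs two or more runnable tasks in the group. */
static bool cpu_some_runnable_only(const struct group_cpu_counts *c)
{
	return c->nr_running > 1;
}

/* On-CPU-aware model: pressure whenever a runnable task is not executing. */
static bool cpu_some_oncpu_aware(const struct group_cpu_counts *c)
{
	assert(c->nr_oncpu <= c->nr_running);
	return c->nr_running > c->nr_oncpu;
}
```

Under this model, a service cgroup with a single runnable task that is waiting for CPU 2 (nr_running = 1, nr_oncpu = 0) shows no pressure with the first rule and pressure with the second, while the parent slice holding both competing tasks (nr_running = 2) shows pressure either way.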
> > I've been working on fixing this, but hadn't gotten around to sending
> > the patch upstream. Attaching it below. Would you mind testing it?
> >
> > Peter, what would you think of the below?
>
> I'm not loving it; but I see what it does and I can't quickly see an
> alternative.
>
> My main gripe is doing even more of those cgroup traversals.
>
> One thing pick_next_task_fair() does is try and limit the cgroup
> traversal to the sub-tree that contains both prev and next. Not sure
> that is immediately applicable here, but it might be worth looking into.

One option, I suppose, would be to replace this:

+static inline void psi_sched_switch(struct task_struct *prev,
+				    struct task_struct *next,
+				    bool sleep)
+{
+	if (static_branch_likely(&psi_disabled))
+		return;
+
+	/*
+	 * Clear the TSK_ONCPU state if the task was preempted. If
+	 * it's a voluntary sleep, dequeue will have taken care of it.
+	 */
+	if (!sleep)
+		psi_task_change(prev, TSK_ONCPU, 0);
+
+	psi_task_change(next, 0, TSK_ONCPU);
+}

With something like:

static inline void psi_sched_switch(struct task_struct *prev,
				    struct task_struct *next,
				    bool sleep)
{
	struct psi_group *g, *p = NULL;
	int cpu = task_cpu(prev);
	void *iter = NULL;
	int set, clear;

	set = TSK_ONCPU;
	clear = 0;

	while ((g = iterate_groups(next, &iter))) {
		u32 nr_running = per_cpu_ptr(g->pcpu, cpu)->tasks[NR_RUNNING];

		if (nr_running) {
			/* if set, we hit the subtree @prev lives in, terminate */
			p = g;
			break;
		}

		/* the rest of psi_task_change */
	}

	if (sleep)
		return;

	set = 0;
	clear = TSK_ONCPU;

	iter = NULL;
	while ((g = iterate_groups(prev, &iter))) {
		if (g == p)
			break;

		/* the rest of psi_task_change */
	}
}

That way we avoid clearing and setting the common parents.
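
The effect can be checked with a small userspace model; this is only a sketch under stated assumptions, not kernel code. struct group, struct task and switch_oncpu() are hypothetical stand-ins, a single CPU is modeled, and the early-exit test uses a per-group on-CPU count (an assumption of this sketch) so that a non-zero value can only come from @prev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one cgroup's PSI state on a single CPU. */
struct group {
	struct group *parent;	/* NULL at the root */
	unsigned int nr_oncpu;	/* tasks of this subtree currently on the CPU */
	unsigned int updates;	/* how often this group's state was touched */
};

/* Hypothetical stand-in for a task; only its innermost group matters here. */
struct task {
	struct group *group;
};

/* Hand the on-CPU state from prev to next, skipping their shared ancestors. */
static void switch_oncpu(struct task *prev, struct task *next, bool sleep)
{
	struct group *g, *common = NULL;

	/* Walk up from next; a group already marked on-CPU also contains prev. */
	for (g = next->group; g; g = g->parent) {
		if (g->nr_oncpu) {
			common = g;
			break;
		}
		g->nr_oncpu++;
		g->updates++;
	}

	/* On a voluntary sleep, dequeue is assumed to have cleared prev already. */
	if (sleep)
		return;

	/* Walk up from prev and stop at the first shared ancestor. */
	for (g = prev->group; g && g != common; g = g->parent) {
		assert(g->nr_oncpu > 0);
		g->nr_oncpu--;
		g->updates++;
	}
}
```

With prev and next in sibling groups under the same parent, only the two leaf groups are touched; the parent and the root keep their on-CPU state without being updated.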