From: Johannes Weiner <hannes@cmpxchg.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Tejun Heo <tj@kernel.org>, Suren Baghdasaryan <surenb@google.com>,
Daniel Drake <drake@endlessm.com>,
Vinayak Menon <vinmenon@codeaurora.org>,
Christopher Lameter <cl@linux.com>,
Mike Galbraith <efault@gmx.de>,
Shakeel Butt <shakeelb@google.com>,
Peter Enderborg <peter.enderborg@sony.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 8/9] psi: pressure stall information for CPU, memory, and IO
Date: Tue, 21 Aug 2018 15:44:13 -0400
Message-ID: <20180821194413.GA24538@cmpxchg.org>
In-Reply-To: <20180803165641.GA2476@hirez.programming.kicks-ass.net>
Hi,
a quick update on that feedback before I send out v4:
On Fri, Aug 03, 2018 at 06:56:41PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 01, 2018 at 11:19:57AM -0400, Johannes Weiner wrote:
> > +static bool test_state(unsigned int *tasks, int cpu, enum psi_states state)
> > +{
> > +	switch (state) {
> > +	case PSI_IO_SOME:
> > +		return tasks[NR_IOWAIT];
> > +	case PSI_IO_FULL:
> > +		return tasks[NR_IOWAIT] && !tasks[NR_RUNNING];
> > +	case PSI_MEM_SOME:
> > +		return tasks[NR_MEMSTALL];
> > +	case PSI_MEM_FULL:
> > +		/*
> > +		 * Since we care about lost potential, things are
> > +		 * fully blocked on memory when there are no other
> > +		 * working tasks, but also when the CPU is actively
> > +		 * being used by a reclaimer and nothing productive
> > +		 * could run even if it were runnable.
> > +		 */
> > +		return tasks[NR_MEMSTALL] &&
> > +			(!tasks[NR_RUNNING] ||
> > +			 cpu_curr(cpu)->flags & PF_MEMSTALL);
>
> I don't think you can do this, there is nothing that guarantees
> cpu_curr() still exists.
As discussed later in this thread, I've replaced this with time
sampling from inside scheduler_tick(): in the unlikely event that
rq->curr is PF_MEMSTALL, it'll record TICK_NSEC worth of MEM_FULL.
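
For illustration only, the tick sampling could look roughly like the
userspace sketch below. Every sketch_* name and the HZ value of 250 are
stand-ins I made up for this example; this is not the code from the
patch, only the idea of charging one tick of MEM_FULL when the task on
the CPU is flagged as stalled on memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel definitions; the values are illustrative. */
#define SKETCH_HZ		250u
#define SKETCH_TICK_NSEC	(1000000000u / SKETCH_HZ)
#define SKETCH_PF_MEMSTALL	0x1u

struct sketch_task {
	unsigned int flags;
};

struct sketch_group_cpu {
	uint64_t mem_full_ns;	/* accumulated MEM_FULL time in nanoseconds */
};

/*
 * Called once per tick with the task currently running on the CPU.
 * If that task is stalled on memory (for example, it is reclaiming),
 * charge one tick worth of MEM_FULL time. Otherwise do nothing.
 */
static void sketch_memstall_tick(struct sketch_group_cpu *gc,
				 const struct sketch_task *curr)
{
	assert(gc && curr);
	if (curr->flags & SKETCH_PF_MEMSTALL)
		gc->mem_full_ns += SKETCH_TICK_NSEC;
}
```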
However:
> > +		for (s = PSI_NONIDLE; s >= 0; s--) {
> > +			u32 time, delta;
> > +
> > +			time = READ_ONCE(groupc->times[s]);
> > +			/*
> > +			 * In addition to already concluded states, we
> > +			 * also incorporate currently active states on
> > +			 * the CPU, since states may last for many
> > +			 * sampling periods.
> > +			 *
> > +			 * This way we keep our delta sampling buckets
> > +			 * small (u32) and our reported pressure close
> > +			 * to what's actually happening.
> > +			 */
> > +			if (test_state(groupc->tasks, cpu, s)) {
> > +				/*
> > +				 * We can race with a state change and
> > +				 * need to make sure the state_start
> > +				 * update is ordered against the
> > +				 * updates to the live state and the
> > +				 * time buckets (groupc->times).
> > +				 *
> > +				 * 1. If we observe task state that
> > +				 * needs to be recorded, make sure we
> > +				 * see state_start from when that
> > +				 * state went into effect or we'll
> > +				 * count time from the previous state.
> > +				 *
> > +				 * 2. If the time delta has already
> > +				 * been added to the bucket, make sure
> > +				 * we don't see it in state_start or
> > +				 * we'll count it twice.
> > +				 *
> > +				 * If the time delta is out of
> > +				 * state_start but not in the time
> > +				 * bucket yet, we'll miss it entirely
> > +				 * and handle it in the next period.
> > +				 */
> > +				smp_rmb();
> > +				time += cpu_clock(cpu) - groupc->state_start;
> > +			}
>
> The alternative is adding an update to scheduler_tick(), that would
> ensure you're never more than nr_cpu_ids * TICK_NSEC behind.
I wasn't able to convert *all* states to tick updates like this.
The reason is that, while testing rq->curr for PF_MEMSTALL is cheap,
other tasks associated with the rq could be from any cgroup in the
system. That means we'd have to do for_each_cgroup() on every tick to
keep the groupc->times that closely up to date, and that wouldn't scale.
We tend to have hundreds of cgroups; some setups have thousands.
Since we don't need to be *that* current, I left the on-demand update
inside the aggregator for now. It's a bit trickier, but much cheaper.
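
As a rough sketch of what the on-demand update amounts to: the
aggregator reads the concluded time for a state and, if the state is
still in effect, adds the time elapsed since it started. The sketch_*
names below are hypothetical, and the example is single-threaded
userspace C, so it deliberately leaves out the smp_rmb() ordering
against concurrent state changes that the quoted comment is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_state {
	uint32_t concluded_ns;	/* finished intervals, like groupc->times[s] */
	bool active;		/* is the state in effect right now? */
	uint64_t start_ns;	/* start of the open interval, like state_start */
};

/* Time spent in the state as of now_ns, including an interval still open. */
static uint32_t sketch_read_time(const struct sketch_state *st, uint64_t now_ns)
{
	uint32_t time = st->concluded_ns;

	assert(!st->active || now_ns >= st->start_ns);
	if (st->active)
		time += (uint32_t)(now_ns - st->start_ns);
	return time;
}

/*
 * Delta since the previous sample. Unsigned subtraction keeps the
 * result correct even after the u32 counter wraps around.
 */
static uint32_t sketch_sample_delta(const struct sketch_state *st,
				    uint64_t now_ns, uint32_t *prev)
{
	uint32_t time = sketch_read_time(st, now_ns);
	uint32_t delta = time - *prev;

	*prev = time;
	return delta;
}
```

The cost is paid only when somebody samples a group, once per CPU and
state, rather than for every cgroup on every tick.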