All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>,
	surenb@google.com, Vinayak Menon <vinmenon@codeaurora.org>,
	Christoph Lameter <cl@linux.com>, Mike Galbraith <efault@gmx.de>,
	shakeelb@google.com, linux-mm <linux-mm@kvack.org>,
	cgroups <cgroups@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	kernel-team <kernel-team@fb.com>
Subject: Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO
Date: Thu, 19 Jul 2018 13:54:05 -0400	[thread overview]
Message-ID: <20180719175405.GA19230@cmpxchg.org> (raw)
In-Reply-To: <CA+55aFw7t++BzEy-XsatNcauw3Wn22SSXfd3iTYECi4fJ97CCg@mail.gmail.com>

On Thu, Jul 19, 2018 at 08:08:20AM -0700, Linus Torvalds wrote:
> On Wed, Jul 18, 2018 at 5:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > And as said before, we can compress the state from 12 bytes, to 6 bits
> > (or 1 byte), giving another 11 bytes for 59 bytes free.
> >
> > Leaving us just 5 bytes short of needing a single cacheline :/
> 
> Do you actually need 64 bits for the times?
> 
> That's the big cost. And it seems ridiculous, if you actually care about size.
> 
> You already have a 64-bit start time. Everything else is some
> cumulative relative time. Do those really need 64-bit and nanosecond
> resolution?
> 
> Maybe a 32-bit microsecond would be ok - would you ever account more
> than 35 minutes of anything without starting anew?

D'oh, you're right, the per-cpu buckets don't need to be this big at
all. In fact, we flush those deltas out every 2 seconds when there is
activity to maintain the running averages. Since we get 4.2s worth of
nanoseconds into a u32, we don't even need to divide in the hotpath.

Something along the lines of this here should work:

static void psi_group_change(struct psi_group *group, int cpu, u64 now,
			     unsigned int clear, unsigned int set)
{
	struct psi_group_cpu *groupc;
	unsigned int *tasks;
	unsigned int t;
	u32 delta;

	groupc = per_cpu_ptr(group->cpus, cpu);
	tasks = groupc->tasks;

	/* Time since last task change on this runqueue */
	delta = now - groupc->last_time;
	groupc->last_time = now;

	/* Tasks waited for IO? */
	if (tasks[NR_IOWAIT]) {
		if (!tasks[NR_RUNNING])
			groupc->full_time[PSI_IO] += delta;
		else
			groupc->some_time[PSI_IO] += delta;
	}

	/* Tasks waited for memory? */
	if (tasks[NR_MEMSTALL]) {
		if (!tasks[NR_RUNNING] ||
		    (cpu_curr(cpu)->flags & PF_MEMSTALL))
			groupc->full_time[PSI_MEM] += delta;
		else
			groupc->some_time[PSI_MEM] += delta;
	}

	/* Tasks waited for the CPU? */
	if (tasks[NR_RUNNING] > 1)
		groupc->some_time[PSI_CPU] += delta;

	/* Tasks were generally non-idle? To weigh the CPU in summaries */
	if (tasks[NR_RUNNING] || tasks[NR_IOWAIT] || tasks[NR_MEMSTALL])
		groupc->nonidle_time += delta;

	/* Update task counts according to the set/clear bitmasks */
	for (t = 0; clear; clear &= ~(1 << t), t++)
		if (clear & (1 << t))
			groupc->tasks[t]--;
	for (t = 0; set; set &= ~(1 << t), t++)
		if (set & (1 << t))
			groupc->tasks[t]++;

	/* Kick the stats aggregation worker if it's gone to sleep */
	if (!delayed_work_pending(&group->clock_work))
		schedule_delayed_work(&group->clock_work, PSI_FREQ);
}

And then we can pack it down to one cacheline:

struct psi_group_cpu {
	/* States of the tasks belonging to this group */
	unsigned int tasks[NR_PSI_TASK_COUNTS]; // 3

	/* Time sampling bucket for pressure states - no FULL for CPU */
	u32 some_time[NR_PSI_RESOURCES];
	u32 full_time[NR_PSI_RESOURCES - 1];

	/* Time sampling bucket for non-idle state (ns) */
	u32 nonidle_time;

	/* Time of last task change in this group (rq_clock) */
	u64 last_time;
};

I'm going to go test with this.

Thanks

  reply	other threads:[~2018-07-19 17:54 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-12 17:29 [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2 Johannes Weiner
2018-07-12 17:29 ` [PATCH 01/10] mm: workingset: don't drop refault information prematurely Johannes Weiner
2018-07-12 17:29 ` [PATCH 02/10] mm: workingset: tell cache transitions from workingset thrashing Johannes Weiner
2018-07-23 13:36   ` Arnd Bergmann
2018-07-23 13:36     ` Arnd Bergmann
2018-07-23 15:23     ` Johannes Weiner
2018-07-23 15:23       ` Johannes Weiner
2018-07-23 15:23       ` Johannes Weiner
2018-07-23 15:35       ` Arnd Bergmann
2018-07-23 15:35         ` Arnd Bergmann
2018-07-23 16:27         ` Johannes Weiner
2018-07-23 16:27           ` Johannes Weiner
2018-07-23 16:27           ` Johannes Weiner
2018-07-24 15:04           ` Will Deacon
2018-07-24 15:04             ` Will Deacon
2018-07-25 16:06             ` Will Deacon
2018-07-25 16:06               ` Will Deacon
2018-07-12 17:29 ` [PATCH 03/10] delayacct: track delays from thrashing cache pages Johannes Weiner
2018-07-12 17:29 ` [PATCH 04/10] sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD Johannes Weiner
2018-07-12 17:29 ` [PATCH 05/10] sched: loadavg: make calc_load_n() public Johannes Weiner
2018-07-12 17:29 ` [PATCH 06/10] sched: sched.h: make rq locking and clock functions available in stats.h Johannes Weiner
2018-07-12 17:29 ` [PATCH 07/10] sched: introduce this_rq_lock_irq() Johannes Weiner
2018-07-12 17:29 ` [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO Johannes Weiner
2018-07-13  9:21   ` Peter Zijlstra
2018-07-13 16:17     ` Johannes Weiner
2018-07-14  8:48       ` Peter Zijlstra
2018-07-14  9:02       ` Peter Zijlstra
2018-07-17 10:03   ` Peter Zijlstra
2018-07-18 21:56     ` Johannes Weiner
2018-07-17 14:16   ` Peter Zijlstra
2018-07-18 22:00     ` Johannes Weiner
2018-07-17 14:21   ` Peter Zijlstra
2018-07-18 22:03     ` Johannes Weiner
2018-07-17 15:01   ` Peter Zijlstra
2018-07-18 22:06     ` Johannes Weiner
2018-07-20 14:13       ` Johannes Weiner
2018-07-17 15:17   ` Peter Zijlstra
2018-07-18 22:11     ` Johannes Weiner
2018-07-17 15:32   ` Peter Zijlstra
2018-07-18 12:03   ` Peter Zijlstra
2018-07-18 12:22     ` Peter Zijlstra
2018-07-18 22:36     ` Johannes Weiner
2018-07-19 13:58       ` Peter Zijlstra
2018-07-19  9:26     ` Peter Zijlstra
2018-07-19 12:50       ` Johannes Weiner
2018-07-19 13:18         ` Peter Zijlstra
2018-07-19 15:08     ` Linus Torvalds
2018-07-19 17:54       ` Johannes Weiner [this message]
2018-07-19 18:47     ` Johannes Weiner
2018-07-19 20:31       ` Peter Zijlstra
2018-07-24 16:01         ` Johannes Weiner
2018-07-18 12:46   ` Peter Zijlstra
2018-07-18 13:56     ` Johannes Weiner
2018-07-18 16:31       ` Peter Zijlstra
2018-07-18 16:46         ` Johannes Weiner
2018-07-20 20:35   ` Peter Zijlstra
2018-07-12 17:29 ` [PATCH 09/10] psi: cgroup support Johannes Weiner
2018-07-12 20:08   ` Tejun Heo
2018-07-17 15:40   ` Peter Zijlstra
2018-07-24 15:54     ` Johannes Weiner
2018-07-12 17:29 ` [RFC PATCH 10/10] psi: aggregate ongoing stall events when somebody reads pressure Johannes Weiner
2018-07-12 23:45   ` Andrew Morton
2018-07-13 22:17     ` Johannes Weiner
2018-07-13 22:13   ` Suren Baghdasaryan
2018-07-13 22:49     ` Johannes Weiner
2018-07-13 23:34       ` Suren Baghdasaryan
2018-07-17 15:13   ` Peter Zijlstra
2018-07-12 17:37 ` [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2 Linus Torvalds
2018-07-12 23:44 ` Andrew Morton
2018-07-13 22:14   ` Johannes Weiner
2018-07-16 15:57 ` Daniel Drake
2018-07-17 11:25   ` Michal Hocko
2018-07-17 12:13     ` Daniel Drake
2018-07-17 12:23       ` Michal Hocko
2018-07-25 22:57         ` Daniel Drake
2018-07-18 22:21     ` Johannes Weiner
2018-07-19 11:29       ` peter enderborg
2018-07-19 11:29         ` peter enderborg
2018-07-19 12:18         ` Johannes Weiner
2018-07-23 21:14 ` Balbir Singh
2018-07-24 15:15   ` Johannes Weiner
2018-07-26  1:07     ` Singh, Balbir
2018-07-26 20:07       ` Johannes Weiner
2018-07-27 23:40         ` Suren Baghdasaryan
2018-07-27 22:01 ` Pavel Machek
2018-07-30 15:40   ` Johannes Weiner
2018-07-30 17:39     ` Pavel Machek
2018-07-30 17:51       ` Tejun Heo
2018-07-30 17:54         ` Randy Dunlap
2018-07-30 18:05           ` Tejun Heo
2018-07-30 17:59         ` Pavel Machek
2018-07-30 18:07           ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180719175405.GA19230@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=cl@linux.com \
    --cc=efault@gmx.de \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=shakeelb@google.com \
    --cc=surenb@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vinmenon@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.