From: Johannes Weiner <hannes@cmpxchg.org>
To: Tejun Heo <tj@kernel.org>
Cc: "Michal Koutný" <mkoutny@suse.com>,
"Abel Wu" <wuyun.abel@bytedance.com>,
"Jonathan Corbet" <corbet@lwn.net>,
"Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
"Valentin Schneider" <vschneid@redhat.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Yury Norov" <yury.norov@gmail.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Bitao Hu" <yaoma@linux.alibaba.com>,
"Chen Ridong" <chenridong@huawei.com>,
"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
"open list" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 3/3] cgroup/rstat: Add run_delay accounting for cgroups
Date: Mon, 10 Feb 2025 13:25:45 -0500
Message-ID: <20250210182545.GA2484@cmpxchg.org>
In-Reply-To: <Z6onPMIxS0ixXxj9@slm.duckdns.org>

On Mon, Feb 10, 2025 at 06:20:12AM -1000, Tejun Heo wrote:
> On Mon, Feb 10, 2025 at 04:38:56PM +0100, Michal Koutný wrote:
> ...
> > The challenge is with nr (assuming they're all runnable during Δt), that
> > would need to be sampled from /sys/kernel/debug/sched/debug. But then
> > you can get whatever load for individual cfs_rqs from there. Hm, does it
> > even make sense to add up run_delays from different CPUs?
>
> The difficulty in aggregating across CPUs is why some and full pressures are
> defined the way they are. Ideally, we'd want full distribution of stall
> states across CPUs but both aggregation and presentation become challenging,
> so some/full provide the two extremes. Sum of all cpu_delay adds more
> incomplete signal on top. I don't know how useful it'd be. At Meta, we
> depend on PSI a lot when investigating resource problems and we've never
> felt the need for the sum time, so that's one data point with the caveat
> that usually our focus is on mem and io pressures where some and full
> pressure metrics usually seem to provide sufficient information.
>
> As the picture provided by some and full metrics is incomplete, I can
> imagine adding the sum being useful. That said, it'd help if Abel could
> provide more concrete examples of it being useful. Another thing to consider
> is whether we should add this across resources monitored by PSI - cpu, mem
> and io.

Yes, a more detailed description of the usecase would be helpful.
I'm not exactly sure how the sum of wait times in a cgroup would be
used to gauge load without taking available concurrency into account.
One second of aggregate wait time means something very different if
you have 200 cpus compared to if you have 2.
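To make the scaling problem concrete, here is a small sketch (my own illustration, not kernel code) of why an aggregate wait sum only becomes comparable once it is normalized by the CPU-seconds available in the sampling window:

```python
# Hypothetical sketch: normalize an aggregate run_delay sum by the
# CPU-seconds available in the window, so the same wait total can be
# compared across machines of different sizes.

def wait_ratio(total_wait_s, interval_s, nr_cpus):
    """Fraction of the group's available CPU capacity spent waiting."""
    return total_wait_s / (interval_s * nr_cpus)

# One second of summed wait time over a one-second window:
print(wait_ratio(1.0, 1.0, 2))    # 0.5: half of a 2-CPU box was contended
print(wait_ratio(1.0, 1.0, 200))  # 0.005: barely noticeable on 200 CPUs
```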

This is precisely what psi tries to capture. "Some" does provide group
loading information in a sense, but it's a ratio over available
concurrency, and currently capped at 100%. I.e. if you have N cpus,
100% some is "at least N threads waiting at all times." There is a
gradient below that, but not above.
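For reference, the "some" figure above is what /proc/pressure/cpu (and a cgroup's cpu.pressure) exposes. A minimal parser over a sample line, assuming the documented key=value layout rather than reading the live file:

```python
# Sketch of parsing a PSI "some" line; SAMPLE mimics the documented
# /proc/pressure/cpu format instead of reading the real file.

SAMPLE = "some avg10=34.71 avg60=12.50 avg300=4.01 total=123456789"

def parse_psi_line(line):
    kind, *fields = line.split()
    vals = dict(f.split("=", 1) for f in fields)
    return {"kind": kind,
            "avg10": float(vals["avg10"]),   # recent stall %, capped at 100
            "total_us": int(vals["total"])}  # cumulative stall time, usecs

info = parse_psi_line(SAMPLE)
print(info["kind"], info["avg10"])  # some 34.71
```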

It's conceivable percentages over 100% might be useful, to capture the
degree of contention beyond that. Although like Tejun says, we've not
felt the need for that so far. Whether something is actionable or not
tends to be in the 0-1 range, and beyond that it's just "all bad".

High overload scenarios can also be gauged with tools like runqlat[1],
which give a histogram over individual tasks' delays. We've used this
one extensively to track down issues.

[1] https://github.com/iovisor/bcc/blob/master/tools/runqlat.py
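runqlat itself attaches BPF programs to scheduler tracepoints; the shape of its output, a log2 histogram over per-wakeup run delays, can be sketched in plain Python (synthetic delays here, not real trace data):

```python
# Sketch of the log2 histogram runqlat prints, over made-up delays.
from collections import Counter

def log2_bucket(usecs):
    """Bucket index b such that the delay falls in [2**b, 2**(b+1))."""
    b = 0
    while (1 << (b + 1)) <= usecs:
        b += 1
    return b

# Synthetic per-wakeup run delays in microseconds:
delays = [3, 5, 9, 17, 17, 40, 130]
hist = Counter(log2_bucket(d) for d in delays)
for b in sorted(hist):
    print(f"{1 << b:>4} -> {(1 << (b + 1)) - 1:>4} usecs : {'*' * hist[b]}")
```

Unlike a single summed total, the histogram preserves the distribution of individual tasks' delays, which is what makes tail latency visible.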