Re: [RFC PATCH] cgroup: Track time in cgroup v2 freezer

From: Tiffany Yang <ynaffit@google.com>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	 kernel-team@android.com, John Stultz <jstultz@google.com>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>,
	 Anna-Maria Behnsen <anna-maria@linutronix.de>,
	Frederic Weisbecker <frederic@kernel.org>,
	 Tejun Heo <tj@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>,
	 "Rafael J. Wysocki" <rafael@kernel.org>,
	Pavel Machek <pavel@kernel.org>,
	 Roman Gushchin <roman.gushchin@linux.dev>,
	Chen Ridong <chenridong@huawei.com>,
	 Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	 Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	 Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	 Valentin Schneider <vschneid@redhat.com>
Subject: Re: [RFC PATCH] cgroup: Track time in cgroup v2 freezer
Date: Sun, 13 Jul 2025 21:53:45 -0700
Message-ID: <dbx8o6tn8jae.fsf@ynaffit-andsys.c.googlers.com>
In-Reply-To: <ry6p5w3p4l7pnsovyapu6n2by7f4zl63c7umwut2ngdxinx6fs@yu53tunbkxdi> ("Michal Koutný"'s message of "Mon, 30 Jun 2025 19:40:28 +0200")

Michal Koutný <mkoutny@suse.com> writes:

> Would it be sufficient to measure that deadline against
> cpu.stat:usage_usec (CPU time consumed by the cgroup)? Or do I
> misunderstand your latter deadline metric?

CPU time is a good way to think about the quantity we are trying to
measure against, but it does not account for sleep time (whether
voluntary or spent waiting on a futex, etc.). Unlike freeze time, we
would want sleep time to count against our deadline because a timeout
would likely indicate a problem in the application's logic.
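
To make the distinction concrete, here is a minimal sketch of the
quantity I mean. The function names and the microsecond inputs are
hypothetical, and how the frozen duration would be obtained is left
open, since that is the subject of this thread:

```python
def time_against_deadline_us(wall_elapsed_us: int, frozen_us: int) -> int:
    """Return the time charged against the deadline.

    Running and sleeping (voluntary or blocked on a futex) both count;
    only the time spent frozen is excluded. Both inputs are assumed to
    cover the same interval.
    """
    if wall_elapsed_us < 0 or frozen_us < 0:
        raise ValueError("durations must be non-negative")
    # Clamp at zero in case the two readings were not taken atomically.
    return max(wall_elapsed_us - frozen_us, 0)


def deadline_missed(wall_elapsed_us: int, frozen_us: int, deadline_us: int) -> bool:
    """True when the charged time exceeds the deadline."""
    return time_against_deadline_us(wall_elapsed_us, frozen_us) > deadline_us


# Illustrative numbers: 500 ms of wall time made up of 120 ms on CPU,
# 300 ms asleep, and 80 ms frozen, checked against a 400 ms deadline.
wall_us, cpu_us, frozen_us, deadline_us = 500_000, 120_000, 80_000, 400_000
charged_us = time_against_deadline_us(wall_us, frozen_us)
missed = deadline_missed(wall_us, frozen_us, deadline_us)
cpu_only_missed = cpu_us > deadline_us
```

With these numbers, the charged time exceeds the deadline while a
CPU-time-only check would not notice, because the sleep time is
invisible to it.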

> (Note that SIGSTOP may be sent to self or within the group but) mind
> that even the category "not requested" is split into two other: resource
> contention and freezing management. And the latter should be under
> control of the agent that sets the deadlines.

This would be ideal, but in our case, the agent that sets/enforces the
deadlines is a task in the same application. It has no control over
freezing events and (currently) no way to know when one has
occurred. Consequently, even if the freezing manager were to send the
relevant information to our agent, none of those messages could be
processed until the application was unfrozen.

Our agent would then have to either compete directly against the task
under deadline (to handle messages as they came in) or delay its
decisions about corrective action (to wait until the deadline to deal
with any messages). If the application were frozen multiple times during
the timer interval, that cost would be incurred each time. As an
alternative, the watchdog could request this information from the
freezing manager when the timer elapses, but that would also introduce
significant latency to deadline enforcement.
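
For comparison, this is roughly what the watchdog could do at timer
expiry if it had a cheap, local way to read the cumulative frozen time.
This is only a sketch: the class, both callables, and the units are
hypothetical, and no particular kernel interface is assumed.

```python
from typing import Callable, Optional


class FreezeAwareWatchdog:
    """Deadline check that ignores time spent frozen (illustrative only)."""

    def __init__(
        self,
        timeout_us: int,
        now_us: Callable[[], int],           # monotonic clock, hypothetical
        frozen_total_us: Callable[[], int],  # cumulative frozen time, hypothetical
    ) -> None:
        self._timeout_us = timeout_us
        self._now_us = now_us
        self._frozen_total_us = frozen_total_us
        # Snapshot both counters when the deadline is armed.
        self._start_us = now_us()
        self._frozen_at_start_us = frozen_total_us()

    def remaining_us(self) -> int:
        """Budget left after discounting frozen time since arming."""
        wall = self._now_us() - self._start_us
        frozen = self._frozen_total_us() - self._frozen_at_start_us
        charged = max(wall - frozen, 0)
        return self._timeout_us - charged

    def on_timer_elapsed(self) -> Optional[int]:
        """Return how long to re-arm for, or None if the deadline was missed."""
        remaining = self.remaining_us()
        return remaining if remaining > 0 else None
```

The point of the sketch is that the correction happens once, at expiry,
with two counter reads and no messages that have to be processed while
the task under deadline is running.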

> Those are order(s) of magnitude different. I can't imagine that using
> freezer for jobs where also wakeup latency matters.

This is true! These examples were mainly to illustrate the breadth of
the problem space and how slippery it can be to generalize.

> Well, there are multiple similar metrics: various (cgroup) PSI, (global)
> steal time, cpu.stat:throttled_usage and perhaps some more.

Ah! Thanks for noting these. It's helpful to have these concrete
examples to find ways to think about this problem.

Philosophically, I think the time we're trying to account for is most
similar to steal time because it allows a VM to correct the internal
accounting it uses to enforce policy. After considering how the delay
we're trying to track fits among these, I think one quality that makes
it somewhat difficult to formalize is that we are trying to account for
multiple external sources of delay, but we also want to exclude
"internal" delay (contention, voluntary sleep). The specificity of this
is making an iterative approach seem more appealing...

> Tejun's suggestion with tracking cgroup's frozen time of whole cgroup
> could complement other "debugging" stats provided by cgroups but I tend
> to think that it's not good (and certainly not complete) solution to
> your problem.

I agree that it doesn't necessarily feel complete, but after spending
this time mulling over the problem, I think it still feels too narrow to
know what a more general solution should look like.

Since there isn't yet a clear way to identify a set of "lost" time that
everyone (or at least a wider group of users) cares about, it seems like
iterating over components of interest is the best way to make progress
for now. That way, at least folks can track some combination of the
values that matter to them. (One aspect of this I find interesting is
time that is accounted for in multiple metrics. Maybe a better way to
think about this problem can be found in some relation between these
overlaps.)
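
As a toy illustration of why the overlaps matter: if two sources of
delay cover some of the same wall time, adding their durations counts
that time twice. The sketch below assumes the delays are available as
explicit intervals, which is not how the existing counters are exposed,
so the function and the data are purely hypothetical.

```python
from typing import Iterable, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start_us, end_us)


def union_length_us(intervals: Iterable[Interval]) -> int:
    """Total time covered by at least one interval, counted once."""
    total = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(intervals):
        if end <= start:
            continue  # skip empty or inverted intervals
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_start is not None and cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_start is not None and cur_end is not None:
        total += cur_end - cur_start
    return total


# Made-up example: a freeze from 10 to 50 and a throttle from 40 to 70.
frozen = [(10, 50)]
throttled = [(40, 70)]
naive_sum_us = sum(end - start for start, end in frozen + throttled)
external_delay_us = union_length_us(frozen + throttled)
```

Here the naive sum is 70 while the union is 60; the difference of 10 is
the time both metrics account for.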

I really appreciate the effort that you've put into trying to understand
the larger problem and the questions you've asked to help me think about
it. Thank you very much for your time!

-- 
Tiffany Y. Yang

Thread overview: 11+ messages
2025-06-03 22:43 [RFC PATCH] cgroup: Track time in cgroup v2 freezer Tiffany Yang
2025-06-03 23:03 ` Tejun Heo
2025-06-04 19:39   ` Tiffany Yang
2025-06-04 22:47     ` Tejun Heo
2025-06-27  2:19       ` Tiffany Yang
2025-06-27 19:01         ` Tejun Heo
2025-07-14  4:44           ` Tiffany Yang
2025-06-17  9:49 ` Michal Koutný
2025-06-27  7:47   ` Tiffany Yang
2025-06-30 17:40     ` Michal Koutný
2025-07-14  4:53       ` Tiffany Yang [this message]
