* an argument for keeping oom_control in cgroups v2
@ 2022-08-22 12:04 Chris Frey
From: Chris Frey @ 2022-08-22 12:04 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA
In cgroups v1 we had:
memory.soft_limit_in_bytes
memory.limit_in_bytes
memory.memsw.limit_in_bytes
memory.oom_control
Using these features, we could achieve:
- cause programs that were memory hungry to suffer degraded performance,
  but not stop (soft limit)
- cause programs to swap before the system actually ran out of memory
(limit)
- cause programs to be OOM-killed if they used too much swap
(memsw.limit...)
- cause programs to halt instead of getting killed (oom_control)
That last feature is something I haven't seen duplicated in the settings
for cgroups v2. For handling a truly non-malicious memory-hungry program,
it is a feature that has no equal, because the user may need time to free
up memory elsewhere before allocating more to the program, and he may want
neither the performance degradation nor the loss of work that come with
the other options.
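For illustration, here is roughly how I combine them (a minimal sketch;
the group name "myjob" and the sizes are made up):

  cd /sys/fs/cgroup/memory/myjob
  # prefer reclaiming from this group under global pressure, above 1G
  echo $((1024*1024*1024)) > memory.soft_limit_in_bytes
  # hard cap at 2G; beyond this the group swaps rather than grows
  echo $((2*1024*1024*1024)) > memory.limit_in_bytes
  # mem+swap cap at 3G; beyond this the OOM killer fires...
  echo $((3*1024*1024*1024)) > memory.memsw.limit_in_bytes
  # ...unless it is disabled here, in which case tasks pause at the limit
  echo 1 > memory.oom_control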
Is there a reason why it wasn't included in v2? Is there hope that it will
come back?
Thanks,
- Chris
* Re: an argument for keeping oom_control in cgroups v2
@ 2022-08-23 3:22 Tejun Heo
From: Tejun Heo @ 2022-08-23 3:22 UTC
To: Chris Frey
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song
(cc'ing memcg folks for visibility)
On Mon, Aug 22, 2022 at 08:04:02AM -0400, Chris Frey wrote:
> In cgroups v1 we had:
>
> memory.soft_limit_in_bytes
> memory.limit_in_bytes
> memory.memsw.limit_in_bytes
> memory.oom_control
>
> Using these features, we could achieve:
>
> - cause programs that were memory hungry to suffer degraded performance,
>   but not stop (soft limit)
>
> - cause programs to swap before the system actually ran out of memory
> (limit)
>
> - cause programs to be OOM-killed if they used too much swap
> (memsw.limit...)
>
> - cause programs to halt instead of getting killed (oom_control)
>
> That last feature is something I haven't seen duplicated in the settings
> for cgroups v2. For handling a truly non-malicious memory-hungry program,
> it is a feature that has no equal, because the user may need time to free
> up memory elsewhere before allocating more to the program, and he may want
> neither the performance degradation nor the loss of work that come with
> the other options.
>
> Is there a reason why it wasn't included in v2? Is there hope that it will
> come back?
memcg folks will have better answers but the short answer is that the kernel
really doesn't like giving control of a task stuck with an arbitrary
backtrace to userspace, and that kernel OOM detection often is way too late,
so cgroup2 instead goes for enabling userspace-driven OOM detection and
handling through PSI. The following doc has some information on it.
https://facebookmicrosites.github.io/resctl-demo-website/docs/demo_docs/res_protection/oomd-daemon
FYI, systemd already has its own oomd implementation in systemd-oomd.
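To give a rough idea of the PSI side, here is a crude sketch of what such
an agent keys off (the cgroup path and the 10% threshold are made up, and
a real agent like oomd registers PSI triggers and polls instead of
sampling in a loop):

  # sample the group's avg10 "some" memory stall percentage once a second
  while sleep 1; do
      awk '/^some/ { sub(/avg10=/, "", $2); if ($2 + 0 > 10) print "pressure high:", $2 }' \
          /sys/fs/cgroup/myjob/memory.pressure
  done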
Thanks.
--
tejun
* Re: an argument for keeping oom_control in cgroups v2
@ 2022-08-23 5:06 Michal Hocko
From: Michal Hocko @ 2022-08-23 5:06 UTC
To: Tejun Heo
Cc: Chris Frey, cgroups-u79uwXL29TY76Z2rM5mHXA, Johannes Weiner,
Roman Gushchin, Shakeel Butt, Muchun Song
On Mon 22-08-22 17:22:53, Tejun Heo wrote:
> (cc'ing memcg folks for visibility)
>
> On Mon, Aug 22, 2022 at 08:04:02AM -0400, Chris Frey wrote:
> > In cgroups v1 we had:
> >
> > memory.soft_limit_in_bytes
> > memory.limit_in_bytes
> > memory.memsw.limit_in_bytes
> > memory.oom_control
> >
> > Using these features, we could achieve:
> >
> > - cause programs that were memory hungry to suffer degraded performance,
> >   but not stop (soft limit)
There is memory.high, with much more sensible semantics and implementation,
to achieve a similar thing.
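For example (group name made up):

  # throttle and reclaim the group above 1G instead of v1-style soft reclaim
  echo $((1024*1024*1024)) > /sys/fs/cgroup/myjob/memory.high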
> > - cause programs to swap before the system actually ran out of memory
> > (limit)
Not sure what this is supposed to mean.
> > - cause programs to be OOM-killed if they used too much swap
> > (memsw.limit...)
There is an explicit swap limit. It is true that the semantics are
different, but do you have an example where you cannot really achieve
what you need with the swap limit?
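E.g. a sketch with a made-up group and size:

  # hard-cap the group's swap usage at 512M; once reached, its anonymous
  # memory is no longer swapped out and stays charged against the group
  echo $((512*1024*1024)) > /sys/fs/cgroup/myjob/memory.swap.max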
> >
> > - cause programs to halt instead of getting killed (oom_control)
> >
> > That last feature is something I haven't seen duplicated in the settings
> > for cgroups v2. For handling a truly non-malicious memory-hungry program,
> > it is a feature that has no equal, because the user may need time to free
> > up memory elsewhere before allocating more to the program, and he may want
> > neither the performance degradation nor the loss of work that come with
> > the other options.
Yes, this functionality is not available in v2 anymore. One reason is that
the implementation had to be considerably reduced to block on OOM only for
userspace-triggered page faults; see commit 3812c8c8f395 ("mm: memcg: do
not trap chargers with full callstack on OOM"). The primary reason is, as
Tejun indicated, that we cannot simply block a random kernel code path and
wait for userspace, because that is a potential DoS on the rest of the
system and on unrelated workloads, which is a trivial breakage of workload
separation.

This means that many other kernel paths which can cause a memcg OOM cannot
be blocked, so the feature is severely crippled. To allow for this feature
we would essentially need, for every allocating (charging) kernel path, a
safe place to wait for userspace where no locks are held and yet the
allocation failure is not observed, and that is not feasible.

Hope this helps clarify.
--
Michal Hocko
SUSE Labs
* Re: an argument for keeping oom_control in cgroups v2
@ 2022-08-23 16:10 Roman Gushchin
From: Roman Gushchin @ 2022-08-23 16:10 UTC
To: Michal Hocko
Cc: Tejun Heo, Chris Frey, cgroups-u79uwXL29TY76Z2rM5mHXA,
Johannes Weiner, Shakeel Butt, Muchun Song
On Tue, Aug 23, 2022 at 07:06:01AM +0200, Michal Hocko wrote:
> On Mon 22-08-22 17:22:53, Tejun Heo wrote:
> > (cc'ing memcg folks for visibility)
> >
> > On Mon, Aug 22, 2022 at 08:04:02AM -0400, Chris Frey wrote:
> > > In cgroups v1 we had:
> > >
> > > memory.soft_limit_in_bytes
> > > memory.limit_in_bytes
> > > memory.memsw.limit_in_bytes
> > > memory.oom_control
> > >
> > > Using these features, we could achieve:
> > >
> > > - cause programs that were memory hungry to suffer degraded performance,
> > >   but not stop (soft limit)
>
> There is memory.high, with much more sensible semantics and implementation,
> to achieve a similar thing.
>
> > > - cause programs to swap before the system actually ran out of memory
> > > (limit)
>
> Not sure what this is supposed to mean.
>
> > > - cause programs to be OOM-killed if they used too much swap
> > > (memsw.limit...)
>
>
> There is an explicit swap limit. It is true that the semantics are
> different, but do you have an example where you cannot really achieve
> what you need with the swap limit?
>
> > >
> > > - cause programs to halt instead of getting killed (oom_control)
> > >
> > > That last feature is something I haven't seen duplicated in the settings
> > > for cgroups v2. For handling a truly non-malicious memory-hungry program,
> > > it is a feature that has no equal, because the user may need time to free
> > > up memory elsewhere before allocating more to the program, and he may want
> > > neither the performance degradation nor the loss of work that come with
> > > the other options.
>
> Yes, this functionality is not available in v2 anymore. One reason is that
> the implementation had to be considerably reduced to block on OOM only for
> userspace-triggered page faults; see commit 3812c8c8f395 ("mm: memcg: do
> not trap chargers with full callstack on OOM"). The primary reason is, as
> Tejun indicated, that we cannot simply block a random kernel code path and
> wait for userspace, because that is a potential DoS on the rest of the
> system and on unrelated workloads, which is a trivial breakage of workload
> separation.
>
> This means that many other kernel paths which can cause a memcg OOM cannot
> be blocked, so the feature is severely crippled. To allow for this feature
> we would essentially need, for every allocating (charging) kernel path, a
> safe place to wait for userspace where no locks are held and yet the
> allocation failure is not observed, and that is not feasible.
Btw, it's fairly easy to emulate the oom_control behavior using cgroup v2:
a userspace agent can listen for memory.high/max events and use the cgroup v2
freezer to stop the workload and handle the OOM in v1 oom_control style.
The agent can run at a high/real-time priority, so I guess the behavior will
actually be quite close to the v1 experience. Much safer, though.
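Something along these lines (a sketch only; the cgroup path is made up,
inotifywait comes from inotify-tools, and a real agent would want a proper
escalation policy):

  cg=/sys/fs/cgroup/myjob
  last=0
  inotifywait -m -q -e modify "$cg/memory.events" | while read -r _; do
      cur=$(awk '/^high /{ print $2 }' "$cg/memory.events")
      if [ "$cur" -gt "$last" ]; then
          echo 1 > "$cg/cgroup.freeze"  # stop the workload, oom_control style
          last=$cur
          # ...free up or grant more memory, then: echo 0 > "$cg/cgroup.freeze"
      fi
  done

From there the agent (or an operator) can thaw, kill or reconfigure the
group at leisure.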
Thanks!
* Re: an argument for keeping oom_control in cgroups v2
@ 2022-08-24 9:30 Chris Frey
From: Chris Frey @ 2022-08-24 9:30 UTC
To: Roman Gushchin
Cc: Michal Hocko, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
Johannes Weiner, Shakeel Butt, Muchun Song
On Tue, Aug 23, 2022 at 09:10:37AM -0700, Roman Gushchin wrote:
> Btw, it's fairly easy to emulate the oom_control behavior using cgroup v2:
> a userspace agent can listen for memory.high/max events and use the cgroup v2
> freezer to stop the workload and handle the OOM in v1 oom_control style.
> The agent can run at a high/real-time priority, so I guess the behavior will
> actually be quite close to the v1 experience. Much safer, though.
Thanks to everyone who responded. It looks like much the same functionality,
in slightly different form, is still available through other means, so my
query has been satisfied.
- Chris