linux-mm.kvack.org archive mirror
* Re: [PATCH] mm: Throttle allocators when failing reclaim over memory.high
       [not found] ` <20190201071757.GE11599@dhcp22.suse.cz>
@ 2019-02-01 16:12   ` Johannes Weiner
  2019-02-01 19:16   ` Chris Down
  1 sibling, 0 replies; 3+ messages in thread
From: Johannes Weiner @ 2019-02-01 16:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Chris Down, Andrew Morton, Tejun Heo, Roman Gushchin,
	linux-kernel, cgroups, linux-mm, kernel-team

On Fri, Feb 01, 2019 at 08:17:57AM +0100, Michal Hocko wrote:
> On Thu 31-01-19 20:13:52, Chris Down wrote:
> [...]
> > The current situation goes against both the expectations of users of
> > memory.high, and our intentions as cgroup v2 developers. In
> > cgroup-v2.txt, we claim that we will throttle and only under "extreme
> > conditions" will memory.high protection be breached. Likewise, cgroup v2
> > users generally also expect that memory.high should throttle workloads
> > as they exceed their high threshold. However, as seen above, this isn't
> > always how it works in practice -- even on banal setups like those with
> > no swap, or where swap has become exhausted, we can end up with
> > memory.high being breached and us having no weapons left in our arsenal
> > to combat runaway growth with, since reclaim is futile.
> > 
> > It's also hard for system monitoring software or users to tell how bad
> > the situation is, as "high" events for the memcg may in some cases be
> > benign, and in others be catastrophic. The current status quo is that we
> > fail containment in a way that doesn't provide any advance warning that
> > things are about to go horribly wrong (for example, we are about to
> > invoke the kernel OOM killer).
> > 
> > This patch introduces explicit throttling when reclaim is failing to
> > keep memcg size contained at the memory.high setting. It does so by
> > applying an exponential delay curve derived from the memcg's overage
> > compared to memory.high.  In the normal case where the memcg is either
> > below or only marginally over its memory.high setting, no throttling
> > will be performed.
> 
> How does this play with the actual OOM when the user expects oom to
> resolve the situation because the reclaim is futile and there is nothing
> reclaimable except for killing a process?

Hm, can you elaborate on your question a bit?

The idea behind memory.high is to throttle allocations long enough for
the admin or a management daemon to intervene, but not to trigger the
kernel oom killer. It was designed as a replacement for the cgroup1
oom_control, but without the deadlock potential, ptrace problems etc.

What we specifically do is to set memory.high and have a daemon (oomd)
watch memory.pressure, io.pressure etc. in the group. If pressure
exceeds a certain threshold, the daemon kills something.
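For illustration, here is a minimal sketch of such a watcher in Python. The cgroup path, threshold, and kill policy are hypothetical placeholders, not oomd's actual logic; real oomd is considerably more sophisticated:

```python
# Hypothetical sketch of an oomd-style userspace watcher: poll a cgroup's
# PSI memory pressure file and kill the group's tasks when sustained
# pressure exceeds a tolerance. Paths and thresholds are assumptions.
import os
import signal
import time

def parse_psi_some_avg10(psi_text):
    """Extract the 'some avg10' value from a PSI file's contents."""
    for line in psi_text.splitlines():
        if line.startswith("some"):
            for field in line.split()[1:]:
                key, _, value = field.partition("=")
                if key == "avg10":
                    return float(value)
    raise ValueError("no 'some avg10' field found")

def watch(cgroup="/sys/fs/cgroup/workload", threshold=40.0, interval=1.0):
    pressure_file = os.path.join(cgroup, "memory.pressure")
    procs_file = os.path.join(cgroup, "cgroup.procs")
    while True:
        with open(pressure_file) as f:
            avg10 = parse_psi_some_avg10(f.read())
        if avg10 > threshold:
            # Pressure sustained above tolerance: kill everything in the group.
            with open(procs_file) as f:
                for pid in f.read().split():
                    os.kill(int(pid), signal.SIGKILL)
            return
        time.sleep(interval)
```

The point is only that the kill decision lives in userspace, where it can be made with workload-specific knowledge, rather than in the kernel OOM killer.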

As you know, the kernel OOM killer does not kick in reliably when
e.g. page cache is thrashing heavily, since from a kernel POV it's
still successfully allocating and reclaiming - meanwhile the workload
is spending most of its time in page faults. And when the kernel OOM
killer does kick in, its selection policy is not very workload-aware.

This daemon on the other hand can be configured to 1) kick in reliably
when the workload-specific tolerances for slowdowns and latencies are
violated (which tends to be way earlier than the kernel oom killer
usually kicks in) and 2) know about the workload and all its
components to make an informed kill decision.

Right now, that throttling mechanism works okay with swap enabled, but
we cannot enable swap everywhere, or sometimes run out of swap, and
then it breaks down and we run into system OOMs.

This patch makes sure memory.high *always* implements the throttling
semantics described in cgroup-v2.txt, not just most of the time.
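The shape of the delay curve can be illustrated with a toy model. This is a sketch only; the constants and exact formula are assumptions, not the patch's actual code. The behavior it mimics: zero delay at or below memory.high, a penalty that grows steeply with the overage, and a hard per-allocation cap:

```python
# Toy model of an overage-based throttling curve: no delay at or below the
# high limit, a quadratically growing penalty above it, clamped at a 2 s
# per-allocation ceiling. Constants and formula are illustrative assumptions.
MAX_DELAY_MS = 2000  # cap per allocation attempt

def throttle_delay_ms(usage_pages, high_pages):
    if high_pages == 0 or usage_pages <= high_pages:
        return 0
    overage = usage_pages - high_pages
    # Quadratic in relative overage: marginal breaches cost almost nothing,
    # runaway growth is punished steeply.
    penalty = (overage * overage * 1000) // (high_pages * high_pages)
    return min(penalty, MAX_DELAY_MS)
```

With high = 1000 pages, a 10% breach costs ~10ms per allocation, a 2x breach costs a full second, and anything beyond roughly 2.4x hits the cap.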



* Re: [PATCH] mm: Throttle allocators when failing reclaim over memory.high
       [not found] ` <20190201071757.GE11599@dhcp22.suse.cz>
  2019-02-01 16:12   ` [PATCH] mm: Throttle allocators when failing reclaim over memory.high Johannes Weiner
@ 2019-02-01 19:16   ` Chris Down
       [not found]     ` <20190410153307.GA11122@chrisdown.name>
  1 sibling, 1 reply; 3+ messages in thread
From: Chris Down @ 2019-02-01 19:16 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Johannes Weiner, Tejun Heo, Roman Gushchin,
	linux-kernel, cgroups, linux-mm, kernel-team

Michal Hocko writes:
>How does this play with the actual OOM when the user expects oom to
>resolve the situation because the reclaim is futile and there is nothing
>reclaimable except for killing a process?

In addition to what Johannes said, this doesn't impede OOM in the case of 
global system starvation (e.g. when all major consumers of memory are being 
allocator throttled). In that case nothing unusual will happen, since the 
task's state is TASK_KILLABLE rather than TASK_UNINTERRUPTIBLE, and we will 
exit out of mem_cgroup_handle_over_high as quickly as possible.



* Re: [PATCH REBASED] mm: Throttle allocators when failing reclaim over memory.high
       [not found]     ` <20190410153307.GA11122@chrisdown.name>
@ 2019-04-10 15:34       ` Chris Down
  0 siblings, 0 replies; 3+ messages in thread
From: Chris Down @ 2019-04-10 15:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Tejun Heo, Roman Gushchin, linux-kernel, cgroups,
	linux-mm, kernel-team, Andrew Morton

Hey Michal,

Just to come back to your last e-mail about how this interacts with OOM.

Michal Hocko writes:
> I am not really opposed to the throttling in the absence of a reclaimable
> memory. We do that for the regular allocation paths already
> (should_reclaim_retry). A swapless system with anon memory is very likely to
> oom too quickly and this sounds like a real problem. But I do not think that
> we should throttle the allocation to freeze it completely. We should
> eventually OOM. And that was my question about essentially. How much we
> can/should throttle to give a high limit events consumer enough time to
> intervene. I am sorry to still not have time to study the patch more closely
> but this should be explained in the changelog. Are we talking about
> seconds/minutes or simply freeze each allocator to death?

Per-allocation, the maximum is 2 seconds (MEMCG_MAX_HIGH_DELAY_JIFFIES), so we 
don't freeze things to death -- they can recover if they are amenable to it. 
The idea here is that primarily you handle it in userspace, just like 
memory.oom_control in v1 (as mentioned in the commit message); as a last 
resort, the kernel will still OOM if our userspace daemon has kicked the 
bucket or is otherwise ineffective.
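A back-of-the-envelope sketch of why the cap matters (numbers hypothetical): the penalty is paid per allocation attempt, so total stall scales with how aggressively a task keeps allocating over its limit, and drops to zero once it stops growing:

```python
# Hypothetical illustration: throttling cost is charged per allocation
# attempt and clamped at 2 s each, so an aggressive allocator is slowed
# dramatically while a task that stops allocating accrues nothing further.
CAP_S = 2.0

def total_stall_seconds(per_alloc_delay_s, allocation_attempts):
    return min(per_alloc_delay_s, CAP_S) * allocation_attempts

# A runaway allocator making 30 attempts at the cap stalls for a minute;
# the same workload making zero further attempts stalls not at all.
```

That leaves a window for the userspace daemon to act, without the kernel ever freezing the task unconditionally.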

If you're setting memory.high and memory.max together, then setting memory.high 
always has to come with a) tolerance of heavy throttling by your application, 
and b) userspace intervention if high memory pressure results. This patch 
doesn't really change those semantics.


