Re: [PATCH RFC] mm: don't raise MEMCG_OOM event due to failed high-order allocation

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Roman Gushchin <guro@fb.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@fb.com, Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [PATCH RFC] mm: don't raise MEMCG_OOM event due to failed high-order allocation
Date: Wed, 12 Sep 2018 09:25:29 -0700	[thread overview]
Message-ID: <20180912162526.GA15119@castle> (raw)
In-Reply-To: <20180912123534.GG10951@dhcp22.suse.cz>

On Wed, Sep 12, 2018 at 02:35:34PM +0200, Michal Hocko wrote:
> On Tue 11-09-18 08:27:30, Roman Gushchin wrote:
> > On Tue, Sep 11, 2018 at 02:11:41PM +0200, Michal Hocko wrote:
> > > On Mon 10-09-18 14:56:22, Roman Gushchin wrote:
> > > > The memcg OOM killer is never invoked due to a failed high-order
> > > > allocation, however the MEMCG_OOM event can be easily raised.
> > > > 
> > > > Under some memory pressure it can happen easily because of a
> > > > concurrent allocation. Let's look at try_charge(). Even if we were
> > > > able to reclaim enough memory, this check can fail due to a race
> > > > with another allocation:
> > > > 
> > > >     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> > > >         goto retry;
> > > > 
> > > > For regular pages the following condition will save us from triggering
> > > > the OOM:
> > > > 
> > > >    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
> > > >        goto retry;
> > > > 
> > > > But for high-order allocation this condition will intentionally fail.
> > > > The reason behind is that we'll likely fall to regular pages anyway,
> > > > so it's ok and even preferred to return ENOMEM.
> > > > 
> > > > In this case the idea of raising the MEMCG_OOM event looks dubious.
> > > 
> > > Why is this a problem though? IIRC this event was deliberately placed
> > > outside of the oom path because we wanted to count allocation failures
> > > and this is also documented that way
> > > 
> > >           oom
> > >                 The number of time the cgroup's memory usage was
> > >                 reached the limit and allocation was about to fail.
> > > 
> > >                 Depending on context result could be invocation of OOM
> > >                 killer and retrying allocation or failing a
> > > 
> > > One could argue that we do not apply the same logic to GFP_NOWAIT
> > > requests but in general I would like to see a good reason to change
> > > the behavior and if it is really the right thing to do then we need to
> > > update the documentation as well.
> > 
> > Right, the current behavior matches the documentation, because the description
> > of the event is broad enough. My point is that the current behavior is not
> > useful in my corner case.
> > 
> > Let me explain my case in details: I've got a report about sporadic memcg oom
> > kills on some hosts with plenty of pagecache and low memory pressure.
> > You'll probably agree, that raising OOM signal in this case looks strange.
> 
> I am not sure I follow. So you see both OOM_KILL and OOM events and the
> user misinterprets OOM ones?

No, I see sporadic OOMs without OOM_KILLs in cgroups with plenty of pagecache
and low memory pressure. It's not a pre-OOM condition at all.

> 
> My understanding was that OOM event should tell admin that the limit
> should be increased in order to allow more charges. Without OOM_KILL
> events it means that those failed charges have some sort of fallback
> so it is not critical condition for the workload yet. Something to watch
> for though in case of perf. degradation or potential misbehavior.

Right, something like "there is a shortage of memory which will likely
lead to OOM soon". It's not my case.

> 
> Whether this is how the event is used, I dunno. Anyway, if you want to
> just move the event and make it closer to OOM_KILL then I strongly
> suspect the event is losing its relevance.

I agree here (about losing relevance), but don't think it's a reason
to generate misleading events.

Thanks!

WARNING: multiple messages have this Message-ID (diff)

From: Roman Gushchin <guro@fb.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<kernel-team@fb.com>, Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [PATCH RFC] mm: don't raise MEMCG_OOM event due to failed high-order allocation
Date: Wed, 12 Sep 2018 09:25:29 -0700	[thread overview]
Message-ID: <20180912162526.GA15119@castle> (raw)
In-Reply-To: <20180912123534.GG10951@dhcp22.suse.cz>

On Wed, Sep 12, 2018 at 02:35:34PM +0200, Michal Hocko wrote:
> On Tue 11-09-18 08:27:30, Roman Gushchin wrote:
> > On Tue, Sep 11, 2018 at 02:11:41PM +0200, Michal Hocko wrote:
> > > On Mon 10-09-18 14:56:22, Roman Gushchin wrote:
> > > > The memcg OOM killer is never invoked due to a failed high-order
> > > > allocation, however the MEMCG_OOM event can be easily raised.
> > > > 
> > > > Under some memory pressure it can happen easily because of a
> > > > concurrent allocation. Let's look at try_charge(). Even if we were
> > > > able to reclaim enough memory, this check can fail due to a race
> > > > with another allocation:
> > > > 
> > > >     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> > > >         goto retry;
> > > > 
> > > > For regular pages the following condition will save us from triggering
> > > > the OOM:
> > > > 
> > > >    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
> > > >        goto retry;
> > > > 
> > > > But for high-order allocation this condition will intentionally fail.
> > > > The reason behind is that we'll likely fall to regular pages anyway,
> > > > so it's ok and even preferred to return ENOMEM.
> > > > 
> > > > In this case the idea of raising the MEMCG_OOM event looks dubious.
> > > 
> > > Why is this a problem though? IIRC this event was deliberately placed
> > > outside of the oom path because we wanted to count allocation failures
> > > and this is also documented that way
> > > 
> > >           oom
> > >                 The number of time the cgroup's memory usage was
> > >                 reached the limit and allocation was about to fail.
> > > 
> > >                 Depending on context result could be invocation of OOM
> > >                 killer and retrying allocation or failing a
> > > 
> > > One could argue that we do not apply the same logic to GFP_NOWAIT
> > > requests but in general I would like to see a good reason to change
> > > the behavior and if it is really the right thing to do then we need to
> > > update the documentation as well.
> > 
> > Right, the current behavior matches the documentation, because the description
> > of the event is broad enough. My point is that the current behavior is not
> > useful in my corner case.
> > 
> > Let me explain my case in details: I've got a report about sporadic memcg oom
> > kills on some hosts with plenty of pagecache and low memory pressure.
> > You'll probably agree, that raising OOM signal in this case looks strange.
> 
> I am not sure I follow. So you see both OOM_KILL and OOM events and the
> user misinterprets OOM ones?

No, I see sporadic OOMs without OOM_KILLs in cgroups with plenty of pagecache
and low memory pressure. It's not a pre-OOM condition at all.

> 
> My understanding was that OOM event should tell admin that the limit
> should be increased in order to allow more charges. Without OOM_KILL
> events it means that those failed charges have some sort of fallback
> so it is not critical condition for the workload yet. Something to watch
> for though in case of perf. degradation or potential misbehavior.

Right, something like "there is a shortage of memory which will likely
lead to OOM soon". It's not my case.

> 
> Whether this is how the event is used, I dunno. Anyway, if you want to
> just move the event and make it closer to OOM_KILL then I strongly
> suspect the event is losing its relevance.

I agree here (about losing relevance), but don't think it's a reason
to generate misleading events.

Thanks!

next prev parent reply	other threads:[~2018-09-12 16:25 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-10 21:56 [PATCH RFC] mm: don't raise MEMCG_OOM event due to failed high-order allocation Roman Gushchin
2018-09-10 21:56 ` Roman Gushchin
2018-09-11  0:40 ` David Rientjes
2018-09-11 12:11 ` Michal Hocko
2018-09-11 12:41   ` peter enderborg
2018-09-11 12:41     ` peter enderborg
2018-09-11 12:47     ` Michal Hocko
2018-09-11 15:34     ` Roman Gushchin
2018-09-11 15:34       ` Roman Gushchin
2018-09-11 15:27   ` Roman Gushchin
2018-09-11 15:27     ` Roman Gushchin
2018-09-12 12:35     ` Michal Hocko
2018-09-12 16:25       ` Roman Gushchin [this message]
2018-09-12 16:25         ` Roman Gushchin
2018-09-11 12:43 ` Johannes Weiner
2018-09-11 15:47   ` Roman Gushchin
2018-09-11 15:47     ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180912162526.GA15119@castle \
    --to=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.