From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
akpm@linux-foundation.org, linux-mm@kvack.org,
Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [PATCH 1/3] mm,oom: Move last second allocation to inside the OOM killer.
Date: Fri, 1 Dec 2017 15:57:11 +0000
Message-ID: <20171201155711.GA11057@cmpxchg.org>
In-Reply-To: <20171201151715.yiep5wkmxmp77nxn@dhcp22.suse.cz>

On Fri, Dec 01, 2017 at 04:17:15PM +0100, Michal Hocko wrote:
> On Fri 01-12-17 14:56:38, Johannes Weiner wrote:
> > On Fri, Dec 01, 2017 at 03:46:34PM +0100, Michal Hocko wrote:
> > > On Fri 01-12-17 14:33:17, Johannes Weiner wrote:
> > > > On Sat, Nov 25, 2017 at 07:52:47PM +0900, Tetsuo Handa wrote:
> > > > > @@ -1068,6 +1071,17 @@ bool out_of_memory(struct oom_control *oc)
> > > > > }
> > > > >
> > > > > select_bad_process(oc);
> > > > > + /*
> > > > > + * Try really last second allocation attempt after we selected an OOM
> > > > > + * victim, for somebody might have managed to free memory while we were
> > > > > + * selecting an OOM victim which can take quite some time.
> > > >
> > > > Somebody might free some memory right after this attempt fails. OOM
> > > > can always be a temporary state that resolves on its own.
> > > >
> > > > What keeps us from declaring OOM prematurely is the fact that we
> > > > already scanned the entire LRU list without success, not last second
> > > > or last-last second, or REALLY last-last-last-second allocations.
> > >
> > > You are right that this is inherently racy. The point here is, however,
> > > that the race window between the last check and the kill can be _huge_!
> >
> > My point is that it's irrelevant. We already sampled the entire LRU
> > list; compared to that, the delay before the kill is immaterial.
>
> Well, I would disagree. I have seen OOM reports with free memory.
> Closer debugging showed that an existing process was on the way out,
> and the OOM victim selection took way too long and fired after a large
> process managed to exit. There were different hacks^Wheuristics to
> cover those cases, but they turned out to just cause different corner
> cases. Moving the existing last-moment allocation after a potentially
> very time-consuming action is a relatively cheap and safe measure to
> cover those cases without any negative side effects I can think of.

An existing process can exit right after you pull the trigger. How big
is *that* race window? By this logic you could add a sleep(5) before
the last-second allocation because it would increase the likelihood of
somebody else exiting voluntarily.

This patch is making the time it takes to select a victim an integral
part of OOM semantics. Think about it: if somebody later speeds up the
OOM selection process, they shrink the window in which somebody could
volunteer memory for the last-second allocation. By optimizing that
code, you're probabilistically increasing the rate of OOM kills.

A guaranteed 5 second window would in fact be better behavior.

This is bananas. I'm sticking with my nak.

> > > Another argument is that the allocating task itself could have
> > > changed its allocation capabilities, e.g. it may have become the OOM
> > > victim itself since the last time the allocator could have reflected
> > > that fact.
> >
> > Can you outline how this would happen exactly?
>
> http://lkml.kernel.org/r/20171101135855.bqg2kuj6ao2cicqi@dhcp22.suse.cz
>
> As I tried to explain, the workload is really pathological, but this
> (or rather the follow-up based on this patch) is only a moderately ugly
> workaround, given that it can actually help.

That's not a real case which matters. It's really unfortunate how much
churn the OOM killer has been seeing based on artificial stress tests.

Thread overview: 17+ messages
2017-11-25 10:52 [PATCH 1/3] mm,oom: Move last second allocation to inside the OOM killer Tetsuo Handa
2017-11-25 10:52 ` [PATCH 2/3] mm,oom: Use ALLOC_OOM for OOM victim's last second allocation Tetsuo Handa
2017-11-25 10:52 ` [PATCH 3/3] mm,oom: Remove oom_lock serialization from the OOM reaper Tetsuo Handa
2017-11-28 13:04 ` [PATCH 1/3] mm,oom: Move last second allocation to inside the OOM killer Michal Hocko
2017-12-01 14:33 ` Johannes Weiner
2017-12-01 14:46 ` Michal Hocko
2017-12-01 14:56 ` Johannes Weiner
2017-12-01 15:17 ` Michal Hocko
2017-12-01 15:57 ` Johannes Weiner [this message]
2017-12-01 16:38 ` Michal Hocko
2017-12-05 10:46 ` Johannes Weiner
2017-12-05 13:02 ` Michal Hocko
2017-12-05 13:17 ` Tetsuo Handa
2017-12-05 13:42 ` Michal Hocko
2017-12-05 14:07 ` Tetsuo Handa
2017-12-05 14:30 ` Michal Hocko
2017-12-01 16:52 ` Tetsuo Handa