From: Andrea Arcangeli <andrea@suse.de>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@osdl.org>,
marcelo.tosatti@cyclades.com, LKML <linux-kernel@vger.kernel.org>,
nickpiggin@yahoo.com.au
Subject: Re: [PATCH] oom killer (Core)
Date: Fri, 3 Dec 2004 00:35:00 +0100
Message-ID: <20041202233459.GF32635@dualathlon.random>
In-Reply-To: <1102013716.13353.226.camel@tglx.tec.linutronix.de>
On Thu, Dec 02, 2004 at 07:55:16PM +0100, Thomas Gleixner wrote:
> On Thu, 2004-12-02 at 19:08 +0100, Andrea Arcangeli wrote:
> > OTOH we must not forget 2.4(-aa) calls do_exit synchronously and it
> > never sends signals. That might be why 2.4 doesn't kill more than one
> > task by mistake, even without a callback-wakeup.
>
> I just run the same test on 2.4.27 and the behaviour is completely
> different.
>
> The box seems to be stuck in a swap in/out loop for quite a long time.
> During this time the box is not responsive at all. It finally stops the
> forking after quite a long time of swapping with
> fork() (error: resource temporarily not available).
Fork eventually failing is very reasonable if you're executing a fork
loop.
>
> There is no output in dmesg, but I'm not able to remove the remaining
> hackbench processes as even a kill -SIGKILL returns with
> fork() (error: resource temporarily not available)
>
> I'm not sure, which of the two scenarios I like better :)
Please try with 2.4.23aa3, I think there was some oom killer change
after I had no resources to track 2.4 anymore. I'm not saying 2.4.23aa3
will work better though, but I would like to know if there's some corner
case still open in 2.4-aa. Careful, 2.4.23aa3 has security bugs (only
local security of course, i.e. normally not a big issue, sure good
enough for a quick test).
I doubt your testcase simulates anything remotely realistic but
anyway it's still informative.
What I'm simulating here is a very real-life scenario with a couple of
apps allocating more memory than RAM.
> FYI, I tried with 2.6 UP and PREEMPT=n. The result is more horrible. The
> box just gets stuck in an endless swap in/swap out and does not respond
> to anything else than SysRq-T and the reset button.
>
> With the callback the machine did not come back after 20 Minutes.
Was the oom killer invoked at all? If yes, and it works with preempt,
that could mean a cond_resched is simply missing...
> You meant the one in badness() right ?
yes.
> Well it makes it better, but I was able to have a second invocation
> before the first killed tasks exited. That's simple to explain. The task
> is on the way out and releases resources, so the VM size is reduced and
> the killer picks another process.
That's trivial to fix by checking for PF_DEAD/PF_EXITING.
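One way to read the PF_EXITING/PF_DEAD check, as a userspace sketch rather than kernel code: before scoring candidates, the selector looks for a task that is already on its way out and, if it finds one, picks nobody. The struct, the flag values and select_victim() are hypothetical names for illustration; only the flag names come from the discussion above.

```c
#include <stddef.h>

/* Hypothetical flag values for this userspace model, not the kernel's. */
#define PF_EXITING 0x1u
#define PF_DEAD    0x2u

/* Minimal stand-in for a task: a name, its VM size and its flags. */
struct task {
    const char *name;
    unsigned long vm_pages;
    unsigned int flags;
};

/* Simplified badness score: just the VM size of the task. */
static unsigned long badness(const struct task *t)
{
    return t->vm_pages;
}

/*
 * Return the task with the highest badness, or NULL when no kill should
 * be issued: either there are no candidates, or some task is already
 * exiting and is about to release its memory.
 */
static const struct task *select_victim(const struct task *tasks, size_t n)
{
    const struct task *victim = NULL;
    unsigned long worst = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long points;

        /* A task on its way out: hold off instead of picking another. */
        if (tasks[i].flags & (PF_EXITING | PF_DEAD))
            return NULL;

        points = badness(&tasks[i]);
        if (points > worst) {
            worst = points;
            victim = &tasks[i];
        }
    }
    return victim;
}
```

Returning NULL here stands for "wait for the exiting task to free its resources", so a shrinking VM size on the first victim cannot cause a second kill.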
> > I'd rather fix this by removing buggy code, than by adding additional
> > logics on top of already buggy code (i.e. setting PF_MEMDIE is a smp
> > race and can corrupt other bitflags), but at least the
> > oom-wakeup-callback from do_exit still makes a lot of sense (even if
> > PF_MEMDIE must be dropped since it's buggy, and something else should be
> > used instead).
>
> I think the callback is the only safe way to fix that. If PF_MEMDIE is
> racy then I'm sure we will find a different indicator for that.
The callback adds overhead to the exit path. Plus strictly speaking it's
not actually a callback, you're just "polling" for the bitflag :)
> Yep, but the reentrancy blocking with the callback makes the time, count
> crap and the watermark check go away, as it is safe to reenable the
> killer at this point because we definitely freed memory resources. So
> the watermark comes for free.
You can get an I/O race where your program is about to finish a failing
try_to_free_pages pass (note that a task exiting won't make
try_to_free_pages work any easier, try_to_free_pages has to free
allocated memory, it doesn't care if there's 1M or 100M of free memory).
If you don't check the watermarks after waiting for I/O, you're going to
generate a spurious oom-killing. Your changes can't help.
Note that even the watermark check leaves a race window open, but at
least it's not an I/O window, whereas try_to_free_pages can wait for I/O
and then fail.
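The ordering argued for above can be modelled in a few lines of userspace C. This is only a sketch: zone_model, alloc_slowpath() and the two reclaim stubs are hypothetical names, and the reclaim callback stands in for a try_to_free_pages pass that may sleep on I/O before failing.

```c
#include <stdbool.h>

/* Hypothetical model of one zone: free pages and its minimum watermark. */
struct zone_model {
    unsigned long free_pages;
    unsigned long pages_min;
};

/* Stand-in for a reclaim pass; returns true if it freed something. */
typedef bool (*reclaim_fn)(struct zone_model *z);

enum alloc_result { ALLOC_OK, ALLOC_OOM_KILL };

/* Reclaim fails, but another task exits and frees memory while we wait. */
static bool reclaim_fails_while_task_exits(struct zone_model *z)
{
    z->free_pages += 1000;
    return false;
}

/* Reclaim fails and nothing else changes. */
static bool reclaim_fails(struct zone_model *z)
{
    (void)z;
    return false;
}

static enum alloc_result alloc_slowpath(struct zone_model *z, reclaim_fn reclaim)
{
    if (z->free_pages > z->pages_min)
        return ALLOC_OK;

    if (reclaim(z))
        return ALLOC_OK;

    /*
     * The reclaim pass may have waited for I/O. Re-check the watermark
     * before deciding to kill, since memory may have been freed meanwhile.
     */
    if (z->free_pages > z->pages_min)
        return ALLOC_OK;

    return ALLOC_OOM_KILL;
}
```

Without the second watermark check, the first stub would lead to a kill even though plenty of memory became free during the wait. A small window remains between the re-check and the kill, but it no longer spans an I/O wait.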
I'll add to my last patch the removal of the PF_MEMDIE check in oom_kill
plus I'll fix the remaining race with PF_EXITING/DEAD, and I'll add a
cond_resched. Then you can try again with my simple way (w/ and w/o
PREEMPT ;).
Thanks for the great feedback.