From: Andrea Arcangeli <andrea@suse.de>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@osdl.org>,
marcelo.tosatti@cyclades.com, LKML <linux-kernel@vger.kernel.org>,
nickpiggin@yahoo.com.au
Subject: Re: [PATCH] oom killer (Core)
Date: Fri, 3 Dec 2004 00:35:00 +0100 [thread overview]
Message-ID: <20041202233459.GF32635@dualathlon.random> (raw)
In-Reply-To: <1102013716.13353.226.camel@tglx.tec.linutronix.de>
On Thu, Dec 02, 2004 at 07:55:16PM +0100, Thomas Gleixner wrote:
> On Thu, 2004-12-02 at 19:08 +0100, Andrea Arcangeli wrote:
> > OTOH we must not forget 2.4(-aa) calls do_exit synchronously and it
> > never sends signals. That might be why 2.4 doesn't kill more than one
> > task by mistake, even without a callback-wakeup.
>
> I just run the same test on 2.4.27 and the behaviour is completely
> different.
>
> The box seems to be stuck in a swap in/out loop for quite a long time.
> During this time the box is not responsive at all. It finally stops the
> forking after quite a long time of swapping with
> fork() (error: resource temporarily not available).
Fork eventually failing is very reasonable if you're executing a fork
loop.
>
> There is no output in dmesg, but I'm not able to remove the remaining
> hackbench processes as even a kill -SIGKILL returns with
> fork() (error: resource temporarily not available)
>
> I'm not sure, which of the two scenarios I like better :)
Please try with 2.4.23aa3, I think there was some oom killer change
after I had no resources to track 2.4 anymore. I'm not saying 2.4.23aa3
will work better though, but I would like to know if there's some corner
case still open in 2.4-aa. Careful, 2.4.23aa3 has security bugs (only
local security of course, i.e. normally not a big issue, sure good
enough for a quick test).
I doubt your testcase simulates anything remotely realistic but
anyway it's still informative.
What I'm simulating here is very real life scenario with a couple of
apps allocating more memory than ram.
> FYI, I tried with 2.6 UP and PREEMPT=n. The result is more horrible. The
> box just gets stuck in an endless swap in/swap out and does not respond
> to anything else than SysRq-T and the reset button.
>
> With the callback the machine did not come back after 20 Minutes.
Was the oom killer invoked at all? If yes, and it works with preempt,
that could mean a cond_resched is simply missing...
> You meant the one in badness() right ?
yes.
> Well it makes it better, but I was able to have a second invocation
> before the first killed tasks exited. That's simple to explain. The task
> is on the way out and releases resources, so the VM size is reduced and
> the killer picks another process.
That's trivial to fix checking for PF_DEAD/PF_EXITING.
> > I'd rather fix this by removing buggy code, than by adding additional
> > logics on top of already buggy code (i.e. setting PF_MEMDIE is a smp
> > race and can corrupt other bitflags), but at least the
> > oom-wakeup-callback from do_exit still makes a lot of sense (even if
> > PF_MEMDIE must be dropped since it's buggy, and something else should be
> > used instead).
>
> I think the callback is the only safe way to fix that. If PF_MEMDIE is
> racy then I'm sure we will find a different indicator for that.
The callback adds overhead to the exit path. Plus strictly speaking it's
not actually a callback, you're just "polling" for the bitflag :)
> Yep, but the reentrancy blocking with the callback makes the time, count
> crap and the watermark check go away, as it is safe to reenable the
> killer at this point because we definitely freed memory resources. So
> the watermark comes for free.
You can get an I/O race where your program is about to finish a failing
try_to_free_pages pass (note that a task exiting won't make
try_to_free_pages work any easier, try_to_free_pages has to free
allocated memory, it doesn't care if there's 1M or 100M of free memory).
If you don't check the watermarks after waiting for I/O, you're going to
generate a suprious oom-killing. Your changes can't help.
Note that even the watermark checks leaves a race window open, but at
least it's not an I/O window. While try_to_free_pages can wait for I/O
and then fail.
I'll add to my last patch the removal of the PF_MEMDIE check in oom_kill
plus I'll fix the remaining race with PF_EXITING/DEAD, and I'll add a
cond_resched. Then you can try again with my simple way (w/ and w/o
PREEMPT ;).
Thanks for the great feedback.
next prev parent reply other threads:[~2004-12-02 23:35 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-12-01 9:49 [PATCH] oom killer (Core) tglx
2004-12-01 21:16 ` Andrea Arcangeli
2004-12-01 22:06 ` Thomas Gleixner
2004-12-01 22:33 ` Andrea Arcangeli
2004-12-02 3:36 ` Andrea Arcangeli
2004-12-02 11:09 ` Thomas Gleixner
2004-12-02 13:48 ` Thomas Gleixner
2004-12-02 16:47 ` Andrea Arcangeli
2004-12-02 16:55 ` Andrew Morton
2004-12-02 11:18 ` Marcelo Tosatti
2004-12-02 17:17 ` Thomas Gleixner
2004-12-02 17:27 ` Andrew Morton
2004-12-02 18:08 ` Andrea Arcangeli
2004-12-02 18:29 ` Andrew Morton
2004-12-02 19:01 ` Thomas Gleixner
2004-12-02 18:55 ` Thomas Gleixner
2004-12-02 19:07 ` Andrew Morton
2004-12-02 19:08 ` Thomas Gleixner
2004-12-02 19:22 ` Andrew Morton
2004-12-02 19:24 ` Thomas Gleixner
2004-12-02 20:11 ` Andre Tomt
2004-12-03 22:45 ` Thomas Gleixner
2004-12-02 23:47 ` Andrea Arcangeli
2004-12-03 14:41 ` Helge Hafting
2004-12-03 21:20 ` Thomas Gleixner
2004-12-05 21:14 ` Helge Hafting
2004-12-02 23:35 ` Andrea Arcangeli [this message]
2004-12-03 2:28 ` Andrea Arcangeli
2004-12-03 22:37 ` Thomas Gleixner
2004-12-03 22:51 ` Thomas Gleixner
2004-12-03 23:08 ` Andrea Arcangeli
2004-12-10 16:36 ` William Lee Irwin III
2004-12-10 17:35 ` Andrea Arcangeli
2004-12-10 17:43 ` William Lee Irwin III
2004-12-10 17:55 ` Andrea Arcangeli
2004-12-10 18:00 ` William Lee Irwin III
2004-12-10 18:15 ` Andrea Arcangeli
2004-12-10 18:19 ` William Lee Irwin III
2004-12-10 19:05 ` Andrea Arcangeli
2004-12-10 16:51 ` William Lee Irwin III
2004-12-03 21:10 ` Thomas Gleixner
2004-12-03 22:21 ` Andrea Arcangeli
2004-12-05 2:52 ` William Lee Irwin III
2004-12-05 13:38 ` Thomas Gleixner
2004-12-05 15:22 ` Andrea Arcangeli
2004-12-10 16:32 ` William Lee Irwin III
2004-12-10 16:52 ` Thomas Gleixner
2004-12-10 17:43 ` William Lee Irwin III
2004-12-10 17:47 ` William Lee Irwin III
2004-12-10 17:49 ` Andrea Arcangeli
2004-12-10 17:57 ` William Lee Irwin III
2004-12-12 0:12 ` William Lee Irwin III
2004-12-24 1:18 ` Andrea Arcangeli
-- strict thread matches above, loose matches on Subject: below --
2004-12-01 10:21 tvrtko.ursulin
2004-12-04 7:00 Voluspa
2004-12-04 8:08 ` Andrea Arcangeli
2004-12-04 12:42 Voluspa
2004-12-04 16:43 ` Andrea Arcangeli
2004-12-04 18:33 ` Thomas Gleixner
2004-12-04 21:02 ` Thomas Gleixner
2004-12-05 0:27 ` Andrea Arcangeli
2004-12-05 14:55 ` Thomas Gleixner
2004-12-05 15:34 ` Andrea Arcangeli
2004-12-05 16:29 ` Thomas Gleixner
2004-12-05 2:22 Voluspa
2004-12-05 8:32 Voluspa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20041202233459.GF32635@dualathlon.random \
--to=andrea@suse.de \
--cc=akpm@osdl.org \
--cc=linux-kernel@vger.kernel.org \
--cc=marcelo.tosatti@cyclades.com \
--cc=nickpiggin@yahoo.com.au \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.