public inbox for linux-kernel@vger.kernel.org
From: Andrea Arcangeli <andrea@suse.de>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@osdl.org>,
	marcelo.tosatti@cyclades.com, LKML <linux-kernel@vger.kernel.org>,
	nickpiggin@yahoo.com.au
Subject: Re: [PATCH] oom killer (Core)
Date: Fri, 3 Dec 2004 00:35:00 +0100
Message-ID: <20041202233459.GF32635@dualathlon.random>
In-Reply-To: <1102013716.13353.226.camel@tglx.tec.linutronix.de>

On Thu, Dec 02, 2004 at 07:55:16PM +0100, Thomas Gleixner wrote:
> On Thu, 2004-12-02 at 19:08 +0100, Andrea Arcangeli wrote:
> > OTOH we must not forget 2.4(-aa) calls do_exit synchronously and it
> > never sends signals. That might be why 2.4 doesn't kill more than one
> > task by mistake, even without a callback-wakeup. 
> 
> I just run the same test on 2.4.27 and the behaviour is completely
> different.
> 
> The box seems to be stuck in a swap in/out loop for quite a long time.
> During this time the box is not responsive at all. It finally stops the
> forking after quite a long time of swapping with
> fork() (error: resource temporarily not available). 

Fork eventually failing is very reasonable if you're executing a fork
loop.

> 
> There is no output in dmesg, but I'm not able to remove the remaining
> hackbench processes as even a kill -SIGKILL returns with 
> fork() (error: resource temporarily not available)
> 
> I'm not sure, which of the two scenarios I like better :)

Please try with 2.4.23aa3. I think there was some oom killer change
after I had no resources to track 2.4 anymore. I'm not saying 2.4.23aa3
will work better, but I would like to know if there's some corner
case still open in 2.4-aa. Careful, 2.4.23aa3 has security bugs (only
local security of course, i.e. normally not a big issue, and certainly
good enough for a quick test).

I doubt your testcase simulates anything remotely realistic, but
it's still informative.

What I'm simulating here is a very real-life scenario, with a couple of
apps allocating more memory than RAM.

> FYI, I tried with 2.6 UP and PREEMPT=n. The result is more horrible. The
> box just gets stuck in an endless swap in/swap out and does not respond
> to anything else than SysRq-T and the reset button.
> 
> With the callback the machine did not come back after 20 Minutes.

Was the oom killer invoked at all? If yes, and it works with preempt,
that could mean a cond_resched is simply missing...

> You meant the one in badness() right ?

yes.

> Well it makes it better, but I was able to have a second invocation
> before the first killed tasks exited. That's simple to explain. The task
> is on the way out and releases resources, so the VM size is reduced and
> the killer picks another process. 

That's trivial to fix by checking for PF_DEAD/PF_EXITING.
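
A rough user-space model of that check, only to illustrate the idea (this
is not the kernel code: the struct, the MODEL_* flag values and the
function name are made up, and total_vm stands in for the badness()
score). It assumes one possible policy, namely backing off completely
while any task is still on its way out:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this model; the real kernel values differ. */
#define MODEL_PF_EXITING 0x1u
#define MODEL_PF_DEAD    0x2u

struct model_task {
    const char *comm;
    unsigned long total_vm;   /* stand-in for the badness() score */
    unsigned int flags;
};

/*
 * Pick the task with the highest score. If any task is already exiting
 * or dead, return NULL so the caller does not kill a second task while
 * the first one is still releasing its memory.
 */
static const struct model_task *
model_select_victim(const struct model_task *tasks, size_t n)
{
    const struct model_task *victim = NULL;

    assert(tasks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].flags & (MODEL_PF_EXITING | MODEL_PF_DEAD))
            return NULL;      /* a kill is already in progress */
        if (victim == NULL || tasks[i].total_vm > victim->total_vm)
            victim = &tasks[i];
    }
    return victim;
}
```

Skipping the exiting task and choosing among the rest would be the other
obvious variant, but that would still allow the second kill described
above.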

> > I'd rather fix this by removing buggy code, than by adding additional
> > logics on top of already buggy code (i.e. setting PF_MEMDIE is a smp
> > race and can corrupt other bitflags), but at least the
> > oom-wakeup-callback from do_exit still makes a lot of sense (even if
> > PF_MEMDIE must be dropped since it's buggy, and something else should be
> > used instead).
> 
> I think the callback is the only safe way to fix that. If PF_MEMDIE is
> racy then I'm sure we will find a different indicator for that.

The callback adds overhead to the exit path. Plus strictly speaking it's
not actually a callback, you're just "polling" for the bitflag :)

> Yep, but the reentrancy blocking with the callback makes the time, count
> crap and the watermark check go away, as it is safe to reenable the
> killer at this point because we definitely freed memory resources. So
> the watermark comes for free.

You can get an I/O race where your program is about to finish a failing
try_to_free_pages pass (note that a task exiting won't make
try_to_free_pages work any easier, try_to_free_pages has to free
allocated memory, it doesn't care if there's 1M or 100M of free memory).
If you don't check the watermarks after waiting for I/O, you're going to
generate a spurious oom-killing. Your changes can't help.

Note that even the watermark check leaves a race window open, but at
least it's not an I/O window, whereas try_to_free_pages can wait for I/O
and then fail.
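
To make the ordering concrete, here is a small user-space sketch (again
not kernel code: struct model_zone, the reclaim hook type and both stub
reclaim functions are hypothetical names for illustration only). The
point is only that the watermark is checked after the reclaim pass,
which may have slept on I/O, has failed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_zone {
    unsigned long free_pages;
    unsigned long pages_min;          /* watermark */
};

/* Hypothetical reclaim hook: returns pages freed, may block on I/O. */
typedef unsigned long (*model_reclaim_fn)(struct model_zone *z);

/* Stub: reclaim fails and nothing else changes meanwhile. */
static unsigned long model_reclaim_fails(struct model_zone *z)
{
    (void)z;
    return 0;
}

/* Stub: reclaim fails, but another task exits while we wait for I/O. */
static unsigned long model_reclaim_fails_during_exit(struct model_zone *z)
{
    z->free_pages += 1000;            /* memory freed by the exiting task */
    return 0;
}

/* Returns true only if an oom kill still looks warranted. */
static bool model_should_oom_kill(struct model_zone *z,
                                  model_reclaim_fn reclaim)
{
    assert(z != NULL && reclaim != NULL);

    if (reclaim(z) > 0)
        return false;                 /* reclaim made progress */

    /* Reclaim failed, possibly after sleeping: look at free memory again. */
    if (z->free_pages > z->pages_min)
        return false;                 /* killing now would be spurious */

    return true;
}
```

Free memory can still change between the final check and the actual
kill, so the window is narrowed rather than closed.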

I'll add to my last patch the removal of the PF_MEMDIE check in oom_kill
plus I'll fix the remaining race with PF_EXITING/DEAD, and I'll add a
cond_resched. Then you can try again with my simple way (w/ and w/o
PREEMPT ;).

Thanks for the great feedback.
