From: Larry Woodman <lwoodman@redhat.com>
To: linux-kernel@vger.kernel.org
Subject: Re: oom kill oddness.
Date: Fri, 29 Sep 2006 16:03:14 -0400
Message-ID: <451D7C02.2090907@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]

>
>
>So I have two boxes that are very similar.
>Both have 2GB of RAM & 1GB of swap space.
>One has a 2.8GHz CPU, the other a 2.93GHz CPU, both dualcore.
>
>The slower box survives a 'make -j bzImage' of a 2.6.18 kernel tree
>without incident. (Although it takes ~4 minutes longer than a -j2)
>
>The faster box goes absolutely nuts, oomkilling everything in sight,
>until eventually after about 10 minutes, the box locks up dead,
>and won't even respond to pings.
>
>Oh, the only other difference - the slower box has 1 disk, whereas the
>faster box has two in RAID0.   I'm not surprised that stuff is getting
>oom-killed given the pathological scenario, but the fact that the
>box never recovered at all is a little odd.  Does md lack some means
>of dealing with low memory scenarios ?
>
>	Dave
>
Dave, this has been a problem since the out_of_memory() function was
changed between 2.6.10 and 2.6.11.  Before this change, out_of_memory()
required multiple calls within 5 seconds before it actually OOM killed a
process.  After the change (in 2.6.11), a single call to out_of_memory()
results in OOM killing a process.  The following patch allows the 2.6.18
system to run under much more memory pressure before it OOM kills.
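
For anyone who wants to try the throttling rules outside the kernel, here
is a rough userspace sketch of the same decision logic.  It is an
illustration only, not the patch itself: the struct oom_throttle type, the
should_oom_kill_sketch() name, the explicit "now" argument and the HZ value
of 1000 are all assumptions made for the example, and the spinlock is left
out because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed tick rate for the sketch; the kernel's HZ is a build-time option. */
#define HZ 1000UL

/* Hypothetical container for the patch's function-local static counters. */
struct oom_throttle {
	unsigned long first;    /* start of the current failure window */
	unsigned long last;     /* time of the previous call */
	unsigned long count;    /* failures counted in this window */
	unsigned long lastkill; /* time of the most recent kill */
};

/* Start a new window; like the patch, only ever move 'first' forward. */
static void oom_throttle_reset(struct oom_throttle *st, unsigned long now)
{
	if ((long)(st->first - now) < 0) /* same test as time_after(now, first) */
		st->first = now;
	st->count = 0;
}

/* Returns true when the caller should go ahead and pick a victim. */
bool should_oom_kill_sketch(struct oom_throttle *st, unsigned long now)
{
	unsigned long since;

	assert(st != NULL);

	since = now - st->last;
	st->last = now;

	/* Long gap since the previous failure: not OOM, start over. */
	if (since > 5 * HZ) {
		oom_throttle_reset(st, now);
		return false;
	}
	/* Failing for less than one second: not really OOM yet. */
	if (now - st->first < HZ)
		return false;
	/* Only a few failures so far: not really OOM yet. */
	if (++st->count < 10)
		return false;
	/* A task was killed recently: give it time to exit. */
	if (now - st->lastkill < 5 * HZ)
		return false;

	st->lastkill = now;
	oom_throttle_reset(st, now);
	return true;
}
```

Starting from a zeroed struct and feeding it one failure every HZ/10 ticks,
the sketch should report its first kill on the 20th call, about 1.9 seconds
after the first failure, and the next one no sooner than 5 seconds after
that.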




[-- Attachment #2: oomkill.patch --]
[-- Type: text/plain, Size: 2191 bytes --]

--- linux-2.6.18.noarch/mm/oom_kill.c.orig
+++ linux-2.6.18.noarch/mm/oom_kill.c
@@ -306,6 +306,69 @@ static int oom_kill_process(struct task_
 	return oom_kill_task(p, message);
 }
 
+int should_oom_kill(void)
+{
+	static spinlock_t oom_lock = SPIN_LOCK_UNLOCKED;
+	static unsigned long first, last, count, lastkill;
+	unsigned long now, since;
+	int ret = 0;
+
+	spin_lock(&oom_lock);
+	now = jiffies;
+	since = now - last;
+	last = now;
+
+	/*
+	 * If it's been a long time since last failure,
+	 * we're not oom.
+	 */
+	if (since > 5*HZ)
+		goto reset;
+
+	/*
+	 * If we haven't tried for at least one second,
+	 * we're not really oom.
+	 */
+	since = now - first;
+	if (since < HZ)
+		goto out_unlock;
+
+	/*
+	 * If we have gotten only a few failures,
+	 * we're not really oom.
+	 */
+	if (++count < 10)
+		goto out_unlock;
+
+	/*
+	 * If we just killed a process, wait a while
+	 * to give that task a chance to exit. This
+	 * avoids killing multiple processes needlessly.
+	 */
+	since = now - lastkill;
+	if (since < HZ*5)
+		goto out_unlock;
+
+	/*
+	 * Ok, really out of memory. Kill something.
+	 */
+	lastkill = now;
+	ret = 1;
+
+reset:
+/*
+ * We dropped the lock above, so check to be sure the variable
+ * first only ever increases to prevent false OOM's.
+ */
+	if (time_after(now, first))
+		first = now;
+	count = 0;
+
+out_unlock:
+	spin_unlock(&oom_lock);
+	return ret;
+}
+
 /**
  * out_of_memory - kill the "best" process when we run out of memory
  *
@@ -326,6 +389,9 @@ void out_of_memory(struct zonelist *zone
 		show_mem();
 	}
 
+	if (!should_oom_kill())
+		return;
+
 	cpuset_lock();
 	read_lock(&tasklist_lock);
 
--- linux-2.6.18.noarch/mm/vmscan.c.orig
+++ linux-2.6.18.noarch/mm/vmscan.c
@@ -999,10 +999,8 @@ unsigned long try_to_free_pages(struct z
 			reclaim_state->reclaimed_slab = 0;
 		}
 		total_scanned += sc.nr_scanned;
-		if (nr_reclaimed >= sc.swap_cluster_max) {
-			ret = 1;
+		if (nr_reclaimed >= sc.swap_cluster_max)
 			goto out;
-		}
 
 		/*
 		 * Try to write back as many pages as we just scanned.  This
@@ -1030,6 +1028,8 @@ out:
 
 		zone->prev_priority = zone->temp_priority;
 	}
+	if (nr_reclaimed)
+		ret = 1;
 	return ret;
 }
 
