All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Rik van Riel <riel@redhat.com>,
	nikita@clusterfs.com, linux-mm@kvack.org,
	Nick Piggin <piggin@cyberone.com.au>
Subject: Re: [PATCH] ignore referenced pages on reclaim when OOM
Date: Wed, 10 Nov 2004 16:41:34 -0200	[thread overview]
Message-ID: <20041110184134.GC12867@logos.cnet> (raw)
In-Reply-To: <20041108142837.307029fc.akpm@osdl.org>

On Mon, Nov 08, 2004 at 02:28:37PM -0800, Andrew Morton wrote:
> Rik van Riel <riel@redhat.com> wrote:
> >
> > On Tue, 9 Nov 2004, Nikita Danilov wrote:
> > 
> > >  > Speeds up extreme load performance on Rik's tests.
> > > 
> > > I recently tested quite similar thing, the only dfference being that in
> > > my case references bit started being ignored when scanning priority
> > > reached 2 rather than 0.
> > > 
> > > I found that it _degrades_ performance in the loads when there is a lot
> > > of file system write-back going from tail of the inactive list (like
> > > dirtying huge file through mmap in a loop).
> > 
> > Well yeah, when you reach priority 2, you've only scanned
> > 1/4 of memory.  On the other hand, when you reach priority
> > 0, you've already scanned all pages once - beyond that point
> > the referenced bit really doesn't buy you much any more.
> > 
> 
> But we have to scan active, referenced pages two times to move them onto
> the inactive list.  A bit more, really, because nowadays
> refill_inactive_zone() doesn't even run page_referenced() until it starts
> to reach higher scanning priorities.
> 
> So it could be that we're just not scanning enough.

You know, all_unreclaimable has drawbacks.

Its hard to know whether you have "scanned enough to consider the box OOM 
and trigger OOM killer" when all_unreclaimable avoids the system 
from "scanning enough".

I'm trying to improve the OOM-kill-from-kswapd patch but z->all_unreclaimable 
is currently the bigger "rock on the shoe" - we need some way to detect that
the zones have been scanned enough so to be able to say 
"OK, I have scanned enough and no freeable pages appear, its time 
to trigger the OOM killer".

So z->all_unreclaimable logic and "OOM detection" are conflicting goals.

There must be some way to combine both effectively.

This is my current patch - avoids spurious OOM kills but obviously 
fails to set "worked_dma" - "worked_normal" due to all_unreclaimable logic,  
resulting in livelock when swapspace exhauts. 

Ideas are welcome.


--- vmscan.c.orig	2004-11-09 16:38:04.000000000 -0200
+++ vmscan.c	2004-11-10 18:59:43.098090736 -0200
@@ -878,6 +878,8 @@
 		shrink_zone(zone, sc);
 	}
 }
+
+int task_looping_oom = 0;
  
 /*
  * This is the main entry point to direct page reclaim.
@@ -952,8 +954,8 @@
 		if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
 			blk_congestion_wait(WRITE, HZ/10);
 	}
-	if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY))
-		out_of_memory(gfp_mask);
+        if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY))
+		task_looping_oom = 1;
 out:
 	for (i = 0; zones[i] != 0; i++) {
 		struct zone *zone = zones[i];
@@ -963,6 +965,8 @@
 
 		zone->prev_priority = zone->temp_priority;
 	}
+	if (ret)
+		task_looping_oom = 0;
 	return ret;
 }
 
@@ -997,13 +1001,17 @@
 	int all_zones_ok;
 	int priority;
 	int i;
-	int total_scanned, total_reclaimed;
+	int total_scanned, total_reclaimed, low_reclaimed;
+	int worked_norm, worked_dma;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc;
 
+
 loop_again:
 	total_scanned = 0;
 	total_reclaimed = 0;
+	low_reclaimed = 0;
+	worked_norm = worked_dma = 0;
 	sc.gfp_mask = GFP_KERNEL;
 	sc.may_writepage = 0;
 	sc.nr_mapped = read_page_state(nr_mapped);
@@ -1072,6 +1080,17 @@
 			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 				continue;
 
+			/* if we're scanning dma or normal, and priority 
+			 * reached zero, set "worked_dma" or "worked_norm" 
+			 * accordingly.
+			 */
+			if (i <= 1 && priority == 0) {
+				if (!i) 
+					worked_dma = 1;
+				else
+					worked_norm = 1;
+			}
+
 			if (nr_pages == 0) {	/* Not software suspend */
 				if (!zone_watermark_ok(zone, order,
 						zone->pages_high, end_zone, 0, 0))
@@ -1088,6 +1107,10 @@
 			shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages);
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_reclaimed += sc.nr_reclaimed;
+
+			if (i <= 1)
+				low_reclaimed += sc.nr_reclaimed;
+
 			if (zone->all_unreclaimable)
 				continue;
 			if (zone->pages_scanned >= (zone->nr_active +
@@ -1128,6 +1151,29 @@
 
 		zone->prev_priority = zone->temp_priority;
 	}
+
+
+	if (!low_reclaimed && worked_dma && worked_norm && task_looping_oom) {
+
+		printk(KERN_ERR "kswp: pri:%d tot_recl:%d wrkd_dma:%d"
+				"wrkd_norm:%d tsk_loop_oom:%d\n",
+			priority, total_reclaimed, worked_dma, worked_norm, 
+				task_looping_oom);
+
+		/* 
+		 * Only kill if ZONE_NORMAL/ZONE_DMA are both below
+		 * pages_min
+		 */
+		for (i = pgdat->nr_zones - 2; i >= 0; i--) {
+			struct zone *zone = pgdat->node_zones + i;
+
+			if (zone->free_pages > zone->pages_min)
+				return 0;
+		}
+		out_of_memory(GFP_KERNEL);
+		task_looping_oom = 0;
+	}
+
 	if (!all_zones_ok) {
 		cond_resched();
 		goto loop_again;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

  reply	other threads:[~2004-11-10 18:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-11-08 18:18 [PATCH] ignore referenced pages on reclaim when OOM Marcelo Tosatti
2004-11-08 21:48 ` Nikita Danilov
2004-11-08 21:56   ` Rik van Riel
2004-11-08 18:48     ` Marcelo Tosatti
2004-11-08 22:28     ` Andrew Morton
2004-11-10 18:41       ` Marcelo Tosatti [this message]
2004-11-10 22:29         ` Andrew Morton
2004-11-10 20:09           ` Marcelo Tosatti
2004-11-12 16:10           ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20041110184134.GC12867@logos.cnet \
    --to=marcelo.tosatti@cyclades.com \
    --cc=akpm@osdl.org \
    --cc=linux-mm@kvack.org \
    --cc=nikita@clusterfs.com \
    --cc=piggin@cyberone.com.au \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.