From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 12 Sep 2007 05:17:30 -0700
From: Andrew Morton
Subject: Re: [PATCH 06 of 24] reduce the probability of an OOM livelock
Message-Id: <20070912051730.c9efd406.akpm@linux-foundation.org>
In-Reply-To: <49e2d90eb0d7b1021b1e.1187786933@v2.random>
References: <49e2d90eb0d7b1021b1e.1187786933@v2.random>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Andrea Arcangeli
Cc: linux-mm@kvack.org, David Rientjes

On Wed, 22 Aug 2007 14:48:53 +0200 Andrea Arcangeli wrote:

> # HG changeset patch
> # User Andrea Arcangeli
> # Date 1187778125 -7200
> # Node ID 49e2d90eb0d7b1021b1e1e841bef22fdc647766e
> # Parent  de62eb332b1dfee7e493043b20e560283ef42f67
> reduce the probability of an OOM livelock
>
> There's no need to loop way too many times over the lrus in order to
> declare defeat and decide to kill a task.  The more loops we do, the
> more likely we'll run into a livelock with a page bouncing back and
> forth between tasks.  The maximum number of entries to check in a loop
> that returns less than swap-cluster-max pages freed should be the size
> of the list (or at most twice the size of the list if you want to be
> really paranoid about the PG_referenced bit).
>
> Our objective here is to know reliably when it's time to kill a task;
> trying to free a few more pages at that already critical point is
> worthless.
>
> This seems to have the effect of reducing the "hang" time during oom
> killing.
>
> Signed-off-by: Andrea Arcangeli
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1112,7 +1112,7 @@ unsigned long try_to_free_pages(struct z
>  	int priority;
>  	int ret = 0;
>  	unsigned long total_scanned = 0;
> -	unsigned long nr_reclaimed = 0;
> +	unsigned long nr_reclaimed;
>  	struct reclaim_state *reclaim_state = current->reclaim_state;
>  	unsigned long lru_pages = 0;
>  	int i;
> @@ -1141,12 +1141,12 @@ unsigned long try_to_free_pages(struct z
>  		sc.nr_scanned = 0;
>  		if (!priority)
>  			disable_swap_token();
> -		nr_reclaimed += shrink_zones(priority, zones, &sc);
> +		nr_reclaimed = shrink_zones(priority, zones, &sc);
> +		if (reclaim_state)
> +			reclaim_state->reclaimed_slab = 0;
>  		shrink_slab(sc.nr_scanned, gfp_mask, lru_pages);
> -		if (reclaim_state) {
> +		if (reclaim_state)
>  			nr_reclaimed += reclaim_state->reclaimed_slab;
> -			reclaim_state->reclaimed_slab = 0;
> -		}
>  		total_scanned += sc.nr_scanned;
>  		if (nr_reclaimed >= sc.swap_cluster_max) {
>  			ret = 1;

I don't get it.

This code changes try_to_free_pages() so that it will only bale out when
a single scan of the zone at a particular priority reclaimed more than
swap_cluster_max pages.  Previously we'd include the results of all the
lower-priority scanning in that comparison too.

So this patch will make try_to_free_pages() do _more_ scanning than it
used to, in some situations (rough sketch below).  Which seems opposite
to what you're trying to do here.

> @@ -1238,7 +1238,6 @@ static unsigned long balance_pgdat(pg_da
>
>  loop_again:
>  	total_scanned = 0;
> -	nr_reclaimed = 0;
>  	sc.may_writepage = !laptop_mode;
>  	count_vm_event(PAGEOUTRUN);
>
> @@ -1293,6 +1292,7 @@ loop_again:
>  		 * pages behind kswapd's direction of progress, which would
>  		 * cause too much scanning of the lower zones.
>  		 */
> +		nr_reclaimed = 0;
>  		for (i = 0; i <= end_zone; i++) {
>  			struct zone *zone = pgdat->node_zones + i;
>  			int nr_slab;
>

A similar situation exists with this change.
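
To spell out the try_to_free_pages() point: stripped right down, the old
and new bale-out logic behave like this.  (A toy user-space model for
illustration only, not the real vmscan.c; the shrink_zones() stand-in and
the eight-pages-per-pass number are made up.)

#include <stdio.h>

#define DEF_PRIORITY		12
#define SWAP_CLUSTER_MAX	32

/* stand-in: pretend every pass over the zones frees eight pages */
static unsigned long shrink_zones(int priority)
{
	(void)priority;
	return 8;
}

int main(void)
{
	unsigned long nr_reclaimed;
	int priority;

	/* old behaviour: the bale-out test sees the running total */
	nr_reclaimed = 0;
	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
		nr_reclaimed += shrink_zones(priority);
		if (nr_reclaimed >= SWAP_CLUSTER_MAX) {
			printf("old code bales out at priority %d\n", priority);
			break;
		}
	}

	/* new behaviour: each pass must reach the threshold on its own */
	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
		nr_reclaimed = shrink_zones(priority);
		if (nr_reclaimed >= SWAP_CLUSTER_MAX) {
			printf("new code bales out at priority %d\n", priority);
			break;
		}
	}
	if (priority < 0)
		printf("new code scanned every priority without baling out\n");

	return 0;
}

With the cumulative total the first loop bales out after four passes; with
the per-pass test the second loop walks every priority before giving up,
i.e. more scanning before we ever get around to killing anything.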
Your changelog made no mention of the change to balance_pgdat() and I'm
struggling a bit to see what it's doing in there.

In both places, the definition of local variable nr_reclaimed can be
moved into a more inner scope (see the second sketch at the end of this
mail).  This makes the code easier to follow.  Please watch out for
cleanup opportunities like that.

I'll skip this patch.
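
For the nr_reclaimed scoping, the shape I have in mind is roughly this
(illustrative stand-alone C, not a patch against either function;
scan_until_done() and the constants are placeholders):

#define DEF_PRIORITY		12
#define SWAP_CLUSTER_MAX	32

/* stand-in for the real reclaim work */
unsigned long shrink_zones(int priority)
{
	(void)priority;
	return 0;
}

int scan_until_done(void)
{
	int priority;

	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
		/* declared in the scope that uses it, not at function scope */
		unsigned long nr_reclaimed = shrink_zones(priority);

		if (nr_reclaimed >= SWAP_CLUSTER_MAX)
			return 1;
	}
	return 0;
}

That way it's obvious at a glance that nothing outside the loop consumes
the value.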