All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg KH <gregkh@linuxfoundation.org>,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Dave Chinner <dchinner@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>, Mel Gorman <mgorman@suse.de>
Subject: [ 13/40] vmscan: shrinker->nr updates race and go wrong
Date: Thu, 26 Jul 2012 14:29:31 -0700	[thread overview]
Message-ID: <20120726211412.284677137@linuxfoundation.org> (raw)
In-Reply-To: <20120726211411.164006056@linuxfoundation.org>

From: Greg KH <gregkh@linuxfoundation.org>

3.0-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Chinner <dchinner@redhat.com>

commit acf92b485cccf028177f46918e045c0c4e80ee10 upstream.

Stable note: Not tracked in Bugzilla. This patch reduces excessive
	reclaim of slab objects reducing the amount of information
	that has to be brought back in from disk.

shrink_slab() allows shrinkers to be called in parallel so the
struct shrinker can be updated concurrently. It does not provide any
exclusio for such updates, so we can get the shrinker->nr value
increasing or decreasing incorrectly.

As a result, when a shrinker repeatedly returns a value of -1 (e.g.
a VFS shrinker called w/ GFP_NOFS), the shrinker->nr goes haywire,
sometimes updating with the scan count that wasn't used, sometimes
losing it altogether. Worse is when a shrinker does work and that
update is lost due to racy updates, which means the shrinker will do
the work again!

Fix this by making the total_scan calculations independent of
shrinker->nr, and making the shrinker->nr updates atomic w.r.t. to
other updates via cmpxchg loops.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/vmscan.c |   45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -251,17 +251,29 @@ unsigned long shrink_slab(struct shrink_
 		unsigned long total_scan;
 		unsigned long max_pass;
 		int shrink_ret = 0;
+		long nr;
+		long new_nr;
 
+		/*
+		 * copy the current shrinker scan count into a local variable
+		 * and zero it so that other concurrent shrinker invocations
+		 * don't also do this scanning work.
+		 */
+		do {
+			nr = shrinker->nr;
+		} while (cmpxchg(&shrinker->nr, nr, 0) != nr);
+
+		total_scan = nr;
 		max_pass = do_shrinker_shrink(shrinker, shrink, 0);
 		delta = (4 * nr_pages_scanned) / shrinker->seeks;
 		delta *= max_pass;
 		do_div(delta, lru_pages + 1);
-		shrinker->nr += delta;
-		if (shrinker->nr < 0) {
+		total_scan += delta;
+		if (total_scan < 0) {
 			printk(KERN_ERR "shrink_slab: %pF negative objects to "
 			       "delete nr=%ld\n",
-			       shrinker->shrink, shrinker->nr);
-			shrinker->nr = max_pass;
+			       shrinker->shrink, total_scan);
+			total_scan = max_pass;
 		}
 
 		/*
@@ -269,13 +281,10 @@ unsigned long shrink_slab(struct shrink_
 		 * never try to free more than twice the estimate number of
 		 * freeable entries.
 		 */
-		if (shrinker->nr > max_pass * 2)
-			shrinker->nr = max_pass * 2;
-
-		total_scan = shrinker->nr;
-		shrinker->nr = 0;
+		if (total_scan > max_pass * 2)
+			total_scan = max_pass * 2;
 
-		trace_mm_shrink_slab_start(shrinker, shrink, total_scan,
+		trace_mm_shrink_slab_start(shrinker, shrink, nr,
 					nr_pages_scanned, lru_pages,
 					max_pass, delta, total_scan);
 
@@ -296,9 +305,19 @@ unsigned long shrink_slab(struct shrink_
 			cond_resched();
 		}
 
-		shrinker->nr += total_scan;
-		trace_mm_shrink_slab_end(shrinker, shrink_ret, total_scan,
-					 shrinker->nr);
+		/*
+		 * move the unused scan count back into the shrinker in a
+		 * manner that handles concurrent updates. If we exhausted the
+		 * scan, there is no need to do an update.
+		 */
+		do {
+			nr = shrinker->nr;
+			new_nr = total_scan + nr;
+			if (total_scan <= 0)
+				break;
+		} while (cmpxchg(&shrinker->nr, nr, new_nr) != nr);
+
+		trace_mm_shrink_slab_end(shrinker, shrink_ret, nr, new_nr);
 	}
 	up_read(&shrinker_rwsem);
 out:



  parent reply	other threads:[~2012-07-26 21:39 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-26 21:14 [ 00/40] 3.0.39-stable review Greg KH
2012-07-26 21:29 ` [ 01/40] cifs: always update the inode cache with the results from a FIND_* Greg Kroah-Hartman
2012-07-26 21:29   ` [ 02/40] ntp: Fix STA_INS/DEL clearing bug Greg Kroah-Hartman
2012-07-26 21:29   ` [ 03/40] mm: fix lost kswapd wakeup in kswapd_stop() Greg Kroah-Hartman
2012-07-26 21:29   ` [ 04/40] MIPS: Properly align the .data..init_task section Greg Kroah-Hartman
2012-07-26 21:29   ` [ 05/40] UBIFS: fix a bug in empty space fix-up Greg Kroah-Hartman
2012-07-26 21:29   ` [ 06/40] dm raid1: fix crash with mirror recovery and discard Greg Kroah-Hartman
2012-07-26 21:29   ` [ 07/40] mm/vmstat.c: cache align vm_stat Greg Kroah-Hartman
2012-07-26 21:29   ` [ 08/40] mm: memory hotplug: Check if pages are correctly reserved on a per-section basis Greg Kroah-Hartman
2012-07-26 21:29   ` [ 09/40] mm: reduce the amount of work done when updating min_free_kbytes Greg Kroah-Hartman
2012-07-26 21:29   ` [ 10/40] mm: vmscan: fix force-scanning small targets without swap Greg Kroah-Hartman
2012-07-26 21:29   ` [ 11/40] vmscan: clear ZONE_CONGESTED for zone with good watermark Greg Kroah-Hartman
2012-07-26 21:29   ` [ 12/40] vmscan: add shrink_slab tracepoints Greg Kroah-Hartman
2012-07-26 21:29   ` Greg Kroah-Hartman [this message]
2012-07-29 20:29     ` [ 13/40] vmscan: shrinker->nr updates race and go wrong Ben Hutchings
2012-07-30  9:06       ` Mel Gorman
2012-07-30 15:41         ` Greg Kroah-Hartman
2012-07-26 21:29   ` [ 14/40] vmscan: reduce wind up shrinker->nr when shrinker cant do work Greg Kroah-Hartman
2012-07-26 21:29   ` [ 15/40] vmscan: limit direct reclaim for higher order allocations Greg Kroah-Hartman
2012-07-26 21:29   ` [ 16/40] vmscan: abort reclaim/compaction if compaction can proceed Greg Kroah-Hartman
2012-07-26 21:29   ` [ 17/40] mm: compaction: trivial clean up in acct_isolated() Greg Kroah-Hartman
2012-07-26 21:29   ` [ 18/40] mm: change isolate mode from #define to bitwise type Greg Kroah-Hartman
2012-07-26 21:29   ` [ 19/40] mm: compaction: make isolate_lru_page() filter-aware Greg Kroah-Hartman
2012-07-26 21:29   ` [ 20/40] mm: zone_reclaim: " Greg Kroah-Hartman
2012-07-26 21:29   ` [ 21/40] mm: migration: clean up unmap_and_move() Greg Kroah-Hartman
2012-07-26 21:29   ` [ 22/40] mm: compaction: allow compaction to isolate dirty pages Greg Kroah-Hartman
2012-07-26 21:29   ` [ 23/40] mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage Greg Kroah-Hartman
2012-07-26 21:29   ` [ 24/40] mm: page allocator: do not call direct reclaim for THP allocations while compaction is deferred Greg Kroah-Hartman
2012-07-26 21:29   ` [ 25/40] mm: compaction: make isolate_lru_page() filter-aware again Greg Kroah-Hartman
2012-07-26 21:29   ` [ 26/40] kswapd: avoid unnecessary rebalance after an unsuccessful balancing Greg Kroah-Hartman
2012-07-26 21:29   ` [ 27/40] kswapd: assign new_order and new_classzone_idx after wakeup in sleeping Greg Kroah-Hartman
2012-07-26 21:29   ` [ 28/40] mm: compaction: introduce sync-light migration for use by compaction Greg Kroah-Hartman
2012-07-26 21:29   ` [ 29/40] mm: vmscan: when reclaiming for compaction, ensure there are sufficient free pages available Greg Kroah-Hartman
2012-07-26 21:29   ` [ 30/40] mm: vmscan: do not OOM if aborting reclaim to start compaction Greg Kroah-Hartman
2012-07-26 21:29   ` [ 31/40] mm: vmscan: check if reclaim should really abort even if compaction_ready() is true for one zone Greg Kroah-Hartman
2012-07-26 21:29   ` [ 32/40] vmscan: promote shared file mapped pages Greg Kroah-Hartman
2012-07-26 21:29   ` [ 33/40] vmscan: activate executable pages after first usage Greg Kroah-Hartman
2012-07-26 21:29   ` [ 34/40] mm/vmscan.c: consider swap space when deciding whether to continue reclaim Greg Kroah-Hartman
2012-07-26 21:29   ` [ 35/40] mm: test PageSwapBacked in lumpy reclaim Greg Kroah-Hartman
2012-07-26 21:29   ` [ 36/40] mm: vmscan: convert global reclaim to per-memcg LRU lists Greg Kroah-Hartman
2012-07-30  0:25     ` Ben Hutchings
2012-07-30 15:29       ` Greg Kroah-Hartman
2012-07-26 21:29   ` [ 37/40] cpusets: avoid looping when storing to mems_allowed if one node remains set Greg Kroah-Hartman
2012-07-26 21:29   ` [ 38/40] cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask Greg Kroah-Hartman
2012-07-26 21:29   ` [ 39/40] cpuset: mm: reduce large amounts of memory barrier related damage v3 Greg Kroah-Hartman
2012-07-27 15:08     ` Herton Ronaldo Krzesinski
2012-07-27 15:23       ` Mel Gorman
2012-07-27 19:01         ` Greg Kroah-Hartman
2012-07-28  5:02           ` Herton Ronaldo Krzesinski
2012-07-28 10:26             ` Mel Gorman
2012-07-30 15:39               ` Greg Kroah-Hartman
2012-07-30 15:37             ` Greg Kroah-Hartman
2012-07-30 15:38               ` Greg Kroah-Hartman
2012-07-26 21:29   ` [ 40/40] mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120726211412.284677137@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=dchinner@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.