From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: stable-review@kernel.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: [patch 21/26] mm: add_to_swap_cache() must not sleep
Date: Fri, 09 Oct 2009 16:08:57 -0700
Message-ID: <20091009231002.768016040@mini.kroah.org>
In-Reply-To: <20091009231249.GA31084@kroah.com>

[-- Attachment #1: mm-add_to_swap_cache-must-not-sleep.patch --]
[-- Type: text/plain, Size: 4846 bytes --]

2.6.31-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

commit 31a5639623a487d6db996c8138c9e53fef2e2d91 upstream.

After commit 355cfa73 ("mm: modify swap_map and add SWAP_HAS_CACHE flag"),
read_swap_cache_async() busy-waits while an entry is not in the swap
cache but has the SWAP_HAS_CACHE flag set.

Such entries can exist on the add and delete paths of the swap cache.  On
the add path, add_to_swap_cache() is called soon after the SWAP_HAS_CACHE
flag is set; on the delete path, swapcache_free() is called (clearing the
SWAP_HAS_CACHE flag) soon after __delete_from_swap_cache().  So the
busy-wait works well in most cases.

But this mechanism can cause a soft lockup if add_to_swap_cache() sleeps and
read_swap_cache_async() tries to swap in the same entry on the same CPU.
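
The retry loop described above can be sketched as a small userspace
model.  This is only an illustration, not kernel code: struct slot,
model_prepare(), model_lookup(), MODEL_EEXIST and the max_spins bound are
all hypothetical names invented for the sketch, and the real loop has no
retry bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_EEXIST 17

struct slot {
	bool has_cache;	/* models the SWAP_HAS_CACHE flag in swap_map */
	void *page;	/* models the swap cache entry, NULL if absent */
};

/* Models swapcache_prepare(): claim the flag, or report it is taken. */
static int model_prepare(struct slot *s)
{
	if (s->has_cache)
		return -MODEL_EEXIST;
	s->has_cache = true;
	return 0;
}

/*
 * Models the lookup loop in read_swap_cache_async().  Returns the number
 * of retries needed, or -1 if max_spins retries were not enough.  The
 * bound exists only so that the model terminates.
 */
static int model_lookup(struct slot *s, int max_spins)
{
	for (int spins = 0; spins < max_spins; spins++) {
		if (s->page)
			return spins;	/* found in the cache */
		if (model_prepare(s) == 0)
			return spins;	/* flag claimed, caller may insert */
		/* flag set but page not inserted yet: retry */
	}
	return -1;
}
```

If the flag is set and the task that set it never gets to run again, as
in the same-CPU case above, model_lookup() exhausts its bound; the
unbounded kernel loop would spin instead.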

This patch calls radix_tree_preload() before swapcache_prepare() and
divides add_to_swap_cache() into two parts: the radix_tree_preload() part
and the radix_tree_insert() part (defined as __add_to_swap_cache()).
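
The reordering can be illustrated with a userspace sketch of the
"allocate first, then enter the window" pattern.  Everything here is an
assumption made for illustration: struct cache, the model_*() functions
and the allocs_in_window counter are hypothetical, a linked list stands in
for the radix tree, and unlike radix_tree_preload_end() the model's
model_preload_end() frees an unused node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct node { unsigned long key; void *item; struct node *next; };

struct cache {
	struct node *head;	/* a list stands in for the radix tree */
	struct node *preloaded;	/* models the preload pool */
	bool in_window;		/* flag set, insert not finished yet */
	int allocs_in_window;	/* allocations attempted in that window */
};

/* Models radix_tree_preload(): the only step that allocates (may block). */
static int model_preload(struct cache *c)
{
	if (c->in_window)
		c->allocs_in_window++;
	if (!c->preloaded)
		c->preloaded = malloc(sizeof(*c->preloaded));
	return c->preloaded ? 0 : -1;
}

/* Model simplification: drop any node that was not consumed. */
static void model_preload_end(struct cache *c)
{
	free(c->preloaded);
	c->preloaded = NULL;
}

/* Models __add_to_swap_cache(): uses the preloaded node, never allocates. */
static int model_insert(struct cache *c, unsigned long key, void *item)
{
	struct node *n = c->preloaded;

	assert(n != NULL);	/* the caller must have preloaded */
	for (struct node *p = c->head; p; p = p->next)
		if (p->key == key)
			return -1;	/* duplicate key, node stays pooled */
	c->preloaded = NULL;
	n->key = key;
	n->item = item;
	n->next = c->head;
	c->head = n;
	return 0;
}

/* Patched order: preload, set the flag, insert, end the preload. */
static int model_read_path(struct cache *c, unsigned long key, void *item)
{
	int err = model_preload(c);

	if (err)
		return err;
	c->in_window = true;	/* models swapcache_prepare() succeeding */
	err = model_insert(c, key, item);
	c->in_window = false;
	model_preload_end(c);
	return err;
}

/* Old order: the flag is set first, so the allocation is in the window. */
static int model_old_path(struct cache *c, unsigned long key, void *item)
{
	int err;

	c->in_window = true;
	err = model_preload(c);
	if (!err) {
		err = model_insert(c, key, item);
		model_preload_end(c);
	}
	c->in_window = false;
	return err;
}
```

In this model the patched order leaves allocs_in_window at zero, while
the old order performs its allocation after the flag is already visible
to other lookups.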

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/swap_state.c |   70 ++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 46 insertions(+), 24 deletions(-)

--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -66,10 +66,10 @@ void show_swap_cache_info(void)
 }
 
 /*
- * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
+ * __add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
  * but sets SwapCache flag and private instead of mapping and index.
  */
-int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask)
+static int __add_to_swap_cache(struct page *page, swp_entry_t entry)
 {
 	int error;
 
@@ -77,28 +77,37 @@ int add_to_swap_cache(struct page *page,
 	VM_BUG_ON(PageSwapCache(page));
 	VM_BUG_ON(!PageSwapBacked(page));
 
+	page_cache_get(page);
+	SetPageSwapCache(page);
+	set_page_private(page, entry.val);
+
+	spin_lock_irq(&swapper_space.tree_lock);
+	error = radix_tree_insert(&swapper_space.page_tree, entry.val, page);
+	if (likely(!error)) {
+		total_swapcache_pages++;
+		__inc_zone_page_state(page, NR_FILE_PAGES);
+		INC_CACHE_INFO(add_total);
+	}
+	spin_unlock_irq(&swapper_space.tree_lock);
+
+	if (unlikely(error)) {
+		set_page_private(page, 0UL);
+		ClearPageSwapCache(page);
+		page_cache_release(page);
+	}
+
+	return error;
+}
+
+
+int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask)
+{
+	int error;
+
 	error = radix_tree_preload(gfp_mask);
 	if (!error) {
-		page_cache_get(page);
-		SetPageSwapCache(page);
-		set_page_private(page, entry.val);
-
-		spin_lock_irq(&swapper_space.tree_lock);
-		error = radix_tree_insert(&swapper_space.page_tree,
-						entry.val, page);
-		if (likely(!error)) {
-			total_swapcache_pages++;
-			__inc_zone_page_state(page, NR_FILE_PAGES);
-			INC_CACHE_INFO(add_total);
-		}
-		spin_unlock_irq(&swapper_space.tree_lock);
+		error = __add_to_swap_cache(page, entry);
 		radix_tree_preload_end();
-
-		if (unlikely(error)) {
-			set_page_private(page, 0UL);
-			ClearPageSwapCache(page);
-			page_cache_release(page);
-		}
 	}
 	return error;
 }
@@ -289,13 +298,24 @@ struct page *read_swap_cache_async(swp_e
 		}
 
 		/*
+		 * call radix_tree_preload() while we can wait.
+		 */
+		err = radix_tree_preload(gfp_mask & GFP_KERNEL);
+		if (err)
+			break;
+
+		/*
 		 * Swap entry may have been freed since our caller observed it.
 		 */
 		err = swapcache_prepare(entry);
-		if (err == -EEXIST) /* seems racy */
+		if (err == -EEXIST) {	/* seems racy */
+			radix_tree_preload_end();
 			continue;
-		if (err)           /* swp entry is obsolete ? */
+		}
+		if (err) {		/* swp entry is obsolete ? */
+			radix_tree_preload_end();
 			break;
+		}
 
 		/*
 		 * Associate the page with swap entry in the swap cache.
@@ -307,8 +327,9 @@ struct page *read_swap_cache_async(swp_e
 		 */
 		__set_page_locked(new_page);
 		SetPageSwapBacked(new_page);
-		err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
+		err = __add_to_swap_cache(new_page, entry);
 		if (likely(!err)) {
+			radix_tree_preload_end();
 			/*
 			 * Initiate read into locked page and return.
 			 */
@@ -316,6 +337,7 @@ struct page *read_swap_cache_async(swp_e
 			swap_readpage(new_page);
 			return new_page;
 		}
+		radix_tree_preload_end();
 		ClearPageSwapBacked(new_page);
 		__clear_page_locked(new_page);
 		swapcache_free(entry, NULL);


