linux-mm.kvack.org archive mirror
* [PATCH] swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
@ 2013-05-30 18:05 Rafael Aquini
  2013-05-30 18:32 ` Greg KH
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Rafael Aquini @ 2013-05-30 18:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, hughd, shli, riel, lwoodman, kosaki.motohiro,
	kamezawa.hiroyu, stable

read_swap_cache_async() can race against get_swap_page() and stumble across
a SWAP_HAS_CACHE entry in the swap map whose page has not been brought into
the swapcache yet. This swap_map state is expected to be transitory, but the
current placement of discard in scan_swap_map() inserts a wait for I/O
completion. The thread in read_swap_cache_async() then loops around its
-EEXIST case while the other end, in get_swap_page(), is scheduled away in
scan_swap_map(). This can leave the system deadlocked if !CONFIG_PREEMPT and
the I/O completion happens to be waiting on the CPU workqueue where
read_swap_cache_async() is busy looping.

This patch introduces a cond_resched() call so that the aforementioned
read_swap_cache_async() busy loop yields the CPU when necessary, thus
keeping the subtle race from turning into a deadlock.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
 mm/swap_state.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
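
For illustration only, the busy loop described above can be modelled in
userspace roughly as follows. This is a minimal sketch, not kernel code:
every model_* name and both state flags are hypothetical stand-ins invented
here, and the "scheduler" is reduced to a function that runs the one piece
of deferred work. It assumes a single non-preemptive CPU, so the discard
I/O completion can only run when the looping thread voluntarily yields.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state, not kernel data structures. */
static bool page_in_swapcache;   /* false: page not yet added to the swapcache */
static bool completion_pending;  /* true: discard I/O completion queued on this CPU */

/* Put the model back into the racy, transitory state. */
void model_reset(void)
{
	page_in_swapcache = false;
	completion_pending = true;
}

/*
 * Stand-in for swapcache_prepare(): in this model SWAP_HAS_CACHE is
 * already set by the other end, so the call always reports -EEXIST.
 */
int model_swapcache_prepare(void)
{
	return -EEXIST;
}

/*
 * Stand-in for cond_resched() with !CONFIG_PREEMPT: yielding is the only
 * way the queued completion runs. Once it has run, the other end resumes
 * and (in this model) adds the page to the swapcache.
 */
void model_cond_resched(void)
{
	if (completion_pending) {
		completion_pending = false;
		page_in_swapcache = true;
	}
}

/*
 * Models the retry loop. Returns the iteration on which the page was
 * found, or -1 if the spin budget ran out (the deadlock case).
 */
int model_read_loop(bool yield, int max_spins)
{
	assert(max_spins > 0);

	for (int i = 1; i <= max_spins; i++) {
		if (page_in_swapcache)		/* swapcache lookup hit */
			return i;

		if (model_swapcache_prepare() == -EEXIST) {
			if (yield)
				model_cond_resched();
			continue;		/* retry the lookup */
		}
	}
	return -1;
}
```

In this model, calling model_reset() and then model_read_loop(false, 1000)
exhausts the spin budget, which corresponds to the loop before this patch.
With yield set to true, the loop terminates on its second pass because the
pending completion gets a chance to run.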

diff --git a/mm/swap_state.c b/mm/swap_state.c
index b3d40dc..9ad9e3b 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -336,8 +336,20 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		 * Swap entry may have been freed since our caller observed it.
 		 */
 		err = swapcache_prepare(entry);
-		if (err == -EEXIST) {	/* seems racy */
+		if (err == -EEXIST) {
 			radix_tree_preload_end();
+			/*
+			 * We might race against get_swap_page() and stumble
+			 * across a SWAP_HAS_CACHE swap_map entry whose page
+			 * has not been brought into the swapcache yet, while
+			 * the other end is scheduled away waiting on discard
+			 * I/O completion.
+			 * In order to avoid turning this transitory state
+			 * into a permanent loop around this -EEXIST case,
+			 * let's just conditionally invoke the scheduler,
+			 * if there are some more important tasks to run.
+			 */
+			cond_resched();
 			continue;
 		}
 		if (err) {		/* swp entry is obsolete ? */
-- 
1.8.1.4


Thread overview: 7+ messages
2013-05-30 18:05 [PATCH] swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion Rafael Aquini
2013-05-30 18:32 ` Greg KH
2013-05-30 19:55 ` Johannes Weiner
2013-05-30 21:56   ` Rafael Aquini
2013-05-30 19:59 ` KOSAKI Motohiro
2013-05-30 22:02 ` Hugh Dickins
2013-05-30 22:49 ` [PATCH v2] swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion Rafael Aquini
