From: Rafael Aquini <aquini@redhat.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
akpm@linux-foundation.org, hughd@google.com, shli@kernel.org,
riel@redhat.com, lwoodman@redhat.com,
kosaki.motohiro@jp.fujitsu.com, kamezawa.hiroyu@jp.fujitsu.com,
stable@vger.kernel.org
Subject: Re: [PATCH] swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O compeletion
Date: Thu, 30 May 2013 18:56:57 -0300
Message-ID: <20130530215656.GD13605@optiplex.redhat.com>
In-Reply-To: <20130530195539.GA27226@cmpxchg.org>
On Thu, May 30, 2013 at 03:55:39PM -0400, Johannes Weiner wrote:
> On Thu, May 30, 2013 at 03:05:00PM -0300, Rafael Aquini wrote:
> > read_swap_cache_async() can race against get_swap_page(), and stumble across
> > a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought into the
> > swapcache yet. This transient swap_map state is expected to be transitory,
> > but the actual placement of discard at scan_swap_map() inserts a wait for
> > I/O completion thus making the thread at read_swap_cache_async() to loop
> > around its -EEXIST case, while the other end at get_swap_page()
> > is scheduled away at scan_swap_map(). This can leave the system deadlocked
> > if the I/O completion happens to be waiting on the CPU workqueue where
>
> waitqueue?
>
Ugh! I will repost this to fix it, along with the "compeletion" typo in the subject...
> > read_swap_cache_async() is busy looping and !CONFIG_PREEMPT.
> >
> > This patch introduces a cond_resched() call to make the aforementioned
> > read_swap_cache_async() busy loop condition to bail out when necessary,
> > thus avoiding the subtle race window.
> >
> > Signed-off-by: Rafael Aquini <aquini@redhat.com>
> > ---
> > mm/swap_state.c | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index b3d40dc..9ad9e3b 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -336,8 +336,20 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> >  		 * Swap entry may have been freed since our caller observed it.
> >  		 */
> >  		err = swapcache_prepare(entry);
> > -		if (err == -EEXIST) {	/* seems racy */
> > +		if (err == -EEXIST) {
> >  			radix_tree_preload_end();
> > +			/*
> > +			 * We might race against get_swap_page() and stumble
> > +			 * across a SWAP_HAS_CACHE swap_map entry whose page
> > +			 * has not been brought into the swapcache yet, while
> > +			 * the other end is scheduled away waiting on discard
> > +			 * I/O completion.
> > +			 * In order to avoid turning this transitory state
> > +			 * into a permanent loop around this -EEXIST case,
> > +			 * lets just conditionally invoke the scheduler,
> > +			 * if there are some more important tasks to run.
> > +			 */
> > +			cond_resched();
>
> Might be worth mentioning the !CONFIG_PREEMPT deadlock scenario here,
> especially since under CONFIG_PREEMPT the radix_tree_preload_end() is
> already a scheduling point through the preempt_enable().
>
Nice suggestion, will do it. Thanks for reviewing this patch!
> Other than that, the patch looks good to me!
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
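
For readers following the thread, the scenario discussed above can be approximated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct swap_model, run_other_tasks() and reader_retry_loop() are hypothetical names invented for illustration, the "CPU" is a single non-preemptive thread of control, and the retry budget exists only so the model terminates (the loop in the patch has no such bound). It shows why a retry loop that never yields cannot observe progress made by the other end, while a yield point in the -EEXIST case lets that progress happen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names are kernel APIs. */
struct swap_model {
	bool has_cache;      /* models SWAP_HAS_CACHE being set in the swap map */
	bool page_in_cache;  /* models the page being visible in the swap cache */
};

/*
 * Models everything else that wants this CPU: the discard I/O completion
 * and the get_swap_page() side finishing its work. On a non-preemptive
 * CPU this only runs when the looping task yields.
 */
static void run_other_tasks(struct swap_model *m)
{
	if (m->has_cache)
		m->page_in_cache = true;
}

/*
 * Models the retry loop around the -EEXIST case. Returns the number of
 * retries before the page became visible, or -1 if the retry budget ran
 * out (the budget is an assumption of this model, added so it terminates).
 */
static int reader_retry_loop(struct swap_model *m, bool yield_on_eexist,
			     int budget)
{
	assert(budget > 0);
	for (int tries = 0; tries < budget; tries++) {
		if (m->page_in_cache)
			return tries;
		if (yield_on_eexist)
			run_other_tasks(m);	/* stands in for cond_resched() */
	}
	return -1;
}
```

Calling reader_retry_loop() with yield_on_eexist set to false exhausts the budget without the page ever appearing, which is the model's analogue of the busy loop; with it set to true, the first yield lets the other side finish and the next retry succeeds.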