linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Rafael Aquini <aquini@redhat.com>
To: Shaohua Li <shli@kernel.org>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, riel@redhat.com,
	minchan@kernel.org, kmpark@infradead.org, hughd@google.com
Subject: Re: [patch 3/4 v4]swap: fix races exposed by swap discard
Date: Fri, 29 Mar 2013 00:14:21 -0300	[thread overview]
Message-ID: <20130329031420.GC19721@optiplex.redhat.com> (raw)
In-Reply-To: <20130326053827.GC19646@kernel.org>

On Tue, Mar 26, 2013 at 01:38:27PM +0800, Shaohua Li wrote:
> Last patch can expose races, according to Hugh:
> 
> swapoff was sometimes failing with "Cannot allocate memory", coming from
> try_to_unuse()'s -ENOMEM: it needs to allow for swap_duplicate() failing on a
> free entry temporarily SWAP_MAP_BAD while being discarded.
> 
> We should use ACCESS_ONCE() there, and whenever accessing swap_map locklessly;
> but rather than peppering it throughout try_to_unuse(), just declare *swap_map
> with volatile.
> 
> try_to_unuse() is accustomed to *swap_map going down racily, but not
> necessarily to it jumping up from 0 to SWAP_MAP_BAD: we'll be safer to prevent
> that transition once SWP_WRITEOK is switched off, when it's a waste of time to
> issue discards anyway (swapon can do a whole discard).
> 
> Another issue is:
> 
> In swapin_readahead(), read_swap_cache_async() can read a bad swap entry,
> because we don't check if readahead swap entry is bad. This doesn't break
> anything but such swapin page is wasteful and can only be freed at page
> reclaim. We avoid read such swap entry.
> 
> And next patch will mark a swap entry bad temporarily for discard. Without this
> patch, swap entry count will be messed.
> 
> Thanks Hugh to inspire swapin_readahead could use bad swap entry.
> 
> [include Hugh's patch 'swap: fix swapoff ENOMEMs from discard']
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> ---

Acked-by: Rafael Aquini <aquini@redhat.com>


>  mm/swapfile.c |   15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> Index: linux/mm/swapfile.c
> ===================================================================
> --- linux.orig/mm/swapfile.c	2013-03-22 17:28:06.000000000 +0800
> +++ linux/mm/swapfile.c	2013-03-22 17:40:51.580356594 +0800
> @@ -331,7 +331,8 @@ static inline void dec_cluster_info_page
>  		 * instead of free it immediately. The cluster will be freed
>  		 * after discard.
>  		 */
> -		if (p->flags & SWP_DISCARDABLE) {
> +		if ((p->flags & (SWP_WRITEOK | SWP_DISCARDABLE)) ==
> +				 (SWP_WRITEOK | SWP_DISCARDABLE)) {
>  			swap_cluster_schedule_discard(p, idx);
>  			return;
>  		}
> @@ -1228,7 +1229,7 @@ static unsigned int find_next_to_unuse(s
>  			else
>  				continue;
>  		}
> -		count = si->swap_map[i];
> +		count = ACCESS_ONCE(si->swap_map[i]);
>  		if (count && swap_count(count) != SWAP_MAP_BAD)
>  			break;
>  	}
> @@ -1248,7 +1249,7 @@ int try_to_unuse(unsigned int type, bool
>  {
>  	struct swap_info_struct *si = swap_info[type];
>  	struct mm_struct *start_mm;
> -	unsigned char *swap_map;
> +	volatile unsigned char *swap_map;	/* ACCESS_ONCE throughout */
>  	unsigned char swcount;
>  	struct page *page;
>  	swp_entry_t entry;
> @@ -1299,7 +1300,8 @@ int try_to_unuse(unsigned int type, bool
>  			 * reused since sys_swapoff() already disabled
>  			 * allocation from here, or alloc_page() failed.
>  			 */
> -			if (!*swap_map)
> +			swcount = *swap_map;
> +			if (!swcount || swcount == SWAP_MAP_BAD)
>  				continue;
>  			retval = -ENOMEM;
>  			break;
> @@ -2432,6 +2434,11 @@ static int __swap_duplicate(swp_entry_t
>  		goto unlock_out;
>  
>  	count = p->swap_map[offset];
> +	if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
> +		err = -ENOENT;
> +		goto unlock_out;
> +	}
> +
>  	has_cache = count & SWAP_HAS_CACHE;
>  	count &= ~SWAP_HAS_CACHE;
>  	err = 0;
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

      reply	other threads:[~2013-03-29  3:14 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-26  5:38 [patch 3/4 v4]swap: fix races exposed by swap discard Shaohua Li
2013-03-29  3:14 ` Rafael Aquini [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130329031420.GC19721@optiplex.redhat.com \
    --to=aquini@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kmpark@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).