All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Weijie Yang <weijie.yang@samsung.com>
Cc: akpm@linux-foundation.org, sjenning@linux.vnet.ibm.com,
	bob.liu@oracle.com, weijie.yang.kh@gmail.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	d.j.shin@samsung.com, heesub.shin@samsung.com,
	kyungmin.park@samsung.com, hau.chen@samsung.com,
	bifeng.tong@samsung.com, rui.xie@samsung.com
Subject: Re: [PATCH v3 2/3] mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently
Date: Tue, 24 Sep 2013 10:03:08 +0900	[thread overview]
Message-ID: <20130924010308.GG17725@bbox> (raw)
In-Reply-To: <000201ceb836$4c549740$e4fdc5c0$%yang@samsung.com>

On Mon, Sep 23, 2013 at 04:21:49PM +0800, Weijie Yang wrote:
> Consider the following scenario:
> thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page)
> thread 1: call zswap_frontswap_invalidate_page to invalidate entry x.
> 	finished, entry x and its zbud is not freed as its refcount != 0
> 	now, the swap_map[x] = 0
> thread 0: now call zswap_get_swap_cache_page
> 	swapcache_prepare return -ENOENT because entry x is not used any more
> 	zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM
> 	zswap_writeback_entry do nothing except put refcount
> Now, the memory of zswap_entry x and its zpage leak.
> 
> Modify:
>  - check the refcount in fail path, free memory if it is not referenced.

Hmm, I don't like this because zswap refcount routine is already mess for me.
I'm not sure why it was designed from the beginning. I hope we should fix it first.

1. zswap_rb_serach could include zswap_entry_get semantic if it founds a entry from
   the tree. Of course, we should ranme it as find_get_zswap_entry like find_get_page.
2. zswap_entry_put could hide resource free function like zswap_free_entry so that
   all of caller can use it easily following pattern.
   
  find_get_zswap_entry
  ...
  ...
  zswap_entry_put

Of course, zswap_entry_put have to check the entry is in the tree or not
so if someone already removes it from the tree, it should avoid double remove.

One of the concern I can think is that approach extends critical section
but I think it would be no problem because more bottleneck would be [de]compress
functions. If it were really problem, we can mitigate a problem with moving
unnecessary functions out of zswap_free_entry because it seem to be rather
over-enginnering.

>  - use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path
> can be not only caused by nomem but also by invalidate.
> 
> Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
> Reviewed-by: Bob Liu <bob.liu@oracle.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: stable@vger.kernel.org
> Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> ---
>  mm/zswap.c |   21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index cbd9578..1be7b90 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -387,7 +387,7 @@ static void zswap_free_entry(struct zswap_tree *tree, struct zswap_entry *entry)
>  enum zswap_get_swap_ret {
>  	ZSWAP_SWAPCACHE_NEW,
>  	ZSWAP_SWAPCACHE_EXIST,
> -	ZSWAP_SWAPCACHE_NOMEM
> +	ZSWAP_SWAPCACHE_FAIL,
>  };
>  
>  /*
> @@ -401,9 +401,9 @@ enum zswap_get_swap_ret {
>   * added to the swap cache, and returned in retpage.
>   *
>   * If success, the swap cache page is returned in retpage
> - * Returns 0 if page was already in the swap cache, page is not locked
> - * Returns 1 if the new page needs to be populated, page is locked
> - * Returns <0 on error
> + * Returns ZSWAP_SWAPCACHE_EXIST if page was already in the swap cache
> + * Returns ZSWAP_SWAPCACHE_NEW if the new page needs to be populated, page is locked
> + * Returns ZSWAP_SWAPCACHE_FAIL on error
>   */
>  static int zswap_get_swap_cache_page(swp_entry_t entry,
>  				struct page **retpage)
> @@ -475,7 +475,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>  	if (new_page)
>  		page_cache_release(new_page);
>  	if (!found_page)
> -		return ZSWAP_SWAPCACHE_NOMEM;
> +		return ZSWAP_SWAPCACHE_FAIL;
>  	*retpage = found_page;
>  	return ZSWAP_SWAPCACHE_EXIST;
>  }
> @@ -529,11 +529,11 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
>  
>  	/* try to allocate swap cache page */
>  	switch (zswap_get_swap_cache_page(swpentry, &page)) {
> -	case ZSWAP_SWAPCACHE_NOMEM: /* no memory */
> +	case ZSWAP_SWAPCACHE_FAIL: /* no memory or invalidate happened */
>  		ret = -ENOMEM;
>  		goto fail;
>  
> -	case ZSWAP_SWAPCACHE_EXIST: /* page is unlocked */
> +	case ZSWAP_SWAPCACHE_EXIST:
>  		/* page is already in the swap cache, ignore for now */
>  		page_cache_release(page);
>  		ret = -EEXIST;
> @@ -591,7 +591,12 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
>  
>  fail:
>  	spin_lock(&tree->lock);
> -	zswap_entry_put(entry);
> +	refcount = zswap_entry_put(entry);
> +	if (refcount <= 0) {
> +		/* invalidate happened, consider writeback as success */
> +		zswap_free_entry(tree, entry);
> +		ret = 0;
> +	}
>  	spin_unlock(&tree->lock);
>  	return ret;
>  }
> -- 
> 1.7.10.4
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org>
To: Weijie Yang <weijie.yang@samsung.com>
Cc: akpm@linux-foundation.org, sjenning@linux.vnet.ibm.com,
	bob.liu@oracle.com, weijie.yang.kh@gmail.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	d.j.shin@samsung.com, heesub.shin@samsung.com,
	kyungmin.park@samsung.com, hau.chen@samsung.com,
	bifeng.tong@samsung.com, rui.xie@samsung.com
Subject: Re: [PATCH v3 2/3] mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently
Date: Tue, 24 Sep 2013 10:03:08 +0900	[thread overview]
Message-ID: <20130924010308.GG17725@bbox> (raw)
In-Reply-To: <000201ceb836$4c549740$e4fdc5c0$%yang@samsung.com>

On Mon, Sep 23, 2013 at 04:21:49PM +0800, Weijie Yang wrote:
> Consider the following scenario:
> thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page)
> thread 1: call zswap_frontswap_invalidate_page to invalidate entry x.
> 	finished, entry x and its zbud is not freed as its refcount != 0
> 	now, the swap_map[x] = 0
> thread 0: now call zswap_get_swap_cache_page
> 	swapcache_prepare return -ENOENT because entry x is not used any more
> 	zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM
> 	zswap_writeback_entry do nothing except put refcount
> Now, the memory of zswap_entry x and its zpage leak.
> 
> Modify:
>  - check the refcount in fail path, free memory if it is not referenced.

Hmm, I don't like this because zswap refcount routine is already mess for me.
I'm not sure why it was designed from the beginning. I hope we should fix it first.

1. zswap_rb_serach could include zswap_entry_get semantic if it founds a entry from
   the tree. Of course, we should ranme it as find_get_zswap_entry like find_get_page.
2. zswap_entry_put could hide resource free function like zswap_free_entry so that
   all of caller can use it easily following pattern.
   
  find_get_zswap_entry
  ...
  ...
  zswap_entry_put

Of course, zswap_entry_put have to check the entry is in the tree or not
so if someone already removes it from the tree, it should avoid double remove.

One of the concern I can think is that approach extends critical section
but I think it would be no problem because more bottleneck would be [de]compress
functions. If it were really problem, we can mitigate a problem with moving
unnecessary functions out of zswap_free_entry because it seem to be rather
over-enginnering.

>  - use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path
> can be not only caused by nomem but also by invalidate.
> 
> Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
> Reviewed-by: Bob Liu <bob.liu@oracle.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: stable@vger.kernel.org
> Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> ---
>  mm/zswap.c |   21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index cbd9578..1be7b90 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -387,7 +387,7 @@ static void zswap_free_entry(struct zswap_tree *tree, struct zswap_entry *entry)
>  enum zswap_get_swap_ret {
>  	ZSWAP_SWAPCACHE_NEW,
>  	ZSWAP_SWAPCACHE_EXIST,
> -	ZSWAP_SWAPCACHE_NOMEM
> +	ZSWAP_SWAPCACHE_FAIL,
>  };
>  
>  /*
> @@ -401,9 +401,9 @@ enum zswap_get_swap_ret {
>   * added to the swap cache, and returned in retpage.
>   *
>   * If success, the swap cache page is returned in retpage
> - * Returns 0 if page was already in the swap cache, page is not locked
> - * Returns 1 if the new page needs to be populated, page is locked
> - * Returns <0 on error
> + * Returns ZSWAP_SWAPCACHE_EXIST if page was already in the swap cache
> + * Returns ZSWAP_SWAPCACHE_NEW if the new page needs to be populated, page is locked
> + * Returns ZSWAP_SWAPCACHE_FAIL on error
>   */
>  static int zswap_get_swap_cache_page(swp_entry_t entry,
>  				struct page **retpage)
> @@ -475,7 +475,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>  	if (new_page)
>  		page_cache_release(new_page);
>  	if (!found_page)
> -		return ZSWAP_SWAPCACHE_NOMEM;
> +		return ZSWAP_SWAPCACHE_FAIL;
>  	*retpage = found_page;
>  	return ZSWAP_SWAPCACHE_EXIST;
>  }
> @@ -529,11 +529,11 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
>  
>  	/* try to allocate swap cache page */
>  	switch (zswap_get_swap_cache_page(swpentry, &page)) {
> -	case ZSWAP_SWAPCACHE_NOMEM: /* no memory */
> +	case ZSWAP_SWAPCACHE_FAIL: /* no memory or invalidate happened */
>  		ret = -ENOMEM;
>  		goto fail;
>  
> -	case ZSWAP_SWAPCACHE_EXIST: /* page is unlocked */
> +	case ZSWAP_SWAPCACHE_EXIST:
>  		/* page is already in the swap cache, ignore for now */
>  		page_cache_release(page);
>  		ret = -EEXIST;
> @@ -591,7 +591,12 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
>  
>  fail:
>  	spin_lock(&tree->lock);
> -	zswap_entry_put(entry);
> +	refcount = zswap_entry_put(entry);
> +	if (refcount <= 0) {
> +		/* invalidate happened, consider writeback as success */
> +		zswap_free_entry(tree, entry);
> +		ret = 0;
> +	}
>  	spin_unlock(&tree->lock);
>  	return ret;
>  }
> -- 
> 1.7.10.4
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

  reply	other threads:[~2013-09-24  1:02 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-23  8:21 [PATCH v3 2/3] mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently Weijie Yang
2013-09-23  8:21 ` Weijie Yang
2013-09-24  1:03 ` Minchan Kim [this message]
2013-09-24  1:03   ` Minchan Kim
2013-09-26  3:42   ` Weijie Yang
2013-09-26  3:42     ` Weijie Yang
2013-10-10 19:54     ` Andrew Morton
2013-10-10 19:54       ` Andrew Morton
2013-10-11  7:13     ` Minchan Kim
2013-10-11  7:13       ` Minchan Kim
2013-10-12  2:50       ` Bob Liu
2013-10-12  2:50         ` Bob Liu
2013-10-12  9:14         ` Weijie Yang
2013-10-12  9:14           ` Weijie Yang
2013-10-12  9:29           ` Bob Liu
2013-10-12  9:29             ` Bob Liu
2013-10-12  8:41       ` Weijie Yang
2013-10-12  8:41         ` Weijie Yang
2013-10-12  9:37       ` Weijie Yang
2013-10-12  9:37         ` Weijie Yang
2013-10-14  0:31         ` Minchan Kim
2013-10-14  0:31           ` Minchan Kim
2013-10-14  8:05           ` Weijie Yang
2013-10-14  8:05             ` Weijie Yang
2013-10-12  2:32     ` Bob Liu
2013-10-12  2:32       ` Bob Liu
2013-10-12  8:45       ` Weijie Yang
2013-10-12  8:45         ` Weijie Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130924010308.GG17725@bbox \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bifeng.tong@samsung.com \
    --cc=bob.liu@oracle.com \
    --cc=d.j.shin@samsung.com \
    --cc=hau.chen@samsung.com \
    --cc=heesub.shin@samsung.com \
    --cc=kyungmin.park@samsung.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rui.xie@samsung.com \
    --cc=sjenning@linux.vnet.ibm.com \
    --cc=stable@vger.kernel.org \
    --cc=weijie.yang.kh@gmail.com \
    --cc=weijie.yang@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.