All of lore.kernel.org
 help / color / mirror / Atom feed
From: Krzysztof Kozlowski <k.kozlowski@samsung.com>
To: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Bob Liu <bob.liu@oracle.com>, Mel Gorman <mgorman@suse.de>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Kyungmin Park <kyungmin.park@samsung.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Minchan Kim <minchan@kernel.org>,
	Tomasz Stanislawski <t.stanislaws@samsung.com>
Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
Date: Tue, 10 Sep 2013 09:16:51 +0200	[thread overview]
Message-ID: <1378797411.15425.17.camel@AMDC1943> (raw)
In-Reply-To: <20130909194733.GE4701@variantweb.net>

Hi,

On pon, 2013-09-09 at 14:47 -0500, Seth Jennings wrote:
> On Fri, Aug 30, 2013 at 10:42:53AM +0200, Krzysztof Kozlowski wrote:
> > Use page reference counter for zbud pages. The ref counter replaces
> > zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> > when zbud_free() is called during reclaim. It allows implementation of
> > additional reclaim paths.
> 
> I like this idea.
> 
> > 
> > The page count is incremented when:
> >  - a handle is created and passed to zswap (in zbud_alloc()),
> >  - user-supplied eviction callback is called (in zbud_reclaim_page()).
> > 
> > Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> > Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
> > Reviewed-by: Bob Liu <bob.liu@oracle.com>
> > ---
> >  mm/zbud.c |   97 +++++++++++++++++++++++++++++++++----------------------------
> >  1 file changed, 52 insertions(+), 45 deletions(-)
> > 
> > diff --git a/mm/zbud.c b/mm/zbud.c
> > index ad1e781..aa9a15c 100644
> > --- a/mm/zbud.c
> > +++ b/mm/zbud.c
> > @@ -109,7 +109,6 @@ struct zbud_header {
> >  	struct list_head lru;
> >  	unsigned int first_chunks;
> >  	unsigned int last_chunks;
> > -	bool under_reclaim;
> >  };
> > 
> >  /*****************
> > @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
> >  	zhdr->last_chunks = 0;
> >  	INIT_LIST_HEAD(&zhdr->buddy);
> >  	INIT_LIST_HEAD(&zhdr->lru);
> > -	zhdr->under_reclaim = 0;
> >  	return zhdr;
> >  }
> > 
> > -/* Resets the struct page fields and frees the page */
> > -static void free_zbud_page(struct zbud_header *zhdr)
> > -{
> > -	__free_page(virt_to_page(zhdr));
> > -}
> > -
> >  /*
> >   * Encodes the handle of a particular buddy within a zbud page
> >   * Pool lock should be held as this function accesses first|last_chunks
> > @@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
> >  	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
> >  }
> > 
> > +/*
> > + * Increases ref count for zbud page.
> > + */
> > +static void get_zbud_page(struct zbud_header *zhdr)
> > +{
> > +	get_page(virt_to_page(zhdr));
> > +}
> > +
> > +/*
> > + * Decreases ref count for zbud page and frees the page if it reaches 0
> > + * (no external references, e.g. handles).
> > + *
> > + * Returns 1 if page was freed and 0 otherwise.
> > + */
> > +static int put_zbud_page(struct zbud_header *zhdr)
> > +{
> > +	struct page *page = virt_to_page(zhdr);
> > +	if (put_page_testzero(page)) {
> > +		free_hot_cold_page(page, 0);
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> > +
> > +
> >  /*****************
> >   * API Functions
> >  *****************/
> > @@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
> >  int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> >  			unsigned long *handle)
> >  {
> > -	int chunks, i, freechunks;
> > +	int chunks, i;
> 
> This change (make freechunks a block local variable) is unrelated to the
> patch description and should be its own patch.

Sure, I will split this into 2 patches.


[...]
> >  		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> >  	}
> > 
> > +	put_zbud_page(zhdr);
> >  	spin_unlock(&pool->lock);
> >  }
> > 
> > @@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> >   */
> >  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  {
> > -	int i, ret, freechunks;
> > +	int i, ret;
> >  	struct zbud_header *zhdr;
> >  	unsigned long first_handle = 0, last_handle = 0;
> > 
> > @@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  		return -EINVAL;
> >  	}
> >  	for (i = 0; i < retries; i++) {
> > +		if (list_empty(&pool->lru)) {
> > +			/*
> > +			 * LRU was emptied during evict calls in previous
> > +			 * iteration but put_zbud_page() returned 0 meaning
> > +			 * that someone still holds the page. This may
> > +			 * happen when some other mm mechanism increased
> > +			 * the page count.
> > +			 * In such case we succedded with reclaim.
> > +			 */
> > +			spin_unlock(&pool->lock);
> > +			return 0;
> > +		}
> >  		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> > +		/* Move this last element to beginning of LRU */
> >  		list_del(&zhdr->lru);
> > -		list_del(&zhdr->buddy);
> > +		list_add(&zhdr->lru, &pool->lru);
> 
> This adds the entry back to the lru list, but the entry is never removed
> from the list.  I'm not sure how this could work.

The entry will be removed from LRU and un/buddied list in zbud_free()
called from eviction handler.

So this actually works well however I wonder if this may be a source of
race conditions because during the call to eviction handler, the zbud
header is still present on buddied list (till call to zbud_free()).

Another issue is that I haven't update the documentation for
zbud_reclaim_page(). I fix this in next version.


> >  		/* Protect zbud page against free */
> > -		zhdr->under_reclaim = true;
> > +		get_zbud_page(zhdr);
> >  		/*
> >  		 * We need encode the handles before unlocking, since we can
> >  		 * race with free that will set (first|last)_chunks to 0
> > @@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  				goto next;
> >  		}
> >  next:
> > -		spin_lock(&pool->lock);
> > -		zhdr->under_reclaim = false;
> > -		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> > -			/*
> > -			 * Both buddies are now free, free the zbud page and
> > -			 * return success.
> > -			 */
> > -			free_zbud_page(zhdr);
> > -			pool->pages_nr--;
> > -			spin_unlock(&pool->lock);
> > +		if (put_zbud_page(zhdr))
> 
> There is a single put_zbud_page() regardless of result of the eviction
> attempts.  What if both buddies of a buddied zbud page are evicted?
> Then you leak the zbud page.  What if the eviction in an unbuddied zbud
> page failed?  Then you prematurely free the page and crash later.

This single put_zbud_page() puts the reference obtained few lines
earlier.

If one or two buddies are evicted then the eviction handler should call
zbud_free(). zbud_free() will put the initial ref count (obtained from
zbud_alloc()).

If the eviction handler failed then zbud_free() won't be called so the
initial ref count still be valid and this put_zbud_page() won't free the
page.

Let me show the flow of ref counts:
1. zbud_alloc() for first buddy, ref count = 1
2. zbud_alloc() for last buddy, ref count = 2
3. zbud_reclaim_page() start, ref count = 3
4. calling user eviction handler
5. zbud_free() for first buddy (called from handler), ref count = 2
6. zbud_free() for last buddy (called from handler), ref count = 1
7. zbud_reclaim_page() end, ref count = 0 (free the page)

If eviction handler fails at step 5 or 6, then the ref count won't drop
to 0 as zswap still holds the initial reference to handle.


Thanks for comments. I appreciate them very much.


Best regards,
Krzysztof



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Krzysztof Kozlowski <k.kozlowski@samsung.com>
To: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Bob Liu <bob.liu@oracle.com>, Mel Gorman <mgorman@suse.de>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Kyungmin Park <kyungmin.park@samsung.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Minchan Kim <minchan@kernel.org>,
	Tomasz Stanislawski <t.stanislaws@samsung.com>
Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
Date: Tue, 10 Sep 2013 09:16:51 +0200	[thread overview]
Message-ID: <1378797411.15425.17.camel@AMDC1943> (raw)
In-Reply-To: <20130909194733.GE4701@variantweb.net>

Hi,

On pon, 2013-09-09 at 14:47 -0500, Seth Jennings wrote:
> On Fri, Aug 30, 2013 at 10:42:53AM +0200, Krzysztof Kozlowski wrote:
> > Use page reference counter for zbud pages. The ref counter replaces
> > zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> > when zbud_free() is called during reclaim. It allows implementation of
> > additional reclaim paths.
> 
> I like this idea.
> 
> > 
> > The page count is incremented when:
> >  - a handle is created and passed to zswap (in zbud_alloc()),
> >  - user-supplied eviction callback is called (in zbud_reclaim_page()).
> > 
> > Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> > Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
> > Reviewed-by: Bob Liu <bob.liu@oracle.com>
> > ---
> >  mm/zbud.c |   97 +++++++++++++++++++++++++++++++++----------------------------
> >  1 file changed, 52 insertions(+), 45 deletions(-)
> > 
> > diff --git a/mm/zbud.c b/mm/zbud.c
> > index ad1e781..aa9a15c 100644
> > --- a/mm/zbud.c
> > +++ b/mm/zbud.c
> > @@ -109,7 +109,6 @@ struct zbud_header {
> >  	struct list_head lru;
> >  	unsigned int first_chunks;
> >  	unsigned int last_chunks;
> > -	bool under_reclaim;
> >  };
> > 
> >  /*****************
> > @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
> >  	zhdr->last_chunks = 0;
> >  	INIT_LIST_HEAD(&zhdr->buddy);
> >  	INIT_LIST_HEAD(&zhdr->lru);
> > -	zhdr->under_reclaim = 0;
> >  	return zhdr;
> >  }
> > 
> > -/* Resets the struct page fields and frees the page */
> > -static void free_zbud_page(struct zbud_header *zhdr)
> > -{
> > -	__free_page(virt_to_page(zhdr));
> > -}
> > -
> >  /*
> >   * Encodes the handle of a particular buddy within a zbud page
> >   * Pool lock should be held as this function accesses first|last_chunks
> > @@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
> >  	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
> >  }
> > 
> > +/*
> > + * Increases ref count for zbud page.
> > + */
> > +static void get_zbud_page(struct zbud_header *zhdr)
> > +{
> > +	get_page(virt_to_page(zhdr));
> > +}
> > +
> > +/*
> > + * Decreases ref count for zbud page and frees the page if it reaches 0
> > + * (no external references, e.g. handles).
> > + *
> > + * Returns 1 if page was freed and 0 otherwise.
> > + */
> > +static int put_zbud_page(struct zbud_header *zhdr)
> > +{
> > +	struct page *page = virt_to_page(zhdr);
> > +	if (put_page_testzero(page)) {
> > +		free_hot_cold_page(page, 0);
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> > +
> > +
> >  /*****************
> >   * API Functions
> >  *****************/
> > @@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
> >  int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> >  			unsigned long *handle)
> >  {
> > -	int chunks, i, freechunks;
> > +	int chunks, i;
> 
> This change (make freechunks a block local variable) is unrelated to the
> patch description and should be its own patch.

Sure, I will split this into 2 patches.


[...]
> >  		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> >  	}
> > 
> > +	put_zbud_page(zhdr);
> >  	spin_unlock(&pool->lock);
> >  }
> > 
> > @@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> >   */
> >  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  {
> > -	int i, ret, freechunks;
> > +	int i, ret;
> >  	struct zbud_header *zhdr;
> >  	unsigned long first_handle = 0, last_handle = 0;
> > 
> > @@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  		return -EINVAL;
> >  	}
> >  	for (i = 0; i < retries; i++) {
> > +		if (list_empty(&pool->lru)) {
> > +			/*
> > +			 * LRU was emptied during evict calls in previous
> > +			 * iteration but put_zbud_page() returned 0 meaning
> > +			 * that someone still holds the page. This may
> > +			 * happen when some other mm mechanism increased
> > +			 * the page count.
> > +			 * In such case we succedded with reclaim.
> > +			 */
> > +			spin_unlock(&pool->lock);
> > +			return 0;
> > +		}
> >  		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> > +		/* Move this last element to beginning of LRU */
> >  		list_del(&zhdr->lru);
> > -		list_del(&zhdr->buddy);
> > +		list_add(&zhdr->lru, &pool->lru);
> 
> This adds the entry back to the lru list, but the entry is never removed
> from the list.  I'm not sure how this could work.

The entry will be removed from LRU and un/buddied list in zbud_free()
called from eviction handler.

So this actually works well however I wonder if this may be a source of
race conditions because during the call to eviction handler, the zbud
header is still present on buddied list (till call to zbud_free()).

Another issue is that I haven't update the documentation for
zbud_reclaim_page(). I fix this in next version.


> >  		/* Protect zbud page against free */
> > -		zhdr->under_reclaim = true;
> > +		get_zbud_page(zhdr);
> >  		/*
> >  		 * We need encode the handles before unlocking, since we can
> >  		 * race with free that will set (first|last)_chunks to 0
> > @@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  				goto next;
> >  		}
> >  next:
> > -		spin_lock(&pool->lock);
> > -		zhdr->under_reclaim = false;
> > -		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> > -			/*
> > -			 * Both buddies are now free, free the zbud page and
> > -			 * return success.
> > -			 */
> > -			free_zbud_page(zhdr);
> > -			pool->pages_nr--;
> > -			spin_unlock(&pool->lock);
> > +		if (put_zbud_page(zhdr))
> 
> There is a single put_zbud_page() regardless of result of the eviction
> attempts.  What if both buddies of a buddied zbud page are evicted?
> Then you leak the zbud page.  What if the eviction in an unbuddied zbud
> page failed?  Then you prematurely free the page and crash later.

This single put_zbud_page() puts the reference obtained few lines
earlier.

If one or two buddies are evicted then the eviction handler should call
zbud_free(). zbud_free() will put the initial ref count (obtained from
zbud_alloc()).

If the eviction handler failed then zbud_free() won't be called so the
initial ref count still be valid and this put_zbud_page() won't free the
page.

Let me show the flow of ref counts:
1. zbud_alloc() for first buddy, ref count = 1
2. zbud_alloc() for last buddy, ref count = 2
3. zbud_reclaim_page() start, ref count = 3
4. calling user eviction handler
5. zbud_free() for first buddy (called from handler), ref count = 2
6. zbud_free() for last buddy (called from handler), ref count = 1
7. zbud_reclaim_page() end, ref count = 0 (free the page)

If eviction handler fails at step 5 or 6, then the ref count won't drop
to 0 as zswap still holds the initial reference to handle.


Thanks for comments. I appreciate them very much.


Best regards,
Krzysztof




  reply	other threads:[~2013-09-10  7:16 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-30  8:42 [RFC PATCH 0/4] mm: migrate zbud pages Krzysztof Kozlowski
2013-08-30  8:42 ` Krzysztof Kozlowski
2013-08-30  8:42 ` [RFC PATCH 1/4] zbud: use page ref counter for " Krzysztof Kozlowski
2013-08-30  8:42   ` Krzysztof Kozlowski
2013-09-08  9:04   ` Bob Liu
2013-09-08  9:04     ` Bob Liu
2013-09-09  6:14     ` Krzysztof Kozlowski
2013-09-09  6:14       ` Krzysztof Kozlowski
2013-09-09 19:47   ` Seth Jennings
2013-09-09 19:47     ` Seth Jennings
2013-09-10  7:16     ` Krzysztof Kozlowski [this message]
2013-09-10  7:16       ` Krzysztof Kozlowski
2013-08-30  8:42 ` [RFC PATCH 2/4] mm: use mapcount for identifying " Krzysztof Kozlowski
2013-08-30  8:42   ` Krzysztof Kozlowski
2013-09-09 19:53   ` Seth Jennings
2013-09-09 19:53     ` Seth Jennings
2013-08-30  8:42 ` [RFC PATCH 3/4] mm: use indirect zbud handle and radix tree Krzysztof Kozlowski
2013-08-30  8:42   ` Krzysztof Kozlowski
2013-08-30  8:42 ` [RFC PATCH 4/4] mm: migrate zbud pages Krzysztof Kozlowski
2013-08-30  8:42   ` Krzysztof Kozlowski
2013-09-06 17:30 ` [RFC PATCH 0/4] " Seth Jennings
2013-09-06 17:30   ` Seth Jennings
2013-09-09  6:55   ` Krzysztof Kozlowski
2013-09-09  6:55     ` Krzysztof Kozlowski
  -- strict thread matches above, loose matches on Subject: below --
2013-08-06  6:42 [RFC PATCH 0/4] mm: reclaim zbud pages on migration and compaction Krzysztof Kozlowski
2013-08-06  6:42 ` [RFC PATCH 1/4] zbud: use page ref counter for zbud pages Krzysztof Kozlowski
2013-08-06  6:42   ` Krzysztof Kozlowski
2013-08-06  9:00   ` Bob Liu
2013-08-06  9:00     ` Bob Liu
2013-08-06  9:25     ` Krzysztof Kozlowski
2013-08-06  9:25       ` Krzysztof Kozlowski
2013-08-06 18:51   ` Seth Jennings
2013-08-07  7:31     ` Krzysztof Kozlowski
2013-08-07  7:31       ` Krzysztof Kozlowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1378797411.15425.17.camel@AMDC1943 \
    --to=k.kozlowski@samsung.com \
    --cc=akpm@linux-foundation.org \
    --cc=b.zolnierkie@samsung.com \
    --cc=bob.liu@oracle.com \
    --cc=dave.hansen@intel.com \
    --cc=kyungmin.park@samsung.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=m.szyprowski@samsung.com \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=sjenning@linux.vnet.ibm.com \
    --cc=t.stanislaws@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.