* [PATCH 1/2] zswap: limit pool fragment
@ 2013-06-20 14:38 Bob Liu
2013-06-20 15:18 ` Seth Jennings
0 siblings, 1 reply; 3+ messages in thread
From: Bob Liu @ 2013-06-20 14:38 UTC (permalink / raw)
To: akpm; +Cc: sjenning, linux-mm, konrad.wilk, Bob Liu
If the zswap pool is heavily fragmented, it is pointless to store more pages
in zswap. So refuse to allocate new pages for the zswap pool in order to
limit the fragmentation.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
include/linux/zbud.h | 2 +-
mm/zbud.c | 4 +++-
mm/zswap.c | 15 +++++++++++++--
3 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/include/linux/zbud.h b/include/linux/zbud.h
index 2571a5c..71a61be 100644
--- a/include/linux/zbud.h
+++ b/include/linux/zbud.h
@@ -12,7 +12,7 @@ struct zbud_ops {
struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops);
void zbud_destroy_pool(struct zbud_pool *pool);
int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
- unsigned long *handle);
+ unsigned long *handle, bool dis_pagealloc);
void zbud_free(struct zbud_pool *pool, unsigned long handle);
int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries);
void *zbud_map(struct zbud_pool *pool, unsigned long handle);
diff --git a/mm/zbud.c b/mm/zbud.c
index 9bb4710..5ace447 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -248,7 +248,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
* a new page.
*/
int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
- unsigned long *handle)
+ unsigned long *handle, bool dis_pagealloc)
{
int chunks, i, freechunks;
struct zbud_header *zhdr = NULL;
@@ -279,6 +279,8 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
/* Couldn't find unbuddied zbud page, create new one */
spin_unlock(&pool->lock);
+ if (dis_pagealloc)
+ return -ENOSPC;
page = alloc_page(gfp);
if (!page)
return -ENOMEM;
diff --git a/mm/zswap.c b/mm/zswap.c
index deda2b6..7fe2b1b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -607,10 +607,12 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
struct zswap_entry *entry, *dupentry;
int ret;
unsigned int dlen = PAGE_SIZE, len;
- unsigned long handle;
+ unsigned long handle, stored_pages;
char *buf;
u8 *src, *dst;
struct zswap_header *zhdr;
+ u64 tmp;
+ bool dis_pagealloc = false;
if (!tree) {
ret = -ENODEV;
@@ -645,10 +647,19 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
goto freepage;
}
+ /* If the zswap pool is heavily fragmented, don't allocate new pages
+ * for it anymore. The fragmentation limit is currently 70%.
+ */
+ stored_pages = atomic_read(&zswap_stored_pages);
+ tmp = zswap_pool_pages * 100;
+ do_div(tmp, stored_pages + 1);
+ if (tmp > 70)
+ dis_pagealloc = true;
+
/* store */
len = dlen + sizeof(struct zswap_header);
ret = zbud_alloc(tree->pool, len, __GFP_NORETRY | __GFP_NOWARN,
- &handle);
+ &handle, dis_pagealloc);
if (ret == -ENOSPC) {
zswap_reject_compress_poor++;
goto freepage;
--
1.7.10.4
* Re: [PATCH 1/2] zswap: limit pool fragment
2013-06-20 14:38 [PATCH 1/2] zswap: limit pool fragment Bob Liu
@ 2013-06-20 15:18 ` Seth Jennings
2013-06-21 0:00 ` Bob Liu
0 siblings, 1 reply; 3+ messages in thread
From: Seth Jennings @ 2013-06-20 15:18 UTC (permalink / raw)
To: Bob Liu; +Cc: akpm, linux-mm, konrad.wilk, Bob Liu
On Thu, Jun 20, 2013 at 10:38:22PM +0800, Bob Liu wrote:
> If the zswap pool is heavily fragmented, it is pointless to store more pages
> in zswap. So refuse to allocate new pages for the zswap pool in order to
> limit the fragmentation.
This is a little light on specifics.
From looking at the code it would seem that you are preventing further
growth of the zswap pool if the overall compression ratio achieved in the
pool doesn't reach a certain threshold; in this case an arbitrary 70%.
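To make that concrete (the numbers here are made up, just to illustrate the
check the patch adds): with zswap_pool_pages = 750 and zswap_stored_pages =
1000, tmp = 750 * 100 / 1001 = 74, which is > 70, so dis_pagealloc is set
and zbud_alloc() fails with -ENOSPC whenever it cannot reuse an existing
unbuddied page. Put another way, the pool is only allowed to grow while it
averages better than roughly a 100/70 ~= 1.43:1 effective compression ratio.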
So there are some issues:
1. If the first few pages that come into zswap don't compress well, this logic
makes a horrible assumption that all future pages will also not compress well
and effectively disables zswap until those poorly-compressed pages are removed
from the pool. So that's a huge problem.
2. It mucks up the allocator API with a change to zbud_alloc's signature
3. It introduces yet another (should be) tunable in the form of the compression
threshold and really just reimplements the per-page compression threshold that
was removed from the code during development (the per-page check was better
than this because it doesn't suffer from issue #1). It was decided that the
decisions about whether the page can be (efficiently) stored should be the job
of the allocator.
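
For reference, the per-page threshold I'm referring to looked roughly like
this (a sketch from memory, not the exact code that was removed;
max_compressed_percent is just a placeholder name for the tunable):

	/* reject a page that compresses poorly, before involving
	 * the allocator at all */
	if (dlen > (PAGE_SIZE * max_compressed_percent) / 100) {
		zswap_reject_compress_poor++;
		ret = -EINVAL;
		goto freepage;
	}

Because it only looks at the page being stored, a burst of
poorly-compressing pages can't wedge the whole pool the way the pool-wide
ratio in this patch can.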
Seth
* Re: [PATCH 1/2] zswap: limit pool fragment
2013-06-20 15:18 ` Seth Jennings
@ 2013-06-21 0:00 ` Bob Liu
0 siblings, 0 replies; 3+ messages in thread
From: Bob Liu @ 2013-06-21 0:00 UTC (permalink / raw)
To: Seth Jennings; +Cc: Bob Liu, akpm, linux-mm, konrad.wilk
Hi Seth,
On 06/20/2013 11:18 PM, Seth Jennings wrote:
> On Thu, Jun 20, 2013 at 10:38:22PM +0800, Bob Liu wrote:
>> If the zswap pool is heavily fragmented, it is pointless to store more pages
>> in zswap. So refuse to allocate new pages for the zswap pool in order to
>> limit the fragmentation.
>
> This is a little light on specifics.
>
> From looking at the code it would seem that you are preventing further
> growth of the zswap pool if the overall compression ratio achieved in the
> pool doesn't reach a certain threshold; in this case an arbitrary 70%.
>
Thanks for the review.
Yes.
> So there are some issues:
>
> 1. If the first few pages that come into zswap don't compress well, this logic
> makes a horrible assumption that all future pages will also not compress well
> and effectively disables zswap until those poorly-compressed pages are removed
Not exactly; future pages can still use zswap if they can find space (for
example, by being buddied into an existing, partially-used zbud page), just
without allocating new physical pages.
> from the pool. So that's a huge problem.
>
> 2. It mucks up the allocator API with a change to zbud_alloc's signature
>
> 3. It introduces yet another (should be) tunable in the form of the compression
> threshold and really just reimplements the per-page compression threshold that
> was removed from the code during development (the per-page check was better
> than this because it doesn't suffer from issue #1). It was decided that the
Yes, but it's a bit different. The per-page threshold limits the compression
ratio of a single page, while here it is the compression ratio of the whole
pool that is limited. Poorly-compressed pages can still be stored, depending
on the overall compression ratio.
That's what I observed while testing memcached: the overall compression
ratio is always low. With this patch the performance is better.
(It will still take some time to summarize the results, since there is
currently a problem with my testing machine.)
> decisions about whether the page can be (efficiently) stored should be the job
> of the allocator.
So maybe we should limit the fragmentation in zbud.
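Something like this inside zbud_alloc(), just as a rough idea (nr_allocs
doesn't exist today and would have to be added to struct zbud_pool, and
threshold would be a new tunable):

	/* couldn't find an unbuddied page; only grow the pool if the
	 * overall packing is still good enough */
	if (pool->pages_nr * 100 > (pool->nr_allocs + 1) * threshold)
		return -ENOSPC;
	page = alloc_page(gfp);

That would keep both the accounting and the policy inside the allocator,
without changing the zbud_alloc() signature.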
>
> Seth
--
Regards,
-Bob