From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f171.google.com (mail-lb0-f171.google.com [209.85.217.171]) by kanga.kvack.org (Postfix) with ESMTP id 0E8496B0038 for ; Wed, 16 Sep 2015 07:49:27 -0400 (EDT) Received: by lbpo4 with SMTP id o4so102077915lbp.2 for ; Wed, 16 Sep 2015 04:49:26 -0700 (PDT) Received: from mail-la0-x22f.google.com (mail-la0-x22f.google.com. [2a00:1450:4010:c03::22f]) by mx.google.com with ESMTPS id ea18si17936848lbb.84.2015.09.16.04.49.24 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 Sep 2015 04:49:25 -0700 (PDT) Received: by lamp12 with SMTP id p12so125754061lam.0 for ; Wed, 16 Sep 2015 04:49:24 -0700 (PDT) Date: Wed, 16 Sep 2015 13:48:57 +0200 From: Vitaly Wool Subject: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator Message-Id: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: ddstreet@ieee.org, akpm@linux-foundation.org, minchan@kernel.org, sergey.senozhatsky@gmail.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org Hi, as a follow-up to my previous patchset, I decided to first prepare zbud/zpool related patches and then have some testing rounds and performance measurements for zram running over both, and come up with improved/verified zram/zpool patches then. So for now, here comes the zbud/zpool part. -- Vitaly Wool -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f47.google.com (mail-la0-f47.google.com [209.85.215.47]) by kanga.kvack.org (Postfix) with ESMTP id 7CDDE6B0038 for ; Wed, 16 Sep 2015 07:50:52 -0400 (EDT) Received: by lamp12 with SMTP id p12so125777717lam.0 for ; Wed, 16 Sep 2015 04:50:51 -0700 (PDT) Received: from mail-la0-x22a.google.com (mail-la0-x22a.google.com. [2a00:1450:4010:c03::22a]) by mx.google.com with ESMTPS id k4si17940037lam.79.2015.09.16.04.50.50 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 Sep 2015 04:50:51 -0700 (PDT) Received: by lahg1 with SMTP id g1so97978566lah.1 for ; Wed, 16 Sep 2015 04:50:50 -0700 (PDT) Date: Wed, 16 Sep 2015 13:50:48 +0200 From: Vitaly Wool Subject: [PATCH 1/2] zbud: allow PAGE_SIZE allocations Message-Id: <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> In-Reply-To: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: ddstreet@ieee.org, akpm@linux-foundation.org, minchan@kernel.org, sergey.senozhatsky@gmail.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org For zram to be able to use zbud via the common zpool API, allocations of size PAGE_SIZE should be allowed by zpool. zbud uses the beginning of an allocated page for its internal structure but it is not a problem as long as we keep track of such special pages using a newly introduced page flag. To be able to keep track of zbud pages in any case, struct page's lru pointer will be used for zbud page lists instead of the one that used to be part of the aforementioned internal structure. Signed-off-by: Vitaly Wool --- include/linux/page-flags.h | 3 ++ mm/zbud.c | 71 ++++++++++++++++++++++++++++++++++++++-------- 2 files changed, 62 insertions(+), 12 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 416509e..dd47cf0 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -134,6 +134,9 @@ enum pageflags { /* SLOB */ PG_slob_free = PG_private, + + /* ZBUD */ + PG_uncompressed = PG_owner_priv_1, }; #ifndef __GENERATING_BOUNDS_H diff --git a/mm/zbud.c b/mm/zbud.c index fa48bcdf..ee8b5d6 100644 --- a/mm/zbud.c +++ b/mm/zbud.c @@ -107,13 +107,11 @@ struct zbud_pool { * struct zbud_header - zbud page metadata occupying the first chunk of each * zbud page. * @buddy: links the zbud page into the unbuddied/buddied lists in the pool - * @lru: links the zbud page into the lru list in the pool * @first_chunks: the size of the first buddy in chunks, 0 if free * @last_chunks: the size of the last buddy in chunks, 0 if free */ struct zbud_header { struct list_head buddy; - struct list_head lru; unsigned int first_chunks; unsigned int last_chunks; bool under_reclaim; @@ -221,6 +219,7 @@ MODULE_ALIAS("zpool-zbud"); *****************/ /* Just to make the code easier to read */ enum buddy { + FULL, FIRST, LAST }; @@ -241,7 +240,7 @@ static struct zbud_header *init_zbud_page(struct page *page) zhdr->first_chunks = 0; zhdr->last_chunks = 0; INIT_LIST_HEAD(&zhdr->buddy); - INIT_LIST_HEAD(&zhdr->lru); + INIT_LIST_HEAD(&page->lru); zhdr->under_reclaim = 0; return zhdr; } @@ -267,11 +266,18 @@ static unsigned long encode_handle(struct zbud_header *zhdr, enum buddy bud) * over the zbud header in the first chunk. */ handle = (unsigned long)zhdr; - if (bud == FIRST) + switch (bud) { + case FIRST: /* skip over zbud header */ handle += ZHDR_SIZE_ALIGNED; - else /* bud == LAST */ + break; + case LAST: handle += PAGE_SIZE - (zhdr->last_chunks << CHUNK_SHIFT); + break; + case FULL: + default: + break; + } return handle; } @@ -360,6 +366,24 @@ int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp, if (!size || (gfp & __GFP_HIGHMEM)) return -EINVAL; + + if (size == PAGE_SIZE) { + /* + * This is a special case. The page will be allocated + * and used to store uncompressed data + */ + page = alloc_page(gfp); + if (!page) + return -ENOMEM; + spin_lock(&pool->lock); + pool->pages_nr++; + INIT_LIST_HEAD(&page->lru); + page->flags |= PG_uncompressed; + list_add(&page->lru, &pool->lru); + spin_unlock(&pool->lock); + *handle = encode_handle(page_address(page), FULL); + return 0; + } if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE) return -ENOSPC; chunks = size_to_chunks(size); @@ -372,6 +396,7 @@ int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp, zhdr = list_first_entry(&pool->unbuddied[i], struct zbud_header, buddy); list_del(&zhdr->buddy); + page = virt_to_page(zhdr); if (zhdr->first_chunks == 0) bud = FIRST; else @@ -406,9 +431,9 @@ found: } /* Add/move zbud page to beginning of LRU */ - if (!list_empty(&zhdr->lru)) - list_del(&zhdr->lru); - list_add(&zhdr->lru, &pool->lru); + if (!list_empty(&page->lru)) + list_del(&page->lru); + list_add(&page->lru, &pool->lru); *handle = encode_handle(zhdr, bud); spin_unlock(&pool->lock); @@ -430,9 +455,21 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) { struct zbud_header *zhdr; int freechunks; + struct page *page; spin_lock(&pool->lock); zhdr = handle_to_zbud_header(handle); + page = virt_to_page(zhdr); + + /* If it was an uncompressed full page, just free it */ + if (page->flags & PG_uncompressed) { + page->flags &= ~PG_uncompressed; + list_del(&page->lru); + __free_page(page); + pool->pages_nr--; + spin_unlock(&pool->lock); + return; + } /* If first buddy, handle will be page aligned */ if ((handle - ZHDR_SIZE_ALIGNED) & ~PAGE_MASK) @@ -451,7 +488,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { /* zbud page is empty, free */ - list_del(&zhdr->lru); + list_del(&page->lru); free_zbud_page(zhdr); pool->pages_nr--; } else { @@ -505,6 +542,7 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) { int i, ret, freechunks; struct zbud_header *zhdr; + struct page *page; unsigned long first_handle = 0, last_handle = 0; spin_lock(&pool->lock); @@ -514,8 +552,17 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) return -EINVAL; } for (i = 0; i < retries; i++) { - zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); - list_del(&zhdr->lru); + page = list_tail_entry(&pool->lru, struct page, lru); + zhdr = page_address(page); + list_del(&page->lru); + /* Uncompressed zbud page? just run eviction and free it */ + if (page->flags & PG_uncompressed) { + page->flags &= ~PG_uncompressed; + spin_unlock(&pool->lock); + pool->ops->evict(pool, encode_handle(zhdr, FULL)); + __free_page(page); + return 0; + } list_del(&zhdr->buddy); /* Protect zbud page against free */ zhdr->under_reclaim = true; @@ -565,7 +612,7 @@ next: } /* add to beginning of LRU */ - list_add(&zhdr->lru, &pool->lru); + list_add(&page->lru, &pool->lru); } spin_unlock(&pool->lock); return -EAGAIN; -- 1.9.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f176.google.com (mail-lb0-f176.google.com [209.85.217.176]) by kanga.kvack.org (Postfix) with ESMTP id 9D33D6B0254 for ; Wed, 16 Sep 2015 07:53:51 -0400 (EDT) Received: by lbpo4 with SMTP id o4so102140592lbp.2 for ; Wed, 16 Sep 2015 04:53:51 -0700 (PDT) Received: from mail-la0-x229.google.com (mail-la0-x229.google.com. [2a00:1450:4010:c03::229]) by mx.google.com with ESMTPS id 6si17421586lav.58.2015.09.16.04.53.49 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 Sep 2015 04:53:50 -0700 (PDT) Received: by lahg1 with SMTP id g1so98029509lah.1 for ; Wed, 16 Sep 2015 04:53:49 -0700 (PDT) Date: Wed, 16 Sep 2015 13:53:47 +0200 From: Vitaly Wool Subject: [PATCH 2/2] zpool/zsmalloc/zbud: align on interfaces Message-Id: <20150916135347.149f550d51751c58c8b7ca96@gmail.com> In-Reply-To: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: ddstreet@ieee.org, akpm@linux-foundation.org, minchan@kernel.org, sergey.senozhatsky@gmail.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org As a preparation step for zram to be able to use common zpool API, there has to be some alignment done on it. This patch adds functions that correspond to zsmalloc-specific API to the common zpool API and takes care of the callbacks that have to be introduced, too. This version of the patch uses simplified 'compact' API/callbacks. Signed-off-by: Vitaly Wool --- drivers/block/zram/zram_drv.c | 4 ++-- include/linux/zpool.h | 14 ++++++++++++++ include/linux/zsmalloc.h | 8 ++------ mm/zbud.c | 12 ++++++++++++ mm/zpool.c | 21 +++++++++++++++++++++ mm/zsmalloc.c | 19 ++++++++++++++++--- 6 files changed, 67 insertions(+), 11 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 9fa15bb..a0a786e 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -426,12 +426,12 @@ static ssize_t mm_stat_show(struct device *dev, struct device_attribute *attr, char *buf) { struct zram *zram = dev_to_zram(dev); - struct zs_pool_stats pool_stats; + struct zpool_stats pool_stats; u64 orig_size, mem_used = 0; long max_used; ssize_t ret; - memset(&pool_stats, 0x00, sizeof(struct zs_pool_stats)); + memset(&pool_stats, 0x00, sizeof(struct zpool_stats)); down_read(&zram->init_lock); if (init_done(zram)) { diff --git a/include/linux/zpool.h b/include/linux/zpool.h index 42f8ec9..a2a5bc4 100644 --- a/include/linux/zpool.h +++ b/include/linux/zpool.h @@ -17,6 +17,11 @@ struct zpool_ops { int (*evict)(struct zpool *pool, unsigned long handle); }; +struct zpool_stats { + /* How many pages were migrated (freed) */ + unsigned long pages_compacted; +}; + /* * Control how a handle is mapped. It will be ignored if the * implementation does not support it. Its use is optional. @@ -58,6 +63,10 @@ void *zpool_map_handle(struct zpool *pool, unsigned long handle, void zpool_unmap_handle(struct zpool *pool, unsigned long handle); +unsigned long zpool_compact(struct zpool *pool); + +void zpool_stats(struct zpool *pool, struct zpool_stats *zstats); + u64 zpool_get_total_size(struct zpool *pool); @@ -72,6 +81,8 @@ u64 zpool_get_total_size(struct zpool *pool); * @shrink: shrink the pool. * @map: map a handle. * @unmap: unmap a handle. + * @compact: try to run compaction for the pool + * @stats: get statistics for the pool * @total_size: get total size of a pool. * * This is created by a zpool implementation and registered @@ -98,6 +109,9 @@ struct zpool_driver { enum zpool_mapmode mm); void (*unmap)(void *pool, unsigned long handle); + unsigned long (*compact)(void *pool); + void (*stats)(void *pool, struct zpool_stats *stats); + u64 (*total_size)(void *pool); }; diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h index 6398dfa..5aee1c7 100644 --- a/include/linux/zsmalloc.h +++ b/include/linux/zsmalloc.h @@ -15,6 +15,7 @@ #define _ZS_MALLOC_H_ #include +#include /* * zsmalloc mapping modes @@ -34,11 +35,6 @@ enum zs_mapmode { */ }; -struct zs_pool_stats { - /* How many pages were migrated (freed) */ - unsigned long pages_compacted; -}; - struct zs_pool; struct zs_pool *zs_create_pool(char *name, gfp_t flags); @@ -54,5 +50,5 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle); unsigned long zs_get_total_pages(struct zs_pool *pool); unsigned long zs_compact(struct zs_pool *pool); -void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats); +void zs_pool_stats(struct zs_pool *pool, struct zpool_stats *stats); #endif diff --git a/mm/zbud.c b/mm/zbud.c index ee8b5d6..23cfc76 100644 --- a/mm/zbud.c +++ b/mm/zbud.c @@ -193,6 +193,16 @@ static void zbud_zpool_unmap(void *pool, unsigned long handle) zbud_unmap(pool, handle); } +static unsigned long zbud_zpool_compact(void *pool) +{ + return 0; +} + +static void zbud_zpool_stats(void *pool, struct zpool_stats *stats) +{ + /* no-op */ +} + static u64 zbud_zpool_total_size(void *pool) { return zbud_get_pool_size(pool) * PAGE_SIZE; @@ -208,6 +218,8 @@ static struct zpool_driver zbud_zpool_driver = { .shrink = zbud_zpool_shrink, .map = zbud_zpool_map, .unmap = zbud_zpool_unmap, + .compact = zbud_zpool_compact, + .stats = zbud_zpool_stats, .total_size = zbud_zpool_total_size, }; diff --git a/mm/zpool.c b/mm/zpool.c index 8f670d3..d454f37 100644 --- a/mm/zpool.c +++ b/mm/zpool.c @@ -341,6 +341,27 @@ void zpool_unmap_handle(struct zpool *zpool, unsigned long handle) } /** + * zpool_compact() - try to run compaction over zpool + * @pool The zpool to compact + * + * Returns: the number of migrated pages (0 if nothing happened) + */ +unsigned long zpool_compact(struct zpool *zpool) +{ + return zpool->driver->compact(zpool->pool); +} + +/** + * zpool_stats() - obtain zpool statistics + * @pool The zpool to get statistics for + * @zstats stats to fill in + */ +void zpool_stats(struct zpool *zpool, struct zpool_stats *zstats) +{ + zpool->driver->stats(zpool->pool, zstats); +} + +/** * zpool_get_total_size() - The total size of the pool * @pool The zpool to check * diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index f135b1b..3ab0515 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -245,7 +245,7 @@ struct zs_pool { gfp_t flags; /* allocation flags used when growing pool */ atomic_long_t pages_allocated; - struct zs_pool_stats stats; + struct zpool_stats stats; /* Compact classes */ struct shrinker shrinker; @@ -365,6 +365,17 @@ static void zs_zpool_unmap(void *pool, unsigned long handle) zs_unmap_object(pool, handle); } +static unsigned long zs_zpool_compact(void *pool) +{ + return zs_compact(pool); +} + + +static void zs_zpool_stats(void *pool, struct zpool_stats *stats) +{ + zs_pool_stats(pool, stats); +} + static u64 zs_zpool_total_size(void *pool) { return zs_get_total_pages(pool) << PAGE_SHIFT; @@ -380,6 +391,8 @@ static struct zpool_driver zs_zpool_driver = { .shrink = zs_zpool_shrink, .map = zs_zpool_map, .unmap = zs_zpool_unmap, + .compact = zs_zpool_compact, + .stats = zs_zpool_stats, .total_size = zs_zpool_total_size, }; @@ -1789,9 +1802,9 @@ unsigned long zs_compact(struct zs_pool *pool) } EXPORT_SYMBOL_GPL(zs_compact); -void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats) +void zs_pool_stats(struct zs_pool *pool, struct zpool_stats *stats) { - memcpy(stats, &pool->stats, sizeof(struct zs_pool_stats)); + memcpy(stats, &pool->stats, sizeof(struct zpool_stats)); } EXPORT_SYMBOL_GPL(zs_pool_stats); -- 1.9.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f47.google.com (mail-pa0-f47.google.com [209.85.220.47]) by kanga.kvack.org (Postfix) with ESMTP id CC8776B0038 for ; Wed, 16 Sep 2015 21:29:23 -0400 (EDT) Received: by padhy16 with SMTP id hy16so4456213pad.1 for ; Wed, 16 Sep 2015 18:29:23 -0700 (PDT) Received: from mail-pa0-x235.google.com (mail-pa0-x235.google.com. [2607:f8b0:400e:c03::235]) by mx.google.com with ESMTPS id cy1si1218875pad.200.2015.09.16.18.29.22 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 Sep 2015 18:29:23 -0700 (PDT) Received: by padhy16 with SMTP id hy16so4455972pad.1 for ; Wed, 16 Sep 2015 18:29:22 -0700 (PDT) Date: Thu, 17 Sep 2015 10:30:07 +0900 From: Sergey Senozhatsky Subject: Re: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator Message-ID: <20150917013007.GB421@swordfish> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Vitaly Wool Cc: ddstreet@ieee.org, akpm@linux-foundation.org, minchan@kernel.org, sergey.senozhatsky@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org On (09/16/15 13:48), Vitaly Wool wrote: > as a follow-up to my previous patchset, I decided to first prepare > zbud/zpool related patches and then have some testing rounds and > performance measurements for zram running over both, and come up > with improved/verified zram/zpool patches then. Hi, just a side note, I'm afraid this is not how it works. numbers go first, to justify the patch set. -ss > > So for now, here comes the zbud/zpool part. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f182.google.com (mail-qk0-f182.google.com [209.85.220.182]) by kanga.kvack.org (Postfix) with ESMTP id 7418B6B0038 for ; Thu, 17 Sep 2015 06:26:13 -0400 (EDT) Received: by qkdw123 with SMTP id w123so4448217qkd.0 for ; Thu, 17 Sep 2015 03:26:13 -0700 (PDT) Received: from mail-qg0-x22e.google.com (mail-qg0-x22e.google.com. [2607:f8b0:400d:c04::22e]) by mx.google.com with ESMTPS id 195si1955020qhw.70.2015.09.17.03.26.12 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 Sep 2015 03:26:12 -0700 (PDT) Received: by qgev79 with SMTP id v79so8669178qge.0 for ; Thu, 17 Sep 2015 03:26:12 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20150917013007.GB421@swordfish> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150917013007.GB421@swordfish> Date: Thu, 17 Sep 2015 12:26:12 +0200 Message-ID: Subject: Re: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator From: Vitaly Wool Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Sergey Senozhatsky Cc: Dan Streetman , Andrew Morton , Minchan Kim , Sergey Senozhatsky , LKML , linux-mm@kvack.org On Thu, Sep 17, 2015 at 1:30 AM, Sergey Senozhatsky wrote: > > just a side note, > I'm afraid this is not how it works. numbers go first, to justify > the patch set. > These patches are extension/alignment patches, why would anyone need to justify that? But just to help you understand where I am coming from, here are some numbers: zsmalloc zbud kswapd_low_wmark_hit_quickly 4513 5696 kswapd_high_wmark_hit_quickly 861 902 allocstall 2236 1122 pgmigrate_success 78229 31244 compact_stall 1172 634 compact_fail 194 95 compact_success 464 210 These are results from an Android device having run 3 'monkey' tests each 20 minutes, with user switch to guest and back in between. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f176.google.com (mail-lb0-f176.google.com [209.85.217.176]) by kanga.kvack.org (Postfix) with ESMTP id DBC116B0038 for ; Thu, 17 Sep 2015 09:00:57 -0400 (EDT) Received: by lbbvu2 with SMTP id vu2so8802030lbb.0 for ; Thu, 17 Sep 2015 06:00:57 -0700 (PDT) Received: from mx2.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id uu1si3921121wjc.126.2015.09.17.06.00.55 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 17 Sep 2015 06:00:55 -0700 (PDT) Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> From: Vlastimil Babka Message-ID: <55FAB985.9060705@suse.cz> Date: Thu, 17 Sep 2015 15:00:53 +0200 MIME-Version: 1.0 In-Reply-To: <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Vitaly Wool , ddstreet@ieee.org, akpm@linux-foundation.org, minchan@kernel.org, sergey.senozhatsky@gmail.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org On 09/16/2015 01:50 PM, Vitaly Wool wrote: > For zram to be able to use zbud via the common zpool API, > allocations of size PAGE_SIZE should be allowed by zpool. > zbud uses the beginning of an allocated page for its internal > structure but it is not a problem as long as we keep track of > such special pages using a newly introduced page flag. > To be able to keep track of zbud pages in any case, struct page's > lru pointer will be used for zbud page lists instead of the one > that used to be part of the aforementioned internal structure. I don't know how zsmalloc handles uncompressible PAGE_SIZE allocations, but I wouldn't expect it to be any more clever than this? So why duplicate the functionality in zswap and zbud? This could be handled e.g. at the zpool level? Or maybe just in zram, as IIRC in zswap (frontswap) it's valid just to reject a page and it goes to physical swap. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qg0-f53.google.com (mail-qg0-f53.google.com [209.85.192.53]) by kanga.kvack.org (Postfix) with ESMTP id 9DC786B0038 for ; Fri, 18 Sep 2015 04:03:23 -0400 (EDT) Received: by qgez77 with SMTP id z77so32945903qge.1 for ; Fri, 18 Sep 2015 01:03:23 -0700 (PDT) Received: from mail-qg0-x22d.google.com (mail-qg0-x22d.google.com. [2607:f8b0:400d:c04::22d]) by mx.google.com with ESMTPS id k91si6667230qgf.53.2015.09.18.01.03.22 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 18 Sep 2015 01:03:22 -0700 (PDT) Received: by qgt47 with SMTP id 47so32761174qgt.2 for ; Fri, 18 Sep 2015 01:03:22 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <55FAB985.9060705@suse.cz> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> <55FAB985.9060705@suse.cz> Date: Fri, 18 Sep 2015 10:03:22 +0200 Message-ID: Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations From: Vitaly Wool Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Dan Streetman , Andrew Morton , Minchan Kim , Sergey Senozhatsky Cc: LKML , linux-mm@kvack.org > I don't know how zsmalloc handles uncompressible PAGE_SIZE allocations, but > I wouldn't expect it to be any more clever than this? So why duplicate the > functionality in zswap and zbud? This could be handled e.g. at the zpool > level? Or maybe just in zram, as IIRC in zswap (frontswap) it's valid just > to reject a page and it goes to physical swap. >>From what I can see, zsmalloc just allocates pages and puts them into a linked list. Using the beginning of a page for storing an internal struct is zbud-specific, and so is this patch. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f46.google.com (mail-pa0-f46.google.com [209.85.220.46]) by kanga.kvack.org (Postfix) with ESMTP id EF8AC6B0253 for ; Mon, 21 Sep 2015 00:17:10 -0400 (EDT) Received: by padhk3 with SMTP id hk3so104449426pad.3 for ; Sun, 20 Sep 2015 21:17:10 -0700 (PDT) Received: from mail-pa0-x22b.google.com (mail-pa0-x22b.google.com. [2607:f8b0:400e:c03::22b]) by mx.google.com with ESMTPS id al4si34737823pbd.110.2015.09.20.21.17.10 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 20 Sep 2015 21:17:10 -0700 (PDT) Received: by pacex6 with SMTP id ex6so104734815pac.0 for ; Sun, 20 Sep 2015 21:17:09 -0700 (PDT) Date: Mon, 21 Sep 2015 13:18:37 +0900 From: Minchan Kim Subject: Re: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator Message-ID: <20150921041837.GF27729@bbox> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150917013007.GB421@swordfish> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Vitaly Wool Cc: Sergey Senozhatsky , Dan Streetman , Andrew Morton , Sergey Senozhatsky , LKML , linux-mm@kvack.org Hello Vitaly, On Thu, Sep 17, 2015 at 12:26:12PM +0200, Vitaly Wool wrote: > On Thu, Sep 17, 2015 at 1:30 AM, Sergey Senozhatsky > wrote: > > > > > just a side note, > > I'm afraid this is not how it works. numbers go first, to justify > > the patch set. I totally agree Sergey's opinion. > > > > These patches are extension/alignment patches, why would anyone need > to justify that? Sorry, because you wrote up "zram" in the title. As I said earlier, we need several numbers to investigate. First of all, what is culprit of your latency? It seems you are thinking about compaction. so compaction what? Frequent scanning? lock collision? or frequent sleeping in compaction code somewhere? And then why does zbud solve it? If we use zbud for zram, we lose memory efficiency so there is something to justify it. The reason I am asking is I have investigated similar problems in android and other plaforms and the reason of latency was not zsmalloc but agressive high-order allocations from subsystems, watermark check race, deferring of compaction, LMK not working and too much swapout so it causes to reclaim lots of page cache pages which was main culprit in my cases. When I checks with perf, compaction stall count is increased, the time spent in there is not huge so it was not main factor of latency. Your problem might be differnt with me so convincing us, you should give us real data and investigation story. Thanks. > > But just to help you understand where I am coming from, here are some numbers: > zsmalloc zbud > kswapd_low_wmark_hit_quickly 4513 5696 > kswapd_high_wmark_hit_quickly 861 902 > allocstall 2236 1122 > pgmigrate_success 78229 31244 > compact_stall 1172 634 > compact_fail 194 95 > compact_success 464 210 > > These are results from an Android device having run 3 'monkey' tests > each 20 minutes, with user switch to guest and back in between. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f182.google.com (mail-ig0-f182.google.com [209.85.213.182]) by kanga.kvack.org (Postfix) with ESMTP id 697936B0255 for ; Mon, 21 Sep 2015 11:27:48 -0400 (EDT) Received: by igcpb10 with SMTP id pb10so81071464igc.1 for ; Mon, 21 Sep 2015 08:27:48 -0700 (PDT) Received: from mail-ig0-x229.google.com (mail-ig0-x229.google.com. [2607:f8b0:4001:c05::229]) by mx.google.com with ESMTPS id k102si17405068ioi.138.2015.09.21.08.27.47 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 21 Sep 2015 08:27:47 -0700 (PDT) Received: by igbkq10 with SMTP id kq10so81123188igb.0 for ; Mon, 21 Sep 2015 08:27:47 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> <55FAB985.9060705@suse.cz> From: Dan Streetman Date: Mon, 21 Sep 2015 11:27:08 -0400 Message-ID: Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Vitaly Wool Cc: Andrew Morton , Minchan Kim , Sergey Senozhatsky , LKML , Linux-MM On Fri, Sep 18, 2015 at 4:03 AM, Vitaly Wool wrote: >> I don't know how zsmalloc handles uncompressible PAGE_SIZE allocations, but >> I wouldn't expect it to be any more clever than this? So why duplicate the >> functionality in zswap and zbud? This could be handled e.g. at the zpool >> level? Or maybe just in zram, as IIRC in zswap (frontswap) it's valid just >> to reject a page and it goes to physical swap. zpool doesn't actually store pages anywhere; zbud and zsmalloc do the storing, and they do it in completely different ways. Storing an uncompressed page has to be done in zbud and zsmalloc, not zpool. And zram can't do it either; zram doesn't actually store pages either, it relies on zsmalloc to store all its pages. > > From what I can see, zsmalloc just allocates pages and puts them into > a linked list. Using the beginning of a page for storing an internal > struct is zbud-specific, and so is this patch. zsmalloc has size "classes" that allow storing "objects" of a specific size range (i.e. the last class size + 1, up to class size). the max size class is: #define ZS_MAX_ALLOC_SIZE PAGE_SIZE so zsmalloc is able to store "objects" up to, and including, PAGE_SIZE. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f170.google.com (mail-io0-f170.google.com [209.85.223.170]) by kanga.kvack.org (Postfix) with ESMTP id 6E7C36B0253 for ; Mon, 21 Sep 2015 12:18:02 -0400 (EDT) Received: by iofb144 with SMTP id b144so125819127iof.1 for ; Mon, 21 Sep 2015 09:18:02 -0700 (PDT) Received: from mail-ig0-x22f.google.com (mail-ig0-x22f.google.com. [2607:f8b0:4001:c05::22f]) by mx.google.com with ESMTPS id f194si17661055ioe.72.2015.09.21.09.18.01 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 21 Sep 2015 09:18:01 -0700 (PDT) Received: by igbkq10 with SMTP id kq10so82327091igb.0 for ; Mon, 21 Sep 2015 09:18:01 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> From: Dan Streetman Date: Mon, 21 Sep 2015 12:17:21 -0400 Message-ID: Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Vitaly Wool Cc: Andrew Morton , Minchan Kim , Sergey Senozhatsky , linux-kernel , Linux-MM , Seth Jennings Please make sure to cc Seth also, he's the owner of zbud. On Wed, Sep 16, 2015 at 7:50 AM, Vitaly Wool wrote: > For zram to be able to use zbud via the common zpool API, > allocations of size PAGE_SIZE should be allowed by zpool. > zbud uses the beginning of an allocated page for its internal > structure but it is not a problem as long as we keep track of > such special pages using a newly introduced page flag. > To be able to keep track of zbud pages in any case, struct page's > lru pointer will be used for zbud page lists instead of the one > that used to be part of the aforementioned internal structure. > > Signed-off-by: Vitaly Wool > --- > include/linux/page-flags.h | 3 ++ > mm/zbud.c | 71 ++++++++++++++++++++++++++++++++++++++-------- > 2 files changed, 62 insertions(+), 12 deletions(-) > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > index 416509e..dd47cf0 100644 > --- a/include/linux/page-flags.h > +++ b/include/linux/page-flags.h > @@ -134,6 +134,9 @@ enum pageflags { > > /* SLOB */ > PG_slob_free = PG_private, > + > + /* ZBUD */ > + PG_uncompressed = PG_owner_priv_1, you don't need a new page flag. and there's 0% chance it would be accepted even if you did. > }; > > #ifndef __GENERATING_BOUNDS_H > diff --git a/mm/zbud.c b/mm/zbud.c > index fa48bcdf..ee8b5d6 100644 > --- a/mm/zbud.c > +++ b/mm/zbud.c > @@ -107,13 +107,11 @@ struct zbud_pool { > * struct zbud_header - zbud page metadata occupying the first chunk of each > * zbud page. > * @buddy: links the zbud page into the unbuddied/buddied lists in the pool > - * @lru: links the zbud page into the lru list in the pool > * @first_chunks: the size of the first buddy in chunks, 0 if free > * @last_chunks: the size of the last buddy in chunks, 0 if free > */ > struct zbud_header { > struct list_head buddy; > - struct list_head lru; > unsigned int first_chunks; > unsigned int last_chunks; > bool under_reclaim; > @@ -221,6 +219,7 @@ MODULE_ALIAS("zpool-zbud"); > *****************/ > /* Just to make the code easier to read */ > enum buddy { > + FULL, > FIRST, > LAST > }; > @@ -241,7 +240,7 @@ static struct zbud_header *init_zbud_page(struct page *page) > zhdr->first_chunks = 0; > zhdr->last_chunks = 0; > INIT_LIST_HEAD(&zhdr->buddy); > - INIT_LIST_HEAD(&zhdr->lru); > + INIT_LIST_HEAD(&page->lru); > zhdr->under_reclaim = 0; > return zhdr; > } > @@ -267,11 +266,18 @@ static unsigned long encode_handle(struct zbud_header *zhdr, enum buddy bud) > * over the zbud header in the first chunk. > */ > handle = (unsigned long)zhdr; > - if (bud == FIRST) > + switch (bud) { > + case FIRST: > /* skip over zbud header */ > handle += ZHDR_SIZE_ALIGNED; > - else /* bud == LAST */ > + break; > + case LAST: > handle += PAGE_SIZE - (zhdr->last_chunks << CHUNK_SHIFT); > + break; > + case FULL: > + default: Hmm, while it should be ok to treat a default (invalid) bud value as a full page (assuming the caller treats it as such), you should at least add a pr_warn() or pr_warn_ratelimited(), or maybe a WARN_ON() or WARN_ON_ONCE(). the default case should never happen, and a warning should be printed if it does. > + break; > + } > return handle; > } > > @@ -360,6 +366,24 @@ int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp, > > if (!size || (gfp & __GFP_HIGHMEM)) > return -EINVAL; > + > + if (size == PAGE_SIZE) { > + /* > + * This is a special case. The page will be allocated > + * and used to store uncompressed data > + */ well you shouldn't special case only PAGE_SIZE. If zram increases its max_zpage_size to a value > (PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE) then those compressed pages will fail to store here. I think it would be better to change the size check to a simple if (size > PAGE_SIZE) return -ENOSPC; then use the existing > if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE) to store the object (which is either a large compressed page, or an uncompressed page) into the full zbud page. And don't duplicate everything the function does inside an if (), just update the function to handle PAGE_SIZE storage. > + page = alloc_page(gfp); > + if (!page) > + return -ENOMEM; > + spin_lock(&pool->lock); > + pool->pages_nr++; > + INIT_LIST_HEAD(&page->lru); > + page->flags |= PG_uncompressed; > + list_add(&page->lru, &pool->lru); > + spin_unlock(&pool->lock); > + *handle = encode_handle(page_address(page), FULL); > + return 0; > + } > if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE) > return -ENOSPC; > chunks = size_to_chunks(size); > @@ -372,6 +396,7 @@ int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp, > zhdr = list_first_entry(&pool->unbuddied[i], > struct zbud_header, buddy); > list_del(&zhdr->buddy); > + page = virt_to_page(zhdr); > if (zhdr->first_chunks == 0) > bud = FIRST; > else > @@ -406,9 +431,9 @@ found: > } > > /* Add/move zbud page to beginning of LRU */ > - if (!list_empty(&zhdr->lru)) > - list_del(&zhdr->lru); > - list_add(&zhdr->lru, &pool->lru); > + if (!list_empty(&page->lru)) > + list_del(&page->lru); > + list_add(&page->lru, &pool->lru); > > *handle = encode_handle(zhdr, bud); > spin_unlock(&pool->lock); > @@ -430,9 +455,21 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) > { > struct zbud_header *zhdr; > int freechunks; > + struct page *page; > > spin_lock(&pool->lock); > zhdr = handle_to_zbud_header(handle); > + page = virt_to_page(zhdr); > + > + /* If it was an uncompressed full page, just free it */ > + if (page->flags & PG_uncompressed) { > + page->flags &= ~PG_uncompressed; > + list_del(&page->lru); > + __free_page(page); > + pool->pages_nr--; > + spin_unlock(&pool->lock); > + return; > + } don't repeat this function inside an if() block. update the actual function to handle the new case. and you don't need a new page flag. you have 3 distinct cases: switch (handle & ~PAGE_MASK) { case 0: /* this is a full-sized page */ case ZHDR_SIZE_ALIGNED: /* this is the first buddy */ default: /* this is the last buddy */ } > > /* If first buddy, handle will be page aligned */ > if ((handle - ZHDR_SIZE_ALIGNED) & ~PAGE_MASK) > @@ -451,7 +488,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) > > if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { > /* zbud page is empty, free */ > - list_del(&zhdr->lru); > + list_del(&page->lru); > free_zbud_page(zhdr); > pool->pages_nr--; > } else { > @@ -505,6 +542,7 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) > { > int i, ret, freechunks; > struct zbud_header *zhdr; > + struct page *page; > unsigned long first_handle = 0, last_handle = 0; > > spin_lock(&pool->lock); > @@ -514,8 +552,17 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) > return -EINVAL; > } > for (i = 0; i < retries; i++) { > - zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); > - list_del(&zhdr->lru); > + page = list_tail_entry(&pool->lru, struct page, lru); > + zhdr = page_address(page); > + list_del(&page->lru); > + /* Uncompressed zbud page? just run eviction and free it */ > + if (page->flags & PG_uncompressed) { > + page->flags &= ~PG_uncompressed; > + spin_unlock(&pool->lock); > + pool->ops->evict(pool, encode_handle(zhdr, FULL)); > + __free_page(page); > + return 0; again, don't be redundant. change the function to handle full-sized pages, don't repeat the function in an if() block for a special case. > + } > list_del(&zhdr->buddy); > /* Protect zbud page against free */ > zhdr->under_reclaim = true; > @@ -565,7 +612,7 @@ next: > } > > /* add to beginning of LRU */ > - list_add(&zhdr->lru, &pool->lru); > + list_add(&page->lru, &pool->lru); > } > spin_unlock(&pool->lock); > return -EAGAIN; > -- > 1.9.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f177.google.com (mail-ig0-f177.google.com [209.85.213.177]) by kanga.kvack.org (Postfix) with ESMTP id 82C436B0254 for ; Mon, 21 Sep 2015 13:15:51 -0400 (EDT) Received: by igcpb10 with SMTP id pb10so83484794igc.1 for ; Mon, 21 Sep 2015 10:15:51 -0700 (PDT) Received: from mail-io0-x232.google.com (mail-io0-x232.google.com. [2607:f8b0:4001:c06::232]) by mx.google.com with ESMTPS id h127si17880728ioh.207.2015.09.21.10.15.50 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 21 Sep 2015 10:15:50 -0700 (PDT) Received: by iofh134 with SMTP id h134so127106923iof.0 for ; Mon, 21 Sep 2015 10:15:50 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20150916135347.149f550d51751c58c8b7ca96@gmail.com> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135347.149f550d51751c58c8b7ca96@gmail.com> From: Dan Streetman Date: Mon, 21 Sep 2015 13:15:10 -0400 Message-ID: Subject: Re: [PATCH 2/2] zpool/zsmalloc/zbud: align on interfaces Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Vitaly Wool Cc: Andrew Morton , Minchan Kim , Sergey Senozhatsky , linux-kernel , Linux-MM , Seth Jennings On Wed, Sep 16, 2015 at 7:53 AM, Vitaly Wool wrote: > As a preparation step for zram to be able to use common zpool API, > there has to be some alignment done on it. This patch adds > functions that correspond to zsmalloc-specific API to the common > zpool API and takes care of the callbacks that have to be > introduced, too. > > This version of the patch uses simplified 'compact' API/callbacks. > > Signed-off-by: Vitaly Wool > --- > drivers/block/zram/zram_drv.c | 4 ++-- > include/linux/zpool.h | 14 ++++++++++++++ > include/linux/zsmalloc.h | 8 ++------ > mm/zbud.c | 12 ++++++++++++ > mm/zpool.c | 21 +++++++++++++++++++++ > mm/zsmalloc.c | 19 ++++++++++++++++--- > 6 files changed, 67 insertions(+), 11 deletions(-) > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > index 9fa15bb..a0a786e 100644 > --- a/drivers/block/zram/zram_drv.c > +++ b/drivers/block/zram/zram_drv.c > @@ -426,12 +426,12 @@ static ssize_t mm_stat_show(struct device *dev, > struct device_attribute *attr, char *buf) > { > struct zram *zram = dev_to_zram(dev); > - struct zs_pool_stats pool_stats; > + struct zpool_stats pool_stats; > u64 orig_size, mem_used = 0; > long max_used; > ssize_t ret; > > - memset(&pool_stats, 0x00, sizeof(struct zs_pool_stats)); > + memset(&pool_stats, 0x00, sizeof(struct zpool_stats)); > > down_read(&zram->init_lock); > if (init_done(zram)) { you don't need to change zram in this patch. save this part for the patch to update zram to use zpool. > diff --git a/include/linux/zpool.h b/include/linux/zpool.h > index 42f8ec9..a2a5bc4 100644 > --- a/include/linux/zpool.h > +++ b/include/linux/zpool.h > @@ -17,6 +17,11 @@ struct zpool_ops { > int (*evict)(struct zpool *pool, unsigned long handle); > }; > > +struct zpool_stats { > + /* How many pages were migrated (freed) */ > + unsigned long pages_compacted; > +}; > + > /* > * Control how a handle is mapped. It will be ignored if the > * implementation does not support it. Its use is optional. > @@ -58,6 +63,10 @@ void *zpool_map_handle(struct zpool *pool, unsigned long handle, > > void zpool_unmap_handle(struct zpool *pool, unsigned long handle); > > +unsigned long zpool_compact(struct zpool *pool); > + > +void zpool_stats(struct zpool *pool, struct zpool_stats *zstats); > + > u64 zpool_get_total_size(struct zpool *pool); > > > @@ -72,6 +81,8 @@ u64 zpool_get_total_size(struct zpool *pool); > * @shrink: shrink the pool. > * @map: map a handle. > * @unmap: unmap a handle. > + * @compact: try to run compaction for the pool > + * @stats: get statistics for the pool > * @total_size: get total size of a pool. > * > * This is created by a zpool implementation and registered > @@ -98,6 +109,9 @@ struct zpool_driver { > enum zpool_mapmode mm); > void (*unmap)(void *pool, unsigned long handle); > > + unsigned long (*compact)(void *pool); > + void (*stats)(void *pool, struct zpool_stats *stats); > + > u64 (*total_size)(void *pool); > }; > > diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h > index 6398dfa..5aee1c7 100644 > --- a/include/linux/zsmalloc.h > +++ b/include/linux/zsmalloc.h > @@ -15,6 +15,7 @@ > #define _ZS_MALLOC_H_ > > #include > +#include > > /* > * zsmalloc mapping modes > @@ -34,11 +35,6 @@ enum zs_mapmode { > */ > }; > > -struct zs_pool_stats { > - /* How many pages were migrated (freed) */ > - unsigned long pages_compacted; > -}; > - > struct zs_pool; > > struct zs_pool *zs_create_pool(char *name, gfp_t flags); > @@ -54,5 +50,5 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle); > unsigned long zs_get_total_pages(struct zs_pool *pool); > unsigned long zs_compact(struct zs_pool *pool); > > -void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats); > +void zs_pool_stats(struct zs_pool *pool, struct zpool_stats *stats); > #endif > diff --git a/mm/zbud.c b/mm/zbud.c > index ee8b5d6..23cfc76 100644 > --- a/mm/zbud.c > +++ b/mm/zbud.c > @@ -193,6 +193,16 @@ static void zbud_zpool_unmap(void *pool, unsigned long handle) > zbud_unmap(pool, handle); > } > > +static unsigned long zbud_zpool_compact(void *pool) > +{ > + return 0; > +} > + > +static void zbud_zpool_stats(void *pool, struct zpool_stats *stats) > +{ > + /* no-op */ > +} > + > static u64 zbud_zpool_total_size(void *pool) > { > return zbud_get_pool_size(pool) * PAGE_SIZE; > @@ -208,6 +218,8 @@ static struct zpool_driver zbud_zpool_driver = { > .shrink = zbud_zpool_shrink, > .map = zbud_zpool_map, > .unmap = zbud_zpool_unmap, > + .compact = zbud_zpool_compact, > + .stats = zbud_zpool_stats, > .total_size = zbud_zpool_total_size, > }; > > diff --git a/mm/zpool.c b/mm/zpool.c > index 8f670d3..d454f37 100644 > --- a/mm/zpool.c > +++ b/mm/zpool.c > @@ -341,6 +341,27 @@ void zpool_unmap_handle(struct zpool *zpool, unsigned long handle) > } > > /** > + * zpool_compact() - try to run compaction over zpool > + * @pool The zpool to compact > + * > + * Returns: the number of migrated pages (0 if nothing happened) don't say "nothing happened", 0 doesn't necessarily mean that. 0 means no pages were compacted. > + */ > +unsigned long zpool_compact(struct zpool *zpool) > +{ > + return zpool->driver->compact(zpool->pool); > +} > + > +/** > + * zpool_stats() - obtain zpool statistics > + * @pool The zpool to get statistics for > + * @zstats stats to fill in this doc needs more. how is this function used? what should the caller expect in the zstats fields before and after the call? for the common zpool interface, it might make more sense to just include a direct function to access the pages_compacted, instead of the indirect stats call. If more stats get added in the future, the call can be changed into a zpool_stats function. > + */ > +void zpool_stats(struct zpool *zpool, struct zpool_stats *zstats) > +{ > + zpool->driver->stats(zpool->pool, zstats); > +} > + > +/** > * zpool_get_total_size() - The total size of the pool > * @pool The zpool to check > * > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index f135b1b..3ab0515 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -245,7 +245,7 @@ struct zs_pool { > gfp_t flags; /* allocation flags used when growing pool */ > atomic_long_t pages_allocated; > > - struct zs_pool_stats stats; > + struct zpool_stats stats; we haven't made zsmalloc completely dependent on zpool just yet :-) this won't compile when CONFIG_ZPOOL isn't set. leave this as a zs_pool_stats type and just handle the translation in the zs_zpool_stats function call. > > /* Compact classes */ > struct shrinker shrinker; > @@ -365,6 +365,17 @@ static void zs_zpool_unmap(void *pool, unsigned long handle) > zs_unmap_object(pool, handle); > } > > +static unsigned long zs_zpool_compact(void *pool) > +{ > + return zs_compact(pool); > +} > + > + > +static void zs_zpool_stats(void *pool, struct zpool_stats *stats) > +{ > + zs_pool_stats(pool, stats); > +} > + > static u64 zs_zpool_total_size(void *pool) > { > return zs_get_total_pages(pool) << PAGE_SHIFT; > @@ -380,6 +391,8 @@ static struct zpool_driver zs_zpool_driver = { > .shrink = zs_zpool_shrink, > .map = zs_zpool_map, > .unmap = zs_zpool_unmap, > + .compact = zs_zpool_compact, > + .stats = zs_zpool_stats, > .total_size = zs_zpool_total_size, > }; > > @@ -1789,9 +1802,9 @@ unsigned long zs_compact(struct zs_pool *pool) > } > EXPORT_SYMBOL_GPL(zs_compact); > > -void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats) > +void zs_pool_stats(struct zs_pool *pool, struct zpool_stats *stats) > { > - memcpy(stats, &pool->stats, sizeof(struct zs_pool_stats)); > + memcpy(stats, &pool->stats, sizeof(struct zpool_stats)); > } > EXPORT_SYMBOL_GPL(zs_pool_stats); > > -- > 1.9.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f179.google.com (mail-wi0-f179.google.com [209.85.212.179]) by kanga.kvack.org (Postfix) with ESMTP id 0816A6B0038 for ; Mon, 21 Sep 2015 17:11:02 -0400 (EDT) Received: by wicgb1 with SMTP id gb1so133013434wic.1 for ; Mon, 21 Sep 2015 14:11:01 -0700 (PDT) Received: from mail-wi0-x235.google.com (mail-wi0-x235.google.com. [2a00:1450:400c:c05::235]) by mx.google.com with ESMTPS id n4si19940780wia.64.2015.09.21.14.11.00 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 21 Sep 2015 14:11:00 -0700 (PDT) Received: by wiclk2 with SMTP id lk2so165797556wic.0 for ; Mon, 21 Sep 2015 14:11:00 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20150921041837.GF27729@bbox> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150917013007.GB421@swordfish> <20150921041837.GF27729@bbox> Date: Mon, 21 Sep 2015 23:11:00 +0200 Message-ID: Subject: Re: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator From: Vitaly Wool Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Sergey Senozhatsky , Dan Streetman , Andrew Morton , Sergey Senozhatsky , LKML , linux-mm@kvack.org Hello Minchan, > Sorry, because you wrote up "zram" in the title. > As I said earlier, we need several numbers to investigate. > > First of all, what is culprit of your latency? > It seems you are thinking about compaction. so compaction what? > Frequent scanning? lock collision? or frequent sleeping in compaction > code somewhere? And then why does zbud solve it? If we use zbud for zram, > we lose memory efficiency so there is something to justify it. The data I've got so far strongly suggests that in some use cases (see below) with zsmalloc * there are more allocstalls * memory compaction is triggered more frequently * allocstalls happen more often * page migrations are way more frequent, too. Please also keep in mind that I do not advise you or anyone to use zbud instead of zsmalloc. The point I'm trying to make is that zbud fits my particular case better and I want to be able to choose it in the kernel without hacking it with my private patches. FWIW, given that I am not an author of either, I don't see why anyone would consider me biased. :-) As of the memory efficiency, you seem to be quite comfortable with storing uncompressed pages when they compress to more than 3/4 of a page. I observed ~13% reported ratio increase (3.8x to 4.3x) when I increased max_zpage_size to PAGE_SIZE / 32 * 31. Doesn't look like a fight for every byte to me. > The reason I am asking is I have investigated similar problems > in android and other plaforms and the reason of latency was not zsmalloc > but agressive high-order allocations from subsystems, watermark check > race, deferring of compaction, LMK not working and too much swapout so > it causes to reclaim lots of page cache pages which was main culprit > in my cases. When I checks with perf, compaction stall count is increased, > the time spent in there is not huge so it was not main factor of latency. The main use case where the difference is seen is switching between users on an Android device. It does cause a lot of reclaim, too, as you say, but this is in the nature of zbud that reclaim happens in a more deterministic way and worst-case looks substantially nicer. That said, the standard deviation calculated over 20 iterations of a change-user-multiple-times-test is 2x less for zbud than the one of zsmalloc. I'll post some numbers in the next patch respin so they won't get lost :) ~vitaly -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172]) by kanga.kvack.org (Postfix) with ESMTP id C80AF6B0038 for ; Tue, 22 Sep 2015 06:18:59 -0400 (EDT) Received: by wicfx3 with SMTP id fx3so184532846wic.1 for ; Tue, 22 Sep 2015 03:18:59 -0700 (PDT) Received: from mail-wi0-x22d.google.com (mail-wi0-x22d.google.com. [2a00:1450:400c:c05::22d]) by mx.google.com with ESMTPS id fa17si2858546wid.21.2015.09.22.03.18.58 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 22 Sep 2015 03:18:58 -0700 (PDT) Received: by wicfx3 with SMTP id fx3so16376945wic.0 for ; Tue, 22 Sep 2015 03:18:58 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> Date: Tue, 22 Sep 2015 12:18:58 +0200 Message-ID: Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations From: Vitaly Wool Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Dan Streetman Cc: Andrew Morton , Minchan Kim , Sergey Senozhatsky , linux-kernel , Linux-MM , Seth Jennings Hi Dan, On Mon, Sep 21, 2015 at 6:17 PM, Dan Streetman wrote: > Please make sure to cc Seth also, he's the owner of zbud. Sure :) >> @@ -514,8 +552,17 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) >> return -EINVAL; >> } >> for (i = 0; i < retries; i++) { >> - zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); >> - list_del(&zhdr->lru); >> + page = list_tail_entry(&pool->lru, struct page, lru); >> + zhdr = page_address(page); >> + list_del(&page->lru); >> + /* Uncompressed zbud page? just run eviction and free it */ >> + if (page->flags & PG_uncompressed) { >> + page->flags &= ~PG_uncompressed; >> + spin_unlock(&pool->lock); >> + pool->ops->evict(pool, encode_handle(zhdr, FULL)); >> + __free_page(page); >> + return 0; > > again, don't be redundant. change the function to handle full-sized > pages, don't repeat the function in an if() block for a special case. Well, this case is a little tricky. How to process a zbud page in zbud_reclaim_page() is defined basing on the assumption there is a zhdr at the beginning of the page. What can be done here IMV is either of the following: * add a constant magic number to zhdr and check for it, if the check fails, it is a type FULL page * add a CRC field to zhdr, if CRC check over assumed zhdr fails, it is a type FULL page * use a field from struct page to indicate its type The last option still looks better to me. ~vitaly -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ob0-f177.google.com (mail-ob0-f177.google.com [209.85.214.177]) by kanga.kvack.org (Postfix) with ESMTP id 4B0946B0253 for ; Tue, 22 Sep 2015 11:35:13 -0400 (EDT) Received: by obbda8 with SMTP id da8so10563999obb.1 for ; Tue, 22 Sep 2015 08:35:13 -0700 (PDT) Received: from mail-pa0-x236.google.com (mail-pa0-x236.google.com. [2607:f8b0:400e:c03::236]) by mx.google.com with ESMTPS id u193si1416366oif.28.2015.09.22.08.35.12 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 22 Sep 2015 08:35:12 -0700 (PDT) Received: by pacfv12 with SMTP id fv12so12475341pac.2 for ; Tue, 22 Sep 2015 08:35:11 -0700 (PDT) Date: Wed, 23 Sep 2015 00:36:40 +0900 From: Minchan Kim Subject: Re: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator Message-ID: <20150922153640.GA14817@bbox> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150917013007.GB421@swordfish> <20150921041837.GF27729@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Vitaly Wool Cc: Sergey Senozhatsky , Dan Streetman , Andrew Morton , Sergey Senozhatsky , LKML , linux-mm@kvack.org Hi Vitaly, On Mon, Sep 21, 2015 at 11:11:00PM +0200, Vitaly Wool wrote: > Hello Minchan, > > > Sorry, because you wrote up "zram" in the title. > > As I said earlier, we need several numbers to investigate. > > > > First of all, what is culprit of your latency? > > It seems you are thinking about compaction. so compaction what? > > Frequent scanning? lock collision? or frequent sleeping in compaction > > code somewhere? And then why does zbud solve it? If we use zbud for zram, > > we lose memory efficiency so there is something to justify it. > > The data I've got so far strongly suggests that in some use cases (see > below) with zsmalloc > * there are more allocstalls > * memory compaction is triggered more frequently > * allocstalls happen more often > * page migrations are way more frequent, too. > > Please also keep in mind that I do not advise you or anyone to use > zbud instead of zsmalloc. The point I'm trying to make is that zbud > fits my particular case better and I want to be able to choose it in > the kernel without hacking it with my private patches. I understand your goal well. ;-) But, please understand my goal which is to find fundamental reason why zbud removes latency. You gave some compaction-related stats but it is just one of result, not the cause. I guess you could find another stats as well as compaction stats which affect your workload. Once you find them all, please investigate what is major factor for your latency among them. Then, we should think over what is best solution for it and if zbud is best to remove the cause, yes, why not. I can merge it into zram. IOW, I should maintain zram so I need to know when,where,how to use zbud with zram is useful so that I can guide it to zram users and you should *justify* the overhead to me. Overhead means I should maintain two allocators for zram from now on. It means when I want to add some feature for zsmalloc, I should take care of zbud and I should watch zbud patches, too which could be very painful and starting point of diverge for zram. Compared to zsmalloc, zsmalloc packs lots of compressed objects into a page while zbud just stores two objects so if there are different life time objects in a page, zsmalloc may make higher fragmentation but zbud is not a good choice for memory efficiency either so my concern starts from here. For solving such problem, we added compaction into recent zram to reduce waste memory space so it should solve internal fragment problem. Other problem we don't solve now is external fragmentation which is related to compaction stats you are seeing now. Although you are seeing mitigation with zbud, it would be still problem if you begin to use more memory for zbud. One of example, a few years ago, some guys tried to support zbud page migration. If external fragmentation is really problem in here, we should proivde a feature VM can migrate zsmalloc page and it was alomost done as I told you previous thread and I think it is really way to go. Even, we are trying to add zlib which is better compress ratio algorithm to reduce working memory size so without the feature, the problem would be more severe. So, I am thinking now we should enhance it rather than hiding a problem by merging zbud. > FWIW, given that I am not an author of either, I don't see why anyone > would consider me biased. :-) > > As of the memory efficiency, you seem to be quite comfortable with > storing uncompressed pages when they compress to more than 3/4 of a > page. I observed ~13% reported ratio increase (3.8x to 4.3x) when I > increased max_zpage_size to PAGE_SIZE / 32 * 31. Doesn't look like a > fight for every byte to me. Thanks for the report. It could be another patch. :) > > > The reason I am asking is I have investigated similar problems > > in android and other plaforms and the reason of latency was not zsmalloc > > but agressive high-order allocations from subsystems, watermark check > > race, deferring of compaction, LMK not working and too much swapout so > > it causes to reclaim lots of page cache pages which was main culprit > > in my cases. When I checks with perf, compaction stall count is increased, > > the time spent in there is not huge so it was not main factor of latency. > > The main use case where the difference is seen is switching between > users on an Android device. It does cause a lot of reclaim, too, as > you say, but this is in the nature of zbud that reclaim happens in a > more deterministic way and worst-case looks substantially nicer. That Interesting! Why is reclaim more deterministic with zbud? That's really one of part what I want with data. > said, the standard deviation calculated over 20 iterations of a > change-user-multiple-times-test is 2x less for zbud than the one of > zsmalloc. One thing I can guess is a page could be freed easily if just two objects in a page are freed by munmap or kill. IOW, we could remove pinned page easily so we could get higher-order page easily. However, it would be different once zbud's memory usgae is higher as I mentioned. As well, we lose memory efficieny significantly for zram. :( IMO, more fundamentatal solution is to support VM-aware compaction of zsmalloc/zbud rather than hiding a problem with zbud. Thanks. > > I'll post some numbers in the next patch respin so they won't get lost :) > > ~vitaly -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f181.google.com (mail-qk0-f181.google.com [209.85.220.181]) by kanga.kvack.org (Postfix) with ESMTP id 6E5566B0254 for ; Tue, 22 Sep 2015 12:49:28 -0400 (EDT) Received: by qkfq186 with SMTP id q186so6466398qkf.1 for ; Tue, 22 Sep 2015 09:49:28 -0700 (PDT) Received: from mail-qk0-x235.google.com (mail-qk0-x235.google.com. [2607:f8b0:400d:c09::235]) by mx.google.com with ESMTPS id 143si2331326qhy.11.2015.09.22.09.49.27 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 22 Sep 2015 09:49:27 -0700 (PDT) Received: by qkcf65 with SMTP id f65so6464626qkc.3 for ; Tue, 22 Sep 2015 09:49:27 -0700 (PDT) Subject: Re: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150917013007.GB421@swordfish> <20150921041837.GF27729@bbox> <20150922153640.GA14817@bbox> From: Austin S Hemmelgarn Message-ID: <56018695.5030700@gmail.com> Date: Tue, 22 Sep 2015 12:49:25 -0400 MIME-Version: 1.0 In-Reply-To: <20150922153640.GA14817@bbox> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms060901050207080706030800" Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim , Vitaly Wool Cc: Sergey Senozhatsky , Dan Streetman , Andrew Morton , Sergey Senozhatsky , LKML , linux-mm@kvack.org This is a cryptographically signed message in MIME format. --------------ms060901050207080706030800 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: quoted-printable On 2015-09-22 11:36, Minchan Kim wrote: > Hi Vitaly, > > On Mon, Sep 21, 2015 at 11:11:00PM +0200, Vitaly Wool wrote: >> Hello Minchan, >> >>> Sorry, because you wrote up "zram" in the title. >>> As I said earlier, we need several numbers to investigate. >>> >>> First of all, what is culprit of your latency? >>> It seems you are thinking about compaction. so compaction what? >>> Frequent scanning? lock collision? or frequent sleeping in compaction= >>> code somewhere? And then why does zbud solve it? If we use zbud for z= ram, >>> we lose memory efficiency so there is something to justify it. >> >> The data I've got so far strongly suggests that in some use cases (see= >> below) with zsmalloc >> * there are more allocstalls >> * memory compaction is triggered more frequently >> * allocstalls happen more often >> * page migrations are way more frequent, too. >> >> Please also keep in mind that I do not advise you or anyone to use >> zbud instead of zsmalloc. The point I'm trying to make is that zbud >> fits my particular case better and I want to be able to choose it in >> the kernel without hacking it with my private patches. > > I understand your goal well. ;-) But, please understand my goal which > is to find fundamental reason why zbud removes latency. > > You gave some compaction-related stats but it is just one of result, > not the cause. I guess you could find another stats as well as compacti= on > stats which affect your workload. Once you find them all, please > investigate what is major factor for your latency among them. > Then, we should think over what is best solution for it and if zbud is > best to remove the cause, yes, why not. I can merge it into zram. > > IOW, I should maintain zram so I need to know when,where,how to use zbu= d > with zram is useful so that I can guide it to zram users and you should= > *justify* the overhead to me. Overhead means I should maintain two allo= cators > for zram from now on. It means when I want to add some feature for zsma= lloc, > I should take care of zbud and I should watch zbud patches, too which c= ould > be very painful and starting point of diverge for zram. > > Compared to zsmalloc, zsmalloc packs lots of compressed objects into > a page while zbud just stores two objects so if there are different > life time objects in a page, zsmalloc may make higher fragmentation > but zbud is not a good choice for memory efficiency either so my concer= n > starts from here. > > For solving such problem, we added compaction into recent zram to > reduce waste memory space so it should solve internal fragment problem.= > Other problem we don't solve now is external fragmentation which > is related to compaction stats you are seeing now. > Although you are seeing mitigation with zbud, it would be still problem= > if you begin to use more memory for zbud. One of example, a few years > ago, some guys tried to support zbud page migration. > > If external fragmentation is really problem in here, we should proivde > a feature VM can migrate zsmalloc page and it was alomost done as I tol= d > you previous thread and I think it is really way to go. > > Even, we are trying to add zlib which is better compress ratio algorith= m > to reduce working memory size so without the feature, the problem would= be > more severe. > > So, I am thinking now we should enhance it rather than hiding a problem= > by merging zbud. You know,from my perspective, most of the above argument 'against' zbud=20 being supported really provides no actual reason to not support zbud=20 without reading between the lines (which as far as I can tell, is 'I=20 just don't want to support it'), it just points out all the differences=20 between zsmalloc and zbud. They are different API's, they have=20 different semantics, they are intended for different workloads and use=20 cases, and this will always be the case. To be entirely honest, the=20 more complicated you make zsmalloc in an attempt to obviate the need to=20 support zbud, the more attractive zbud looks as far as I'm concerned. > >> FWIW, given that I am not an author of either, I don't see why anyone >> would consider me biased. :-) >> >> As of the memory efficiency, you seem to be quite comfortable with >> storing uncompressed pages when they compress to more than 3/4 of a >> page. I observed ~13% reported ratio increase (3.8x to 4.3x) when I >> increased max_zpage_size to PAGE_SIZE / 32 * 31. Doesn't look like a >> fight for every byte to me. > > Thanks for the report. It could be another patch. :) > >> >>> The reason I am asking is I have investigated similar problems >>> in android and other plaforms and the reason of latency was not zsmal= loc >>> but agressive high-order allocations from subsystems, watermark check= >>> race, deferring of compaction, LMK not working and too much swapout s= o >>> it causes to reclaim lots of page cache pages which was main culprit >>> in my cases. When I checks with perf, compaction stall count is incre= ased, >>> the time spent in there is not huge so it was not main factor of late= ncy. >> >> The main use case where the difference is seen is switching between >> users on an Android device. It does cause a lot of reclaim, too, as >> you say, but this is in the nature of zbud that reclaim happens in a >> more deterministic way and worst-case looks substantially nicer. That > > Interesting! > Why is reclaim more deterministic with zbud? > That's really one of part what I want with data. I'm pretty sure this is due to the fact that zbud stores at most 2=20 compressed pages in a given page, combined with fewer conditionals in=20 the reclaim path. > > >> said, the standard deviation calculated over 20 iterations of a >> change-user-multiple-times-test is 2x less for zbud than the one of >> zsmalloc. > > One thing I can guess is a page could be freed easily if just two objec= ts > in a page are freed by munmap or kill. IOW, we could remove pinned page= > easily so we could get higher-order page easily. > > However, it would be different once zbud's memory usgae is higher > as I mentioned. As well, we lose memory efficieny significantly for zra= m. :( Not everyone who uses zram cares hugely about memory efficiency, I for=20 one would rather have a more deterministic compression ratio (which zbud = achieves) than slightly better memory efficiencyg. > > IMO, more fundamentatal solution is to support VM-aware compaction of > zsmalloc/zbud rather than hiding a problem with zbud. --------------ms060901050207080706030800 Content-Type: application/pkcs7-signature; name="smime.p7s" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="smime.p7s" Content-Description: S/MIME Cryptographic Signature MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC Brgwgga0MIIEnKADAgECAgMRLfgwDQYJKoZIhvcNAQENBQAweTEQMA4GA1UEChMHUm9vdCBD QTEeMBwGA1UECxMVaHR0cDovL3d3dy5jYWNlcnQub3JnMSIwIAYDVQQDExlDQSBDZXJ0IFNp Z25pbmcgQXV0aG9yaXR5MSEwHwYJKoZIhvcNAQkBFhJzdXBwb3J0QGNhY2VydC5vcmcwHhcN MTUwOTIxMTEzNTEzWhcNMTYwMzE5MTEzNTEzWjBjMRgwFgYDVQQDEw9DQWNlcnQgV29UIFVz ZXIxIzAhBgkqhkiG9w0BCQEWFGFoZmVycm9pbjdAZ21haWwuY29tMSIwIAYJKoZIhvcNAQkB FhNhaGVtbWVsZ0BvaGlvZ3QuY29tMIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEA nQ/81tq0QBQi5w316VsVNfjg6kVVIMx760TuwA1MUaNQgQ3NyUl+UyFtjhpkNwwChjgAqfGd LIMTHAdObcwGfzO5uI2o1a8MHVQna8FRsU3QGouysIOGQlX8jFYXMKPEdnlt0GoQcd+BtESr pivbGWUEkPs1CwM6WOrs+09bAJP3qzKIr0VxervFrzrC5Dg9Rf18r9WXHElBuWHg4GYHNJ2V Ab8iKc10h44FnqxZK8RDN8ts/xX93i9bIBmHnFfyNRfiOUtNVeynJbf6kVtdHP+CRBkXCNRZ qyQT7gbTGD24P92PS2UTmDfplSBcWcTn65o3xWfesbf02jF6PL3BCrVnDRI4RgYxG3zFBJuG qvMoEODLhHKSXPAyQhwZINigZNdw5G1NqjXqUw+lIqdQvoPijK9J3eijiakh9u2bjWOMaleI SMRR6XsdM2O5qun1dqOrCgRkM0XSNtBQ2JjY7CycIx+qifJWsRaYWZz0aQU4ZrtAI7gVhO9h pyNaAGjvm7PdjEBiXq57e4QcgpwzvNlv8pG1c/hnt0msfDWNJtl3b6elhQ2Pz4w/QnWifZ8E BrFEmjeeJa2dqjE3giPVWrsH+lOvQQONsYJOuVb8b0zao4vrWeGmW2q2e3pdv0Axzm/60cJQ haZUv8+JdX9ZzqxOm5w5eUQSclt84u+D+hsCAwEAAaOCAVkwggFVMAwGA1UdEwEB/wQCMAAw VgYJYIZIAYb4QgENBEkWR1RvIGdldCB5b3VyIG93biBjZXJ0aWZpY2F0ZSBmb3IgRlJFRSBo ZWFkIG92ZXIgdG8gaHR0cDovL3d3dy5DQWNlcnQub3JnMA4GA1UdDwEB/wQEAwIDqDBABgNV HSUEOTA3BggrBgEFBQcDBAYIKwYBBQUHAwIGCisGAQQBgjcKAwQGCisGAQQBgjcKAwMGCWCG SAGG+EIEATAyBggrBgEFBQcBAQQmMCQwIgYIKwYBBQUHMAGGFmh0dHA6Ly9vY3NwLmNhY2Vy dC5vcmcwMQYDVR0fBCowKDAmoCSgIoYgaHR0cDovL2NybC5jYWNlcnQub3JnL3Jldm9rZS5j cmwwNAYDVR0RBC0wK4EUYWhmZXJyb2luN0BnbWFpbC5jb22BE2FoZW1tZWxnQG9oaW9ndC5j b20wDQYJKoZIhvcNAQENBQADggIBADMnxtSLiIunh/TQcjnRdf63yf2D8jMtYUm4yDoCF++J jCXbPQBGrpCEHztlNSGIkF3PH7ohKZvlqF4XePWxpY9dkr/pNyCF1PRkwxUURqvuHXbu8Lwn 8D3U2HeOEU3KmrfEo65DcbanJCMTTW7+mU9lZICPP7ZA9/zB+L0Gm1UNFZ6AU50N/86vjQfY WgkCd6dZD4rQ5y8L+d/lRbJW7ZGEQw1bSFVTRpkxxDTOwXH4/GpQfnfqTAtQuJ1CsKT12e+H NSD/RUWGTr289dA3P4nunBlz7qfvKamxPymHeBEUcuICKkL9/OZrnuYnGROFwcdvfjGE5iLB kjp/ttrY4aaVW5EsLASNgiRmA6mbgEAMlw3RwVx0sVelbiIAJg9Twzk4Ct6U9uBKiJ8S0sS2 8RCSyTmCRhJs0vvva5W9QUFGmp5kyFQEoSfBRJlbZfGX2ehI2Hi3U2/PMUm2ONuQG1E+a0AP u7I0NJc/Xil7rqR0gdbfkbWp0a+8dAvaM6J00aIcNo+HkcQkUgtfrw+C2Oyl3q8IjivGXZqT 5UdGUb2KujLjqjG91Dun3/RJ/qgQlotH7WkVBs7YJVTCxfkdN36rToPcnMYOI30FWa0Q06gn F6gUv9/mo6riv3A5bem/BdbgaJoPnWQD9D8wSyci9G4LKC+HQAMdLmGoeZfpJzKHMYIE0TCC BM0CAQEwgYAweTEQMA4GA1UEChMHUm9vdCBDQTEeMBwGA1UECxMVaHR0cDovL3d3dy5jYWNl cnQub3JnMSIwIAYDVQQDExlDQSBDZXJ0IFNpZ25pbmcgQXV0aG9yaXR5MSEwHwYJKoZIhvcN AQkBFhJzdXBwb3J0QGNhY2VydC5vcmcCAxEt+DANBglghkgBZQMEAgMFAKCCAiEwGAYJKoZI hvcNAQkDMQsGCSqGSIb3DQEHATAcBgkqhkiG9w0BCQUxDxcNMTUwOTIyMTY0OTI1WjBPBgkq hkiG9w0BCQQxQgRAiKSeEZLoPU0Q+pPLTDRwl98jslYonZBLIrVLmDemX6KcI9bzbAcHhjjg ClSuDKDS1Qi/eItjwP51diIWtNZfZjBsBgkqhkiG9w0BCQ8xXzBdMAsGCWCGSAFlAwQBKjAL BglghkgBZQMEAQIwCgYIKoZIhvcNAwcwDgYIKoZIhvcNAwICAgCAMA0GCCqGSIb3DQMCAgFA MAcGBSsOAwIHMA0GCCqGSIb3DQMCAgEoMIGRBgkrBgEEAYI3EAQxgYMwgYAweTEQMA4GA1UE ChMHUm9vdCBDQTEeMBwGA1UECxMVaHR0cDovL3d3dy5jYWNlcnQub3JnMSIwIAYDVQQDExlD QSBDZXJ0IFNpZ25pbmcgQXV0aG9yaXR5MSEwHwYJKoZIhvcNAQkBFhJzdXBwb3J0QGNhY2Vy dC5vcmcCAxEt+DCBkwYLKoZIhvcNAQkQAgsxgYOggYAweTEQMA4GA1UEChMHUm9vdCBDQTEe MBwGA1UECxMVaHR0cDovL3d3dy5jYWNlcnQub3JnMSIwIAYDVQQDExlDQSBDZXJ0IFNpZ25p bmcgQXV0aG9yaXR5MSEwHwYJKoZIhvcNAQkBFhJzdXBwb3J0QGNhY2VydC5vcmcCAxEt+DAN BgkqhkiG9w0BAQEFAASCAgBt0pwRCf0HrUjPw6wP05hXX1tRtm/ocpc6C/KQGpMBraG30Okc 7qRL6pWefV9DA6AeNIlGS6Rm9mmQCp56Q8cB+BqL/eSM63tjBX3hQgc762NPlwH2GmUVmtJi CKKz5pbWK9sAuq0juKMr3HnPvNDHU6AOW4k9CgsEGqGNThlEDq5shPUm7O4Vflto2U9ykKZZ xqOtJ2Y8kqEZ2yXHz1r4Uq0fwgHohND9hgV/bC0sGqscIKtALYxebaaOpem+O1Y7MwZnRefb J/6u7H8LeRaQuvACwbsRpEC2DL1qdKUlefMSRKT6S3tG0R/EUMw84UBTv8WKBYYhVqmFJEI6 jHxRdeqaByEcZpa1MMRuObCIVl+9kTuKbKPy1cqFvC8U79zXu08x6TUMLlXVvlrVwCtV7ip1 zFdp9AbjsJoTvSvHSEL9StzGaeACsbA/aVhv5VCnUlpor/d0A0TkZ9QcGH+sTAhDw9Q9eI70 dyYG29g85FcJ9ObCnmwXVquhtDbjkZZoiYMdHG71LCfs3aNNmCibuv75AxPvNiGLxYu3Nrnj wKW2qcskwabBKbWX6PxvSZ4juv7gDUvFCfEW+WIK+Rx0iY9hMp67PlGHLp1xsy8yLuJW5RbW VwVG1A40PPXHaVDF2Ll4Iti+FUkpjYl+uXcAjrrNZ1N2OAJtFBz+FhLbQgAAAAAAAA== --------------ms060901050207080706030800-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f169.google.com (mail-ig0-f169.google.com [209.85.213.169]) by kanga.kvack.org (Postfix) with ESMTP id 382316B0255 for ; Tue, 22 Sep 2015 13:09:50 -0400 (EDT) Received: by igcpb10 with SMTP id pb10so103657465igc.1 for ; Tue, 22 Sep 2015 10:09:50 -0700 (PDT) Received: from mail-ig0-x22a.google.com (mail-ig0-x22a.google.com. [2607:f8b0:4001:c05::22a]) by mx.google.com with ESMTPS id y3si13665018igl.47.2015.09.22.10.09.49 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 22 Sep 2015 10:09:49 -0700 (PDT) Received: by igxx6 with SMTP id x6so13483330igx.1 for ; Tue, 22 Sep 2015 10:09:49 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> From: Dan Streetman Date: Tue, 22 Sep 2015 13:09:09 -0400 Message-ID: Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Vitaly Wool Cc: Andrew Morton , Minchan Kim , Sergey Senozhatsky , linux-kernel , Linux-MM , Seth Jennings On Tue, Sep 22, 2015 at 6:18 AM, Vitaly Wool wrote: > Hi Dan, > > On Mon, Sep 21, 2015 at 6:17 PM, Dan Streetman wrote: >> Please make sure to cc Seth also, he's the owner of zbud. > > Sure :) > > >>> @@ -514,8 +552,17 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) >>> return -EINVAL; >>> } >>> for (i = 0; i < retries; i++) { >>> - zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); >>> - list_del(&zhdr->lru); >>> + page = list_tail_entry(&pool->lru, struct page, lru); >>> + zhdr = page_address(page); >>> + list_del(&page->lru); >>> + /* Uncompressed zbud page? just run eviction and free it */ >>> + if (page->flags & PG_uncompressed) { >>> + page->flags &= ~PG_uncompressed; >>> + spin_unlock(&pool->lock); >>> + pool->ops->evict(pool, encode_handle(zhdr, FULL)); >>> + __free_page(page); >>> + return 0; >> >> again, don't be redundant. change the function to handle full-sized >> pages, don't repeat the function in an if() block for a special case. > > Well, this case is a little tricky. How to process a zbud page in > zbud_reclaim_page() is defined basing on the assumption there is a > zhdr at the beginning of the page. What can be done here IMV is either > of the following: aha, this is why you used the page flag. > * add a constant magic number to zhdr and check for it, if the check > fails, it is a type FULL page > * add a CRC field to zhdr, if CRC check over assumed zhdr fails, it > is a type FULL page neither of those; you can't guarantee the magic number won't naturally occur in a page. > * use a field from struct page to indicate its type sure, you could use a pre-existing field from struct page, like the page->private field. > > The last option still looks better to me. > > ~vitaly -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752519AbbIPLt1 (ORCPT ); Wed, 16 Sep 2015 07:49:27 -0400 Received: from mail-la0-f44.google.com ([209.85.215.44]:36004 "EHLO mail-la0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752074AbbIPLt0 (ORCPT ); Wed, 16 Sep 2015 07:49:26 -0400 Date: Wed, 16 Sep 2015 13:48:57 +0200 From: Vitaly Wool To: ddstreet@ieee.org, akpm@linux-foundation.org, minchan@kernel.org, sergey.senozhatsky@gmail.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator Message-Id: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, as a follow-up to my previous patchset, I decided to first prepare zbud/zpool related patches and then have some testing rounds and performance measurements for zram running over both, and come up with improved/verified zram/zpool patches then. So for now, here comes the zbud/zpool part. -- Vitaly Wool From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753151AbbIPLux (ORCPT ); Wed, 16 Sep 2015 07:50:53 -0400 Received: from mail-la0-f42.google.com ([209.85.215.42]:35118 "EHLO mail-la0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753082AbbIPLuv (ORCPT ); Wed, 16 Sep 2015 07:50:51 -0400 Date: Wed, 16 Sep 2015 13:50:48 +0200 From: Vitaly Wool To: ddstreet@ieee.org, akpm@linux-foundation.org, minchan@kernel.org, sergey.senozhatsky@gmail.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 1/2] zbud: allow PAGE_SIZE allocations Message-Id: <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> In-Reply-To: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org For zram to be able to use zbud via the common zpool API, allocations of size PAGE_SIZE should be allowed by zpool. zbud uses the beginning of an allocated page for its internal structure but it is not a problem as long as we keep track of such special pages using a newly introduced page flag. To be able to keep track of zbud pages in any case, struct page's lru pointer will be used for zbud page lists instead of the one that used to be part of the aforementioned internal structure. Signed-off-by: Vitaly Wool --- include/linux/page-flags.h | 3 ++ mm/zbud.c | 71 ++++++++++++++++++++++++++++++++++++++-------- 2 files changed, 62 insertions(+), 12 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 416509e..dd47cf0 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -134,6 +134,9 @@ enum pageflags { /* SLOB */ PG_slob_free = PG_private, + + /* ZBUD */ + PG_uncompressed = PG_owner_priv_1, }; #ifndef __GENERATING_BOUNDS_H diff --git a/mm/zbud.c b/mm/zbud.c index fa48bcdf..ee8b5d6 100644 --- a/mm/zbud.c +++ b/mm/zbud.c @@ -107,13 +107,11 @@ struct zbud_pool { * struct zbud_header - zbud page metadata occupying the first chunk of each * zbud page. * @buddy: links the zbud page into the unbuddied/buddied lists in the pool - * @lru: links the zbud page into the lru list in the pool * @first_chunks: the size of the first buddy in chunks, 0 if free * @last_chunks: the size of the last buddy in chunks, 0 if free */ struct zbud_header { struct list_head buddy; - struct list_head lru; unsigned int first_chunks; unsigned int last_chunks; bool under_reclaim; @@ -221,6 +219,7 @@ MODULE_ALIAS("zpool-zbud"); *****************/ /* Just to make the code easier to read */ enum buddy { + FULL, FIRST, LAST }; @@ -241,7 +240,7 @@ static struct zbud_header *init_zbud_page(struct page *page) zhdr->first_chunks = 0; zhdr->last_chunks = 0; INIT_LIST_HEAD(&zhdr->buddy); - INIT_LIST_HEAD(&zhdr->lru); + INIT_LIST_HEAD(&page->lru); zhdr->under_reclaim = 0; return zhdr; } @@ -267,11 +266,18 @@ static unsigned long encode_handle(struct zbud_header *zhdr, enum buddy bud) * over the zbud header in the first chunk. */ handle = (unsigned long)zhdr; - if (bud == FIRST) + switch (bud) { + case FIRST: /* skip over zbud header */ handle += ZHDR_SIZE_ALIGNED; - else /* bud == LAST */ + break; + case LAST: handle += PAGE_SIZE - (zhdr->last_chunks << CHUNK_SHIFT); + break; + case FULL: + default: + break; + } return handle; } @@ -360,6 +366,24 @@ int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp, if (!size || (gfp & __GFP_HIGHMEM)) return -EINVAL; + + if (size == PAGE_SIZE) { + /* + * This is a special case. The page will be allocated + * and used to store uncompressed data + */ + page = alloc_page(gfp); + if (!page) + return -ENOMEM; + spin_lock(&pool->lock); + pool->pages_nr++; + INIT_LIST_HEAD(&page->lru); + page->flags |= PG_uncompressed; + list_add(&page->lru, &pool->lru); + spin_unlock(&pool->lock); + *handle = encode_handle(page_address(page), FULL); + return 0; + } if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE) return -ENOSPC; chunks = size_to_chunks(size); @@ -372,6 +396,7 @@ int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp, zhdr = list_first_entry(&pool->unbuddied[i], struct zbud_header, buddy); list_del(&zhdr->buddy); + page = virt_to_page(zhdr); if (zhdr->first_chunks == 0) bud = FIRST; else @@ -406,9 +431,9 @@ found: } /* Add/move zbud page to beginning of LRU */ - if (!list_empty(&zhdr->lru)) - list_del(&zhdr->lru); - list_add(&zhdr->lru, &pool->lru); + if (!list_empty(&page->lru)) + list_del(&page->lru); + list_add(&page->lru, &pool->lru); *handle = encode_handle(zhdr, bud); spin_unlock(&pool->lock); @@ -430,9 +455,21 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) { struct zbud_header *zhdr; int freechunks; + struct page *page; spin_lock(&pool->lock); zhdr = handle_to_zbud_header(handle); + page = virt_to_page(zhdr); + + /* If it was an uncompressed full page, just free it */ + if (page->flags & PG_uncompressed) { + page->flags &= ~PG_uncompressed; + list_del(&page->lru); + __free_page(page); + pool->pages_nr--; + spin_unlock(&pool->lock); + return; + } /* If first buddy, handle will be page aligned */ if ((handle - ZHDR_SIZE_ALIGNED) & ~PAGE_MASK) @@ -451,7 +488,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { /* zbud page is empty, free */ - list_del(&zhdr->lru); + list_del(&page->lru); free_zbud_page(zhdr); pool->pages_nr--; } else { @@ -505,6 +542,7 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) { int i, ret, freechunks; struct zbud_header *zhdr; + struct page *page; unsigned long first_handle = 0, last_handle = 0; spin_lock(&pool->lock); @@ -514,8 +552,17 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) return -EINVAL; } for (i = 0; i < retries; i++) { - zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); - list_del(&zhdr->lru); + page = list_tail_entry(&pool->lru, struct page, lru); + zhdr = page_address(page); + list_del(&page->lru); + /* Uncompressed zbud page? just run eviction and free it */ + if (page->flags & PG_uncompressed) { + page->flags &= ~PG_uncompressed; + spin_unlock(&pool->lock); + pool->ops->evict(pool, encode_handle(zhdr, FULL)); + __free_page(page); + return 0; + } list_del(&zhdr->buddy); /* Protect zbud page against free */ zhdr->under_reclaim = true; @@ -565,7 +612,7 @@ next: } /* add to beginning of LRU */ - list_add(&zhdr->lru, &pool->lru); + list_add(&page->lru, &pool->lru); } spin_unlock(&pool->lock); return -EAGAIN; -- 1.9.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752632AbbIPLxw (ORCPT ); Wed, 16 Sep 2015 07:53:52 -0400 Received: from mail-lb0-f178.google.com ([209.85.217.178]:36671 "EHLO mail-lb0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752031AbbIPLxv (ORCPT ); Wed, 16 Sep 2015 07:53:51 -0400 Date: Wed, 16 Sep 2015 13:53:47 +0200 From: Vitaly Wool To: ddstreet@ieee.org, akpm@linux-foundation.org, minchan@kernel.org, sergey.senozhatsky@gmail.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 2/2] zpool/zsmalloc/zbud: align on interfaces Message-Id: <20150916135347.149f550d51751c58c8b7ca96@gmail.com> In-Reply-To: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org As a preparation step for zram to be able to use common zpool API, there has to be some alignment done on it. This patch adds functions that correspond to zsmalloc-specific API to the common zpool API and takes care of the callbacks that have to be introduced, too. This version of the patch uses simplified 'compact' API/callbacks. Signed-off-by: Vitaly Wool --- drivers/block/zram/zram_drv.c | 4 ++-- include/linux/zpool.h | 14 ++++++++++++++ include/linux/zsmalloc.h | 8 ++------ mm/zbud.c | 12 ++++++++++++ mm/zpool.c | 21 +++++++++++++++++++++ mm/zsmalloc.c | 19 ++++++++++++++++--- 6 files changed, 67 insertions(+), 11 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 9fa15bb..a0a786e 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -426,12 +426,12 @@ static ssize_t mm_stat_show(struct device *dev, struct device_attribute *attr, char *buf) { struct zram *zram = dev_to_zram(dev); - struct zs_pool_stats pool_stats; + struct zpool_stats pool_stats; u64 orig_size, mem_used = 0; long max_used; ssize_t ret; - memset(&pool_stats, 0x00, sizeof(struct zs_pool_stats)); + memset(&pool_stats, 0x00, sizeof(struct zpool_stats)); down_read(&zram->init_lock); if (init_done(zram)) { diff --git a/include/linux/zpool.h b/include/linux/zpool.h index 42f8ec9..a2a5bc4 100644 --- a/include/linux/zpool.h +++ b/include/linux/zpool.h @@ -17,6 +17,11 @@ struct zpool_ops { int (*evict)(struct zpool *pool, unsigned long handle); }; +struct zpool_stats { + /* How many pages were migrated (freed) */ + unsigned long pages_compacted; +}; + /* * Control how a handle is mapped. It will be ignored if the * implementation does not support it. Its use is optional. @@ -58,6 +63,10 @@ void *zpool_map_handle(struct zpool *pool, unsigned long handle, void zpool_unmap_handle(struct zpool *pool, unsigned long handle); +unsigned long zpool_compact(struct zpool *pool); + +void zpool_stats(struct zpool *pool, struct zpool_stats *zstats); + u64 zpool_get_total_size(struct zpool *pool); @@ -72,6 +81,8 @@ u64 zpool_get_total_size(struct zpool *pool); * @shrink: shrink the pool. * @map: map a handle. * @unmap: unmap a handle. + * @compact: try to run compaction for the pool + * @stats: get statistics for the pool * @total_size: get total size of a pool. * * This is created by a zpool implementation and registered @@ -98,6 +109,9 @@ struct zpool_driver { enum zpool_mapmode mm); void (*unmap)(void *pool, unsigned long handle); + unsigned long (*compact)(void *pool); + void (*stats)(void *pool, struct zpool_stats *stats); + u64 (*total_size)(void *pool); }; diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h index 6398dfa..5aee1c7 100644 --- a/include/linux/zsmalloc.h +++ b/include/linux/zsmalloc.h @@ -15,6 +15,7 @@ #define _ZS_MALLOC_H_ #include +#include /* * zsmalloc mapping modes @@ -34,11 +35,6 @@ enum zs_mapmode { */ }; -struct zs_pool_stats { - /* How many pages were migrated (freed) */ - unsigned long pages_compacted; -}; - struct zs_pool; struct zs_pool *zs_create_pool(char *name, gfp_t flags); @@ -54,5 +50,5 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle); unsigned long zs_get_total_pages(struct zs_pool *pool); unsigned long zs_compact(struct zs_pool *pool); -void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats); +void zs_pool_stats(struct zs_pool *pool, struct zpool_stats *stats); #endif diff --git a/mm/zbud.c b/mm/zbud.c index ee8b5d6..23cfc76 100644 --- a/mm/zbud.c +++ b/mm/zbud.c @@ -193,6 +193,16 @@ static void zbud_zpool_unmap(void *pool, unsigned long handle) zbud_unmap(pool, handle); } +static unsigned long zbud_zpool_compact(void *pool) +{ + return 0; +} + +static void zbud_zpool_stats(void *pool, struct zpool_stats *stats) +{ + /* no-op */ +} + static u64 zbud_zpool_total_size(void *pool) { return zbud_get_pool_size(pool) * PAGE_SIZE; @@ -208,6 +218,8 @@ static struct zpool_driver zbud_zpool_driver = { .shrink = zbud_zpool_shrink, .map = zbud_zpool_map, .unmap = zbud_zpool_unmap, + .compact = zbud_zpool_compact, + .stats = zbud_zpool_stats, .total_size = zbud_zpool_total_size, }; diff --git a/mm/zpool.c b/mm/zpool.c index 8f670d3..d454f37 100644 --- a/mm/zpool.c +++ b/mm/zpool.c @@ -341,6 +341,27 @@ void zpool_unmap_handle(struct zpool *zpool, unsigned long handle) } /** + * zpool_compact() - try to run compaction over zpool + * @pool The zpool to compact + * + * Returns: the number of migrated pages (0 if nothing happened) + */ +unsigned long zpool_compact(struct zpool *zpool) +{ + return zpool->driver->compact(zpool->pool); +} + +/** + * zpool_stats() - obtain zpool statistics + * @pool The zpool to get statistics for + * @zstats stats to fill in + */ +void zpool_stats(struct zpool *zpool, struct zpool_stats *zstats) +{ + zpool->driver->stats(zpool->pool, zstats); +} + +/** * zpool_get_total_size() - The total size of the pool * @pool The zpool to check * diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index f135b1b..3ab0515 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -245,7 +245,7 @@ struct zs_pool { gfp_t flags; /* allocation flags used when growing pool */ atomic_long_t pages_allocated; - struct zs_pool_stats stats; + struct zpool_stats stats; /* Compact classes */ struct shrinker shrinker; @@ -365,6 +365,17 @@ static void zs_zpool_unmap(void *pool, unsigned long handle) zs_unmap_object(pool, handle); } +static unsigned long zs_zpool_compact(void *pool) +{ + return zs_compact(pool); +} + + +static void zs_zpool_stats(void *pool, struct zpool_stats *stats) +{ + zs_pool_stats(pool, stats); +} + static u64 zs_zpool_total_size(void *pool) { return zs_get_total_pages(pool) << PAGE_SHIFT; @@ -380,6 +391,8 @@ static struct zpool_driver zs_zpool_driver = { .shrink = zs_zpool_shrink, .map = zs_zpool_map, .unmap = zs_zpool_unmap, + .compact = zs_zpool_compact, + .stats = zs_zpool_stats, .total_size = zs_zpool_total_size, }; @@ -1789,9 +1802,9 @@ unsigned long zs_compact(struct zs_pool *pool) } EXPORT_SYMBOL_GPL(zs_compact); -void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats) +void zs_pool_stats(struct zs_pool *pool, struct zpool_stats *stats) { - memcpy(stats, &pool->stats, sizeof(struct zs_pool_stats)); + memcpy(stats, &pool->stats, sizeof(struct zpool_stats)); } EXPORT_SYMBOL_GPL(zs_pool_stats); -- 1.9.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753167AbbIQB3Y (ORCPT ); Wed, 16 Sep 2015 21:29:24 -0400 Received: from mail-pa0-f48.google.com ([209.85.220.48]:33804 "EHLO mail-pa0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752825AbbIQB3X (ORCPT ); Wed, 16 Sep 2015 21:29:23 -0400 Date: Thu, 17 Sep 2015 10:30:07 +0900 From: Sergey Senozhatsky To: Vitaly Wool Cc: ddstreet@ieee.org, akpm@linux-foundation.org, minchan@kernel.org, sergey.senozhatsky@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator Message-ID: <20150917013007.GB421@swordfish> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On (09/16/15 13:48), Vitaly Wool wrote: > as a follow-up to my previous patchset, I decided to first prepare > zbud/zpool related patches and then have some testing rounds and > performance measurements for zram running over both, and come up > with improved/verified zram/zpool patches then. Hi, just a side note, I'm afraid this is not how it works. numbers go first, to justify the patch set. -ss > > So for now, here comes the zbud/zpool part. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751691AbbIQK0O (ORCPT ); Thu, 17 Sep 2015 06:26:14 -0400 Received: from mail-qg0-f50.google.com ([209.85.192.50]:35306 "EHLO mail-qg0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751160AbbIQK0N (ORCPT ); Thu, 17 Sep 2015 06:26:13 -0400 MIME-Version: 1.0 In-Reply-To: <20150917013007.GB421@swordfish> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150917013007.GB421@swordfish> Date: Thu, 17 Sep 2015 12:26:12 +0200 Message-ID: Subject: Re: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator From: Vitaly Wool To: Sergey Senozhatsky Cc: Dan Streetman , Andrew Morton , Minchan Kim , Sergey Senozhatsky , LKML , linux-mm@kvack.org Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 17, 2015 at 1:30 AM, Sergey Senozhatsky wrote: > > just a side note, > I'm afraid this is not how it works. numbers go first, to justify > the patch set. > These patches are extension/alignment patches, why would anyone need to justify that? But just to help you understand where I am coming from, here are some numbers: zsmalloc zbud kswapd_low_wmark_hit_quickly 4513 5696 kswapd_high_wmark_hit_quickly 861 902 allocstall 2236 1122 pgmigrate_success 78229 31244 compact_stall 1172 634 compact_fail 194 95 compact_success 464 210 These are results from an Android device having run 3 'monkey' tests each 20 minutes, with user switch to guest and back in between. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752983AbbIQNA6 (ORCPT ); Thu, 17 Sep 2015 09:00:58 -0400 Received: from mx2.suse.de ([195.135.220.15]:47532 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752255AbbIQNA5 (ORCPT ); Thu, 17 Sep 2015 09:00:57 -0400 Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations To: Vitaly Wool , ddstreet@ieee.org, akpm@linux-foundation.org, minchan@kernel.org, sergey.senozhatsky@gmail.com References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org From: Vlastimil Babka Message-ID: <55FAB985.9060705@suse.cz> Date: Thu, 17 Sep 2015 15:00:53 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1 MIME-Version: 1.0 In-Reply-To: <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/16/2015 01:50 PM, Vitaly Wool wrote: > For zram to be able to use zbud via the common zpool API, > allocations of size PAGE_SIZE should be allowed by zpool. > zbud uses the beginning of an allocated page for its internal > structure but it is not a problem as long as we keep track of > such special pages using a newly introduced page flag. > To be able to keep track of zbud pages in any case, struct page's > lru pointer will be used for zbud page lists instead of the one > that used to be part of the aforementioned internal structure. I don't know how zsmalloc handles uncompressible PAGE_SIZE allocations, but I wouldn't expect it to be any more clever than this? So why duplicate the functionality in zswap and zbud? This could be handled e.g. at the zpool level? Or maybe just in zram, as IIRC in zswap (frontswap) it's valid just to reject a page and it goes to physical swap. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752197AbbIRID2 (ORCPT ); Fri, 18 Sep 2015 04:03:28 -0400 Received: from mail-qg0-f44.google.com ([209.85.192.44]:33998 "EHLO mail-qg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751068AbbIRIDX (ORCPT ); Fri, 18 Sep 2015 04:03:23 -0400 MIME-Version: 1.0 In-Reply-To: <55FAB985.9060705@suse.cz> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> <55FAB985.9060705@suse.cz> Date: Fri, 18 Sep 2015 10:03:22 +0200 Message-ID: Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations From: Vitaly Wool To: Dan Streetman , Andrew Morton , Minchan Kim , Sergey Senozhatsky Cc: LKML , linux-mm@kvack.org Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > I don't know how zsmalloc handles uncompressible PAGE_SIZE allocations, but > I wouldn't expect it to be any more clever than this? So why duplicate the > functionality in zswap and zbud? This could be handled e.g. at the zpool > level? Or maybe just in zram, as IIRC in zswap (frontswap) it's valid just > to reject a page and it goes to physical swap. >>From what I can see, zsmalloc just allocates pages and puts them into a linked list. Using the beginning of a page for storing an internal struct is zbud-specific, and so is this patch. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751856AbbIUERM (ORCPT ); Mon, 21 Sep 2015 00:17:12 -0400 Received: from mail-pa0-f53.google.com ([209.85.220.53]:34719 "EHLO mail-pa0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751093AbbIUERK (ORCPT ); Mon, 21 Sep 2015 00:17:10 -0400 Date: Mon, 21 Sep 2015 13:18:37 +0900 From: Minchan Kim To: Vitaly Wool Cc: Sergey Senozhatsky , Dan Streetman , Andrew Morton , Sergey Senozhatsky , LKML , linux-mm@kvack.org Subject: Re: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator Message-ID: <20150921041837.GF27729@bbox> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150917013007.GB421@swordfish> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Vitaly, On Thu, Sep 17, 2015 at 12:26:12PM +0200, Vitaly Wool wrote: > On Thu, Sep 17, 2015 at 1:30 AM, Sergey Senozhatsky > wrote: > > > > > just a side note, > > I'm afraid this is not how it works. numbers go first, to justify > > the patch set. I totally agree Sergey's opinion. > > > > These patches are extension/alignment patches, why would anyone need > to justify that? Sorry, because you wrote up "zram" in the title. As I said earlier, we need several numbers to investigate. First of all, what is culprit of your latency? It seems you are thinking about compaction. so compaction what? Frequent scanning? lock collision? or frequent sleeping in compaction code somewhere? And then why does zbud solve it? If we use zbud for zram, we lose memory efficiency so there is something to justify it. The reason I am asking is I have investigated similar problems in android and other plaforms and the reason of latency was not zsmalloc but agressive high-order allocations from subsystems, watermark check race, deferring of compaction, LMK not working and too much swapout so it causes to reclaim lots of page cache pages which was main culprit in my cases. When I checks with perf, compaction stall count is increased, the time spent in there is not huge so it was not main factor of latency. Your problem might be differnt with me so convincing us, you should give us real data and investigation story. Thanks. > > But just to help you understand where I am coming from, here are some numbers: > zsmalloc zbud > kswapd_low_wmark_hit_quickly 4513 5696 > kswapd_high_wmark_hit_quickly 861 902 > allocstall 2236 1122 > pgmigrate_success 78229 31244 > compact_stall 1172 634 > compact_fail 194 95 > compact_success 464 210 > > These are results from an Android device having run 3 'monkey' tests > each 20 minutes, with user switch to guest and back in between. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756608AbbIUP1t (ORCPT ); Mon, 21 Sep 2015 11:27:49 -0400 Received: from mail-ig0-f182.google.com ([209.85.213.182]:37116 "EHLO mail-ig0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752824AbbIUP1s (ORCPT ); Mon, 21 Sep 2015 11:27:48 -0400 MIME-Version: 1.0 In-Reply-To: References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> <55FAB985.9060705@suse.cz> From: Dan Streetman Date: Mon, 21 Sep 2015 11:27:08 -0400 X-Google-Sender-Auth: EprYTaew1yV7QwVgOmd6iSIks8A Message-ID: Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations To: Vitaly Wool Cc: Andrew Morton , Minchan Kim , Sergey Senozhatsky , LKML , Linux-MM Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Sep 18, 2015 at 4:03 AM, Vitaly Wool wrote: >> I don't know how zsmalloc handles uncompressible PAGE_SIZE allocations, but >> I wouldn't expect it to be any more clever than this? So why duplicate the >> functionality in zswap and zbud? This could be handled e.g. at the zpool >> level? Or maybe just in zram, as IIRC in zswap (frontswap) it's valid just >> to reject a page and it goes to physical swap. zpool doesn't actually store pages anywhere; zbud and zsmalloc do the storing, and they do it in completely different ways. Storing an uncompressed page has to be done in zbud and zsmalloc, not zpool. And zram can't do it either; zram doesn't actually store pages either, it relies on zsmalloc to store all its pages. > > From what I can see, zsmalloc just allocates pages and puts them into > a linked list. Using the beginning of a page for storing an internal > struct is zbud-specific, and so is this patch. zsmalloc has size "classes" that allow storing "objects" of a specific size range (i.e. the last class size + 1, up to class size). the max size class is: #define ZS_MAX_ALLOC_SIZE PAGE_SIZE so zsmalloc is able to store "objects" up to, and including, PAGE_SIZE. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757257AbbIUQSE (ORCPT ); Mon, 21 Sep 2015 12:18:04 -0400 Received: from mail-ig0-f173.google.com ([209.85.213.173]:34205 "EHLO mail-ig0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756659AbbIUQSC (ORCPT ); Mon, 21 Sep 2015 12:18:02 -0400 MIME-Version: 1.0 In-Reply-To: <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> From: Dan Streetman Date: Mon, 21 Sep 2015 12:17:21 -0400 X-Google-Sender-Auth: 3EGyuW38a4kkk0SvHwRCLI3Ejsk Message-ID: Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations To: Vitaly Wool Cc: Andrew Morton , Minchan Kim , Sergey Senozhatsky , linux-kernel , Linux-MM , Seth Jennings Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Please make sure to cc Seth also, he's the owner of zbud. On Wed, Sep 16, 2015 at 7:50 AM, Vitaly Wool wrote: > For zram to be able to use zbud via the common zpool API, > allocations of size PAGE_SIZE should be allowed by zpool. > zbud uses the beginning of an allocated page for its internal > structure but it is not a problem as long as we keep track of > such special pages using a newly introduced page flag. > To be able to keep track of zbud pages in any case, struct page's > lru pointer will be used for zbud page lists instead of the one > that used to be part of the aforementioned internal structure. > > Signed-off-by: Vitaly Wool > --- > include/linux/page-flags.h | 3 ++ > mm/zbud.c | 71 ++++++++++++++++++++++++++++++++++++++-------- > 2 files changed, 62 insertions(+), 12 deletions(-) > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > index 416509e..dd47cf0 100644 > --- a/include/linux/page-flags.h > +++ b/include/linux/page-flags.h > @@ -134,6 +134,9 @@ enum pageflags { > > /* SLOB */ > PG_slob_free = PG_private, > + > + /* ZBUD */ > + PG_uncompressed = PG_owner_priv_1, you don't need a new page flag. and there's 0% chance it would be accepted even if you did. > }; > > #ifndef __GENERATING_BOUNDS_H > diff --git a/mm/zbud.c b/mm/zbud.c > index fa48bcdf..ee8b5d6 100644 > --- a/mm/zbud.c > +++ b/mm/zbud.c > @@ -107,13 +107,11 @@ struct zbud_pool { > * struct zbud_header - zbud page metadata occupying the first chunk of each > * zbud page. > * @buddy: links the zbud page into the unbuddied/buddied lists in the pool > - * @lru: links the zbud page into the lru list in the pool > * @first_chunks: the size of the first buddy in chunks, 0 if free > * @last_chunks: the size of the last buddy in chunks, 0 if free > */ > struct zbud_header { > struct list_head buddy; > - struct list_head lru; > unsigned int first_chunks; > unsigned int last_chunks; > bool under_reclaim; > @@ -221,6 +219,7 @@ MODULE_ALIAS("zpool-zbud"); > *****************/ > /* Just to make the code easier to read */ > enum buddy { > + FULL, > FIRST, > LAST > }; > @@ -241,7 +240,7 @@ static struct zbud_header *init_zbud_page(struct page *page) > zhdr->first_chunks = 0; > zhdr->last_chunks = 0; > INIT_LIST_HEAD(&zhdr->buddy); > - INIT_LIST_HEAD(&zhdr->lru); > + INIT_LIST_HEAD(&page->lru); > zhdr->under_reclaim = 0; > return zhdr; > } > @@ -267,11 +266,18 @@ static unsigned long encode_handle(struct zbud_header *zhdr, enum buddy bud) > * over the zbud header in the first chunk. > */ > handle = (unsigned long)zhdr; > - if (bud == FIRST) > + switch (bud) { > + case FIRST: > /* skip over zbud header */ > handle += ZHDR_SIZE_ALIGNED; > - else /* bud == LAST */ > + break; > + case LAST: > handle += PAGE_SIZE - (zhdr->last_chunks << CHUNK_SHIFT); > + break; > + case FULL: > + default: Hmm, while it should be ok to treat a default (invalid) bud value as a full page (assuming the caller treats it as such), you should at least add a pr_warn() or pr_warn_ratelimited(), or maybe a WARN_ON() or WARN_ON_ONCE(). the default case should never happen, and a warning should be printed if it does. > + break; > + } > return handle; > } > > @@ -360,6 +366,24 @@ int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp, > > if (!size || (gfp & __GFP_HIGHMEM)) > return -EINVAL; > + > + if (size == PAGE_SIZE) { > + /* > + * This is a special case. The page will be allocated > + * and used to store uncompressed data > + */ well you shouldn't special case only PAGE_SIZE. If zram increases its max_zpage_size to a value > (PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE) then those compressed pages will fail to store here. I think it would be better to change the size check to a simple if (size > PAGE_SIZE) return -ENOSPC; then use the existing > if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE) to store the object (which is either a large compressed page, or an uncompressed page) into the full zbud page. And don't duplicate everything the function does inside an if (), just update the function to handle PAGE_SIZE storage. > + page = alloc_page(gfp); > + if (!page) > + return -ENOMEM; > + spin_lock(&pool->lock); > + pool->pages_nr++; > + INIT_LIST_HEAD(&page->lru); > + page->flags |= PG_uncompressed; > + list_add(&page->lru, &pool->lru); > + spin_unlock(&pool->lock); > + *handle = encode_handle(page_address(page), FULL); > + return 0; > + } > if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE) > return -ENOSPC; > chunks = size_to_chunks(size); > @@ -372,6 +396,7 @@ int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp, > zhdr = list_first_entry(&pool->unbuddied[i], > struct zbud_header, buddy); > list_del(&zhdr->buddy); > + page = virt_to_page(zhdr); > if (zhdr->first_chunks == 0) > bud = FIRST; > else > @@ -406,9 +431,9 @@ found: > } > > /* Add/move zbud page to beginning of LRU */ > - if (!list_empty(&zhdr->lru)) > - list_del(&zhdr->lru); > - list_add(&zhdr->lru, &pool->lru); > + if (!list_empty(&page->lru)) > + list_del(&page->lru); > + list_add(&page->lru, &pool->lru); > > *handle = encode_handle(zhdr, bud); > spin_unlock(&pool->lock); > @@ -430,9 +455,21 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) > { > struct zbud_header *zhdr; > int freechunks; > + struct page *page; > > spin_lock(&pool->lock); > zhdr = handle_to_zbud_header(handle); > + page = virt_to_page(zhdr); > + > + /* If it was an uncompressed full page, just free it */ > + if (page->flags & PG_uncompressed) { > + page->flags &= ~PG_uncompressed; > + list_del(&page->lru); > + __free_page(page); > + pool->pages_nr--; > + spin_unlock(&pool->lock); > + return; > + } don't repeat this function inside an if() block. update the actual function to handle the new case. and you don't need a new page flag. you have 3 distinct cases: switch (handle & ~PAGE_MASK) { case 0: /* this is a full-sized page */ case ZHDR_SIZE_ALIGNED: /* this is the first buddy */ default: /* this is the last buddy */ } > > /* If first buddy, handle will be page aligned */ > if ((handle - ZHDR_SIZE_ALIGNED) & ~PAGE_MASK) > @@ -451,7 +488,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) > > if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { > /* zbud page is empty, free */ > - list_del(&zhdr->lru); > + list_del(&page->lru); > free_zbud_page(zhdr); > pool->pages_nr--; > } else { > @@ -505,6 +542,7 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) > { > int i, ret, freechunks; > struct zbud_header *zhdr; > + struct page *page; > unsigned long first_handle = 0, last_handle = 0; > > spin_lock(&pool->lock); > @@ -514,8 +552,17 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) > return -EINVAL; > } > for (i = 0; i < retries; i++) { > - zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); > - list_del(&zhdr->lru); > + page = list_tail_entry(&pool->lru, struct page, lru); > + zhdr = page_address(page); > + list_del(&page->lru); > + /* Uncompressed zbud page? just run eviction and free it */ > + if (page->flags & PG_uncompressed) { > + page->flags &= ~PG_uncompressed; > + spin_unlock(&pool->lock); > + pool->ops->evict(pool, encode_handle(zhdr, FULL)); > + __free_page(page); > + return 0; again, don't be redundant. change the function to handle full-sized pages, don't repeat the function in an if() block for a special case. > + } > list_del(&zhdr->buddy); > /* Protect zbud page against free */ > zhdr->under_reclaim = true; > @@ -565,7 +612,7 @@ next: > } > > /* add to beginning of LRU */ > - list_add(&zhdr->lru, &pool->lru); > + list_add(&page->lru, &pool->lru); > } > spin_unlock(&pool->lock); > return -EAGAIN; > -- > 1.9.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756660AbbIURPw (ORCPT ); Mon, 21 Sep 2015 13:15:52 -0400 Received: from mail-io0-f171.google.com ([209.85.223.171]:34073 "EHLO mail-io0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752857AbbIURPu (ORCPT ); Mon, 21 Sep 2015 13:15:50 -0400 MIME-Version: 1.0 In-Reply-To: <20150916135347.149f550d51751c58c8b7ca96@gmail.com> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135347.149f550d51751c58c8b7ca96@gmail.com> From: Dan Streetman Date: Mon, 21 Sep 2015 13:15:10 -0400 X-Google-Sender-Auth: OspHsdimAVGLaEQWqtZE09Mvoco Message-ID: Subject: Re: [PATCH 2/2] zpool/zsmalloc/zbud: align on interfaces To: Vitaly Wool Cc: Andrew Morton , Minchan Kim , Sergey Senozhatsky , linux-kernel , Linux-MM , Seth Jennings Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 16, 2015 at 7:53 AM, Vitaly Wool wrote: > As a preparation step for zram to be able to use common zpool API, > there has to be some alignment done on it. This patch adds > functions that correspond to zsmalloc-specific API to the common > zpool API and takes care of the callbacks that have to be > introduced, too. > > This version of the patch uses simplified 'compact' API/callbacks. > > Signed-off-by: Vitaly Wool > --- > drivers/block/zram/zram_drv.c | 4 ++-- > include/linux/zpool.h | 14 ++++++++++++++ > include/linux/zsmalloc.h | 8 ++------ > mm/zbud.c | 12 ++++++++++++ > mm/zpool.c | 21 +++++++++++++++++++++ > mm/zsmalloc.c | 19 ++++++++++++++++--- > 6 files changed, 67 insertions(+), 11 deletions(-) > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > index 9fa15bb..a0a786e 100644 > --- a/drivers/block/zram/zram_drv.c > +++ b/drivers/block/zram/zram_drv.c > @@ -426,12 +426,12 @@ static ssize_t mm_stat_show(struct device *dev, > struct device_attribute *attr, char *buf) > { > struct zram *zram = dev_to_zram(dev); > - struct zs_pool_stats pool_stats; > + struct zpool_stats pool_stats; > u64 orig_size, mem_used = 0; > long max_used; > ssize_t ret; > > - memset(&pool_stats, 0x00, sizeof(struct zs_pool_stats)); > + memset(&pool_stats, 0x00, sizeof(struct zpool_stats)); > > down_read(&zram->init_lock); > if (init_done(zram)) { you don't need to change zram in this patch. save this part for the patch to update zram to use zpool. > diff --git a/include/linux/zpool.h b/include/linux/zpool.h > index 42f8ec9..a2a5bc4 100644 > --- a/include/linux/zpool.h > +++ b/include/linux/zpool.h > @@ -17,6 +17,11 @@ struct zpool_ops { > int (*evict)(struct zpool *pool, unsigned long handle); > }; > > +struct zpool_stats { > + /* How many pages were migrated (freed) */ > + unsigned long pages_compacted; > +}; > + > /* > * Control how a handle is mapped. It will be ignored if the > * implementation does not support it. Its use is optional. > @@ -58,6 +63,10 @@ void *zpool_map_handle(struct zpool *pool, unsigned long handle, > > void zpool_unmap_handle(struct zpool *pool, unsigned long handle); > > +unsigned long zpool_compact(struct zpool *pool); > + > +void zpool_stats(struct zpool *pool, struct zpool_stats *zstats); > + > u64 zpool_get_total_size(struct zpool *pool); > > > @@ -72,6 +81,8 @@ u64 zpool_get_total_size(struct zpool *pool); > * @shrink: shrink the pool. > * @map: map a handle. > * @unmap: unmap a handle. > + * @compact: try to run compaction for the pool > + * @stats: get statistics for the pool > * @total_size: get total size of a pool. > * > * This is created by a zpool implementation and registered > @@ -98,6 +109,9 @@ struct zpool_driver { > enum zpool_mapmode mm); > void (*unmap)(void *pool, unsigned long handle); > > + unsigned long (*compact)(void *pool); > + void (*stats)(void *pool, struct zpool_stats *stats); > + > u64 (*total_size)(void *pool); > }; > > diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h > index 6398dfa..5aee1c7 100644 > --- a/include/linux/zsmalloc.h > +++ b/include/linux/zsmalloc.h > @@ -15,6 +15,7 @@ > #define _ZS_MALLOC_H_ > > #include > +#include > > /* > * zsmalloc mapping modes > @@ -34,11 +35,6 @@ enum zs_mapmode { > */ > }; > > -struct zs_pool_stats { > - /* How many pages were migrated (freed) */ > - unsigned long pages_compacted; > -}; > - > struct zs_pool; > > struct zs_pool *zs_create_pool(char *name, gfp_t flags); > @@ -54,5 +50,5 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle); > unsigned long zs_get_total_pages(struct zs_pool *pool); > unsigned long zs_compact(struct zs_pool *pool); > > -void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats); > +void zs_pool_stats(struct zs_pool *pool, struct zpool_stats *stats); > #endif > diff --git a/mm/zbud.c b/mm/zbud.c > index ee8b5d6..23cfc76 100644 > --- a/mm/zbud.c > +++ b/mm/zbud.c > @@ -193,6 +193,16 @@ static void zbud_zpool_unmap(void *pool, unsigned long handle) > zbud_unmap(pool, handle); > } > > +static unsigned long zbud_zpool_compact(void *pool) > +{ > + return 0; > +} > + > +static void zbud_zpool_stats(void *pool, struct zpool_stats *stats) > +{ > + /* no-op */ > +} > + > static u64 zbud_zpool_total_size(void *pool) > { > return zbud_get_pool_size(pool) * PAGE_SIZE; > @@ -208,6 +218,8 @@ static struct zpool_driver zbud_zpool_driver = { > .shrink = zbud_zpool_shrink, > .map = zbud_zpool_map, > .unmap = zbud_zpool_unmap, > + .compact = zbud_zpool_compact, > + .stats = zbud_zpool_stats, > .total_size = zbud_zpool_total_size, > }; > > diff --git a/mm/zpool.c b/mm/zpool.c > index 8f670d3..d454f37 100644 > --- a/mm/zpool.c > +++ b/mm/zpool.c > @@ -341,6 +341,27 @@ void zpool_unmap_handle(struct zpool *zpool, unsigned long handle) > } > > /** > + * zpool_compact() - try to run compaction over zpool > + * @pool The zpool to compact > + * > + * Returns: the number of migrated pages (0 if nothing happened) don't say "nothing happened", 0 doesn't necessarily mean that. 0 means no pages were compacted. > + */ > +unsigned long zpool_compact(struct zpool *zpool) > +{ > + return zpool->driver->compact(zpool->pool); > +} > + > +/** > + * zpool_stats() - obtain zpool statistics > + * @pool The zpool to get statistics for > + * @zstats stats to fill in this doc needs more. how is this function used? what should the caller expect in the zstats fields before and after the call? for the common zpool interface, it might make more sense to just include a direct function to access the pages_compacted, instead of the indirect stats call. If more stats get added in the future, the call can be changed into a zpool_stats function. > + */ > +void zpool_stats(struct zpool *zpool, struct zpool_stats *zstats) > +{ > + zpool->driver->stats(zpool->pool, zstats); > +} > + > +/** > * zpool_get_total_size() - The total size of the pool > * @pool The zpool to check > * > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index f135b1b..3ab0515 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -245,7 +245,7 @@ struct zs_pool { > gfp_t flags; /* allocation flags used when growing pool */ > atomic_long_t pages_allocated; > > - struct zs_pool_stats stats; > + struct zpool_stats stats; we haven't made zsmalloc completely dependent on zpool just yet :-) this won't compile when CONFIG_ZPOOL isn't set. leave this as a zs_pool_stats type and just handle the translation in the zs_zpool_stats function call. > > /* Compact classes */ > struct shrinker shrinker; > @@ -365,6 +365,17 @@ static void zs_zpool_unmap(void *pool, unsigned long handle) > zs_unmap_object(pool, handle); > } > > +static unsigned long zs_zpool_compact(void *pool) > +{ > + return zs_compact(pool); > +} > + > + > +static void zs_zpool_stats(void *pool, struct zpool_stats *stats) > +{ > + zs_pool_stats(pool, stats); > +} > + > static u64 zs_zpool_total_size(void *pool) > { > return zs_get_total_pages(pool) << PAGE_SHIFT; > @@ -380,6 +391,8 @@ static struct zpool_driver zs_zpool_driver = { > .shrink = zs_zpool_shrink, > .map = zs_zpool_map, > .unmap = zs_zpool_unmap, > + .compact = zs_zpool_compact, > + .stats = zs_zpool_stats, > .total_size = zs_zpool_total_size, > }; > > @@ -1789,9 +1802,9 @@ unsigned long zs_compact(struct zs_pool *pool) > } > EXPORT_SYMBOL_GPL(zs_compact); > > -void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats) > +void zs_pool_stats(struct zs_pool *pool, struct zpool_stats *stats) > { > - memcpy(stats, &pool->stats, sizeof(struct zs_pool_stats)); > + memcpy(stats, &pool->stats, sizeof(struct zpool_stats)); > } > EXPORT_SYMBOL_GPL(zs_pool_stats); > > -- > 1.9.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932352AbbIUVLE (ORCPT ); Mon, 21 Sep 2015 17:11:04 -0400 Received: from mail-wi0-f170.google.com ([209.85.212.170]:33559 "EHLO mail-wi0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756029AbbIUVLB (ORCPT ); Mon, 21 Sep 2015 17:11:01 -0400 MIME-Version: 1.0 In-Reply-To: <20150921041837.GF27729@bbox> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150917013007.GB421@swordfish> <20150921041837.GF27729@bbox> Date: Mon, 21 Sep 2015 23:11:00 +0200 Message-ID: Subject: Re: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator From: Vitaly Wool To: Minchan Kim Cc: Sergey Senozhatsky , Dan Streetman , Andrew Morton , Sergey Senozhatsky , LKML , linux-mm@kvack.org Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Minchan, > Sorry, because you wrote up "zram" in the title. > As I said earlier, we need several numbers to investigate. > > First of all, what is culprit of your latency? > It seems you are thinking about compaction. so compaction what? > Frequent scanning? lock collision? or frequent sleeping in compaction > code somewhere? And then why does zbud solve it? If we use zbud for zram, > we lose memory efficiency so there is something to justify it. The data I've got so far strongly suggests that in some use cases (see below) with zsmalloc * there are more allocstalls * memory compaction is triggered more frequently * allocstalls happen more often * page migrations are way more frequent, too. Please also keep in mind that I do not advise you or anyone to use zbud instead of zsmalloc. The point I'm trying to make is that zbud fits my particular case better and I want to be able to choose it in the kernel without hacking it with my private patches. FWIW, given that I am not an author of either, I don't see why anyone would consider me biased. :-) As of the memory efficiency, you seem to be quite comfortable with storing uncompressed pages when they compress to more than 3/4 of a page. I observed ~13% reported ratio increase (3.8x to 4.3x) when I increased max_zpage_size to PAGE_SIZE / 32 * 31. Doesn't look like a fight for every byte to me. > The reason I am asking is I have investigated similar problems > in android and other plaforms and the reason of latency was not zsmalloc > but agressive high-order allocations from subsystems, watermark check > race, deferring of compaction, LMK not working and too much swapout so > it causes to reclaim lots of page cache pages which was main culprit > in my cases. When I checks with perf, compaction stall count is increased, > the time spent in there is not huge so it was not main factor of latency. The main use case where the difference is seen is switching between users on an Android device. It does cause a lot of reclaim, too, as you say, but this is in the nature of zbud that reclaim happens in a more deterministic way and worst-case looks substantially nicer. That said, the standard deviation calculated over 20 iterations of a change-user-multiple-times-test is 2x less for zbud than the one of zsmalloc. I'll post some numbers in the next patch respin so they won't get lost :) ~vitaly From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757483AbbIVKTB (ORCPT ); Tue, 22 Sep 2015 06:19:01 -0400 Received: from mail-wi0-f175.google.com ([209.85.212.175]:35950 "EHLO mail-wi0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754560AbbIVKS7 (ORCPT ); Tue, 22 Sep 2015 06:18:59 -0400 MIME-Version: 1.0 In-Reply-To: References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> Date: Tue, 22 Sep 2015 12:18:58 +0200 Message-ID: Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations From: Vitaly Wool To: Dan Streetman Cc: Andrew Morton , Minchan Kim , Sergey Senozhatsky , linux-kernel , Linux-MM , Seth Jennings Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Dan, On Mon, Sep 21, 2015 at 6:17 PM, Dan Streetman wrote: > Please make sure to cc Seth also, he's the owner of zbud. Sure :) >> @@ -514,8 +552,17 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) >> return -EINVAL; >> } >> for (i = 0; i < retries; i++) { >> - zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); >> - list_del(&zhdr->lru); >> + page = list_tail_entry(&pool->lru, struct page, lru); >> + zhdr = page_address(page); >> + list_del(&page->lru); >> + /* Uncompressed zbud page? just run eviction and free it */ >> + if (page->flags & PG_uncompressed) { >> + page->flags &= ~PG_uncompressed; >> + spin_unlock(&pool->lock); >> + pool->ops->evict(pool, encode_handle(zhdr, FULL)); >> + __free_page(page); >> + return 0; > > again, don't be redundant. change the function to handle full-sized > pages, don't repeat the function in an if() block for a special case. Well, this case is a little tricky. How to process a zbud page in zbud_reclaim_page() is defined basing on the assumption there is a zhdr at the beginning of the page. What can be done here IMV is either of the following: * add a constant magic number to zhdr and check for it, if the check fails, it is a type FULL page * add a CRC field to zhdr, if CRC check over assumed zhdr fails, it is a type FULL page * use a field from struct page to indicate its type The last option still looks better to me. ~vitaly From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758075AbbIVPfQ (ORCPT ); Tue, 22 Sep 2015 11:35:16 -0400 Received: from mail-pa0-f45.google.com ([209.85.220.45]:35908 "EHLO mail-pa0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933072AbbIVPfM (ORCPT ); Tue, 22 Sep 2015 11:35:12 -0400 Date: Wed, 23 Sep 2015 00:36:40 +0900 From: Minchan Kim To: Vitaly Wool Cc: Sergey Senozhatsky , Dan Streetman , Andrew Morton , Sergey Senozhatsky , LKML , linux-mm@kvack.org Subject: Re: [PATCH 0/2] prepare zbud to be used by zram as underlying allocator Message-ID: <20150922153640.GA14817@bbox> References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150917013007.GB421@swordfish> <20150921041837.GF27729@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Vitaly, On Mon, Sep 21, 2015 at 11:11:00PM +0200, Vitaly Wool wrote: > Hello Minchan, > > > Sorry, because you wrote up "zram" in the title. > > As I said earlier, we need several numbers to investigate. > > > > First of all, what is culprit of your latency? > > It seems you are thinking about compaction. so compaction what? > > Frequent scanning? lock collision? or frequent sleeping in compaction > > code somewhere? And then why does zbud solve it? If we use zbud for zram, > > we lose memory efficiency so there is something to justify it. > > The data I've got so far strongly suggests that in some use cases (see > below) with zsmalloc > * there are more allocstalls > * memory compaction is triggered more frequently > * allocstalls happen more often > * page migrations are way more frequent, too. > > Please also keep in mind that I do not advise you or anyone to use > zbud instead of zsmalloc. The point I'm trying to make is that zbud > fits my particular case better and I want to be able to choose it in > the kernel without hacking it with my private patches. I understand your goal well. ;-) But, please understand my goal which is to find fundamental reason why zbud removes latency. You gave some compaction-related stats but it is just one of result, not the cause. I guess you could find another stats as well as compaction stats which affect your workload. Once you find them all, please investigate what is major factor for your latency among them. Then, we should think over what is best solution for it and if zbud is best to remove the cause, yes, why not. I can merge it into zram. IOW, I should maintain zram so I need to know when,where,how to use zbud with zram is useful so that I can guide it to zram users and you should *justify* the overhead to me. Overhead means I should maintain two allocators for zram from now on. It means when I want to add some feature for zsmalloc, I should take care of zbud and I should watch zbud patches, too which could be very painful and starting point of diverge for zram. Compared to zsmalloc, zsmalloc packs lots of compressed objects into a page while zbud just stores two objects so if there are different life time objects in a page, zsmalloc may make higher fragmentation but zbud is not a good choice for memory efficiency either so my concern starts from here. For solving such problem, we added compaction into recent zram to reduce waste memory space so it should solve internal fragment problem. Other problem we don't solve now is external fragmentation which is related to compaction stats you are seeing now. Although you are seeing mitigation with zbud, it would be still problem if you begin to use more memory for zbud. One of example, a few years ago, some guys tried to support zbud page migration. If external fragmentation is really problem in here, we should proivde a feature VM can migrate zsmalloc page and it was alomost done as I told you previous thread and I think it is really way to go. Even, we are trying to add zlib which is better compress ratio algorithm to reduce working memory size so without the feature, the problem would be more severe. So, I am thinking now we should enhance it rather than hiding a problem by merging zbud. > FWIW, given that I am not an author of either, I don't see why anyone > would consider me biased. :-) > > As of the memory efficiency, you seem to be quite comfortable with > storing uncompressed pages when they compress to more than 3/4 of a > page. I observed ~13% reported ratio increase (3.8x to 4.3x) when I > increased max_zpage_size to PAGE_SIZE / 32 * 31. Doesn't look like a > fight for every byte to me. Thanks for the report. It could be another patch. :) > > > The reason I am asking is I have investigated similar problems > > in android and other plaforms and the reason of latency was not zsmalloc > > but agressive high-order allocations from subsystems, watermark check > > race, deferring of compaction, LMK not working and too much swapout so > > it causes to reclaim lots of page cache pages which was main culprit > > in my cases. When I checks with perf, compaction stall count is increased, > > the time spent in there is not huge so it was not main factor of latency. > > The main use case where the difference is seen is switching between > users on an Android device. It does cause a lot of reclaim, too, as > you say, but this is in the nature of zbud that reclaim happens in a > more deterministic way and worst-case looks substantially nicer. That Interesting! Why is reclaim more deterministic with zbud? That's really one of part what I want with data. > said, the standard deviation calculated over 20 iterations of a > change-user-multiple-times-test is 2x less for zbud than the one of > zsmalloc. One thing I can guess is a page could be freed easily if just two objects in a page are freed by munmap or kill. IOW, we could remove pinned page easily so we could get higher-order page easily. However, it would be different once zbud's memory usgae is higher as I mentioned. As well, we lose memory efficieny significantly for zram. :( IMO, more fundamentatal solution is to support VM-aware compaction of zsmalloc/zbud rather than hiding a problem with zbud. Thanks. > > I'll post some numbers in the next patch respin so they won't get lost :) > > ~vitaly From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758193AbbIVRJv (ORCPT ); Tue, 22 Sep 2015 13:09:51 -0400 Received: from mail-ig0-f170.google.com ([209.85.213.170]:32935 "EHLO mail-ig0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758074AbbIVRJt (ORCPT ); Tue, 22 Sep 2015 13:09:49 -0400 MIME-Version: 1.0 In-Reply-To: References: <20150916134857.e4a71f601a1f68cfa16cb361@gmail.com> <20150916135048.fbd50fac5e91244ab9731b82@gmail.com> From: Dan Streetman Date: Tue, 22 Sep 2015 13:09:09 -0400 X-Google-Sender-Auth: E1R8CCkePfU6gYQyb0RnG31-JyY Message-ID: Subject: Re: [PATCH 1/2] zbud: allow PAGE_SIZE allocations To: Vitaly Wool Cc: Andrew Morton , Minchan Kim , Sergey Senozhatsky , linux-kernel , Linux-MM , Seth Jennings Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Sep 22, 2015 at 6:18 AM, Vitaly Wool wrote: > Hi Dan, > > On Mon, Sep 21, 2015 at 6:17 PM, Dan Streetman wrote: >> Please make sure to cc Seth also, he's the owner of zbud. > > Sure :) > > >>> @@ -514,8 +552,17 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) >>> return -EINVAL; >>> } >>> for (i = 0; i < retries; i++) { >>> - zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); >>> - list_del(&zhdr->lru); >>> + page = list_tail_entry(&pool->lru, struct page, lru); >>> + zhdr = page_address(page); >>> + list_del(&page->lru); >>> + /* Uncompressed zbud page? just run eviction and free it */ >>> + if (page->flags & PG_uncompressed) { >>> + page->flags &= ~PG_uncompressed; >>> + spin_unlock(&pool->lock); >>> + pool->ops->evict(pool, encode_handle(zhdr, FULL)); >>> + __free_page(page); >>> + return 0; >> >> again, don't be redundant. change the function to handle full-sized >> pages, don't repeat the function in an if() block for a special case. > > Well, this case is a little tricky. How to process a zbud page in > zbud_reclaim_page() is defined basing on the assumption there is a > zhdr at the beginning of the page. What can be done here IMV is either > of the following: aha, this is why you used the page flag. > * add a constant magic number to zhdr and check for it, if the check > fails, it is a type FULL page > * add a CRC field to zhdr, if CRC check over assumed zhdr fails, it > is a type FULL page neither of those; you can't guarantee the magic number won't naturally occur in a page. > * use a field from struct page to indicate its type sure, you could use a pre-existing field from struct page, like the page->private field. > > The last option still looks better to me. > > ~vitaly