Re: [PATCH v4 11/12] zsmalloc: page migration support

From: Minchan Kim <minchan@kernel.org>
To: Chulmin Kim <cmlaika.kim@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: Re: [PATCH v4 11/12] zsmalloc: page migration support
Date: Tue, 3 May 2016 10:43:05 +0900
Message-ID: <20160503014305.GC2272@bbox>
In-Reply-To: <20160503004359.GA2272@bbox>

On Tue, May 03, 2016 at 09:43:59AM +0900, Minchan Kim wrote:
> Good morning, Chulmin
> 
> On Tue, May 03, 2016 at 08:33:16AM +0900, Chulmin Kim wrote:
> > Hello, Minchan!
> > 
> > > On 2016-04-27 16:48, Minchan Kim wrote:
> > >This patch introduces a run-time migration feature for zspage.
> > >
> > >For migration, the VM uses the page.lru field, so it is better not to
> > >use the page.next field for our own purpose. To that end, we first get
> > >the offset of the first object in the page via runtime calculation
> > >instead of page->index, so we can use page->index as the link for page
> > >chaining. In the case of a huge object, it stores the handle rather
> > >than the page chain. To identify a huge object, we use the
> > >PG_owner_priv_1 flag.
> > >
> > >For migration, it supports three functions:
> > >
> > >* zs_page_isolate
> > >
> > >It isolates the zspage containing the subpage the VM wants to migrate
> > >from its class, so nobody can allocate a new object from the zspage.
> > >This happens only on the first isolation of a subpage of the zspage;
> > >further isolation of other subpages does not remove the zspage from
> > >the class list again.
> > >
> > >* zs_page_migrate
> > >
> > >First of all, it holds the write side of zspage->lock to prevent
> > >migration of other subpages in the zspage. Then it locks all objects
> > >in the page the VM wants to migrate. All objects in the page must be
> > >locked because of a race between zs_map_object and zs_page_migrate.
> > >
> > >zs_map_object				zs_page_migrate
> > >
> > >pin_tag(handle)
> > >obj = handle_to_obj(handle)
> > >obj_to_location(obj, &page, &obj_idx);
> > >
> > >					write_lock(&zspage->lock)
> > >					if (!trypin_tag(handle))
> > >						goto unpin_object
> > >
> > >zspage = get_zspage(page);
> > >read_lock(&zspage->lock);
> > >
> > >If zs_page_migrate doesn't do trypin_tag, zs_map_object's page can
> > >be stale, leading to a crash.
> > >
> > >If it locks all of the objects successfully, it copies the content
> > >from the old page to a new one and finally creates a new page chain
> > >with the new page. If it's the last isolated page in the zspage, it
> > >puts the zspage back to the class.
> > >
> > >* zs_page_putback
> > >
> > >It returns the isolated zspage to the right fullness_group list if it
> > >fails to migrate a page.
> > >
> > >Lastly, this patch introduces asynchronous zspage free. We need it
> > >because clearing PG_movable requires page_lock, but unfortunately the
> > >zs_free path must be atomic, so the approach is to try to grab
> > >page_lock with preemption disabled. If it gets page_lock on all of the
> > >pages successfully, it can free the zspage in that context. Otherwise,
> > >it queues the free request and frees the zspage via a workqueue in
> > >process context.
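
[Illustration, not part of the patch] The try-lock-or-defer approach
described in the paragraph above can be sketched in userspace C. This is
only a minimal sketch under assumptions: struct zspage_sketch, NR_PAGES,
trylock_all() and free_or_defer() are hypothetical names, a C11
atomic_flag stands in for the per-page lock, and a boolean stands in for
queueing the request on a workqueue.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_PAGES 3	/* hypothetical number of subpages in one zspage */

/* Hypothetical stand-in for a zspage whose subpages each carry a lock. */
struct zspage_sketch {
	atomic_flag page_lock[NR_PAGES];
	bool queued;	/* set when the free was handed to a worker */
	bool freed;	/* set when the free ran in the caller's context */
};

static void zspage_sketch_init(struct zspage_sketch *z)
{
	for (int i = 0; i < NR_PAGES; i++)
		atomic_flag_clear(&z->page_lock[i]);
	z->queued = false;
	z->freed = false;
}

/* Try to take every page lock without sleeping; back out on the first miss. */
static bool trylock_all(struct zspage_sketch *z)
{
	for (int i = 0; i < NR_PAGES; i++) {
		if (atomic_flag_test_and_set(&z->page_lock[i])) {
			/* Lock i is held elsewhere: release locks 0..i-1. */
			while (--i >= 0)
				atomic_flag_clear(&z->page_lock[i]);
			return false;
		}
	}
	return true;
}

/* Free immediately if every lock was taken, otherwise defer the free. */
static void free_or_defer(struct zspage_sketch *z)
{
	if (!trylock_all(z)) {
		z->queued = true;	/* stands in for queueing work */
		return;
	}
	z->freed = true;		/* stands in for the real teardown */
	for (int i = 0; i < NR_PAGES; i++)
		atomic_flag_clear(&z->page_lock[i]);
}
```

The point of the shape is that the caller never blocks: either it owns
every lock and can finish the free on the spot, or it holds none of them
and leaves the work to a context that is allowed to sleep.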
> > >
> > >Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > >Signed-off-by: Minchan Kim <minchan@kernel.org>
> > >---
> > >  include/uapi/linux/magic.h |   1 +
> > >  mm/zsmalloc.c              | 552 +++++++++++++++++++++++++++++++++++++++------
> > >  2 files changed, 487 insertions(+), 66 deletions(-)
> > >
> > >diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
> > >index e1fbe72c39c0..93b1affe4801 100644
> > >--- a/include/uapi/linux/magic.h
> > >+++ b/include/uapi/linux/magic.h
> > >@@ -79,5 +79,6 @@
> > >  #define NSFS_MAGIC		0x6e736673
> > >  #define BPF_FS_MAGIC		0xcafe4a11
> > >  #define BALLOON_KVM_MAGIC	0x13661366
> > >+#define ZSMALLOC_MAGIC		0x58295829
> > >
> > >  #endif /* __LINUX_MAGIC_H__ */
> > >diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > >index 8d82e44c4644..042793015ecf 100644
> > >--- a/mm/zsmalloc.c
> > >+++ b/mm/zsmalloc.c
> > >@@ -17,15 +17,14 @@
> > >   *
> > >   * Usage of struct page fields:
> > >   *	page->private: points to zspage
> > >- *	page->index: offset of the first object starting in this page.
> > >- *		For the first page, this is always 0, so we use this field
> > >- *		to store handle for huge object.
> > >- *	page->next: links together all component pages of a zspage
> > >+ *	page->freelist: links together all component pages of a zspage
> > >+ *		For the huge page, this is always 0, so we use this field
> > >+ *		to store handle.
> > >   *
> > >   * Usage of struct page flags:
> > >   *	PG_private: identifies the first component page
> > >   *	PG_private2: identifies the last component page
> > >- *
> > >+ *	PG_owner_priv_1: identifies the huge component page
> > >   */
> > >
> > >  #include <linux/module.h>
> > >@@ -47,6 +46,10 @@
> > >  #include <linux/debugfs.h>
> > >  #include <linux/zsmalloc.h>
> > >  #include <linux/zpool.h>
> > >+#include <linux/mount.h>
> > >+#include <linux/migrate.h>
> > >+
> > >+#define ZSPAGE_MAGIC	0x58
> > >
> > >  /*
> > >   * This must be power of 2 and greater than of equal to sizeof(link_free).
> > >@@ -128,8 +131,33 @@
> > >   *  ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
> > >   *  (reason above)
> > >   */
> > >+
> > >+/*
> > >+ * A zspage's class index and fullness group
> > >+ * are encoded in its (first)page->mapping
> > >+ */
> > >+#define FULLNESS_BITS	2
> > >+#define CLASS_BITS	8
> > >+#define ISOLATED_BITS	3
> > >+#define MAGIC_VAL_BITS	8
> > >+
> > >+
> > >  #define ZS_SIZE_CLASS_DELTA	(PAGE_SIZE >> CLASS_BITS)
> > >
> > >+struct zspage {
> > >+	struct {
> > >+		unsigned int fullness:FULLNESS_BITS;
> > >+		unsigned int class:CLASS_BITS;
> > >+		unsigned int isolated:ISOLATED_BITS;
> > >+		unsigned int magic:MAGIC_VAL_BITS;
> > >+	};
> > >+	unsigned int inuse;
> > >+	unsigned int freeobj;
> > >+	struct page *first_page;
> > >+	struct list_head list; /* fullness list */
> > >+	rwlock_t lock;
> > >+};
> > >+
> > >  /*
> > >   * We do not maintain any list for completely empty or full pages
> > >   */
> > >@@ -161,6 +189,8 @@ struct zs_size_stat {
> > >  static struct dentry *zs_stat_root;
> > >  #endif
> > >
> > >+static struct vfsmount *zsmalloc_mnt;
> > >+
> > >  /*
> > >   * number of size_classes
> > >   */
> > >@@ -243,24 +273,10 @@ struct zs_pool {
> > >  #ifdef CONFIG_ZSMALLOC_STAT
> > >  	struct dentry *stat_dentry;
> > >  #endif
> > >-};
> > >-
> > >-/*
> > >- * A zspage's class index and fullness group
> > >- * are encoded in its (first)page->mapping
> > >- */
> > >-#define FULLNESS_BITS	2
> > >-#define CLASS_BITS	8
> > >-
> > >-struct zspage {
> > >-	struct {
> > >-		unsigned int fullness:FULLNESS_BITS;
> > >-		unsigned int class:CLASS_BITS;
> > >-	};
> > >-	unsigned int inuse;
> > >-	unsigned int freeobj;
> > >-	struct page *first_page;
> > >-	struct list_head list; /* fullness list */
> > >+	struct inode *inode;
> > >+	spinlock_t free_lock;
> > >+	struct work_struct free_work;
> > >+	struct list_head free_zspage;
> > >  };
> > >
> > >  struct mapping_area {
> > >@@ -312,8 +328,11 @@ static struct zspage *cache_alloc_zspage(struct zs_pool *pool, gfp_t flags)
> > >  	struct zspage *zspage;
> > >
> > >  	zspage = kmem_cache_alloc(pool->zspage_cachep, flags & ~__GFP_HIGHMEM);
> > >-	if (zspage)
> > >+	if (zspage) {
> > >  		memset(zspage, 0, sizeof(struct zspage));
> > >+		zspage->magic = ZSPAGE_MAGIC;
> > >+		rwlock_init(&zspage->lock);
> > 
> > +              INIT_LIST_HEAD(&zspage->list);
> > 
> > If there is no special intention here,
> > I think we need the list initialization.
> 
> Intention was that I just watned to add unncessary instruction there

                     I just don't want to add unnecessary instruction there
Typo. :)
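
For reference, the difference between a zeroed list node and an
initialized one can be shown with a small userspace sketch. It is only an
illustration under assumptions: the list_head stand-in, init_list_head()
and struct zspage_sketch below are hypothetical userspace copies, not the
kernel code. Whether the patch really needs INIT_LIST_HEAD depends on
whether any path inspects zspage->list before the zspage is first added
to a fullness list, which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

/* Same idea as the kernel's INIT_LIST_HEAD(): point the node at itself. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Same test as the kernel's list_empty(): an empty list points at itself. */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical, trimmed-down zspage: only the field under discussion. */
struct zspage_sketch {
	unsigned int inuse;
	struct list_head list;
};

/* A node that was only zeroed has next == NULL, so it does not look empty. */
static bool zeroed_node_looks_empty(void)
{
	struct zspage_sketch z;

	memset(&z, 0, sizeof(z));	/* what cache_alloc_zspage() does */
	return list_empty(&z.list);
}

/* A node that was zeroed and then initialized does look empty. */
static bool initialized_node_looks_empty(void)
{
	struct zspage_sketch z;

	memset(&z, 0, sizeof(z));
	init_list_head(&z.list);	/* the suggested extra line */
	return list_empty(&z.list);
}

/* Sanity check of the two cases above. */
static void check_sketch(void)
{
	assert(!zeroed_node_looks_empty());
	assert(initialized_node_looks_empty());
}
```

So the extra instruction only matters if something calls list_empty() or
a list_del() variant on the node before it has been linked; a plain
list_add() of the node overwrites both pointers anyway.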

