From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from mail-ig0-f200.google.com (mail-ig0-f200.google.com [209.85.213.200])
        by kanga.kvack.org (Postfix) with ESMTP id 574E46B0253
        for ; Mon,  2 May 2016 21:43:11 -0400 (EDT)
Received: by mail-ig0-f200.google.com with SMTP id fn8so20377943igb.1
        for ; Mon, 02 May 2016 18:43:11 -0700 (PDT)
Received: from lgeamrelo11.lge.com (LGEAMRELO11.lge.com. [156.147.23.51])
        by mx.google.com with ESMTP id oo7si1215277igb.67.2016.05.02.18.43.08
        for ; Mon, 02 May 2016 18:43:09 -0700 (PDT)
Date: Tue, 3 May 2016 10:43:05 +0900
From: Minchan Kim
Subject: Re: [PATCH v4 11/12] zsmalloc: page migration support
Message-ID: <20160503014305.GC2272@bbox>
References: <1461743305-19970-1-git-send-email-minchan@kernel.org>
        <1461743305-19970-12-git-send-email-minchan@kernel.org>
        <5727E3BC.8070308@samsung.com>
        <20160503004359.GA2272@bbox>
MIME-Version: 1.0
In-Reply-To: <20160503004359.GA2272@bbox>
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Chulmin Kim
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
        Sergey Senozhatsky

On Tue, May 03, 2016 at 09:43:59AM +0900, Minchan Kim wrote:
> Good morning, Chulmin
>
> On Tue, May 03, 2016 at 08:33:16AM +0900, Chulmin Kim wrote:
> > Hello, Minchan!
> >
> > On 2016-04-27 16:48, Minchan Kim wrote:
> > >This patch introduces run-time migration feature for zspage.
> > >
> > >For migration, VM uses page.lru field so it would be better to not use
> > >page.next field for own purpose. For that, firstly, we can get first
> > >object offset of the page via runtime calculation instead of
> > >page->index so we can use page->index as link for page chaining.
> > >In case of huge object, it stores handle rather than page chaining.
> > >To identify huge object, we uses PG_owner_priv_1 flag.
> > >
> > >For migration, it supports three functions
> > >
> > >* zs_page_isolate
> > >
> > >It isolates a zspage which includes a subpage VM want to migrate from
> > >class so anyone cannot allocate new object from the zspage if it's first
> > >isolation on subpages of zspage. Thus, further isolation on other
> > >subpages cannot isolate zspage from class list.
> > >
> > >* zs_page_migrate
> > >
> > >First of all, it holds write-side zspage->lock to prevent migrate other
> > >subpage in zspage. Then, lock all objects in the page VM want to migrate.
> > >The reason we should lock all objects in the page is due to race between
> > >zs_map_object and zs_page_migrate.
> > >
> > >zs_map_object                           zs_page_migrate
> > >
> > >pin_tag(handle)
> > >obj = handle_to_obj(handle)
> > >obj_to_location(obj, &page, &obj_idx);
> > >
> > >                                        write_lock(&zspage->lock)
> > >                                        if (!trypin_tag(handle))
> > >                                                goto unpin_object
> > >
> > >zspage = get_zspage(page);
> > >read_lock(&zspage->lock);
> > >
> > >If zs_page_migrate doesn't do trypin_tag, zs_map_object's page can
> > >be stale so go crash.
> > >
> > >If it locks all of objects successfully, it copies content from old page
> > >create new one, finally, create new page chain with new page.
> > >If it's last isolated page in the zspage, put the zspage back to class.
> > >
> > >* zs_page_putback
> > >
> > >It returns isolated zspage to right fullness_group list if it fails to
> > >migrate a page.
> > >
> > >Lastly, this patch introduces asynchronous zspage free. The reason
> > >we need it is we need page_lock to clear PG_movable but unfortunately,
> > >zs_free path should be atomic so the apporach is try to grab page_lock
> > >with preemption disabled. If it got page_lock of all of pages
> > >successfully, it can free zspage in the context. Otherwise, it queues
> > >the free request and free zspage via workqueue in process context.
> > >
> > >Cc: Sergey Senozhatsky
> > >Signed-off-by: Minchan Kim
> > >---
> > > include/uapi/linux/magic.h |   1 +
> > > mm/zsmalloc.c              | 552 +++++++++++++++++++++++++++++++++++++++------
> > > 2 files changed, 487 insertions(+), 66 deletions(-)
> > >
> > >diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
> > >index e1fbe72c39c0..93b1affe4801 100644
> > >--- a/include/uapi/linux/magic.h
> > >+++ b/include/uapi/linux/magic.h
> > >@@ -79,5 +79,6 @@
> > > #define NSFS_MAGIC              0x6e736673
> > > #define BPF_FS_MAGIC            0xcafe4a11
> > > #define BALLOON_KVM_MAGIC       0x13661366
> > >+#define ZSMALLOC_MAGIC          0x58295829
> > >
> > > #endif /* __LINUX_MAGIC_H__ */
> > >diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > >index 8d82e44c4644..042793015ecf 100644
> > >--- a/mm/zsmalloc.c
> > >+++ b/mm/zsmalloc.c
> > >@@ -17,15 +17,14 @@
> > >  *
> > >  * Usage of struct page fields:
> > >  *      page->private: points to zspage
> > >- *      page->index: offset of the first object starting in this page.
> > >- *              For the first page, this is always 0, so we use this field
> > >- *              to store handle for huge object.
> > >- *      page->next: links together all component pages of a zspage
> > >+ *      page->freelist: links together all component pages of a zspage
> > >+ *              For the huge page, this is always 0, so we use this field
> > >+ *              to store handle.
> > >  *
> > >  * Usage of struct page flags:
> > >  *      PG_private: identifies the first component page
> > >  *      PG_private2: identifies the last component page
> > >- *
> > >+ *      PG_owner_priv_1: indentifies the huge component page
> > >  */
> > >
> > > #include
> > >@@ -47,6 +46,10 @@
> > > #include
> > > #include
> > > #include
> > >+#include
> > >+#include
> > >+
> > >+#define ZSPAGE_MAGIC    0x58
> > >
> > > /*
> > >  * This must be power of 2 and greater than of equal to sizeof(link_free).
> > >@@ -128,8 +131,33 @@
> > >  * ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
> > >  *  (reason above)
> > >  */
> > >+
> > >+/*
> > >+ * A zspage's class index and fullness group
> > >+ * are encoded in its (first)page->mapping
> > >+ */
> > >+#define FULLNESS_BITS   2
> > >+#define CLASS_BITS      8
> > >+#define ISOLATED_BITS   3
> > >+#define MAGIC_VAL_BITS  8
> > >+
> > >+
> > > #define ZS_SIZE_CLASS_DELTA     (PAGE_SIZE >> CLASS_BITS)
> > >
> > >+struct zspage {
> > >+        struct {
> > >+                unsigned int fullness:FULLNESS_BITS;
> > >+                unsigned int class:CLASS_BITS;
> > >+                unsigned int isolated:ISOLATED_BITS;
> > >+                unsigned int magic:MAGIC_VAL_BITS;
> > >+        };
> > >+        unsigned int inuse;
> > >+        unsigned int freeobj;
> > >+        struct page *first_page;
> > >+        struct list_head list; /* fullness list */
> > >+        rwlock_t lock;
> > >+};
> > >+
> > > /*
> > >  * We do not maintain any list for completely empty or full pages
> > >  */
> > >@@ -161,6 +189,8 @@ struct zs_size_stat {
> > > static struct dentry *zs_stat_root;
> > > #endif
> > >
> > >+static struct vfsmount *zsmalloc_mnt;
> > >+
> > > /*
> > >  * number of size_classes
> > >  */
> > >@@ -243,24 +273,10 @@ struct zs_pool {
> > > #ifdef CONFIG_ZSMALLOC_STAT
> > >         struct dentry *stat_dentry;
> > > #endif
> > >-};
> > >-
> > >-/*
> > >- * A zspage's class index and fullness group
> > >- * are encoded in its (first)page->mapping
> > >- */
> > >-#define FULLNESS_BITS   2
> > >-#define CLASS_BITS      8
> > >-
> > >-struct zspage {
> > >-        struct {
> > >-                unsigned int fullness:FULLNESS_BITS;
> > >-                unsigned int class:CLASS_BITS;
> > >-        };
> > >-        unsigned int inuse;
> > >-        unsigned int freeobj;
> > >-        struct page *first_page;
> > >-        struct list_head list; /* fullness list */
> > >+        struct inode *inode;
> > >+        spinlock_t free_lock;
> > >+        struct work_struct free_work;
> > >+        struct list_head free_zspage;
> > > };
> > >
> > > struct mapping_area {
> > >@@ -312,8 +328,11 @@ static struct zspage *cache_alloc_zspage(struct zs_pool *pool, gfp_t flags)
> > >         struct zspage *zspage;
> > >
> > >         zspage = kmem_cache_alloc(pool->zspage_cachep, flags & ~__GFP_HIGHMEM);
> > >-        if (zspage)
> > >+        if (zspage) {
> > >                 memset(zspage, 0, sizeof(struct zspage));
> > >+                zspage->magic = ZSPAGE_MAGIC;
> > >+                rwlock_init(&zspage->lock);
> >
> > + INIT_LIST_HEAD(&zspage->list);
> >
> > If there is no special intention here,
> > I think we need the list initialization.
>
> Intention was that I just watned to add unncessary instruction there

I just don't want to add unnecessary instruction there

Typo. :)
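
For anyone following the INIT_LIST_HEAD() question in this thread, here is a minimal userspace sketch of the trade-off being discussed. It is not code from the patch. The list helpers below are simplified stand-ins written for this note (they only mimic the behaviour of the kernel's struct list_head, INIT_LIST_HEAD, list_empty and list_add), and struct zspage_like plus the three check functions are hypothetical names. The sketch suggests that a node which has only been memset() to zero does not look like an empty list, while a zeroed node can still be linked with list_add() because list_add() writes both of the new node's pointers without reading them.

```c
#include <assert.h>
#include <string.h>

/* Simplified userspace stand-ins for the kernel list helpers (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert 'new' right after 'head'; 'new' is written, never read. */
static void list_add(struct list_head *new, struct list_head *head)
{
	struct list_head *next = head->next;

	assert(new != head);
	next->prev = new;
	new->next = next;
	new->prev = head;
	head->next = new;
}

/* Hypothetical cut-down zspage: only the fields this sketch needs. */
struct zspage_like {
	unsigned int inuse;
	struct list_head list; /* fullness list */
};

/* A node that was only zeroed has next == NULL, so it is not "empty". */
static int zeroed_node_is_empty(void)
{
	struct zspage_like z;

	memset(&z, 0, sizeof(z));
	return list_empty(&z.list);
}

/* After INIT_LIST_HEAD() the node points at itself and reads as empty. */
static int initialized_node_is_empty(void)
{
	struct zspage_like z;

	memset(&z, 0, sizeof(z));
	INIT_LIST_HEAD(&z.list);
	return list_empty(&z.list);
}

/* A zeroed node can be linked directly; all four pointers end up valid. */
static int zeroed_node_links_cleanly(void)
{
	struct list_head head;
	struct zspage_like z;

	INIT_LIST_HEAD(&head);
	memset(&z, 0, sizeof(z));
	list_add(&z.list, &head);
	return head.next == &z.list && head.prev == &z.list &&
	       z.list.next == &head && z.list.prev == &head;
}
```

Under these assumptions, skipping the initialization is harmless only if every path links the node with list_add() before anything calls list_empty() or a list removal on it. Whether that holds for zspage->list in this patch is the point under discussion, and the sketch does not settle it.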