From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f171.google.com (mail-ig0-f171.google.com [209.85.213.171]) by kanga.kvack.org (Postfix) with ESMTP id 030406B0005 for ; Mon, 4 Apr 2016 05:01:10 -0400 (EDT) Received: by mail-ig0-f171.google.com with SMTP id gy3so8903531igb.0 for ; Mon, 04 Apr 2016 02:01:09 -0700 (PDT) Received: from lgeamrelo13.lge.com (LGEAMRELO13.lge.com. [156.147.23.53]) by mx.google.com with ESMTP id c16si7124278igo.104.2016.04.04.02.01.08 for ; Mon, 04 Apr 2016 02:01:09 -0700 (PDT) Date: Mon, 4 Apr 2016 18:01:14 +0900 From: Minchan Kim Subject: Re: [PATCH v3 12/16] zsmalloc: zs_compact refactoring Message-ID: <20160404090114.GA12898@bbox> References: <1459321935-3655-1-git-send-email-minchan@kernel.org> <1459321935-3655-13-git-send-email-minchan@kernel.org> <57022000.9030705@samsung.com> MIME-Version: 1.0 In-Reply-To: <57022000.9030705@samsung.com> Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Chulmin Kim Cc: Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Hello Chulmin, On Mon, Apr 04, 2016 at 05:04:16PM +0900, Chulmin Kim wrote: > On 2016=EB=85=84 03=EC=9B=94 30=EC=9D=BC 16:12, Minchan Kim wrote: > >Currently, we rely on class->lock to prevent zspage destruction. > >It was okay until now because the critical section is short but > >with run-time migration, it could be long so class->lock is not > >a good apporach any more. > > > >So, this patch introduces [un]freeze=5Fzspage functions which > >freeze allocated objects in the zspage with pinning tag so > >user cannot free using object. With those functions, this patch > >redesign compaction. > > > >Those functions will be used for implementing zspage runtime > >migrations, too. > > > >Signed-off-by: Minchan Kim > >--- > > mm/zsmalloc.c | 393 ++++++++++++++++++++++++++++++++++++++------------= -------- > > 1 file changed, 257 insertions(+), 136 deletions(-) > > > >diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > >index b11dcd718502..ac8ca7b10720 100644 > >--- a/mm/zsmalloc.c > >+++ b/mm/zsmalloc.c > >@@ -922,6 +922,13 @@ static unsigned long obj=5Fto=5Fhead(struct size=5F= class *class, struct page *page, > > return *(unsigned long *)obj; > > } > > > >+static inline int testpin=5Ftag(unsigned long handle) > >+{ > >+ unsigned long *ptr =3D (unsigned long *)handle; > >+ > >+ return test=5Fbit(HANDLE=5FPIN=5FBIT, ptr); > >+} > >+ > > static inline int trypin=5Ftag(unsigned long handle) > > { > > unsigned long *ptr =3D (unsigned long *)handle; > >@@ -949,8 +956,7 @@ static void reset=5Fpage(struct page *page) > > page->freelist =3D NULL; > > } > > > >-static void free=5Fzspage(struct zs=5Fpool *pool, struct size=5Fclass *= class, > >- struct page *first=5Fpage) > >+static void free=5Fzspage(struct zs=5Fpool *pool, struct page *first=5F= page) > > { > > struct page *nextp, *tmp, *head=5Fextra; > > > >@@ -973,11 +979,6 @@ static void free=5Fzspage(struct zs=5Fpool *pool, s= truct size=5Fclass *class, > > } > > reset=5Fpage(head=5Fextra); > > =5F=5Ffree=5Fpage(head=5Fextra); > >- > >- zs=5Fstat=5Fdec(class, OBJ=5FALLOCATED, get=5Fmaxobj=5Fper=5Fzspage( > >- class->size, class->pages=5Fper=5Fzspage)); > >- atomic=5Flong=5Fsub(class->pages=5Fper=5Fzspage, > >- &pool->pages=5Fallocated); > > } > > > > /* Initialize a newly allocated zspage */ > >@@ -1325,6 +1326,11 @@ static bool zspage=5Ffull(struct size=5Fclass *cl= ass, struct page *first=5Fpage) > > return get=5Fzspage=5Finuse(first=5Fpage) =3D=3D class->objs=5Fper=5F= zspage; > > } > > > >+static bool zspage=5Fempty(struct size=5Fclass *class, struct page *fir= st=5Fpage) > >+{ > >+ return get=5Fzspage=5Finuse(first=5Fpage) =3D=3D 0; > >+} > >+ > > unsigned long zs=5Fget=5Ftotal=5Fpages(struct zs=5Fpool *pool) > > { > > return atomic=5Flong=5Fread(&pool->pages=5Fallocated); > >@@ -1455,7 +1461,6 @@ static unsigned long obj=5Fmalloc(struct size=5Fcl= ass *class, > > set=5Fpage=5Fprivate(first=5Fpage, handle | OBJ=5FALLOCATED=5FTAG); > > kunmap=5Fatomic(vaddr); > > mod=5Fzspage=5Finuse(first=5Fpage, 1); > >- zs=5Fstat=5Finc(class, OBJ=5FUSED, 1); > > > > obj =3D location=5Fto=5Fobj(m=5Fpage, obj); > > > >@@ -1510,6 +1515,7 @@ unsigned long zs=5Fmalloc(struct zs=5Fpool *pool, = size=5Ft size) > > } > > > > obj =3D obj=5Fmalloc(class, first=5Fpage, handle); > >+ zs=5Fstat=5Finc(class, OBJ=5FUSED, 1); > > /* Now move the zspage to another fullness group, if required */ > > fix=5Ffullness=5Fgroup(class, first=5Fpage); > > record=5Fobj(handle, obj); > >@@ -1540,7 +1546,6 @@ static void obj=5Ffree(struct size=5Fclass *class,= unsigned long obj) > > kunmap=5Fatomic(vaddr); > > set=5Ffreeobj(first=5Fpage, f=5Fobjidx); > > mod=5Fzspage=5Finuse(first=5Fpage, -1); > >- zs=5Fstat=5Fdec(class, OBJ=5FUSED, 1); > > } > > > > void zs=5Ffree(struct zs=5Fpool *pool, unsigned long handle) > >@@ -1564,10 +1569,19 @@ void zs=5Ffree(struct zs=5Fpool *pool, unsigned = long handle) > > > > spin=5Flock(&class->lock); > > obj=5Ffree(class, obj); > >+ zs=5Fstat=5Fdec(class, OBJ=5FUSED, 1); > > fullness =3D fix=5Ffullness=5Fgroup(class, first=5Fpage); > >- if (fullness =3D=3D ZS=5FEMPTY) > >- free=5Fzspage(pool, class, first=5Fpage); > >+ if (fullness =3D=3D ZS=5FEMPTY) { > >+ zs=5Fstat=5Fdec(class, OBJ=5FALLOCATED, get=5Fmaxobj=5Fper=5Fzspage( > >+ class->size, class->pages=5Fper=5Fzspage)); > >+ spin=5Funlock(&class->lock); > >+ atomic=5Flong=5Fsub(class->pages=5Fper=5Fzspage, > >+ &pool->pages=5Fallocated); > >+ free=5Fzspage(pool, first=5Fpage); > >+ goto out; > >+ } > > spin=5Funlock(&class->lock); > >+out: > > unpin=5Ftag(handle); > > > > free=5Fhandle(pool, handle); > >@@ -1637,127 +1651,66 @@ static void zs=5Fobject=5Fcopy(struct size=5Fcl= ass *class, unsigned long dst, > > kunmap=5Fatomic(s=5Faddr); > > } > > > >-/* > >- * Find alloced object in zspage from index object and > >- * return handle. > >- */ > >-static unsigned long find=5Falloced=5Fobj(struct size=5Fclass *class, > >- struct page *page, int index) > >+static unsigned long handle=5Ffrom=5Fobj(struct size=5Fclass *class, > >+ struct page *first=5Fpage, int obj=5Fidx) > > { > >- unsigned long head; > >- int offset =3D 0; > >- unsigned long handle =3D 0; > >- void *addr =3D kmap=5Fatomic(page); > >- > >- if (!is=5Ffirst=5Fpage(page)) > >- offset =3D page->index; > >- offset +=3D class->size * index; > >- > >- while (offset < PAGE=5FSIZE) { > >- head =3D obj=5Fto=5Fhead(class, page, addr + offset); > >- if (head & OBJ=5FALLOCATED=5FTAG) { > >- handle =3D head & ~OBJ=5FALLOCATED=5FTAG; > >- if (trypin=5Ftag(handle)) > >- break; > >- handle =3D 0; > >- } > >+ struct page *page; > >+ unsigned long offset=5Fin=5Fpage; > >+ void *addr; > >+ unsigned long head, handle =3D 0; > > > >- offset +=3D class->size; > >- index++; > >- } > >+ objidx=5Fto=5Fpage=5Fand=5Foffset(class, first=5Fpage, obj=5Fidx, > >+ &page, &offset=5Fin=5Fpage); > > > >+ addr =3D kmap=5Fatomic(page); > >+ head =3D obj=5Fto=5Fhead(class, page, addr + offset=5Fin=5Fpage); > >+ if (head & OBJ=5FALLOCATED=5FTAG) > >+ handle =3D head & ~OBJ=5FALLOCATED=5FTAG; > > kunmap=5Fatomic(addr); > >+ > > return handle; > > } > > > >-struct zs=5Fcompact=5Fcontrol { > >- /* Source page for migration which could be a subpage of zspage. */ > >- struct page *s=5Fpage; > >- /* Destination page for migration which should be a first page > >- * of zspage. */ > >- struct page *d=5Fpage; > >- /* Starting object index within @s=5Fpage which used for live object > >- * in the subpage. */ > >- int index; > >-}; > >- > >-static int migrate=5Fzspage(struct zs=5Fpool *pool, struct size=5Fclass= *class, > >- struct zs=5Fcompact=5Fcontrol *cc) > >+static int migrate=5Fzspage(struct size=5Fclass *class, struct page *ds= t=5Fpage, > >+ struct page *src=5Fpage) > > { > >- unsigned long used=5Fobj, free=5Fobj; > > unsigned long handle; > >- struct page *s=5Fpage =3D cc->s=5Fpage; > >- struct page *d=5Fpage =3D cc->d=5Fpage; > >- unsigned long index =3D cc->index; > >- int ret =3D 0; > >+ unsigned long old=5Fobj, new=5Fobj; > >+ int i; > >+ int nr=5Fmigrated =3D 0; > > > >- while (1) { > >- handle =3D find=5Falloced=5Fobj(class, s=5Fpage, index); > >- if (!handle) { > >- s=5Fpage =3D get=5Fnext=5Fpage(s=5Fpage); > >- if (!s=5Fpage) > >- break; > >- index =3D 0; > >+ for (i =3D 0; i < class->objs=5Fper=5Fzspage; i++) { > >+ handle =3D handle=5Ffrom=5Fobj(class, src=5Fpage, i); > >+ if (!handle) > > continue; > >- } > >- > >- /* Stop if there is no more space */ > >- if (zspage=5Ffull(class, d=5Fpage)) { > >- unpin=5Ftag(handle); > >- ret =3D -ENOMEM; > >+ if (zspage=5Ffull(class, dst=5Fpage)) > > break; > >- } > >- > >- used=5Fobj =3D handle=5Fto=5Fobj(handle); > >- free=5Fobj =3D obj=5Fmalloc(class, d=5Fpage, handle); > >- zs=5Fobject=5Fcopy(class, free=5Fobj, used=5Fobj); > >- index++; > >+ old=5Fobj =3D handle=5Fto=5Fobj(handle); > >+ new=5Fobj =3D obj=5Fmalloc(class, dst=5Fpage, handle); > >+ zs=5Fobject=5Fcopy(class, new=5Fobj, old=5Fobj); > >+ nr=5Fmigrated++; > > /* > > * record=5Fobj updates handle's value to free=5Fobj and it will > > * invalidate lock bit(ie, HANDLE=5FPIN=5FBIT) of handle, which > > * breaks synchronization using pin=5Ftag(e,g, zs=5Ffree) so > > * let's keep the lock bit. > > */ > >- free=5Fobj |=3D BIT(HANDLE=5FPIN=5FBIT); > >- record=5Fobj(handle, free=5Fobj); > >- unpin=5Ftag(handle); > >- obj=5Ffree(class, used=5Fobj); > >+ new=5Fobj |=3D BIT(HANDLE=5FPIN=5FBIT); > >+ record=5Fobj(handle, new=5Fobj); > >+ obj=5Ffree(class, old=5Fobj); > > } > >- > >- /* Remember last position in this iteration */ > >- cc->s=5Fpage =3D s=5Fpage; > >- cc->index =3D index; > >- > >- return ret; > >-} > >- > >-static struct page *isolate=5Ftarget=5Fpage(struct size=5Fclass *class) > >-{ > >- int i; > >- struct page *page; > >- > >- for (i =3D 0; i < =5FZS=5FNR=5FFULLNESS=5FGROUPS; i++) { > >- page =3D class->fullness=5Flist[i]; > >- if (page) { > >- remove=5Fzspage(class, i, page); > >- break; > >- } > >- } > >- > >- return page; > >+ return nr=5Fmigrated; > > } > > > > /* > > * putback=5Fzspage - add @first=5Fpage into right class's fullness li= st > >- * @pool: target pool > > * @class: destination class > > * @first=5Fpage: target page > > * > > * Return @first=5Fpage's updated fullness=5Fgroup > > */ > >-static enum fullness=5Fgroup putback=5Fzspage(struct zs=5Fpool *pool, > >- struct size=5Fclass *class, > >- struct page *first=5Fpage) > >+static enum fullness=5Fgroup putback=5Fzspage(struct size=5Fclass *clas= s, > >+ struct page *first=5Fpage) > > { > > enum fullness=5Fgroup fullness; > > > >@@ -1768,17 +1721,155 @@ static enum fullness=5Fgroup putback=5Fzspage(s= truct zs=5Fpool *pool, > > return fullness; > > } > > > >+/* > >+ * freeze=5Fzspage - freeze all objects in a zspage > >+ * @class: size class of the page > >+ * @first=5Fpage: first page of zspage > >+ * > >+ * Freeze all allocated objects in a zspage so objects couldn't be > >+ * freed until unfreeze objects. It should be called under class->lock. > >+ * > >+ * RETURNS: > >+ * the number of pinned objects > >+ */ > >+static int freeze=5Fzspage(struct size=5Fclass *class, struct page *fir= st=5Fpage) > >+{ > >+ unsigned long obj=5Fidx; > >+ struct page *obj=5Fpage; > >+ unsigned long offset; > >+ void *addr; > >+ int nr=5Ffreeze =3D 0; > >+ > >+ for (obj=5Fidx =3D 0; obj=5Fidx < class->objs=5Fper=5Fzspage; obj=5Fid= x++) { > >+ unsigned long head; > >+ > >+ objidx=5Fto=5Fpage=5Fand=5Foffset(class, first=5Fpage, obj=5Fidx, > >+ &obj=5Fpage, &offset); > >+ addr =3D kmap=5Fatomic(obj=5Fpage); > >+ head =3D obj=5Fto=5Fhead(class, obj=5Fpage, addr + offset); > >+ if (head & OBJ=5FALLOCATED=5FTAG) { > >+ unsigned long handle =3D head & ~OBJ=5FALLOCATED=5FTAG; > >+ > >+ if (!trypin=5Ftag(handle)) { > >+ kunmap=5Fatomic(addr); > >+ break; > >+ } > >+ nr=5Ffreeze++; > >+ } > >+ kunmap=5Fatomic(addr); > >+ } > >+ > >+ return nr=5Ffreeze; > >+} > >+ > >+/* > >+ * unfreeze=5Fpage - unfreeze objects freezed by freeze=5Fzspage in a z= spage > >+ * @class: size class of the page > >+ * @first=5Fpage: freezed zspage to unfreeze > >+ * @nr=5Fobj: the number of objects to unfreeze > >+ * > >+ * unfreeze objects in a zspage. > >+ */ > >+static void unfreeze=5Fzspage(struct size=5Fclass *class, struct page *= first=5Fpage, > >+ int nr=5Fobj) > >+{ > >+ unsigned long obj=5Fidx; > >+ struct page *obj=5Fpage; > >+ unsigned long offset; > >+ void *addr; > >+ int nr=5Funfreeze =3D 0; > >+ > >+ for (obj=5Fidx =3D 0; obj=5Fidx < class->objs=5Fper=5Fzspage && > >+ nr=5Funfreeze < nr=5Fobj; obj=5Fidx++) { > >+ unsigned long head; > >+ > >+ objidx=5Fto=5Fpage=5Fand=5Foffset(class, first=5Fpage, obj=5Fidx, > >+ &obj=5Fpage, &offset); > >+ addr =3D kmap=5Fatomic(obj=5Fpage); > >+ head =3D obj=5Fto=5Fhead(class, obj=5Fpage, addr + offset); > >+ if (head & OBJ=5FALLOCATED=5FTAG) { > >+ unsigned long handle =3D head & ~OBJ=5FALLOCATED=5FTAG; > >+ > >+ VM=5FBUG=5FON(!testpin=5Ftag(handle)); > >+ unpin=5Ftag(handle); > >+ nr=5Funfreeze++; > >+ } > >+ kunmap=5Fatomic(addr); > >+ } > >+} > >+ > >+/* > >+ * isolate=5Fsource=5Fpage - isolate a zspage for migration source > >+ * @class: size class of zspage for isolation > >+ * > >+ * Returns a zspage which are isolated from list so anyone can > >+ * allocate a object from that page. As well, freeze all objects > >+ * allocated in the zspage so anyone cannot access that objects > >+ * (e.g., zs=5Fmap=5Fobject, zs=5Ffree). > >+ */ > > static struct page *isolate=5Fsource=5Fpage(struct size=5Fclass *class) > > { > > int i; > > struct page *page =3D NULL; > > > > for (i =3D ZS=5FALMOST=5FEMPTY; i >=3D ZS=5FALMOST=5FFULL; i--) { > >+ int inuse, freezed; > >+ > > page =3D class->fullness=5Flist[i]; > > if (!page) > > continue; > > > > remove=5Fzspage(class, i, page); > >+ > >+ inuse =3D get=5Fzspage=5Finuse(page); > >+ freezed =3D freeze=5Fzspage(class, page); > >+ > >+ if (inuse !=3D freezed) { > >+ unfreeze=5Fzspage(class, page, freezed); > >+ putback=5Fzspage(class, page); > >+ page =3D NULL; > >+ continue; > >+ } > >+ > >+ break; > >+ } > >+ > >+ return page; > >+} > >+ > >+/* > >+ * isolate=5Ftarget=5Fpage - isolate a zspage for migration target > >+ * @class: size class of zspage for isolation > >+ * > >+ * Returns a zspage which are isolated from list so anyone can > >+ * allocate a object from that page. As well, freeze all objects > >+ * allocated in the zspage so anyone cannot access that objects > >+ * (e.g., zs=5Fmap=5Fobject, zs=5Ffree). > >+ */ > >+static struct page *isolate=5Ftarget=5Fpage(struct size=5Fclass *class) > >+{ > >+ int i; > >+ struct page *page; > >+ > >+ for (i =3D 0; i < =5FZS=5FNR=5FFULLNESS=5FGROUPS; i++) { > >+ int inuse, freezed; > >+ > >+ page =3D class->fullness=5Flist[i]; > >+ if (!page) > >+ continue; > >+ > >+ remove=5Fzspage(class, i, page); > >+ > >+ inuse =3D get=5Fzspage=5Finuse(page); > >+ freezed =3D freeze=5Fzspage(class, page); > >+ > >+ if (inuse !=3D freezed) { > >+ unfreeze=5Fzspage(class, page, freezed); > >+ putback=5Fzspage(class, page); > >+ page =3D NULL; > >+ continue; > >+ } > >+ > > break; > > } > > > >@@ -1793,9 +1884,11 @@ static struct page *isolate=5Fsource=5Fpage(struc= t size=5Fclass *class) > > static unsigned long zs=5Fcan=5Fcompact(struct size=5Fclass *class) > > { > > unsigned long obj=5Fwasted; > >+ unsigned long obj=5Fallocated, obj=5Fused; > > > >- obj=5Fwasted =3D zs=5Fstat=5Fget(class, OBJ=5FALLOCATED) - > >- zs=5Fstat=5Fget(class, OBJ=5FUSED); > >+ obj=5Fallocated =3D zs=5Fstat=5Fget(class, OBJ=5FALLOCATED); > >+ obj=5Fused =3D zs=5Fstat=5Fget(class, OBJ=5FUSED); > >+ obj=5Fwasted =3D obj=5Fallocated - obj=5Fused; > > > > obj=5Fwasted /=3D get=5Fmaxobj=5Fper=5Fzspage(class->size, > > class->pages=5Fper=5Fzspage); > >@@ -1805,53 +1898,81 @@ static unsigned long zs=5Fcan=5Fcompact(struct s= ize=5Fclass *class) > > > > static void =5F=5Fzs=5Fcompact(struct zs=5Fpool *pool, struct size=5Fc= lass *class) > > { > >- struct zs=5Fcompact=5Fcontrol cc; > >- struct page *src=5Fpage; > >+ struct page *src=5Fpage =3D NULL; > > struct page *dst=5Fpage =3D NULL; > > > >- spin=5Flock(&class->lock); > >- while ((src=5Fpage =3D isolate=5Fsource=5Fpage(class))) { > >+ while (1) { > >+ int nr=5Fmigrated; > > > >- if (!zs=5Fcan=5Fcompact(class)) > >+ spin=5Flock(&class->lock); > >+ if (!zs=5Fcan=5Fcompact(class)) { > >+ spin=5Funlock(&class->lock); > > break; > >+ } > > > >- cc.index =3D 0; > >- cc.s=5Fpage =3D src=5Fpage; > >+ /* > >+ * Isolate source page and freeze all objects in a zspage > >+ * to prevent zspage destroying. > >+ */ > >+ if (!src=5Fpage) { > >+ src=5Fpage =3D isolate=5Fsource=5Fpage(class); > >+ if (!src=5Fpage) { > >+ spin=5Funlock(&class->lock); > >+ break; > >+ } > >+ } > > > >- while ((dst=5Fpage =3D isolate=5Ftarget=5Fpage(class))) { > >- cc.d=5Fpage =3D dst=5Fpage; > >- /* > >- * If there is no more space in dst=5Fpage, resched > >- * and see if anyone had allocated another zspage. > >- */ > >- if (!migrate=5Fzspage(pool, class, &cc)) > >+ /* Isolate target page and freeze all objects in the zspage */ > >+ if (!dst=5Fpage) { > >+ dst=5Fpage =3D isolate=5Ftarget=5Fpage(class); > >+ if (!dst=5Fpage) { > >+ spin=5Funlock(&class->lock); > > break; > >+ } > >+ } > >+ spin=5Funlock(&class->lock); >=20 > (Sorry to delete individual recipients due to my compliance issues.) >=20 > Hello, Minchan. >=20 >=20 > Is it safe to unlock? >=20 >=20 > (I assume that the system has 2 cores > and a swap device is using zsmalloc pool.) > If a zs compact context scheduled out after this "spin=5Funlock" line, >=20 >=20 > CPU A (Swap In) CPU B (zs=5Ffree by process killed) > --------------------- ------------------------- > ... > spin=5Flock(&si->lock) > ... > # assume it is pinned by zs=5Fcompact cont= ext. > pin=5Ftag(handle) --> block >=20 > ... > spin=5Flock(&si->lock) --> block >=20 >=20 > I think CPU A and CPU B may not be released forever. > Am I missing something? You didn't miss anything. It could be dead locked. The swap=5Fslot=5Ffree=5Fnotify is always really problem. :( That's why I want to remove it. I will think over how to handle it and send fix in next revision. Thanks for the review! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754807AbcDDJBL (ORCPT ); Mon, 4 Apr 2016 05:01:11 -0400 Received: from LGEAMRELO13.lge.com ([156.147.23.53]:45551 "EHLO lgeamrelo13.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751744AbcDDJBJ convert rfc822-to-8bit (ORCPT ); Mon, 4 Apr 2016 05:01:09 -0400 X-Original-SENDERIP: 156.147.1.151 X-Original-MAILFROM: minchan@kernel.org X-Original-SENDERIP: 165.244.98.203 X-Original-MAILFROM: minchan@kernel.org X-Original-SENDERIP: 10.177.223.161 X-Original-MAILFROM: minchan@kernel.org Date: Mon, 4 Apr 2016 18:01:14 +0900 From: Minchan Kim To: Chulmin Kim CC: Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH v3 12/16] zsmalloc: zs_compact refactoring Message-ID: <20160404090114.GA12898@bbox> References: <1459321935-3655-1-git-send-email-minchan@kernel.org> <1459321935-3655-13-git-send-email-minchan@kernel.org> <57022000.9030705@samsung.com> MIME-Version: 1.0 In-Reply-To: <57022000.9030705@samsung.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-MIMETrack: Itemize by SMTP Server on LGEKRMHUB07/LGE/LG Group(Release 8.5.3FP6|November 21, 2013) at 2016/04/04 18:01:06, Serialize by Router on LGEKRMHUB07/LGE/LG Group(Release 8.5.3FP6|November 21, 2013) at 2016/04/04 18:01:06 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Chulmin, On Mon, Apr 04, 2016 at 05:04:16PM +0900, Chulmin Kim wrote: > On 2016년 03월 30일 16:12, Minchan Kim wrote: > >Currently, we rely on class->lock to prevent zspage destruction. > >It was okay until now because the critical section is short but > >with run-time migration, it could be long so class->lock is not > >a good apporach any more. > > > >So, this patch introduces [un]freeze_zspage functions which > >freeze allocated objects in the zspage with pinning tag so > >user cannot free using object. With those functions, this patch > >redesign compaction. > > > >Those functions will be used for implementing zspage runtime > >migrations, too. > > > >Signed-off-by: Minchan Kim > >--- > > mm/zsmalloc.c | 393 ++++++++++++++++++++++++++++++++++++++-------------------- > > 1 file changed, 257 insertions(+), 136 deletions(-) > > > >diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > >index b11dcd718502..ac8ca7b10720 100644 > >--- a/mm/zsmalloc.c > >+++ b/mm/zsmalloc.c > >@@ -922,6 +922,13 @@ static unsigned long obj_to_head(struct size_class *class, struct page *page, > > return *(unsigned long *)obj; > > } > > > >+static inline int testpin_tag(unsigned long handle) > >+{ > >+ unsigned long *ptr = (unsigned long *)handle; > >+ > >+ return test_bit(HANDLE_PIN_BIT, ptr); > >+} > >+ > > static inline int trypin_tag(unsigned long handle) > > { > > unsigned long *ptr = (unsigned long *)handle; > >@@ -949,8 +956,7 @@ static void reset_page(struct page *page) > > page->freelist = NULL; > > } > > > >-static void free_zspage(struct zs_pool *pool, struct size_class *class, > >- struct page *first_page) > >+static void free_zspage(struct zs_pool *pool, struct page *first_page) > > { > > struct page *nextp, *tmp, *head_extra; > > > >@@ -973,11 +979,6 @@ static void free_zspage(struct zs_pool *pool, struct size_class *class, > > } > > reset_page(head_extra); > > __free_page(head_extra); > >- > >- zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage( > >- class->size, class->pages_per_zspage)); > >- atomic_long_sub(class->pages_per_zspage, > >- &pool->pages_allocated); > > } > > > > /* Initialize a newly allocated zspage */ > >@@ -1325,6 +1326,11 @@ static bool zspage_full(struct size_class *class, struct page *first_page) > > return get_zspage_inuse(first_page) == class->objs_per_zspage; > > } > > > >+static bool zspage_empty(struct size_class *class, struct page *first_page) > >+{ > >+ return get_zspage_inuse(first_page) == 0; > >+} > >+ > > unsigned long zs_get_total_pages(struct zs_pool *pool) > > { > > return atomic_long_read(&pool->pages_allocated); > >@@ -1455,7 +1461,6 @@ static unsigned long obj_malloc(struct size_class *class, > > set_page_private(first_page, handle | OBJ_ALLOCATED_TAG); > > kunmap_atomic(vaddr); > > mod_zspage_inuse(first_page, 1); > >- zs_stat_inc(class, OBJ_USED, 1); > > > > obj = location_to_obj(m_page, obj); > > > >@@ -1510,6 +1515,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size) > > } > > > > obj = obj_malloc(class, first_page, handle); > >+ zs_stat_inc(class, OBJ_USED, 1); > > /* Now move the zspage to another fullness group, if required */ > > fix_fullness_group(class, first_page); > > record_obj(handle, obj); > >@@ -1540,7 +1546,6 @@ static void obj_free(struct size_class *class, unsigned long obj) > > kunmap_atomic(vaddr); > > set_freeobj(first_page, f_objidx); > > mod_zspage_inuse(first_page, -1); > >- zs_stat_dec(class, OBJ_USED, 1); > > } > > > > void zs_free(struct zs_pool *pool, unsigned long handle) > >@@ -1564,10 +1569,19 @@ void zs_free(struct zs_pool *pool, unsigned long handle) > > > > spin_lock(&class->lock); > > obj_free(class, obj); > >+ zs_stat_dec(class, OBJ_USED, 1); > > fullness = fix_fullness_group(class, first_page); > >- if (fullness == ZS_EMPTY) > >- free_zspage(pool, class, first_page); > >+ if (fullness == ZS_EMPTY) { > >+ zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage( > >+ class->size, class->pages_per_zspage)); > >+ spin_unlock(&class->lock); > >+ atomic_long_sub(class->pages_per_zspage, > >+ &pool->pages_allocated); > >+ free_zspage(pool, first_page); > >+ goto out; > >+ } > > spin_unlock(&class->lock); > >+out: > > unpin_tag(handle); > > > > free_handle(pool, handle); > >@@ -1637,127 +1651,66 @@ static void zs_object_copy(struct size_class *class, unsigned long dst, > > kunmap_atomic(s_addr); > > } > > > >-/* > >- * Find alloced object in zspage from index object and > >- * return handle. > >- */ > >-static unsigned long find_alloced_obj(struct size_class *class, > >- struct page *page, int index) > >+static unsigned long handle_from_obj(struct size_class *class, > >+ struct page *first_page, int obj_idx) > > { > >- unsigned long head; > >- int offset = 0; > >- unsigned long handle = 0; > >- void *addr = kmap_atomic(page); > >- > >- if (!is_first_page(page)) > >- offset = page->index; > >- offset += class->size * index; > >- > >- while (offset < PAGE_SIZE) { > >- head = obj_to_head(class, page, addr + offset); > >- if (head & OBJ_ALLOCATED_TAG) { > >- handle = head & ~OBJ_ALLOCATED_TAG; > >- if (trypin_tag(handle)) > >- break; > >- handle = 0; > >- } > >+ struct page *page; > >+ unsigned long offset_in_page; > >+ void *addr; > >+ unsigned long head, handle = 0; > > > >- offset += class->size; > >- index++; > >- } > >+ objidx_to_page_and_offset(class, first_page, obj_idx, > >+ &page, &offset_in_page); > > > >+ addr = kmap_atomic(page); > >+ head = obj_to_head(class, page, addr + offset_in_page); > >+ if (head & OBJ_ALLOCATED_TAG) > >+ handle = head & ~OBJ_ALLOCATED_TAG; > > kunmap_atomic(addr); > >+ > > return handle; > > } > > > >-struct zs_compact_control { > >- /* Source page for migration which could be a subpage of zspage. */ > >- struct page *s_page; > >- /* Destination page for migration which should be a first page > >- * of zspage. */ > >- struct page *d_page; > >- /* Starting object index within @s_page which used for live object > >- * in the subpage. */ > >- int index; > >-}; > >- > >-static int migrate_zspage(struct zs_pool *pool, struct size_class *class, > >- struct zs_compact_control *cc) > >+static int migrate_zspage(struct size_class *class, struct page *dst_page, > >+ struct page *src_page) > > { > >- unsigned long used_obj, free_obj; > > unsigned long handle; > >- struct page *s_page = cc->s_page; > >- struct page *d_page = cc->d_page; > >- unsigned long index = cc->index; > >- int ret = 0; > >+ unsigned long old_obj, new_obj; > >+ int i; > >+ int nr_migrated = 0; > > > >- while (1) { > >- handle = find_alloced_obj(class, s_page, index); > >- if (!handle) { > >- s_page = get_next_page(s_page); > >- if (!s_page) > >- break; > >- index = 0; > >+ for (i = 0; i < class->objs_per_zspage; i++) { > >+ handle = handle_from_obj(class, src_page, i); > >+ if (!handle) > > continue; > >- } > >- > >- /* Stop if there is no more space */ > >- if (zspage_full(class, d_page)) { > >- unpin_tag(handle); > >- ret = -ENOMEM; > >+ if (zspage_full(class, dst_page)) > > break; > >- } > >- > >- used_obj = handle_to_obj(handle); > >- free_obj = obj_malloc(class, d_page, handle); > >- zs_object_copy(class, free_obj, used_obj); > >- index++; > >+ old_obj = handle_to_obj(handle); > >+ new_obj = obj_malloc(class, dst_page, handle); > >+ zs_object_copy(class, new_obj, old_obj); > >+ nr_migrated++; > > /* > > * record_obj updates handle's value to free_obj and it will > > * invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, which > > * breaks synchronization using pin_tag(e,g, zs_free) so > > * let's keep the lock bit. > > */ > >- free_obj |= BIT(HANDLE_PIN_BIT); > >- record_obj(handle, free_obj); > >- unpin_tag(handle); > >- obj_free(class, used_obj); > >+ new_obj |= BIT(HANDLE_PIN_BIT); > >+ record_obj(handle, new_obj); > >+ obj_free(class, old_obj); > > } > >- > >- /* Remember last position in this iteration */ > >- cc->s_page = s_page; > >- cc->index = index; > >- > >- return ret; > >-} > >- > >-static struct page *isolate_target_page(struct size_class *class) > >-{ > >- int i; > >- struct page *page; > >- > >- for (i = 0; i < _ZS_NR_FULLNESS_GROUPS; i++) { > >- page = class->fullness_list[i]; > >- if (page) { > >- remove_zspage(class, i, page); > >- break; > >- } > >- } > >- > >- return page; > >+ return nr_migrated; > > } > > > > /* > > * putback_zspage - add @first_page into right class's fullness list > >- * @pool: target pool > > * @class: destination class > > * @first_page: target page > > * > > * Return @first_page's updated fullness_group > > */ > >-static enum fullness_group putback_zspage(struct zs_pool *pool, > >- struct size_class *class, > >- struct page *first_page) > >+static enum fullness_group putback_zspage(struct size_class *class, > >+ struct page *first_page) > > { > > enum fullness_group fullness; > > > >@@ -1768,17 +1721,155 @@ static enum fullness_group putback_zspage(struct zs_pool *pool, > > return fullness; > > } > > > >+/* > >+ * freeze_zspage - freeze all objects in a zspage > >+ * @class: size class of the page > >+ * @first_page: first page of zspage > >+ * > >+ * Freeze all allocated objects in a zspage so objects couldn't be > >+ * freed until unfreeze objects. It should be called under class->lock. > >+ * > >+ * RETURNS: > >+ * the number of pinned objects > >+ */ > >+static int freeze_zspage(struct size_class *class, struct page *first_page) > >+{ > >+ unsigned long obj_idx; > >+ struct page *obj_page; > >+ unsigned long offset; > >+ void *addr; > >+ int nr_freeze = 0; > >+ > >+ for (obj_idx = 0; obj_idx < class->objs_per_zspage; obj_idx++) { > >+ unsigned long head; > >+ > >+ objidx_to_page_and_offset(class, first_page, obj_idx, > >+ &obj_page, &offset); > >+ addr = kmap_atomic(obj_page); > >+ head = obj_to_head(class, obj_page, addr + offset); > >+ if (head & OBJ_ALLOCATED_TAG) { > >+ unsigned long handle = head & ~OBJ_ALLOCATED_TAG; > >+ > >+ if (!trypin_tag(handle)) { > >+ kunmap_atomic(addr); > >+ break; > >+ } > >+ nr_freeze++; > >+ } > >+ kunmap_atomic(addr); > >+ } > >+ > >+ return nr_freeze; > >+} > >+ > >+/* > >+ * unfreeze_page - unfreeze objects freezed by freeze_zspage in a zspage > >+ * @class: size class of the page > >+ * @first_page: freezed zspage to unfreeze > >+ * @nr_obj: the number of objects to unfreeze > >+ * > >+ * unfreeze objects in a zspage. > >+ */ > >+static void unfreeze_zspage(struct size_class *class, struct page *first_page, > >+ int nr_obj) > >+{ > >+ unsigned long obj_idx; > >+ struct page *obj_page; > >+ unsigned long offset; > >+ void *addr; > >+ int nr_unfreeze = 0; > >+ > >+ for (obj_idx = 0; obj_idx < class->objs_per_zspage && > >+ nr_unfreeze < nr_obj; obj_idx++) { > >+ unsigned long head; > >+ > >+ objidx_to_page_and_offset(class, first_page, obj_idx, > >+ &obj_page, &offset); > >+ addr = kmap_atomic(obj_page); > >+ head = obj_to_head(class, obj_page, addr + offset); > >+ if (head & OBJ_ALLOCATED_TAG) { > >+ unsigned long handle = head & ~OBJ_ALLOCATED_TAG; > >+ > >+ VM_BUG_ON(!testpin_tag(handle)); > >+ unpin_tag(handle); > >+ nr_unfreeze++; > >+ } > >+ kunmap_atomic(addr); > >+ } > >+} > >+ > >+/* > >+ * isolate_source_page - isolate a zspage for migration source > >+ * @class: size class of zspage for isolation > >+ * > >+ * Returns a zspage which are isolated from list so anyone can > >+ * allocate a object from that page. As well, freeze all objects > >+ * allocated in the zspage so anyone cannot access that objects > >+ * (e.g., zs_map_object, zs_free). > >+ */ > > static struct page *isolate_source_page(struct size_class *class) > > { > > int i; > > struct page *page = NULL; > > > > for (i = ZS_ALMOST_EMPTY; i >= ZS_ALMOST_FULL; i--) { > >+ int inuse, freezed; > >+ > > page = class->fullness_list[i]; > > if (!page) > > continue; > > > > remove_zspage(class, i, page); > >+ > >+ inuse = get_zspage_inuse(page); > >+ freezed = freeze_zspage(class, page); > >+ > >+ if (inuse != freezed) { > >+ unfreeze_zspage(class, page, freezed); > >+ putback_zspage(class, page); > >+ page = NULL; > >+ continue; > >+ } > >+ > >+ break; > >+ } > >+ > >+ return page; > >+} > >+ > >+/* > >+ * isolate_target_page - isolate a zspage for migration target > >+ * @class: size class of zspage for isolation > >+ * > >+ * Returns a zspage which are isolated from list so anyone can > >+ * allocate a object from that page. As well, freeze all objects > >+ * allocated in the zspage so anyone cannot access that objects > >+ * (e.g., zs_map_object, zs_free). > >+ */ > >+static struct page *isolate_target_page(struct size_class *class) > >+{ > >+ int i; > >+ struct page *page; > >+ > >+ for (i = 0; i < _ZS_NR_FULLNESS_GROUPS; i++) { > >+ int inuse, freezed; > >+ > >+ page = class->fullness_list[i]; > >+ if (!page) > >+ continue; > >+ > >+ remove_zspage(class, i, page); > >+ > >+ inuse = get_zspage_inuse(page); > >+ freezed = freeze_zspage(class, page); > >+ > >+ if (inuse != freezed) { > >+ unfreeze_zspage(class, page, freezed); > >+ putback_zspage(class, page); > >+ page = NULL; > >+ continue; > >+ } > >+ > > break; > > } > > > >@@ -1793,9 +1884,11 @@ static struct page *isolate_source_page(struct size_class *class) > > static unsigned long zs_can_compact(struct size_class *class) > > { > > unsigned long obj_wasted; > >+ unsigned long obj_allocated, obj_used; > > > >- obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) - > >- zs_stat_get(class, OBJ_USED); > >+ obj_allocated = zs_stat_get(class, OBJ_ALLOCATED); > >+ obj_used = zs_stat_get(class, OBJ_USED); > >+ obj_wasted = obj_allocated - obj_used; > > > > obj_wasted /= get_maxobj_per_zspage(class->size, > > class->pages_per_zspage); > >@@ -1805,53 +1898,81 @@ static unsigned long zs_can_compact(struct size_class *class) > > > > static void __zs_compact(struct zs_pool *pool, struct size_class *class) > > { > >- struct zs_compact_control cc; > >- struct page *src_page; > >+ struct page *src_page = NULL; > > struct page *dst_page = NULL; > > > >- spin_lock(&class->lock); > >- while ((src_page = isolate_source_page(class))) { > >+ while (1) { > >+ int nr_migrated; > > > >- if (!zs_can_compact(class)) > >+ spin_lock(&class->lock); > >+ if (!zs_can_compact(class)) { > >+ spin_unlock(&class->lock); > > break; > >+ } > > > >- cc.index = 0; > >- cc.s_page = src_page; > >+ /* > >+ * Isolate source page and freeze all objects in a zspage > >+ * to prevent zspage destroying. > >+ */ > >+ if (!src_page) { > >+ src_page = isolate_source_page(class); > >+ if (!src_page) { > >+ spin_unlock(&class->lock); > >+ break; > >+ } > >+ } > > > >- while ((dst_page = isolate_target_page(class))) { > >- cc.d_page = dst_page; > >- /* > >- * If there is no more space in dst_page, resched > >- * and see if anyone had allocated another zspage. > >- */ > >- if (!migrate_zspage(pool, class, &cc)) > >+ /* Isolate target page and freeze all objects in the zspage */ > >+ if (!dst_page) { > >+ dst_page = isolate_target_page(class); > >+ if (!dst_page) { > >+ spin_unlock(&class->lock); > > break; > >+ } > >+ } > >+ spin_unlock(&class->lock); > > (Sorry to delete individual recipients due to my compliance issues.) > > Hello, Minchan. > > > Is it safe to unlock? > > > (I assume that the system has 2 cores > and a swap device is using zsmalloc pool.) > If a zs compact context scheduled out after this "spin_unlock" line, > > > CPU A (Swap In) CPU B (zs_free by process killed) > --------------------- ------------------------- > ... > spin_lock(&si->lock) > ... > # assume it is pinned by zs_compact context. > pin_tag(handle) --> block > > ... > spin_lock(&si->lock) --> block > > > I think CPU A and CPU B may not be released forever. > Am I missing something? You didn't miss anything. It could be dead locked. The swap_slot_free_notify is always really problem. :( That's why I want to remove it. I will think over how to handle it and send fix in next revision. Thanks for the review!