Re: [RFC][PATCH v3 1/5] mm/zsmalloc: introduce class auto-compaction

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Minchan Kim <minchan@kernel.org>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joonsoo Kim <js1304@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH v3 1/5] mm/zsmalloc: introduce class auto-compaction
Date: Tue, 15 Mar 2016 15:17:24 +0900	[thread overview]
Message-ID: <20160315061723.GB25154@bbox> (raw)
In-Reply-To: <20160315013303.GC2126@swordfish>

On Tue, Mar 15, 2016 at 10:33:03AM +0900, Sergey Senozhatsky wrote:
> On (03/15/16 09:46), Minchan Kim wrote:
> [..]
> > > yes,
> > > 
> > > we do less work this way - scan and compact only one class, instead
> > > of locking and compacting all of them; which sounds reasonable.
> > 
> > Hmm,, It consumes more memory(i.e., sizeof(work_struct) + sizeof(void *)
> > + sizeof(bool) * NR_CLASS) as well as kicking many work up to NR_CLASS.
> 
> yes, it does. not really happy with it either.
> 
> > I didn't test your patch but I guess I can make worst case scenario.
> > 
> > * make every class fragmented under 40%
> > * On the 40% boundary, repeated alloc/free of every class so every free
> >   can schedule work if it was not scheduled.
> > * Although class fragment is too high, it's not a problem if the class
> >   consumes small amount of memory.
> 
> hm, in this scenario both solutions are less than perfect. we jump
> X times over 40% margin, we have X*NR_CLASS compaction scans in the
> end. the difference is that we queue less works, yes, but we don't
> have to use workqueue in the first place; compaction can be done
> asynchronously by a pool's dedicated kthread. so we will just
> wake_up() the process.

Hmm, kthread is over-engineered to me. If we want to create new kthread
in the system, I guess we should persuade many people to merge in.
Surely, we should have why it couldn't be done by others(e.g., workqueue).

I think your workqueue approach is good to me.
Only problem I can see with it is we cannot start compaction when
we want instantly so my conclusion is we need both direct and
background compaction.

For shrinker and user-space trigger knob, we could compact in that context
while we could queue background job to compact in zs_free.

> 
> > I guess it can make degradation if I try to test on zsmalloc
> > microbenchmark.
> > 
> > As well, although I don't know workqueue internal well, thesedays,
> > I saw a few of mails related to workqueue(maybe, vmstat) and it had
> > some trouble if system memory pressure is heavy IIRC.
> 
> yes, you are right. wq provides WQ_MEM_RECLAIM bit for this
> case -- a special kthread that it will wake up to process works.
> 
> > My approach is as follows, for exmaple.
> >
> > Let's make a global ratio. Let's say it's 4M.
> 
> ok. should it depend on pool size?  min(20% of pool_size, XXMB)?

Maybe, that could be a knob but need to think more what should be
default. In this moment, clear thing is that we should prevent
frequent ping-pong background compaction as repeated alloc/free
with dancing on threshold boundary.

> 
> > If zs_free(or something) realizes current fragment is over 4M,
> > kick compacion backgroud job.
> 
> yes, zs_free() is the only place that introduces fragmentation.
> 
> > The job scans from highest to lower class and compact zspages
> > in each size_class until it meets high watermark(e.g, 4M + 4M /2 =
> > 6M fragment ratio).
> 
> ok.
> 
> > And in the middle of background compaction, if we find it's too
> > many scan(e.g., 256 zspages or somethings), just bail out the
> > job for the latency and reschedule it for next time. At the next
> > time, we can continue from the last size class.
> 
> ok. I'd probably prefer more simple rules here:
> -- bail out because it has compacted XXMB
>    so the fragmentation ratio is *expected* to be below the watermark

Need high watermark to stop compaction.
It will prevent frequent background compaction triggering.


> -- nothing to scan anymore
>    compaction is executed concurrently with zs_free()/zs_malloc()
>    calls, it's harder to control/guarantee some global state.
> 
> overall, no real objections. this approach can work, I think. need
> to test it.

Thanks, Sergey!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Minchan Kim <minchan@kernel.org>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joonsoo Kim <js1304@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH v3 1/5] mm/zsmalloc: introduce class auto-compaction
Date: Tue, 15 Mar 2016 15:17:24 +0900	[thread overview]
Message-ID: <20160315061723.GB25154@bbox> (raw)
In-Reply-To: <20160315013303.GC2126@swordfish>

On Tue, Mar 15, 2016 at 10:33:03AM +0900, Sergey Senozhatsky wrote:
> On (03/15/16 09:46), Minchan Kim wrote:
> [..]
> > > yes,
> > > 
> > > we do less work this way - scan and compact only one class, instead
> > > of locking and compacting all of them; which sounds reasonable.
> > 
> > Hmm,, It consumes more memory(i.e., sizeof(work_struct) + sizeof(void *)
> > + sizeof(bool) * NR_CLASS) as well as kicking many work up to NR_CLASS.
> 
> yes, it does. not really happy with it either.
> 
> > I didn't test your patch but I guess I can make worst case scenario.
> > 
> > * make every class fragmented under 40%
> > * On the 40% boundary, repeated alloc/free of every class so every free
> >   can schedule work if it was not scheduled.
> > * Although class fragment is too high, it's not a problem if the class
> >   consumes small amount of memory.
> 
> hm, in this scenario both solutions are less than perfect. we jump
> X times over 40% margin, we have X*NR_CLASS compaction scans in the
> end. the difference is that we queue less works, yes, but we don't
> have to use workqueue in the first place; compaction can be done
> asynchronously by a pool's dedicated kthread. so we will just
> wake_up() the process.

Hmm, kthread is over-engineered to me. If we want to create new kthread
in the system, I guess we should persuade many people to merge in.
Surely, we should have why it couldn't be done by others(e.g., workqueue).

I think your workqueue approach is good to me.
Only problem I can see with it is we cannot start compaction when
we want instantly so my conclusion is we need both direct and
background compaction.

For shrinker and user-space trigger knob, we could compact in that context
while we could queue background job to compact in zs_free.

> 
> > I guess it can make degradation if I try to test on zsmalloc
> > microbenchmark.
> > 
> > As well, although I don't know workqueue internal well, thesedays,
> > I saw a few of mails related to workqueue(maybe, vmstat) and it had
> > some trouble if system memory pressure is heavy IIRC.
> 
> yes, you are right. wq provides WQ_MEM_RECLAIM bit for this
> case -- a special kthread that it will wake up to process works.
> 
> > My approach is as follows, for exmaple.
> >
> > Let's make a global ratio. Let's say it's 4M.
> 
> ok. should it depend on pool size?  min(20% of pool_size, XXMB)?

Maybe, that could be a knob but need to think more what should be
default. In this moment, clear thing is that we should prevent
frequent ping-pong background compaction as repeated alloc/free
with dancing on threshold boundary.

> 
> > If zs_free(or something) realizes current fragment is over 4M,
> > kick compacion backgroud job.
> 
> yes, zs_free() is the only place that introduces fragmentation.
> 
> > The job scans from highest to lower class and compact zspages
> > in each size_class until it meets high watermark(e.g, 4M + 4M /2 =
> > 6M fragment ratio).
> 
> ok.
> 
> > And in the middle of background compaction, if we find it's too
> > many scan(e.g., 256 zspages or somethings), just bail out the
> > job for the latency and reschedule it for next time. At the next
> > time, we can continue from the last size class.
> 
> ok. I'd probably prefer more simple rules here:
> -- bail out because it has compacted XXMB
>    so the fragmentation ratio is *expected* to be below the watermark

Need high watermark to stop compaction.
It will prevent frequent background compaction triggering.


> -- nothing to scan anymore
>    compaction is executed concurrently with zs_free()/zs_malloc()
>    calls, it's harder to control/guarantee some global state.
> 
> overall, no real objections. this approach can work, I think. need
> to test it.

Thanks, Sergey!

next prev parent reply	other threads:[~2016-03-15  6:16 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-03 14:45 [RFC][PATCH v3 0/5] mm/zsmalloc: rework compaction and increase density Sergey Senozhatsky
2016-03-03 14:45 ` Sergey Senozhatsky
2016-03-03 14:45 ` [RFC][PATCH v3 1/5] mm/zsmalloc: introduce class auto-compaction Sergey Senozhatsky
2016-03-03 14:45   ` Sergey Senozhatsky
2016-03-14  6:17   ` Minchan Kim
2016-03-14  6:17     ` Minchan Kim
2016-03-14  7:41     ` Sergey Senozhatsky
2016-03-14  7:41       ` Sergey Senozhatsky
2016-03-14  8:20       ` Sergey Senozhatsky
2016-03-14  8:20         ` Sergey Senozhatsky
2016-03-15  0:46       ` Minchan Kim
2016-03-15  0:46         ` Minchan Kim
2016-03-15  1:33         ` Sergey Senozhatsky
2016-03-15  1:33           ` Sergey Senozhatsky
2016-03-15  6:17           ` Minchan Kim [this message]
2016-03-15  6:17             ` Minchan Kim
2016-03-17  1:29             ` Sergey Senozhatsky
2016-03-17  1:29               ` Sergey Senozhatsky
2016-03-18  1:17               ` Minchan Kim
2016-03-18  1:17                 ` Minchan Kim
2016-03-18  2:00                 ` Sergey Senozhatsky
2016-03-18  2:00                   ` Sergey Senozhatsky
2016-03-18  4:03                   ` Minchan Kim
2016-03-18  4:03                     ` Minchan Kim
2016-03-18  4:10                     ` Sergey Senozhatsky
2016-03-18  4:10                       ` Sergey Senozhatsky
2016-03-03 14:46 ` [RFC][PATCH v3 2/5] mm/zsmalloc: remove shrinker compaction callbacks Sergey Senozhatsky
2016-03-03 14:46   ` Sergey Senozhatsky
2016-03-14  6:32   ` Minchan Kim
2016-03-14  6:32     ` Minchan Kim
2016-03-14  7:45     ` Sergey Senozhatsky
2016-03-14  7:45       ` Sergey Senozhatsky
2016-03-15  0:52       ` Minchan Kim
2016-03-15  0:52         ` Minchan Kim
2016-03-15  1:05         ` Sergey Senozhatsky
2016-03-15  1:05           ` Sergey Senozhatsky
2016-03-15  2:19           ` Minchan Kim
2016-03-15  2:19             ` Minchan Kim
2016-03-03 14:46 ` [RFC][PATCH v3 3/5] mm/zsmalloc: introduce zs_huge_object() Sergey Senozhatsky
2016-03-03 14:46   ` Sergey Senozhatsky
2016-03-14  6:53   ` Minchan Kim
2016-03-14  6:53     ` Minchan Kim
2016-03-14  8:08     ` Sergey Senozhatsky
2016-03-14  8:08       ` Sergey Senozhatsky
2016-03-15  0:54       ` Minchan Kim
2016-03-15  0:54         ` Minchan Kim
2016-03-03 14:46 ` [RFC][PATCH v3 4/5] zram: use zs_huge_object() Sergey Senozhatsky
2016-03-03 14:46   ` Sergey Senozhatsky
2016-03-03 14:46 ` [RFC][PATCH v3 5/5] mm/zsmalloc: reduce the number of huge classes Sergey Senozhatsky
2016-03-03 14:46   ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160315061723.GB25154@bbox \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=js1304@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.