From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Subject: Re: [RFC][PATCH 07/10] zsmalloc: introduce auto-compact support
Date: Thu, 4 Jun 2015 14:30:56 +0900
Message-ID: <20150604053056.GA662@swordfish>
In-Reply-To: <20150604045725.GI2241@blaptop>

On (06/04/15 13:57), Minchan Kim wrote:
> On Sat, May 30, 2015 at 12:05:25AM +0900, Sergey Senozhatsky wrote:
> > perform class compaction in zs_free(), if zs_free() has created
> > a ZS_ALMOST_EMPTY page. this is the most trivial `policy'.
> 
> Finally, I realized your intention.
> 
> Actually, I had a plan to add /sys/block/zram0/compact_threshold_ratio,
> which would compact automatically when compr_data_size/mem_used_total
> drops below the threshold, but I didn't try it because it could be done
> by a userspace tool.
> 
> Another reason I didn't try that approach is that, in some corner cases,
> it could scan all of the zs_objects repeatedly without freeing any zspage.
> That could be a big overhead, which we should prevent, so we might add
> some heuristic. For example, we could skip a few compaction attempts
> after several previous attempts have all failed.

this is why I use zs_can_compact() -- to bail out of zs_compact() as soon
as possible, so useless scans are minimized (or at least that is the
expectation). I'm also thinking of a threshold-based solution -- do class
auto-compaction only if we can free X pages, for example.
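
A minimal sketch of that threshold idea, written as a user-space model
and not as the kernel code. The struct class_stats type, its fields and
both function names are hypothetical and exist only for illustration;
the estimate assumes that only whole zspages can be released.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical stand-in for per-class statistics. */
struct class_stats {
	unsigned long obj_allocated;     /* object slots in all zspages of the class */
	unsigned long obj_used;          /* slots that currently hold an object */
	unsigned long maxobj_per_zspage; /* object slots per zspage */
	unsigned long pages_per_zspage;  /* physical pages per zspage */
};

/* Estimate how many pages compaction could release for this class. */
static unsigned long estimate_freeable_pages(const struct class_stats *cs)
{
	unsigned long unused, zspages;

	assert(cs && cs->maxobj_per_zspage > 0);
	if (cs->obj_allocated <= cs->obj_used)
		return 0;

	unused = cs->obj_allocated - cs->obj_used;
	/* Only whole zspages can be released; the remainder stays as holes. */
	zspages = unused / cs->maxobj_per_zspage;
	return zspages * cs->pages_per_zspage;
}

/* Hypothetical policy: auto-compact only if at least min_pages can be freed. */
static bool worth_compacting(const struct class_stats *cs,
			     unsigned long min_pages)
{
	return estimate_freeable_pages(cs) >= min_pages;
}
```

With 100 allocated slots, 70 used, 10 slots per zspage and 2 pages per
zspage, the estimate is 3 zspages, i.e. 6 pages, so a limit of 4 pages
would allow auto-compaction for that class.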

the problem of compaction is that there is no compaction until you trigger
it.
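
For comparison, the user-space trigger discussed above (compact when
compr_data_size/mem_used_total falls below a threshold) could look
roughly like the sketch below. It assumes the 7-column mm_stat layout
shown in this thread, with compr_data_size and mem_used_total as the
second and third columns; struct mm_stat, parse_mm_stat() and
should_compact() are hypothetical names, and the threshold value is up
to the tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* The mm_stat columns this sketch needs (assumed column order). */
struct mm_stat {
	unsigned long long orig_data_size;
	unsigned long long compr_data_size;
	unsigned long long mem_used_total;
};

/* Parse the first three columns of one mm_stat line; true on success. */
static bool parse_mm_stat(const char *line, struct mm_stat *st)
{
	assert(line && st);
	return sscanf(line, "%llu %llu %llu", &st->orig_data_size,
		      &st->compr_data_size, &st->mem_used_total) == 3;
}

/*
 * Hypothetical policy: compact when compr_data_size / mem_used_total
 * is below threshold_pct percent. Integer math avoids floating point.
 */
static bool should_compact(const struct mm_stat *st,
			   unsigned int threshold_pct)
{
	if (st->mem_used_total == 0)
		return false;
	return st->compr_data_size * 100 <
	       st->mem_used_total * threshold_pct;
}
```

A tool built on this would read /sys/block/zram0/mm_stat periodically
and write 1 to /sys/block/zram0/compact whenever should_compact()
returns true.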

and fragmented classes are not necessarily a win. if writes don't happen
to a fragmented class-X (and we basically can't tell if they will, nor can
we estimate it; it's up to I/O and data patterns, compression algorithm,
etc.) then class-X stays fragmented w/o any use.


> The simple design of mm/compaction.c was meant to prevent pointless
> overhead, but historically it has caused pain several times and required
> more complicated logic, and it's still painful.
> 
> Another thing I found recently is that an unfragmented zsmalloc is not
> always a win for zram. Fragmented space is wasted at the moment, but it
> can be used to store upcoming compressed objects. If frequent compaction
> leaves no holes (i.e., fragmented space), zsmalloc has to allocate a new
> zspage. On a system under high memory pressure, that nonmovable
> allocation can fall back to a movable pageblock, which accelerates
> fragmentation of system memory.

yes, but compaction almost always leaves classes fragmented. I think
it's a corner case when the number of unused allocated objects is
exactly the same as the number of objects that we migrated, and the
number of migrated objects is exactly N*maxobj_per_zspage, so that we
leave the class w/o any unused objects (OBJ_ALLOCATED == OBJ_USED).
in all other cases classes have 'holes' after compaction.
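
The arithmetic behind that corner case can be illustrated with a small
model. It assumes an idealized compaction that releases every zspage it
can empty; holes_after_compaction() is a hypothetical helper, not a
zsmalloc function.

```c
#include <assert.h>

/*
 * Unused object slots left in a class after an idealized compaction.
 * Whole zspages worth of unused slots are released; the remainder
 * stays behind as holes.
 */
static unsigned long holes_after_compaction(unsigned long obj_allocated,
					    unsigned long obj_used,
					    unsigned long maxobj_per_zspage)
{
	assert(maxobj_per_zspage > 0 && obj_allocated >= obj_used);
	return (obj_allocated - obj_used) % maxobj_per_zspage;
}
```

With 10 slots per zspage, 100 allocated and 73 used, compaction can
release 2 zspages and 7 holes remain. Only when the unused count is an
exact multiple of maxobj_per_zspage (e.g. 100 allocated, 70 used) does
the class end up with OBJ_ALLOCATED == OBJ_USED.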


> So, I want to pass the policy to userspace.
> If we find it's really troublesome in userspace, then we need to think
> more.

well, it can be put under an "aggressive compaction" or "automatic
compaction" config option.

	-ss

> Thanks.
> 
> > 
> > probably it would make sense for zs_can_compact() to return an
> > estimated number of pages that can potentially be freed, and to
> > trigger auto-compaction only when it's above some limit (e.g. at
> > least 4 zs pages); or to put it under a config option.
> > 
> > this also tweaks __zs_compact() -- we can't reschedule anymore
> > to wait for new pages in the current class. so we compact as
> > much as we can and return immediately if compaction is not
> > possible anymore.
> > 
> > auto-compaction is not a replacement for manual compaction.
> > 
> > compiled linux kernel with auto-compaction:
> > 
> > cat /sys/block/zram0/mm_stat
> > 2339885056 1601034235 1624076288        0 1624076288    19961     1106
> > 
> > performing additional manual compaction:
> > 
> > echo 1 > /sys/block/zram0/compact
> > cat /sys/block/zram0/mm_stat
> > 2339885056 1601034235 1624051712        0 1624076288    19961     1114
> > 
> > manual compaction was able to migrate an additional 8 objects, so
> > auto-compaction is 'good enough'.
> > 
> > TEST
> > 
> > this test copies a 1.3G linux kernel tar to a mounted zram disk
> > and extracts it.
> > 
> > w/auto-compaction:
> > 
> > cat /sys/block/zram0/mm_stat
> >  1171456    26006    86016        0    86016    32781        0
> > 
> > time tar xf linux-3.10.tar.gz -C linux
> > 
> > real    0m16.970s
> > user    0m15.247s
> > sys     0m8.477s
> > 
> > du -sh linux
> > 2.0G    linux
> > 
> > cat /sys/block/zram0/mm_stat
> > 3547353088 2993384270 3011088384        0 3011088384    24310      108
> > 
> > =====================================================================
> > 
> > w/o auto compaction:
> > 
> > cat /sys/block/zram0/mm_stat
> >  1171456    26000    81920        0    81920    32781        0
> > 
> > time tar xf linux-3.10.tar.gz -C linux
> > 
> > real    0m16.983s
> > user    0m15.267s
> > sys     0m8.417s
> > 
> > du -sh linux
> > 2.0G    linux
> > 
> > cat /sys/block/zram0/mm_stat
> > 3548917760 2993566924 3011317760        0 3011317760    23928        0
> > 
> > =====================================================================
> > 
> > iozone shows that auto-compacted code runs faster in several
> > tests, which is hardly trustworthy. anyway.
> > 
> > iozone -t 3 -R -r 16K -s 60M -I +Z
> > 
> >        test           base       auto-compact (compacted 66123 objs)
> >    Initial write   1603682.25          1645112.38
> >          Rewrite   2502243.31          2256570.31
> >             Read   7040860.00          7130575.00
> >          Re-read   7036490.75          7066744.25
> >     Reverse Read   6617115.25          6155395.50
> >      Stride read   6705085.50          6350030.38
> >      Random read   6668497.75          6350129.38
> >   Mixed workload   5494030.38          5091669.62
> >     Random write   2526834.44          2500977.81
> >           Pwrite   1656874.00          1663796.94
> >            Pread   3322818.91          3359683.44
> >           Fwrite   4090124.25          4099773.88
> >            Fread   10358916.25         10324409.75
> > 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > ---
> >  mm/zsmalloc.c | 25 +++++++++++++------------
> >  1 file changed, 13 insertions(+), 12 deletions(-)
> > 
> > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > index c2a640a..70bf481 100644
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -1515,34 +1515,28 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
> >  
> >  		while ((dst_page = isolate_target_page(class))) {
> >  			cc.d_page = dst_page;
> > -			/*
> > -			 * If there is no more space in dst_page, resched
> > -			 * and see if anyone had allocated another zspage.
> > -			 */
> > +
> >  			if (!migrate_zspage(pool, class, &cc))
> > -				break;
> > +				goto out;
> >  
> >  			putback_zspage(pool, class, dst_page);
> >  		}
> >  
> > -		/* Stop if we couldn't find slot */
> > -		if (dst_page == NULL)
> > +		if (!dst_page)
> >  			break;
> > -
> >  		putback_zspage(pool, class, dst_page);
> >  		putback_zspage(pool, class, src_page);
> > -		spin_unlock(&class->lock);
> > -		cond_resched();
> > -		spin_lock(&class->lock);
> >  	}
> >  
> > +out:
> > +	if (dst_page)
> > +		putback_zspage(pool, class, dst_page);
> >  	if (src_page)
> >  		putback_zspage(pool, class, src_page);
> >  
> >  	spin_unlock(&class->lock);
> >  }
> >  
> > -
> >  unsigned long zs_get_total_pages(struct zs_pool *pool)
> >  {
> >  	return atomic_long_read(&pool->pages_allocated);
> > @@ -1741,6 +1735,13 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
> >  	unpin_tag(handle);
> >  
> >  	free_handle(pool, handle);
> > +
> > +	/*
> > +	 * actual fullness might have changed, __zs_compact() checks
> > +	 * if compaction makes sense
> > +	 */
> > +	if (fullness == ZS_ALMOST_EMPTY)
> > +		__zs_compact(pool, class);
> >  }
> >  EXPORT_SYMBOL_GPL(zs_free);
> >  
> > -- 
> > 2.4.2.337.gfae46aa
> > 
> 
> -- 
> Kind regards,
> Minchan Kim
> 

