From: Minchan Kim <minchan@kernel.org>
To: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Joonsoo Kim <js1304@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Subject: Re: [RFC][PATCH v3 1/5] mm/zsmalloc: introduce class auto-compaction
Date: Mon, 14 Mar 2016 15:17:59 +0900
Message-ID: <20160314061759.GC10675@bbox>
In-Reply-To: <1457016363-11339-2-git-send-email-sergey.senozhatsky@gmail.com>
Hey Sergey,
Sorry for the late review.
On Thu, Mar 03, 2016 at 11:45:59PM +0900, Sergey Senozhatsky wrote:
> zsmalloc classes are known to be affected by internal fragmentation.
>
> For example, /sys/kernel/debug/zsmalloc/zramX/classes
> class  size  almost_full  almost_empty  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable
>    54   896            1            12            117        57          26                 2        12
>   ...
>   107  1744            1            23            196        76          84                 3        51
>   111  1808            0             0             63        63          28                 4         0
>   126  2048            0           160            568       408         284                 1        80
>   144  2336           52           620           8631      5747        4932                 4      1648
>   151  2448          123           406          10090      8736        6054                 3       810
>   168  2720            0           512          15738     14926       10492                 2       540
>   190  3072            0             2            136       130         102                 3         3
>   ...
>
> demonstrates that class-896 has 12/26 = 46% unused pages, class-2336 has
> 1648/4932 = 33% unused pages, etc. And the more classes we have as
> 'normal' classes (more than one object per zspage), the bigger this
> problem will grow. The existing compaction relies either on user space
> (the user can trigger compaction via zram's `compact' sysfs attr) or on
> the shrinker; it does not happen automatically.
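> For example, a manual pass can be triggered with:
>
>   echo 1 > /sys/block/zram0/compact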
>
> This patch introduces a 'watermark' value of unused pages and schedules
> compaction work on a per-class basis once a class's fragmentation becomes
> too big. Compaction is thus performed not in the context of the current
> I/O operation, but later, from a workqueue worker.
>
> The current watermark is set to 40%: if a class has 40+% of `freeable'
> pages, compaction work will be scheduled. For instance, class-896 above,
> with 12/26 = 46% freeable pages, would be queued for compaction.
Could you explain why you selected a per-class watermark?
My plan was to kick background work based on total fragmented memory
(i.e., pool-wide used_pages/allocated_pages below some threshold).
IOW, if used_pages/allocated_pages is less than some ratio, we kick a
background job, marking the index of the size class we just freed from,
and the job then scans size classes circularly from that index. As well,
we should put an upper bound on the number of zspages to scan, to make
the work deterministic.
What do you think about it?
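Just to illustrate, a minimal sketch of what I have in mind (all of
zs_pool_watermark_ok(), pool->compact_index, pool->compact_work and the
60%/128 numbers are made up for this sketch, and locking is elided):

#define COMPACT_SCAN_LIMIT	128	/* upper bound on background scan work */

static bool zs_pool_watermark_ok(struct zs_pool *pool, int class_idx)
{
	unsigned long allocated = 0, used = 0;
	int i;

	/* pool-wide live-object ratio instead of a per-class one */
	for (i = 0; i < zs_size_classes; i++) {
		struct size_class *class = pool->size_class[i];

		if (!class || class->index != i)
			continue;
		allocated += zs_stat_get(class, OBJ_ALLOCATED);
		used += zs_stat_get(class, OBJ_USED);
	}

	if (!allocated)
		return true;

	if (100 * used / allocated < 60) {
		/* remember the class we just freed from */
		pool->compact_index = class_idx;
		return false;
	}
	return true;
}

static void pool_compaction_work(struct work_struct *work)
{
	struct zs_pool *pool = container_of(work, struct zs_pool,
					    compact_work);
	unsigned long scanned = 0;
	int i;

	/* scan size classes circularly, starting from compact_index */
	for (i = 0; i < zs_size_classes; i++) {
		int idx = (pool->compact_index + i) % zs_size_classes;
		struct size_class *class = pool->size_class[idx];

		if (!class || class->index != idx)
			continue;
		/* zs_can_compact() as a rough proxy for the work done */
		scanned += zs_can_compact(class);
		if (scanned > COMPACT_SCAN_LIMIT)
			break;
		__zs_compact(pool, class);
	}
}

So zs_free() would queue the single pool->compact_work when
zs_pool_watermark_ok() fails, rather than a per-class work.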
>
> TEST
> ====
>
> 2G zram, ext4, lzo
>
> iozone -t 1 -R -r 64K -s 1200M -I +Z
>
>                             BASE        PATCHED
> "  Initial write "     959670.94      966724.62
> "        Rewrite "    1276167.62     1237632.88
> "           Read "    3334708.25     3345357.50
> "        Re-read "    3405310.75     3337137.25
> "   Reverse Read "    3284499.75     3241283.50
> "    Stride read "    3293417.75     3268364.00
> "    Random read "    3255253.50     3241685.00
> " Mixed workload "    3274398.00     3231498.00
> "   Random write "    1253207.50     1216247.00
> "         Pwrite "     873682.25      877045.81
> "          Pread "    3173266.00     3318471.75
> "         Fwrite "     881278.38      897622.81
> "          Fread "    4397147.00     4501131.50
>
> iozone -t 3 -R -r 64K -s 60M -I +Z
>
>                             BASE        PATCHED
> "  Initial write "    1855931.62     1869576.31
> "        Rewrite "    2223531.06     2221543.62
> "           Read "    7958435.75     8023044.75
> "        Re-read "    7912776.75     8068961.00
> "   Reverse Read "    7832227.50     7788237.50
> "    Stride read "    7952113.50     7919778.00
> "    Random read "    7908816.00     7881792.50
> " Mixed workload "    6364520.38     6332493.94
> "   Random write "    2230115.69     2176777.19
> "         Pwrite "    1915939.31     1929464.75
> "          Pread "    3857052.91     3840517.91
> "         Fwrite "    2271730.44     2272800.31
> "          Fread "    9053867.00     8880966.25
>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
> mm/zsmalloc.c | 37 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 37 insertions(+)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index e72efb1..a4ef7e7 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -219,6 +219,10 @@ struct size_class {
> int pages_per_zspage;
> /* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
> bool huge;
> +
> + bool compact_scheduled;
> + struct zs_pool *pool;
> + struct work_struct compact_work;
> };
>
> /*
> @@ -1467,6 +1471,8 @@ static void obj_free(struct zs_pool *pool, struct size_class *class,
> zs_stat_dec(class, OBJ_USED, 1);
> }
>
> +static bool class_watermark_ok(struct size_class *class);
> +
> void zs_free(struct zs_pool *pool, unsigned long handle)
> {
> struct page *first_page, *f_page;
> @@ -1495,6 +1501,11 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
> atomic_long_sub(class->pages_per_zspage,
> &pool->pages_allocated);
> free_zspage(first_page);
> + } else {
> + if (!class_watermark_ok(class) && !class->compact_scheduled) {
> + queue_work(system_long_wq, &class->compact_work);
> + class->compact_scheduled = true;
> + }
> }
> spin_unlock(&class->lock);
> unpin_tag(handle);
> @@ -1745,6 +1756,19 @@ static unsigned long zs_can_compact(struct size_class *class)
> return obj_wasted * class->pages_per_zspage;
> }
>
> +static bool class_watermark_ok(struct size_class *class)
> +{
> + unsigned long pages_used = zs_stat_get(class, OBJ_ALLOCATED);
> +
> + pages_used /= get_maxobj_per_zspage(class->size,
> + class->pages_per_zspage) * class->pages_per_zspage;
> +
> + if (!pages_used)
> + return true;
> +
> + return (100 * zs_can_compact(class) / pages_used) < 40;
> +}
> +
> static void __zs_compact(struct zs_pool *pool, struct size_class *class)
> {
> struct zs_compact_control cc;
> @@ -1789,9 +1813,17 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
> if (src_page)
> putback_zspage(pool, class, src_page);
>
> + class->compact_scheduled = false;
> spin_unlock(&class->lock);
> }
>
> +static void class_compaction_work(struct work_struct *work)
> +{
> + struct size_class *class = container_of(work, struct size_class, compact_work);
> +
> + __zs_compact(class->pool, class);
> +}
> +
> unsigned long zs_compact(struct zs_pool *pool)
> {
> int i;
> @@ -1948,6 +1980,9 @@ struct zs_pool *zs_create_pool(const char *name, gfp_t flags)
> if (pages_per_zspage == 1 &&
> get_maxobj_per_zspage(size, pages_per_zspage) == 1)
> class->huge = true;
> +
> + INIT_WORK(&class->compact_work, class_compaction_work);
> + class->pool = pool;
> spin_lock_init(&class->lock);
> pool->size_class[i] = class;
>
> @@ -1990,6 +2025,8 @@ void zs_destroy_pool(struct zs_pool *pool)
> if (class->index != i)
> continue;
>
> + cancel_work_sync(&class->compact_work);
> +
> for (fg = 0; fg < _ZS_NR_FULLNESS_GROUPS; fg++) {
> if (class->fullness_list[fg]) {
> pr_info("Freeing non-empty class with size %db, fullness group %d\n",
> --
> 2.8.0.rc0
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
Thread overview: 25+ messages
2016-03-03 14:45 [RFC][PATCH v3 0/5] mm/zsmalloc: rework compaction and increase density Sergey Senozhatsky
2016-03-03 14:45 ` [RFC][PATCH v3 1/5] mm/zsmalloc: introduce class auto-compaction Sergey Senozhatsky
2016-03-14 6:17 ` Minchan Kim [this message]
2016-03-14 7:41 ` Sergey Senozhatsky
2016-03-14 8:20 ` Sergey Senozhatsky
2016-03-15 0:46 ` Minchan Kim
2016-03-15 1:33 ` Sergey Senozhatsky
2016-03-15 6:17 ` Minchan Kim
2016-03-17 1:29 ` Sergey Senozhatsky
2016-03-18 1:17 ` Minchan Kim
2016-03-18 2:00 ` Sergey Senozhatsky
2016-03-18 4:03 ` Minchan Kim
2016-03-18 4:10 ` Sergey Senozhatsky
2016-03-03 14:46 ` [RFC][PATCH v3 2/5] mm/zsmalloc: remove shrinker compaction callbacks Sergey Senozhatsky
2016-03-14 6:32 ` Minchan Kim
2016-03-14 7:45 ` Sergey Senozhatsky
2016-03-15 0:52 ` Minchan Kim
2016-03-15 1:05 ` Sergey Senozhatsky
2016-03-15 2:19 ` Minchan Kim
2016-03-03 14:46 ` [RFC][PATCH v3 3/5] mm/zsmalloc: introduce zs_huge_object() Sergey Senozhatsky
2016-03-14 6:53 ` Minchan Kim
2016-03-14 8:08 ` Sergey Senozhatsky
2016-03-15 0:54 ` Minchan Kim
2016-03-03 14:46 ` [RFC][PATCH v3 4/5] zram: use zs_huge_object() Sergey Senozhatsky
2016-03-03 14:46 ` [RFC][PATCH v3 5/5] mm/zsmalloc: reduce the number of huge classes Sergey Senozhatsky