Re: zsmalloc limitations and related topics

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ric Mason <ric.masonn@gmail.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: minchan@kernel.org, sjenning@linux.vnet.ibm.com,
	Nitin Gupta <nitingupta910@gmail.com>,
	Konrad Wilk <konrad.wilk@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Bob Liu <lliubbo@gmail.com>,
	Luigi Semenzato <semenzato@google.com>,
	Mel Gorman <mgorman@suse.de>
Subject: Re: zsmalloc limitations and related topics
Date: Fri, 01 Mar 2013 09:40:18 +0800	[thread overview]
Message-ID: <51300702.1050006@gmail.com> (raw)
In-Reply-To: <0efe9610-1aa5-4aa9-bde9-227acfa969ca@default>

On 02/28/2013 07:24 AM, Dan Magenheimer wrote:
> Hi all --
>
> I've been doing some experimentation on zsmalloc in preparation
> for my topic proposed for LSFMM13 and have run across some
> perplexing limitations.  Those familiar with the intimate details
> of zsmalloc might be well aware of these limitations, but they
> aren't documented or immediately obvious, so I thought it would
> be worthwhile to air them publicly.  I've also included some
> measurements from the experimentation and some related thoughts.
>
> (Some of the terms here are unusual and may be used inconsistently
> by different developers so a glossary of definitions of the terms
> used here is appended.)
>
> ZSMALLOC LIMITATIONS
>
> Zsmalloc is used for two zprojects: zram and the out-of-tree
> zswap.  Zsmalloc can achieve high density when "full".  But:
>
> 1) Zsmalloc has a worst-case density of 0.25 (one zpage per
>     four pageframes).
> 2) When not full and especially when nearly-empty _after_
>     being full, density may fall below 1.0 as a result of
>     fragmentation.

What's the meaning of nearly-empty _after_ being full?

> 3) Zsmalloc has a density of exactly 1.0 for any number of
>     zpages with zsize >= 0.8.
> 4) Zsmalloc contains several compile-time parameters;
>     the best value of these parameters may be very workload
>     dependent.
>
> If density == 1.0, that means we are paying the overhead of
> compression+decompression for no space advantage.  If
> density < 1.0, that means using zsmalloc is detrimental,
> resulting in worse memory pressure than if it were not used.
>
> WORKLOAD ANALYSIS
>
> These limitations emphasize that the workload used to evaluate
> zsmalloc is very important.  Benchmarks that measure data

Could you share your benchmark? In order that other guys can take 
advantage of it.

> throughput or CPU utilization are of questionable value because
> it is the _content_ of the data that is particularly relevant
> for compression.  Even more precisely, it is the "entropy"
> of the data that is relevant, because the amount of
> compressibility in the data is related to the entropy:
> I.e. an entirely random pagefull of bits will compress poorly
> and a highly-regular pagefull of bits will compress well.
> Since the zprojects manage a large number of zpages, both
> the mean and distribution of zsize of the workload should
> be "representative".
>
> The workload most widely used to publish results for
> the various zprojects is a kernel-compile using "make -jN"
> where N is artificially increased to impose memory pressure.
> By adding some debug code to zswap, I was able to analyze
> this workload and found the following:
>
> 1) The average page compressed by almost a factor of six
>     (mean zsize == 694, stddev == 474)

stddev is what?

> 2) Almost eleven percent of the pages were zero pages.  A
>     zero page compresses to 28 bytes.
> 3) On average, 77% of the bytes (3156) in the pages-to-be-
>     compressed contained a byte-value of zero.
> 4) Despite the above, mean density of zsmalloc was measured at
>     3.2 zpages/pageframe, presumably losing nearly half of
>     available space to fragmentation.
>
> I have no clue if these measurements are representative
> of a wide range of workloads over the lifetime of a booted
> machine, but I am suspicious that they are not.  For example,
> the lzo1x compression algorithm claims to compress data by
> about a factor of two.
>
> I would welcome ideas on how to evaluate workloads for
> "representativeness".  Personally I don't believe we should
> be making decisions about selecting the "best" algorithms
> or merging code without an agreement on workloads.
>
> PAGEFRAME EVACUATION AND RECLAIM
>
> I've repeatedly stated the opinion that managing the number of
> pageframes containing compressed pages will be valuable for
> managing MM interaction/policy when compression is used in
> the kernel.  After the experimentation above and some brainstorming,
> I still do not see an effective method for zsmalloc evacuating and
> reclaiming pageframes, because both are complicated by high density
> and page-crossing.  In other words, zsmalloc's strengths may
> also be its Achilles heels.  For zram, as far as I can see,
> pageframe evacuation/reclaim is irrelevant except perhaps
> as part of mass defragmentation.  For zcache and zswap, where
> writethrough is used, pageframe evacuation/reclaim is very relevant.
> (Note: The writeback implemented in zswap does _zpage_ evacuation
> without pageframe reclaim.)
>
> CLOSING THOUGHT
>
> Since zsmalloc and zbud have different strengths and weaknesses,
> I wonder if some combination or hybrid might be more optimal?
> But unless/until we have and can measure a representative workload,
> only intuition can answer that.
>
> GLOSSARY
>
> zproject -- a kernel project using compression (zram, zcache, zswap)
> zpage -- a compressed sequence of PAGE_SIZE bytes
> zsize -- the number of bytes in a compressed page
> pageframe -- the term "page" is widely used both to describe
>      either (1) PAGE_SIZE bytes of data, or (2) a physical RAM
>      area with size=PAGE_SIZE which is PAGE_SIZE-aligned,
>      as represented in the kernel by a struct page.  To be explicit,
>      we refer to (2) as a pageframe.
> density -- zpages per pageframe; higher is (presumably) better
> zsmalloc -- a slab-based allocator written by Nitin Gupta to
>       efficiently store zpages and designed to allow zpages
>       to be split across two non-contiguous pageframes
> zspage -- a grouping of N non-contiguous pageframes managed
>       as a unit by zsmalloc to store zpages for which zsize
>       falls within a certain range.  (The compile-time
>       default maximum size for N is 4).
> zbud -- a buddy-based allocator written by Dan Magenheimer
>       (specifically for zcache) to predictably store zpages;
>       no more than two zpages are stored in any pageframe
> pageframe evacuation/reclaim -- the process of removing
>       zpages from one or more pageframes, including pointers/nodes
>       from any data structures referencing those zpages,
>       so that the pageframe(s) can be freed for use by
>       the rest of the kernel
> writeback --  the process of transferring zpages from
>       storage in a zproject to a backing swap device
> lzo1x -- a compression algorithm used by default by all the
>       zprojects; the kernel implementation resides in lib/lzo.c
> entropy -- randomness of data to be compressed; higher entropy
>       means worse data compression
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=ilto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Ric Mason <ric.masonn@gmail.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: minchan@kernel.org, sjenning@linux.vnet.ibm.com,
	Nitin Gupta <nitingupta910@gmail.com>,
	Konrad Wilk <konrad.wilk@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Bob Liu <lliubbo@gmail.com>,
	Luigi Semenzato <semenzato@google.com>,
	Mel Gorman <mgorman@suse.de>
Subject: Re: zsmalloc limitations and related topics
Date: Fri, 01 Mar 2013 09:40:18 +0800	[thread overview]
Message-ID: <51300702.1050006@gmail.com> (raw)
In-Reply-To: <0efe9610-1aa5-4aa9-bde9-227acfa969ca@default>

On 02/28/2013 07:24 AM, Dan Magenheimer wrote:
> Hi all --
>
> I've been doing some experimentation on zsmalloc in preparation
> for my topic proposed for LSFMM13 and have run across some
> perplexing limitations.  Those familiar with the intimate details
> of zsmalloc might be well aware of these limitations, but they
> aren't documented or immediately obvious, so I thought it would
> be worthwhile to air them publicly.  I've also included some
> measurements from the experimentation and some related thoughts.
>
> (Some of the terms here are unusual and may be used inconsistently
> by different developers so a glossary of definitions of the terms
> used here is appended.)
>
> ZSMALLOC LIMITATIONS
>
> Zsmalloc is used for two zprojects: zram and the out-of-tree
> zswap.  Zsmalloc can achieve high density when "full".  But:
>
> 1) Zsmalloc has a worst-case density of 0.25 (one zpage per
>     four pageframes).
> 2) When not full and especially when nearly-empty _after_
>     being full, density may fall below 1.0 as a result of
>     fragmentation.

What's the meaning of nearly-empty _after_ being full?

> 3) Zsmalloc has a density of exactly 1.0 for any number of
>     zpages with zsize >= 0.8.
> 4) Zsmalloc contains several compile-time parameters;
>     the best value of these parameters may be very workload
>     dependent.
>
> If density == 1.0, that means we are paying the overhead of
> compression+decompression for no space advantage.  If
> density < 1.0, that means using zsmalloc is detrimental,
> resulting in worse memory pressure than if it were not used.
>
> WORKLOAD ANALYSIS
>
> These limitations emphasize that the workload used to evaluate
> zsmalloc is very important.  Benchmarks that measure data

Could you share your benchmark? In order that other guys can take 
advantage of it.

> throughput or CPU utilization are of questionable value because
> it is the _content_ of the data that is particularly relevant
> for compression.  Even more precisely, it is the "entropy"
> of the data that is relevant, because the amount of
> compressibility in the data is related to the entropy:
> I.e. an entirely random pagefull of bits will compress poorly
> and a highly-regular pagefull of bits will compress well.
> Since the zprojects manage a large number of zpages, both
> the mean and distribution of zsize of the workload should
> be "representative".
>
> The workload most widely used to publish results for
> the various zprojects is a kernel-compile using "make -jN"
> where N is artificially increased to impose memory pressure.
> By adding some debug code to zswap, I was able to analyze
> this workload and found the following:
>
> 1) The average page compressed by almost a factor of six
>     (mean zsize == 694, stddev == 474)

stddev is what?

> 2) Almost eleven percent of the pages were zero pages.  A
>     zero page compresses to 28 bytes.
> 3) On average, 77% of the bytes (3156) in the pages-to-be-
>     compressed contained a byte-value of zero.
> 4) Despite the above, mean density of zsmalloc was measured at
>     3.2 zpages/pageframe, presumably losing nearly half of
>     available space to fragmentation.
>
> I have no clue if these measurements are representative
> of a wide range of workloads over the lifetime of a booted
> machine, but I am suspicious that they are not.  For example,
> the lzo1x compression algorithm claims to compress data by
> about a factor of two.
>
> I would welcome ideas on how to evaluate workloads for
> "representativeness".  Personally I don't believe we should
> be making decisions about selecting the "best" algorithms
> or merging code without an agreement on workloads.
>
> PAGEFRAME EVACUATION AND RECLAIM
>
> I've repeatedly stated the opinion that managing the number of
> pageframes containing compressed pages will be valuable for
> managing MM interaction/policy when compression is used in
> the kernel.  After the experimentation above and some brainstorming,
> I still do not see an effective method for zsmalloc evacuating and
> reclaiming pageframes, because both are complicated by high density
> and page-crossing.  In other words, zsmalloc's strengths may
> also be its Achilles heels.  For zram, as far as I can see,
> pageframe evacuation/reclaim is irrelevant except perhaps
> as part of mass defragmentation.  For zcache and zswap, where
> writethrough is used, pageframe evacuation/reclaim is very relevant.
> (Note: The writeback implemented in zswap does _zpage_ evacuation
> without pageframe reclaim.)
>
> CLOSING THOUGHT
>
> Since zsmalloc and zbud have different strengths and weaknesses,
> I wonder if some combination or hybrid might be more optimal?
> But unless/until we have and can measure a representative workload,
> only intuition can answer that.
>
> GLOSSARY
>
> zproject -- a kernel project using compression (zram, zcache, zswap)
> zpage -- a compressed sequence of PAGE_SIZE bytes
> zsize -- the number of bytes in a compressed page
> pageframe -- the term "page" is widely used both to describe
>      either (1) PAGE_SIZE bytes of data, or (2) a physical RAM
>      area with size=PAGE_SIZE which is PAGE_SIZE-aligned,
>      as represented in the kernel by a struct page.  To be explicit,
>      we refer to (2) as a pageframe.
> density -- zpages per pageframe; higher is (presumably) better
> zsmalloc -- a slab-based allocator written by Nitin Gupta to
>       efficiently store zpages and designed to allow zpages
>       to be split across two non-contiguous pageframes
> zspage -- a grouping of N non-contiguous pageframes managed
>       as a unit by zsmalloc to store zpages for which zsize
>       falls within a certain range.  (The compile-time
>       default maximum size for N is 4).
> zbud -- a buddy-based allocator written by Dan Magenheimer
>       (specifically for zcache) to predictably store zpages;
>       no more than two zpages are stored in any pageframe
> pageframe evacuation/reclaim -- the process of removing
>       zpages from one or more pageframes, including pointers/nodes
>       from any data structures referencing those zpages,
>       so that the pageframe(s) can be freed for use by
>       the rest of the kernel
> writeback --  the process of transferring zpages from
>       storage in a zproject to a backing swap device
> lzo1x -- a compression algorithm used by default by all the
>       zprojects; the kernel implementation resides in lib/lzo.c
> entropy -- randomness of data to be compressed; higher entropy
>       means worse data compression
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=ilto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2013-03-01  1:40 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-27 23:24 zsmalloc limitations and related topics Dan Magenheimer
2013-02-27 23:24 ` Dan Magenheimer
2013-02-28 22:00 ` Dan Magenheimer
2013-02-28 22:00   ` Dan Magenheimer
2013-03-01  1:40 ` Ric Mason [this message]
2013-03-01  1:40   ` Ric Mason
2013-03-04 18:29   ` Dan Magenheimer
2013-03-04 18:29     ` Dan Magenheimer
2013-03-13 15:14 ` Robert Jennings
2013-03-13 15:14   ` Robert Jennings
2013-03-13 15:33   ` Seth Jennings
2013-03-13 15:33     ` Seth Jennings
2013-03-13 15:56     ` Seth Jennings
2013-03-13 15:56       ` Seth Jennings
2013-03-13 20:02   ` Dan Magenheimer
2013-03-13 20:02     ` Dan Magenheimer
2013-03-13 22:59     ` Seth Jennings
2013-03-13 22:59       ` Seth Jennings
2013-03-14 12:02       ` Bob
2013-03-14 12:02         ` Bob
2013-03-14 13:20         ` Robert Jennings
2013-03-14 13:20           ` Robert Jennings
2013-03-14 18:54           ` Dan Magenheimer
2013-03-14 18:54             ` Dan Magenheimer
2013-03-15 16:14             ` Seth Jennings
2013-03-15 16:14               ` Seth Jennings
2013-03-15 16:54               ` Dan Magenheimer
2013-03-15 16:54                 ` Dan Magenheimer
2013-03-15 16:18             ` Seth Jennings
2013-03-15 16:18               ` Seth Jennings
2013-03-14 17:39       ` Dan Magenheimer
2013-03-14 17:39         ` Dan Magenheimer
2013-03-14 19:16     ` Dan Magenheimer
2013-03-14 19:16       ` Dan Magenheimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51300702.1050006@gmail.com \
    --to=ric.masonn@gmail.com \
    --cc=dan.magenheimer@oracle.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lliubbo@gmail.com \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=nitingupta910@gmail.com \
    --cc=semenzato@google.com \
    --cc=sjenning@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.