linux-mm.kvack.org archive mirror
From: Ric Mason <ric.masonn@gmail.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: minchan@kernel.org, sjenning@linux.vnet.ibm.com,
	Nitin Gupta <nitingupta910@gmail.com>,
	Konrad Wilk <konrad.wilk@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Bob Liu <lliubbo@gmail.com>,
	Luigi Semenzato <semenzato@google.com>,
	Mel Gorman <mgorman@suse.de>
Subject: Re: zsmalloc limitations and related topics
Date: Fri, 01 Mar 2013 09:40:18 +0800
Message-ID: <51300702.1050006@gmail.com>
In-Reply-To: <0efe9610-1aa5-4aa9-bde9-227acfa969ca@default>

On 02/28/2013 07:24 AM, Dan Magenheimer wrote:
> Hi all --
>
> I've been doing some experimentation on zsmalloc in preparation
> for my topic proposed for LSFMM13 and have run across some
> perplexing limitations.  Those familiar with the intimate details
> of zsmalloc might be well aware of these limitations, but they
> aren't documented or immediately obvious, so I thought it would
> be worthwhile to air them publicly.  I've also included some
> measurements from the experimentation and some related thoughts.
>
> (Some of the terms here are unusual and may be used inconsistently
> by different developers so a glossary of definitions of the terms
> used here is appended.)
>
> ZSMALLOC LIMITATIONS
>
> Zsmalloc is used for two zprojects: zram and the out-of-tree
> zswap.  Zsmalloc can achieve high density when "full".  But:
>
> 1) Zsmalloc has a worst-case density of 0.25 (one zpage per
>     four pageframes).
> 2) When not full and especially when nearly-empty _after_
>     being full, density may fall below 1.0 as a result of
>     fragmentation.

What's the meaning of nearly-empty _after_ being full?

> 3) Zsmalloc has a density of exactly 1.0 for any number of
>     zpages with zsize >= 0.8 * PAGE_SIZE.
> 4) Zsmalloc contains several compile-time parameters;
>     the best value of these parameters may be very workload
>     dependent.
>
> If density == 1.0, that means we are paying the overhead of
> compression+decompression for no space advantage.  If
> density < 1.0, that means using zsmalloc is detrimental,
> resulting in worse memory pressure than if it were not used.
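
The density arithmetic in the quoted list can be sketched in a few lines of Python. This only illustrates the glossary definition (density = zpages per pageframe); the function names and the three labels are assumptions made for the example, not anything taken from zsmalloc itself.

```python
def density(zpages: int, pageframes: int) -> float:
    """Density as defined in the glossary: zpages stored per pageframe used."""
    if pageframes <= 0:
        raise ValueError("pageframes must be positive")
    return zpages / pageframes


def verdict(d: float) -> str:
    """Hypothetical labels for the three cases described in the quoted text."""
    if d < 1.0:
        return "detrimental"   # more pageframes consumed than zpages stored
    if d == 1.0:
        return "no gain"       # compression cost paid for no space advantage
    return "beneficial"


# Worst case from item 1: a single zpage occupying a zspage of four pageframes.
worst = density(1, 4)
print(worst, verdict(worst))
```

The same helper applied to the measured 3.2 zpages/pageframe from the kernel-compile workload lands in the "beneficial" case.
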
>
> WORKLOAD ANALYSIS
>
> These limitations emphasize that the workload used to evaluate
> zsmalloc is very important.  Benchmarks that measure data

Could you share your benchmark, so that others can take advantage of it?

> throughput or CPU utilization are of questionable value because
> it is the _content_ of the data that is particularly relevant
> for compression.  Even more precisely, it is the "entropy"
> of the data that is relevant, because the amount of
> compressibility in the data is related to the entropy:
> I.e. an entirely random pagefull of bits will compress poorly
> and a highly-regular pagefull of bits will compress well.
> Since the zprojects manage a large number of zpages, both
> the mean and distribution of zsize of the workload should
> be "representative".
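
The entropy point above can be illustrated with a rough userspace sketch. It assumes a 4096-byte page and uses zlib purely as a stand-in compressor (the zprojects use lzo1x, which is not in the Python standard library), so the absolute sizes will differ from what zswap would report. Byte-level Shannon entropy is also only a crude proxy, since it ignores repeated multi-byte patterns.

```python
import math
import random
import zlib
from collections import Counter

PAGE_SIZE = 4096  # assumed page size


def byte_entropy(page: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 = constant, 8.0 = uniform)."""
    counts = Counter(page)
    total = len(page)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def zsize(page: bytes) -> int:
    """Compressed size in bytes; zlib is a stand-in, not the kernel's lzo1x."""
    return len(zlib.compress(page))


rng = random.Random(0)  # fixed seed so the run is repeatable
random_page = bytes(rng.getrandbits(8) for _ in range(PAGE_SIZE))
zero_page = bytes(PAGE_SIZE)

for name, page in (("random", random_page), ("zero", zero_page)):
    print(name, round(byte_entropy(page), 2), zsize(page))
```

Collecting `zsize` over every page of a real workload would give the mean and distribution that the quoted paragraph asks to be "representative".
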
>
> The workload most widely used to publish results for
> the various zprojects is a kernel-compile using "make -jN"
> where N is artificially increased to impose memory pressure.
> By adding some debug code to zswap, I was able to analyze
> this workload and found the following:
>
> 1) The average page compressed by almost a factor of six
>     (mean zsize == 694, stddev == 474)

What does stddev mean here?

> 2) Almost eleven percent of the pages were zero pages.  A
>     zero page compresses to 28 bytes.
> 3) On average, 77% of the bytes (3156) in the pages-to-be-
>     compressed contained a byte-value of zero.
> 4) Despite the above, mean density of zsmalloc was measured at
>     3.2 zpages/pageframe, presumably losing nearly half of
>     available space to fragmentation.
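
The figures in this list can be cross-checked with simple arithmetic. The sketch below assumes PAGE_SIZE is 4096 bytes (consistent with 3156 bytes being about 77% of a page) and treats "ideal" density as perfect packing with no fragmentation or metadata; the variable names are only for illustration.

```python
PAGE_SIZE = 4096          # assumed; consistent with 3156 bytes being about 77%
MEAN_ZSIZE = 694          # from item 1
ZERO_BYTES = 3156         # from item 3
MEASURED_DENSITY = 3.2    # from item 4

# With perfect packing, density equals the mean compression factor.
compression_factor = PAGE_SIZE / MEAN_ZSIZE
zero_fraction = ZERO_BYTES / PAGE_SIZE
lost_fraction = 1 - MEASURED_DENSITY / compression_factor

print(f"compression factor: {compression_factor:.2f}")
print(f"zero-byte fraction: {zero_fraction:.1%}")
print(f"space lost vs. ideal packing: {lost_fraction:.1%}")
```

The compression factor comes out at about 5.9 ("almost a factor of six"), and the measured density of 3.2 corresponds to roughly 46% of the ideal capacity being lost, which matches "nearly half".
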
>
> I have no clue if these measurements are representative
> of a wide range of workloads over the lifetime of a booted
> machine, but I am suspicious that they are not.  For example,
> the lzo1x compression algorithm claims to compress data by
> about a factor of two.
>
> I would welcome ideas on how to evaluate workloads for
> "representativeness".  Personally I don't believe we should
> be making decisions about selecting the "best" algorithms
> or merging code without an agreement on workloads.
>
> PAGEFRAME EVACUATION AND RECLAIM
>
> I've repeatedly stated the opinion that managing the number of
> pageframes containing compressed pages will be valuable for
> managing MM interaction/policy when compression is used in
> the kernel.  After the experimentation above and some brainstorming,
> I still do not see an effective method for zsmalloc evacuating and
> reclaiming pageframes, because both are complicated by high density
> and page-crossing.  In other words, zsmalloc's strengths may
> also be its Achilles heels.  For zram, as far as I can see,
> pageframe evacuation/reclaim is irrelevant except perhaps
> as part of mass defragmentation.  For zcache and zswap, where
> writethrough is used, pageframe evacuation/reclaim is very relevant.
> (Note: The writeback implemented in zswap does _zpage_ evacuation
> without pageframe reclaim.)
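
Why page-crossing complicates pageframe reclaim can be seen in a toy layout model. The sketch below assumes objects of one size class are packed back to back across a zspage of four 4096-byte pageframes; it ignores zsmalloc's real metadata and free lists, and `objects_touching` is a hypothetical helper, not a kernel function.

```python
PAGE_SIZE = 4096  # assumed page size


def objects_touching(frame: int, obj_size: int, frames_per_zspage: int = 4):
    """Return (index, crosses_boundary) for each object overlapping `frame`.

    Toy layout: objects of one size class packed back to back across the
    zspage; this ignores zsmalloc's real metadata and free lists.
    """
    capacity = frames_per_zspage * PAGE_SIZE
    lo, hi = frame * PAGE_SIZE, (frame + 1) * PAGE_SIZE
    result = []
    for i in range(capacity // obj_size):
        start, end = i * obj_size, (i + 1) * obj_size
        if start < hi and end > lo:
            # An object crosses a boundary if its first and last bytes
            # fall in different pageframes.
            crosses = start // PAGE_SIZE != (end - 1) // PAGE_SIZE
            result.append((i, crosses))
    return result


print(objects_touching(1, 3000))
print(objects_touching(1, 2048))
```

With 3000-byte objects, every object overlapping pageframe 1 also spills into a neighbouring pageframe, so evacuating that one pageframe disturbs data held in its neighbours as well. With 2048-byte objects nothing crosses, and the pageframe can be emptied in isolation.
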
>
> CLOSING THOUGHT
>
> Since zsmalloc and zbud have different strengths and weaknesses,
> I wonder if some combination or hybrid might be more optimal?
> But unless/until we have and can measure a representative workload,
> only intuition can answer that.
>
> GLOSSARY
>
> zproject -- a kernel project using compression (zram, zcache, zswap)
> zpage -- a compressed sequence of PAGE_SIZE bytes
> zsize -- the number of bytes in a compressed page
> pageframe -- the term "page" is widely used to describe
>      either (1) PAGE_SIZE bytes of data, or (2) a physical RAM
>      area with size=PAGE_SIZE which is PAGE_SIZE-aligned,
>      as represented in the kernel by a struct page.  To be explicit,
>      we refer to (2) as a pageframe.
> density -- zpages per pageframe; higher is (presumably) better
> zsmalloc -- a slab-based allocator written by Nitin Gupta to
>       efficiently store zpages and designed to allow zpages
>       to be split across two non-contiguous pageframes
> zspage -- a grouping of N non-contiguous pageframes managed
>       as a unit by zsmalloc to store zpages for which zsize
>       falls within a certain range.  (The compile-time
>       default maximum size for N is 4).
> zbud -- a buddy-based allocator written by Dan Magenheimer
>       (specifically for zcache) to predictably store zpages;
>       no more than two zpages are stored in any pageframe
> pageframe evacuation/reclaim -- the process of removing
>       zpages from one or more pageframes, including pointers/nodes
>       from any data structures referencing those zpages,
>       so that the pageframe(s) can be freed for use by
>       the rest of the kernel
> writeback --  the process of transferring zpages from
>       storage in a zproject to a backing swap device
> lzo1x -- a compression algorithm used by default by all the
>       zprojects; the kernel implementation resides in lib/lzo/
> entropy -- randomness of data to be compressed; higher entropy
>       means worse data compression
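
The glossary entry for zbud implies a hard ceiling on density: with no more than two zpages per pageframe, density cannot exceed 2.0 however well the data compresses. The sketch below is a toy model of that bound, assuming a 4096-byte page and a greedy smallest-with-largest pairing. The pairing strategy is an assumption for illustration only; real zbud places zpages as they arrive and keeps per-pageframe metadata.

```python
PAGE_SIZE = 4096  # assumed page size


def zbud_density(zsizes):
    """Density under a toy pairing model: at most two zpages per pageframe.

    Greedy pairing of the smallest remaining zpage with the largest one;
    real zbud places zpages as they arrive and keeps per-pageframe metadata.
    """
    sizes = sorted(zsizes)
    lo, hi = 0, len(sizes) - 1
    frames = 0
    while lo <= hi:
        if lo < hi and sizes[lo] + sizes[hi] <= PAGE_SIZE:
            lo += 1            # the small zpage shares the frame
        hi -= 1                # the large zpage always consumes a frame
        frames += 1
    return len(sizes) / frames if frames else 0.0


print(zbud_density([694] * 10))      # every zpage at the measured mean zsize
print(zbud_density([3500, 3500]))    # poorly compressible pages cannot pair
```

Under this model, the kernel-compile workload's mean zsize of 694 bytes would still be capped at a density of 2.0, well below the 3.2 measured for zsmalloc, while the result never drops below 1.0.
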
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>