From: Simon Jeons <simon.jeons@gmail.com>
To: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Nitin Gupta <ngupta@vflare.org>, Minchan Kim <minchan@kernel.org>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Dan Magenheimer <dan.magenheimer@oracle.com>,
Robert Jennings <rcj@linux.vnet.ibm.com>,
Jenifer Hopper <jhopper@us.ibm.com>, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <jweiner@redhat.com>,
Rik van Riel <riel@redhat.com>,
Larry Woodman <lwoodman@redhat.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Dave Hansen <dave@linux.vnet.ibm.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
devel@driverdev.osuosl.org
Subject: Re: [PATCHv4 0/7] zswap: compressed swap caching
Date: Thu, 31 Jan 2013 19:39:44 -0600
Message-ID: <1359682784.3574.2.camel@kernel>
In-Reply-To: <1359495627-30285-1-git-send-email-sjenning@linux.vnet.ibm.com>
Hi Seth,
On Tue, 2013-01-29 at 15:40 -0600, Seth Jennings wrote:
> Sorry for the churn but just this set might be easier to review.
> The code required for the flushing is in a separate patch now
> as requested.
>
> Changelog:
>
> v4:
> * Added Acks (Minchan)
> * Separated flushing functionality into standalone patch
> for easier review (Minchan)
> * fix comment on zswap enabled attribute (Minchan)
> * add TODO for dynamic mempool size (Minchan)
> * add check for NULL in zswap_free_page() (Minchan)
> * add missing zs_free() in error path (Minchan)
> * TODO: add comments for flushing/refcounting (Minchan)
>
> NOTE: To build, read this:
> http://lkml.org/lkml/2013/1/28/586
>
> v3:
> * Dropped the zsmalloc patches from the set, except the promotion patch
> which has been converted to a rename patch (vs full diff). The dropped
> patches have been Acked and are going into Greg's staging tree soon.
> * Separated [PATCHv2 7/9] into two patches since it makes changes for two
> different reasons (Minchan)
> * Moved ZSWAP_MAX_OUTSTANDING_FLUSHES near the top in zswap.c (Rik)
> * Rebase to v3.8-rc5. linux-next is a little volatile with the
> swapper_space per type changes which will affect this patchset.
> * TODO: Move some stats from debugfs to sysfs. Which ones? (Rik)
>
> v2:
> * Rename zswap_fs_* functions to zswap_frontswap_* to avoid
> confusion with "filesystem"
> * Add comment about what the tree lock protects
> * Remove "#if 0" code (should have been done before)
> * Break out changes to existing swap code into separate patch
> * Fix blank line EOF warning on documentation file
> * Rebase to next-20130107
>
> Zswap Overview:
>
> Zswap is a lightweight compressed cache for swap pages. It takes
> pages that are in the process of being swapped out and attempts to
> compress them into a dynamically allocated RAM-based memory pool.
> If this process is successful, the writeback to the swap device is
> deferred and, in many cases, avoided completely. This results in
> a significant I/O reduction and performance gains for systems that
> are swapping.
>
> The results of a kernel building benchmark indicate a runtime
> reduction of 53% and an I/O reduction of 76% with zswap vs normal
> swapping under heavy memory pressure (see Performance section for
> more).
>
> Additional metrics on the performance improvements and I/O
> reductions that can be achieved using zswap, as measured by
> SPECjbb, are provided here:
>
> http://ibm.co/VCgHvM
>
> These results include runs on x86 and new results on Power7+ with
> hardware compression acceleration.
>
> Of particular note is that zswap is able to evict pages from the compressed
> cache, on an LRU basis, to the backing swap device when the compressed pool
> reaches its size limit or the pool is unable to obtain additional pages
> from the buddy allocator. This eviction functionality had been identified
> as a requirement in prior community discussions.
>
> Patchset Structure:
> 1: add atomic_t get/set to debugfs
> 2: promote zsmalloc to /lib
> 3,4: changes to existing swap code for zswap
> 5,6: add zswap and documentation
>
> Rationale:
>
> Zswap provides compressed swap caching that basically trades CPU cycles
> for reduced swap I/O. This trade-off can result in a significant
> performance improvement, as reads from/writes to the compressed
> cache are almost always faster than reading from a swap device,
> which incurs the latency of an asynchronous block I/O read.
>
> Some potential benefits:
> * Desktop/laptop users with limited RAM capacities can mitigate the
> performance impact of swapping.
> * Overcommitted guests that share a common I/O resource can
> dramatically reduce their swap I/O pressure, avoiding heavy-handed
> I/O throttling by the hypervisor. This allows more work to get
> done with less impact on the guest workload and on other guests
> sharing the I/O subsystem.
> * Users with SSDs as swap devices can extend the life of the device by
> drastically reducing life-shortening writes.
>
> Compressed swap is also provided in zcache, along with page cache
> compression and RAM clustering through RAMSter. Zswap seeks to deliver
> the benefit of swap compression to users in a discrete function.
> This design decision is akin to the Unix design philosophy of doing
> one thing well; it leaves file cache compression and other features
> for separate code.
>
> Design:
>
> Zswap receives pages for compression through the Frontswap API and
> is able to evict pages from its own compressed pool on an LRU basis
> and write them back to the backing swap device in the case that the
> compressed pool is full or unable to secure additional pages from
> the buddy allocator.
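
To check my reading of the eviction behaviour described above, here is a
toy model in Python. It is only a sketch, not the zswap code: the class
name, the entry-count limit standing in for the pool size limit, and the
written_back list standing in for the backing swap device are all my own
assumptions.

```python
from collections import OrderedDict


class CompressedPool:
    """Toy model of an LRU-evicting compressed pool (hypothetical, not zswap)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries   # stand-in for the pool size limit
        self.entries = OrderedDict()     # oldest stored entry comes first
        self.written_back = []           # offsets flushed to the swap device

    def store(self, offset, data):
        # Re-storing an offset replaces the old copy instead of evicting.
        self.entries.pop(offset, None)
        # Pool full: write the oldest entries back to swap to make room.
        while len(self.entries) >= self.max_entries:
            old_offset, _ = self.entries.popitem(last=False)
            self.written_back.append(old_offset)
        self.entries[offset] = data

    def load(self, offset):
        # Returns None when the page has to be read from the swap device.
        return self.entries.get(offset)


pool = CompressedPool(max_entries=2)
for off, data in [(1, b"a"), (2, b"b"), (3, b"c")]:
    pool.store(off, data)
print(pool.written_back)
```

With a limit of two entries, storing a third page pushes the oldest one
out to the swap device, which is how I understand "evict on an LRU basis".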
>
> Zswap makes use of zsmalloc for managing the compressed memory
> pool. This is because zsmalloc is specifically designed to minimize
> fragmentation on large (> PAGE_SIZE/2) allocation sizes. Each
> allocation in zsmalloc is not directly accessible by address.
> Rather, a handle is returned by the allocation routine and that handle
> must be mapped before being accessed. The compressed memory pool grows
> on demand and shrinks as compressed pages are freed. The pool is
> not preallocated.
>
> When a swap page is passed from frontswap to zswap, zswap maintains
> a mapping of the swap entry, a combination of the swap type and swap
> offset, to the zsmalloc handle that references that compressed swap
> page. This mapping is achieved with a red-black tree per swap type.
> The swap offset is the search key for the tree nodes.
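
A minimal sketch of that mapping as I understand it, in Python. A plain
dict takes the place of each red-black tree here, so this shows only the
keying scheme (one tree per swap type, swap offset as key, handle as
value), not the tree itself. SwapEntryMap and its method names are
hypothetical.

```python
class SwapEntryMap:
    """Maps (swap type, swap offset) to an allocator handle.

    Hypothetical illustration: a dict per swap type replaces the
    per-type red-black tree described in the cover letter.
    """

    def __init__(self):
        self.trees = {}  # swap type -> {swap offset -> handle}

    def insert(self, swap_type, offset, handle):
        self.trees.setdefault(swap_type, {})[offset] = handle

    def lookup(self, swap_type, offset):
        # None means the page is not in the compressed pool.
        return self.trees.get(swap_type, {}).get(offset)

    def erase(self, swap_type, offset):
        # Returns the removed handle so the caller can free it.
        return self.trees.get(swap_type, {}).pop(offset, None)


entries = SwapEntryMap()
entries.insert(swap_type=0, offset=42, handle=0xABC)
print(entries.lookup(0, 42), entries.lookup(1, 42))
```

The same offset on two different swap types never collides, because each
type has its own tree.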
>
> Zswap seeks to be simple in its policies. Sysfs attributes allow for
> two user controlled policies:
> * max_compression_ratio - Maximum compression ratio, as a percentage,
> for an acceptable compressed page. Any page that does not compress
> by at least this ratio will be rejected.
> * max_pool_percent - The maximum percentage of memory that the compressed
> pool can occupy.
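
My reading of the two policies, sketched in Python. The 4096-byte page
size, the function names, and the exact comparisons (integer percentage,
<= versus <) are assumptions on my part and may not match the patch.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def page_acceptable(compressed_len, max_compression_ratio):
    """True if the compressed page is small enough to keep.

    The ratio is compressed size as a percentage of PAGE_SIZE, so a
    lower number means better compression.
    """
    return compressed_len * 100 // PAGE_SIZE <= max_compression_ratio


def pool_has_room(pool_pages, total_ram_pages, max_pool_percent):
    """True if the pool is still below its share of total memory."""
    return pool_pages < total_ram_pages * max_pool_percent // 100


print(page_acceptable(2048, 80), page_acceptable(4000, 80))
print(pool_has_room(10, 100, 20), pool_has_room(20, 100, 20))
```

With max_compression_ratio set to 80, a page that compresses to 2048
bytes (50%) is kept and one that only reaches 4000 bytes (97%) is
rejected.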
>
> To enable zswap, the "enabled" attribute must be set to 1 at boot time.
>
> Zswap allows the compressor to be selected at kernel boot time by
> setting the "compressor" attribute. The default compressor is lzo.
>
> A debugfs interface is provided for various statistics about pool size,
> number of pages stored, and various counters for the reasons pages
> are rejected.
>
> Performance, Kernel Building:
>
> Setup
> ========
> Gentoo w/ kernel v3.7-rc7
> Quad-core i5-2500 @ 3.3GHz
> 512MB DDR3 1600MHz (limited with mem=512m on boot)
> Filesystem and swap on 80GB HDD (about 58MB/s with hdparm -t)
> majflt are major page faults reported by the time command
> pswpin/out is the delta of pswpin/out from /proc/vmstat before and after
> the make -jN
>
> Summary
> ========
> * Zswap reduces I/O and improves performance at all swap pressure levels.
>
> * Under heavy swapping at 24 threads, zswap reduced I/O by 76%, saving
> over 1.5GB of I/O, and cut runtime in half.
How did you obtain these benchmark results?
>
> Details
> ========
> I/O (in pages)
>     base                              zswap                             change  change
> N   pswpin  pswpout  majflt  I/O sum  pswpin  pswpout  majflt  I/O sum    %I/O      MB
> 8        1      335     291      627       0        0     249      249    -60%       1
> 12    3688    14315    5290    23293     123      860    5954     6937    -70%      64
> 16   12711    46179   16803    75693    2936     7390   46092    56418    -25%      75
> 20   42178   133781   49898   225857    9460    28382   92951   130793    -42%     371
> 24   96079   357280  105242   558601    7719    18484  109309   135512    -76%    1653
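
For anyone checking the two "change" columns, the arithmetic can be
sketched in Python as below. It assumes 4 KiB pages and that MB means
2^20 bytes; neither is stated in the cover letter, and io_change is a
hypothetical helper. The inputs are the "I/O sum" columns from the table
above.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on the x86 test machine

# "I/O sum" columns copied from the quoted table: N -> (base, zswap)
IO_SUM = {
    8: (627, 249),
    12: (23293, 6937),
    16: (75693, 56418),
    20: (225857, 130793),
    24: (558601, 135512),
}


def io_change(base_pages, zswap_pages):
    """Return (percent change in I/O, MB of I/O saved), both rounded."""
    percent = round((zswap_pages - base_pages) * 100 / base_pages)
    saved_mb = round((base_pages - zswap_pages) * PAGE_SIZE / 2**20)
    return percent, saved_mb


for n, (base, zswap) in IO_SUM.items():
    print(n, io_change(base, zswap))
```

Under those assumptions the N=24 row gives -76% and 1653 MB, which
matches the summary figure of "over 1.5GB".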
>
> Runtime (in seconds)
> N   base  zswap  %change
> 8    107    107       0%
> 12   128    110     -14%
> 16   191    179      -6%
> 20   371    240     -35%
> 24   570    267     -53%
>
> %CPU utilization (out of 400% on 4 cpus)
> N   base  zswap  %change
> 8    317    319       1%
> 12   267    311      16%
> 16   179    191       7%
> 20    94    143      52%
> 24    60    128     113%
>
>
> Seth Jennings (7):
> debugfs: add get/set for atomic types
> zsmalloc: promote to lib/
> zswap: add to mm/
> mm: break up swap_writepage() for frontswap backends
> mm: allow for outstanding swap writeback accounting
> zswap: add flushing support
> zswap: add documentation
>
> Documentation/vm/zswap.txt | 73 ++
> drivers/staging/Kconfig | 2 -
> drivers/staging/Makefile | 1 -
> drivers/staging/zcache/zcache-main.c | 3 +-
> drivers/staging/zram/zram_drv.h | 3 +-
> drivers/staging/zsmalloc/Kconfig | 10 -
> drivers/staging/zsmalloc/Makefile | 3 -
> fs/debugfs/file.c | 42 +
> include/linux/debugfs.h | 2 +
> include/linux/swap.h | 4 +
> .../staging/zsmalloc => include/linux}/zsmalloc.h | 0
> lib/Kconfig | 18 +
> lib/Makefile | 1 +
> .../zsmalloc/zsmalloc-main.c => lib/zsmalloc.c | 3 +-
> mm/Kconfig | 15 +
> mm/Makefile | 1 +
> mm/page_io.c | 22 +-
> mm/swap_state.c | 2 +-
> mm/zswap.c | 1073 ++++++++++++++++++++
> 19 files changed, 1250 insertions(+), 28 deletions(-)
> create mode 100644 Documentation/vm/zswap.txt
> delete mode 100644 drivers/staging/zsmalloc/Kconfig
> delete mode 100644 drivers/staging/zsmalloc/Makefile
> rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
> rename drivers/staging/zsmalloc/zsmalloc-main.c => lib/zsmalloc.c (99%)
> create mode 100644 mm/zswap.c
>