linux-mm.kvack.org archive mirror
From: Dan Magenheimer <dan.magenheimer@oracle.com>
To: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Nitin Gupta <ngupta@vflare.org>, Minchan Kim <minchan@kernel.org>,
	Konrad Wilk <konrad.wilk@oracle.com>,
	Robert Jennings <rcj@linux.vnet.ibm.com>,
	Jenifer Hopper <jhopper@us.ibm.com>, Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <jweiner@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	devel@driverdev.osuosl.org, Dave Hansen <dave@linux.vnet.ibm.com>
Subject: RE: [PATCH 7/8] zswap: add to mm/
Date: Wed, 2 Jan 2013 09:08:04 -0800 (PST)	[thread overview]
Message-ID: <26bb76b3-308e-404f-b2bf-3d19b28b393a@default> (raw)
In-Reply-To: <50E32255.60901@linux.vnet.ibm.com>

> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Subject: Re: [PATCH 7/8] zswap: add to mm/
> 
> > I am eagerly studying one of the details of your zswap "flush"
> > code in this patch to see how you solved a problem or two that
> > I was struggling with for the similar mechanism RFC'ed for zcache
> > (see https://lkml.org/lkml/2012/10/3/558).  I like the way
> > that you force the newly-uncompressed to-be-flushed page immediately
> > into a swap bio in zswap_flush_entry via the call to swap_writepage,
> > though I'm not entirely convinced that there aren't some race
> > conditions there.  However, won't swap_writepage simply call
> > frontswap_store instead and re-compress the page back into zswap?
> 
> I break part of swap_writepage() into a bottom half called
> __swap_writepage() that doesn't include the call to frontswap_store().
> __swap_writepage() is what is called from zswap_flush_entry().  That
> is how I avoid flushed pages recycling back into zswap and the
> potential recursion mentioned.

OK, I missed that.  Nice.  I will see if I can use the
same with zcache and, if so, would be happy to support
the change to swap_writepage.

In your next version, maybe you could break that chunk out
into a distinct patch so it can be pulled separately
into Andrew's tree?
 
> > A second related issue that concerns me is that, although you
> > are now, like zcache2, using an LRU queue for compressed pages
> > (aka "zpages"), there is no relationship between that queue and
> > physical pageframes.  In other words, you may free up 100 zpages
> > out of zswap via zswap_flush_entries, but not free up a single
> > pageframe.  This seems like a significant design issue.  Or am
> > I misunderstanding the code?
> 
> You understand correctly.  There is room for optimization here and it
> is something I'm working on right now.
> 
> What I'm looking to do is give zswap a little insight into zsmalloc
> internals,

Not to be at all snide, but had you been as eager to break
the zsmalloc abstraction last spring, a lot of unpleasantness
and extra work might have been avoided. :v(

> namely the ability to figure out what class size a particular
> allocation is in and, in the event the store can't be satisfied, flush
> an entry from that exact class size so that we can be assured the
> store will succeed with minimal flushing work.  In this solution,
> there would be an LRU list per zsmalloc class size tracked in zswap.
> The result is LRU-ish flushing overall with class size being the first
> flush selection criteria and LRU as the second.

Clever and definitely useful, though I think there are two related
problems and IIUC this solves only one of them.  The problem it _does_
solve is (A) where to put a new zpage: Move a zpage from the same
class to real-swap-disk and then fill its slot with the new zpage.
The problem it _doesn't_ solve is (B) how to shrink the total number
of pageframes used by zswap, even by a single page.  I believe
(though cannot prove right now) that this latter problem will
need to be solved to implement any suitable MM policy for balancing
pages-used-for-compression vs pages-not-used-for-compression.

I fear that problem (B) is the fundamental concern with
using a high-density storage allocator such as zsmalloc, which
is why I abandoned zsmalloc in favor of a more-predictable-but-
less-dense allocator (zbud).  However, if you have a solution
for (B) as well, I would gladly abandon zbud in zcache (for _both_
cleancache and frontswap pages) and our respective in-kernel
compression efforts would be easier to merge into one solution
in the future.

> > A third concern is about scalability... the locking seems very
> > coarse-grained.  In zcache, you personally observed and fixed
> > hashbucket contention (see https://lkml.org/lkml/2011/9/29/215).
> > Doesn't zswap's tree_lock essentially use a single tree (per
> > swaptype), i.e. no scalability?
> 
> The reason the coarse lock isn't a problem for zswap, as the hash
> bucket locks were in zcache, is that the lock is not held for long
> periods of time as it is in zcache.  It is only held while operating on
> the tree, not during compression/decompression and larger memory
> operations.

Hmmm... IIRC, to avoid races in zcache, it was necessary to
update both the data (zpage) and meta-data ("tree" in zswap,
and tmem-data-structure in zcache) atomically.  I will need
to study your code more to understand how zswap avoids this
requirement.  Or if it is obvious to you, I would be grateful
if you would point it out to me.

Thanks,
Dan


