RE: [PATCH 7/8] zswap: add to mm/

From: Dan Magenheimer <dan.magenheimer@oracle.com>
To: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Nitin Gupta <ngupta@vflare.org>, Minchan Kim <minchan@kernel.org>,
	Konrad Wilk <konrad.wilk@oracle.com>,
	Robert Jennings <rcj@linux.vnet.ibm.com>,
	Jenifer Hopper <jhopper@us.ibm.com>, Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <jweiner@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	devel@driverdev.osuosl.org, Dave Hansen <dave@linux.vnet.ibm.com>
Subject: RE: [PATCH 7/8] zswap: add to mm/
Date: Wed, 2 Jan 2013 09:08:04 -0800 (PST)
Message-ID: <26bb76b3-308e-404f-b2bf-3d19b28b393a@default>
In-Reply-To: <50E32255.60901@linux.vnet.ibm.com>

> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Subject: Re: [PATCH 7/8] zswap: add to mm/
> 
> > I am eagerly studying one of the details of your zswap "flush"
> > code in this patch to see how you solved a problem or two that
> > I was struggling with for the similar mechanism RFC'ed for zcache
> > (see https://lkml.org/lkml/2012/10/3/558).  I like the way
> > that you force the newly-uncompressed to-be-flushed page immediately
> > into a swap bio in zswap_flush_entry via the call to swap_writepage,
> > though I'm not entirely convinced that there aren't some race
> > conditions there.  However, won't swap_writepage simply call
> > frontswap_store instead and re-compress the page back into zswap?
> 
> I break part of swap_writepage() into a bottom half called
> __swap_writepage() that doesn't include the call to frontswap_store().
> __swap_writepage() is what is called from zswap_flush_entry().  That
> is how I avoid flushed pages recycling back into zswap and the
> potential recursion mentioned.

OK, I missed that.  Nice.  I will see if I can use the
same with zcache and, if so, would be happy to support
the change to swap_writepage.

In your next version, maybe you could break out that chunk
into a distinct patch so it can be pulled separately
into Andrew's tree?
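
The split described above can be modeled in userspace roughly as follows.
This is only a sketch under stated assumptions, not the patch itself: the
model_* functions, the counters, and the one-field struct page are
hypothetical stand-ins for swap_writepage(), __swap_writepage(),
frontswap_store() and zswap_flush_entry(), and the model assumes frontswap
always accepts the page.

```c
#include <assert.h>

/* Hypothetical userspace model; none of this is the kernel code. */
struct page {
	int id;
};

static int frontswap_calls;	/* times the frontswap hook was consulted */
static int bio_submits;		/* times a page was sent to the swap device */

/* Stand-in for frontswap_store(): 0 means the page was accepted. */
static int model_frontswap_store(struct page *page)
{
	(void)page;
	frontswap_calls++;
	return 0;
}

/* Bottom half: issues the write and never consults frontswap. */
static int model_swap_writepage_bottom(struct page *page)
{
	(void)page;
	bio_submits++;
	return 0;
}

/* Top half: offer the page to frontswap, else fall through to the device. */
static int model_swap_writepage(struct page *page)
{
	if (model_frontswap_store(page) == 0)
		return 0;
	return model_swap_writepage_bottom(page);
}

/* Flush path: calls the bottom half so the page cannot re-enter zswap. */
static int model_zswap_flush_entry(struct page *page)
{
	return model_swap_writepage_bottom(page);
}
```

In this model a normal model_swap_writepage() call is absorbed by the
frontswap stand-in, while model_zswap_flush_entry() reaches the device
path directly, which is the recursion-avoidance property being discussed.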

> > A second related issue that concerns me is that, although you
> > are now, like zcache2, using an LRU queue for compressed pages
> > (aka "zpages"), there is no relationship between that queue and
> > physical pageframes.  In other words, you may free up 100 zpages
> > out of zswap via zswap_flush_entries, but not free up a single
> > pageframe.  This seems like a significant design issue.  Or am
> > I misunderstanding the code?
> 
> You understand correctly.  There is room for optimization here and it
> is something I'm working on right now.
> 
> What I'm looking to do is give zswap a little insight into zsmalloc
> internals,

Not to be at all snide, but had you been as eager to break
the zsmalloc abstraction last spring, a lot of unpleasantness
and extra work might have been avoided. :v(

> namely the ability to figure out what class size a particular
> allocation is in and, in the event the store can't be satisfied, flush
> an entry from that exact class size so that we can be assured the
> store will succeed with minimal flushing work.  In this solution,
> there would be an LRU list per zsmalloc class size tracked in zswap.
> The result is LRU-ish flushing overall with class size being the first
> flush selection criterion and LRU as the second.

Clever and definitely useful, though I think there are two related
problems and IIUC this solves only one of them.  The problem it _does_
solve is (A) where to put a new zpage: Move a zpage from the same
class to real-swap-disk and then fill its slot with the new zpage.
The problem it _doesn't_ solve is (B) how to shrink the total number
of pageframes used by zswap, even by a single page.  I believe
(though cannot prove right now) that this latter problem will
need to be solved to implement any suitable MM policy for balancing
pages-used-for-compression vs pages-not-used-for-compression.
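
A minimal sketch of the per-class LRU selection that addresses (A),
assuming a made-up size-to-class mapping: NCLASSES, CLASS_DELTA,
struct zentry, lru_add() and pick_victim() are all hypothetical names
for illustration and do not reflect zsmalloc's real class layout.

```c
#include <assert.h>
#include <stddef.h>

#define NCLASSES    4
#define CLASS_DELTA 1024	/* assumed size step between classes */

struct zentry {
	struct zentry *next;
	int size;	/* compressed size in bytes, 1..NCLASSES*CLASS_DELTA */
};

struct class_lru {
	struct zentry *head;	/* least recently used */
	struct zentry *tail;	/* most recently used */
};

static struct class_lru lru[NCLASSES];

/* Map a compressed size to its (assumed) class index. */
static int size_to_class(int size)
{
	return (size - 1) / CLASS_DELTA;
}

/* Append a newly stored entry to the tail of its class's LRU list. */
static void lru_add(struct zentry *e)
{
	struct class_lru *l = &lru[size_to_class(e->size)];

	e->next = NULL;
	if (l->tail)
		l->tail->next = e;
	else
		l->head = e;
	l->tail = e;
}

/*
 * Pick the entry to flush for an incoming store of the given size:
 * class first, then LRU order within that class.  Returns NULL if
 * the class holds no entries.
 */
static struct zentry *pick_victim(int size)
{
	struct class_lru *l = &lru[size_to_class(size)];
	struct zentry *victim = l->head;

	if (!victim)
		return NULL;
	l->head = victim->next;
	if (!l->head)
		l->tail = NULL;
	return victim;
}
```

The victim's slot can then be reused by the incoming zpage, which is
why this handles (A) but says nothing about how many pageframes the
pool occupies.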

I fear that problem (B) is the fundamental concern with
using a high-density storage allocator such as zsmalloc, which
is why I abandoned zsmalloc in favor of a more-predictable-but-
less-dense allocator (zbud).  However, if you have a solution
for (B) as well, I would gladly abandon zbud in zcache (for _both_
cleancache and frontswap pages) and our respective in-kernel
compression efforts would be easier to merge into one solution
in the future.
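
To make problem (B) concrete, here is a toy accounting model, assuming a
pool where every pageframe packs a fixed number of zpages.  NFRAMES,
SLOTS_PER_FRAME and the pool_* helpers are hypothetical; a real dense
allocator packs variable-size objects and is considerably more involved.

```c
#include <assert.h>
#include <string.h>

#define NFRAMES         4
#define SLOTS_PER_FRAME 4	/* assumed packing density, illustrative only */

struct pool {
	unsigned char used[NFRAMES][SLOTS_PER_FRAME];
};

/* Mark every slot in every pageframe as holding a zpage. */
static void pool_fill(struct pool *p)
{
	memset(p->used, 1, sizeof(p->used));
}

/* Flush one zpage out of the pool. */
static void pool_evict(struct pool *p, int frame, int slot)
{
	p->used[frame][slot] = 0;
}

/* Number of zpage slots currently free. */
static int pool_free_slots(const struct pool *p)
{
	int f, s, n = 0;

	for (f = 0; f < NFRAMES; f++)
		for (s = 0; s < SLOTS_PER_FRAME; s++)
			if (!p->used[f][s])
				n++;
	return n;
}

/* Number of pageframes that hold no zpages and could be returned. */
static int pool_free_frames(const struct pool *p)
{
	int f, s, n = 0;

	for (f = 0; f < NFRAMES; f++) {
		for (s = 0; s < SLOTS_PER_FRAME; s++)
			if (p->used[f][s])
				break;
		if (s == SLOTS_PER_FRAME)
			n++;
	}
	return n;
}
```

If the flushed zpages happen to sit in different pageframes, as an
LRU-ordered flush may well choose, pool_free_slots() grows while
pool_free_frames() stays at zero; only when every zpage in one
pageframe is gone does a frame become reclaimable.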

> > A third concern is about scalability... the locking seems very
> > coarse-grained.  In zcache, you personally observed and fixed
> > hashbucket contention (see https://lkml.org/lkml/2011/9/29/215).
> > Doesn't zswap's tree_lock essentially use a single tree (per
> > swaptype), i.e. no scalability?
> 
> The reason the coarse lock isn't a problem for zswap like the hash
> bucket locks were in zcache is that the lock is not held for long
> periods of time as it is in zcache.  It is only held while operating on
> the tree, not during compression/decompression and larger memory
> operations.

Hmmm... IIRC, to avoid races in zcache, it was necessary to
update both the data (zpage) and meta-data ("tree" in zswap,
and tmem-data-structure in zcache) atomically.  I will need
to study your code more to understand how zswap avoids this
requirement.  Or if it is obvious to you, I would be grateful
if you would point it out to me.
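
For reference, the lock scope Seth describes could look roughly like the
following userspace sketch.  It is an assumption-laden model, not zswap's
code: a C11 atomic_flag spinlock stands in for tree_lock, a flat array
stands in for the tree, a plain copy stands in for compression, and all
model_* names are hypothetical.  It shows only where the lock is held.
It deliberately omits any reference counting or invalidate path, so it
does not settle the atomicity question raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define NENTRIES   8
#define PAGE_BYTES 16	/* tiny stand-in for PAGE_SIZE */

struct entry {
	int used;
	unsigned long offset;
	unsigned char *handle;	/* buffer holding the "compressed" data */
};

static atomic_flag tree_lock = ATOMIC_FLAG_INIT;
static _Thread_local int lock_held;	/* does this thread hold tree_lock? */
static struct entry tree[NENTRIES];	/* flat array standing in for the tree */

static void tree_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&tree_lock))
		;	/* spin */
	lock_held = 1;
}

static void tree_lock_release(void)
{
	lock_held = 0;
	atomic_flag_clear(&tree_lock);
}

/* Stand-in for compression: a plain copy into a fresh buffer. */
static unsigned char *model_compress(const unsigned char *page)
{
	unsigned char *buf;

	assert(!lock_held);	/* the long operation runs unlocked */
	buf = malloc(PAGE_BYTES);
	if (buf)
		memcpy(buf, page, PAGE_BYTES);
	return buf;
}

/* Store: compress first, then take the lock only to publish the entry. */
static int model_store(unsigned long offset, const unsigned char *page)
{
	unsigned char *buf = model_compress(page);
	int i, ret = -1;

	if (!buf)
		return -1;
	tree_lock_acquire();
	for (i = 0; i < NENTRIES; i++) {
		if (!tree[i].used) {
			tree[i].used = 1;
			tree[i].offset = offset;
			tree[i].handle = buf;
			ret = 0;
			break;
		}
	}
	tree_lock_release();
	if (ret)
		free(buf);
	return ret;
}

/* Load: look up under the lock, copy the data out after dropping it. */
static int model_load(unsigned long offset, unsigned char *page)
{
	unsigned char *buf = NULL;
	int i;

	tree_lock_acquire();
	for (i = 0; i < NENTRIES; i++) {
		if (tree[i].used && tree[i].offset == offset) {
			buf = tree[i].handle;
			break;
		}
	}
	tree_lock_release();
	if (!buf)
		return -1;
	memcpy(page, buf, PAGE_BYTES);	/* "decompress" without the lock */
	return 0;
}
```

Note that model_load() uses the buffer after the lock is dropped; if a
concurrent path could free that buffer, something extra (for example a
reference count taken under the lock) would be needed to keep the data
and the metadata consistent.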

Thanks,
Dan

