Re: [PATCHv7 4/8] zswap: add to mm/

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Seth Jennings <sjenning@linux.vnet.ibm.com>
To: Dave Hansen <dave@sr71.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Nitin Gupta <ngupta@vflare.org>, Minchan Kim <minchan@kernel.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Robert Jennings <rcj@linux.vnet.ibm.com>,
	Jenifer Hopper <jhopper@us.ibm.com>, Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <jweiner@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Joe Perches <joe@perches.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Cody P Schafer <cody@linux.vnet.ibm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	devel@driverdev.osuosl.org
Subject: Re: [PATCHv7 4/8] zswap: add to mm/
Date: Thu, 07 Mar 2013 15:21:54 -0600	[thread overview]
Message-ID: <513904F2.50607@linux.vnet.ibm.com> (raw)
In-Reply-To: <5138E3C7.9080205@sr71.net>

On 03/07/2013 01:00 PM, Dave Hansen wrote:
> On 03/06/2013 07:52 AM, Seth Jennings wrote:
>> +static int __zswap_cpu_notifier(unsigned long action, unsigned long cpu)
>> +{
>> +	struct crypto_comp *tfm;
>> +	u8 *dst;
>> +
>> +	switch (action) {
>> +	case CPU_UP_PREPARE:
>> +		tfm = crypto_alloc_comp(zswap_compressor, 0, 0);
>> +		if (IS_ERR(tfm)) {
>> +			pr_err("can't allocate compressor transform\n");
>> +			return NOTIFY_BAD;
>> +		}
>> +		*per_cpu_ptr(zswap_comp_pcpu_tfms, cpu) = tfm;
>> +		dst = (u8 *)__get_free_pages(GFP_KERNEL, 1);
> 
> Are there some alignment requirements for 'dst'?  If not, why not use
> kmalloc()?  I think kmalloc() should always be used where possible since
> slab debugging is so useful compared to what we can do with raw
> buddy-allocated pages.

Sounds good to me.

> 
> Where does the order-1 requirement come from by the way?

Unsafe LZO compression
(http://article.gmane.org/gmane.linux.kernel.mm/95460)

Forgot to put in the comment for v7.

> 
> ...
>> +**********************************/
>> +/* attempts to compress and store an single page */
>> +static int zswap_frontswap_store(unsigned type, pgoff_t offset,
>> +				struct page *page)
>> +{
> ...
>> +	/* store */
>> +	handle = zs_malloc(tree->pool, dlen,
>> +		__GFP_NORETRY | __GFP_HIGHMEM | __GFP_NOMEMALLOC |
>> +			__GFP_NOWARN);
>> +	if (!handle) {
>> +		zswap_reject_zsmalloc_fail++;
>> +		ret = -ENOMEM;
>> +		goto putcpu;
>> +	}
>> +
> 
> I think there needs to at least be some strong comments in here about
> why you're doing this kind of allocation.  From some IRC discussion, it
> seems like you found some pathological case where zswap wasn't helping
> make reclaim progress and ended up draining the reserve pools and you
> did this to avoid draining the reserve pools.

I'm currently doing some tests with fewer zsmalloc class sizes and
removing __GFP_NOMEMALLOC to see the effect.

> 
> I think the lack of progress doing reclaim is really the root cause you
> should be going after here instead of just working around the symptom.
> 
>> +/* NOTE: this is called in atomic context from swapon and must not sleep */
>> +static void zswap_frontswap_init(unsigned type)
>> +{
>> +	struct zswap_tree *tree;
>> +
>> +	tree = kzalloc(sizeof(struct zswap_tree), GFP_NOWAIT);
>> +	if (!tree)
>> +		goto err;
>> +	tree->pool = zs_create_pool(GFP_NOWAIT, &zswap_zs_ops);
>> +	if (!tree->pool)
>> +		goto freetree;
>> +	tree->rbroot = RB_ROOT;
>> +	spin_lock_init(&tree->lock);
>> +	zswap_trees[type] = tree;
>> +	return;
>> +
>> +freetree:
>> +	kfree(tree);
>> +err:
>> +	pr_err("alloc failed, zswap disabled for swap type %d\n", type);
>> +}
> 
> How large are these allocations?  Why are you doing GFP_NOWAIT instead
> of GFP_ATOMIC?  This seems like the kind of thing that you'd _want_ to
> be able to dip in to the reserves for.

Not large. Would almost never make a difference, but you're right;
should use GFP_ATOMIC.

Thanks,
Seth

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Seth Jennings <sjenning@linux.vnet.ibm.com>
To: Dave Hansen <dave@sr71.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Nitin Gupta <ngupta@vflare.org>, Minchan Kim <minchan@kernel.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Robert Jennings <rcj@linux.vnet.ibm.com>,
	Jenifer Hopper <jhopper@us.ibm.com>, Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <jweiner@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Joe Perches <joe@perches.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Cody P Schafer <cody@linux.vnet.ibm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	devel@driverdev.osuosl.org
Subject: Re: [PATCHv7 4/8] zswap: add to mm/
Date: Thu, 07 Mar 2013 15:21:54 -0600	[thread overview]
Message-ID: <513904F2.50607@linux.vnet.ibm.com> (raw)
In-Reply-To: <5138E3C7.9080205@sr71.net>

On 03/07/2013 01:00 PM, Dave Hansen wrote:
> On 03/06/2013 07:52 AM, Seth Jennings wrote:
>> +static int __zswap_cpu_notifier(unsigned long action, unsigned long cpu)
>> +{
>> +	struct crypto_comp *tfm;
>> +	u8 *dst;
>> +
>> +	switch (action) {
>> +	case CPU_UP_PREPARE:
>> +		tfm = crypto_alloc_comp(zswap_compressor, 0, 0);
>> +		if (IS_ERR(tfm)) {
>> +			pr_err("can't allocate compressor transform\n");
>> +			return NOTIFY_BAD;
>> +		}
>> +		*per_cpu_ptr(zswap_comp_pcpu_tfms, cpu) = tfm;
>> +		dst = (u8 *)__get_free_pages(GFP_KERNEL, 1);
> 
> Are there some alignment requirements for 'dst'?  If not, why not use
> kmalloc()?  I think kmalloc() should always be used where possible since
> slab debugging is so useful compared to what we can do with raw
> buddy-allocated pages.

Sounds good to me.

> 
> Where does the order-1 requirement come from by the way?

Unsafe LZO compression
(http://article.gmane.org/gmane.linux.kernel.mm/95460)

Forgot to put in the comment for v7.

> 
> ...
>> +**********************************/
>> +/* attempts to compress and store an single page */
>> +static int zswap_frontswap_store(unsigned type, pgoff_t offset,
>> +				struct page *page)
>> +{
> ...
>> +	/* store */
>> +	handle = zs_malloc(tree->pool, dlen,
>> +		__GFP_NORETRY | __GFP_HIGHMEM | __GFP_NOMEMALLOC |
>> +			__GFP_NOWARN);
>> +	if (!handle) {
>> +		zswap_reject_zsmalloc_fail++;
>> +		ret = -ENOMEM;
>> +		goto putcpu;
>> +	}
>> +
> 
> I think there needs to at least be some strong comments in here about
> why you're doing this kind of allocation.  From some IRC discussion, it
> seems like you found some pathological case where zswap wasn't helping
> make reclaim progress and ended up draining the reserve pools and you
> did this to avoid draining the reserve pools.

I'm currently doing some tests with fewer zsmalloc class sizes and
removing __GFP_NOMEMALLOC to see the effect.

> 
> I think the lack of progress doing reclaim is really the root cause you
> should be going after here instead of just working around the symptom.
> 
>> +/* NOTE: this is called in atomic context from swapon and must not sleep */
>> +static void zswap_frontswap_init(unsigned type)
>> +{
>> +	struct zswap_tree *tree;
>> +
>> +	tree = kzalloc(sizeof(struct zswap_tree), GFP_NOWAIT);
>> +	if (!tree)
>> +		goto err;
>> +	tree->pool = zs_create_pool(GFP_NOWAIT, &zswap_zs_ops);
>> +	if (!tree->pool)
>> +		goto freetree;
>> +	tree->rbroot = RB_ROOT;
>> +	spin_lock_init(&tree->lock);
>> +	zswap_trees[type] = tree;
>> +	return;
>> +
>> +freetree:
>> +	kfree(tree);
>> +err:
>> +	pr_err("alloc failed, zswap disabled for swap type %d\n", type);
>> +}
> 
> How large are these allocations?  Why are you doing GFP_NOWAIT instead
> of GFP_ATOMIC?  This seems like the kind of thing that you'd _want_ to
> be able to dip in to the reserves for.

Not large. Would almost never make a difference, but you're right;
should use GFP_ATOMIC.

Thanks,
Seth

next prev parent reply	other threads:[~2013-03-07 21:22 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-06 15:52 [PATCHv7 0/8] zswap: compressed swap caching Seth Jennings
2013-03-06 15:52 ` Seth Jennings
2013-03-06 15:52 ` [PATCHv7 1/8] zsmalloc: add to mm/ Seth Jennings
2013-03-06 15:52   ` Seth Jennings
2013-03-16 14:09   ` Bob Liu
2013-03-16 14:09     ` Bob Liu
2013-03-06 15:52 ` [PATCHv7 2/8] zsmalloc: add documentation Seth Jennings
2013-03-06 15:52   ` Seth Jennings
2013-03-06 15:52 ` [PATCHv7 3/8] debugfs: add get/set for atomic types Seth Jennings
2013-03-06 15:52   ` Seth Jennings
2013-03-06 15:52 ` [PATCHv7 4/8] zswap: add to mm/ Seth Jennings
2013-03-06 15:52   ` Seth Jennings
2013-03-07 19:00   ` Dave Hansen
2013-03-07 19:00     ` Dave Hansen
2013-03-07 21:21     ` Seth Jennings [this message]
2013-03-07 21:21       ` Seth Jennings
2013-03-07 21:24       ` Dave Hansen
2013-03-07 21:24         ` Dave Hansen
2013-03-07 23:11       ` Dan Magenheimer
2013-03-07 23:11         ` Dan Magenheimer
2013-03-06 15:52 ` [PATCHv7 5/8] mm: break up swap_writepage() for frontswap backends Seth Jennings
2013-03-06 15:52   ` Seth Jennings
2013-03-06 15:52 ` [PATCHv7 6/8] mm: allow for outstanding swap writeback accounting Seth Jennings
2013-03-06 15:52   ` Seth Jennings
2013-03-06 15:52 ` [PATCHv7 7/8] zswap: add swap page writeback support Seth Jennings
2013-03-06 15:52   ` Seth Jennings
2013-03-06 15:52 ` [PATCHv7 8/8] zswap: add documentation Seth Jennings
2013-03-06 15:52   ` Seth Jennings

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=513904F2.50607@linux.vnet.ibm.com \
    --to=sjenning@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=cody@linux.vnet.ibm.com \
    --cc=dan.magenheimer@oracle.com \
    --cc=dave@sr71.net \
    --cc=devel@driverdev.osuosl.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jhopper@us.ibm.com \
    --cc=joe@perches.com \
    --cc=jweiner@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lwoodman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=ngupta@vflare.org \
    --cc=rcj@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.