From: Seth Jennings <sjenning@linux.vnet.ibm.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: linux-mm <linux-mm@kvack.org>, Nitin Gupta <ngupta@vflare.org>,
	Robert Jennings <rcj@linux.vnet.ibm.com>,
	Brian King <brking@linux.vnet.ibm.com>,
	Greg Kroah-Hartman <gregkh@suse.de>
Subject: Re: frontswap/zcache: xvmalloc discussion
Date: Thu, 23 Jun 2011 16:59:54 -0500	[thread overview]
Message-ID: <4E03B75A.9040203@linux.vnet.ibm.com> (raw)
In-Reply-To: <0a3a5959-5d8f-4f62-a879-34266922c59f@default>

On 06/23/2011 11:38 AM, Dan Magenheimer wrote:
>> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
>> Cc: Dan Magenheimer; Nitin Gupta; Robert Jennings; Brian King; Greg Kroah-Hartman
>> Subject: frontswap/zcache: xvmalloc discussion
>>
>> Dan, Nitin,
> 
> Hi Seth --
> 
> Thanks for your interest in frontswap and zcache!

Thanks for your quick response!

> 
>> I have been experimenting with the frontswap v4 patches and the latest
>> zcache in the mainline drivers/staging.  There is a particular issue I'm
>> seeing when using pages of different compressibilities.
>>
>> When the pages compress to less than PAGE_SIZE/2, I get good compression
>> and little external fragmentation in the xvmalloc pool.  However, when
>> the pages have a compressed size greater than PAGE_SIZE/2, it is a very
>> different story.  Basically, because xvmalloc allocations can't span
>> multiple pool pages, grow_pool() is called on each allocation, driving
>> the effective compression ratio (total_pages_in_frontswap /
>> total_pages_in_xvmalloc_pool) to 1, i.e. no space savings, and
>> drastically increasing external fragmentation to as much as 50%.
>>
>> The likelihood that the size of a compressed page is greater than
>> PAGE_SIZE/2 is high, considering that lzo1x-1 sacrifices compressibility
>> for speed.  In my experiments, pages of English text only compressed to
>> 75% of their original size with lzo1x-1.
> 
> Wow, I'm surprised to hear that.  I suppose it is very workload
> dependent, but I agree that consistently poor compression can create
> issues for frontswap.
>
 
Yes, I was surprised as well at how little it compressed.  I guess I'm
used to gzip-level compression, which was around 50% on the same data set.
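
In case anyone wants to reproduce the numbers, here is a rough userspace
sketch of the measurement, assuming liblzo2 and 4K pages; it is not the
actual test program from my first mail:

#include <lzo/lzo1x.h>
#include <stdio.h>

#define PG 4096

int main(int argc, char **argv)
{
	unsigned char in[PG], out[PG + PG / 16 + 64 + 3];
	/* lzo1x-1 work memory, aligned as the LZO docs recommend */
	lzo_align_t wrk[(LZO1X_1_MEM_COMPRESS + sizeof(lzo_align_t) - 1)
			/ sizeof(lzo_align_t)];
	unsigned long total_in = 0, total_out = 0;
	lzo_uint out_len;
	FILE *f;

	if (argc != 2 || lzo_init() != LZO_E_OK)
		return 1;
	f = fopen(argv[1], "rb");
	if (!f)
		return 1;

	/* compress the file one page at a time, whole pages only */
	while (fread(in, 1, PG, f) == PG) {
		out_len = sizeof(out);
		lzo1x_1_compress(in, PG, out, &out_len, wrk);
		total_in += PG;
		total_out += out_len;
	}
	fclose(f);
	if (!total_in)
		return 1;
	printf("compressed to %.0f%% of original size\n",
	       100.0 * total_out / total_in);
	return 0;
}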

>> In order to calculate the effective compression of frontswap, you need
>> the number of pages stored by frontswap, provided by frontswap's
>> curr_pages sysfs attribute, and the number of pages in the xvmalloc
>> pool.  There isn't a sysfs attribute for this, so I made a patch that
>> creates a new zv_pool_pages_count attribute for zcache that provides
>> this value (patch is in a follow-up message).  I have also included my
>> simple test program at the end of this email.  It just allocates and
>> stores random pages from a text file (in my case, a text file of Moby
>> Dick).
>>
>> The real problem here is compressing pages of size x and storing them in
>> a pool that has "chunks", if you will, also of size x, where allocations
>> can't span multiple chunks.  Ideally, I'd like to address this issue by
>> expanding the size of the xvmalloc pool chunks from one page to four
>> pages (I can explain why four is a good number, just didn't want to make
>> this note too long).
> 
> Nitin is the expert on compression and xvmalloc... I mostly built on top
> of his earlier work... so I will wait for him to comment on compression
> and xvmalloc issues.
>

Yes, I do need Nitin to weigh in on this since any changes to the xvmalloc
code would impact zcache and zram.
 
> BUT... I'd be concerned about increasing the pool chunk size, at least
> without a fallback.  When memory is constrained, finding even two
> physically consecutive free pages in the kernel can be a challenge, let
> alone four.  Since frontswap is only invoked when swapping is occurring,
> memory is definitely already constrained.
> 
> If it is possible to modify xvmalloc (or possibly the pool creation
> calls from zcache) to juggle multiple pools, one with chunkorder==2,
> one with chunkorder==1, and one with chunkorder==0, with a fallback
> sequence if a higher chunkorder is not available, might that be
> helpful?  Still I worry that the same problems might occur because
> the higher chunkorders might never be available after some time
> passes.
>

To avoid the problem of finding up to four physically contiguous pages,
I'm looking into using vm_map_ram() to map chunks made up of multiple
noncontiguous pages into a single virtually contiguous address range.
I don't know what the overhead is yet.
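
Roughly what I have in mind, sketched against 2.6.39 (where vm_map_ram()
still takes a pgprot_t); the function names are mine, not existing
xvmalloc code:

#include <linux/mm.h>
#include <linux/vmalloc.h>

#define CHUNK_PAGES 4	/* proposed chunk size: four pages */

/*
 * Map four physically noncontiguous pages into one virtually
 * contiguous range so a compressed object could span page
 * boundaries.  Returns the linear mapping, or NULL on failure.
 */
static void *xv_map_chunk(struct page **pages)
{
	return vm_map_ram(pages, CHUNK_PAGES, -1, PAGE_KERNEL);
}

static void xv_unmap_chunk(void *addr)
{
	vm_unmap_ram(addr, CHUNK_PAGES);
}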

I do like the idea of having a few pools with different chunk sizes.
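
On the pool-growth side, the fallback could be as simple as walking down
the orders; again only a sketch, and grow_pool_fallback() is a made-up
name:

#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Try to grow the pool with an order-2 chunk, falling back to
 * order-1 and finally a single page when higher orders are not
 * available.  The order actually obtained is returned via *order.
 */
static struct page *grow_pool_fallback(gfp_t gfp, unsigned int *order)
{
	int o;

	for (o = 2; o >= 0; o--) {
		struct page *page = alloc_pages(gfp | __GFP_NOWARN, o);
		if (page) {
			*order = o;
			return page;
		}
	}
	return NULL;
}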
 
>> After a little playing around, I've found this isn't entirely trivial to
>> do because of the memory mapping implications; more specifically the use
>> of kmap/kunmap in the xvmalloc and zcache layers.  I've looked into
>> using vmap to map multiple pages into a linear address space, but it
>> seems like there is a lot of memory overhead in doing that.
>>
>> Do you have any feedback on this issue or a suggested solution?
> 
> One neat feature of frontswap (and the underlying Transcendent
> Memory definition) is that ANY PUT may be rejected**.  So zcache
> could keep track of the distribution of "zsize" and if the number
> of pages with zsize>PAGE_SIZE/2 greatly exceeds the number of pages
> with "complementary zsize", the frontswap code in zcache can reject
> the larger pages until balance/sanity is restored.
> 
> Might that help?  

We could do that, but I imagine that would let a lot of pages fall
through to the swap device on most workloads.  Ideally, I'd like to find
a solution that captures and (efficiently) stores pages that compress to
as much as 80% of their original size.

> If so, maybe your new sysfs value could be
> replaced with the ratio (zv_pool_pages_count/frontswap_curr_pages)
> and this could be _writeable_ to allow the above policy target to
> be modified at runtime.   Even better, the fraction could be
> represented by number-of-bytes ("target_zsize"), which could default
> to something like (3*PAGE_SIZE)/4... if the ratio above
> exceeds target_zsize and the zsize of the page-being-put exceeds
> target_zsize, then the put is rejected.
> 
> Thanks,
> Dan
> 
> ** The "put" shouldn't actually be rejected outright... it should
> be converted to a "flush" so that, if a previous put was
> performed for the matching handle, the space can be reclaimed.
> (Let me know if you need more explanation of this.)

Thanks again for your reply, Dan.  I'll explore this more next week.
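
For what it's worth, here is how I read your target_zsize proposal, as a
sketch only; every name below is hypothetical:

#include <linux/mm.h>
#include <linux/types.h>

/* default per your suggestion: (3*PAGE_SIZE)/4 */
static size_t target_zsize = (3 * PAGE_SIZE) / 4;

/*
 * Decide whether to accept a put of compressed size zsize.
 * pool_pages would come from zv_pool_pages_count and stored_pages
 * from frontswap's curr_pages.  A rejected put would be converted
 * to a flush of any stale copy under the same handle, as you
 * describe above.
 */
static bool zv_accept_put(size_t zsize, u64 pool_pages, u64 stored_pages)
{
	/* average pool bytes consumed per stored page */
	u64 bytes_per_page = stored_pages ?
		(pool_pages << PAGE_SHIFT) / stored_pages : 0;

	return bytes_per_page <= target_zsize || zsize <= target_zsize;
}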

--
Seth


Thread overview: 12+ messages
2011-06-22 19:15 frontswap/zcache: xvmalloc discussion Seth Jennings
2011-06-22 19:23 ` [PATCH] Add zv_pool_pages_count to zcache sysfs Seth Jennings
2011-06-23 15:38   ` Dave Hansen
2011-06-23 16:38 ` frontswap/zcache: xvmalloc discussion Dan Magenheimer
2011-06-23 21:59   ` Seth Jennings [this message]
2011-06-24 22:40     ` Dan Magenheimer
2011-06-30  2:31       ` Dan Magenheimer
2011-06-30 16:09         ` Dan Magenheimer
2011-06-24  6:11 ` Nitin Gupta
2011-06-24 15:52   ` Dave Hansen
2011-06-25  2:42     ` Nitin Gupta
2011-08-05 16:22   ` Seth Jennings
