Re: frontswap/zcache: xvmalloc discussion


From: Nitin Gupta <ngupta@vflare.org>
To: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: linux-mm <linux-mm@kvack.org>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Robert Jennings <rcj@linux.vnet.ibm.com>,
	Brian King <brking@linux.vnet.ibm.com>,
	Greg Kroah-Hartman <gregkh@suse.de>
Subject: Re: frontswap/zcache: xvmalloc discussion
Date: Thu, 23 Jun 2011 23:11:16 -0700
Message-ID: <4E042A84.5010204@vflare.org>
In-Reply-To: <4E023F61.8080904@linux.vnet.ibm.com>

Hi Seth,

On 06/22/2011 12:15 PM, Seth Jennings wrote:

>
> The real problem here is compressing pages of size x and storing them in
> a pool that has "chunks", if you will, also of size x, where allocations
> can't span multiple chunks. Ideally, I'd like to address this issue by
> expanding the size of the xvmalloc pool chunks from one page to four
> pages (I can explain why four is a good number; I just didn't want to make
> this note too long).
>
> After a little playing around, I've found this isn't entirely trivial to
> do because of the memory mapping implications; more specifically the use
> of kmap/kunmap in the xvmalloc and zcache layers. I've looked into using
> vmap to map multiple pages into a linear address space, but it seems
> like there is a lot of memory overhead in doing that.
>
> Do you have any feedback on this issue or suggestions for a solution?
>

The xvmalloc fragmentation issue has been reported by several zram users,
and some time back I started working on a new allocator (xcfmalloc) which
potentially solves many of these issues. However, all of the details are
currently only on paper, and I'm sure the actual implementation will
bring a lot of surprises.

Currently, xvmalloc wastes memory due to:
  - No compaction support: Each page can store chunks of any size, which
makes compaction really hard to implement.
  - Use of 0-order pages only: This was enforced to avoid memory
allocation failures. As Dan pointed out, any higher-order allocation is
almost guaranteed to fail under memory pressure.

To solve these issues, xcfmalloc:
  - Supports compaction: It's size-class based (like SLAB), which, among
other things, simplifies compaction.
  - Supports higher-order pages using a little trickery:

For 64-bit systems, we can simply allocate 16K or 64K "pages" with
vmalloc() and never bother unmapping them. This is expensive (how much?)
in terms of both CPU and memory, but easy to implement.
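Why larger, contiguously mapped regions help can be seen with a bit of arithmetic. The following is a minimal sketch under a simplifying assumption: every compressed object has the same size, here a hypothetical 3000 bytes. tail_waste() is an illustrative helper, not part of xvmalloc. Inside a vmalloc()ed region objects may cross 4K boundaries freely, so only the end of the region is wasted.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_4K 4096u

/*
 * Bytes left unused at the end of a region of 'region_pages' 4K pages when it
 * is packed with equal-sized objects. Objects may cross 4K boundaries inside
 * the region (it is mapped contiguously) but not the end of the region.
 */
static size_t tail_waste(size_t region_pages, size_t obj_size)
{
	size_t region = region_pages * PAGE_4K;

	assert(obj_size > 0 && obj_size <= region);
	return region % obj_size;
}
```

For 3000-byte objects, a single 4K page leaves 1096 bytes (about 27%) unused. A 16K region leaves 1384 bytes (about 8%), and a 64K region leaves 2536 bytes (under 4%).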

But on 32-bit systems (almost all "embedded" devices), this of course
cannot be done, since the vmalloc address space there is small. For this
case, the plan is to create a "vpage" abstraction which can be treated
as a usual higher-order page.

vpage abstraction:
  - Allocate 0-order pages and maintain them in an array.
  - Allow a chunk to cross at most one 4K (or whatever the native
PAGE_SIZE is) page boundary. This limits the maximum allocation size to
4K but simplifies the mapping logic.
  - A vpage is assigned a specific size class, just like a usual SLAB.
This will simplify compaction.
  - xcfmalloc() will return an object handle instead of a direct pointer.
  - Provide xcfmalloc_{map,unmap}(), which will handle the case where a
chunk spans two pages. It will map the pages using kmap_atomic(), and
thus the user will be expected to unmap them soon.
  - Allow a vpage to be "partially freed", i.e. individual 4K pages can
be freed once they are completely empty.
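The vpage points above can be modelled in userspace roughly as follows. This is a minimal sketch, not the xcfmalloc design. The names vpage, vp_map, vp_unmap, vp_get and vp_put are hypothetical, and a handle is assumed to be just the byte offset inside one vpage. Pages come from separate malloc() calls, standing in for 0-order pages. In place of the kmap_atomic() mapping, a chunk that straddles a page boundary is copied into a caller-supplied bounce buffer on map and written back on unmap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VP_PAGE_SIZE 4096u /* native PAGE_SIZE assumed to be 4K */
#define VP_NR_PAGES 4u     /* hypothetical: a 16K vpage built from 4 pages */

/* A vpage: separately allocated 0-order pages serving one size class. */
struct vpage {
	size_t obj_size;                   /* size class, at most one page */
	unsigned char *pages[VP_NR_PAGES]; /* NULL once a page is freed */
	unsigned int users[VP_NR_PAGES];   /* live objects touching each page */
};

static void vp_destroy(struct vpage *vp)
{
	for (unsigned int i = 0; i < VP_NR_PAGES; i++)
		free(vp->pages[i]); /* free(NULL) is a no-op */
}

/* Returns 0 on success; on failure the caller should call vp_destroy(). */
static int vp_init(struct vpage *vp, size_t obj_size)
{
	assert(obj_size > 0 && obj_size <= VP_PAGE_SIZE);
	memset(vp, 0, sizeof(*vp));
	vp->obj_size = obj_size;
	for (unsigned int i = 0; i < VP_NR_PAGES; i++)
		if ((vp->pages[i] = malloc(VP_PAGE_SIZE)) == NULL)
			return -1;
	return 0;
}

/* Handle of object 'idx': here simply its byte offset inside the vpage. */
static size_t vp_handle(const struct vpage *vp, unsigned int idx)
{
	assert(((size_t)idx + 1) * vp->obj_size <= VP_NR_PAGES * VP_PAGE_SIZE);
	return (size_t)idx * vp->obj_size;
}

/* First and last page covered by the object at 'h' (at most two pages). */
static void vp_span(const struct vpage *vp, size_t h,
		    unsigned int *first, unsigned int *last)
{
	*first = (unsigned int)(h / VP_PAGE_SIZE);
	*last = (unsigned int)((h + vp->obj_size - 1) / VP_PAGE_SIZE);
}

/* Pointer to the object; a straddling object is copied into 'bounce'. */
static void *vp_map(struct vpage *vp, size_t h, unsigned char *bounce)
{
	unsigned int first, last;
	size_t off = h % VP_PAGE_SIZE, head = VP_PAGE_SIZE - off;

	vp_span(vp, h, &first, &last);
	if (first == last)
		return vp->pages[first] + off;
	memcpy(bounce, vp->pages[first] + off, head);
	memcpy(bounce + head, vp->pages[last], vp->obj_size - head);
	return bounce;
}

/* Write a straddling object's bounce copy back; no-op otherwise. */
static void vp_unmap(struct vpage *vp, size_t h, const unsigned char *bounce)
{
	unsigned int first, last;
	size_t off = h % VP_PAGE_SIZE, head = VP_PAGE_SIZE - off;

	vp_span(vp, h, &first, &last);
	if (first == last)
		return;
	memcpy(vp->pages[first] + off, bounce, head);
	memcpy(vp->pages[last], bounce + head, vp->obj_size - head);
}

/* An object was allocated: count it on every page it touches. */
static void vp_get(struct vpage *vp, size_t h)
{
	unsigned int first, last;

	vp_span(vp, h, &first, &last);
	for (unsigned int p = first; p <= last; p++)
		vp->users[p]++;
}

/* An object was freed: release pages no live object touches ("partial free"). */
static void vp_put(struct vpage *vp, size_t h)
{
	unsigned int first, last;

	vp_span(vp, h, &first, &last);
	for (unsigned int p = first; p <= last; p++) {
		assert(vp->users[p] > 0);
		if (--vp->users[p] == 0) {
			free(vp->pages[p]);
			vp->pages[p] = NULL;
		}
	}
}
```

With a 3000-byte class, object 0 lives entirely in page 0, while object 1 (bytes 3000..5999) straddles pages 0 and 1 and goes through the bounce buffer. Re-populating a page that was freed and is needed again later is left out of this sketch.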

Much of this vpage functionality seems to be already present in mainline
as "flexible arrays" [1].

For scalability, we can simply go for per-CPU lists and use a Hoard-like
[2] design to bound the fragmentation associated with such per-CPU slabs.
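A Hoard-style bound can be sketched as follows. In the Hoard paper [2], each per-processor heap i tracks u_i (bytes in use) and a_i (bytes held in superblocks). It maintains the invariant u_i >= a_i - K*S or u_i >= (1 - f)*a_i, where S is the superblock size, K a small constant and f an emptiness fraction. When a free breaks the invariant, a superblock that is at least f empty moves to the global heap. The C below is a minimal, hypothetical model of that check only; the struct and function names are illustrative, and choosing and moving the superblock is left out. It uses integer arithmetic to avoid floating point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-CPU heap accounting, in bytes (hypothetical names). */
struct cpu_heap {
	size_t in_use; /* u_i: bytes handed out to callers */
	size_t held;   /* a_i: bytes in superblocks owned by this heap */
};

/* Hoard parameters: superblock size S, slack K, empty fraction f = f_num / f_den. */
struct hoard_params {
	size_t S, K;
	size_t f_num, f_den;
};

/*
 * True when the invariant (u >= a - K*S) || (u >= (1 - f) * a) is violated,
 * i.e. this heap should give a mostly empty superblock back to the global heap.
 */
static bool hoard_should_release(const struct cpu_heap *h,
				 const struct hoard_params *p)
{
	assert(p->f_den > 0 && p->f_num <= p->f_den && h->in_use <= h->held);
	bool below_slack = h->in_use + p->K * p->S < h->held;                    /* u < a - K*S */
	bool too_empty = h->in_use * p->f_den < (p->f_den - p->f_num) * h->held; /* u < (1 - f)a */
	return below_slack && too_empty;
}
```

Requiring both conditions keeps small heaps, which are within K superblocks of their usage, from bouncing superblocks back and forth. The fraction test then caps how much memory a per-CPU heap can hold relative to what it actually uses.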

Unfortunately, I'm currently too loaded to work on this, at least for the
next 2 months (internship), but would be glad to contribute if someone is
willing to work on this.

[1] Flexible arrays: http://lxr.linux.no/linux+v2.6.39/Documentation/flexible-arrays.txt
[2] Hoard allocator: http://www.cs.umass.edu/~emery/pubs/berger-asplos2000.pdf

Thanks,
Nitin


