From: "Ross Biro" <rossb@google.com>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: linux-mm@kvack.org
Subject: Re: RFC/POC Make Page Tables Relocatable
Date: Thu, 25 Oct 2007 13:40:07 -0400 [thread overview]
Message-ID: <d43160c70710251040u23feeaf9l16fafc2685b2ce52@mail.gmail.com> (raw)
In-Reply-To: <1193330774.4039.136.camel@localhost>
On 10/25/07, Dave Hansen <haveblue@us.ibm.com> wrote:
> On Thu, 2007-10-25 at 11:16 -0400, Ross Biro wrote:
> > 1) Add a separate meta-data allocation to the slab and slub allocator
> > and allocate full pages through kmem_cache_alloc instead of get_page.
> > The primary motivation of this is that we could shrink struct page by
> > using kmem_cache_alloc to allocate whole pages and put the supported
> > data in the meta_data area instead of struct page.
>
> The idea seems cool, but I think I'm missing a lot of your motivation
> here.
>
> First of all, which meta-data, exactly, is causing 'struct page' to be
> larger than it could be? Which meta-data can be moved?
Almost all of it. Most of struct page isn't about the kernel managing
pages in general, but about managing particular types of pages.
Although it's been cleaned up over the years, there are still
some things, for example:
	union {
		atomic_t _mapcount;	/* Count of ptes mapped in mms,
					 * to show when page is mapped
					 * & limit reverse map searches.
					 */
		struct {		/* SLUB uses */
			short unsigned int inuse;
			short unsigned int offset;
		};
	};
_mapcount is only used when the page is mapped via a pte, while the
other half is only used when the page is part of a SLUB cache.
Neither condition always holds, and neither field strictly needs to
live in struct page; there is just currently no better place to put
them. The rest of the unions don't really belong in struct page
either. Similarly, the lru list only applies to pages that can
actually go on an LRU list. So why not make a better place to put
them?
>
> > 2) Add support for relocating memory allocated via kmem_cache_alloc.
> > When a cache is created, optional relocation information can be
> > provided. If a relocation function is provided, caches can be
> > defragmented and overall memory consumption can be reduced.
>
> We may truly need this some day, but I'm not sure we need it for
> pagetables. If I were a stupid, naive kernel developer and I wanted to
I chose to start with page tables because I figured they would be the
hardest to properly relocate.
> get a pte page back, I might simply hold the page table lock, walk the
> pagetables to the pmd, lock and invalidate the pmd, copy the pagetable
> contents into a new page, update the pmd, and be on my merry way. Why
> doesn't this work? I'm just fishing for a good explanation why we need
> all the slab silliness.
This would almost work, but to do it properly you find you need some
more locks and a couple of extra pointers and such. Without all the
slab silliness, those would have to live in struct page, needlessly
bloating it; hence the previous change. I've also managed to convince
myself that using the slab/slub allocator will tend to clump the page
tables together, which should reduce fragmentation and make more
memory available for huge pages. In fact, I've got this idea that by
using slab/slub, we could stop allocating individual pages and
allocate only huge pages on systems that have them.
>
> I applaud you for posting early and posting often, but there is an
> absolute ton of code in your patch. For your subsequent postings, I'd
> highly recommend trying to break it up in some logical ways. Your 4
> steps would be an excellent start.
I don't think any of the four changes stands on its own; they only
make sense together. If there is enough agreement in principle to go
forward, then for real patches you are correct. Remember, that patch
was only meant as a proof of concept.
> You might also want to run checkpatch.pl on your patch. It has some
> style issues that also need to get worked out.
That patch isn't meant to be applied; it's there because it's easier
to point to code to explain what I mean than to explain it in words.
I didn't think a few style issues would matter. And just to
reiterate: if you actually use the code I posted, you get what you
deserve. It was only meant to illustrate what I'm trying to say.
Ross
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
Thread overview: 17+ messages
2007-10-25 15:16 RFC/POC Make Page Tables Relocatable Ross Biro
2007-10-25 16:46 ` Dave Hansen
2007-10-25 17:40 ` Ross Biro [this message]
2007-10-25 18:08 ` Dave Hansen
2007-10-25 18:44 ` Ross Biro
2007-10-25 18:47 ` Dave Hansen
2007-10-25 19:23 ` Dave Hansen
2007-10-25 19:53 ` Ross Biro
2007-10-25 19:56 ` Dave Hansen
2007-10-25 19:58 ` Ross Biro
2007-10-25 20:15 ` Dave Hansen
2007-10-25 20:00 ` Dave Hansen
2007-10-25 20:10 ` Ross Biro
2007-10-25 20:20 ` Dave Hansen
2007-10-26 16:10 ` Mel Gorman
2007-10-26 16:51 ` Ross Biro
2007-10-26 17:11 ` Mel Gorman