Re: RFC/POC Make Page Tables Relocatable

linux-mm.kvack.org archive mirror

From: mel@skynet.ie (Mel Gorman)
To: Dave Hansen <haveblue@us.ibm.com>
Cc: Ross Biro <rossb@google.com>,
	linux-mm@kvack.org, Mel Gorman <MELGOR@ie.ibm.com>
Subject: Re: RFC/POC Make Page Tables Relocatable
Date: Fri, 26 Oct 2007 17:10:07 +0100
Message-ID: <20071026161007.GA19443@skynet.ie>
In-Reply-To: <1193335725.24087.19.camel@localhost>

On (25/10/07 11:08), Dave Hansen didst pronounce:
> On Thu, 2007-10-25 at 13:40 -0400, Ross Biro wrote: 
> > On 10/25/07, Dave Hansen <haveblue@us.ibm.com> wrote:
> > > On Thu, 2007-10-25 at 11:16 -0400, Ross Biro wrote:
> > > > 1) Add a separate meta-data allocation to the slab and slub allocator
> > > > and allocate full pages through kmem_cache_alloc instead of get_page.
> > > > The primary motivation of this is that we could shrink struct page by
> > > > using kmem_cache_alloc to allocate whole pages and put the supported
> > > > data in the meta_data area instead of struct page.
> > >
> > > The idea seems cool, but I think I'm missing a lot of your motivation
> > > here.
> > >
> > > First of all, which meta-data, exactly, is causing 'struct page' to be
> > > larger than it could be?  Which meta-data can be moved?
> > 
> > Almost all of it.  Most of struct page isn't about the kernel managing
> > pages in general, but about managing particular types of pages.
> > Although it's been cleaned up over the years, there are still
> > some things:
> > 
> >         union {
> >                 atomic_t _mapcount;     /* Count of ptes mapped in mms,
> >                                          * to show when page is mapped
> >                                          * & limit reverse map searches.
> >                                          */
> >                 struct {        /* SLUB uses */
> >                         short unsigned int inuse;
> >                         short unsigned int offset;
> >                 };
> >         };
> > 
> > mapcount is only used when the page is mapped via a pte, while the
> > other part is only used when the page is part of a SLUB cache.
> > Neither case is always true, and neither field is 100% needed as part
> > of struct page.  There is just currently no better place to put them.
> > The rest of the unions don't really belong in struct page.  Similarly,
> > the lru list only applies to pages which could go on the lru list.  So
> > why not make a better place to put them?
> 
> Right, but we're talking about pagetable pages here, right?  What fields
> in 'struct page' are used by pagetable pages, but will allow 'struct
> page' to shrink in size if pagetables pages stop using them?
> 
> On a more general note: so it's all about saving memory in the end?
> Making 'struct page' smaller?  If I were you, I'd be very concerned about
> the pathological cases.  We may get the lru pointers out of 'struct
> page', so we'll need some parallel lookup to get from physical page to
> LRU, right?   Although the bootup footprint of mem_map[] (and friends)
> is smaller, what happens on a machine with virtually all its memory used
> by pages on the LRU (which I would guess is actually quite common)?  Will
> the memory footprint even be close to the two pointers per physical page
> that it costs us in the current implementation?
> 
> That doesn't even consider the runtime overhead of such a scheme.  Right
> now, if you touch any part of 'struct page' on a 32-bit machine, you
> generally bring the entire thing into a single cacheline.  Every other
> subsequent access is essentially free.  Any idea what the ballpark
> number of cachelines is that would have to be brought in with another
> lookup method for 'struct page' to lru?
> 
> I dunno.  I'm highly skeptical this can work.
> 
> I've heard rumors in the past that the Windows 'struct page' is much
> smaller than the Linux one.  But, I've also heard that this weighs
> heavily in other areas such as page reclamation.  Could be _completely_
> bogus, but it might be worth a search or two to see if there have been
> any papers on the subject.  
> 
> > > get a pte page back, I might simply hold the page table lock, walk the
> > > pagetables to the pmd, lock and invalidate the pmd, copy the pagetable
> > > contents into a new page, update the pmd, and be on my merry way.  Why
> > > doesn't this work?  I'm just fishing for a good explanation why we need
> > > all the slab silliness.
> > 
> > This would almost work, but to do it properly, you find you'll need
> > some more locks and a couple of extra pointers and such.
> 
> Could you be specific?
> 
> > Without all
> > the slab silliness you would need to add them to struct page.  That would
> > have needlessly bloated struct page, hence the previous change.  I've
> > also managed to convince myself that using the slab/slub allocator
> > will tend to clump the page tables together which should reduce
> > fragmentation and make more memory available for huge pages.  In fact,
> > I've got this idea that by using slab/slub, we can stop allocating
> > individual pages and only allocate huge pages on systems that have
> > them.
> 
> You may want to have a talk with Mel about memory fragmentation, and
> whether there is any lower hanging fruit (cc'd). :)
> 

I suspect this might be overkill from a memory fragmentation
perspective. When grouping pages by mobility, page table pages are
currently considered MIGRATE_UNMOVABLE. From what I have seen, they are
by far the most common unmovable allocation. If they were relocatable
with the standard page migration mechanism, they could be considered
MIGRATE_MOVABLE and external fragmentation would be easier to contend
with.
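
As a rough illustration of the grouping argument, here is a user-space toy,
not kernel code. The toy_* names and the allocation kinds are made up for
this sketch and only mirror the idea of the migrate types; the real
classification is done by the page allocator from the allocation flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the mobility groups (hypothetical names). */
enum toy_migratetype { TOY_UNMOVABLE, TOY_MOVABLE };

/* Hypothetical allocation kinds, for illustration only. */
enum toy_alloc_kind {
	TOY_ALLOC_PAGETABLE,
	TOY_ALLOC_KERNEL_DATA,
	TOY_ALLOC_USER_PAGE
};

/*
 * Pick a mobility group for one allocation.  Page table pages are
 * movable only if we assume they can be relocated by migration.
 */
enum toy_migratetype toy_classify(enum toy_alloc_kind kind,
				  bool pt_relocatable)
{
	switch (kind) {
	case TOY_ALLOC_USER_PAGE:
		return TOY_MOVABLE;
	case TOY_ALLOC_PAGETABLE:
		return pt_relocatable ? TOY_MOVABLE : TOY_UNMOVABLE;
	default:
		return TOY_UNMOVABLE;
	}
}

/* Count how many allocations in a trace end up in the unmovable group. */
size_t toy_count_unmovable(const enum toy_alloc_kind *trace, size_t n,
			   bool pt_relocatable)
{
	size_t i, count = 0;

	assert(trace != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (toy_classify(trace[i], pt_relocatable) == TOY_UNMOVABLE)
			count++;
	return count;
}
```

If page table pages dominate the trace, flipping pt_relocatable moves most
of the entries out of the unmovable group, which is the whole point.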

I haven't looked closely enough at this patch to know if moving page
table pages with page migration is the aim or not.

However, using huge pages for all slabs does not feel like a great
idea. There will be a lot of memory wasted due to internal fragmentation,
and systems with less memory are not going to want to commit a hugepage
for a small slab allocation.
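
To put a rough number on the internal fragmentation, a back-of-the-envelope
sketch follows. backing_waste() is a made-up helper, and the 4 KiB and 2 MiB
page sizes in the note below are assumptions (typical x86-64 values), not
something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes left unused when bytes_in_use of slab objects are backed by
 * whole pages of page_size bytes (rounding up to a page multiple).
 */
size_t backing_waste(size_t bytes_in_use, size_t page_size)
{
	size_t pages;

	assert(page_size > 0);
	pages = (bytes_in_use + page_size - 1) / page_size;
	return pages * page_size - bytes_in_use;
}
```

For a cache holding 10 KiB of objects, backing_waste(10240, 4096) is 2048
bytes, while backing_waste(10240, 2097152) is 2086912 bytes, i.e. almost
the whole huge page sits idle.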

> > > You might also want to run checkpatch.pl on your patch.  It has some
> > > style issues that also need to get worked out.
> > 
> > That patch isn't meant to be applied, but is there because it's easier
> > to point to code to try to explain what I mean than to explain in
> > words.  I didn't think a few style issues would be a problem.  And just
> > to reiterate, if you actually use the code I posted, you get what you
> > deserve.  It was only meant to illustrate what I'm trying to say.
> 
> In general, the reason to run such a script (and to have coding
> standards in the first place) is so that others can more easily read
> your code.  The posted patch is hard to understand in some areas because
> of its indenting and bracketing.  If you'd like people to read, review,
> and give suggestions on what they see, I'd suggest trying to make it as
> easy as possible to understand.
> 
> Check out Documentation/CodingStyle.  
> 
> -- Dave
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab