From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from zps76.corp.google.com (zps76.corp.google.com [172.25.146.76])
    by smtp-out.google.com with ESMTP id l9PIi32F002645
    for ; Thu, 25 Oct 2007 11:44:03 -0700
Received: from nf-out-0910.google.com (nfdg16.prod.google.com [10.48.133.16])
    by zps76.corp.google.com with ESMTP id l9PIfTN3006084
    for ; Thu, 25 Oct 2007 11:44:02 -0700
Received: by nf-out-0910.google.com with SMTP id g16so546025nfd
    for ; Thu, 25 Oct 2007 11:44:02 -0700 (PDT)
Message-ID:
Date: Thu, 25 Oct 2007 14:44:02 -0400
From: "Ross Biro"
Subject: Re: RFC/POC Make Page Tables Relocatable
In-Reply-To: <1193335725.24087.19.camel@localhost>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <1193330774.4039.136.camel@localhost> <1193335725.24087.19.camel@localhost>
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Dave Hansen
Cc: linux-mm@kvack.org, Mel Gorman
List-ID:

On 10/25/07, Dave Hansen wrote:
> Right, but we're talking about pagetable pages here, right? What fields
> in 'struct page' are used by pagetable pages, but will allow 'struct
> page' to shrink in size if pagetables pages stop using them?

At the moment we are only talking about page tables, but I hope in the
future to do more. Perhaps page tables were a bad place to start, but
like I said I thought they would be the hardest, and hence a good place
to start.

>
> On a more general note: so it's all about saving memory in the end?

Sort of. As you pointed out, right now struct page is pretty well
tuned, but it's also not easy to add something. If I need an extra
pointer or two to do something, then I more or less totally trash all
sorts of efficiencies we are getting now. It's about having
flexibility without losing efficiency.

>
> I dunno. I'm highly skeptical this can work.

Skepticism is good. I think I can pull it off, and possibly even make
it more efficient.
The big gotcha is going to be caching issues: if we end up bouncing
back and forth between struct page and the new meta data, we might end
up being toast. On the other hand, if we are looking at multiple pages
and we fit multiple smaller structures into a cache line, we might
still win. This is why I asked for some micro benchmarks. I figured
people would send the ones they feel are most likely to fail.

Remember, this change doesn't stand on its own. In a vacuum, I don't
think this change is worth doing at all. But it enables the other
changes and a lot more going forward.

>
> > > get a pte page back, I might simply hold the page table lock, walk the
> > > pagetables to the pmd, lock and invalidate the pmd, copy the pagetable
> > > contents into a new page, update the pmd, and be on my merry way. Why
> > > doesn't this work? I'm just fishing for a good explanation why we need
> > > all the slab silliness.
> >
> > This would almost work, but to do it properly, you find you'll need
> > some more locks and a couple of extra pointers and such.
>
> Could you be specific?

Well, to go quickly from an arbitrary page that happens to be part of a
page table to the appropriate mm to get a lock, I had to store a
pointer to the mm. Then I also needed to know where the particular
page fit into the page table tree. Once I had those, it turned out I
needed a spinlock to protect them, so that I could deallocate the page
without racing against the relocation. I think I could have used the
ptl lock in struct page, but I wasn't really clear on it when I
started.

So I needed 2 pointers which I could have squeezed into struct page
somewhere, but then what about when I needed a third or fourth pointer
to make something else work well?

I'm pretty sure I can clean up some of the tlb flushing and make all
levels of the page tables relocatable without a problem by adding
another flag. Of course, I could put a flag into the page flags, but
it doesn't take long to run out of flag space.
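For illustration only, here is a minimal user-space sketch of the
per-page bookkeeping described above: a pointer to the owning mm, a
pointer to the slot in the upper-level table, and a lock that keeps
free and relocate from racing. It is not taken from the posted patch.
Every name in it (model_lock, model_mm, pt_page_meta, pt_meta_init,
pt_relocate, pt_free, MODEL_PTRS_PER_TABLE) is hypothetical, the locks
are simple C11 spinlocks standing in for kernel locks, and the TLB
flush a real kernel would need is only noted in a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PTRS_PER_TABLE 512 /* entries per table page (assumed) */

/* Hypothetical user-space spinlock; a kernel would use its own lock type. */
struct model_lock { atomic_flag held; };

static void model_lock_init(struct model_lock *l) { atomic_flag_clear(&l->held); }

static void model_lock_acquire(struct model_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the holder releases */
}

static void model_lock_release(struct model_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical stand-in for the owning address space (mm). */
struct model_mm { struct model_lock page_table_lock; };

/* Hypothetical metadata for one page that holds a page table. */
struct pt_page_meta {
    struct model_mm *mm;     /* pointer 1: which mm to lock */
    uintptr_t *parent_slot;  /* pointer 2: upper-level entry pointing here */
    struct model_lock lock;  /* guards the fields above: free vs. relocate */
    uintptr_t *table;        /* the table page itself */
};

/* Allocate a zeroed table and hook it into the parent slot. */
int pt_meta_init(struct pt_page_meta *meta, struct model_mm *mm,
                 uintptr_t *parent_slot)
{
    assert(meta && mm && parent_slot);
    meta->table = calloc(MODEL_PTRS_PER_TABLE, sizeof(uintptr_t));
    if (!meta->table)
        return -1;
    meta->mm = mm;
    meta->parent_slot = parent_slot;
    model_lock_init(&meta->lock);
    *parent_slot = (uintptr_t)meta->table;
    return 0;
}

/* Move the table to a fresh page. Returns 0, or -1 if freed or out of memory. */
int pt_relocate(struct pt_page_meta *meta)
{
    uintptr_t *fresh = malloc(MODEL_PTRS_PER_TABLE * sizeof(uintptr_t));
    if (!fresh)
        return -1;

    model_lock_acquire(&meta->lock);
    if (!meta->mm) {                        /* lost the race with pt_free() */
        model_lock_release(&meta->lock);
        free(fresh);
        return -1;
    }
    model_lock_acquire(&meta->mm->page_table_lock);
    *meta->parent_slot = 0;                 /* invalidate; a kernel would also flush TLBs */
    memcpy(fresh, meta->table, MODEL_PTRS_PER_TABLE * sizeof(uintptr_t));
    uintptr_t *old = meta->table;
    meta->table = fresh;
    *meta->parent_slot = (uintptr_t)fresh;  /* repoint the upper level */
    model_lock_release(&meta->mm->page_table_lock);
    model_lock_release(&meta->lock);
    free(old);
    return 0;
}

/* Unhook and free the table; safe to call concurrently with pt_relocate(). */
void pt_free(struct pt_page_meta *meta)
{
    model_lock_acquire(&meta->lock);
    if (meta->mm) {
        model_lock_acquire(&meta->mm->page_table_lock);
        *meta->parent_slot = 0;
        model_lock_release(&meta->mm->page_table_lock);
        free(meta->table);
        meta->table = NULL;
        meta->mm = NULL;
        meta->parent_slot = NULL;
    }
    model_lock_release(&meta->lock);
}
```

In this sketch the per-page lock is always taken before the mm lock,
so the two paths cannot deadlock against each other, and a relocation
that arrives after the free sees mm == NULL and backs off. Each further
pointer or flag would be one more field in this structure rather than
one more thing to squeeze into struct page.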
The meta data change we are talking about above is to make the code
flexible enough to support things like this without killing
performance. Your argument against the meta data change above is that
it will kill performance. I don't think so, but I could be wrong.
However, if the only objection is that it will kill performance, then
it's worth doing and running some benchmarks. If it turns out I'm
correct and it's a win or not a big loss from a performance point of
view, then it goes in. If not, it doesn't.

>
> You may want to have a talk with Mel about memory fragmentation, and
> whether there is any lower hanging fruit (cc'd). :)

I usually like to go for the high hanging fruit with the idea that if I
do that well, the low hanging fruit becomes a cake walk. However, any
input on this is welcome.

>
> your code. The posted patch is hard to understand in some areas because
> of indenting bracketing. If you'd like people to read, review, and give
> suggestions on what they see, I'd suggest trying to make it as easy as

I'm sorry about that. It must have happened when I hand-applied the
patch to 2.6.23 (it was developed under 2.6.22). I should have had
emacs reflow all the changes after deleting all the +'s that diff
sticks in front of the lines.

    Ross