From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from sd0109e.au.ibm.com (d23rh905.au.ibm.com [202.81.18.225])
    by e23smtp02.au.ibm.com (8.13.1/8.13.1) with ESMTP id l98HC5JM025462
    for ; Tue, 9 Oct 2007 03:12:05 +1000
Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139])
    by sd0109e.au.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l98HFcvb099270
    for ; Tue, 9 Oct 2007 03:15:38 +1000
Received: from d23av04.au.ibm.com (loopback [127.0.0.1])
    by d23av04.au.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l98HBl3N008788
    for ; Tue, 9 Oct 2007 03:11:47 +1000
Message-ID: <470A64C8.1030801@linux.vnet.ibm.com>
Date: Mon, 08 Oct 2007 22:41:36 +0530
From: Vaidyanathan Srinivasan
MIME-Version: 1.0
Subject: Re: VMA lookup with RCU
References: <46F01289.7040106@linux.vnet.ibm.com> <20070918205419.60d24da7@lappy>
    <1191436672.7103.38.camel@alexis> <1191440429.5599.72.camel@lappy>
    <470509F5.4010902@linux.vnet.ibm.com> <1191518486.5574.24.camel@lappy>
In-Reply-To: <1191518486.5574.24.camel@lappy>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Peter Zijlstra
Cc: Alexis Bruemmer, Balbir Singh, Badari Pulavarty, Max Asbock,
    linux-mm, Bharata B Rao, Nick Piggin
List-ID:

Peter Zijlstra wrote:
> On Thu, 2007-10-04 at 21:12 +0530, Vaidyanathan Srinivasan wrote:
>> Peter Zijlstra wrote:
>> Hi Peter,
>>
>> Making node-local copies of the VMA is a good idea to reduce inter-node
>> traffic, but the cost of search and delete is very high. Also, as you
>> have pointed out, if the atomic operations happen on a remote node
>> because the scheduler migrated our thread, then all the cycles saved
>> may be lost.
>>
>> In find_get_vma(), is the cross-node traffic due to the btree traversal
>> or to the actual VMA object reference?
>
> Not sure, I'm not sure how to profile cacheline transfers.

I asked around and found that oprofile can report cache misses. But
associating them with find_get_vma() is the problem.

> The outlined approach would try to keep all accesses read-only, so that
> the cacheline can be shared. But yeah, once it gets evicted it needs to
> be re-transferred.

>> Can we look at duplicating the btree structure per node while keeping
>> just one copy of each VMA structure, with the btrees on every node
>> pointing to the same vma object? This will make write operations and
>> deletion of btree entries on all nodes a little simpler. All VMA lists
>> will be unique and not duplicated.
>
> But that would end up with a 2d tree, (mm, vma) in which you can try to
> find an exact match for a given (mm, address) key.
>
> Trouble with multi-dimensional trees is the balancing thing, afaik it's
> an NP-hard problem.

Not a good idea then :)

>> Another related idea is to move the VMA object to node-local memory.
>> Can we migrate the VMA object to the node where it is referenced the
>> most? We still maintain only _one_ copy of the VMA object. No data
>> duplication, but we can move the memory around to make it node local.
>
> I guess we can do that, if you take the vma lock in exclusive mode, you
> can make a copy of the object, replace the tree pointer, mark the old
> one dead (so that rcu lookups will retry) and rcu_free the old one.

So worth a try. I will pick this up.

>> Some more thoughts:
>>
>> The pagefault handler does most of the find_get_vma() calls, to validate
>> the user address and then create page table entries (allocate page
>> frames)... can we make the page fault handler run on the node where the
>> VMAs have been allocated?
>
> explicit migration - like migrate_disable() - makes load balancing a very
> hard problem.

>> The CPU that has page-faulted need not necessarily do all the find_vma()
>> calls and update the page table. The process can sleep while another CPU
>> _near_ to the memory containing the VMAs and pagetable does the job with
>> local memory references.
>
> would we not end up with remote page tables?

Remote pagetable update is less costly than remote vma lookup.
We will have to balance between the two.

>> I don't know if the page tables for the faulting process are allocated
>> in node-local memory.
>>
>> Per-CPU last vma cache: Currently we have the last vma referenced in a
>> one-entry cache in mm_struct. Can we have this cache per CPU or per node
>> so that a multi-threaded application can have a node/cpu-local cache of
>> the last vma referenced? This may reduce btree/rbtree traversal. Let the
>> hardware cache maintain the corresponding VMA object and its coherency.
>>
>> Please let me know your comments and thoughts.
>
> Nick Piggin (and I think Eric Dumazet) had nice patches for this. I
> think they were posted in the private futex thread.

Good. I would like to try them out.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
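
The per-CPU last-vma cache raised above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not kernel
code and not the patches Peter refers to: the names struct mm, tree_lookup(),
find_vma_cached(), invalidate_vma_caches() and NR_CPUS are hypothetical, a
sorted array stands in for the btree/rbtree, and a lookup here succeeds only
when the address falls inside a VMA. Locking and RCU are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical CPU count for this model */

struct vma {
    unsigned long start; /* first address covered (inclusive) */
    unsigned long end;   /* first address past the area (exclusive) */
};

struct mm {
    struct vma *vmas;              /* sorted, non-overlapping; stands in for the tree */
    size_t nr_vmas;
    struct vma *last_vma[NR_CPUS]; /* one-entry cache per CPU (assumption) */
    unsigned long slow_lookups;    /* counts full "tree" walks */
};

/* Slow path: binary search over the sorted array, modelling a tree walk. */
static struct vma *tree_lookup(struct mm *mm, unsigned long addr)
{
    size_t lo = 0, hi = mm->nr_vmas;

    mm->slow_lookups++;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        struct vma *v = &mm->vmas[mid];

        if (addr < v->start)
            hi = mid;
        else if (addr >= v->end)
            lo = mid + 1;
        else
            return v;
    }
    return NULL;
}

/* Fast path: try this CPU's cached VMA before walking the tree. */
static struct vma *find_vma_cached(struct mm *mm, int cpu, unsigned long addr)
{
    struct vma *v;

    assert(cpu >= 0 && cpu < NR_CPUS);
    v = mm->last_vma[cpu];
    if (v && addr >= v->start && addr < v->end)
        return v;

    v = tree_lookup(mm, addr);
    if (v)
        mm->last_vma[cpu] = v; /* only successful lookups are cached */
    return v;
}

/* Any change to the VMA set must drop every CPU's cached pointer. */
static void invalidate_vma_caches(struct mm *mm)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        mm->last_vma[cpu] = NULL;
}
```

Each CPU only ever writes its own slot, so threads faulting on different
CPUs do not overwrite each other's cached entry the way they would with a
single shared slot. The cost moves to the write side: an unmap or split has
to clear all slots, which in a real implementation would need to be ordered
against concurrent lookups.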