From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Alexis Bruemmer <alexisb@us.ibm.com>,
	Balbir Singh <balbir@in.ibm.com>,
	Badari Pulavarty <pbadari@us.ibm.com>,
	Max Asbock <amax@us.ibm.com>, linux-mm <linux-mm@kvack.org>,
	Bharata B Rao <bharata@in.ibm.com>
Subject: Re: VMA lookup with RCU
Date: Thu, 04 Oct 2007 21:12:45 +0530
Message-ID: <470509F5.4010902@linux.vnet.ibm.com>
In-Reply-To: <1191440429.5599.72.camel@lappy>

Peter Zijlstra wrote:
>>>     lookup in node local tree
>>>     if found, take read lock on local reference
>>>     if not-found, do global lookup, lock vma, take reference, 
>>>                   insert reference into local tree,
>>>                   take read lock on it, drop vma lock
>>>
>>> write lock on the vma would:
>>>     find the vma in the global tree, lock it
>>>     enqueue work items in a waitqueue that,
>>>       find the local ref, lock it (might sleep)
>>>       release the reference, unlock and clear from local tree
>>>       signal completion
>>>     once all nodes have completed we have no outstanding refs
>>>     and since we have the lock, we're exclusive.
> 
> void invalidate_vma_refs(void *addr)
> {
> 	BTREE_LOCK_CONTEXT(ctx, node_local_tree());
> 	struct vma_ref *ref;	/* per-node reference; type name assumed */
> 
> 	rcu_read_lock();
> 	ref = btree_find(node_local_tree(), (unsigned long)addr);
> 	if (!ref)
> 		goto out_unlock;
> 
> 	down_write(&ref->lock); /* no more local refs */
> 	ref->dead = 1;
> 	atomic_dec(&ref->vma->refs); /* release */
> 	btree_delete(ctx, (unsigned long)addr); /* unhook */
> 	call_rcu(&ref->rcu, free_vma_ref); /* destroy; assumes an rcu_head member 'rcu' */
> 	up_write(&ref->lock);
> 
> out_unlock:
> 	rcu_read_unlock();
> }
> 
> struct vm_area_struct *
> write_lock_vma(struct mm_struct *mm, unsigned long addr)
> {
> 	struct vm_area_struct *vma;
> 
> 	rcu_read_lock();
> 	vma = btree_find(&mm->btree, addr);
> 	if (!vma)
> 		goto out_unlock;
> 
> 	down_write(&vma->lock); /* no new refs */
> 	rcu_read_unlock();
> 
> 	schedule_on_each_cpu(invalidate_vma_refs, vma, 0, 1);
> 
> 	return vma;
> 
> out_unlock:
> 	rcu_read_unlock();
> 	return NULL;
> }
> 
> 

Hi Peter,

Making node-local copies of the VMA is a good idea to reduce inter-node
traffic, but the cost of search and delete is very high.  Also, as you
pointed out, if the atomic operations happen on a remote node because the
scheduler migrated our thread, then all the cycles saved may be lost.

In find_get_vma(), is the cross-node traffic due to the btree traversal or
to the actual VMA object reference?  Can we look at duplicating the btree
structure per node while keeping just one copy of each VMA structure, so
that the btrees on all nodes point to the same VMA object?  This would make
write operations and deletion of btree entries on all nodes a little
simpler.  The VMA objects would stay unique and not be duplicated.
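
To make the idea concrete, here is a rough user-space sketch, not kernel
code.  Everything in it is an assumption for illustration: a small sorted
array stands in for each node's btree, and struct vma, struct node_index,
NR_NODES, MAX_VMAS and the index_*() helpers are hypothetical names.  The
point is only that every node's index stores a pointer to the same VMA
object, so a delete unhooks pointers and never has to reconcile copies.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 2	/* hypothetical node count */
#define MAX_VMAS 16	/* hypothetical per-node index capacity */

struct vma {			/* stand-in for vm_area_struct */
	unsigned long start, end;	/* covers [start, end) */
};

struct node_index {		/* stand-in for one node's btree */
	struct vma *slot[MAX_VMAS];	/* kept sorted by start address */
	int nr;
};

static struct node_index node_idx[NR_NODES];

/* Insert a pointer to the single shared vma into every node's index. */
static void index_insert(struct vma *vma)
{
	for (int n = 0; n < NR_NODES; n++) {
		struct node_index *ix = &node_idx[n];
		int i = ix->nr;

		assert(ix->nr < MAX_VMAS);
		while (i > 0 && ix->slot[i - 1]->start > vma->start) {
			ix->slot[i] = ix->slot[i - 1];
			i--;
		}
		ix->slot[i] = vma;
		ix->nr++;
	}
}

/* Look up addr using only the index that belongs to the given node. */
static struct vma *index_find(int node, unsigned long addr)
{
	struct node_index *ix = &node_idx[node];
	int lo = 0, hi = ix->nr - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		struct vma *v = ix->slot[mid];

		if (addr < v->start)
			hi = mid - 1;
		else if (addr >= v->end)
			lo = mid + 1;
		else
			return v;
	}
	return NULL;
}

/* Unhook the vma from every node's index; the object itself is untouched. */
static void index_delete(struct vma *vma)
{
	for (int n = 0; n < NR_NODES; n++) {
		struct node_index *ix = &node_idx[n];
		int i = 0;

		while (i < ix->nr && ix->slot[i] != vma)
			i++;
		if (i == ix->nr)
			continue;	/* not present on this node */
		for (; i < ix->nr - 1; i++)
			ix->slot[i] = ix->slot[i + 1];
		ix->nr--;
	}
}
```

The sketch leaves out all locking and RCU.  In a real implementation the
per-node updates would still have to be made safe against concurrent
lookups, which is where most of the cost would be.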

Another related idea is to move the VMA object to node-local memory.  Can
we migrate the VMA object to the node where it is referenced the most?  We
would still maintain only _one_ copy of the VMA object.  No data
duplication, but we can move the memory around to make it node local.
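
One possible way to pick the target node, sketched in user-space C under
my own assumptions: count lookups per node and choose the node with the
highest count.  The refs[] array, home_node field, NR_NODES and both
helper names are hypothetical and do not exist in the kernel.

```c
#include <assert.h>

#define NR_NODES 4	/* hypothetical node count */

struct vma {			/* stand-in for vm_area_struct */
	unsigned long start, end;
	int home_node;			/* node whose memory holds this object */
	unsigned long refs[NR_NODES];	/* lookups seen from each node */
};

/* Record one lookup of this vma issued from the given node. */
static void vma_note_ref(struct vma *vma, int node)
{
	assert(node >= 0 && node < NR_NODES);
	vma->refs[node]++;
}

/*
 * Return the node that referenced the vma most often.  The current
 * home node wins when it is tied for the highest count, so the object
 * is not moved without a clear gain.
 */
static int vma_pick_home(const struct vma *vma)
{
	int best = vma->home_node;

	for (int n = 0; n < NR_NODES; n++)
		if (vma->refs[n] > vma->refs[best])
			best = n;
	return best;
}
```

Counters that live inside the VMA would themselves be written from remote
nodes, so a real version would probably need node-local or sampled
counters.  The sketch only shows the selection policy.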

Some more thoughts:

The page fault handler makes most of the find_get_vma() calls, to validate
the user address and then create page table entries (allocate page
frames)... can we make the page fault handler run on the node where the
VMAs have been allocated?  The CPU that page-faulted need not necessarily
do all the find_vma() calls and update the page table.  The process can
sleep while another CPU _near_ the memory containing the VMAs and page
tables does the job with local memory references.

I don't know if the page tables for the faulting process are allocated in
node-local memory.

Per-CPU last-VMA cache:  Currently we have the last VMA referenced in a
one-entry cache in mm_struct.  Can we have this cache per CPU or per node,
so that a multi-threaded application has a node/CPU-local cache of the last
VMA referenced?  This may reduce btree/rbtree traversal.  Let the hardware
cache maintain the corresponding VMA object and its coherency.
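
A minimal user-space sketch of the per-node variant, under assumptions of
my own: a linear scan over a sorted array stands in for the tree walk, the
lookup returns only a VMA that contains the address (a simplification of
what find_vma() does), and struct mm, last[], slow_lookups and the helper
names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 2	/* hypothetical node count */

struct vma {			/* stand-in for vm_area_struct */
	unsigned long start, end;	/* covers [start, end) */
};

struct mm {			/* stand-in for mm_struct */
	struct vma *vmas;		/* sorted array, stands in for the tree */
	int nr_vmas;
	struct vma *last[NR_NODES];	/* one-entry cache per node */
	unsigned long slow_lookups;	/* counts full traversals */
};

/* Full lookup; this is the traversal the cache tries to avoid. */
static struct vma *find_vma_slow(struct mm *mm, unsigned long addr)
{
	mm->slow_lookups++;
	for (int i = 0; i < mm->nr_vmas; i++)
		if (addr >= mm->vmas[i].start && addr < mm->vmas[i].end)
			return &mm->vmas[i];
	return NULL;
}

/* Try this node's cached vma first, then fall back and refill the cache. */
static struct vma *find_vma_cached(struct mm *mm, int node, unsigned long addr)
{
	struct vma *v;

	assert(node >= 0 && node < NR_NODES);
	v = mm->last[node];
	if (v && addr >= v->start && addr < v->end)
		return v;

	v = find_vma_slow(mm, addr);
	if (v)
		mm->last[node] = v;
	return v;
}

/* Must run before a vma is changed or freed: drop it from every cache. */
static void invalidate_vma_cache(struct mm *mm, struct vma *vma)
{
	for (int n = 0; n < NR_NODES; n++)
		if (mm->last[n] == vma)
			mm->last[n] = NULL;
}
```

The trade-off is on the write side: an unmap now has to clear the entry on
every node instead of one field, which the sketch does with a plain loop
and no locking.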

Please let me know your comments and thoughts.

Thanks,
Vaidy

