From: Peter Zijlstra <peterz@infradead.org>
To: Alexis Bruemmer <alexisb@us.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Balbir Singh <balbir@in.ibm.com>,
	Badari Pulavarty <pbadari@us.ibm.com>,
	Max Asbock <amax@us.ibm.com>, linux-mm <linux-mm@kvack.org>
Subject: Re: VMA lookup with RCU
Date: Wed, 03 Oct 2007 21:40:29 +0200
Message-ID: <1191440429.5599.72.camel@lappy>
In-Reply-To: <1191436672.7103.38.camel@alexis>

On Wed, 2007-10-03 at 11:37 -0700, Alexis Bruemmer wrote:
> Hi Peter,
> 
> Some colleagues and I have been looking at how best to implement
> your latest suggestions for the VMA lookup with RCU patches, however we
> have some questions:
> 
> <snip>
> > My current plan for these patches is:
> > 
> > Fine grain lock the update side of the B+tree
> > do a per-node vma reference, so that vma lookup and lock is:

> How did you envision the implementation here?  More specifically, what
> is the best way to get local copies of B+trees on each node?

Don't copy the whole trees, just keep references to vma's locally. Saves
a lot of traffic:

struct vm_area_struct {

	...

	struct rw_semaphore lock;
	atomic_t refs;
};

struct vma_ref {
	struct vm_area_struct *vma;
	struct rw_semaphore lock;
	int dead;
};

struct vma_ref *
find_get_vma(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma;
	struct vma_ref *ref = NULL;

again:
	ref = NULL;	/* a retry must not return a stale reference */
	rcu_read_lock();
	vma = btree_find(&mm->btree, addr);
	if (!vma)
		goto out_unlock;

	ref = btree_find(node_local_tree(), (unsigned long)vma);
	if (!ref) {
		BTREE_LOCK_CONTEXT(ctx, node_local_tree()); /* see fine grain locked RADIX tree */

		down_read(&vma->lock);	
		if (atomic_read(&vma->refs) < 0) {
			/* vma got destroyed */
			up_read(&vma->lock);
			goto out_unlock;
		}
		atomic_inc(&vma->refs);
		rcu_read_unlock(); /* we got vma->lock, can't escape */

		ref = kmalloc(sizeof(*ref), GFP_KERNEL);
		/* XXX: what to do when fails */

		ref->vma = vma;
		init_rwsem(&ref->lock);
		ref->dead = 0;

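		if (btree_insert(ctx, (unsigned long)vma, ref) == -EEXIST) {
			/* lost the race to another lookup on this node; undo and retry */
			kfree(ref);
			atomic_dec(&vma->refs);
			up_read(&vma->lock);
			goto again;
		}

		rcu_read_lock();	/* rebalance: the exit paths below unlock RCU */
		up_read(&vma->lock);	/* ref is in the local tree; drop vma lock */
	}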
	
	down_read(&ref->lock);
	if (ref->dead) {
		up_read(&ref->lock);
		rcu_read_unlock();
		goto again;
	}

out_unlock:
	rcu_read_unlock();
	return ref;
}

Something like that; it has big holes in it, but it should illustrate
the idea. We index the local reference by the vma address.
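
To make "index the local reference by the vma address" concrete, here is a
minimal userspace sketch. It is not taken from the patches: the fixed-size
open-addressing table stands in for the node-local B+tree, there is no
locking or deletion, and NR_NODES, NR_SLOTS, local_find() and local_get()
are names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_NODES  4	/* assumed node count for the model */
#define NR_SLOTS 64	/* slots per node-local table; must be a power of two */

struct vma {
	unsigned long start, end;
	int refs;		/* one count per node holding a reference */
};

struct vma_ref {
	struct vma *vma;	/* NULL marks an empty slot */
	int dead;
};

/* One table per node; stands in for node_local_tree(). */
static struct vma_ref local_table[NR_NODES][NR_SLOTS];

/* The lookup key is the address of the vma itself. */
static size_t slot_of(const struct vma *vma)
{
	return ((uintptr_t)vma >> 4) & (NR_SLOTS - 1);
}

/* Return the node-local reference for vma, or NULL if the node has none. */
struct vma_ref *local_find(int node, struct vma *vma)
{
	size_t i, s = slot_of(vma);

	assert(node >= 0 && node < NR_NODES);
	for (i = 0; i < NR_SLOTS; i++) {
		struct vma_ref *ref = &local_table[node][(s + i) & (NR_SLOTS - 1)];

		if (!ref->vma)
			return NULL;	/* hit an empty slot: not present */
		if (ref->vma == vma)
			return ref;
	}
	return NULL;
}

/* Find the node-local reference, creating it (and counting it) if missing. */
struct vma_ref *local_get(int node, struct vma *vma)
{
	size_t i, s = slot_of(vma);

	assert(node >= 0 && node < NR_NODES);
	for (i = 0; i < NR_SLOTS; i++) {
		struct vma_ref *ref = &local_table[node][(s + i) & (NR_SLOTS - 1)];

		if (ref->vma == vma)
			return ref;
		if (!ref->vma) {
			ref->vma = vma;
			ref->dead = 0;
			vma->refs++;	/* the vma gains one reference per node */
			return ref;
		}
	}
	return NULL;	/* table full */
}
```

Repeated lookups on one node return the same vma_ref and leave vma->refs
alone; only the first lookup from a new node touches the shared counter,
which is where the traffic saving would come from.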

> >     lookup in node local tree
> >     if found, take read lock on local reference
> >     if not-found, do global lookup, lock vma, take reference, 
> >                   insert reference into local tree,
> >                   take read lock on it, drop vma lock
> >
> > write lock on the vma would:
> >     find the vma in the global tree, lock it
> >     enqueue work items in a waitqueue that,
> >       find the local ref, lock it (might sleep)
> >       release the reference, unlock and clear from local tree
> >       signal completion
> >     once all nodes have completed we have no outstanding refs
> >     and since we have the lock, we're exclusive.

void invalidate_vma_refs(void *addr)
{
	BTREE_LOCK_CONTEXT(ctx, node_local_tree());
	struct vma_ref *ref;

	rcu_read_lock();
	ref = btree_find(node_local_tree(), (unsigned long)addr);
	if (!ref)
		goto out_unlock;

	down_write(&ref->lock); /* no more local refs */
	ref->dead = 1;
	atomic_dec(&ref->vma->refs); /* release */
	btree_delete(ctx, (unsigned long)addr); /* unhook */
	rcu_call(free_vma_ref, ref); /* destroy */
	up_write(&ref->lock);

out_unlock:
	rcu_read_unlock();
}

struct vm_area_struct *
write_lock_vma(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma;

	rcu_read_lock();
	vma = btree_find(&mm->btree, addr);
	if (!vma)
		goto out_unlock;

	down_write(&vma->lock); /* no new refs */
	rcu_read_unlock();

	schedule_on_each_cpu(invalidate_vma_refs, vma, 0, 1);

	return vma;

out_unlock:
	rcu_read_unlock();
	return NULL;
}
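
The write-side protocol can be modelled in userspace as a sanity check of
the counting. This is a sequential sketch under loud assumptions: C11
atomics stand in for atomic_t, a plain loop stands in for running the work
item on every node, and the rwsems are left out entirely. The names
vma_model, node_get(), node_invalidate(), write_lock_model() and
mark_destroyed() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_NODES 4	/* assumed node count for the model */

struct vma_model {
	atomic_int refs;		/* per-node references; < 0 means destroyed */
	bool node_has_ref[NR_NODES];	/* slot n is only touched on behalf of node n */
};

void vma_model_init(struct vma_model *vma)
{
	atomic_init(&vma->refs, 0);
	for (int n = 0; n < NR_NODES; n++)
		vma->node_has_ref[n] = false;
}

/* Read side: take this node's reference unless the vma is destroyed. */
bool node_get(struct vma_model *vma, int node)
{
	int old;

	assert(node >= 0 && node < NR_NODES);
	if (vma->node_has_ref[node])
		return true;		/* local hit: shared counter untouched */

	old = atomic_load(&vma->refs);
	do {
		if (old < 0)
			return false;	/* vma got destroyed */
	} while (!atomic_compare_exchange_weak(&vma->refs, &old, old + 1));

	vma->node_has_ref[node] = true;
	return true;
}

/* The per-node work item: drop this node's reference, if it holds one. */
void node_invalidate(struct vma_model *vma, int node)
{
	assert(node >= 0 && node < NR_NODES);
	if (vma->node_has_ref[node]) {
		vma->node_has_ref[node] = false;
		atomic_fetch_sub(&vma->refs, 1);
	}
}

/* Write side: once every node has run the work item, no refs remain. */
bool write_lock_model(struct vma_model *vma)
{
	for (int node = 0; node < NR_NODES; node++)
		node_invalidate(vma, node);
	return atomic_load(&vma->refs) == 0;
}

/* Destruction: flip 0 -> -1 so that later node_get() calls fail. */
bool mark_destroyed(struct vma_model *vma)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&vma->refs, &expected, -1);
}
```

The model only checks the bookkeeping: each node contributes at most one
count, so the writer is exclusive once every node has completed. It says
nothing about the sleeping and ordering problems of the real locks.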


> Some general questions:
> How does a process and vma know what node it's on?

numa_node_id() ?

just return the ref object, and unlock against that.

> Also, as we discussed before the initial test results on the first set
> of these patches showed really no improvement in performance over the
> vanilla kernel.  What kind of performance gains do you predict with this
> new fine grain lock approach?

Really no idea. The write side is rather heavy: it would need to
invalidate (unsigned long)vma on all CPUs (perhaps track a cpumask of
the CPUs that did take a reference). The advantage is that it's per vma,
not per mm.
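
The cpumask idea could look roughly like the sketch below, written as
plain userspace C rather than kernel code. It assumes at most 64 nodes so
that a single 64-bit word can serve as the mask; masked_vma, note_ref()
and invalidate_refs() are made-up names, and a real version would use the
kernel's cpumask helpers and proper locking.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 64	/* one bit per node in a 64-bit mask (assumption) */

struct masked_vma {
	uint64_t ref_nodes;	/* bit n set: node n took a local reference */
	int refs;
};

/* Read side: record that this node holds a reference (idempotent). */
void note_ref(struct masked_vma *vma, int node)
{
	uint64_t bit;

	assert(node >= 0 && node < MAX_NODES);
	bit = UINT64_C(1) << node;
	if (vma->ref_nodes & bit)
		return;
	vma->ref_nodes |= bit;
	vma->refs++;
}

/*
 * Write side: invalidate only on nodes whose bit is set, skipping the
 * rest. The visited nodes are stored in ascending order; the return
 * value is how many nodes needed work.
 */
int invalidate_refs(struct masked_vma *vma, int visited[MAX_NODES])
{
	int count = 0;

	for (int node = 0; node < MAX_NODES; node++) {
		uint64_t bit = UINT64_C(1) << node;

		if (!(vma->ref_nodes & bit))
			continue;	/* node never took a reference */
		vma->ref_nodes &= ~bit;
		vma->refs--;
		visited[count++] = node;
	}
	return count;
}
```

With references taken on nodes 1 and 5 only, the write side does work on
two nodes instead of scheduling an invalidation everywhere.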

> Finally, many local kernel hackers that I have spoken with about this idea
> have expressed great interest in your patch set.  Do you have any
> objections to opening this discussion up to linux-mm? 

done :-)

> Thank you again for your time and help,

no problem at all.

