Re: [RFC] RCU Judy array with distributed locking for FS extents

From: Chris Mason <clmason@fusionio.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Dave Chinner <dchinner@redhat.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	David Woodhouse <David.Woodhouse@intel.com>,
	"bo.li.liu@oracle.com" <bo.li.liu@oracle.com>,
	"rp@svcs.cs.pdx.edu" <rp@svcs.cs.pdx.edu>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Stephen Hemminger <shemminger@vyatta.com>,
	Alan Stern <stern@rowland.harvard.edu>
Subject: Re: [RFC] RCU Judy array with distributed locking for FS extents
Date: Wed, 12 Jun 2013 21:25:49 -0400
Message-ID: <20130613012549.4914.75883@localhost.localdomain>
In-Reply-To: <20130612011231.GA3975@Krystal>

Quoting Mathieu Desnoyers (2013-06-11 21:12:31)
> * Chris Mason (clmason@fusionio.com) wrote:
> [...]
> > Ouch, ok.  In private email yesterday I talked with Mathieu about how
> > his current setup can't prevent the concurrent insertion of overlapping
> > extents.  He does have a plan to address this where the insertion is
> > synchronized by keeping placeholders in the tree for the free space.  I
> > think it'll work, but I'm worried about doubling the cost of the insert.
> 
> Hi Chris,
> 
> The weekend and early week have been productive on my side. My updated
> work is available on this new branch:
> 
> git://git.lttng.org/userspace-rcu.git
> branch: urcu/rcuja-range
> 
> Since last week, I managed to:
> 
> - expand the RCU Judy Array API documentation:
>   https://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rcuja.h;h=82e272bd4ede1aec436845aef287754dd1dab8b6;hb=03a50ae89ec4d7f39e91d0d49c4639c4cf6e894c

Nice

> 
> - create an API for Judy Array Ranges, as discussed via email privately:
> 
> API:
> https://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rcuja-range.h;h=63035a1660888aa5f9b20548046571dcb54ad193;hb=03a50ae89ec4d7f39e91d0d49c4639c4cf6e894c
> 
> Implementation:
> https://git.lttng.org/?p=userspace-rcu.git;a=blob;f=rcuja/rcuja-range.c;h=7e4585ef942d76f1811f3c958fff3138ac120ca3;hb=03a50ae89ec4d7f39e91d0d49c4639c4cf6e894c
> 
> Please keep in mind that this code has only been moderately
> stress-tested (with up to 24 cores, on small keyspaces of 3, 5, 10, 100
> keys, so races occur much more frequently). It should not be considered
> production-ready yet.

Ok, I'll definitely take a look.

> 
> The test code (and thus example usage) is available here:
> https://git.lttng.org/?p=userspace-rcu.git;a=blob;f=tests/test_urcu_ja_range.c;h=12abcc51465b64a7124fb3e48a2150e225e145af;hb=03a50ae89ec4d7f39e91d0d49c4639c4cf6e894c
> https://git.lttng.org/?p=userspace-rcu.git;a=blob;f=tests/test_urcu_ja_range.h;h=e9bbdbc3ed7eb8f57e30c26b8789ba609a6bfdd9;hb=03a50ae89ec4d7f39e91d0d49c4639c4cf6e894c
> 
> So far, my benchmarks show near-linear read-side scalability (as
> expected from RCU). However, early results do not show the scalability
> I would have expected for concurrent updates. It's not as bad as, e.g.,
> a global lock making performance crawl due to ping-pong between
> processors, but so far, roughly speaking, if I multiply the number of
> cores doing updates by e.g. 12, the per-core throughput of update
> stress-test gets divided by approximately 12. Therefore, the number of
> updates system-wide seems to stay constant as we increase the number of
> cores. I will try to get more info as I dig into more benchmarking,
> which may point at some memory-throughput bottlenecks.

We're benchmarking different workloads, and I'm not sure how much of the
scalability difference is from being in the kernel.  One test I have
here is a batch of deletion and reinsertion of keys at random.

I'm running on a key space of 10 million keys.  If I run the same number
of random operations on 100,000 keys I get similar (but slightly faster)
numbers:

100,000 random insertion and deletion batches:

skiplist: 3.01s per thread
rbtree:   2.1s  per thread

With 16 threads:

skiplist: 5.8s per thread
rbtree:   ~70s per thread (ranges from 15s to 76s)

The random part is crucial for scaling with the skiplists.  The locks
are per node, and as long as all the threads are working in different
places things scale fairly well. 

> 
> I stopped working on the range implementation thinking that I should
> wait to get some feedback before I start implementing more complex
> features like RCU-friendly range resize.

I really wanted to send out my code this morning, but I also wanted to
match the rbtree's single-threaded performance first.  It's much closer
now, so I'm commenting and cleaning up what I have for posting tomorrow.

I'll talk with Liu Bo about putting the skiplists under LGPL, but I'd
love some help getting numbers against liburcu.

-chris

Thread overview: 12+ messages
2013-06-03  5:27 [RFC] RCU Judy array with distributed locking for FS extents Mathieu Desnoyers
2013-06-03 12:40 ` Chris Mason
2013-06-03 12:46   ` Mathieu Desnoyers
2013-06-03 13:07     ` Chris Mason
2013-06-03 13:50       ` Mathieu Desnoyers
2013-06-04 11:54     ` Dave Chinner
2013-06-04 14:21       ` Chris Mason
2013-06-04 18:57         ` Mathieu Desnoyers
2013-06-05 23:48         ` Dave Chinner
2013-06-12  1:12         ` Mathieu Desnoyers
2013-06-13  1:25           ` Chris Mason [this message]
2013-06-16 14:02             ` Liu Bo