From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Chris Mason <clmason@fusionio.com>
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>,
David Woodhouse <David.Woodhouse@intel.com>,
"dchinner@redhat.com" <dchinner@redhat.com>,
"bo.li.liu@oracle.com" <bo.li.liu@oracle.com>,
"rp@svcs.cs.pdx.edu" <rp@svcs.cs.pdx.edu>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
Stephen Hemminger <shemminger@vyatta.com>,
Alan Stern <stern@rowland.harvard.edu>
Subject: Re: [PATCH RFC 0/2] rcu skiplists v2
Date: Wed, 26 Jun 2013 19:02:18 -0400 [thread overview]
Message-ID: <20130626230218.GA4002@Krystal> (raw)
In-Reply-To: <20130616145612.4914.3009@localhost.localdomain>
Hi Chris,
* Chris Mason (clmason@fusionio.com) wrote:
> Hi everyone,
>
> This is another iteration of my skiplists patch, with some bugs fixed
> and a few missing parts of the API done. The biggest change is in the
> insertion code, where I now link from the bottom up once I find the
> proper insertion point. This makes it possible to write custom
> insertion routines that allow duplicate keys.
>
> It also means insertion doesn't have to lock and track all the nodes
> from the top as it searches. In the likely event that we're able to
> insert into a free spot in an existing leaf, it only needs to take one
> lock.
Nice, I did pretty much the same for RCU Judy arrays: I made insertion
and deletion take only the localized locks required, nesting from bottom
to top.
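FWIW, the bottom-up scheme can be sketched roughly like this (a toy userspace model, not your patch code: fixed max level, a lock per node, and it skips the predecessor revalidation a real concurrent insert would need after taking each lock):

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_LEVEL 4

/* Toy skiplist node: one lock per node, forward pointers per level. */
struct sl_node {
	int key;
	int level;			/* levels this node participates in */
	pthread_mutex_t lock;
	struct sl_node *next[MAX_LEVEL];
};

static struct sl_node head = {
	.key = -1, .level = MAX_LEVEL, .lock = PTHREAD_MUTEX_INITIALIZER,
};

static struct sl_node *sl_new(int key, int level)
{
	struct sl_node *n = calloc(1, sizeof(*n));

	n->key = key;
	n->level = level;
	pthread_mutex_init(&n->lock, NULL);
	return n;
}

/*
 * Search lock-free for the predecessor at every level, then link the
 * new node from level 0 upward, holding only one predecessor lock at a
 * time -- never the whole search path. A real concurrent version must
 * recheck each predecessor's next pointer after locking it, since it
 * may have changed since the unlocked search.
 */
static void sl_insert(struct sl_node *node)
{
	struct sl_node *preds[MAX_LEVEL], *p = &head;
	int lvl;

	for (lvl = MAX_LEVEL - 1; lvl >= 0; lvl--) {
		while (p->next[lvl] && p->next[lvl]->key < node->key)
			p = p->next[lvl];
		preds[lvl] = p;
	}
	for (lvl = 0; lvl < node->level; lvl++) {	/* bottom up */
		pthread_mutex_lock(&preds[lvl]->lock);
		node->next[lvl] = preds[lvl]->next[lvl];
		preds[lvl]->next[lvl] = node;
		pthread_mutex_unlock(&preds[lvl]->lock);
	}
}
```

The nice property is the common case you describe: when the node only lands in existing bottom-level slots, a single predecessor lock is ever held.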
> The IOMMU part of the patch is updated slightly, but still not using all
> the new bits in the API. This is mostly because the IOMMU part is going
> to change later on and I'm really only using it for testing now.
>
> For benchmarking, the IOMMU numbers are slightly higher than last time,
> but not more than 5% or so. This is because the bottleneck is still
> skiplist_insert_hole(), which I haven't really changed in this round.
>
> Most of my benchmarking now is with the skiplist_test module, which has
> expanded to a few new cases. For random operations, my performance is
> slightly slower than rbtree when single threaded.
>
> Random lookups on 10 million items: (inserts are similar)
>
> rbtree random lookup time 4 s 327 ms
> skiplist-rcu random lookup time 5 s 175 ms
These benchmarks only consider averages.
I'm worried that the skiplist approach can create long chains (with
non-zero probability). Clearly, the lack of a bound is something the RT
guys might frown upon. I'd be curious to see how long the worst chains
encountered can get on, say, 1000 machines stress-testing this for a
week.
FWIW, my benchmark of RCU Judy array with ranges in similar conditions:
- Using 32-bit key length
- I first populated 10M ranges of len = 1, sequentially.
- Then, I ran reader threads for 10s, which performed random lookups
(all successful) on keys from 0 to 10M.
The average lookup time is (on a per-core basis):
- single-thread: 522 ns / lookup ( 19122393 lookups in 10s)
- 2 threads: 553 ns / lookup ( ~36134994 lookups in 10s on 2 cores)
- 8 threads: 696 ns / lookup (~114854104 lookups in 10s on 8 cores)
We get an efficiency of 0.79 when going from 2 to 8 threads (efficiency
of 1 being linear scalability).
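For clarity, the efficiency figure is just the ratio of per-core throughputs between the two thread counts:

```c
/*
 * Scaling efficiency between two thread counts: per-core throughput at
 * the higher count divided by per-core throughput at the lower count.
 * 1.0 means perfectly linear scaling.
 */
static double scaling_efficiency(double ops_lo, double cores_lo,
				 double ops_hi, double cores_hi)
{
	return (ops_hi / cores_hi) / (ops_lo / cores_lo);
}
```

Plugging in the numbers above gives (114854104/8) / (36134994/2) ~= 0.79; the same formula yields the 0.66 figure for the insert benchmark further down.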
Performed on this HW:
8-core
model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
From your skiplist-rcu numbers, I gather a lookup takes 517 ns on
average, versus 432 ns for rbtree. I'd be curious to test all these
alternatives in the same test bench.
>
> The new API makes it easier to do sequential operations. Here is a walk
> through the 10 million item list:
>
> skiplist-rcu check time 0 s 79 ms
> rbtree check time 0 s 295 ms
>
> And sequential insert:
>
> skiplist-rcu fill time 1 s 599 ms
> rbtree fill time 1 s 875 ms
>
> The benchmark does random lookup/deletion/insertion rounds. Most of the
> operations done are either deletion or insertion, so I'm not skewing the
> results with the easy rcu lookup operation.
We could add helpers for sequential traversal of ranges; they are not
implemented in the RCU Judy array yet, though.
>
> Single threaded rbtree does consistently win across all sizes of lists:
>
> skiplist-rcu thread 0 time 0 s 287 ms
> rbtree thread 0 time 0 s 215 ms
Are the 287ms and 215ms numbers you get for skiplist-rcu and rbtree the
total for 10M add/delete, or is it per operation, or is it a different
number of operations altogether? Also, how many items are present in
the data structures on average?
> But once we go up to two threads, skiplists win:
>
> skiplist-rcu thread 1 time 0 s 299 ms
> rbtree thread 1 time 0 s 499 ms
>
> At 8 threads, the skiplists don't get linear scaling, but it's really
> pretty good. Since I'm doing random operations on a wide key space,
> the locking skiplist variant is also fast:
>
> skiplist-rcu thread 7 time 0 s 379 ms
> skiplist-locking thread 7 time 0 s 425 ms
> rbtree thread 7 time 3 s 711 ms
Hopefully my setup is not too different from yours. Here is my
benchmark of the RCU Judy array insert operations. Please note that I
have not focused on optimizing the update-side, I've made design choices
aimed towards making lookups as fast as possible, sometimes at the
expense of update overhead.
I used a keyspace of 400M, just so the number of inserts that end up
being a simple rcu lookup is insignificant. I'm still using a 32-bit
key length. I'm adding ranges of length 1.
Here is the average time taken per insertion (on a per-core basis):
- single-thread: 15000 ns/insert ( 645474 inserts in 10s)
- 2 threads: 17000 ns/insert (~1188232 inserts in 10s on 2 cores)
- 8 threads: 25000 ns/insert (~3161024 inserts in 10s on 8 cores)
We get an efficiency of 0.66 when going from 2 to 8 threads (efficiency
of 1 being linear scalability).
>
> My test machine is 8 cores, so at 16 threads we're into HT:
>
> skiplist-rcu thread 15 time 0 s 583 ms
> skiplist-locking thread 15 time 0 s 559 ms
> rbtree thread 15 time 7 s 423 ms
Skipping this test, since my test machine does not have hyperthreading.
>
> It's not all sunshine though. If I shrink the keyspace down to 1000
> keys or less, there is more contention on the node locks and we're tied
> (or slightly worse) than rbtree. In that kind of workload it makes
> sense to add a big fat lock around the skiplist, or just stick with
> rbtrees.
I'd expect similar behavior from RCU Judy arrays with respect to a
small keyspace (a single lock around updates might perform better than
locks distributed within the data structure). However, even with a
small keyspace (and few nodes), I expect that RCU Judy array and
skiplist-rcu will beat non-RCU rbtree on lookup speed when there are
concurrent updates and lookups from many cores.
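To make the read-side argument concrete, here is a minimal userspace sketch of the publish/subscribe pattern both structures rely on, with C11 atomics standing in for the kernel's rcu_assign_pointer()/rcu_dereference() (grace periods and memory reclamation are deliberately omitted):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct node {
	int key;
	int val;
	struct node *_Atomic next;
};

static struct node *_Atomic list_head;

/*
 * Writer side: fully initialize the node, then publish it with a
 * release store so its contents are visible before the pointer is
 * (the role of rcu_assign_pointer in the kernel).
 */
static void publish(int key, int val)
{
	struct node *n = malloc(sizeof(*n));

	n->key = key;
	n->val = val;
	n->next = atomic_load_explicit(&list_head, memory_order_relaxed);
	atomic_store_explicit(&list_head, n, memory_order_release);
}

/*
 * Reader side: traverse with acquire loads (the role of
 * rcu_dereference), taking no locks at all.
 */
static struct node *lookup(int key)
{
	struct node *n;

	n = atomic_load_explicit(&list_head, memory_order_acquire);
	while (n) {
		if (n->key == key)
			return n;
		n = atomic_load_explicit(&n->next, memory_order_acquire);
	}
	return NULL;
}
```

Readers take no locks and dirty no shared cache lines, which is why read-side throughput keeps scaling with core count even while writers run.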
Feedback is welcome,
Thanks!
Mathieu
>
> The skiplists do have locking and rcu variants of lookup operations.
> The locking ones are built on top of the rcu ones, and you can use them
> both at the same time.
>
> Patches follow.
>
> -chris
>
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Thread overview: 12+ messages
2013-06-16 14:56 [PATCH RFC 0/2] rcu skiplists v2 Chris Mason
2013-06-16 14:57 ` [PATCH RFC 1/2] Skiplist: rcu range index Chris Mason
2013-07-14 14:06 ` Dong Fang
2013-06-16 14:58 ` [PATCH RFC 2/2] Switch the IOMMU over to the skiplists Chris Mason
2013-06-26 23:02 ` Mathieu Desnoyers [this message]
2013-06-26 23:51 ` [PATCH RFC 0/2] rcu skiplists v2 Chris Mason
2013-06-27 2:29 ` Mathieu Desnoyers
2013-06-27 4:46 ` Chris Mason
2013-06-27 11:42 ` Mathieu Desnoyers
2013-06-27 5:19 ` Dave Chinner
2013-06-27 10:55 ` [BULK] " Chris Mason
2013-06-27 12:12 ` Mathieu Desnoyers