linux-fsdevel.vger.kernel.org archive mirror
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Chris Mason <clmason@fusionio.com>
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	David Woodhouse <David.Woodhouse@intel.com>,
	"dchinner@redhat.com" <dchinner@redhat.com>,
	"bo.li.liu@oracle.com" <bo.li.liu@oracle.com>,
	"rp@svcs.cs.pdx.edu" <rp@svcs.cs.pdx.edu>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Stephen Hemminger <shemminger@vyatta.com>,
	Alan Stern <stern@rowland.harvard.edu>
Subject: Re: [PATCH RFC 0/2] rcu skiplists v2
Date: Wed, 26 Jun 2013 19:02:18 -0400	[thread overview]
Message-ID: <20130626230218.GA4002@Krystal> (raw)
In-Reply-To: <20130616145612.4914.3009@localhost.localdomain>

Hi Chris,

* Chris Mason (clmason@fusionio.com) wrote:
> Hi everyone,
> 
> This is another iteration of my skiplists patch, with some bugs fixed
> and a few missing parts of the API done.  The biggest change is in the
> insertion code, where I now link from the bottom up once I find the
> proper insertion point.  This makes it possible to write custom
> insertion routines that allow duplicate keys.
> 
> It also means insertion doesn't have to lock and track all the nodes
> from the top as it searches.  In the likely event that we're able to
> insert into a free spot in an existing leaf, it only needs to take one
> lock.

Nice, I did pretty much the same for RCU Judy arrays: I made insertion
and deletion take only the localized locks required, nesting from
bottom to top.

> The IOMMU part of the patch is updated slightly, but still not using all
> the new bits in the API.  This is mostly because the IOMMU part is going
> to change later on and I'm really only using it for testing now.
> 
> For benchmarking, the IOMMU numbers are slightly higher than last time,
> but not more than 5% or so.  This is because the bottleneck is still
> skiplist_insert_hole(), which I haven't really changed in this round.
> 
> Most of my benchmarking now is with the skiplist_test module, which has
> expanded to a few new cases.  For random operations, my performance is
> slightly slower than rbtree when single threaded.
> 
> Random lookups on 10 million items: (inserts are similar)
> 
> rbtree       random lookup time 4 s 327 ms
> skiplist-rcu random lookup time 5 s 175 ms 

These benchmarks only consider averages.

I'm worried that the skiplist approach can produce long chains (with
small but non-zero probability). Clearly, the lack of a hard bound is
something the RT guys might frown upon. I'd be curious to see how long
the worst chains encountered can be on, say, 1000 machines
stress-testing this for a week.

FWIW, my benchmark of RCU Judy array with ranges in similar conditions:

- Using 32-bit key length
- I first populated 10M ranges of len = 1, sequentially.
- Then, I ran reader threads for 10s, performing random lookups
  (all successful) of keys from 0 to 10M.

The average lookup time is (on a per-core basis):

- single-thread: 522 ns / lookup (  19122393 lookups in 10s)
- 2 threads:     553 ns / lookup ( ~36134994 lookups in 10s on 2 cores)
- 8 threads:     696 ns / lookup (~114854104 lookups in 10s on 8 cores)

We get an efficiency of 0.79 when going from 2 to 8 threads (efficiency
of 1 being linear scalability).

Performed on this HW:
8-core
model name      : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz

From your skiplist-rcu numbers, I gather it takes about 517 ns/lookup
on average, versus about 432 ns/lookup for rbtree. I'd be curious to
test all these alternatives in the same test bench.

> 
> The new API makes it easier to do sequential operations.  Here is a walk
> through the 10 million item list:
> 
> skiplist-rcu check time 0 s 79 ms
> rbtree check time      0 s 295 ms
> 
> And sequential insert:
> 
> skiplist-rcu fill time 1 s 599 ms
> rbtree fill time      1 s 875 ms
> 
> The benchmark does random lookup/deletion/insertion rounds.  Most of the
> operations done are either deletion or insertion, so I'm not skewing the
> results with the easy rcu lookup operation.

We could add helpers for sequential traversal of ranges; they are not
implemented in the RCU Judy array yet, though.

> 
> Single threaded rbtree does consistently win across all sizes of lists:
> 
> skiplist-rcu thread 0 time 0 s 287 ms
> rbtree thread       0 time 0 s 215 ms

Are the 287 ms and 215 ms numbers you get for skiplist-rcu and rbtree
the total for 10M add/delete operations, the time per operation, or a
different number of operations altogether? Also, how many items are
present in the data structures on average?


> But once we go up to two threads, skiplists win:
> 
> skiplist-rcu thread 1 time 0 s 299 ms
> rbtree thread 1 time       0 s 499 ms
> 
> At 8 threads, the skiplists don't get linear scaling, but it's really
> pretty good.  Since I'm doing random operations on a wide key space,
> the locking skiplist variant is also fast:
> 
> skiplist-rcu     thread 7 time 0 s 379 ms
> skiplist-locking thread 7 time 0 s 425 ms
> rbtree           thread 7 time 3 s 711 ms

Hopefully my setup is not too different from yours. Here is my
benchmark of the RCU Judy array insert operations. Please note that I
have not focused on optimizing the update side; I've made design
choices aimed at making lookups as fast as possible, sometimes at the
expense of update overhead.

I used a keyspace of 400M, so that the number of inserts that end up
being a simple RCU lookup is negligible. I'm still using a 32-bit key
length, and I'm adding ranges of length 1.

Here is the average time taken per insertion (on a per-core basis):

- single-thread: 15000 ns/insert (  645474 inserts in 10s)
- 2 threads:     17000 ns/insert (~1188232 inserts in 10s on 2 cores)
- 8 threads:     25000 ns/insert (~3161024 inserts in 10s on 8 cores)

We get an efficiency of 0.66 when going from 2 to 8 threads (efficiency
of 1 being linear scalability).

> 
> My test machine is 8 cores, so at 16 threads we're into HT:
> 
> skiplist-rcu thread     15 time 0 s 583 ms
> skiplist-locking thread 15 time 0 s 559 ms
> rbtree thread           15 time 7 s 423 ms

Skipping this test, since my test machine does not have hyperthreading.

> 
> It's not all sunshine though.  If I shrink the keyspace down to 1000
> keys or less, there is more contention on the node locks and we're tied
> (or slightly worse) than rbtree.  In that kind of workload it makes
> sense to add a big fat lock around the skiplist, or just stick with
> rbtrees.

I'd expect similar behavior from RCU Judy arrays with a small keyspace
(a single lock around updates might perform better than locks
distributed within the data structure). However, even with a small
keyspace (and few nodes), I expect that RCU Judy arrays and
skiplist-rcu will win over a non-RCU rbtree for lookup speed when we
have concurrent updates and lookups from many cores.

Feedback is welcome,

Thanks!

Mathieu

> 
> The skiplists do have locking and rcu variants of lookup operations.
> The locking ones are built on top of the rcu ones, and you can use them
> both at the same time.
> 
> Patches follow.
> 
> -chris
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Thread overview: 12+ messages
2013-06-16 14:56 [PATCH RFC 0/2] rcu skiplists v2 Chris Mason
2013-06-16 14:57 ` [PATCH RFC 1/2] Skiplist: rcu range index Chris Mason
2013-07-14 14:06   ` Dong Fang
2013-06-16 14:58 ` [PATCH RFC 2/2] Switch the IOMMU over to the skiplists Chris Mason
2013-06-26 23:02 ` Mathieu Desnoyers [this message]
2013-06-26 23:51   ` [PATCH RFC 0/2] rcu skiplists v2 Chris Mason
2013-06-27  2:29     ` Mathieu Desnoyers
2013-06-27  4:46       ` Chris Mason
2013-06-27 11:42         ` Mathieu Desnoyers
2013-06-27  5:19       ` Dave Chinner
2013-06-27 10:55         ` [BULK] " Chris Mason
2013-06-27 12:12         ` Mathieu Desnoyers
