Re: [PATCH RFC 0/2] rcu skiplists v2

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Chris Mason <clmason@fusionio.com>
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	David Woodhouse <David.Woodhouse@intel.com>,
	"dchinner@redhat.com" <dchinner@redhat.com>,
	"bo.li.liu@oracle.com" <bo.li.liu@oracle.com>,
	"rp@svcs.cs.pdx.edu" <rp@svcs.cs.pdx.edu>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Stephen Hemminger <shemminger@vyatta.com>,
	Alan Stern <stern@rowland.harvard.edu>
Subject: Re: [PATCH RFC 0/2] rcu skiplists v2
Date: Wed, 26 Jun 2013 19:02:18 -0400	[thread overview]
Message-ID: <20130626230218.GA4002@Krystal> (raw)
In-Reply-To: <20130616145612.4914.3009@localhost.localdomain>

Hi Chris,

* Chris Mason (clmason@fusionio.com) wrote:
> Hi everyone,
> 
> This is another iteration of my skiplists patch, with some bugs fixed
> and a few missing parts of the API done.  The biggest change is in the
> insertion code, where I now link from the bottom up once I find the
> proper insertion point.  This makes it possible to write custom
> insertion routines that allow duplicate keys.
> 
> It also means insertion doesn't have to lock and track all the nodes
> from the top as it searches.  In the likely event that we're able to
> insert into a free spot in an existing leaf, it only needs to take one
> lock.

nice, I did pretty much the same for RCU Judy arrays. I made insertion
and deletion take only the localized locks required, nesting from bottom
to top.

> The IOMMU part of the patch is updated slightly, but still not using all
> the new bits in the API.  This is mostly because the IOMMU part is going
> to change later on and I'm really only using it for testing now.
> 
> For benchmarking, the IOMMU numbers are slightly higher than last time,
> but not more than 5% or so.  This is because the bottleneck is still
> skiplist_insert_hole(), which I haven't really changed in this round.
> 
> Most of my benchmarking now is with the skiplist_test module, which has
> expanded to a few new cases.  For random operations, my performance is
> slightly slower than rbtree when single threaded.
> 
> Random lookups on 10 million items: (inserts are similar)
> 
> rbtree       random lookup time 4 s 327 ms
> skiplist-rcu random lookup time 5 s 175 ms 

These benchmarks are only considering averages.

I'm worried that the skiplist approach has chances to create long chains
(probabilistically non-zero). Clearly, the lack of bound is something
the RT guys might frown upon. I'd be curious to see how long the worse
chains encountered can be on, say, 1000 machines stress-testing this for
a week.

FWIW, my benchmark of RCU Judy array with ranges in similar conditions:

- Using 32-bit key length
- I first populated 10M ranges of len = 1, sequentially.
- Then, I run a reader threads for 10s, which perform random lookups
  (all successful) in keys from 0 to 10M.

The average lookup time is (on a per-core basis):

- single-thread: 522 ns / lookup (  19122393 lookups in 10s)
- 2 threads:     553 ns / lookup ( ~36134994 lookups in 10s on 2 cores)
- 8 threads:     696 ns / lookup (~114854104 lookups in 10s on 8 cores)

We get an efficiency of 0.79 when going from 2 to 8 threads (efficiency
of 1 being linear scalability).

Performed on this HW:
8-core
model name      : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz

What I gather from your skiplist-rcu numbers, it seems to take
517ns/lookup (avg), and for rbtree: 432ns/lookup. I'd be curious to
test all these alternatives in the same test bench.

> 
> The new API makes it easier to do sequential operations.  Here is a walk
> through the 10 million item list:
> 
> skiplist-rcu check time 0 s 79 ms
> rbtree check time      0 s 295 ms
> 
> And sequential insert:
> 
> kiplist-rcu fill time 1 s 599 ms
> rbtree fill time      1 s 875 ms
> 
> The benchmark does random lookup/deletion/insertion rounds.  Most of the
> operations done are either deletion or insertion, so I'm not skewing the
> results with the easy rcu lookup operation.

We could do helpers for sequential traversal of ranges, they are not
implemented in RCU Judy array yet though.

> 
> Single threaded rbtree does consistently win across all sizes of lists:
> 
> skiplist-rcu thread 0 time 0 s 287 ms
> rbtree thread       0 time 0 s 215 ms

Are the 287ms and 215ms numbers you get for skiplist-rcu and rbtree the
total for 10M add/delete, or is it per operation, or is it a different
number of operations altogether ? Also, how many items are present in
the data structures in average ?

> But once we go up to two threads, skiplists win:
> 
> skiplist-rcu thread 1 time 0 s 299 ms
> rbtree thread 1 time       0 s 499 ms
> 
> At 8 threads, the skiplists don't get linear scaling, but it's really
> pretty good.  Since I'm doing random operations on a wide key space,
> the locking skiplist variant is also fast:
> 
> skiplist-rcu     thread 7 time 0 s 379 ms
> skiplist-locking thread 7 time 0 s 425 ms
> rbtree           thread 7 time 3 s 711 ms

Hopefully I'm having a setup not too different from yours. Here is my
benchmark of the RCU Judy Array insert operations. Please note that I
have not focused on optimizing the update-side, I've made design choices
aimed towards making lookups as fast as possible, sometimes at the
expense of update overhead.

I used a keyspace of 400M, just so the number of inserts that end up
being a simple rcu lookup are non significant. I'm still using a 32-bit
key length. I'm adding ranges of length 1.

Here is the average time taken per insertion (on a per-core basis):

- single-thread: 15000 ns/insert (  645474 inserts in 10s)
- 2 threads:     17000 ns/insert (~1188232 inserts in 10s on 2 cores)
- 8 threads:     25000 ns/insert (~3161024 inserts in 10s on 8 cores)

We get an efficiency of 0.66 when going from 2 to 8 threads (efficiency
of 1 being linear scalability).

> 
> My test machine is 8 cores, so at 16 threads we're into HT:
> 
> skiplist-rcu thread     15 time 0 s 583 ms
> skiplist-locking thread 15 time 0 s 559 ms
> rbtree thread           15 time 7 s 423 ms

Skipping this test, since my test machine does not have hyperthreading.

> 
> It's not all sunshine though.  If I shrink the keyspace down to 1000
> keys or less, there is more contention on the node locks and we're tied
> (or slightly worse) than rbtree.  In that kind of workload it makes
> sense to add a big fat lock around the skiplist, or just stick with
> rbtrees.

I'd expect a similar behavior with RCU judy arrays with respect to small
keyspace (single lock around updates might perform better that locks
distributed within the data). However, even with a small keyspace (and
few nodes), I expect that RCU Judy Array and skiplist-rcu will win over
non-RCU rbtree for lookup speed when we have concurrent updates and
lookups from many cores.

Feedback is welcome,

Thanks!

Mathieu

> 
> The skiplists do have locking and rcu variants of lookup operations.
> The locking ones are built on top of the rcu ones, and you can use them
> both at the same time.
> 
> Patches follow.
> 
> -chris
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

next prev parent reply	other threads:[~2013-06-26 23:02 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-16 14:56 [PATCH RFC 0/2] rcu skiplists v2 Chris Mason
2013-06-16 14:57 ` [PATCH RFC 1/2] Skiplist: rcu range index Chris Mason
2013-07-14 14:06   ` Dong Fang
2013-06-16 14:58 ` [PATCH RFC 2/2] Switch the IOMMU over to the skiplists Chris Mason
2013-06-26 23:02 ` Mathieu Desnoyers [this message]
2013-06-26 23:51   ` [PATCH RFC 0/2] rcu skiplists v2 Chris Mason
2013-06-27  2:29     ` Mathieu Desnoyers
2013-06-27  4:46       ` Chris Mason
2013-06-27 11:42         ` Mathieu Desnoyers
2013-06-27  5:19       ` Dave Chinner
2013-06-27 10:55         ` [BULK] " Chris Mason
2013-06-27 12:12         ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130626230218.GA4002@Krystal \
    --to=mathieu.desnoyers@efficios.com \
    --cc=David.Woodhouse@intel.com \
    --cc=bo.li.liu@oracle.com \
    --cc=clmason@fusionio.com \
    --cc=dchinner@redhat.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rp@svcs.cs.pdx.edu \
    --cc=shemminger@vyatta.com \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.