All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: tree rcu: call_rcu scalability problem?
Date: Wed, 2 Sep 2009 09:48:44 -0700	[thread overview]
Message-ID: <20090902164844.GJ6774@linux.vnet.ibm.com> (raw)
In-Reply-To: <20090902164504.GD28052@wotan.suse.de>

On Wed, Sep 02, 2009 at 06:45:04PM +0200, Nick Piggin wrote:
> On Wed, Sep 02, 2009 at 09:37:05AM -0700, Paul E. McKenney wrote:
> > On Wed, Sep 02, 2009 at 06:24:51PM +0200, Nick Piggin wrote:
> > > On Wed, Sep 02, 2009 at 08:19:27AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Sep 02, 2009 at 02:27:56PM +0200, Nick Piggin wrote:
> > > > > On Wed, Sep 02, 2009 at 11:48:35AM +0200, Nick Piggin wrote:
> > > > > > Hi Paul,
> > > > > > 
> > > > > > I'm testing out scalability of some vfs code paths, and I'm seeing
> > > > > > a problem with call_rcu. This is a 2s8c opteron system, so nothing
> > > > > > crazy.
> > > > > > 
> > > > > > I'll show you the profile results for 1-8 threads:
> > > > > > 
> > > > > > 1:
> > > > > >  29768 total                                      0.0076
> > > > > >  15550 default_idle                              48.5938
> > > > > >   1340 __d_lookup                                 3.6413
> > > > > >    954 __link_path_walk                           0.2559
> > > > > >    816 system_call_after_swapgs                   8.0792
> > > > > >    680 kmem_cache_alloc                           1.4167
> > > > > >    669 dput                                       1.1946
> > > > > >    591 __call_rcu                                 2.0521
> > > > > > 
> > > > > > 2:
> > > > > >  56733 total                                      0.0145
> > > > > >  20074 default_idle                              62.7313
> > > > > >   3075 __call_rcu                                10.6771
> > > > > >   2650 __d_lookup                                 7.2011
> > > > > >   2019 dput                                       3.6054
> > > > > > 
> > > > > > 4:
> > > > > >  98889 total                                      0.0253
> > > > > >  21759 default_idle                              67.9969
> > > > > >  10994 __call_rcu                                38.1736
> > > > > >   5185 __d_lookup                                14.0897
> > > > > >   4475 dput                                       7.9911
> > > > 
> > > > Four threads runs on one socket but 8 threads runs on two sockets,
> > > > I take it?
> > > 
> > > Yes.
> > > 
> > > 
> > > > > > 8:
> > > > > > 170391 total                                      0.0437
> > > > > >  31815 __call_rcu                               110.4688
> > > > > >  12958 dput                                      23.1393
> > > > > >  10417 __d_lookup                                28.3071
> > > > > > 
> > > > > > Of course there are other scalability factors involved too, but
> > > > > > __call_rcu is taking 54 times more CPU to do 8 times the amount
> > > > > > of work from 1-8 threads, or a factor of 6.7 slowdown.
> > > > > > 
> > > > > > This is with tree RCU.
> > > > > 
> > > > > It seems like nearly 2/3 of the cost is here:
> > > > >         /* Add the callback to our list. */
> > > > >         *rdp->nxttail[RCU_NEXT_TAIL] = head; <<<
> > > > >         rdp->nxttail[RCU_NEXT_TAIL] = &head->next;
> > > > 
> > > > Hmmm...  That certainly is not the first list of code in call_rcu() that
> > > > would come to mind...
> > > 
> > > It's weird. I *think* I read the asm right, but oprofile maybe is
> > > not attributing the cost to the right instruction.
> > > 
> > > 
> > > > > In loading the pointer to the next tail pointer. If I'm reading the profile
> > > > > correctly. Can't see why that should be a probem though...
> > > > 
> > > > The usual diagnosis would be false sharing.
> > > 
> > > Hmm that's possible yes.
> > > 
> > > 
> > > > Hmmm...  What is the workload?  CPU-bound?  If CONFIG_PREEMPT=n, I might
> > > > expect interference from force_quiescent_state(), except that it should
> > > > run only every few clock ticks.  So this seems quite unlikely.
> > > 
> > > It's CPU bound and preempt=y.
> > > 
> > > Workload is just 8 processes running a loop of close(open("file$i")) as
> > > I said though you probably won't be able to reproduce it on a vanilla
> > > kernel.
> > 
> > OK, so you are executing call_rcu() a -lot-!!!
> 
> Oh yeah. The combined frequency in the 8 proc case is once per 270ns, so
> nearly half a million times per second per core.

Woo-hoo!!!  ;-)

> It's not *slow* by any means, but it is increasing much faster than other
> functions on the profile so I just want to understand what is happening.

Understood and agreed.

> > Could you also please try CONFIG_RCU_TRACE=y, and send me the contents of
> > the files in the "rcu" subdirectory in debugfs?  Please take a snapshot
> > of these files, run your test for a fixed time interval (perhaps ten
> > seconds, but please tell me how long), then take a second snapshot.
> 
> Will do.

							Thanx, Paul

  reply	other threads:[~2009-09-02 16:48 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-02  9:48 tree rcu: call_rcu scalability problem? Nick Piggin
2009-09-02 12:27 ` Nick Piggin
2009-09-02 15:19   ` Paul E. McKenney
2009-09-02 16:24     ` Nick Piggin
2009-09-02 16:37       ` Paul E. McKenney
2009-09-02 16:45         ` Nick Piggin
2009-09-02 16:48           ` Paul E. McKenney [this message]
2009-09-02 17:50         ` Nick Piggin
2009-09-02 19:17   ` Peter Zijlstra
2009-09-03  5:14     ` Paul E. McKenney
2009-09-03  7:45       ` Nick Piggin
2009-09-03  9:01       ` Nick Piggin
2009-09-03 13:28         ` Paul E. McKenney
2009-09-03  7:14     ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090902164844.GJ6774@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.