From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: tree rcu: call_rcu scalability problem?
Date: Wed, 2 Sep 2009 08:19:27 -0700
Message-ID: <20090902151927.GA6774@linux.vnet.ibm.com>
In-Reply-To: <20090902122756.GC12251@wotan.suse.de>

On Wed, Sep 02, 2009 at 02:27:56PM +0200, Nick Piggin wrote:
> On Wed, Sep 02, 2009 at 11:48:35AM +0200, Nick Piggin wrote:
> > Hi Paul,
> > 
> > I'm testing out scalability of some vfs code paths, and I'm seeing
> > a problem with call_rcu. This is a 2s8c opteron system, so nothing
> > crazy.
> > 
> > I'll show you the profile results for 1-8 threads:
> > 
> > 1:
> >  29768 total                                      0.0076
> >  15550 default_idle                              48.5938
> >   1340 __d_lookup                                 3.6413
> >    954 __link_path_walk                           0.2559
> >    816 system_call_after_swapgs                   8.0792
> >    680 kmem_cache_alloc                           1.4167
> >    669 dput                                       1.1946
> >    591 __call_rcu                                 2.0521
> > 
> > 2:
> >  56733 total                                      0.0145
> >  20074 default_idle                              62.7313
> >   3075 __call_rcu                                10.6771
> >   2650 __d_lookup                                 7.2011
> >   2019 dput                                       3.6054
> > 
> > 4:
> >  98889 total                                      0.0253
> >  21759 default_idle                              67.9969
> >  10994 __call_rcu                                38.1736
> >   5185 __d_lookup                                14.0897
> >   4475 dput                                       7.9911

Four threads run on one socket, but eight threads run on two sockets,
I take it?

> > 8:
> > 170391 total                                      0.0437
> >  31815 __call_rcu                               110.4688
> >  12958 dput                                      23.1393
> >  10417 __d_lookup                                28.3071
> > 
> > Of course there are other scalability factors involved too, but
> > __call_rcu is taking 54 times more CPU to do 8 times the amount
> > of work from 1-8 threads, or a factor of 6.7 slowdown.
> > 
> > This is with tree RCU.
> 
> It seems like nearly 2/3 of the cost is here:
>         /* Add the callback to our list. */
>         *rdp->nxttail[RCU_NEXT_TAIL] = head; <<<
>         rdp->nxttail[RCU_NEXT_TAIL] = &head->next;

Hmmm...  That certainly is not the first line of code in call_rcu() that
would come to mind...
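
For reference, the two quoted lines are the usual tail-pointer enqueue.
Below is a minimal user-space sketch of that idiom, not the kernel code
itself: struct cb_head, struct cb_list, and the cb_list_*() helpers are
hypothetical stand-ins for rcu_head and the per-CPU callback list, and
the real list has several tail pointers where this one has a single one.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Hypothetical stand-in for the per-CPU callback list in rcu_data. */
struct cb_list {
	struct cb_head *head;	/* first callback, NULL when empty */
	struct cb_head **tail;	/* address of the last ->next pointer */
};

static void cb_list_init(struct cb_list *l)
{
	l->head = NULL;
	l->tail = &l->head;	/* empty list: tail refers to head itself */
}

/* Append in O(1): load the tail pointer, store through it, advance it. */
static void cb_list_enqueue(struct cb_list *l, struct cb_head *h)
{
	h->next = NULL;
	*l->tail = h;		/* corresponds to the flagged line */
	l->tail = &h->next;
}

/* Walk the list and count entries (for checking only). */
static size_t cb_list_len(const struct cb_list *l)
{
	size_t n = 0;
	const struct cb_head *p;

	for (p = l->head; p != NULL; p = p->next)
		n++;
	return n;
}
```

The flagged statement first loads l->tail from the per-CPU structure and
then stores through it, so it touches both the structure and the
previously queued element.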

> In loading the pointer to the next tail pointer, if I'm reading the profile
> correctly. Can't see why that should be a problem though...

The usual diagnosis would be false sharing.

Hmmm...  What is the workload?  CPU-bound?  If CONFIG_PREEMPT=n, I might
expect interference from force_quiescent_state(), except that it should
run only every few clock ticks.  So this seems quite unlikely.

Could you please try padding the beginning and end of struct rcu_data
with a few hundred bytes and rerunning?  Just in case there is a shared
per-CPU variable either before or after rcu_data in your memory layout?
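
Something along these lines is what I have in mind, shown here as a
user-space sketch rather than a patch.  The struct name, its fields, and
the PAD_BYTES value are placeholders I made up for illustration; the real
struct rcu_data has different members, and only the leading and trailing
pad arrays matter for the experiment.

```c
#include <stddef.h>

#define PAD_BYTES 256	/* assumption: stands in for "a few hundred bytes" */

/* Hypothetical stand-in for struct rcu_data with padding on both sides. */
struct padded_percpu_state {
	char pad_before[PAD_BYTES];	/* keeps the preceding variable away */
	long completed;
	void **tail;
	long qlen;
	char pad_after[PAD_BYTES];	/* keeps the following variable away */
};

/* Bytes of padding between the start of the struct and its first field. */
static size_t leading_gap(void)
{
	return offsetof(struct padded_percpu_state, completed);
}

/* Bytes of padding between the last field and the end of the struct. */
static size_t trailing_gap(void)
{
	return sizeof(struct padded_percpu_state) -
	       offsetof(struct padded_percpu_state, pad_after);
}
```

If the hot spot moves or disappears with the padding in place, a neighbor
sharing a cache line with rcu_data becomes the prime suspect; if nothing
changes, false sharing with adjacent per-CPU data is much less likely.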

							Thanx, Paul

> ffffffff8107dee0 <__call_rcu>: /* __call_rcu total: 320971 100.000 */
>    697  0.2172 :ffffffff8107dee0:       push   %r12
>    228  0.0710 :ffffffff8107dee2:       push   %rbp
>    133  0.0414 :ffffffff8107dee3:       mov    %rdx,%rbp
>    918  0.2860 :ffffffff8107dee6:       push   %rbx
>    316  0.0985 :ffffffff8107dee7:       mov    %rsi,0x8(%rdi)
>    257  0.0801 :ffffffff8107deeb:       movq   $0x0,(%rdi)
>   1660  0.5172 :ffffffff8107def2:       mfence
>  27730  8.6394 :ffffffff8107def5:       pushfq
>  13153  4.0979 :ffffffff8107def6:       pop    %r12
>    903  0.2813 :ffffffff8107def8:       cli
>   2562  0.7982 :ffffffff8107def9:       mov    %gs:0xde68,%eax
>   1784  0.5558 :ffffffff8107df01:       cltq
>                :ffffffff8107df03:       mov    0x60(%rdx,%rax,8),%rbx
>                :ffffffff8107df08:       pushfq
>   3494  1.0886 :ffffffff8107df09:       pop    %rdx
>    896  0.2792 :ffffffff8107df0a:       cli
>   2655  0.8272 :ffffffff8107df0b:       mov    0xd0(%rbp),%rcx
>   1800  0.5608 :ffffffff8107df12:       cmp    (%rbx),%rcx
>     21  0.0065 :ffffffff8107df15:       je     ffffffff8107df32 <__call_rcu+0x52
>                :ffffffff8107df17:       mov    0x40(%rbx),%rax
>     81  0.0252 :ffffffff8107df1b:       mov    %rcx,(%rbx)
>      3 9.3e-04 :ffffffff8107df1e:       mov    %rax,0x38(%rbx)
>                :ffffffff8107df22:       mov    0x48(%rbx),%rax
>                :ffffffff8107df26:       mov    %rax,0x40(%rbx)
>                :ffffffff8107df2a:       mov    0x50(%rbx),%rax
>                :ffffffff8107df2e:       mov    %rax,0x48(%rbx)
>                :ffffffff8107df32:       push   %rdx
>   1194  0.3720 :ffffffff8107df33:       popfq
>   9518  2.9654 :ffffffff8107df34:       pushfq
>   4179  1.3020 :ffffffff8107df35:       pop    %rdx
>   1277  0.3979 :ffffffff8107df36:       cli
>   2546  0.7932 :ffffffff8107df37:       mov    0xc8(%rbp),%rax
>   1748  0.5446 :ffffffff8107df3e:       cmp    %rax,0x8(%rbx)
>      5  0.0016 :ffffffff8107df42:       je     ffffffff8107df57 <__call_rcu+0x77
>                :ffffffff8107df44:       movb   $0x1,0x19(%rbx)
>      2 6.2e-04 :ffffffff8107df48:       movb   $0x0,0x18(%rbx)
>                :ffffffff8107df4c:       mov    0xc8(%rbp),%rax
>                :ffffffff8107df53:       mov    %rax,0x8(%rbx)
>    921  0.2869 :ffffffff8107df57:       push   %rdx
>    151  0.0470 :ffffffff8107df58:       popfq
> 183507 57.1725 :ffffffff8107df59:       mov    0x50(%rbx),%rax
>    995  0.3100 :ffffffff8107df5d:       mov    %rdi,(%rax)
>      2 6.2e-04 :ffffffff8107df60:       mov    %rdi,0x50(%rbx)
>     18  0.0056 :ffffffff8107df64:       mov    0xd0(%rbp),%rdx
>    940  0.2929 :ffffffff8107df6b:       mov    0xc8(%rbp),%rax
>     15  0.0047 :ffffffff8107df72:       cmp    %rax,%rdx
>      1 3.1e-04 :ffffffff8107df75:       je     ffffffff8107dfb0 <__call_rcu+0xd0
>    787  0.2452 :ffffffff8107df77:       mov    0x58(%rbx),%rax
>     58  0.0181 :ffffffff8107df7b:       inc    %rax
>      2 6.2e-04 :ffffffff8107df7e:       mov    %rax,0x58(%rbx)
>   1679  0.5231 :ffffffff8107df82:       movslq 0x4988fb(%rip),%rdx        # ffff
>     40  0.0125 :ffffffff8107df89:       cmp    %rdx,%rax
>      5  0.0016 :ffffffff8107df8c:       jg     ffffffff8107dfd7 <__call_rcu+0xf7
>    588  0.1832 :ffffffff8107df8e:       mov    0xe0(%rbp),%rdx
>     84  0.0262 :ffffffff8107df95:       mov    0x51f924(%rip),%rax        # ffff
>      5  0.0016 :ffffffff8107df9c:       cmp    %rax,%rdx
>    505  0.1573 :ffffffff8107df9f:       js     ffffffff8107dfc8 <__call_rcu+0xe8
>  17580  5.4771 :ffffffff8107dfa1:       push   %r12
>   1671  0.5206 :ffffffff8107dfa3:       popfq
>  24201  7.5399 :ffffffff8107dfa4:       pop    %rbx
>   1367  0.4259 :ffffffff8107dfa5:       pop    %rbp
>    377  0.1175 :ffffffff8107dfa6:       pop    %r12
>                :ffffffff8107dfa8:       retq
>                :ffffffff8107dfa9:       nopl   0x0(%rax)
>                :ffffffff8107dfb0:       mov    %rbp,%rdi
>                :ffffffff8107dfb3:       callq  ffffffff813be930 <_spin_lock_irqs
>     12  0.0037 :ffffffff8107dfb8:       mov    %rbp,%rdi
>                :ffffffff8107dfbb:       mov    %rax,%rsi
>                :ffffffff8107dfbe:       callq  ffffffff8107d8e0 <rcu_start_gp>
>                :ffffffff8107dfc3:       jmp    ffffffff8107df77 <__call_rcu+0x97
>                :ffffffff8107dfc5:       nopl   (%rax)
>                :ffffffff8107dfc8:       mov    $0x1,%esi
>     10  0.0031 :ffffffff8107dfcd:       mov    %rbp,%rdi
>                :ffffffff8107dfd0:       callq  ffffffff8107dd50 <force_quiescent
>      1 3.1e-04 :ffffffff8107dfd5:       jmp    ffffffff8107dfa1 <__call_rcu+0xc1
>    451  0.1405 :ffffffff8107dfd7:       mov    $0x7fffffffffffffff,%rdx
>    411  0.1280 :ffffffff8107dfe1:       xor    %esi,%esi
>                :ffffffff8107dfe3:       mov    %rbp,%rdi
>                :ffffffff8107dfe6:       mov    %rdx,0x60(%rbx)
>    317  0.0988 :ffffffff8107dfea:       callq  ffffffff8107dd50 <force_quiescent
>   4510  1.4051 :ffffffff8107dfef:       jmp    ffffffff8107dfa1 <__call_rcu+0xc1
>                :ffffffff8107dff1:       nopw   %cs:0x0(%rax,%rax,1)
> 
> 
