Re: [sched, rcu] b84c4e08143: +3.1% will-it-scale.per_thread_ops

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Fengguang Wu <fengguang.wu@intel.com>
To: lkp@lists.01.org
Subject: Re: [sched, rcu] b84c4e08143: +3.1% will-it-scale.per_thread_ops
Date: Sat, 19 Apr 2014 16:11:46 +0800	[thread overview]
Message-ID: <20140419081146.GA29068@localhost> (raw)
In-Reply-To: <20140417135503.GK4496@linux.vnet.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 5741 bytes --]

On Thu, Apr 17, 2014 at 06:55:03AM -0700, Paul E. McKenney wrote:
> On Thu, Apr 17, 2014 at 12:03:53PM +0800, Fengguang Wu wrote:
> > Hi Paul,
> > 
> > FYI, this improves will-it-scale/open1 throughput.
> 
> Cool!  Not a planned benefit, but I will take it.  ;-)
> 
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2014.04.14a
> > commit b84c4e08143c98dad4b4d139f08db0b98b0d3ec4 ("sched,rcu: Make cond_resched() report RCU quiescent states")
> 
> But how should I read the data below?  I see lots of positive percentages
> and lots of negative percentages for the delta, and all near zero for
> standard deviation.  Is the overall improvement an average of these or
> some such?  What is being measured?

There are a lot of things being measured, which are shown after each
"TOTAL".  For example, to get an overview of the report:

grep "TOTAL" this_email

     563496 ~ 0%      +3.1%     581059 ~ 0%  TOTAL will-it-scale.per_thread_ops
     756894 ~ 0%      +2.8%     778452 ~ 0%  TOTAL will-it-scale.per_process_ops
       0.57 ~ 0%      -2.7%       0.55 ~ 0%  TOTAL will-it-scale.scalability
     346764 ~ 2%     -74.0%      90164 ~ 1%  TOTAL slabinfo.kmalloc-256.active_objs
      10837 ~ 2%     -73.9%       2824 ~ 1%  TOTAL slabinfo.kmalloc-256.active_slabs
      10837 ~ 2%     -73.9%       2824 ~ 1%  TOTAL slabinfo.kmalloc-256.num_slabs
     346821 ~ 2%     -73.9%      90393 ~ 1%  TOTAL slabinfo.kmalloc-256.num_objs
     105961 ~ 1%     -63.0%      39153 ~ 1%  TOTAL meminfo.SUnreclaim
      26432 ~ 1%     -62.9%       9814 ~ 1%  TOTAL proc-vmstat.nr_slab_unreclaimable
      87318 ~ 0%    +130.0%     200809 ~ 0%  TOTAL softirqs.RCU
     140354 ~ 1%     -47.6%      73490 ~ 0%  TOTAL meminfo.Slab
      77391 ~ 1%     -46.7%      41235 ~ 2%  TOTAL cpuidle.C6-NHM.usage
      38368 ~ 2%     -37.6%      23954 ~ 2%  TOTAL softirqs.SCHED
       1.24 ~ 4%     -35.4%       0.80 ~ 3%  TOTAL perf-profile.cpu-cycles.do_notify_resume.int_signal.close
       1.43 ~ 4%     +41.9%       2.03 ~ 4%  TOTAL perf-profile.cpu-cycles.rcu_process_callbacks.__do_softirq.irq_exit.smp_apic_timer_in
rupt.apic_timer_interrupt
       1.27 ~ 3%     -30.0%       0.89 ~ 6%  TOTAL perf-profile.cpu-cycles.setup_object.isra.46.new_slab.__slab_alloc.kmem_cache_alloc.g
empty_filp
       1.54 ~ 7%     +35.6%       2.09 ~ 8%  TOTAL perf-profile.cpu-cycles.kmem_cache_alloc.getname_flags.getname.do_sys_open.sys_open
       4.21 ~ 2%     -29.1%       2.98 ~ 3%  TOTAL perf-profile.cpu-cycles.link_path_walk.path_openat.do_filp_open.do_sys_open.sys_open
       1.37 ~ 4%     -23.1%       1.05 ~ 7%  TOTAL perf-profile.cpu-cycles.__d_lookup_rcu.lookup_fast.link_path_walk.path_openat.do_filp
en
       0.88 ~17%     +29.1%       1.14 ~ 9%  TOTAL perf-profile.cpu-cycles.path_init.path_openat.do_filp_open.do_sys_open.sys_open
       0.67 ~16%     +33.6%       0.90 ~10%  TOTAL perf-profile.cpu-cycles.restore_sigcontext.sys_rt_sigreturn.stub_rt_sigreturn.raise
       3.19 ~ 1%     +17.4%       3.74 ~ 5%  TOTAL perf-profile.cpu-cycles.file_free_rcu.rcu_process_callbacks.__do_softirq.irq_exit.smp
ic_timer_interrupt
       4329 ~ 7%     +15.2%       4986 ~ 5%  TOTAL slabinfo.vm_area_struct.active_objs
       2536 ~ 1%     -75.8%        614 ~ 9%  TOTAL time.involuntary_context_switches
      32593 ~ 1%     -62.1%      12349 ~ 2%  TOTAL interrupts.0:IO-APIC-edge.timer
       6934 ~ 9%     +86.2%      12908 ~ 7%  TOTAL interrupts.RES
        490 ~ 1%     -37.3%        307 ~ 1%  TOTAL vmstat.system.cs
       1639 ~ 0%      -8.8%       1495 ~ 0%  TOTAL vmstat.system.in
     819681 ~ 0%      -3.7%     789527 ~ 0%  TOTAL interrupts.LOC

> > Legend:
> > 	~XX%    - stddev percent
> > 	[+-]XX% - change percent
> > 
> > 
> >                           time.involuntary_context_switches
> > 
> >    3500 ++------------------------------------------------------------------+
> >         |             .*..                                                  |
> >    3000 ++         .*.    *..*..  .*..*.. .*..                              |
> >         *..*..*..*.             *.       *                                  |
> >         |                                     *..*..     .*..     .*..*     |
> >    2500 ++                                          *..*.    *..*.          |
> >         |                                                                   |
> >    2000 ++                                                                  |
> >         |                                                                   |
> >    1500 ++                                                                  |
> >         |                                                                   |
> >         |                                                                   |
> >    1000 ++                                                                  |
> >         |     O  O  O              O  O                   O  O     O        O
> >     500 O+-O-----------O--O--O--O--------O-O--O--O--O--O--------O-----O--O--+
> > 
> > 
> > 	[*] bisect-good sample
> > 	[O] bisect-bad  sample
> 
> So the good case increases involuntary context switches, but helps something
> else?  Or does the benefit stem from increased involuntary context switches
> and thus less time spinning or some such?

In bisect POV, branch BASE is good and HEAD is bad. Which has nothing
to do with the improvement/regression in performance POV.

Here the HEAD(bisect bad) commit has less involuntary_context_switches
which indicates an improvement over BASE.  It does look like close to
the root cause of improved will-it-scale throughput.

Thanks,
Fengguang

WARNING: multiple messages have this Message-ID (diff)

From: Fengguang Wu <fengguang.wu@intel.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: LKML <linux-kernel@vger.kernel.org>, lkp@01.org
Subject: Re: [sched,rcu] b84c4e08143: +3.1% will-it-scale.per_thread_ops
Date: Sat, 19 Apr 2014 16:11:46 +0800	[thread overview]
Message-ID: <20140419081146.GA29068@localhost> (raw)
In-Reply-To: <20140417135503.GK4496@linux.vnet.ibm.com>

On Thu, Apr 17, 2014 at 06:55:03AM -0700, Paul E. McKenney wrote:
> On Thu, Apr 17, 2014 at 12:03:53PM +0800, Fengguang Wu wrote:
> > Hi Paul,
> > 
> > FYI, this improves will-it-scale/open1 throughput.
> 
> Cool!  Not a planned benefit, but I will take it.  ;-)
> 
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2014.04.14a
> > commit b84c4e08143c98dad4b4d139f08db0b98b0d3ec4 ("sched,rcu: Make cond_resched() report RCU quiescent states")
> 
> But how should I read the data below?  I see lots of positive percentages
> and lots of negative percentages for the delta, and all near zero for
> standard deviation.  Is the overall improvement an average of these or
> some such?  What is being measured?

There are a lot of things being measured, which are shown after each
"TOTAL".  For example, to get an overview of the report:

grep "TOTAL" this_email

     563496 ~ 0%      +3.1%     581059 ~ 0%  TOTAL will-it-scale.per_thread_ops
     756894 ~ 0%      +2.8%     778452 ~ 0%  TOTAL will-it-scale.per_process_ops
       0.57 ~ 0%      -2.7%       0.55 ~ 0%  TOTAL will-it-scale.scalability
     346764 ~ 2%     -74.0%      90164 ~ 1%  TOTAL slabinfo.kmalloc-256.active_objs
      10837 ~ 2%     -73.9%       2824 ~ 1%  TOTAL slabinfo.kmalloc-256.active_slabs
      10837 ~ 2%     -73.9%       2824 ~ 1%  TOTAL slabinfo.kmalloc-256.num_slabs
     346821 ~ 2%     -73.9%      90393 ~ 1%  TOTAL slabinfo.kmalloc-256.num_objs
     105961 ~ 1%     -63.0%      39153 ~ 1%  TOTAL meminfo.SUnreclaim
      26432 ~ 1%     -62.9%       9814 ~ 1%  TOTAL proc-vmstat.nr_slab_unreclaimable
      87318 ~ 0%    +130.0%     200809 ~ 0%  TOTAL softirqs.RCU
     140354 ~ 1%     -47.6%      73490 ~ 0%  TOTAL meminfo.Slab
      77391 ~ 1%     -46.7%      41235 ~ 2%  TOTAL cpuidle.C6-NHM.usage
      38368 ~ 2%     -37.6%      23954 ~ 2%  TOTAL softirqs.SCHED
       1.24 ~ 4%     -35.4%       0.80 ~ 3%  TOTAL perf-profile.cpu-cycles.do_notify_resume.int_signal.close
       1.43 ~ 4%     +41.9%       2.03 ~ 4%  TOTAL perf-profile.cpu-cycles.rcu_process_callbacks.__do_softirq.irq_exit.smp_apic_timer_in
rupt.apic_timer_interrupt
       1.27 ~ 3%     -30.0%       0.89 ~ 6%  TOTAL perf-profile.cpu-cycles.setup_object.isra.46.new_slab.__slab_alloc.kmem_cache_alloc.g
empty_filp
       1.54 ~ 7%     +35.6%       2.09 ~ 8%  TOTAL perf-profile.cpu-cycles.kmem_cache_alloc.getname_flags.getname.do_sys_open.sys_open
       4.21 ~ 2%     -29.1%       2.98 ~ 3%  TOTAL perf-profile.cpu-cycles.link_path_walk.path_openat.do_filp_open.do_sys_open.sys_open
       1.37 ~ 4%     -23.1%       1.05 ~ 7%  TOTAL perf-profile.cpu-cycles.__d_lookup_rcu.lookup_fast.link_path_walk.path_openat.do_filp
en
       0.88 ~17%     +29.1%       1.14 ~ 9%  TOTAL perf-profile.cpu-cycles.path_init.path_openat.do_filp_open.do_sys_open.sys_open
       0.67 ~16%     +33.6%       0.90 ~10%  TOTAL perf-profile.cpu-cycles.restore_sigcontext.sys_rt_sigreturn.stub_rt_sigreturn.raise
       3.19 ~ 1%     +17.4%       3.74 ~ 5%  TOTAL perf-profile.cpu-cycles.file_free_rcu.rcu_process_callbacks.__do_softirq.irq_exit.smp
ic_timer_interrupt
       4329 ~ 7%     +15.2%       4986 ~ 5%  TOTAL slabinfo.vm_area_struct.active_objs
       2536 ~ 1%     -75.8%        614 ~ 9%  TOTAL time.involuntary_context_switches
      32593 ~ 1%     -62.1%      12349 ~ 2%  TOTAL interrupts.0:IO-APIC-edge.timer
       6934 ~ 9%     +86.2%      12908 ~ 7%  TOTAL interrupts.RES
        490 ~ 1%     -37.3%        307 ~ 1%  TOTAL vmstat.system.cs
       1639 ~ 0%      -8.8%       1495 ~ 0%  TOTAL vmstat.system.in
     819681 ~ 0%      -3.7%     789527 ~ 0%  TOTAL interrupts.LOC

> > Legend:
> > 	~XX%    - stddev percent
> > 	[+-]XX% - change percent
> > 
> > 
> >                           time.involuntary_context_switches
> > 
> >    3500 ++------------------------------------------------------------------+
> >         |             .*..                                                  |
> >    3000 ++         .*.    *..*..  .*..*.. .*..                              |
> >         *..*..*..*.             *.       *                                  |
> >         |                                     *..*..     .*..     .*..*     |
> >    2500 ++                                          *..*.    *..*.          |
> >         |                                                                   |
> >    2000 ++                                                                  |
> >         |                                                                   |
> >    1500 ++                                                                  |
> >         |                                                                   |
> >         |                                                                   |
> >    1000 ++                                                                  |
> >         |     O  O  O              O  O                   O  O     O        O
> >     500 O+-O-----------O--O--O--O--------O-O--O--O--O--O--------O-----O--O--+
> > 
> > 
> > 	[*] bisect-good sample
> > 	[O] bisect-bad  sample
> 
> So the good case increases involuntary context switches, but helps something
> else?  Or does the benefit stem from increased involuntary context switches
> and thus less time spinning or some such?

In bisect POV, branch BASE is good and HEAD is bad. Which has nothing
to do with the improvement/regression in performance POV.

Here the HEAD(bisect bad) commit has less involuntary_context_switches
which indicates an improvement over BASE.  It does look like close to
the root cause of improved will-it-scale throughput.

Thanks,
Fengguang

next prev parent reply	other threads:[~2014-04-19  8:11 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-17  4:03 [sched,rcu] b84c4e08143: +3.1% will-it-scale.per_thread_ops Fengguang Wu
2014-04-17  4:03 ` Fengguang Wu
2014-04-17 13:55 ` [sched, rcu] " Paul E. McKenney
2014-04-17 13:55   ` [sched,rcu] " Paul E. McKenney
2014-04-19  8:11   ` Fengguang Wu [this message]
2014-04-19  8:11     ` Fengguang Wu
2014-04-22  1:50     ` [sched, rcu] " Paul E. McKenney
2014-04-22  1:50       ` [sched,rcu] " Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140419081146.GA29068@localhost \
    --to=fengguang.wu@intel.com \
    --cc=lkp@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.