public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jason Low <jason.low2@hp.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [regression, 3.16-rc] rwsem: optimistic spinning causing performance degradation
Date: Thu, 03 Jul 2014 18:46:06 -0700	[thread overview]
Message-ID: <1404438366.8764.121.camel@j-VirtualBox> (raw)
In-Reply-To: <20140704010147.GY4453@dastard>

On Fri, 2014-07-04 at 11:01 +1000, Dave Chinner wrote:
> [re-added lkml]
> 
> On Thu, Jul 03, 2014 at 11:50:20AM -0700, Jason Low wrote:
> > On Wed, Jul 2, 2014 at 7:32 PM, Dave Chinner <david@fromorbit.com> wrote:
> > > This is what the kernel profile looks like on the strided run:
> > >
> > > -  83.06%  [kernel]  [k] osq_lock
> > >    - osq_lock
> > >       - 100.00% rwsem_down_write_failed
> > >          - call_rwsem_down_write_failed
> > >             - 99.55% sys_mprotect
> > >                  tracesys
> > >                  __GI___mprotect
> > > -  12.02%  [kernel]  [k] rwsem_down_write_failed
> > 
> > Hi Dave,
> > 
> > So with no sign of rwsem_spin_on_owner(), yet with such heavy contention in
> > osq_lock, this makes me wonder if it's spending most of its time spinning
> > on !owner while a reader has the lock? (We don't set sem->owner for the readers.)
> > 
> > If that's an issue, maybe the below is worth a test, in which we'll just
> > avoid spinning if rwsem_can_spin_on_owner() finds that there is no owner.
> > If we just had to enter the slowpath yet there is no owner, we'll be conservative
> > and assume readers have the lock.
> 
> That makes it quite a bit faster:
> 
>         XFS_REPAIR Summary    Fri Jul  4 10:39:32 2014
> 
> Phase           Start           End             Duration
> Phase 1:        07/04 10:38:04  07/04 10:38:05  1 second
> Phase 2:        07/04 10:38:05  07/04 10:38:08  3 seconds
> Phase 3:        07/04 10:38:08  07/04 10:39:12  1 minute, 4 seconds
> Phase 4:        07/04 10:39:12  07/04 10:39:21  9 seconds
> Phase 5:        07/04 10:39:21  07/04 10:39:22  1 second
> Phase 6:        07/04 10:39:22  07/04 10:39:30  8 seconds
> Phase 7:        07/04 10:39:30  07/04 10:39:30  
> 
> Total run time: 1 minute, 26 seconds
> done
> 
> real    1m28.504s
> user    1m23.990s
> sys     3m20.132s
> 
> So system time goes down massively, and speed comes up to within 30%
> of the single AG run.  But it's still 2-3000 IOPS down, but it's
> acceptible for the moment.
> 
> FWIW, the kernel profile ifor the multi-AG run now looks like:
> 
>   29.64%  [kernel]  [k] _raw_spin_unlock_irq
>    - _raw_spin_unlock_irq
>       + 35.34% __schedule
>       - 34.15% call_rwsem_down_write_failed
>          + 99.27% sys_mprotect
>       - 30.02% call_rwsem_down_read_failed
>            95.59% __do_page_fault
> -  24.65%  [kernel]  [k] _raw_spin_unlock_irqrestore
>    - _raw_spin_unlock_irqrestore
>       - 69.38% rwsem_wake
>          - call_rwsem_wake
>             - 83.32% sys_mprotect
>             + 15.54% __do_page_fault
>       + 22.55% try_to_wake_up
> +   9.77%  [kernel]  [k] default_send_IPI_mask_sequence_phys
> -   3.21%  [kernel]  [k] smp_call_function_many
>    - smp_call_function_many
>       - 99.22% flush_tlb_page
> +   2.51%  [kernel]  [k] rwsem_down_write_failed
> 
> It's much more like the 3.15 profile - it's only wasting half the
> CPU spinning on the internal spinlock and it's now going fast enough
> to be blowing another 10-12% of the CPU time sending tlb flushes to
> other CPUs....
> 
> One thing I did notice, even with the single-AG-at-a-time run, is
> that the system time is *significantly* reduced with this patch,
> even though it doesn't change performance.
> 
> ie unpatched:
> 
> 	unpatched	patched
> runtime   0m58s		  1m2s
> systime   4m55s		  1m1s
> 
> So not spinning when there are read holders is a major win even
> when there are few threads contending on read/write.
> 
> FWIW, the rwsems in the struct xfs_inode are often heavily
> read/write contended, so there are lots of IO related workloads that
> are going to regress on XFS without this optimisation...
> 
> Anyway, consider the patch:
> 
> Tested-by: Dave Chinner <dchinner@redhat.com>

Hi Dave,

Thanks for testing. I'll update the patch with an actual changelog.



  reply	other threads:[~2014-07-04  1:46 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1404413420.8764.42.camel@j-VirtualBox>
     [not found] ` <1404416236.3179.18.camel@buesod1.americas.hpqcorp.net>
2014-07-03 20:08   ` [regression, 3.16-rc] rwsem: optimistic spinning causing performance degradation Davidlohr Bueso
2014-07-04  1:01 ` Dave Chinner
2014-07-04  1:46   ` Jason Low [this message]
2014-07-04  1:54     ` Jason Low
2014-07-04  6:13       ` Dave Chinner
2014-07-04  7:06         ` Jason Low
2014-07-04  8:21           ` Dave Chinner
2014-07-04  8:53           ` Davidlohr Bueso
2014-07-05  3:14             ` Jason Low
2014-07-04  7:52       ` Peter Zijlstra
2014-07-04  8:40         ` Davidlohr Bueso
2014-07-05  3:49           ` Jason Low
     [not found]             ` <CAAW_DMjgd5+EOvZX7_iZe-jHp=00Nf7MX3z6hBCRPgOfqnMtEA@mail.gmail.com>
2014-07-14  9:55               ` Peter Zijlstra
2014-07-14 17:10                 ` Jason Low
2014-07-15  2:17                 ` Dave Chinner
2014-07-16 19:20             ` [tip:locking/urgent] locking/rwsem: Allow conservative optimistic spinning when readers have lock tip-bot for Jason Low
2014-07-03  2:32 [regression, 3.16-rc] rwsem: optimistic spinning causing performance degradation Dave Chinner
2014-07-03  3:31 ` Davidlohr Bueso
2014-07-03  4:59   ` Dave Chinner
2014-07-03  5:39     ` Dave Chinner
2014-07-03  7:38       ` Peter Zijlstra
2014-07-03  7:56         ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1404438366.8764.121.camel@j-VirtualBox \
    --to=jason.low2@hp.com \
    --cc=david@fromorbit.com \
    --cc=davidlohr@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox