All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Low <jason.low2@hp.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [regression, 3.16-rc] rwsem: optimistic spinning causing performance degradation
Date: Thu, 03 Jul 2014 18:46:06 -0700	[thread overview]
Message-ID: <1404438366.8764.121.camel@j-VirtualBox> (raw)
In-Reply-To: <20140704010147.GY4453@dastard>

On Fri, 2014-07-04 at 11:01 +1000, Dave Chinner wrote:
> [re-added lkml]
> 
> On Thu, Jul 03, 2014 at 11:50:20AM -0700, Jason Low wrote:
> > On Wed, Jul 2, 2014 at 7:32 PM, Dave Chinner <david@fromorbit.com> wrote:
> > > This is what the kernel profile looks like on the strided run:
> > >
> > > -  83.06%  [kernel]  [k] osq_lock
> > >    - osq_lock
> > >       - 100.00% rwsem_down_write_failed
> > >          - call_rwsem_down_write_failed
> > >             - 99.55% sys_mprotect
> > >                  tracesys
> > >                  __GI___mprotect
> > > -  12.02%  [kernel]  [k] rwsem_down_write_failed
> > 
> > Hi Dave,
> > 
> > So with no sign of rwsem_spin_on_owner(), yet with such heavy contention in
> > osq_lock, this makes me wonder if it's spending most of its time spinning
> > on !owner while a reader has the lock? (We don't set sem->owner for the readers.)
> > 
> > If that's an issue, maybe the below is worth a test, in which we'll just
> > avoid spinning if rwsem_can_spin_on_owner() finds that there is no owner.
> > If we just had to enter the slowpath yet there is no owner, we'll be conservative
> > and assume readers have the lock.
> 
> That makes it quite a bit faster:
> 
>         XFS_REPAIR Summary    Fri Jul  4 10:39:32 2014
> 
> Phase           Start           End             Duration
> Phase 1:        07/04 10:38:04  07/04 10:38:05  1 second
> Phase 2:        07/04 10:38:05  07/04 10:38:08  3 seconds
> Phase 3:        07/04 10:38:08  07/04 10:39:12  1 minute, 4 seconds
> Phase 4:        07/04 10:39:12  07/04 10:39:21  9 seconds
> Phase 5:        07/04 10:39:21  07/04 10:39:22  1 second
> Phase 6:        07/04 10:39:22  07/04 10:39:30  8 seconds
> Phase 7:        07/04 10:39:30  07/04 10:39:30  
> 
> Total run time: 1 minute, 26 seconds
> done
> 
> real    1m28.504s
> user    1m23.990s
> sys     3m20.132s
> 
> So system time goes down massively, and speed comes up to within 30%
> of the single AG run.  But it's still 2-3000 IOPS down, but it's
> acceptible for the moment.
> 
> FWIW, the kernel profile ifor the multi-AG run now looks like:
> 
>   29.64%  [kernel]  [k] _raw_spin_unlock_irq
>    - _raw_spin_unlock_irq
>       + 35.34% __schedule
>       - 34.15% call_rwsem_down_write_failed
>          + 99.27% sys_mprotect
>       - 30.02% call_rwsem_down_read_failed
>            95.59% __do_page_fault
> -  24.65%  [kernel]  [k] _raw_spin_unlock_irqrestore
>    - _raw_spin_unlock_irqrestore
>       - 69.38% rwsem_wake
>          - call_rwsem_wake
>             - 83.32% sys_mprotect
>             + 15.54% __do_page_fault
>       + 22.55% try_to_wake_up
> +   9.77%  [kernel]  [k] default_send_IPI_mask_sequence_phys
> -   3.21%  [kernel]  [k] smp_call_function_many
>    - smp_call_function_many
>       - 99.22% flush_tlb_page
> +   2.51%  [kernel]  [k] rwsem_down_write_failed
> 
> It's much more like the 3.15 profile - it's only wasting half the
> CPU spinning on the internal spinlock and it's now going fast enough
> to be blowing another 10-12% of the CPU time sending tlb flushes to
> other CPUs....
> 
> One thing I did notice, even with the single-AG-at-a-time run, is
> that the system time is *significantly* reduced with this patch,
> even though it doesn't change performance.
> 
> ie unpatched:
> 
> 	unpatched	patched
> runtime   0m58s		  1m2s
> systime   4m55s		  1m1s
> 
> So not spinning when there are read holders is a major win even
> when there are few threads contending on read/write.
> 
> FWIW, the rwsems in the struct xfs_inode are often heavily
> read/write contended, so there are lots of IO related workloads that
> are going to regress on XFS without this optimisation...
> 
> Anyway, consider the patch:
> 
> Tested-by: Dave Chinner <dchinner@redhat.com>

Hi Dave,

Thanks for testing. I'll update the patch with an actual changelog.



  reply	other threads:[~2014-07-04  1:46 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1404413420.8764.42.camel@j-VirtualBox>
     [not found] ` <1404416236.3179.18.camel@buesod1.americas.hpqcorp.net>
2014-07-03 20:08   ` [regression, 3.16-rc] rwsem: optimistic spinning causing performance degradation Davidlohr Bueso
2014-07-04  1:01 ` Dave Chinner
2014-07-04  1:46   ` Jason Low [this message]
2014-07-04  1:54     ` Jason Low
2014-07-04  6:13       ` Dave Chinner
2014-07-04  7:06         ` Jason Low
2014-07-04  8:21           ` Dave Chinner
2014-07-04  8:53           ` Davidlohr Bueso
2014-07-05  3:14             ` Jason Low
2014-07-04  7:52       ` Peter Zijlstra
2014-07-04  8:40         ` Davidlohr Bueso
2014-07-05  3:49           ` Jason Low
     [not found]             ` <CAAW_DMjgd5+EOvZX7_iZe-jHp=00Nf7MX3z6hBCRPgOfqnMtEA@mail.gmail.com>
2014-07-14  9:55               ` Peter Zijlstra
2014-07-14 17:10                 ` Jason Low
2014-07-15  2:17                 ` Dave Chinner
2014-07-16 19:20             ` [tip:locking/urgent] locking/rwsem: Allow conservative optimistic spinning when readers have lock tip-bot for Jason Low
2014-07-03  2:32 [regression, 3.16-rc] rwsem: optimistic spinning causing performance degradation Dave Chinner
2014-07-03  3:31 ` Davidlohr Bueso
2014-07-03  4:59   ` Dave Chinner
2014-07-03  5:39     ` Dave Chinner
2014-07-03  7:38       ` Peter Zijlstra
2014-07-03  7:56         ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1404438366.8764.121.camel@j-VirtualBox \
    --to=jason.low2@hp.com \
    --cc=david@fromorbit.com \
    --cc=davidlohr@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.