From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <Waiman.Long@hp.com>, Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Rik van Riel <riel@redhat.com>,
Peter Hurley <peter@hurleysoftware.com>,
Davidlohr Bueso <davidlohr.bueso@hp.com>,
Alex Shi <alex.shi@intel.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Andrea Arcangeli <aarcange@redhat.com>,
Matthew R Wilcox <matthew.r.wilcox@intel.com>,
Dave Hansen <dave.hansen@intel.com>,
Michel Lespinasse <walken@google.com>,
Andi Kleen <andi@firstfloor.org>,
"Chandramouleeswaran, Aswin" <aswin@hp.com>,
"Norton, Scott J" <scott.norton@hp.com>
Subject: Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path
Date: Sat, 28 Sep 2013 09:41:45 +0200 [thread overview]
Message-ID: <20130928074144.GA17773@gmail.com> (raw)
In-Reply-To: <CA+55aFxXeQ69B1bfrO+0QtBqm0gt688LOshx=ppNjch10JF8FQ@mail.gmail.com>
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Sep 27, 2013 at 12:00 PM, Waiman Long <Waiman.Long@hp.com> wrote:
> >
> > On a large NUMA machine, it is entirely possible that a fairly large
> > number of threads are queuing up in the ticket spinlock queue to do
> > the wakeup operation. In fact, only one will be needed. This patch
> > tries to reduce spinlock contention by doing just that.
> >
> > A new wakeup field is added to the rwsem structure. This field is set
> > on entry to rwsem_wake() and __rwsem_do_wake() to mark that a thread
> > is pending to do the wakeup call. It is cleared on exit from those
> > functions.
>
> Ok, this is *much* simpler than adding the new MCS spinlock, so I'm
> wondering what the performance difference between the two are.
>
> I'm obviously always in favor of just removing lock contention over
> trying to improve the lock scalability, so I really like Waiman's
> approach over Tim's new MCS lock. Not because I dislike MCS locks in
> general (or you, Tim ;), it's really more fundamental: I just
> fundamentally believe more in trying to avoid lock contention than in
> trying to improve lock behavior when that contention happens.
>
> As such, I love exactly these kinds of things that Wainman's patch does,
> and I'm heavily biased.
Yeah, I fully agree. The reason I'm still very sympathetic to Tim's
efforts is that they address a regression caused by a mechanic
mutex->rwsem conversion:
5a505085f043 mm/rmap: Convert the struct anon_vma::mutex to an rwsem
... and Tim's patches turn that regression into an actual speedup.
The main regression component happens to be more prominent on larger
systems which inevitably have higher contention. That was a result of
mutexes having better contention behavior than rwsems - which was not a
property Tim picked arbitrarily.
The other advantage would be that by factoring out the MCS code it gets
reviewed and seen more prominently - this resulted in a micro-optimization
already. So people working on mutex or rwsem scalability will have a
chance to help each other.
A third advantage would be that arguably our rwsems degrade on really big
systems catastrophically. We had that with spinlocks and mutexes and it
got fixd. Since rwsems are a long-term concern for us I thought that
something like MCS queueing could be considered a fundamental
quality-of-implementation requirement as well.
In any case I fully agree that we never want to overdo it, our priority of
optimization and our ranking will always be:
- lockless algorithms
- reduction of critical paths
...
- improved locking fast path
- improved locking slow path
and people will see much better speedups if they concentrate on entries
higher on this list.
It's a pretty rigid order as well: no slowpath improvement is allowed to
hurt any of the higher priority optimization concepts and people are
encouraged to always work on the higher order concepts before considering
the symptoms of contention.
But as long as contention behavior improvements are cleanly done, don't
cause regressions, do not hinder primary scalability efforts and the
numbers are as impressive as Tim's, it looks like good stuff to me.
I was thinking about putting them into the locking tree once the review
and testing process converges and wanted to send them to you in the v3.13
merge window.
The only potential downside (modulo bugs) I can see from Tim's patches is
XFS's heavy dependence on fine behavioral details of rwsem. But if done
right then none of these scalability efforts should impact those details.
Thanks,
Ingo
next prev parent reply other threads:[~2013-09-28 7:41 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-27 19:00 [PATCH] rwsem: reduce spinlock contention in wakeup code path Waiman Long
2013-09-27 19:28 ` Linus Torvalds
2013-09-27 19:39 ` Davidlohr Bueso
2013-09-27 21:49 ` Tim Chen
2013-09-28 6:45 ` Ingo Molnar
2013-09-28 7:41 ` Ingo Molnar [this message]
2013-09-28 18:55 ` Linus Torvalds
2013-09-28 19:13 ` Andi Kleen
2013-09-28 19:22 ` Linus Torvalds
2013-09-28 19:21 ` Ingo Molnar
2013-09-28 19:33 ` Linus Torvalds
2013-09-28 19:39 ` Ingo Molnar
2013-09-30 10:44 ` Peter Zijlstra
2013-09-30 16:13 ` Linus Torvalds
2013-09-30 16:41 ` Peter Zijlstra
2013-10-01 7:28 ` Ingo Molnar
2013-10-01 8:09 ` Peter Zijlstra
2013-10-01 8:25 ` Ingo Molnar
2013-09-30 15:58 ` Waiman Long
2013-10-01 7:33 ` Ingo Molnar
2013-10-01 20:03 ` Waiman Long
2013-09-28 19:37 ` [PATCH] anon_vmas: Convert the rwsem to an rwlock_t Ingo Molnar
2013-09-28 19:43 ` Linus Torvalds
2013-09-28 19:46 ` Ingo Molnar
2013-09-28 19:52 ` [PATCH, v2] " Ingo Molnar
2013-09-30 11:00 ` Peter Zijlstra
2013-09-30 11:44 ` Peter Zijlstra
2013-09-30 17:03 ` Andrew Morton
2013-09-30 17:25 ` Linus Torvalds
2013-09-30 17:10 ` Tim Chen
2013-09-30 18:14 ` Peter Zijlstra
2013-09-30 19:23 ` Tim Chen
2013-09-30 19:35 ` Waiman Long
2013-09-30 19:47 ` Tim Chen
2013-09-30 22:03 ` Tim Chen
2013-10-01 2:41 ` Waiman Long
2013-09-30 8:52 ` [PATCH] " Andrea Arcangeli
2013-09-30 14:40 ` Jerome Glisse
2013-09-30 16:26 ` Linus Torvalds
2013-09-30 19:16 ` Andrea Arcangeli
2013-09-30 18:21 ` Andi Kleen
2013-09-30 19:04 ` Jerome Glisse
2013-09-29 23:06 ` [PATCH] rwsem: reduce spinlock contention in wakeup code path Davidlohr Bueso
2013-09-29 23:26 ` Linus Torvalds
2013-09-30 0:40 ` Davidlohr Bueso
2013-09-30 0:51 ` Linus Torvalds
2013-09-30 6:57 ` Ingo Molnar
2013-09-30 1:11 ` Michel Lespinasse
2013-09-30 7:05 ` Ingo Molnar
2013-09-30 16:03 ` Waiman Long
2013-09-30 10:46 ` Peter Zijlstra
2013-10-01 7:48 ` Ingo Molnar
2013-10-01 8:10 ` Peter Zijlstra
2013-09-27 19:32 ` Peter Hurley
2013-09-28 0:46 ` Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130928074144.GA17773@gmail.com \
--to=mingo@kernel.org \
--cc=Waiman.Long@hp.com \
--cc=a.p.zijlstra@chello.nl \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alex.shi@intel.com \
--cc=andi@firstfloor.org \
--cc=aswin@hp.com \
--cc=dave.hansen@intel.com \
--cc=davidlohr.bueso@hp.com \
--cc=linux-kernel@vger.kernel.org \
--cc=matthew.r.wilcox@intel.com \
--cc=mingo@elte.hu \
--cc=peter@hurleysoftware.com \
--cc=riel@redhat.com \
--cc=scott.norton@hp.com \
--cc=tim.c.chen@linux.intel.com \
--cc=torvalds@linux-foundation.org \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).