From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Jason Low <jason.low2@hp.com>, Waiman Long <Waiman.Long@hp.com>,
Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Alex Shi <alex.shi@linaro.org>, Andi Kleen <andi@firstfloor.org>,
Michel Lespinasse <walken@google.com>,
Davidlohr Bueso <davidlohr.bueso@hp.com>,
Matthew R Wilcox <matthew.r.wilcox@intel.com>,
Dave Hansen <dave.hansen@intel.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Rik van Riel <riel@redhat.com>,
Peter Hurley <peter@hurleysoftware.com>,
linux-kernel@vger.kernel.org, linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
Date: Fri, 27 Sep 2013 16:01:37 -0700
Message-ID: <20130927230137.GE9093@linux.vnet.ibm.com>
In-Reply-To: <1380322005.3467.186.camel@schen9-DESK>
On Fri, Sep 27, 2013 at 03:46:45PM -0700, Tim Chen wrote:
> On Fri, 2013-09-27 at 13:38 -0700, Paul E. McKenney wrote:
> > On Fri, Sep 27, 2013 at 12:38:53PM -0700, Tim Chen wrote:
> > > On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> > > > On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > > > > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > > > > > Extracting the MCS code from mutex.c and putting it into its own file
> > > > > > allows us to reuse this code easily for rwsem.
> > > > >
> > > > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > > > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > > > > ---
> > > > > include/linux/mcslock.h | 58 +++++++++++++++++++++++++++++++++++++++++++++++
> > > > > kernel/mutex.c | 58 +++++-----------------------------------------
> > > > > 2 files changed, 65 insertions(+), 51 deletions(-)
> > > > > create mode 100644 include/linux/mcslock.h
> > > > >
> > > > > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > > > > new file mode 100644
> > > > > index 0000000..20fd3f0
> > > > > --- /dev/null
> > > > > +++ b/include/linux/mcslock.h
> > > > > @@ -0,0 +1,58 @@
> > > > > +/*
> > > > > + * MCS lock defines
> > > > > + *
> > > > > + * This file contains the main data structure and API definitions of MCS lock.
> > > > > + */
> > > > > +#ifndef __LINUX_MCSLOCK_H
> > > > > +#define __LINUX_MCSLOCK_H
> > > > > +
> > > > > > +struct mcs_spin_node {
> > > > > > +	struct mcs_spin_node *next;
> > > > > > +	int		  locked;	/* 1 if lock acquired */
> > > > > > +};
> > > > > > +
> > > > > > +/*
> > > > > > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > > > > > + * time spent in this lock function.
> > > > > > + */
> > > > > > +static noinline
> > > > > > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > > > > +{
> > > > > > +	struct mcs_spin_node *prev;
> > > > > > +
> > > > > > +	/* Init node */
> > > > > > +	node->locked = 0;
> > > > > > +	node->next = NULL;
> > > > > > +
> > > > > > +	prev = xchg(lock, node);
> > > > > > +	if (likely(prev == NULL)) {
> > > > > > +		/* Lock acquired */
> > > > > > +		node->locked = 1;
> > > > > > +		return;
> > > > > > +	}
> > > > > > +	ACCESS_ONCE(prev->next) = node;
> > > > > > +	smp_wmb();
> > >
> > > BTW, is the above memory barrier necessary? It seems like the xchg
> > > instruction already provided a memory barrier.
> > >
> > > Now if we made the changes that Jason suggested:
> > >
> > >
> > > > 	/* Init node */
> > > > -	node->locked = 0;
> > > > 	node->next = NULL;
> > > >
> > > > 	prev = xchg(lock, node);
> > > > 	if (likely(prev == NULL)) {
> > > > 		/* Lock acquired */
> > > > -		node->locked = 1;
> > > > 		return;
> > > > 	}
> > > > +	node->locked = 0;
> > > > 	ACCESS_ONCE(prev->next) = node;
> > > > 	smp_wmb();
> > >
> > > We are probably still okay as other cpus do not read the value of
> > > node->locked, which is a local variable.
> >
> > I don't immediately see the need for the smp_wmb() in either case.
>
>
> Thinking a bit more, the following could happen in Jason's
> initial patch proposal. In this case variable "prev" referenced
> by CPU1 points to "node" referenced by CPU2
>
> 	CPU 1 (calling lock)			CPU 2 (calling unlock)
> 	ACCESS_ONCE(prev->next) = node
> 						*next = ACCESS_ONCE(node->next);
> 						ACCESS_ONCE(next->locked) = 1;
> 	node->locked = 0;
>
> Then we will be spinning forever on CPU1, as we overwrite the lock passed
> from CPU2 before we check it. The original code, which assigns
> "node->locked = 0" before the xchg, does not have this issue.
> Making the following change, moving the smp_wmb immediately
> after the node->locked assignment (suggested by Jason),
>
> node->locked = 0;
> smp_wmb();
> ACCESS_ONCE(prev->next) = node;
>
> could avoid the problem, but will need closer scrutiny to see if
> there are other pitfalls if the wmb happens before
>
> 	ACCESS_ONCE(prev->next) = node;

I could believe that an smp_wmb() might be needed before the
"ACCESS_ONCE(prev->next) = node;", just not after.

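The ordering discussed above can be sketched in userspace C11, purely as an illustration and not as the kernel code: the waiter clears its own ->locked and only then publishes its node through the predecessor's ->next, using a release store that plays roughly the role of an smp_wmb() placed before the ACCESS_ONCE() store. The names qnode, enqueue_behind(), hand_off() and handoff_demo() are made up for this sketch, and sched_yield() merely stands in for arch_mutex_cpu_relax().

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for struct mcs_spin_node (hypothetical name). */
struct qnode {
	_Atomic(struct qnode *) next;
	atomic_int locked;
};

/* Waiter side: clear ->locked first, then publish the node. */
static void enqueue_behind(struct qnode *prev, struct qnode *node)
{
	atomic_store_explicit(&node->locked, 0, memory_order_relaxed);
	/* Release store: the clear above cannot move after the publication. */
	atomic_store_explicit(&prev->next, node, memory_order_release);
}

/* Holder side: wait for a successor to appear, then pass the lock down. */
static void *hand_off(void *arg)
{
	struct qnode *holder = arg;
	struct qnode *next;

	while ((next = atomic_load_explicit(&holder->next,
					    memory_order_acquire)) == NULL)
		sched_yield();
	atomic_store_explicit(&next->locked, 1, memory_order_release);
	return NULL;
}

/* Returns the waiter's ->locked once the handoff has been observed. */
static int handoff_demo(void)
{
	struct qnode holder, waiter;
	pthread_t tid;

	atomic_init(&holder.next, NULL);
	atomic_init(&holder.locked, 1);
	atomic_init(&waiter.next, NULL);
	atomic_init(&waiter.locked, 1);	/* stale value that enqueue clears */

	if (pthread_create(&tid, NULL, hand_off, &holder) != 0)
		return -1;
	enqueue_behind(&holder, &waiter);
	while (!atomic_load_explicit(&waiter.locked, memory_order_acquire))
		sched_yield();
	pthread_join(tid, NULL);
	return atomic_load_explicit(&waiter.locked, memory_order_relaxed);
}
```

Because the holder can only learn about the waiter's node through the release store to ->next, its "locked = 1" is ordered after the waiter's "locked = 0", so the handoff cannot be overwritten the way it is in the CPU 1 / CPU 2 interleaving quoted above.
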
> > > > > > +	/* Wait until the lock holder passes the lock down */
> > > > > > +	while (!ACCESS_ONCE(node->locked))
> > > > > > +		arch_mutex_cpu_relax();
> >
> > However, you do need a full memory barrier here in order to ensure that
> > you see the effects of the previous lock holder's critical section.
>
> Is it necessary to add a memory barrier after acquiring
> the lock if the previous lock holder executes smp_wmb before passing
> the lock?

Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
or the CPU from reordering things for the new lock holder.  They could for
example reorder the critical section to precede the node->locked check,
which would be very bad.

							Thanx, Paul
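
As a hedged illustration of the point about the new lock holder, here is a minimal userspace sketch in C11 rather than kernel code. The previous holder writes plain data in its critical section and then passes the lock with a release store; the new holder spins with an acquire load, which is what keeps its own critical-section accesses from being reordered ahead of the check. The acquire load is only a stand-in for the barrier asked for above; whether the kernel code wants a full barrier or something weaker is not settled by this sketch. The names shared_data, locked, prev_holder() and acquire_demo() are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static int shared_data;		/* plain data written in the critical section */
static atomic_int locked;	/* stand-in for the waiter's node->locked */

/* Previous holder: finish the critical section, then pass the lock down. */
static void *prev_holder(void *unused)
{
	(void)unused;
	shared_data = 42;
	atomic_store_explicit(&locked, 1, memory_order_release);
	return NULL;
}

/* New holder: spin until the lock is passed, then enter the critical section. */
static int acquire_demo(void)
{
	pthread_t tid;
	int seen;

	shared_data = 0;
	atomic_store_explicit(&locked, 0, memory_order_relaxed);
	if (pthread_create(&tid, NULL, prev_holder, NULL) != 0)
		return -1;

	/* Acquire load: later accesses cannot be hoisted above this check. */
	while (!atomic_load_explicit(&locked, memory_order_acquire))
		sched_yield();
	seen = shared_data;	/* ordered after the acquire load above */

	pthread_join(tid, NULL);
	return seen;
}
```

With a relaxed load in the spin loop instead, nothing in the C11 model would stop the read of shared_data from being satisfied before the lock was seen as passed, which is the reordering described above.
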