All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	mingo@kernel.org, linux-kernel@vger.kernel.org, corbet@lwn.net,
	mhocko@kernel.org, dhowells@redhat.com,
	torvalds@linux-foundation.org, will.deacon@arm.com,
	Michael Ellerman <mpe@ellerman.id.au>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>
Subject: Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()
Date: Thu, 12 Nov 2015 06:40:04 -0800	[thread overview]
Message-ID: <20151112144004.GU3972@linux.vnet.ibm.com> (raw)
In-Reply-To: <20151112150058.GA30321@redhat.com>

On Thu, Nov 12, 2015 at 04:00:58PM +0100, Oleg Nesterov wrote:
> On 11/12, Boqun Feng wrote:
> >
> > On Wed, Nov 11, 2015 at 08:39:53PM +0100, Oleg Nesterov wrote:
> > >
> > > 	object_t *object;
> > > 	spinlock_t lock;
> > >
> > > 	void update(void)
> > > 	{
> > > 		object_t *o;
> > >
> > > 		spin_lock(&lock);
> > > 		o = READ_ONCE(object);
> > > 		if (o) {
> > > 			BUG_ON(o->dead);
> > > 			do_something(o);
> > > 		}
> > > 		spin_unlock(&lock);
> > > 	}
> > >
> > > 	void destroy(void) // can be called only once, can't race with itself
> > > 	{
> > > 		object_t *o;
> > >
> > > 		o = object;
> > > 		object = NULL;
> > >
> > > 		/*
> > > 		 * pairs with lock/ACQUIRE. The next update() must see
> > > 		 * object == NULL after spin_lock();
> > > 		 */
> > > 		smp_mb();
> > >
> > > 		spin_unlock_wait(&lock);
> > >
> > > 		/*
> > > 		 * pairs with unlock/RELEASE. The previous update() has
> > > 		 * already passed BUG_ON(o->dead).
> > > 		 *
> > > 		 * (Yes, yes, in this particular case it is not needed,
> > > 		 *  we can rely on the control dependency).
> > > 		 */
> > > 		smp_mb();
> > >
> > > 		o->dead = true;
> > > 	}
> > >
> > > I believe the code above is correct and it needs the barriers on both sides.
> > >
> >
> > Hmm.. probably incorrect.. because the ACQUIRE semantics of spin_lock()
> > only guarantees that the memory operations following spin_lock() can't
> > be reorder before the *LOAD* part of spin_lock() not the *STORE* part,
> > i.e. the case below can happen(assuming the spin_lock() is implemented
> > as ll/sc loop)
> >
> > 	spin_lock(&lock):
> > 	  r1 = *lock; // LL, r1 == 0
> > 	o = READ_ONCE(object); // could be reordered here.
> > 	  *lock = 1; // SC
> >
> > This could happen because of the ACQUIRE semantics of spin_lock(), and
> > the current implementation of spin_lock() on PPC allows this happen.
> >
> > (Cc PPC maintainers for their opinions on this one)
> 
> In this case the code above is obviously wrong. And I do not understand
> how we can rely on spin_unlock_wait() then.
> 
> And afaics do_exit() is buggy too then, see below.
> 
> > I think it's OK for it as an ACQUIRE(with a proper barrier) or even just
> > a control dependency to pair with spin_unlock(), for example, the
> > following snippet in do_exit() is OK, except the smp_mb() is redundant,
> > unless I'm missing something subtle:
> >
> > 	/*
> > 	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
> > 	 * when the following two conditions become true.
> > 	 *   - There is race condition of mmap_sem (It is acquired by
> > 	 *     exit_mm()), and
> > 	 *   - SMI occurs before setting TASK_RUNINNG.
> > 	 *     (or hypervisor of virtual machine switches to other guest)
> > 	 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
> > 	 *
> > 	 * To avoid it, we have to wait for releasing tsk->pi_lock which
> > 	 * is held by try_to_wake_up()
> > 	 */
> > 	smp_mb();
> > 	raw_spin_unlock_wait(&tsk->pi_lock);
> 
> Perhaps it is me who missed something. But I don't think we can remove
> this mb(). And at the same time it can't help on PPC if I understand
> your explanation above correctly.

I cannot resist suggesting that any lock that interacts with
spin_unlock_wait() must have all relevant acquisitions followed by
smp_mb__after_unlock_lock().

							Thanx, Paul


  reply	other threads:[~2015-11-12 14:40 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-02 13:29 [PATCH 0/4] scheduler ordering bits Peter Zijlstra
2015-11-02 13:29 ` [PATCH 1/4] sched: Better document the try_to_wake_up() barriers Peter Zijlstra
2015-12-04  0:09   ` Byungchul Park
2015-12-04  0:58   ` Byungchul Park
2015-11-02 13:29 ` [PATCH 2/4] sched: Document Program-Order guarantees Peter Zijlstra
2015-11-02 20:27   ` Paul Turner
2015-11-02 20:34     ` Peter Zijlstra
2015-11-02 22:09       ` Paul Turner
2015-11-02 22:12         ` Peter Zijlstra
2015-11-20 10:02     ` Peter Zijlstra
2015-11-20 14:08       ` Boqun Feng
2015-11-20 14:18         ` Peter Zijlstra
2015-11-20 14:21           ` Boqun Feng
2015-11-20 19:41             ` Peter Zijlstra
2015-11-02 13:29 ` [PATCH 3/4] sched: Fix a race in try_to_wake_up() vs schedule() Peter Zijlstra
2015-11-02 13:29 ` [PATCH 4/4] locking: Introduce smp_cond_acquire() Peter Zijlstra
2015-11-02 13:57   ` Peter Zijlstra
2015-11-02 17:43     ` Will Deacon
2015-11-03  1:14       ` Paul E. McKenney
2015-11-03  1:25         ` Linus Torvalds
2015-11-02 17:42   ` Will Deacon
2015-11-02 18:08   ` Linus Torvalds
2015-11-02 18:37     ` Will Deacon
2015-11-02 19:17       ` Linus Torvalds
2015-11-02 19:57         ` Will Deacon
2015-11-02 20:23           ` Peter Zijlstra
2015-11-02 21:56         ` Peter Zijlstra
2015-11-03  1:57         ` Paul E. McKenney
2015-11-03 19:40           ` Linus Torvalds
2015-11-04  3:57             ` Paul E. McKenney
2015-11-04  4:43               ` Linus Torvalds
2015-11-04 12:54                 ` Paul E. McKenney
2015-11-02 20:36     ` David Howells
2015-11-02 20:40       ` Peter Zijlstra
2015-11-02 21:11       ` Linus Torvalds
2015-11-03 17:59   ` Oleg Nesterov
2015-11-03 18:23     ` Peter Zijlstra
2015-11-11  9:39     ` Boqun Feng
2015-11-11 10:34       ` Boqun Feng
2015-11-11 19:53         ` Oleg Nesterov
2015-11-12 13:50         ` Paul E. McKenney
2015-11-11 12:12       ` Peter Zijlstra
2015-11-11 19:39         ` Oleg Nesterov
2015-11-11 21:23           ` Linus Torvalds
2015-11-12  7:14           ` Boqun Feng
2015-11-12 10:28             ` Peter Zijlstra
2015-11-12 15:00             ` Oleg Nesterov
2015-11-12 14:40               ` Paul E. McKenney [this message]
2015-11-12 14:49                 ` Boqun Feng
2015-11-12 15:02                   ` Paul E. McKenney
2015-11-12 21:53                     ` Will Deacon
2015-11-12 14:50                 ` Peter Zijlstra
2015-11-12 15:01                   ` Paul E. McKenney
2015-11-12 15:08                     ` Peter Zijlstra
2015-11-12 15:20                       ` Paul E. McKenney
2015-11-12 21:25                         ` Will Deacon
2015-11-12 15:18               ` Boqun Feng
2015-11-12 18:38                 ` Oleg Nesterov
2015-11-12 18:02                   ` Peter Zijlstra
2015-11-12 19:33                     ` Oleg Nesterov
2015-11-12 18:59                       ` Paul E. McKenney
2015-11-12 21:33                         ` Will Deacon
2015-11-12 23:43                           ` Paul E. McKenney
2015-11-16 13:58                             ` Will Deacon
2015-11-12 18:21             ` Linus Torvalds
2015-11-12 22:09               ` Will Deacon
2015-11-16 15:56               ` Peter Zijlstra
2015-11-16 16:04                 ` Peter Zijlstra
2015-11-16 16:24                   ` Will Deacon
2015-11-16 16:44                     ` Paul E. McKenney
2015-11-16 16:46                       ` Will Deacon
2015-11-16 17:15                         ` Paul E. McKenney
2015-11-16 21:58                     ` Linus Torvalds
2015-11-17 11:51                       ` Will Deacon
2015-11-17 21:01                         ` Paul E. McKenney
2015-11-18 11:25                           ` Will Deacon
2015-11-19 18:01                             ` Will Deacon
2015-11-20 10:09                               ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151112144004.GU3972@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=boqun.feng@gmail.com \
    --cc=corbet@lwn.net \
    --cc=dhowells@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=oleg@redhat.com \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.