public inbox for linux-kernel@vger.kernel.org
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Oleg Nesterov <oleg@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	Clark Williams <williams@redhat.com>
Subject: Re: safety of *mutex_unlock() (Was: [BUG] signal: sighand unprotected when accessed by /proc)
Date: Tue, 10 Jun 2014 05:56:55 -0700
Message-ID: <20140610125655.GJ4581@linux.vnet.ibm.com>
In-Reply-To: <CA+55aFyO8bb7fS3AaQzfMq7geERQ1JpSdcRZ9Fbs5-tZP0oRXg@mail.gmail.com>

On Mon, Jun 09, 2014 at 11:51:09AM -0700, Linus Torvalds wrote:
> On Mon, Jun 9, 2014 at 11:29 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >>
> >> And once again, note that the normal mutex is already unsafe (unless I missed
> >> something).
> >
> > Is it unsafe?
> >
> > This thread was started because of a bug we triggered in -rt, which
> > ended up being a change specific to -rt that modified the way slub
> > handled SLAB_DESTROY_BY_RCU. What else was wrong with it?
> 
> There's a different issue with freeing of mutexes, which is not a bug,
> but "by design". Namely that mutexes aren't truly "atomic". They are
> complex data structures, and they have issues that a spinlock does not
> have.
> 
> When unlocking a mutex, the thread doing the unlocking will still
> touch the mutex itself _after_ another thread could already have
> successfully acquired the mutex. This is not a problem in any normal
> use, since all of this is perfectly coherent in general, but it means
> that code sequences like:
> 
>   mutex_lock(mem->mutex);
>   kill_it = !--mem->refcount;
>   mutex_unlock(mem->mutex);
>   if (kill_it)
>      free(mem);
> 
> are fundamentally buggy.
> 
> Note that if you think of mutexes as truly indivisible atomic
> operations, the above is "obviously correct": the last person who got
> the mutex marked it for killing. But the fact is, the *next-to-last*
> mutex acquirer may still actively be in the *non-indivisible*
> mutex_unlock() when the last person frees it, resulting in a
> use-after-free. And yes, we've had this bug, and as far as I know it's
> possible that the RT code *introduces* this bug when it changes
> spinlocks into mutexes. Because we do exactly the above code sequence
> with spinlocks. So just replacing spinlocks with mutexes is a very
> subtly buggy thing to do in general.
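
Just to make the ordering concrete, my understanding of the interleaving
is roughly as follows, with the internal steps of mutex_unlock() sketched
only to show where the late accesses come from (the exact steps are of
course implementation-specific):

	CPU 0 (next-to-last holder)          CPU 1 (last holder)
	---------------------------          -------------------
	mutex_unlock(mem->mutex):
	  releases fast-path owner/count
	                                     mutex_lock(mem->mutex) succeeds
	                                     kill_it = !--mem->refcount; /* 0 */
	                                     mutex_unlock(mem->mutex);
	                                     free(mem);
	  slow path: wakes waiters,
	  touches mem->mutex wait list   <== use-after-free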
> 
> Another example of this kind of situation is using a mutex as a
> completion event: that's simply buggy. Again, it's because mutexes are
> complex data structures, and you have a very similar use-after-free
> issue. It's why the fundamental data structure for a "struct
> completion" is a spinlock, not a mutex.
> 
> Again, in *theory*, a completion could be just a mutex that starts out
> locked, and then the completer completes it by just unlocking it. The
> person who waits for a completion does so by just asking for a lock.
> Obvious, simple, and trivial, right? Not so, because of the *exact*
> same issue above: the completer (who does an "unlock") may still be
> accessing the completion data structure when the completion waiter has
> successfully gotten the lock. So the standard thing of the completer
> freeing the underlying completion memory (which is often on the stack,
> so "freeing" is just he act of going out of scope of the liveness of
> the completion data structure) would not work if the completion was a
> mutex.
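
Spelling that out with some made-up helpers (and leaving aside the
separate rule that a kernel mutex must be released by the task that
acquired it, which already rules this out):

	struct my_completion {
		struct mutex m;
	};

	void my_init_completion(struct my_completion *c)
	{
		mutex_init(&c->m);
		mutex_lock(&c->m);	/* "completion" starts out locked */
	}

	void my_complete(struct my_completion *c)
	{
		mutex_unlock(&c->m);	/* may still be touching *c after... */
	}

	void my_wait_for_completion(struct my_completion *c)
	{
		mutex_lock(&c->m);	/* ...the waiter returns and lets c,
					 * often on-stack, go out of scope */
	}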
> 
> This is subtle, and it is basically unavoidable. If a mutex (or
> counting semaphore) has a fast-path - and a mutex/semaphore without a
> fast-path is shit - then this issue will exist. Exactly because the
> fast-path will depend on just one part of the whole big mutex
> structure, and the slow-path will have other pieces to it.
> 
> There might be reasonable ways to avoid this issue (having the
> fastpath locking field share memory with the slow-path locking, for
> example), but it's not how our semaphores and mutexes work, and I
> suspect it cannot be the case in general (because it limits you too
> badly in how to implement the mutex). As a result, this is all "by
> design" as opposed to being a bug.

So to safely free a structure containing a mutex, is there some better
approach than the following?

	mutex_lock(mem->mutex);
	kill_it = !--mem->refcount;
	rcu_read_lock();
	mutex_unlock(mem->mutex);
	rcu_read_unlock();
	if (kill_it)
		kfree_rcu(mem, rh); /* rh is the rcu_head field in mem. */

For example, is there some other way to know that all the prior lock
releases have finished their post-release accesses?
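
In case it helps the discussion, here is the same idea spelled out a bit
more fully.  The structure and function names are just made up for
illustration, and the approach of course depends on every release path
wrapping its mutex_unlock() in rcu_read_lock(), so that the grace period
that kfree_rcu() waits for cannot end while some other releaser is still
inside mutex_unlock():

	struct mem_struct {
		struct mutex mutex;
		int refcount;		/* protected by ->mutex */
		struct rcu_head rh;	/* used by kfree_rcu() below */
		/* ... payload ... */
	};

	static void put_mem(struct mem_struct *mem)
	{
		bool kill_it;

		mutex_lock(&mem->mutex);
		kill_it = !--mem->refcount;
		rcu_read_lock();	/* cover mutex_unlock()'s post-release
					 * accesses to mem->mutex ... */
		mutex_unlock(&mem->mutex);
		rcu_read_unlock();
		if (kill_it)
			kfree_rcu(mem, rh);	/* ... so the eventual kfree()
						 * waits for a grace period */
	}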

							Thanx, Paul

