Re: [RFC][PATCH 4/4] futex: Rewrite FUTEX_UNLOCK_PI

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: mingo@kernel.org, juri.lelli@arm.com, rostedt@goodmis.org,
	xlpang@redhat.com, bigeasy@linutronix.de,
	linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
	jdesfossez@efficios.com, bristot@redhat.com
Subject: Re: [RFC][PATCH 4/4] futex: Rewrite FUTEX_UNLOCK_PI
Date: Wed, 23 Nov 2016 20:20:05 +0100	[thread overview]
Message-ID: <20161123192005.GA3107@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <alpine.DEB.2.20.1610272052580.4913@nanos>

On Thu, Oct 27, 2016 at 10:36:00PM +0200, Thomas Gleixner wrote:
> > +	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
> > +	if (!new_owner) {
> > +		/*
> > +		 * This is the case where futex_lock_pi() has not yet or failed
> > +		 * to acquire the lock but still has the futex_q enqueued. So
> > +		 * the futex state has a 'waiter' while the rt_mutex state does
> > +		 * not.
> > +		 *
> > +		 * Even though there still is pi_state for this futex, we can
> > +		 * clear FUTEX_WAITERS. Either:
> > +		 *
> > +		 *  - we or futex_lock_pi() will drop the last reference and
> > +		 *    clean up this pi_state,
> > +		 *
> > +		 *  - userspace acquires the futex through its fastpath
> > +		 *    and the above pi_state cleanup still happens,
> > +		 *
> > +		 *  - or futex_lock_pi() will re-set the WAITERS bit in
> > +		 *    fixup_owner().
> > +		 */
> > +		newval = 0;
> > +		/*
> > +		 * Since pi_state->owner must point to a valid task, and
> > +		 * task_pid_vnr(pi_state->owner) must match TID_MASK, use
> > +		 * init_task.
> > +		 */
> > +		new_owner = &init_task;
> 
> So that waiter which is now spinning on pi_mutex->lock has already set the
> waiters bit, which you undo. So you created the following scenario:
> 
> CPU0	     	       	  CPU1				CPU2
> 
> TID 0x1001		  TID 0x1000			TID 0x1002
> 
> 			  Acquires futex in user space
>     			  futex value = 0x1000;
> 
> futex_lock_pi()
>   
>   lock_hb()
>   set_waiter_bit()
>   --> futex value = 0x40001000;
> 
> 			  futex_unlock_pi()
> 			   lock_hb()
>   attach_to_pi_owner()
>     rt_mutex_init_proxy_locked()
>   queue_me()
>     unlock_hb()
> 			   unlock_hb()
>   rt_mutex_lock()	   wake_futex_pi()
>    			   lock(pi_mutex->lock);
>    lock(pi_mutex->lock)	   new_owner is NULL;
> 		 	    --> futex value = 0;
> 			   rt_mutex_futex_unlock(pi_mutex);
> 			   unlock(pi_mutex->lock);
>    acquire_rt_mutex()	   return to user space
> 							Acquires futex in user space
> 							--> futex value = 0x1002
>   fixup_owner()
>     fixup_pi_state_owner()
>        uval = 0x1002;
>        newval = 0x40001001;
>        cmpxchg(uval, newval) succeeds
>        --> futex value = 0x40001001
> 
> Voila. Inconsistent state .... TID 0x1002 is borked now.

Urgh, right.

> The other option we have is to set the futex value to FUTEX_WAITERS instead
> of 0.

Yeah, I initially did that but didn't really like it, I then went on to
convince myself setting it to 0 was good, silly me. Leaking the WAITERS
bit is _by_far_ the simplest option though.

Ok, I went and implemented various broken and discarded alternatives
while re-learning all about futexes that I forgot the past few weeks,
while trying to figure out wtf the below does.

I also tried to create some 4+ CPU races that would hit holes in the
below, no luck so far.

> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2246,16 +2246,27 @@ static int fixup_pi_state_owner(u32 __us
>  	if (get_futex_value_locked(&uval, uaddr))
>  		goto handle_fault;
>  
> +	/*
> +	 * If wake_futex_pi() set the futex to 0 and made init_task the
> +	 * transient owner another task might have acquired the futex
> +	 * in user space.
> +	 */

True, but that doesn't explain why we do this. Naively leaving
pi_state->owner set while returning EAGAIN shouldn't be a problem
because put_pi_state() should clean that up.

_However_, that does rt_mutex_proxy_unlock() on it as well, and _that_
is a problem, because it's not the rt_mutex owner.

Then again, we can hit this very problem through any of the other
put_pi_state() calls after setting ->owner = &init_task I think. Which
would argue to special case this in put_pi_state() instead.

> +	if (oldowner == &init_task && uval != 0) {
> +		raw_spin_lock(&pi_state->owner->pi_lock);
> +		list_del_init(&pi_state->list);
> +		raw_spin_unlock(&pi_state->owner->pi_lock);
> +		pi_state->owner = NULL;
> +		return -EAGAIN;
>  	}
>  
> +	newval = (uval & FUTEX_OWNER_DIED) | newtid;
> +
> +	if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))
> +		goto handle_fault;
> +
> +	if (curval != uval)
> +		goto retry;

This is slightly weird in that we loose the obvious cmpxchg loop
construct. So I'd write it differently, also that
get_futex_value_locked() call is entirely superfluous the second time
around, we got curval after all.

> +
>  	/*
>  	 * We fixed up user space. Now we need to fix the pi_state
>  	 * itself.
> @@ -2679,6 +2690,10 @@ static int futex_lock_pi(u32 __user *uad
>  
>  out_put_key:
>  	put_futex_key(&q.key);
> +
> +	if (ret == -EAGAIN)
> +		goto retry;
> +

And this is far too clever and really needs a comment. So the crucial
point is that this is after unqueue_me_pi(), which drops the pi_state
and loops back to lookup the pi_state again, which, hopefully, has now
been completely destroyed -- and therefore we hit the regular
attach_to_pi_owner() path, fixing up our 'funny' state.

This is where I was playing with 4+ CPU scenarios, to see if we could
somehow keep the pi_state alive and not make progress.

I think we can do the retry slightly earlier, right after
unqueue_me_pi() and then add retry_queue: right before
futex_lock_pi_atomic(), that would avoid dropping the hb->lock (and
avoids put/get_futex_key).

>  out:
>  	if (to)
>  		destroy_hrtimer_on_stack(&to->timer);

Also, since fixup_pi_state_owner() can now return -EAGAIN, all users
need handling for that.

I'll try and do a patch that does all that and attempt to write coherent
comments on our fancy new state.

next prev parent reply	other threads:[~2016-11-23 20:19 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-03  9:12 [RFC][PATCH 0/4] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
2016-10-03  9:12 ` [RFC][PATCH 1/4] futex: Cleanup variable names for futex_top_waiter() Peter Zijlstra
2016-10-03 14:15   ` Steven Rostedt
2016-10-05  3:58   ` Davidlohr Bueso
2016-10-03  9:12 ` [RFC][PATCH 2/4] futex: Use smp_store_release() in mark_wake_futex() Peter Zijlstra
2016-10-03 14:19   ` Steven Rostedt
2016-10-05  3:57   ` Davidlohr Bueso
2016-10-05  6:20     ` Peter Zijlstra
2016-10-03  9:12 ` [RFC][PATCH 3/4] futex: Remove rt_mutex_deadlock_account_*() Peter Zijlstra
2016-10-03  9:34   ` Peter Zijlstra
2016-10-03 14:25   ` Steven Rostedt
2016-10-05  1:08   ` Davidlohr Bueso
2016-10-05  7:29   ` Sebastian Andrzej Siewior
2016-10-03  9:12 ` [RFC][PATCH 4/4] futex: Rewrite FUTEX_UNLOCK_PI Peter Zijlstra
2016-10-03 15:36   ` Steven Rostedt
2016-10-03 15:44     ` Peter Zijlstra
2016-10-03 15:45     ` Peter Zijlstra
2016-10-03 16:23       ` Steven Rostedt
2016-10-05  7:41   ` Sebastian Andrzej Siewior
2016-10-05  8:09     ` Peter Zijlstra
2016-10-05  8:21       ` Sebastian Andrzej Siewior
2016-10-05  8:32         ` Peter Zijlstra
2016-10-06 10:29   ` Peter Zijlstra
2016-10-07 11:21   ` Peter Zijlstra
2016-10-08 15:53     ` Thomas Gleixner
2016-10-08 16:55       ` Peter Zijlstra
2016-10-08 17:06         ` Thomas Gleixner
2016-10-10 10:17         ` Thomas Gleixner
2016-10-10 11:40           ` Peter Zijlstra
2016-10-21 12:27           ` Peter Zijlstra
2016-10-27 20:36             ` Thomas Gleixner
2016-11-23 19:20               ` Peter Zijlstra [this message]
2016-11-24 16:52                 ` Peter Zijlstra
2016-11-24 17:56                   ` Thomas Gleixner
2016-11-24 18:58                     ` Peter Zijlstra
2016-11-25  9:23                       ` Peter Zijlstra
2016-11-25 10:03                         ` Peter Zijlstra
2016-11-25 19:13                           ` Thomas Gleixner
2016-11-25 14:09               ` Peter Zijlstra
2016-10-08 18:22     ` Thomas Gleixner
2016-10-09 11:17     ` Thomas Gleixner
2016-10-10 14:06       ` Peter Zijlstra
2016-10-05  1:02 ` [RFC][PATCH 0/4] FUTEX_UNLOCK_PI wobbles Davidlohr Bueso
2016-10-05  6:20   ` Peter Zijlstra
2016-10-05  7:26     ` Sebastian Andrzej Siewior
2016-10-05 16:04     ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161123192005.GA3107@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bigeasy@linutronix.de \
    --cc=bristot@redhat.com \
    --cc=jdesfossez@efficios.com \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=xlpang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.