Re: [TIP][RFC 6/7] futex: add requeue_pi calls

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Thomas Gleixner <tglx@linutronix.de>
To: Darren Hart <dvhltc@us.ibm.com>
Cc: "lkml, " <linux-kernel@vger.kernel.org>,
	Steven Rostedt <srostedt@redhat.com>,
	Sripathi Kodi <sripathik@in.ibm.com>,
	John Stultz <johnstul@linux.vnet.ibm.com>
Subject: Re: [TIP][RFC 6/7] futex: add requeue_pi calls
Date: Tue, 10 Mar 2009 14:39:25 +0100 (CET)	[thread overview]
Message-ID: <alpine.LFD.2.00.0903101242580.29264@localhost.localdomain> (raw)
In-Reply-To: <49B5F17C.8020100@us.ibm.com>

On Mon, 9 Mar 2009, Darren Hart wrote:
> Thomas Gleixner wrote:
> > > +			int has_timeout;
> > 
> > Hmm. time == 0 perhaps or just add another flag bit to the flags field ?
> 
> I've created FLAGS_HAS_TIMEOUT and use that instead.  I opted for this
> approach instead of using time==0 as that seemed like it could be confused for
> an expired timer.

Yup. Preferred solution.
 
> > > +	/*
> > > +	 * The pifutex has an owner, make sure it's us, if not complain
> > > +	 * to userspace.
> > > +	 * FIXME_LATER: handle this gracefully
> > 
> > We should do this right now. It's not that hard.
> 
> Hrm. So if another task owns the futex then POSIX says undefined scheduling be
> should be expected.  So failing here seems like a good way to encourage proper
> usage of the cond vars. 

Non predicatable != invalid usage

> Or can you think of a sane scenario in which the
> outer mutex should not be held prior to the call to cond_signal() or
> cond_broadcast()?  If you are thinking outside of glibc cond vars, then there
> are likely other gaps with this patch, such as only being able to requeue from
> non-pi to pi futexes.

No, I think about glibc and the madness of existing code. Code just
relies on that and it works with the current glibc implementation so
making the enhanced requeue_pi behave different will break existing
applications and has to be considered as a regression.
 
> That being said... now that this is more or less implemented, I'm not sure
> there is really anything I'd need to do different to support this.  I'll pour
> over this for a while and see if there are any gotchas that I'm missing right
> now.

I think you have everything what you need already except of some minor
tweaks.
 
> > 
> > > +	 */
> > > +	pid = curval & FUTEX_TID_MASK;
> > > +	if (pid && pid != task_pid_vnr(current))
> > > +		return -EMORON;
> > 
> > Though it's a pity that we lose EMORON :)
> 
> 
> Does that mean I'll have to forfeit the nickname bestowed upon me by LWN?
> I've received numerous comments on this line - all of them in favor.
> Apparently lots of kernel developers are eager for a way to slyly mock users
> from within the context of the kernel.  ;-)

Send a separate patch then. You have my Acked-by :)
 
> > > +/*
> > > + * Requeue all waiters hashed on one physical page to another physical
> > > page.
> > > + * In the requeue_pi case, either takeover uaddr2 or set FUTEX_WAITERS
> > > and
> > > + * setup the pistate.  FUTEX_REQUEUE_PI only supports requeueing from a
> > > non-pi
> > > + * futex to a pi futex.
> > >  */
> > > static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user
> > > *uaddr2,
> > > -			 int nr_wake, int nr_requeue, u32 *cmpval)
> > > +			 int nr_wake, int nr_requeue, u32 *cmpval,
> > > +			 int requeue_pi)
> > > {
> > > 	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
> > > 	struct futex_hash_bucket *hb1, *hb2;
> > > 	struct plist_head *head1;
> > > 	struct futex_q *this, *next;
> > > -	int ret, drop_count = 0;
> > > +	u32 curval2;
> > > +	struct futex_pi_state *pi_state = NULL;
> > > +	int drop_count = 0, attempt = 0, task_count = 0, ret;
> > > +
> > > +	if (requeue_pi && refill_pi_state_cache())
> > > +		return -ENOMEM;
> > 
> >   Why do you want to refill the pi_state_cache of current ? current is
> >   not going to wait for the pi_futex.
> 
> 
> I may need to allocate a pi_state during
> futex_requeue_pi_init()->futex_lock_pi_atomic()->lookup_pi_state(). Making use
> of this current->pi_cache seemed like the best way to do it.  Is there a more
> appropriate place to store a preallocated pi_state?

Hmm. You are right. That's the best way to have one handy.
 
> > > +		if (requeue_pi &&
> > > +		    ((curval2 & FUTEX_TID_MASK) != task_pid_vnr(this->task)))
> > > {
> > 
> >   Why don't you check the owner of the rtmutex, which we installed
> >   already ? Then we can drop this curval2 business altogether.
> 
> 
> Because we can own the futex without an rt_mutex existing.  The first time
> through this loop we consider the top_waiter of uaddr1.  If uaddr2 had no
> owner, then it is now owned by the top_waiter, but since there are still no
> waiters on uaddr2, the rt_mutex has not been initialized. Once we know we are
> not the owner, then we also know the pi_state->pi_mutex exists.
> 
> Hrm.... I suppose a test for "!this->pi_state" is equivalent to
> "(curval2 & FUTEX_TID_MASK) != task_pid_vnr(this->task)" then isn't it...?
> Seems like a rather fragile test though doesn't it?
> 
> We wouldn't save much by eliminating the curval2 here though since
> futex_requeue_pi_init needs it for the atomic_lock, so we still need the fault
> logic... unfortunatley.
> 
> Thoughts?

In futex_requeue_pi_init() we figure already out who owns the
futex, right ? So why don't we capture that information ?

> > > +	if (abs_time) {
> > > +		unsigned long slack;
> > 
> >   Missing new line. Also this is nonsense. A rt task should have set
> >   its timer_slack_ns to 0, so we can just use current->timer_slack_ns
> >   in the hrtimer_set_expires_range_ns(). If the timer_slack_ns is
> >   random for rt_tasks then we need to fix this in general.
> > 
> 
> 
> Added a WARN_ON(!current->timer_slack_ns) to my debug patch and used that
> value here.  I'll create a patch to futex-fixes to address the existing calls
> that I copied this approach from.

Thanks.

> >   Also the logic is completely backwards here. It has to be the same
> >   as in futex_wait()
> 
> 
> Is it broken?  I believe the reason I wrote it as I did was because the ret ==
> 0 case should be very rare here.  Infact, it is an error.  The only reason we
> should wake on the first futex (uaddr) is on a signal or a timeout, otherwise
> the user most likely paired a FUTEX_WAKE_REQUEUE_PI  with FUTEX_WAKE, instead
> of FUTEX_REQUEUE_PI.
> 
> I can change it around if you have a strong preference though (or if it's
> broken in some way I'm missing of course).

The 0 case might be rare, but it is definitely not an error.

    futex_wait_queue_me(hb, &q, to);
    	queue_me();
	    drops hb->lock;

    unqueue_me() is called w/o the hb->lock held, so we can race here
    against a requeue_pi wakeup.

    So we need to check the unqueue_me() == 0 case first and check
    whether we got requeued.

    The requeue check is done unlocked, so the requeue and wakeup
    might happen between the check and unqueue_me().

But this needs more thoughts about locking and possible races. The
whole code sequence here screams "can you debug the nasty races here ?"

> > > +	/* FIXME: this test is REALLY scary... gotta be a better way...
> > > +	 * If the pi futex was uncontended, futex_requeue_pi_init() gave us
> > > +	 * the lock.
> > > +	 */
> > 
> >   Didn't we take the rtmutex as well ??
> 
> 
> Not necessarily.  If the pi futex was ownerless and there was only one task
> requeued, then there is no rtmutex as it is created by the first waiter.
> 
> It is possible that other tasks have since tried to take this mutex and now
> there is an rtmutex, but we have to catch the case where there might not be
> one.

Fair enough.

> > 
> > > +	if (!q.pi_state) {

Again. This check might be racy.

> > > +		ret = 0;
> > > +		goto out;
> > > +	}
> > > +	pi_mutex = &q.pi_state->pi_mutex;
> > > +
> > > +	ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
> > 
> > 
> >   Eeek. We got a wakeup _after_ we have been requeued. So now you go
> >   back to sleep on the pi_mutex ?
> 
> 
> Yes.  The futex_requeue() calls wake_up() which allows the waiter here to wake
> and continue processing it's half of the rt_mutex acquisition loop that I
> split out of rt_mutex_slowlock() in rt_mutex_finish_proxy_lock().
 
Ah, ok. We try to get the rt_mutex there as we are only the designated
owner. If we fail, then we go back to sleep.
 
Thanks,

	tglx

next prev parent reply	other threads:[~2009-03-10 13:40 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-03  0:02 [TIP][RFC 0/7] requeue pi implemenation Darren Hart
2009-03-03  0:09 ` [TIP][RFC 1/7] futex: futex_wait_queue_me() Darren Hart
2009-03-03  0:11 ` [TIP][RFC 2/7] futex: futex_top_waiter() Darren Hart
2009-03-07 15:16   ` Thomas Gleixner
2009-03-09 18:04     ` Darren Hart
2009-03-03  0:13 ` [TIP][RFC 3/7] futex: futex_lock_pi_atomic() Darren Hart
2009-03-03 13:03   ` Peter Zijlstra
2009-03-03 17:29     ` Darren Hart
2009-03-03  0:14 ` [TIP][RFC 4/7] futex: finish_futex_lock_pi() Darren Hart
2009-03-07 15:30   ` Thomas Gleixner
2009-03-09 18:05     ` Darren Hart
2009-03-03  0:16 ` [TIP][RFC 5/7] rt_mutex: add proxy lock routines Darren Hart
2009-03-07 15:44   ` Thomas Gleixner
2009-03-09 18:31     ` Darren Hart
2009-03-03  0:20 ` [TIP][RFC 6/7] futex: add requeue_pi calls Darren Hart
2009-03-04  7:53   ` Darren Hart
2009-03-05 16:51     ` Darren Hart
2009-03-06  1:42       ` Darren Hart
2009-03-06  2:21         ` Steven Rostedt
2009-03-06  5:27           ` Darren Hart
2009-03-07 15:50             ` Thomas Gleixner
2009-03-09 19:55               ` Darren Hart
2009-03-07  6:03         ` Sripathi Kodi
2009-03-09  9:48   ` Thomas Gleixner
2009-03-10  4:50     ` Darren Hart
2009-03-10 13:39       ` Thomas Gleixner [this message]
2009-03-03  0:23 ` [TIP][RFC 7/7] requeue pi testcase Darren Hart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.00.0903101242580.29264@localhost.localdomain \
    --to=tglx@linutronix.de \
    --cc=dvhltc@us.ibm.com \
    --cc=johnstul@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sripathik@in.ibm.com \
    --cc=srostedt@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.