Re: [RFC][PATCH] muptiple bugs in PI futexes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Thomas Gleixner <tglx@linutronix.de>
To: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Ulrich Drepper <drepper@redhat.com>
Subject: Re: [RFC][PATCH] muptiple bugs in PI futexes
Date: Tue, 05 Jun 2007 18:25:11 +0200	[thread overview]
Message-ID: <1181060711.4404.114.camel@chaos> (raw)
In-Reply-To: <20070523115159.GA30251@ms2.inr.ac.ru>

Hi,

On Wed, 2007-05-23 at 15:51 +0400, Alexey Kuznetsov wrote:
> The first chunk: results in self-inflicted deadlock inside glibc.
> Sometimes futex_lock_pi returns -ESRCH, when it is not expected
> and glibc enters to for(;;) sleep() to simulate deadlock. This problem
> is quite obvious and I think the patch is right. Though it looks like
> each "if" in futex_lock_pi() got some stupid special case "else if". :-)

Hmm, what means not expected ? -ESRCH is returned, when the owner task
is not found. 

> +		} else if (ret == -ESRCH) {
> +			/* Process could exit right now, so that robust list
> +			 * was processed after we got uval. Retry. */
> +			pagefault_disable();
> +			curval = futex_atomic_cmpxchg_inatomic(uaddr,
> +							       uval, uval);

When we retrieve uval, we hold the hash bucket lock and we keep it down
to this point, so the robust list processing is stuck on the hash bucket
lock. Also using uval is wrong. The user space variable holds "newval",
which was written there before:

 curval = cmpxchg_futex_value_locked(uaddr, uval, newval); 

So you just go back up to retry_locked or into the fault handling path
(which is unlikely).

This does not really explain, why you do prevent the -ESRCH return value
in the next cycle, except for the case, where we go through the fault
handling path.

> The second chunk: sometimes futex_lock_pi() returns -EDEADLK,
> when nobody has the lock. The reason is also obvious (see comment
> in the patch), but correct fix is far beyond my comprehension.
> I guess someone already saw this, the chunk:
> 
>                         if (rt_mutex_trylock(&q.pi_state->pi_mutex))
>                                 ret = 0;
> 
> is obviously from the same opera. But it does not work, because the
> rtmutex is really taken at this point: wake_futex_pi() of previous
> owner reassigned it to us.

No, -EDEADLK is returned from here:

	ret = rt_mutex_timed_lock(&q.pi_state->pi_mutex, to, 1);

The rtmutex code only returns -EDEADLK, when the lock is already held by
the task or when it runs into a circular rtmutex dependency (e.g. a ABBA
deadlock) during the PI-walk.

Also wake_futex_pi() does not assign the ownership of the rtmutex, it
assigns the ownership of the futex pi state and unlocks the rtmutex,
which sets the pending owner. The pending owner is the highest priority
waiter. The ownership assignment of the rtmutex happens inside the
rtmutex code, not in wake_futex_pi.

The trylock is covering the following situation:

rt_mutex_timed_lock(&q.pi_state->pi_mutex, to, 1) returns -ETIMEOUT;

Now it get's blocked on:
	spin_lock(q.lock_ptr);

because on the other CPU the futex is unlocked. The rt-mutex code does
not find a waiter and unlocks the rtmutex with no new owner. The hack
bucket lock is released and the task which is blocked on the hash bucket
lock has a chance to grab it.

>  My fix works. But it looks very stupid.
> I would think about removal of shift of ownership in wake_futex_pi()
> and making all the work in context of process taking lock.

I have no clue why it works. The only explanation is, that an existing
deadlock is resolved by the exiting task.

>  		if (ret && q.pi_state->owner == curr) {

This happens only, when the rtmutex was unlocked by another task between
the return from the rtmutex code and reacquiring the hash bucket lock.

>  			if (rt_mutex_trylock(&q.pi_state->pi_mutex))
>  				ret = 0;

If we can get the rtmutex here, we are the owner of the lock and need to
fix up ownership in the user space variable further down.

> +			/* Holy crap... Now futex lock returns -EDEADLK
> +			 * sometimes, because ownership was passed to us while
> +			 * unlock of previous owner. Who wrote this?

I admit, that I was one of the authors :)

> +			 * Please, fix this correctly. For now:
> +			 */

The fix is to remove it and to find the real cause of the problem.

I'm running the glibc tests since hours w/o tripping into it.

	tglx

next prev parent reply	other threads:[~2007-06-05 16:25 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-07 14:43 [RFC][PATCH] muptiple bugs in PI futexes Alexey Kuznetsov
2007-05-23  7:26 ` Ingo Molnar
2007-05-23 11:51   ` Alexey Kuznetsov
2007-06-05 16:25     ` Thomas Gleixner [this message]
2007-06-05 17:39       ` Alexey Kuznetsov
2007-06-05 18:48         ` Thomas Gleixner
2007-06-05 19:15           ` Thomas Gleixner
2007-06-05 21:00             ` Alexey Kuznetsov
2007-06-05 21:13               ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1181060711.4404.114.camel@chaos \
    --to=tglx@linutronix.de \
    --cc=akpm@linux-foundation.org \
    --cc=drepper@redhat.com \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.