All of lore.kernel.org
 help / color / mirror / Atom feed
From: Darren Hart <dvhltc@us.ibm.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	LKML <linux-kernel@vger.kernel.org>,
	Nick Piggin <npiggin@suse.de>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: futex: race in lock and unlock&exit for robust futex with PI?
Date: Fri, 25 Jun 2010 16:35:14 -0700	[thread overview]
Message-ID: <4C253D32.6040304@us.ibm.com> (raw)
In-Reply-To: <4C24ED34.9040808@us.ibm.com>

On 06/25/2010 10:53 AM, Darren Hart wrote:
> On 06/25/2010 01:27 AM, Michal Hocko wrote:
>> On Thu 24-06-10 19:42:50, Darren Hart wrote:
>>> On 06/23/2010 02:13 AM, Michal Hocko wrote:

>>>> attached you can find a simple test case which fails quite easily on 
>>>> the
>>>> following glibc assert:
>>>> "SharedMutexTest: pthread_mutex_lock.c:289: __pthread_mutex_lock:
>>>> Assertion `(-(e)) != 3 || !robust' failed." "
>>>
>>> I've run runSimple.sh in a tight loop for a couple hours (about 2k
>>> iterations so far) and haven't seen anything other than "Here we go"
>>> printed to the console.
>>
>> Maybe a higher load on CPUs would help (busy loop on other CPUs).
> 
> Must have been a build issue. I can reproduce _something_ now. Within 10 
> iterations of runSimple.sh the test hangs. ps shows all the simple 
> processes sitting in pause.
> 
> (gdb) bt
> #0 0x0000003c0060e030 in __pause_nocancel () from /lib64/libpthread.so.0
> #1 0x0000003c006085fc in __pthread_mutex_lock_full ()
> from /lib64/libpthread.so.0
> #2 0x0000000000400cd6 in main (argc=1, argv=0x7fffc016e508) at simple.c:101
> 
> There is only one call to pause* in pthread_mutex_lock.c: (line ~316):
> 
> /* ESRCH can happen only for non-robust PI mutexes where
> the owner of the lock died. */
> assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
> 
> /* Delay the thread indefinitely. */
> while (1)
> pause_not_cancel ();
> 
> Right now I'm thinking that NDEBUG is set in my build for whatever 
> reason, but I think I'm seeing the same issue you are. I'll review the 
> futex code and prepare a trace patch and see if I can reproduce with that.
> 
> Note: confirmed, the glibc rpm has -DNDEBUG=1

The simple tracing patch (below) confirms that we are indeed returning
-ESRCH to userspace from futex_lock_pi(). Notice that the pids of the
two "simple" processes lingering after the runSimple.sh script are the
ones that return -ESRCH to userspace, and therefor end up in the
pause_not_cancel() trap inside glibc.

# trace-cmd record -p nop ./runSimple.sh 
<snip>

# ps -eLo pid,comm,wchan | grep "simple "
20636 simple          pause
20876 simple          pause

# trace-cmd report
version = 6
CPU 0 is empty
cpus=4
field->offset = 24 size=8
           <...>-20636 [003]  1778.965860: bprint:               futex_lock_pi_atomic : lookup_pi_state: -ESRCH
           <...>-20636 [003]  1778.965865: bprint:               futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
           <...>-20636 [003]  1778.965866: bprint:               futex_lock_pi_atomic : lookup_pi_state: -3
>>--->     <...>-20636 [003]  1778.965867: bprint:               futex_lock_pi : returning -ESRCH to userspace
           <...>-20876 [001]  1780.199394: bprint:               futex_lock_pi_atomic : cmpxchg failed, retrying
           <...>-20876 [001]  1780.199400: bprint:               futex_lock_pi_atomic : lookup_pi_state: -ESRCH
           <...>-20876 [001]  1780.199401: bprint:               futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH
           <...>-20876 [001]  1780.199402: bprint:               futex_lock_pi_atomic : lookup_pi_state: -3
>>--->     <...>-20876 [001]  1780.199403: bprint:               futex_lock_pi : returning -ESRCH to userspace
           <...>-21316 [002]  1782.300695: bprint:               futex_lock_pi_atomic : cmpxchg failed, retrying
           <...>-21316 [002]  1782.300698: bprint:               futex_lock_pi_atomic : cmpxchg failed, retrying

Attaching gdb to 20636, we can see the state of the mutex:
(gdb) print (struct __pthread_mutex_s)*mutex
$1 = {__lock = 0, __count = 1, __owner = 0, __nusers = 0, __kind = 176, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}

This is consistent with hex dump of the first bits of the backing file:
# xxd test.file | head -n 3
0000000: 0000 0000 0100 0000 0000 0000 0000 0000  ................
0000010: b000 0000 0000 0000 0000 0000 0000 0000  ................
0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................

The futex (__lock) value is 0, indicating it is unlocked and has no waiters. The count being 1 however suggests a task has acquired it once, which, if I read the glibc source correctly, means the owner field and __lock fields should not be 0. This supports Michal's thought about lock racing with unlock, seeing it's held, then unable to find the owner (pi_state) as it has since been unlocked. Possibly some horkage with the WAITERS bit leading to glibc performing atomic acquistions/releases and rendering the mutex inconsistent with the kernel's view. This should be protected against, but that is the direction I am going to start looking.

--
Darren Hart

>From 92014a07df73489460ff788274506255ff0f775d Mon Sep 17 00:00:00 2001
From: Darren Hart <dvhltc@us.ibm.com>
Date: Fri, 25 Jun 2010 13:54:25 -0700
Subject: [PATCH] robust pi futex tracing

---
 kernel/futex.c |   24 ++++++++++++++++++++----
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e7a35f1..24ac437 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -683,6 +683,8 @@ retry:
 	 */
 	if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
 		/* Keep the OWNER_DIED bit */
+		if (ownerdied)
+			trace_printk("ownerdied, taking over lock\n");
 		newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task);
 		ownerdied = 0;
 		lock_taken = 1;
@@ -692,14 +694,18 @@ retry:
 
 	if (unlikely(curval == -EFAULT))
 		return -EFAULT;
-	if (unlikely(curval != uval))
+	if (unlikely(curval != uval)) {
+		trace_printk("cmpxchg failed, retrying\n");
 		goto retry;
+	}
 
 	/*
 	 * We took the lock due to owner died take over.
 	 */
-	if (unlikely(lock_taken))
+	if (unlikely(lock_taken)) {
+		trace_printk("ownerdied, lock acquired, return 1\n");
 		return 1;
+	}
 
 	/*
 	 * We dont have the lock. Look up the PI state (or create it if
@@ -710,13 +716,16 @@ retry:
 	if (unlikely(ret)) {
 		switch (ret) {
 		case -ESRCH:
+			trace_printk("lookup_pi_state: -ESRCH\n");
 			/*
 			 * No owner found for this futex. Check if the
 			 * OWNER_DIED bit is set to figure out whether
 			 * this is a robust futex or not.
 			 */
-			if (get_futex_value_locked(&curval, uaddr))
+			if (get_futex_value_locked(&curval, uaddr)) {
+				trace_printk("get_futex_value_locked: -EFAULT\n");
 				return -EFAULT;
+			}
 
 			/*
 			 * We simply start over in case of a robust
@@ -724,10 +733,13 @@ retry:
 			 * and return happy.
 			 */
 			if (curval & FUTEX_OWNER_DIED) {
+				trace_printk("ownerdied, goto retry\n");
 				ownerdied = 1;
 				goto retry;
 			}
+			trace_printk("ownerdied not detected, returning -ESRCH\n");
 		default:
+			trace_printk("lookup_pi_state: %d\n", ret);
 			break;
 		}
 	}
@@ -1950,6 +1962,8 @@ retry_private:
 			put_futex_key(fshared, &q.key);
 			cond_resched();
 			goto retry;
+		case -ESRCH:
+			trace_printk("returning -ESRCH to userspace\n");
 		default:
 			goto out_unlock_put_key;
 		}
@@ -2537,8 +2551,10 @@ void exit_robust_list(struct task_struct *curr)
 		/*
 		 * Avoid excessively long or circular lists:
 		 */
-		if (!--limit)
+		if (!--limit) {
+			trace_printk("excessively long list, aborting\n");
 			break;
+		}
 
 		cond_resched();
 	}
-- 
1.7.0.4

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

  reply	other threads:[~2010-06-25 23:35 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-23  9:13 futex: race in lock and unlock&exit for robust futex with PI? Michal Hocko
2010-06-25  2:42 ` Darren Hart
2010-06-25  8:27   ` Michal Hocko
2010-06-25 17:53     ` Darren Hart
2010-06-25 23:35       ` Darren Hart [this message]
2010-06-28 14:42         ` Michal Hocko
2010-06-28 14:56           ` Darren Hart
2010-06-28 15:32           ` Michal Hocko
2010-06-28 15:40             ` Michal Hocko
2010-06-28 15:58             ` Michal Hocko
2010-06-28 16:39               ` Michal Hocko
2010-06-28 16:45                 ` Peter Zijlstra
2010-06-28 16:56                   ` Michal Hocko
2010-06-28 16:49                 ` Peter Zijlstra
2010-06-29  8:42                   ` [PATCH] futex: futex_find_get_task make credentials check conditional Michal Hocko
2010-06-29 14:56                     ` Darren Hart
2010-06-29 15:24                       ` Michal Hocko
2010-06-29 16:41                     ` Linus Torvalds
2010-06-29 16:58                       ` Darren Hart
2010-06-29 18:03                         ` Thomas Gleixner
2010-06-30  7:01                       ` Michal Hocko
2010-06-30  9:55                         ` [PATCH] futex: futex_find_get_task remove credentails check Michal Hocko
2010-06-30 16:43                           ` Darren Hart
2010-07-08  9:28                             ` Michal Hocko
2010-07-08  9:32                               ` Ingo Molnar
2010-07-08  9:39                                 ` Michal Hocko
2010-07-08  9:43                                   ` Peter Zijlstra
2010-07-08  9:50                                     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C253D32.6040304@us.ibm.com \
    --to=dvhltc@us.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.cz \
    --cc=npiggin@suse.de \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.