From: Dinakar Guniguntala <dino@in.ibm.com>
To: david singleton <dsingleton@mvista.com>
Cc: Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org, robustmutexes@lists.osdl.org
Subject: Re: Recursion bug in -rt
Date: Sat, 17 Dec 2005 00:12:09 +0530 [thread overview]
Message-ID: <20051216184209.GD4732@in.ibm.com> (raw)
In-Reply-To: <43F8915C-6DC7-11DA-A45A-000A959BB91E@mvista.com>
On Thu, Dec 15, 2005 at 04:02:36PM -0800, david singleton wrote:
> Dinakar,
>
> I believe the problem we have is that the library is not checking
> to see if the mutex is a recursive mutex, and then checking to see
> if the recursive mutex is already owned by the calling thread. If a
> recursive mutex
> is owned by the calling thread the library should increment the lock
> count (POSIX says recursive mutexes must be unlocked as
> many times as they are locked) and return 0 (success) to the caller.
Sorry I seem to have confused you. I am _not_ talking about recursive
mutexes here.
I have two testcases. Here is a code snippet of testcase I
void* test_thread (void* arg)
{
pthread_mutex_lock(&child_mutex);
printf("test_thread got lock\n");
pthread_mutex_lock(&child_mutex);
printf("test_thread got lock 2nd time !!\n");
printf("test_thread exiting\n");
return NULL;
}
main ()
{
...
if (pthread_mutexattr_init(&mutexattr) != 0) {
printf("Failed to init mutexattr\n");
};
if (pthread_mutexattr_setprotocol(&mutexattr,
PTHREAD_PRIO_INHERIT) != 0) {
printf("Can't set protocol prio inherit\n");
}
if (pthread_mutexattr_getprotocol(&mutexattr, &protocol) != 0) {
printf("Can't get mutexattr protocol\n");
} else {
printf("protocol in mutexattr is %d\n", protocol);
}
if ((retc = pthread_mutex_init(&child_mutex, &mutexattr)) != 0) {
printf("Failed to init mutex: %d\n", retc);
}
...
}
Clearly what the application is doing here is wrong. However,
1. In the case of normal (non-robust) non recursive mutexes, the
behaviour when we make the second pthread_mutex_lock call is for glibc
to make a futex_wait call which will block forever.
(Which is the right behaviour)
2. In the case of a robust/PI non recursive mutex, the current
behaviour is the glibc makes a futex_wait_robust call (which is right)
The kernel (2.6.15-rc5-rt1) rt_mutex lock is currently unowned and
since we do not call down_try_futex if current == owner_task, we end
up grabbing the lock in __down_interruptible and returning succesfully !
3. Adding the check below in down_futex is also wrong
if (!owner_task || owner_task == current) {
up(sem);
up_read(¤t->mm->mmap_sem);
return -EAGAIN;
}
This will result in glibc calling the kernel continuosly in a
loop and we will end up context switching to death
I guess we need to cleanup this path to ensure that the application
blocks forever.
I also have testcase II which does not do anything illegal as the
above one, instead it exercises the PI boosting code path in the
kernel and that is where I see the system hang up yet again
and this is the race that I am currently investigating
Hope that clearls up things a bit
-Dinakar
>
> I don't see anywhere in the library where the check for a recursive
> mutex is being made. The mutex_trylock code just does a compare
> and exchange on the lock to see if it can get it. A recursive mutex
> will fail this test and erroneously enter the kernel.
>
> I believe we need the down_futex change and a patch
> to glibc in which recursive mutexes are handled in the library.
>
> I'm talking to the library people now to see if that's the way
> it's supposed to work for recursive mutexes. If it is we'll have
> to provide a new glibc patch along with the kernel patch.
>
> I think I'll have to provide a new glibc patch anyways to
> fix the INLINE_SYSCALL problem.
>
> Does this make sense?
>
> David
>
> On Dec 15, 2005, at 11:44 AM, Dinakar Guniguntala wrote:
>
> >On Wed, Dec 14, 2005 at 05:03:13PM -0800, david singleton wrote:
> >>Dinakar,
> >>you may be correct, since reverting the change in down_futex seems
> >>to work.
> >
> >Well it did not. It ran for much longer than previously but still
> >hung up.
> >
> >Well we have a very basic problem in the current implementation.
> >
> >Currently if a thread calls pthread_mutex_lock on the same lock
> >(normal, non recursive lock) twice without unlocking in between, the
> >application hangs. Which is the right behaviour.
> >However if the same thing is done with a non recursive robust mutex,
> >it manages to acquire the lock.
> >
> >I see many problems here (I am assuming that the right behaviour
> >with robust mutexes is for application to ultimately block
> >indefinitely in the kernel)
> >
> >1. In down_futex we do the following check
> >
> > if (owner_task != current)
> > down_try_futex(lock, owner_task->thread_info __EIP__);
> >
> > In the above scenario, the thread would have acquired the
> >uncontended
> > lock first time around in userspace. The second time it tries to
> > acquire the same mutex, because of the above check, does not
> > call down_try_futex and hence will not initialize the lock structure
> > in the kernel. It then goes to __down_interruptible where it is
> > granted the lock, which is wrong.
> >
> > So IMO the above check is not right. However removing this check
> > is not the end of story. This time it gets to task_blocks_on_lock
> > and tries to grab the task->pi_lock of the owvner which is itself
> > and results in a system hang. (Assuming CONFIG_DEBUG_DEADLOCKS
> > is not set). So it looks like we need to add some check to
> > prevent this below in case lock_owner happens to be current.
> >
> > _raw_spin_lock(&lock_owner(lock)->task->pi_lock);
> >
> >
> >>However, I'm wondering if you've hit a bug that Dave Carlson is
> >>reporting that he's tracked down to an inline in the glibc patch.
> >
> >Yes I noticed that, basically it looks like INLINE_SYSCALL, on error
> >returns -1 with the error code in errno. Whereas we expect the
> >return code to be the same as the kernel return code. Are you referring
> >to this or something else ??
> >
> >However even with all of the above fixes (remove the check in
> >down_futex
> >as mentioned above, add a check in task_blocks_on_lock and the glibc
> >changes) my test program continues to hang up the system, though it
> >takes a lot longer to recreate the problem now
> >
> >[snip]
> >
> >>1) Why did the library call into the kernel if the calling thread
> >>owned the lock?
> >
> >This is something I still havent figured out and leads me to believe
> >that we still have a tiny race race somewhere
> >
> > -Dinakar
>
next prev parent reply other threads:[~2005-12-16 18:25 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-12-14 22:39 Recursion bug in -rt Dinakar Guniguntala
2005-12-15 1:03 ` david singleton
2005-12-15 19:44 ` Dinakar Guniguntala
2005-12-15 20:40 ` David Singleton
2005-12-16 0:02 ` david singleton
2005-12-16 18:42 ` Dinakar Guniguntala [this message]
2005-12-16 21:26 ` David Singleton
2005-12-19 11:56 ` Dinakar Guniguntala
2005-12-19 20:11 ` David Singleton
2005-12-15 19:00 ` David Singleton
2005-12-15 19:52 ` Dinakar Guniguntala
2005-12-20 13:19 ` Ingo Molnar
2005-12-20 15:50 ` Dinakar Guniguntala
2005-12-20 17:43 ` Esben Nielsen
2005-12-20 19:33 ` Steven Rostedt
2005-12-20 20:42 ` Esben Nielsen
2005-12-20 21:20 ` Steven Rostedt
2005-12-20 21:55 ` david singleton
2005-12-20 22:56 ` Esben Nielsen
2005-12-20 23:12 ` Steven Rostedt
2005-12-20 23:55 ` Esben Nielsen
2005-12-22 4:37 ` david singleton
2005-12-20 22:43 ` Esben Nielsen
2005-12-20 22:59 ` Steven Rostedt
2006-01-03 1:54 ` david singleton
2006-01-05 2:14 ` david singleton
2006-01-05 9:43 ` Esben Nielsen
2006-01-05 17:11 ` david singleton
2006-01-05 17:47 ` Esben Nielsen
2006-01-05 18:26 ` david singleton
2006-01-07 2:40 ` robust futex deadlock detection patch david singleton
[not found] ` <a36005b50601071145y7e2ead9an4a4ca7896f35a85e@mail.gmail.com>
2006-01-07 19:49 ` Ulrich Drepper
2006-01-09 9:23 ` Esben Nielsen
2006-01-09 20:01 ` David Singleton
2006-01-09 20:16 ` Esben Nielsen
2006-01-09 21:08 ` Steven Rostedt
2006-01-09 21:19 ` Esben Nielsen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20051216184209.GD4732@in.ibm.com \
--to=dino@in.ibm.com \
--cc=dsingleton@mvista.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=robustmutexes@lists.osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox