Re: Recursion bug in -rt - Dinakar Guniguntala

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dinakar Guniguntala <dino@in.ibm.com>
To: david singleton <dsingleton@mvista.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org, robustmutexes@lists.osdl.org
Subject: Re: Recursion bug in -rt
Date: Sat, 17 Dec 2005 00:12:09 +0530	[thread overview]
Message-ID: <20051216184209.GD4732@in.ibm.com> (raw)
In-Reply-To: <43F8915C-6DC7-11DA-A45A-000A959BB91E@mvista.com>

On Thu, Dec 15, 2005 at 04:02:36PM -0800, david singleton wrote:
> Dinakar,
> 
> 	I believe the problem we have is that the library is not checking
> to see if the mutex is a recursive mutex, and then checking to see
> if the recursive mutex is already owned by the calling thread.  If a 
> recursive mutex
> is owned by the calling thread the library should increment the lock
> count (POSIX says recursive mutexes must be unlocked as
> many times as they are locked) and return 0 (success) to the caller.

Sorry I seem to have confused you. I am _not_ talking about recursive
mutexes here.

I have two testcases. Here is a code snippet of testcase I

        void* test_thread (void* arg)
        {
            pthread_mutex_lock(&child_mutex);
            printf("test_thread got lock\n");

            pthread_mutex_lock(&child_mutex);
            printf("test_thread got lock 2nd time !!\n");

            printf("test_thread exiting\n");
            return NULL;
        }

        main ()
        {
            ...

            if (pthread_mutexattr_init(&mutexattr) != 0) {
              printf("Failed to init mutexattr\n");
            };
            if (pthread_mutexattr_setprotocol(&mutexattr,
                                        PTHREAD_PRIO_INHERIT) != 0) {
              printf("Can't set protocol prio inherit\n");
            }
            if (pthread_mutexattr_getprotocol(&mutexattr, &protocol) != 0) {
              printf("Can't get mutexattr protocol\n");
            } else {
              printf("protocol in mutexattr is %d\n", protocol);
            }
            if ((retc = pthread_mutex_init(&child_mutex, &mutexattr)) != 0) {
              printf("Failed to init mutex: %d\n", retc);
            }

            ...
        }


Clearly what the application is doing here is wrong. However,

1. In the case of normal (non-robust) non recursive mutexes, the
behaviour when we make the second pthread_mutex_lock call is for glibc
to make a futex_wait call which will block forever.
(Which is the right behaviour)

2. In the case of a robust/PI non recursive mutex, the current
behaviour is the glibc makes a futex_wait_robust call (which is right)
The kernel (2.6.15-rc5-rt1) rt_mutex lock is currently unowned and
since we do not call down_try_futex if current == owner_task, we end
up grabbing the lock in __down_interruptible and returning succesfully !

3. Adding the check below in down_futex is also wrong

   if (!owner_task || owner_task == current) {
        up(sem);
        up_read(&current->mm->mmap_sem);
        return -EAGAIN;
   }

   This will result in glibc calling the kernel continuosly in a
   loop and we will end up context switching to death

I guess we need to cleanup this path to ensure that the application
blocks forever.

I also have testcase II which does not do anything illegal as the
above one, instead it exercises the PI boosting code path in the
kernel and that is where I see the system hang up yet again
and this is the race that I am currently investigating

Hope that clearls up things a bit

	-Dinakar


> 
>      I don't see anywhere in the library where the check for a recursive
> mutex is being made.  The mutex_trylock code just does a compare
> and exchange on the lock to see if it can get it.  A recursive mutex
> will fail this test and erroneously enter the kernel.
> 
>     I believe we need the down_futex change and a patch
> to glibc in which recursive mutexes are handled in the library.
> 
>     I'm talking to the library people now to see if that's the way
> it's supposed to work for recursive mutexes.  If it is we'll have
> to provide a new glibc patch along with the kernel patch.
> 
>      I think I'll have to provide a new glibc patch anyways to
> fix the INLINE_SYSCALL problem.
> 
>      Does this make sense?
> 
> David
> 
> On Dec 15, 2005, at 11:44 AM, Dinakar Guniguntala wrote:
> 
> >On Wed, Dec 14, 2005 at 05:03:13PM -0800, david singleton wrote:
> >>Dinakar,
> >>you may be correct, since reverting the change in down_futex seems
> >>to work.
> >
> >Well it did not. It ran for much longer than previously but still
> >hung up.
> >
> >Well we have a very basic problem in the current implementation.
> >
> >Currently if a thread calls pthread_mutex_lock on the same lock
> >(normal, non recursive lock) twice without unlocking in between, the
> >application hangs. Which is the right behaviour.
> >However if the same thing is done with a non recursive robust mutex,
> >it manages to acquire the lock.
> >
> >I see many problems here (I am assuming that the right behaviour
> >with robust mutexes is for application to ultimately block
> >indefinitely in the kernel)
> >
> >1. In down_futex we do the following check
> >
> >	if (owner_task != current)
> >		down_try_futex(lock, owner_task->thread_info __EIP__);
> >
> >   In the above scenario, the thread would have acquired the 
> >uncontended
> >   lock first time around in userspace. The second time it tries to
> >   acquire the same mutex, because of the above check, does not
> >   call down_try_futex and hence will not initialize the lock structure
> >   in the kernel. It then goes to __down_interruptible where it is
> >   granted the lock, which is wrong.
> >
> >   So IMO the above check is not right. However removing this check
> >   is not the end of story.  This time it gets to task_blocks_on_lock
> >   and tries to grab the task->pi_lock of the owvner which is itself
> >   and results in a system hang. (Assuming CONFIG_DEBUG_DEADLOCKS
> >   is not set). So it looks like we need to add some check to
> >   prevent this below in case lock_owner happens to be current.
> >
> >    _raw_spin_lock(&lock_owner(lock)->task->pi_lock);
> >
> >
> >>However, I'm  wondering if you've hit a bug that Dave Carlson is
> >>reporting that he's tracked down to an inline in the glibc patch.
> >
> >Yes I noticed that, basically it looks like INLINE_SYSCALL, on error
> >returns -1 with the error code in errno. Whereas we expect the
> >return code to be the same as the kernel return code. Are you referring
> >to this or something else ??
> >
> >However even with all of the above fixes (remove the check in 
> >down_futex
> >as mentioned above, add a check in task_blocks_on_lock and the glibc
> >changes) my test program continues to hang up the system, though it
> >takes a lot longer to recreate the problem now
> >
> >[snip]
> >
> >>1) Why did the library call into the kernel if the calling thread
> >>owned the lock?
> >
> >This is something I still havent figured out and leads me to believe
> >that we still have a tiny race race somewhere
> >
> >	-Dinakar
>

next prev parent reply	other threads:[~2005-12-16 18:25 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-12-14 22:39 Recursion bug in -rt Dinakar Guniguntala
2005-12-15  1:03 ` david singleton
2005-12-15 19:44   ` Dinakar Guniguntala
2005-12-15 20:40     ` David Singleton
2005-12-16  0:02     ` david singleton
2005-12-16 18:42       ` Dinakar Guniguntala [this message]
2005-12-16 21:26         ` David Singleton
2005-12-19 11:56           ` Dinakar Guniguntala
2005-12-19 20:11             ` David Singleton
2005-12-15 19:00 ` David Singleton
2005-12-15 19:52   ` Dinakar Guniguntala
2005-12-20 13:19   ` Ingo Molnar
2005-12-20 15:50     ` Dinakar Guniguntala
2005-12-20 17:43       ` Esben Nielsen
2005-12-20 19:33         ` Steven Rostedt
2005-12-20 20:42           ` Esben Nielsen
2005-12-20 21:20             ` Steven Rostedt
2005-12-20 21:55               ` david singleton
2005-12-20 22:56                 ` Esben Nielsen
2005-12-20 23:12                   ` Steven Rostedt
2005-12-20 23:55                     ` Esben Nielsen
2005-12-22  4:37                       ` david singleton
2005-12-20 22:43               ` Esben Nielsen
2005-12-20 22:59                 ` Steven Rostedt
2006-01-03  1:54       ` david singleton
2006-01-05  2:14       ` david singleton
2006-01-05  9:43         ` Esben Nielsen
2006-01-05 17:11           ` david singleton
2006-01-05 17:47             ` Esben Nielsen
2006-01-05 18:26               ` david singleton
2006-01-07  2:40               ` robust futex deadlock detection patch david singleton
     [not found]                 ` <a36005b50601071145y7e2ead9an4a4ca7896f35a85e@mail.gmail.com>
2006-01-07 19:49                   ` Ulrich Drepper
2006-01-09  9:23                 ` Esben Nielsen
2006-01-09 20:01                   ` David Singleton
2006-01-09 20:16                     ` Esben Nielsen
2006-01-09 21:08                       ` Steven Rostedt
2006-01-09 21:19                         ` Esben Nielsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20051216184209.GD4732@in.ibm.com \
    --to=dino@in.ibm.com \
    --cc=dsingleton@mvista.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=robustmutexes@lists.osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.