All of lore.kernel.org
 help / color / mirror / Atom feed
From: Guilhem Lavaux <guilhem.lavaux@free.fr>
To: linux-kernel@vger.kernel.org
Subject: PThreads, signals, futex and SMP
Date: Sat, 01 Jan 2005 17:48:09 +0100	[thread overview]
Message-ID: <41D6D449.3080907@free.fr> (raw)

Hi,

I am one of the developer of kaffe, a GPL implementation of the Java 
Virtual Machine, and we encountered some problems on Linux/2.6/SMP. I am 
currently running 2.6.8.1 Mandrake Kernel. I can try whether the problem 
is reproduceable on a vanilla kernel but I don't think this part has 
been touched.

Here is the problem: we have a regression test which is quite intense in 
thread creation/destruction, each thread can start the garbage 
collector. The garbage collector needs to stop all running thread to be 
able to walk the heap/stack. For this, it uses a particular signal which 
is sent to all threads. The signal handler calls sigwait to stop the 
thread. 50% of the time everything is fine but from time to time, kaffe 
has a deadlock. It appears that it always happen when we are in the 
following configuration:

Thread 1
------------

sigwait
<signal handler>
futex syscall
pthread_mutex_unlock

Thread 2
------------
futex syscall
pthread_mutex_lock
Garbage Collector thread

Now if we look at the futex code in the linux kernel, we see this:

static int futex_wait(unsigned long uaddr, int val, unsigned long time)
{
       DECLARE_WAITQUEUE(wait, current);
       int ret, curval;
       struct futex_q q;

       down_read(&current->mm->mmap_sem);

the kernel then prepares the wait queue and unlock mmap_sem. 


Concerning the mutex_unlock  part we have this:

static int futex_wake(unsigned long uaddr, int nr_wake)
{
       union futex_key key;
       struct futex_hash_bucket *bh;
       struct list_head *head;
       struct futex_q *this, *next;
       int ret;

       down_read(&current->mm->mmap_sem);

and the kernel iterates the semaphores and wakes up all threads.


What may happen if the signal handler is called after down_read in 
futex_wake ? My guess is that we are not able to call futex_wait because 
the application will deadlock because the first thread is frozen by a 
sigwait.

So either we have a limitation of the kernel either a bug if the 
analysis is correct.
The only point is that I am not sure whether a signal is allowed to 
interrupt a syscall just in the middle of futex_wake. If this is not 
possible there may be a bug in our application somewhere else.

Regards,

Guilhem Lavaux.

                 reply	other threads:[~2005-01-01 16:48 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41D6D449.3080907@free.fr \
    --to=guilhem.lavaux@free.fr \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.