From: Guilhem Lavaux <guilhem.lavaux@free.fr>
To: linux-kernel@vger.kernel.org
Subject: PThreads, signals, futex and SMP
Date: Sat, 01 Jan 2005 17:48:09 +0100
Message-ID: <41D6D449.3080907@free.fr>

Hi,
I am one of the developers of Kaffe, a GPL implementation of the Java
Virtual Machine, and we have run into some problems on Linux 2.6 SMP. I am
currently running the 2.6.8.1 Mandrake kernel. I can check whether the
problem is reproducible on a vanilla kernel, but I don't think this part
has been touched.
Here is the problem: we have a regression test that creates and destroys
threads very intensively, and any of those threads can start the garbage
collector. The garbage collector needs to stop all running threads before
it can walk the heap and the stacks. To do this, it sends a dedicated
signal to all threads, and the signal handler calls sigwait() to stop the
thread. 50% of the time everything is fine, but from time to time Kaffe
deadlocks. It always seems to happen in the following configuration:
Thread 1
------------
sigwait
<signal handler>
futex syscall
pthread_mutex_unlock

Thread 2
------------
futex syscall
pthread_mutex_lock

Garbage Collector thread
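The stop-the-world mechanism described above (a dedicated signal whose handler parks each thread until the collector releases it) can be sketched with POSIX signals as follows. This is not Kaffe's code: it assumes SIGUSR1 and SIGUSR2 as the suspend and resume signals, it parks the thread with sigsuspend(), which POSIX lists as async-signal-safe, instead of sigwait(), and every name in it (SIG_SUSPEND, stop_world(), demo_stop_start(), ...) is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical choices: SIGUSR1 stops a thread, SIGUSR2 resumes it. */
#define SIG_SUSPEND SIGUSR1
#define SIG_RESUME  SIGUSR2
#define MAX_THREADS 8

static sem_t suspend_ack;        /* posted once by each thread that has stopped */
static atomic_int resume_epoch;  /* bumped by the collector to release threads */
static atomic_int stop_flag;     /* tells the demo workers to exit */
static atomic_long counters[MAX_THREADS];

static void resume_handler(int sig) { (void)sig; /* only ends sigsuspend() */ }

static void suspend_handler(int sig)
{
    int saved_errno = errno;
    int epoch = atomic_load(&resume_epoch);  /* read before acknowledging */
    sigset_t wait_mask;
    (void)sig;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIG_RESUME);       /* only SIG_RESUME may wake us */
    sem_post(&suspend_ack);                  /* async-signal-safe acknowledgement */
    while (atomic_load(&resume_epoch) == epoch)
        sigsuspend(&wait_mask);              /* park here until resumed */
    errno = saved_errno;
}

static int install_handlers(void)
{
    struct sigaction sa = { .sa_handler = suspend_handler, .sa_flags = SA_RESTART };
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIG_RESUME);  /* keep resume pending until sigsuspend() */
    if (sigaction(SIG_SUSPEND, &sa, NULL) != 0)
        return -1;
    sa.sa_handler = resume_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIG_RESUME, &sa, NULL) != 0)
        return -1;
    return sem_init(&suspend_ack, 0, 0);
}

static void stop_world(const pthread_t *threads, int n)
{
    for (int i = 0; i < n; i++)
        pthread_kill(threads[i], SIG_SUSPEND);
    for (int i = 0; i < n; i++)          /* wait until every thread is parked */
        while (sem_wait(&suspend_ack) != 0 && errno == EINTR)
            continue;
}

static void start_world(const pthread_t *threads, int n)
{
    atomic_fetch_add(&resume_epoch, 1);  /* publish the release before signalling */
    for (int i = 0; i < n; i++)
        pthread_kill(threads[i], SIG_RESUME);
}

static void *worker(void *arg)
{
    atomic_long *counter = arg;
    while (!atomic_load(&stop_flag))
        atomic_fetch_add(counter, 1);    /* stand-in for mutator work */
    return NULL;
}

/* Returns the number of rounds in which a worker ran while "stopped" (0 = OK). */
static int demo_stop_start(int n, int rounds)
{
    pthread_t threads[MAX_THREADS];
    const struct timespec gap = {0, 20 * 1000 * 1000};  /* 20 ms */
    int failures = 0;
    if (n > MAX_THREADS || install_handlers() != 0)
        return -1;
    atomic_store(&stop_flag, 0);
    for (int i = 0; i < n; i++)
        if (pthread_create(&threads[i], NULL, worker, (void *)&counters[i]) != 0)
            return -1;
    for (int r = 0; r < rounds; r++) {
        long before = 0, after = 0;
        stop_world(threads, n);
        for (int i = 0; i < n; i++) before += atomic_load(&counters[i]);
        nanosleep(&gap, NULL);
        for (int i = 0; i < n; i++) after += atomic_load(&counters[i]);
        failures += (after != before);   /* counters must stay frozen */
        start_world(threads, n);
        nanosleep(&gap, NULL);
    }
    atomic_store(&stop_flag, 1);
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
    return failures;
}
```

Reading resume_epoch before posting the acknowledgement, and keeping SIG_RESUME blocked until sigsuspend() unblocks it, are what keep a resume from being lost between the epoch check and the sleep.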
Now, looking at the futex code in the Linux kernel, we see this:
static int futex_wait(unsigned long uaddr, int val, unsigned long time)
{
	DECLARE_WAITQUEUE(wait, current);
	int ret, curval;
	struct futex_q q;

	down_read(&current->mm->mmap_sem);
The kernel then prepares the wait queue and releases mmap_sem.
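In futex_wait(), val is the value the caller expects to find at uaddr: the kernel only puts the thread to sleep if the word still holds that value, which is how a wake-up that races with the wait is not lost. A minimal user-space sketch of that check, assuming Linux and the raw syscall(SYS_futex, ...) interface (glibc has no futex() wrapper); wait_if_unchanged() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no futex() wrapper, so the system call is issued directly. */
static long futex_wait_raw(uint32_t *uaddr, uint32_t val)
{
    /* No timeout (NULL): sleep until woken, interrupted, or the value check fails. */
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, val, NULL, NULL, 0);
}

/*
 * Hypothetical helper: sleep on *word only if it still equals `expected`.
 * Returns 0 after a wake-up, otherwise the errno from FUTEX_WAIT
 * (EAGAIN when *word no longer matched, EINTR when a signal arrived).
 */
static int wait_if_unchanged(uint32_t *word, uint32_t expected)
{
    return futex_wait_raw(word, expected) == 0 ? 0 : errno;
}
```

Calling wait_if_unchanged() with a value that no longer matches returns immediately with EAGAIN instead of blocking.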
For the pthread_mutex_unlock side, we have this:
static int futex_wake(unsigned long uaddr, int nr_wake)
{
	union futex_key key;
	struct futex_hash_bucket *bh;
	struct list_head *head;
	struct futex_q *this, *next;
	int ret;

	down_read(&current->mm->mmap_sem);
The kernel then walks the list of waiters queued on the futex's hash
bucket and wakes up to nr_wake threads waiting on uaddr, before
releasing mmap_sem.
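The user-space side of this wait/wake pairing can be reduced to a single flag, as in the minimal sketch below. It assumes Linux, the raw futex system call, and the GCC/Clang __atomic builtins; ready, waiter() and demo_wait_wake() are hypothetical names. It is not how NPTL implements pthread_mutex_lock()/pthread_mutex_unlock(), only the same FUTEX_WAIT/FUTEX_WAKE handshake with nr_wake = 1.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static uint32_t ready;  /* futex word: 0 = not ready, 1 = ready */

static long futex_raw(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void *waiter(void *arg)
{
    (void)arg;
    /* Re-check after every return: FUTEX_WAIT can fail with EAGAIN or EINTR. */
    while (__atomic_load_n(&ready, __ATOMIC_ACQUIRE) == 0)
        futex_raw(&ready, FUTEX_WAIT, 0);  /* sleeps only while ready == 0 */
    return NULL;
}

/* Hypothetical demo: one waiter, one waker. Returns 0 once the waiter has exited. */
static int demo_wait_wake(void)
{
    pthread_t t;
    __atomic_store_n(&ready, 0, __ATOMIC_RELAXED);
    if (pthread_create(&t, NULL, waiter, NULL) != 0)
        return -1;
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);  /* publish the new value */
    futex_raw(&ready, FUTEX_WAKE, 1);               /* then wake at most one waiter */
    return pthread_join(t, NULL);
}
```

Because the waiter re-checks the word after every return, it does not matter whether FUTEX_WAKE runs before or after the waiter has actually gone to sleep.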
What happens if the signal handler is called right after down_read() in
futex_wake()? My guess is that the application deadlocks: the first
thread is frozen in sigwait() while mmap_sem is still held, so the other
thread can no longer get through futex_wait().
If this analysis is correct, we are facing either a kernel limitation or
a kernel bug.
The one thing I am not sure about is whether a signal can interrupt a
system call right in the middle of futex_wake(). If that is not
possible, the bug may be somewhere else in our application.
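One way to observe from user space how a signal interacts with a thread blocked in a futex system call is a small probe like the one below. It is a sketch, assuming Linux, SIGUSR1 as an otherwise unused signal, and a handler installed without SA_RESTART; all names are hypothetical. Under those assumptions futex(2) documents that an interrupted FUTEX_WAIT fails with EINTR, and the handler runs before the caller sees that error. The probe exercises FUTEX_WAIT only, so by itself it does not show what happens inside futex_wake().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static uint32_t never_woken;    /* futex word that nobody ever wakes */
static atomic_int handler_runs; /* how many times the handler executed */
static atomic_int finished;     /* set once the blocked call has returned */
static atomic_int wait_errno;   /* errno reported by that call */

static void on_signal(int sig) { (void)sig; atomic_fetch_add(&handler_runs, 1); }

static void *blocked_waiter(void *arg)
{
    (void)arg;
    /* *uaddr == val, so this sleeps until something interrupts it. */
    long rc = syscall(SYS_futex, &never_woken, FUTEX_WAIT, 0, NULL, NULL, 0);
    atomic_store(&wait_errno, rc == -1 ? errno : 0);
    atomic_store(&finished, 1);
    return NULL;
}

/* Hypothetical probe: returns the errno seen by the blocked FUTEX_WAIT. */
static int probe_futex_signal(void)
{
    struct sigaction sa = { .sa_handler = on_signal };  /* no SA_RESTART */
    const struct timespec tick = {0, 10 * 1000 * 1000}; /* 10 ms */
    pthread_t t;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0 ||
        pthread_create(&t, NULL, blocked_waiter, NULL) != 0)
        return -1;
    /* A signal sent before the thread enters the kernel only runs the handler,
       so keep signalling until the blocked call has actually returned. */
    for (int i = 0; i < 500 && !atomic_load(&finished); i++) {
        pthread_kill(t, SIGUSR1);
        nanosleep(&tick, NULL);
    }
    if (!atomic_load(&finished))
        return -1;                      /* never returned; leave the thread */
    pthread_join(t, NULL);
    return atomic_load(&handler_runs) > 0 ? atomic_load(&wait_errno) : -1;
}
```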
Regards,
Guilhem Lavaux.