From: Guilhem Lavaux <guilhem.lavaux@free.fr>
To: linux-kernel@vger.kernel.org
Subject: PThreads, signals, futex and SMP
Date: Sat, 01 Jan 2005 17:48:09 +0100
Message-ID: <41D6D449.3080907@free.fr>
Hi,
I am one of the developers of kaffe, a GPL implementation of the Java
Virtual Machine, and we have encountered some problems on Linux 2.6 SMP. I am
currently running the 2.6.8.1 Mandrake kernel. I can check whether the problem
is reproducible on a vanilla kernel, but I don't think this part of the code
has been touched.
Here is the problem: we have a regression test that is quite intensive in
thread creation/destruction, and each thread can start the garbage
collector. The garbage collector needs to stop all running threads to be
able to walk the heap/stacks. To do this, it sends a particular signal to
all threads, and the signal handler calls sigwait() to stop the thread.
50% of the time everything is fine, but from time to time kaffe
deadlocks. It always seems to happen when the threads are in the
following configuration (stack frames listed innermost first):
Thread 1
------------
sigwait
<signal handler>
futex syscall
pthread_mutex_unlock
Thread 2
------------
futex syscall
pthread_mutex_lock
Garbage Collector thread
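For readers unfamiliar with this kind of stop-the-world scheme, here is a minimal sketch of the mechanism described above. It is not kaffe's code: SIGUSR1/SIGUSR2 and every function name (gc_stop_world, gc_start_world, gc_demo, ...) are hypothetical. One deliberate difference: the parked thread waits in sigsuspend() rather than sigwait(), because sigsuspend() is on the POSIX list of async-signal-safe functions and sigwait() is not.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Hypothetical signal choices: the post does not say which signal kaffe uses. */
#define SIG_SUSPEND SIGUSR1
#define SIG_RESUME  SIGUSR2
#define NWORKERS    4

static sem_t ack;                        /* one post per thread that has parked */
static atomic_long counters[NWORKERS];   /* per-thread progress counters */
static atomic_int running = 1;

static void resume_handler(int sig) { (void)sig; /* only ends sigsuspend() */ }

static void suspend_handler(int sig)
{
    (void)sig;
    sigset_t wait_mask;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIG_RESUME);   /* only the resume signal may wake us */
    sem_post(&ack);                      /* async-signal-safe "I am stopped" */
    sigsuspend(&wait_mask);              /* park here until SIG_RESUME arrives */
}

static void gc_install_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIG_RESUME);  /* an early resume stays pending until sigsuspend() */
    sa.sa_handler = suspend_handler;
    sigaction(SIG_SUSPEND, &sa, NULL);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = resume_handler;
    sigaction(SIG_RESUME, &sa, NULL);
    sem_init(&ack, 0, 0);
}

/* Signal every mutator thread, then wait until each one has parked. */
static void gc_stop_world(const pthread_t *tids, int n)
{
    for (int i = 0; i < n; i++)
        pthread_kill(tids[i], SIG_SUSPEND);
    for (int i = 0; i < n; i++)
        while (sem_wait(&ack) == -1 && errno == EINTR)
            ;                            /* retry if interrupted */
}

static void gc_start_world(const pthread_t *tids, int n)
{
    for (int i = 0; i < n; i++)
        pthread_kill(tids[i], SIG_RESUME);
}

static void *worker(void *arg)
{
    atomic_long *c = arg;
    while (atomic_load(&running))
        atomic_fetch_add(c, 1);          /* stand-in for mutator work */
    return NULL;
}

/* Returns 1 if no counter moves while stopped and every counter moves after restart. */
int gc_demo(void)
{
    pthread_t tids[NWORKERS];
    long snap[NWORKERS];
    struct timespec delay = { 0, 100 * 1000 * 1000 };   /* 100 ms */
    int ok = 1;

    gc_install_handlers();
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&tids[i], NULL, worker, &counters[i]);
    nanosleep(&delay, NULL);

    gc_stop_world(tids, NWORKERS);
    for (int i = 0; i < NWORKERS; i++)
        snap[i] = atomic_load(&counters[i]);
    nanosleep(&delay, NULL);             /* the heap/stack walk would happen here */
    for (int i = 0; i < NWORKERS; i++)
        if (atomic_load(&counters[i]) != snap[i])
            ok = 0;                      /* a thread kept running */
    gc_start_world(tids, NWORKERS);

    nanosleep(&delay, NULL);
    for (int i = 0; i < NWORKERS; i++)
        if (atomic_load(&counters[i]) == snap[i])
            ok = 0;                      /* a thread never resumed */

    atomic_store(&running, 0);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);
    return ok;
}
```

Blocking SIG_RESUME in the suspend handler's sa_mask matters: if the collector sends the resume signal before a thread reaches sigsuspend(), the signal stays pending instead of being lost.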
Now if we look at the futex code in the Linux kernel, we see this:
static int futex_wait(unsigned long uaddr, int val, unsigned long time)
{
	DECLARE_WAITQUEUE(wait, current);
	int ret, curval;
	struct futex_q q;
	down_read(&current->mm->mmap_sem);
The kernel then prepares the wait queue and releases mmap_sem.
For the pthread_mutex_unlock side, we have this:
static int futex_wake(unsigned long uaddr, int nr_wake)
{
	union futex_key key;
	struct futex_hash_bucket *bh;
	struct list_head *head;
	struct futex_q *this, *next;
	int ret;
	down_read(&current->mm->mmap_sem);
The kernel then walks the waiters queued in the futex hash bucket and
wakes up to nr_wake of those waiting on this address.
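To make the two kernel paths concrete, here is a minimal user-space sketch of the bare wait/wake handshake that ends up in futex_wait() and futex_wake(). It assumes Linux and calls the futex through glibc's generic syscall() wrapper, since glibc provides no futex() function; the helper names are hypothetical, the atomics are GCC/Clang __atomic builtins, and this is not how NPTL actually implements pthread mutexes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

static int word = 0;   /* futex word: 0 = not ready, 1 = ready */

/* Sleep in the kernel (futex_wait) only while *addr still equals expected. */
static long futex_wait_on(int *addr, int expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Wake at most nr_wake sleepers on addr (futex_wake); returns the number woken. */
static long futex_wake_up(int *addr, int nr_wake)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE, nr_wake, NULL, NULL, 0);
}

static void *waiter(void *arg)
{
    (void)arg;
    /* Re-check after every return: FUTEX_WAIT fails with EAGAIN if the word
       already changed, and can also return early (EINTR, spurious wake-up). */
    while (__atomic_load_n(&word, __ATOMIC_SEQ_CST) == 0)
        futex_wait_on(&word, 0);
    return NULL;
}

/* Runs one wait/wake handshake and returns the final value of the word. */
int futex_demo(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);
    __atomic_store_n(&word, 1, __ATOMIC_SEQ_CST);   /* change the state first... */
    futex_wake_up(&word, 1);                        /* ...then wake one sleeper */
    pthread_join(t, NULL);
    return __atomic_load_n(&word, __ATOMIC_SEQ_CST);
}
```

Because the waker changes the word before calling FUTEX_WAKE, and the kernel re-checks the word before putting the waiter to sleep, a wake-up cannot be lost even if it happens before the waiter enters the kernel.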
What happens if the signal handler is invoked after down_read() in
futex_wake()? My guess is that the other thread can then no longer
complete futex_wait(), and the application deadlocks because the first
thread is frozen in sigwait() while it still holds mmap_sem.
So, if this analysis is correct, this is either a kernel limitation or a
kernel bug.
The one thing I am not sure about is whether a signal can interrupt a
syscall in the middle of futex_wake(). If it cannot, the bug is probably
somewhere else in our application.
Regards,
Guilhem Lavaux.