From: Milan Broz <mbroz@redhat.com>
To: lvm-devel@redhat.com
Subject: [PATCH] (DRAFT!) Fix a deadlock in CLVMD (possibly related to BZ 561226).
Date: Wed, 20 Oct 2010 12:40:48 +0200 [thread overview]
Message-ID: <4CBEC730.4020906@redhat.com> (raw)
In-Reply-To: <874ocje7y1.fsf@twilight.int.mornfall.net.>
On 10/18/2010 07:17 PM, Petr Rockai wrote:
> the signalling code (pthread_cond_signal/pthread_cond_wait) in the
> pre_and_post_thread in clvmd is using the wait mutex (see man
> pthread_cond_wait) incorrectly, and this can cause clvmd to completely
> deadlock when the timing is right.
well, nobody even tried to answer, despite this quite important problem...
So, I see these important items:
1) always rely on implicit unlock in pthread_cond_wait()
2) some of the code now runs with client->bits.localsock.mutex locked,
mainly the do_pre_command() and do_post_command() + socket writes
3) the common code to signal is _only_ this pattern
pthread_mutex_lock(&client->bits.localsock.mutex);
client->bits.localsock.state = <STATE>;
pthread_cond_signal(&client->bits.localsock.cond);
pthread_mutex_unlock(&client->bits.localsock.mutex);
this is called from request_timed_out() and read_from_local_sock()
and it wakes up thrad always waiting in this code pattern
pthread_cond_wait(&client->bits.localsock.cond,
&client->bits.localsock.mutex);
I didn't found any other uses for this lock.
So the thread is signaled only when in cond_wait now (there is no
other place where signaling thread can take lock now).
This seems as expected fix.
The post/pre routines should not have problems with running with
localsock.mutex taken, same for socket writes - hope I am right:-)
Did I miss or misinterpret anything?
Not extra deep analysis, but for me it seems like correct fix,
I think we should commit it and run some tests...
Milan
p.s.
nitpicking
DEBUGLOG("in sub thread: client = %p\n", client);
+ pthread_mutex_lock(&client->bits.localsock.mutex);
/* Don't start until the LVM thread is ready */
pthread_mutex_lock(&lvm_start_mutex);
pthread_mutex_unlock(&lvm_start_mutex);
DEBUGLOG("Sub thread ready for work.\n");
Can we move that localsock.muext lock after the lvm_start_mutex exercise?
(Which I do not like either - isn't cond wait better for that?)
next prev parent reply other threads:[~2010-10-20 10:40 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-18 17:17 [PATCH] (DRAFT!) Fix a deadlock in CLVMD (possibly related to BZ 561226) Petr Rockai
2010-10-18 19:54 ` Petr Rockai
2010-10-20 10:40 ` Milan Broz [this message]
2010-10-20 13:03 ` Petr Rockai
2010-10-20 13:30 ` Milan Broz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4CBEC730.4020906@redhat.com \
--to=mbroz@redhat.com \
--cc=lvm-devel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.