* [Patch] CLD: fix hang in ncld_sess_open if session open fails
@ 2010-04-14 0:22 Pete Zaitcev
2010-04-14 4:01 ` Jeff Garzik
0 siblings, 1 reply; 2+ messages in thread
From: Pete Zaitcev @ 2010-04-14 0:22 UTC (permalink / raw)
To: jeff; +Cc: Project Hail List
The problem turned out to be two-fold, with the same symptom.
Firstly, we use is_open to signal that the caller thread may
proceed, but this is incorrect in case the thread open fails:
we still want the caller thread to proceed and deliver the
error indicator from ncld_sess_open to the application.
So, let's split is_up from the condition variable mechanism.
It continues to mean that the session is open and up, and
open_done is introduced for the waiting mechanism.
In addition, we forgot to take a mutex around a call into
the cldc layer. It manifested itself in timers not firing,
and so we would hang waiting for an answer from CLD server.
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
---
include/ncld.h | 1 +
lib/cldc.c | 15 +++++++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)
This, I think, can go either before or after the e2e verbose.
My previous patch added a HAIL_VERBOSE in a strategic location,
but it's not really necessary and I took it out.
diff -urp -X dontdiff cld-m/include/ncld.h cld-tip/include/ncld.h
--- cld-m/include/ncld.h 2010-04-13 12:08:50.328845225 -0600
+++ cld-tip/include/ncld.h 2010-04-02 20:48:40.629801187 -0600
@@ -36,6 +36,7 @@ struct ncld_sess {
GCond *cond;
GThread *thread;
bool is_up;
+ bool open_done;
int errc;
GList *handles;
int to_thread[2];
diff -urp -X dontdiff cld-m/lib/cldc.c cld-tip/lib/cldc.c
--- cld-m/lib/cldc.c 2010-04-13 12:08:50.329845492 -0600
+++ cld-tip/lib/cldc.c 2010-04-02 23:52:05.258864321 -0600
@@ -1491,6 +1496,8 @@ static void ncld_p_event(void *priv, str
if (what == CE_SESS_FAILED) {
if (nsess->udp->sess != csp)
abort();
+ if (!nsess->is_up)
+ return;
nsess->is_up = false;
/* XXX wake up all I/O waiters here */
/*
@@ -1519,9 +1526,15 @@ static int ncld_new_sess(struct cldc_cal
/*
* All callbacks from cldc layer run on the context of the thread
* with nsess->mutex locked, so we don't lock it again here.
+ *
+ * We set is_up on the context that is invoked from cldc,
+ * so the flag has a sane meaning in case we immediately
+ * get a session failure event.
*/
nsess->errc = errc;
- nsess->is_up = true;
+ nsess->open_done = true;
+ if (!errc)
+ nsess->is_up = true;
g_cond_broadcast(nsess->cond);
return 0;
}
@@ -1529,7 +1542,7 @@ static int ncld_new_sess(struct cldc_cal
static int ncld_wait_session(struct ncld_sess *nsess)
{
g_mutex_lock(nsess->mutex);
- while (!nsess->is_up)
+ while (!nsess->open_done)
g_cond_wait(nsess->cond, nsess->mutex);
g_mutex_unlock(nsess->mutex);
return nsess->errc;
@@ -1620,6 +1633,7 @@ struct ncld_sess *ncld_sess_open(const c
goto out_thread;
}
+ g_mutex_lock(nsess->mutex);
memset(&copts, 0, sizeof(copts));
copts.cb = ncld_new_sess;
copts.private = nsess;
@@ -1627,9 +1641,11 @@ struct ncld_sess *ncld_sess_open(const c
nsess->udp->addr, nsess->udp->addr_len,
cld_user, cld_key, nsess, log,
&nsess->udp->sess)) {
+ g_mutex_unlock(nsess->mutex);
err = 1024;
goto out_session;
}
+ g_mutex_unlock(nsess->mutex);
rc = ncld_wait_session(nsess);
if (rc) {
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [Patch] CLD: fix hang in ncld_sess_open if session open fails
2010-04-14 0:22 [Patch] CLD: fix hang in ncld_sess_open if session open fails Pete Zaitcev
@ 2010-04-14 4:01 ` Jeff Garzik
0 siblings, 0 replies; 2+ messages in thread
From: Jeff Garzik @ 2010-04-14 4:01 UTC (permalink / raw)
To: Pete Zaitcev; +Cc: Project Hail List
On 04/13/2010 08:22 PM, Pete Zaitcev wrote:
> The problem turned out to be two-fold, with the same symptom.
>
> Firstly, we use is_open to signal that the caller thread may
> proceed, but this is incorrect in case the thread open fails:
> we still want the caller thread to proceed and deliver the
> error indicator from ncld_sess_open to the application.
> So, let's split is_up from the condition variable mechanism.
> It continues to mean that the session is open and up, and
> open_done is introduced for the waiting mechanism.
>
> In addition, we forgot to take a mutex around a call into
> the cldc layer. It manifested itself in timers not firing,
> and so we would hang waiting for an answer from CLD server.
>
> Signed-off-by: Pete Zaitcev<zaitcev@redhat.com>
>
> ---
> include/ncld.h | 1 +
> lib/cldc.c | 15 +++++++++++++--
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> This, I think, can go either before or after the e2e verbose.
> My previous patch added a HAIL_VERBOSE in a strategic location,
> but it's not really necessary and I took it out.
applied
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2010-04-14 4:01 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-14 0:22 [Patch] CLD: fix hang in ncld_sess_open if session open fails Pete Zaitcev
2010-04-14 4:01 ` Jeff Garzik
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.