From: Lars Marowsky-Bree <lmb@suse.de>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] fs/dlm: fix connection close handling.
Date: Mon, 10 Aug 2009 21:51:24 +0200
Message-ID: <20090810195124.GB15541@suse.de>
For reasons not yet fully understood, connections that had been closed
were being put back on the work queue, causing either a hang when
trying to connect to a blocked node, or a crash when trying to access
a closed connection.
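For reference, the pre-patch queueing helper (visible as context lines
in the second hunk below) has no notion of a closed connection, so any
caller can requeue send work against it:

    static inline void lowcomms_connect_sock(struct connection *con)
    {
            if (!test_and_set_bit(CF_CONNECT_PENDING, &con->flags))
                    queue_work(send_workqueue, &con->swork);
    }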
David provided a fix which introduced the CF_CLOSE flag, but which
could still trigger the crash. Chrissie provided a fix which cleared
the CF_CONNECT_PENDING and CF_WRITE_PENDING flags, but which could
still trigger the hang (I think because send_to_sock() would still
attempt to connect in its retry path). I added a fix which avoids the
unconditional call to send_to_sock() and also cancels any work that
might still be on the workqueue.
Combined, these three fixes address the hangs and crashes I have been
seeing when a node was killed (bugzilla.novell.com #52422).
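To make the interplay concrete, here is a minimal standalone model of
the flag handshake the three fixes establish (plain userspace C with
simplified, hypothetical helpers; the kernel uses atomic bitops and a
real workqueue):

    #include <stdbool.h>
    #include <stdio.h>

    enum { CF_WRITE_PENDING, CF_CONNECT_PENDING, CF_CLOSE, CF_NFLAGS };

    struct conn {
            bool flags[CF_NFLAGS];
    };

    /* Simplified stand-in for the kernel's test_and_clear_bit(). */
    static bool test_and_clear(struct conn *c, int f)
    {
            bool was = c->flags[f];

            c->flags[f] = false;
            return was;
    }

    /* Post-patch queueing helper: a closed connection is never requeued. */
    static void connect_sock(struct conn *c)
    {
            if (c->flags[CF_CLOSE])
                    return;
            if (!c->flags[CF_CONNECT_PENDING]) {
                    c->flags[CF_CONNECT_PENDING] = true;
                    printf("queue_work()\n");
            }
    }

    /* Post-patch send worker: send_to_sock() runs only when a write is
     * actually pending, never as an unconditional tail call. */
    static void send_worker(struct conn *c)
    {
            if (test_and_clear(c, CF_CONNECT_PENDING)) {
                    printf("connect_action()\n");
                    c->flags[CF_WRITE_PENDING] = true;
            }
            if (test_and_clear(c, CF_WRITE_PENDING))
                    printf("send_to_sock()\n");
    }

    int main(void)
    {
            struct conn c = { { false } };

            /* dlm_lowcomms_close() path: pending flags cleared,
             * CF_CLOSE set. */
            c.flags[CF_CLOSE] = true;

            connect_sock(&c);       /* refused: connection is closed */
            send_worker(&c);        /* prints nothing: no work left */
            return 0;
    }

The point of the model is the ordering: once the close path has cleared
the pending flags and set CF_CLOSE, neither the queueing helper nor a
stale worker run can touch the connection again.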
I'm not perfectly happy with this patch; it feels as if it is treating
symptoms. In particular, I don't quite understand where
lowcomms_connect_to_sock() ends up being called from with the
connection closed, but so far I've resisted the urge to insert a BUG()
in the if clause there. Maybe someone else will be inspired by this
patch to re-evaluate the connection handling completely ;-)
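For the record, the assertion I resisted would look roughly like this
(hypothetical, not part of this patch); a WARN() would be a gentler way
to identify the unexpected caller:

    if (test_bit(CF_CLOSE, &con->flags)) {
            WARN(1, "dlm: connect attempted on a closed connection\n");
            return;
    }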
Acked-by: teigland@redhat.com
Acked-by: ccaulfie@redhat.com
Index: dlm/lowcomms.c
===================================================================
--- dlm.orig/lowcomms.c
+++ dlm/lowcomms.c
@@ -106,6 +106,7 @@ struct connection {
 #define CF_CONNECT_PENDING 3
 #define CF_INIT_PENDING 4
 #define CF_IS_OTHERCON 5
+#define CF_CLOSE 6
 	struct list_head writequeue;	/* List of outgoing writequeue_entries */
 	spinlock_t writequeue_lock;
 	int (*rx_action) (struct connection *);	/* What to do when active */
@@ -299,6 +300,8 @@ static void lowcomms_write_space(struct
 static inline void lowcomms_connect_sock(struct connection *con)
 {
+	if (test_bit(CF_CLOSE, &con->flags))
+		return;
 	if (!test_and_set_bit(CF_CONNECT_PENDING, &con->flags))
 		queue_work(send_workqueue, &con->swork);
 }
@@ -1370,6 +1373,15 @@ int dlm_lowcomms_close(int nodeid)
 	log_print("closing connection to node %d", nodeid);
 	con = nodeid2con(nodeid, 0);
 	if (con) {
+		clear_bit(CF_CONNECT_PENDING, &con->flags);
+		clear_bit(CF_WRITE_PENDING, &con->flags);
+		set_bit(CF_CLOSE, &con->flags);
+		if (cancel_work_sync(&con->swork)) {
+			log_print("swork cancelled for node %d", nodeid);
+		}
+		if (cancel_work_sync(&con->rwork)) {
+			log_print("rwork cancelled for node %d", nodeid);
+		}
 		clean_one_writequeue(con);
 		close_connection(con, true);
 	}
@@ -1395,9 +1407,11 @@ static void process_send_sockets(struct
 	if (test_and_clear_bit(CF_CONNECT_PENDING, &con->flags)) {
 		con->connect_action(con);
+		set_bit(CF_WRITE_PENDING, &con->flags);
+	}
+	if (test_and_clear_bit(CF_WRITE_PENDING, &con->flags)) {
+		send_to_sock(con);
 	}
-	clear_bit(CF_WRITE_PENDING, &con->flags);
-	send_to_sock(con);
 }
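Spelled out, the send worker after this patch reads as follows (the
container_of() line is assumed surrounding context, not part of the
diff):

    static void process_send_sockets(struct work_struct *work)
    {
            struct connection *con = container_of(work, struct connection, swork);

            if (test_and_clear_bit(CF_CONNECT_PENDING, &con->flags)) {
                    con->connect_action(con);
                    set_bit(CF_WRITE_PENDING, &con->flags);
            }
            /* send_to_sock() now runs only when a write is really
             * pending, never as an unconditional tail call that could
             * retry a connect on a closed connection. */
            if (test_and_clear_bit(CF_WRITE_PENDING, &con->flags)) {
                    send_to_sock(con);
            }
    }

Note also that cancel_work_sync() returns true only if the work item
was still pending, so the log messages in dlm_lowcomms_close() fire
only when a queued work item was actually cancelled.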
Regards,
Lars
--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde