From: Mathieu Avila <mathieu.avila@seanodes.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] Panic when stopping gulm.
Date: Mon, 16 Oct 2006 16:07:38 +0200 [thread overview]
Message-ID: <20061016160738.4b35b93e@mathieu.toulouse> (raw)
Hello,
I sometimes get panics when stopping gulm on my whole cluster.
They are really not very frequent. The panic occurs inside a function
of the "ipv6" module, called from one of the gulm kernel threads.
Process gulm_res_recvd (pid: 5029, threadinfo 0000010021300000, task
000001003f60f030)
Stack: 0000000000004034 0000000000000000 0000001e124dd670
       000001001533d380 0000010021301d08 0000000000000000
       0000010021301e18 000001003c3a5e00 00000100124dd670
       0000000023222120
Call Trace: <ffffffffa01d7a58>{:ipv6:tcp_v6_xmit+611}
       <ffffffff80134dea>{autoremove_wake_function+0}
       <ffffffffa02c6af4>{:lock_gulm:do_tfer+252}
       <ffffffffa02c6bcb>{:lock_gulm:xdr_send+34}
       <ffffffffa02c5c53>{:lock_gulm:xdr_enc_flush+44}
       <ffffffffa02c5cc3>{:lock_gulm:xdr_enc_release+19}
       <ffffffffa02c383b>{:lock_gulm:lg_core_handle_messages+394}
       <ffffffffa02be1b7>{:lock_gulm:cm_io_recving_thread+73}
       <ffffff80110e17>{child_rip+8}
       <ffffffffa02be16e>{:lock_gulm:cm_io_recving_thread+0}
       <ffffffff80110e0f>{child_rip+0}
I looked at the code in src/gulm/xdr_io.c, in the function "do_tfer",
and found something strange:
---------------------------------------------------
	for (;;) {
		m.msg_iov = iov;
		m.msg_iovlen = n;
		m.msg_flags = MSG_NOSIGNAL;
		if (dir)
			rv = sock_sendmsg (sock, &m, size - moved);
		else
			rv = sock_recvmsg (sock, &m, size - moved, 0);
		if (rv <= 0)
			goto out_err;
		moved += rv;
		if (moved >= size)
			break;
		/* adjust iov's for next transfer */
		while (iov->iov_len == 0) {
			iov++;
			n--;
		}
---------------------------------------------------
In my opinion, when "sock_sendmsg" returns fewer bytes than it was
asked to send, we enter

	while (iov->iov_len == 0) {
		iov++;
		n--;
	}

even if we are already at the last buffer, without ever checking "n",
which is the number of buffers in the "iov" table. "sock_sendmsg" is
then called with an invalid buffer pointer (m.msg_iov = iov).
I don't know how much this matters in practice, since "n" always
equals 1 wherever "do_tfer" is called.
Anyway, this couldn't happen if "n" were checked:
---------------------------
	while ((n > 1) && (iov->iov_len == 0)) {
		iov++;
		n--;
	}
	if (n <= 1)
		break;
---------------------------
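For what it's worth, here is a minimal userspace sketch of the bounded
advance (a hypothetical "advance_iov" helper, not the actual gulm
code): the "n > 1" check keeps the pointer inside the table even when
every buffer has been fully consumed.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Skip over fully-consumed iovecs, but never step past the last
 * entry of the table.  Updates *iovp in place and returns the
 * number of iovecs still in range.  Hypothetical helper mirroring
 * the proposed fix, not code from gulm. */
static int advance_iov(struct iovec **iovp, int n)
{
	struct iovec *iov = *iovp;

	while (n > 1 && iov->iov_len == 0) {
		iov++;
		n--;
	}
	*iovp = iov;
	return n;
}
```

Without the "n > 1" guard, a table whose entries are all drained
would make the loop walk off the end, which matches the invalid
pointer handed to sock_sendmsg in the oops above.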
This still doesn't guarantee that the message will be sent as a
whole. Using

	m.msg_flags = MSG_NOSIGNAL | MSG_WAITALL;

and looping over "sock_sendmsg" until the full message has been sent
may be the solution.
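As a side note, recv(2) documents MSG_WAITALL for receives only, so a
retry loop is needed on the send side regardless of the flag. A
userspace sketch of such a loop (a hypothetical "send_all" over a
plain socket; the kernel-side analogue would loop over sock_sendmsg
the same way):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until every byte has gone out or a real
 * error occurs.  Returns len on success, -1 on error.
 * Userspace sketch only, not code from gulm. */
static ssize_t send_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t left = len;

	while (left > 0) {
		ssize_t rv = send(fd, p, left, MSG_NOSIGNAL);
		if (rv < 0) {
			if (errno == EINTR)
				continue;	/* retry after a signal */
			return -1;
		}
		p += rv;
		left -= rv;
	}
	return (ssize_t)len;
}
```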
Any ideas on this?
Thanks in advance,
--
Mathieu Avila