* [Cluster-devel] Panic when stopping gulm.
@ 2006-10-16 14:07 Mathieu Avila
From: Mathieu Avila @ 2006-10-16 14:07 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hello,
I occasionally get kernel panics when stopping gulm on my whole cluster.
They are rare. The panic occurs inside a function of the "ipv6" module,
called from one of the gulm kernel threads:
Process gulm_res_recvd (pid: 5029, threadinfo 0000010021300000, task 000001003f60f030)
Stack: 0000000000004034 0000000000000000 0000001e124dd670 000001001533d380
       0000010021301d08 0000000000000000 0000010021301e18 000001003c3a5e00
       00000100124dd670 0000000023222120
Call Trace:<ffffffffa01d7a58>{:ipv6:tcp_v6_xmit+611} <ffffffff80134dea>{autoremove_wake_function+0}
       <ffffffffa02c6af4>{:lock_gulm:do_tfer+252} <ffffffffa02c6bcb>{:lock_gulm:xdr_send+34}
       <ffffffffa02c5c53>{:lock_gulm:xdr_enc_flush+44} <ffffffffa02c5cc3>{:lock_gulm:xdr_enc_release+19}
       <ffffffffa02c383b>{:lock_gulm:lg_core_handle_messages+394}
       <ffffffffa02be1b7>{:lock_gulm:cm_io_recving_thread+73}
       <ffffffff80110e17>{child_rip+8} <ffffffffa02be16e>{:lock_gulm:cm_io_recving_thread+0}
       <ffffffff80110e0f>{child_rip+0}
I looked at the code in src/gulm/xdr_io.c, in the function "do_tfer",
and found something strange:
---------------------------------------------------
for (;;) {
    m.msg_iov = iov;
    m.msg_iovlen = n;
    m.msg_flags = MSG_NOSIGNAL;
    if (dir)
        rv = sock_sendmsg (sock, &m, size - moved);
    else
        rv = sock_recvmsg (sock, &m, size - moved, 0);
    if (rv <= 0)
        goto out_err;
    moved += rv;
    if (moved >= size)
        break;
    /* adjust iov's for next transfer */
    while (iov->iov_len == 0) {
        iov++;
        n--;
    }
---------------------------------------------------
In my opinion, when "sock_sendmsg" returns fewer bytes than it was
asked to send, we enter this loop:
    while (iov->iov_len == 0) {
        iov++;
        n--;
    }
The loop advances "iov" even when it already points at the last buffer,
because it never checks "n", the number of buffers in the "iov" table.
"sock_sendmsg" is then called with an invalid buffer pointer
(m.msg_iov = iov).
I don't know whether this matters in practice, since "n" always equals
"1" wherever "do_tfer" is called.
Anyway, this could not happen if "n" were checked:
---------------------------
while ((n > 1) && (iov->iov_len == 0)) {
    iov++;
    n--;
}
if (n <= 1)
    break;
---------------------------
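The bounded walk can also be illustrated outside the kernel. The following is only a user-space sketch, not the lock_gulm code: "iov_advance" is a hypothetical helper name, it uses the POSIX "struct iovec" from <sys/uio.h>, and it assumes the caller adjusts the buffers itself after a short transfer. The quoted loop only skips buffers whose iov_len is already 0, which suggests the transfer call consumes the iovec; here that bookkeeping is done explicitly. The point it shows is that the buffer count is checked before every dereference, so the pointer is never read past the end of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper, not part of lock_gulm: consume `bytes` from the
 * front of an iovec array after a partial transfer.
 * `*iovp` and `*np` are updated in place. `*np` reaches 0 exactly when
 * every buffer has been consumed, and `iov` is only dereferenced while
 * `n > 0`, so the walk never reads past the end of the array.
 */
static void iov_advance(struct iovec **iovp, int *np, size_t bytes)
{
    struct iovec *iov = *iovp;
    int n = *np;

    assert(iov != NULL && n >= 0);

    while (n > 0) {
        if (bytes < iov->iov_len) {
            /* Partially sent buffer: keep it, trimmed to the unsent tail. */
            iov->iov_base = (char *)iov->iov_base + bytes;
            iov->iov_len -= bytes;
            break;
        }
        /* Buffer fully consumed (or already empty): move to the next one. */
        bytes -= iov->iov_len;
        iov->iov_len = 0;
        iov++;
        n--;
    }

    *iovp = iov;
    *np = n;
}
```

With two buffers of 4 and 6 bytes, advancing by 5 leaves one buffer of 5 bytes; advancing by 5 again leaves n == 0, at which point the caller should stop instead of issuing another transfer.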
The extra check on "n" still doesn't guarantee that the message is sent
as a whole. Maybe the solution is to use:
m.msg_flags = MSG_NOSIGNAL | MSG_WAITALL;
together with a loop over sock_sendmsg until the full message has been
sent.
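The "loop until the full message is sent" idea can be sketched for the single-buffer case (n == 1) as follows. This is a user-space illustration under stated assumptions, not a patch for do_tfer: "xfer_fn", "xfer_all", "struct sink" and "short_xfer" are hypothetical names, the callback stands in for sock_sendmsg() or send(2), and "short_xfer" is a test double that deliberately accepts at most 3 bytes per call to simulate short sends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical transfer callback standing in for sock_sendmsg() or send(2):
 * returns the number of bytes accepted, 0 if the peer closed, or a
 * negative value on error.
 */
typedef ssize_t (*xfer_fn)(void *ctx, const char *buf, size_t len);

/*
 * Keep calling `xfer` until all `size` bytes of the single buffer `buf`
 * have been transferred. Returns `size` on success, otherwise the last
 * return value of `xfer` (which is <= 0).
 */
static ssize_t xfer_all(xfer_fn xfer, void *ctx, const char *buf, size_t size)
{
    size_t moved = 0;

    assert(xfer != NULL && buf != NULL);

    while (moved < size) {
        ssize_t rv = xfer(ctx, buf + moved, size - moved);
        if (rv <= 0)
            return rv;          /* error or peer closed: give up */
        moved += (size_t)rv;    /* short transfer: retry with the tail */
    }
    return (ssize_t)moved;
}

/* Test double: records what it was given and counts the calls. */
struct sink {
    char data[64];
    size_t used;
    int calls;
};

/* Accepts at most 3 bytes per call and appends them to the sink. */
static ssize_t short_xfer(void *ctx, const char *buf, size_t len)
{
    struct sink *s = ctx;
    size_t take = len < 3 ? len : 3;

    if (s->used + take > sizeof s->data)
        return -1;              /* sink full: report an error */
    memcpy(s->data + s->used, buf, take);
    s->used += take;
    s->calls++;
    return (ssize_t)take;
}
```

Calling xfer_all(short_xfer, &s, "0123456789", 10) on a zeroed sink takes four callback invocations (3 + 3 + 3 + 1 bytes) and returns 10. Because the loop tracks the offset itself, it does not depend on the transfer call updating any buffer descriptor.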
Any ideas on this?
Thanks in advance,
--
Mathieu Avila