From: Mathieu Avila <mathieu.avila@seanodes.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] Panic when stopping gulm.
Date: Mon, 16 Oct 2006 16:07:38 +0200 [thread overview]
Message-ID: <20061016160738.4b35b93e@mathieu.toulouse> (raw)
Hello,
I sometimes get panics when stopping gulm on my whole cluster.
They are really not very frequent. The panic occurs inside a function
of the "ipv6" module, called from one of the gulm kernel threads.
Process gulm_res_recvd (pid: 5029, threadinfo 0000010021300000, task
000001003f60f030)
Stack: 0000000000004034 0000000000000000 0000001e124dd670
       000001001533d380 0000010021301d08 0000000000000000
       0000010021301e18 000001003c3a5e00 00000100124dd670
       0000000023222120
Call Trace: <ffffffffa01d7a58>{:ipv6:tcp_v6_xmit+611}
       <ffffffff80134dea>{autoremove_wake_function+0}
       <ffffffffa02c6af4>{:lock_gulm:do_tfer+252}
       <ffffffffa02c6bcb>{:lock_gulm:xdr_send+34}
       <ffffffffa02c5c53>{:lock_gulm:xdr_enc_flush+44}
       <ffffffffa02c5cc3>{:lock_gulm:xdr_enc_release+19}
       <ffffffffa02c383b>{:lock_gulm:lg_core_handle_messages+394}
       <ffffffffa02be1b7>{:lock_gulm:cm_io_recving_thread+73}
       <ffffff80110e17>{child_rip+8}
       <ffffffffa02be16e>{:lock_gulm:cm_io_recving_thread+0}
       <ffffffff80110e0f>{child_rip+0}
I looked at the code in src/gulm/xdr_io.c, in the function "do_tfer",
and found something strange:
---------------------------------------------------
	for (;;) {
		m.msg_iov = iov;
		m.msg_iovlen = n;
		m.msg_flags = MSG_NOSIGNAL;
		if (dir)
			rv = sock_sendmsg (sock, &m, size - moved);
		else
			rv = sock_recvmsg (sock, &m, size - moved, 0);
		if (rv <= 0)
			goto out_err;
		moved += rv;
		if (moved >= size)
			break;
		/* adjust iov's for next transfer */
		while (iov->iov_len == 0) {
			iov++;
			n--;
		}
---------------------------------------------------
In my opinion, when "sock_sendmsg" returns fewer bytes than it was
asked to send, we enter

	while (iov->iov_len == 0) {
		iov++;
		n--;
	}

even if we are already at the last buffer, without ever checking "n",
which is the number of buffers in the "iov" table. "sock_sendmsg" is
then called with an invalid buffer pointer (m.msg_iov = iov).
I don't know how much this matters in practice, since "n" always
equals 1 wherever "do_tfer" is called.
Anyway, this couldn't happen if "n" were checked:
---------------------------
	while ((n > 1) && (iov->iov_len == 0)) {
		iov++;
		n--;
	}
	if (n <= 1)
		break;
---------------------------
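For what it's worth, here is a minimal userspace sketch of the bounded
advance (a hypothetical "advance_iov" helper, not the actual gulm
code): the "n > 1" check keeps the pointer inside the table even when
every buffer has been fully consumed.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Skip over fully-consumed iovecs, but never step past the last
 * entry of the table.  Updates *iovp in place and returns the
 * number of iovecs still in range.  Hypothetical helper mirroring
 * the proposed fix, not code from gulm. */
static int advance_iov(struct iovec **iovp, int n)
{
	struct iovec *iov = *iovp;

	while (n > 1 && iov->iov_len == 0) {
		iov++;
		n--;
	}
	*iovp = iov;
	return n;
}
```

Without the "n > 1" guard, a table whose entries are all drained
would make the loop walk off the end, which matches the invalid
pointer handed to sock_sendmsg in the oops above.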
This still doesn't guarantee that the message will be sent as a
whole. Using

	m.msg_flags = MSG_NOSIGNAL | MSG_WAITALL;

and looping over "sock_sendmsg" until the full message has been sent
may be the solution.
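As a side note, recv(2) documents MSG_WAITALL for receives only, so a
retry loop is needed on the send side regardless of the flag. A
userspace sketch of such a loop (a hypothetical "send_all" over a
plain socket; the kernel-side analogue would loop over sock_sendmsg
the same way):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until every byte has gone out or a real
 * error occurs.  Returns len on success, -1 on error.
 * Userspace sketch only, not code from gulm. */
static ssize_t send_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t left = len;

	while (left > 0) {
		ssize_t rv = send(fd, p, left, MSG_NOSIGNAL);
		if (rv < 0) {
			if (errno == EINTR)
				continue;	/* retry after a signal */
			return -1;
		}
		p += rv;
		left -= rv;
	}
	return (ssize_t)len;
}
```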
Any ideas on this?
Thanks in advance,
--
Mathieu Avila