public inbox for linux-rdma@vger.kernel.org
* [PATCH] rsocket: Fix crash resulting from keepalive timeout
@ 2014-07-02 22:46 sean.hefty-ral2JQCrhuEAvxtiuMwx3w
From: sean.hefty-ral2JQCrhuEAvxtiuMwx3w @ 2014-07-02 22:46 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, hal-VPRAkNaXOzVWk0Htik3J/w; +Cc: Sean Hefty

From: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The following crash was reported by Hal Rosenstock
<hal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> with keepalive enabled.  The crash
occurs in the keepalive thread while it is attempting to send a
keepalive message.

report:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffecf08700 (LWP 6013)]
rs_post_write (rs=<value optimized out>, sgl=0x0, nsge=0, wr_data=3758096385,
    flags=0, addr=0, rkey=0) at src/rsocket.c:1660
1660            return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
(gdb)
(gdb) p/x rs
$1 = value has been optimized out

So I added in the following to debug:
1660    if (rs == NULL)
1661    abort();
1662    if (rs->cm_id == NULL)
1663    abort();
1664    if (rs->cm_id->qp == NULL)
1665    abort();
1666            return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
1667    }

And saw in gdb:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffecf08700 (LWP 8096)]
0x00000030d50328a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
(gdb)
(gdb) bt
#0  0x00000030d50328a5 in raise () from /lib64/libc.so.6
#1  0x00000030d5034085 in abort () from /lib64/libc.so.6
#2  0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0,
    nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
#3  0x00007ffff058193d in tcp_svc_send_keepalive (arg=0x7ffff0789f20)
    at src/rsocket.c:4245
#4  tcp_svc_run (arg=0x7ffff0789f20) at src/rsocket.c:4279
#5  0x00000030d5807851 in start_thread () from /lib64/libpthread.so.0
#6  0x00000030d50e890d in clone () from /lib64/libc.so.6
(gdb) fr 2
#2  0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0,
    nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
1665    abort();

So qp is NULL somehow...
:end report

There is an issue if an rsocket is closed without going through
rshutdown.

int rshutdown(int socket, int how)
{
	...
	if (rs->opts & RS_OPT_SVC_ACTIVE)
		rs_notify_svc(&tcp_svc, rs, RS_SVC_REM_KEEPALIVE);

We remove the rsocket from the keepalive thread in rshutdown.

int rclose(int socket)
{
	...
		if (rs->state & rs_connected)
			rshutdown(socket, SHUT_RDWR);
	...
	rs_free(rs);

rclose calls rshutdown only if the socket is connected.  However, if
the keepalive failed, the socket will be in an error state, so
rshutdown is never called, which leaves the freed rsocket on the
keepalive thread's list.

The fix is to have rclose remove an rsocket from being processed
by a service thread if it is still active.

Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 src/rsocket.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/src/rsocket.c b/src/rsocket.c
index 3048e5e..f81fb1b 100644
--- a/src/rsocket.c
+++ b/src/rsocket.c
@@ -3265,6 +3265,8 @@ int rclose(int socket)
 	if (rs->type == SOCK_STREAM) {
 		if (rs->state & rs_connected)
 			rshutdown(socket, SHUT_RDWR);
+		else if (rs->opts & RS_OPT_SVC_ACTIVE)
+			rs_notify_svc(&tcp_svc, rs, RS_SVC_REM_KEEPALIVE);
 	} else {
 		ds_shutdown(rs);
 	}
-- 
1.7.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] rsocket: Fix crash resulting from keepalive timeout
@ 2014-07-03  3:04   ` Hal Rosenstock
From: Hal Rosenstock @ 2014-07-03  3:04 UTC
  To: sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, hal-VPRAkNaXOzVWk0Htik3J/w

On 7/2/2014 6:46 PM, sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:

<snip...>

> rclose calls rshutdown only if the socket is connected.  However, if
> the keepalive failed, the socket will be in an error state, so
> rshutdown is never called, which leaves the freed rsocket on the
> keepalive thread's list.
> 
> The fix is to have rclose remove an rsocket from being processed
> by a service thread if it is still active.
> 
> Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Tested-by: Hal Rosenstock <hal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Thanks!

-- Hal
