* [PATCH 3/16] librdmacm/rsocket: Fix hang in rrecv/rsend after disconnecting
From: Hefty, Sean @ 2012-05-30 17:21 UTC
To: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
samudrala-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org
If a user calls rrecv() after a blocking rsocket has been disconnected,
it will hang. This problem and its cause were reported by Sridhar Samudrala
<samudrala-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>. It can be reproduced by running netserver -f -D
using the rs-preload library. A similar issue exists with rsend().

Fix this by not blocking on a CQ unless we're connected.

Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
Sridhar, can you please let me know if this fixes the hang you were seeing?
Compared with the patch that you sent me, I moved the connected check so
that it is made while holding the cq lock.

src/rsocket.c | 26 +++++++++++++++++++++++---
1 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/src/rsocket.c b/src/rsocket.c
index 01b7248..8c96dc1 100644
--- a/src/rsocket.c
+++ b/src/rsocket.c
@@ -908,6 +908,11 @@ static int rs_can_send(struct rsocket *rs)
 	       (rs->target_sgl[rs->target_sge].length != 0);
 }
 
+static int rs_conn_can_send(struct rsocket *rs)
+{
+	return rs_can_send(rs) || (rs->state != rs_connected);
+}
+
 static int rs_can_send_ctrl(struct rsocket *rs)
 {
 	return rs->ctrl_avail;
@@ -918,6 +923,11 @@ static int rs_have_rdata(struct rsocket *rs)
 	return (rs->rmsg_head != rs->rmsg_tail);
 }
 
+static int rs_conn_have_rdata(struct rsocket *rs)
+{
+	return rs_have_rdata(rs) || (rs->state != rs_connected);
+}
+
 static int rs_all_sends_done(struct rsocket *rs)
 {
 	return (rs->sqe_avail + rs->ctrl_avail) == RS_QP_SIZE;
@@ -980,7 +990,7 @@ ssize_t rrecv(int socket, void *buf, size_t len, int flags)
 	}
 	fastlock_acquire(&rs->rlock);
 	if (!rs_have_rdata(rs)) {
-		ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_have_rdata);
+		ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_conn_have_rdata);
 		if (ret && errno != ECONNRESET)
 			goto out;
 	}
@@ -1084,9 +1094,14 @@ ssize_t rsend(int socket, const void *buf, size_t len, int flags)
 	fastlock_acquire(&rs->slock);
 	for (left = len; left; left -= xfer_size, buf += xfer_size) {
 		if (!rs_can_send(rs)) {
-			ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_can_send);
+			ret = rs_process_cq(rs, rs_nonblocking(rs, flags),
+					    rs_conn_can_send);
 			if (ret)
 				break;
+			if (rs->state != rs_connected) {
+				ret = ERR(ECONNRESET);
+				break;
+			}
 		}
 
 		if (olen < left) {
@@ -1193,9 +1208,14 @@ static ssize_t rsendv(int socket, const struct iovec *iov, int iovcnt, int flags
 	fastlock_acquire(&rs->slock);
 	for (left = len; left; left -= xfer_size) {
 		if (!rs_can_send(rs)) {
-			ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_can_send);
+			ret = rs_process_cq(rs, rs_nonblocking(rs, flags),
+					    rs_conn_can_send);
 			if (ret)
 				break;
+			if (rs->state != rs_connected) {
+				ret = ERR(ECONNRESET);
+				break;
+			}
 		}
 
 		if (olen < left) {
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH 3/16] librdmacm/rsocket: Fix hang in rrecv/rsend after disconnecting
From: Pradeep Satyanarayana @ 2012-05-30 19:10 UTC
To: Hefty, Sean
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
samudrala-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org

On 05/30/2012 10:21 AM, Hefty, Sean wrote:
> If a user calls rrecv() after a blocking rsocket has been disconnected,
> it will hang.
> [...]
> Fix this by not blocking on a CQ unless we're connected.
> [...]

Sean,

Have tested by applying only this patch in the entire series.
netperf now seems to be working.

Thanks
Pradeep
* Re: [PATCH 3/16] librdmacm/rsocket: Fix hang in rrecv/rsend after disconnecting
From: Sridhar Samudrala @ 2012-05-30 23:55 UTC
To: Pradeep Satyanarayana
Cc: Hefty, Sean,
linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)

On 5/30/2012 12:10 PM, Pradeep Satyanarayana wrote:
> On 05/30/2012 10:21 AM, Hefty, Sean wrote:
>> If a user calls rrecv() after a blocking rsocket has been disconnected,
>> it will hang.
>> [...]
>
> Sean, Have tested by applying only this patch in the entire series.
> netperf now seems to be working.

Yes. The patch fixes the hang in recv(). However, I still see a few other
issues related to socket semantics that need to be addressed.

# ldp netperf -H 192.168.0.198 -l 3 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.198 (192.168.0.198) port 0 AF_INET
netperf: get_transport_info: getsockopt: errno 95
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072 131072 131072    3.00     6176.91
shutdown_control: no response received errno 95

1. netperf: get_transport_info: getsockopt: errno 95
   This failure is due to the missing TCP_MAXSEG socket option support.
   Maybe this is OK, as this option doesn't make much sense when using
   RDMA. Or we could return a reasonable value.

2. shutdown_control: no response received errno 95
   Here select() on the control socket fails with EOPNOTSUPP after doing
   a shutdown(SHUT_WR) of the control socket.

3. Once in a while netserver times out in recv() after the client closes
   the connection.

Thanks
Sridhar
* RE: [PATCH 3/16] librdmacm/rsocket: Fix hang in rrecv/rsend after disconnecting
From: Hefty, Sean @ 2012-05-31 0:33 UTC
To: Sridhar Samudrala, Pradeep Satyanarayana
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)

Thanks for the feedback.

> 1. netperf: get_transport_info: getsockopt: errno 95
> This failure is due to the missing TCP_MAXSEG socket option support.
> Maybe this is OK as this option doesn't make much sense when using RDMA.
> Or we could return a reasonable value.

Missing socket options are usually easy enough to add, so that setsockopt
can return success. :) Maybe this option makes sense as the MTU? One issue
I ran into with rsockets is that several options must be set before
connecting. And in this case, if MAXSEG mapped to MTU, it would need to
match on both sides.

> 2. shutdown_control: no response received errno 95
> Here select() on control socket is failing with EOPNOTSUPP after doing a
> shutdown(SHUT_WR) of the control socket

rshutdown() pretty much assumes SHUT_RDWR. I need to think about whether
anything should be done to handle SHUT_RD or SHUT_WR, or if shutdown should
just ignore those. The issue may be the result of rshutdown() switching the
rsocket from nonblocking to blocking in order to process the shutdown
properly. The code is waiting for all pending sends to complete to prevent
data loss...

> 3. Once in a while netserver times out in recv() after the client closes
> the connection.

...or maybe we still lose the disconnect message. Did this occur after you
applied patch 3?

- Sean