From: Sridhar Samudrala <sri-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Pradeep Satyanarayana <pradeeps-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: "Hefty, Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
    "linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)" <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH 3/16] librdmacm/rsocket: Fix hang in rrecv/rsend after disconnecting
Date: Wed, 30 May 2012 16:55:21 -0700
Message-ID: <4FC6B369.90100@us.ibm.com>
In-Reply-To: <4FC6709C.70304-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
On 5/30/2012 12:10 PM, Pradeep Satyanarayana wrote:
> On 05/30/2012 10:21 AM, Hefty, Sean wrote:
>> If a user calls rrecv() after a blocking rsocket has been disconnected,
>> it will hang. This problem and the cause was reported by Sridhar Samudrala
>> <samudrala-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>. It can be reproduced by running netserver -f -D
>> using the rs-preload library. A similar issue exists with rsend().
>>
>> Fix this by not blocking on a CQ unless we're connected.
>>
>> Signed-off-by: Sean Hefty<sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>> ---
>> Sridhar, can you please let me know if this fixes the hang you were seeing?
>> I moved the connected check inside holding the cq lock from the patch that
>> you sent me.
>>
>> src/rsocket.c | 26 +++++++++++++++++++++++---
>> 1 files changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/rsocket.c b/src/rsocket.c
>> index 01b7248..8c96dc1 100644
>> --- a/src/rsocket.c
>> +++ b/src/rsocket.c
>> @@ -908,6 +908,11 @@ static int rs_can_send(struct rsocket *rs)
>> (rs->target_sgl[rs->target_sge].length != 0);
>> }
>>
>> +static int rs_conn_can_send(struct rsocket *rs)
>> +{
>> + return rs_can_send(rs) || (rs->state != rs_connected);
>> +}
>> +
>> static int rs_can_send_ctrl(struct rsocket *rs)
>> {
>> return rs->ctrl_avail;
>> @@ -918,6 +923,11 @@ static int rs_have_rdata(struct rsocket *rs)
>> return (rs->rmsg_head != rs->rmsg_tail);
>> }
>>
>> +static int rs_conn_have_rdata(struct rsocket *rs)
>> +{
>> + return rs_have_rdata(rs) || (rs->state != rs_connected);
>> +}
>> +
>> static int rs_all_sends_done(struct rsocket *rs)
>> {
>> return (rs->sqe_avail + rs->ctrl_avail) == RS_QP_SIZE;
>> @@ -980,7 +990,7 @@ ssize_t rrecv(int socket, void *buf, size_t len, int flags)
>> }
>> fastlock_acquire(&rs->rlock);
>> if (!rs_have_rdata(rs)) {
>> - ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_have_rdata);
>> + ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_conn_have_rdata);
>> if (ret && errno != ECONNRESET)
>> goto out;
>> }
>> @@ -1084,9 +1094,14 @@ ssize_t rsend(int socket, const void *buf, size_t len, int flags)
>> fastlock_acquire(&rs->slock);
>> for (left = len; left; left -= xfer_size, buf += xfer_size) {
>> if (!rs_can_send(rs)) {
>> - ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_can_send);
>> + ret = rs_process_cq(rs, rs_nonblocking(rs, flags),
>> + rs_conn_can_send);
>> if (ret)
>> break;
>> + if (rs->state != rs_connected) {
>> + ret = ERR(ECONNRESET);
>> + break;
>> + }
>> }
>>
>> if (olen < left) {
>> @@ -1193,9 +1208,14 @@ static ssize_t rsendv(int socket, const struct iovec *iov, int iovcnt, int flags
>> fastlock_acquire(&rs->slock);
>> for (left = len; left; left -= xfer_size) {
>> if (!rs_can_send(rs)) {
>> - ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_can_send);
>> + ret = rs_process_cq(rs, rs_nonblocking(rs, flags),
>> + rs_conn_can_send);
>> if (ret)
>> break;
>> + if (rs->state != rs_connected) {
>> + ret = ERR(ECONNRESET);
>> + break;
>> + }
>> }
>>
>> if (olen < left) {
>>
>>
> Sean, Have tested by applying only this patch in the entire series.
> netperf now seems to be working.
Yes. The patch fixes the hang in recv().
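For anyone following the thread, the idea behind the fix is that the wait predicate also becomes true once the connection is gone, so a blocking wait cannot outlive a disconnect. Below is a minimal standalone sketch of that pattern, not the rsocket.c code: struct conn, poll_once() and wait_for() are hypothetical stand-ins, the disconnect is simulated by a countdown instead of a real CQ event, and max_polls only exists so the "would hang" case terminates.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the connection state kept in struct rsocket. */
enum conn_state { CONN_CONNECTED, CONN_DISCONNECTED };

struct conn {
    enum conn_state state;
    int queued;                  /* received messages waiting to be read */
    int polls_until_disconnect;  /* simulation hook: peer drops after N polls */
};

/* Old predicate: true only when data is queued. */
static bool have_rdata(const struct conn *c)
{
    return c->queued > 0;
}

/* New predicate: also true once the connection is no longer up. */
static bool conn_have_rdata(const struct conn *c)
{
    return have_rdata(c) || c->state != CONN_CONNECTED;
}

/* Simulated event processing; here the only event is a disconnect. */
static void poll_once(struct conn *c)
{
    if (c->polls_until_disconnect > 0 && --c->polls_until_disconnect == 0)
        c->state = CONN_DISCONNECTED;
}

/*
 * Poll until test() holds. Returns 0 when the predicate became true and
 * -1 when max_polls ran out, which models a wait that would never return.
 */
static int wait_for(struct conn *c, bool (*test)(const struct conn *),
                    int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (test(c))
            return 0;
        poll_once(c);
    }
    return test(c) ? 0 : -1;
}
```

With no data queued and a peer that disconnects, wait_for() with have_rdata never succeeds, while the same call with conn_have_rdata returns as soon as the state changes; the caller then has to check the state itself, as the rsend() hunk in the patch does.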
However, I still see a few other issues related to socket semantics that
need to be addressed.
# ldp netperf -H 192.168.0.198 -l 3 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.198 (192.168.0.198) port 0 AF_INET
netperf: get_transport_info: getsockopt: errno 95
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
131072 131072 131072  3.00    6176.91
shutdown_control: no response received errno 95
1. netperf: get_transport_info: getsockopt: errno 95
This failure is due to the missing TCP_MAXSEG socket option support.
Maybe this is OK, as this option doesn't make much sense when using
RDMA. Or we could return a reasonable value.
2. shutdown_control: no response received errno 95
Here select() on the control socket fails with EOPNOTSUPP after a
shutdown(SHUT_WR) of that socket.
3. Once in a while netserver times out in recv() after the client closes
the connection.
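To illustrate item 1 from the application side: errno 95 is EOPNOTSUPP on Linux, and a caller can treat it as "no MSS available" instead of a hard failure. This is only a sketch under assumptions: query_mss() and FALLBACK_MSS are hypothetical names, the value 1460 is an arbitrary placeholder, and nothing here reflects what netperf or rsockets actually do.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical value to report when the transport has no notion of an MSS. */
#define FALLBACK_MSS 1460

/*
 * Query TCP_MAXSEG on fd. Returns the segment size on success,
 * FALLBACK_MSS if the option is not supported by the socket, and
 * -1 on any other error (errno is left as set by getsockopt()).
 */
static int query_mss(int fd)
{
    int mss = 0;
    socklen_t len = sizeof(mss);

    if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) == 0)
        return mss;
    if (errno == EOPNOTSUPP || errno == ENOPROTOOPT)
        return FALLBACK_MSS;
    return -1;
}
```

The alternative raised in item 1, returning a reasonable value from the library itself, would make this kind of caller-side fallback unnecessary.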
Thanks
Sridhar