From: Sridhar Samudrala <sri-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Pradeep Satyanarayana <pradeeps-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: "Hefty, Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	"linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)" <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH 3/16] librdmacm/rsocket: Fix hang in rrecv/rsend after disconnecting
Date: Wed, 30 May 2012 16:55:21 -0700
Message-ID: <4FC6B369.90100@us.ibm.com>
In-Reply-To: <4FC6709C.70304-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

On 5/30/2012 12:10 PM, Pradeep Satyanarayana wrote:
> On 05/30/2012 10:21 AM, Hefty, Sean wrote:
>> If a user calls rrecv() after a blocking rsocket has been disconnected,
>> it will hang.  This problem and its cause were reported by Sridhar Samudrala
>> <samudrala-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>.  It can be reproduced by running netserver -f -D
>> using the rs-preload library.  A similar issue exists with rsend().
>>
>> Fix this by not blocking on a CQ unless we're connected.
>>
>> Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>> ---
>> Sridhar, can you please let me know if this fixes the hang you were seeing?
>> I moved the connected check inside holding the cq lock from the patch that
>> you sent me.
>>
>>   src/rsocket.c |   26 +++++++++++++++++++++++---
>>   1 files changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/rsocket.c b/src/rsocket.c
>> index 01b7248..8c96dc1 100644
>> --- a/src/rsocket.c
>> +++ b/src/rsocket.c
>> @@ -908,6 +908,11 @@ static int rs_can_send(struct rsocket *rs)
>>              (rs->target_sgl[rs->target_sge].length != 0);
>>   }
>>
>> +static int rs_conn_can_send(struct rsocket *rs)
>> +{
>> +    return rs_can_send(rs) || (rs->state != rs_connected);
>> +}
>> +
>>   static int rs_can_send_ctrl(struct rsocket *rs)
>>   {
>>       return rs->ctrl_avail;
>> @@ -918,6 +923,11 @@ static int rs_have_rdata(struct rsocket *rs)
>>       return (rs->rmsg_head != rs->rmsg_tail);
>>   }
>>
>> +static int rs_conn_have_rdata(struct rsocket *rs)
>> +{
>> +    return rs_have_rdata(rs) || (rs->state != rs_connected);
>> +}
>> +
>>   static int rs_all_sends_done(struct rsocket *rs)
>>   {
>>       return (rs->sqe_avail + rs->ctrl_avail) == RS_QP_SIZE;
>> @@ -980,7 +990,7 @@ ssize_t rrecv(int socket, void *buf, size_t len, int flags)
>>       }
>>       fastlock_acquire(&rs->rlock);
>>       if (!rs_have_rdata(rs)) {
>> -        ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_have_rdata);
>> +        ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_conn_have_rdata);
>>           if (ret && errno != ECONNRESET)
>>               goto out;
>>       }
>> @@ -1084,9 +1094,14 @@ ssize_t rsend(int socket, const void *buf, size_t len, int flags)
>>       fastlock_acquire(&rs->slock);
>>       for (left = len; left; left -= xfer_size, buf += xfer_size) {
>>           if (!rs_can_send(rs)) {
>> -            ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_can_send);
>> +            ret = rs_process_cq(rs, rs_nonblocking(rs, flags),
>> +                        rs_conn_can_send);
>>               if (ret)
>>                   break;
>> +            if (rs->state != rs_connected) {
>> +                ret = ERR(ECONNRESET);
>> +                break;
>> +            }
>>           }
>>
>>           if (olen < left) {
>> @@ -1193,9 +1208,14 @@ static ssize_t rsendv(int socket, const struct iovec *iov, int iovcnt, int flags
>>       fastlock_acquire(&rs->slock);
>>       for (left = len; left; left -= xfer_size) {
>>           if (!rs_can_send(rs)) {
>> -            ret = rs_process_cq(rs, rs_nonblocking(rs, flags), rs_can_send);
>> +            ret = rs_process_cq(rs, rs_nonblocking(rs, flags),
>> +                        rs_conn_can_send);
>>               if (ret)
>>                   break;
>> +            if (rs->state != rs_connected) {
>> +                ret = ERR(ECONNRESET);
>> +                break;
>> +            }
>>           }
>>
>>           if (olen < left) {
>>
>>
> Sean, I have tested this by applying only this patch from the series.
> netperf now seems to be working.
Yes. The patch fixes the hang in recv().
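
To illustrate why the hang goes away: the patch swaps the wait condition
passed to rs_process_cq() for one that is also satisfied once the rsocket
is no longer connected. Below is a minimal, self-contained sketch of that
idea. The names mock_sock, have_data, conn_have_data and wait_until, and
the bounded poll count, are hypothetical stand-ins for illustration and
are not librdmacm code.

```c
#include <stdbool.h>

enum conn_state { CONN_CONNECTED, CONN_DISCONNECTED };

/* Hypothetical stand-in for the parts of struct rsocket that matter here. */
struct mock_sock {
    enum conn_state state;
    int pending_msgs;   /* received messages not yet consumed */
    int polls;          /* how many times we "polled the CQ" */
};

/* Old wait condition: only satisfied when data is available. */
static bool have_data(const struct mock_sock *s)
{
    return s->pending_msgs > 0;
}

/* New wait condition: also satisfied once the connection is gone. */
static bool conn_have_data(const struct mock_sock *s)
{
    return have_data(s) || s->state != CONN_CONNECTED;
}

/*
 * Stand-in for a blocking wait loop. A real loop would block on the CQ;
 * here the wait is bounded by max_polls so the example terminates.
 * Returns 0 when the condition is met, -1 when it would have kept waiting.
 */
static int wait_until(struct mock_sock *s,
                      bool (*done)(const struct mock_sock *),
                      int max_polls)
{
    while (!done(s)) {
        if (s->polls >= max_polls)
            return -1;
        s->polls++;
    }
    return 0;
}
```

With a disconnected socket and no pending data, waiting on have_data never
completes, while waiting on conn_have_data returns immediately, which lets
the caller report the disconnect.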
However, I still see a few other issues related to socket semantics that
need to be addressed.

# ldp netperf  -H 192.168.0.198 -l 3 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.198 (192.168.0.198) port 0 AF_INET
netperf: get_transport_info: getsockopt: errno 95
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072 131072 131072    3.00     6176.91
shutdown_control: no response received  errno 95

1. netperf: get_transport_info: getsockopt: errno 95
This failure is due to the missing TCP_MAXSEG socket option support. Maybe
this is OK, as this option doesn't make much sense when using RDMA. Or we
could return a reasonable value.

2. shutdown_control: no response received  errno 95
Here select() on the control socket fails with EOPNOTSUPP after a
shutdown(SHUT_WR) of the control socket.

3. Once in a while netserver times out in recv() after the client closes
the connection.
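
For item 1, "return a reasonable value" could look roughly like the sketch
below. This is only an illustration: sketch_getsockopt_tcp and its path_mtu
parameter are hypothetical names, and reporting an MTU-derived number as the
segment size is an assumption, not something rsocket does today.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not part of librdmacm: answer a TCP_MAXSEG query
 * with a caller-supplied value instead of failing with EOPNOTSUPP.
 * Any other option is still rejected.
 */
static int sketch_getsockopt_tcp(int optname, void *optval,
                                 socklen_t *optlen, int path_mtu)
{
    int val;

    if (optname != TCP_MAXSEG) {
        errno = EOPNOTSUPP;
        return -1;
    }
    if (!optval || !optlen || *optlen < sizeof(val)) {
        errno = EINVAL;
        return -1;
    }

    /* Assumption: report the path MTU as the "segment size". */
    val = path_mtu;
    memcpy(optval, &val, sizeof(val));
    *optlen = sizeof(val);
    return 0;
}
```

A caller such as netperf would then get a usable number back from the
TCP_MAXSEG query rather than errno 95.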

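For item 2, the socket semantics in question are roughly: half-close the
write side, then wait in select() until the peer answers or closes. The
sketch below shows that sequence on an ordinary socket. half_close_and_wait
is a hypothetical helper written for this mail, not netperf or rsocket code.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Half-close the write side of fd, then wait until it becomes readable
 * (peer data or end-of-file) or the timeout expires.
 * Returns select()'s result: 1 if readable, 0 on timeout, -1 on error.
 */
static int half_close_and_wait(int fd, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

    if (shutdown(fd, SHUT_WR) < 0)
        return -1;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return select(fd + 1, &readfds, NULL, NULL, &tv);
}
```

On a regular stream socket whose peer has already shut down its own write
side, this returns 1 and a following recv() returns 0 (end-of-file). With
the preload library the select() step is where EOPNOTSUPP shows up.
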
Thanks
Sridhar




