From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sagi Grimberg
Subject: Re: [PATCH v2 18/18] xprtrdma: Faster server reboot recovery
Date: Tue, 26 Apr 2016 23:31:54 +0300
Message-ID: <571FD03A.2020100@grimberg.me>
References: <20160425185956.3566.64142.stgit@manet.1015granger.net> <20160425192315.3566.4175.stgit@manet.1015granger.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20160425192315.3566.4175.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Chuck Lever, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: linux-rdma@vger.kernel.org

> In a cluster failover scenario, it is desirable for the client to
> attempt to reconnect quickly, as an alternate NFS server is already
> waiting to take over for the down server. The client can't see that
> a server IP address has moved to a new server until the existing
> connection is gone.
>
> For fabrics and devices where it is meaningful, set an upper bound
> on the amount of time before it is determined that a connection is
> no longer valid. This allows the RPC client to detect connection
> loss in a timely manner, then perform a fresh resolution of the
> server GUID in case it has changed (cluster failover).
>
> Signed-off-by: Chuck Lever
> ---
>  net/sunrpc/xprtrdma/verbs.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index b7a5bc1..5cc57fb 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -211,9 +211,10 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
>  	struct rpcrdma_ep *ep = &xprt->rx_ep;
>  #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
>  	struct sockaddr *sap = (struct sockaddr *)&ep->rep_remote_addr;
> -#endif
> +	u64 timeout;
>  	struct ib_qp_attr *attr = &ia->ri_qp_attr;
>  	struct ib_qp_init_attr *iattr = &ia->ri_qp_init_attr;
> +#endif
>  	int connstate = 0;
>
>  	switch (event->event) {
> @@ -235,14 +236,23 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
>  		complete(&ia->ri_done);
>  		break;
>  	case RDMA_CM_EVENT_ESTABLISHED:
> -		connstate = 1;
> +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
> +		memset(attr, 0, sizeof(*attr));
>  		ib_query_qp(ia->ri_id->qp, attr,
> -			    IB_QP_MAX_QP_RD_ATOMIC | IB_QP_MAX_DEST_RD_ATOMIC,
> +			    IB_QP_MAX_QP_RD_ATOMIC |
> +			    IB_QP_MAX_DEST_RD_ATOMIC |
> +			    IB_QP_TIMEOUT,
>  			    iattr);
>  		dprintk("RPC: %s: %d responder resources"
>  			" (%d initiator)\n",
>  			__func__, attr->max_dest_rd_atomic,
>  			attr->max_rd_atomic);
> +		timeout = 4096 * (1ULL << attr->timeout);
> +		do_div(timeout, NSEC_PER_SEC);
> +		dprintk("RPC: %s: retry timeout: %llu seconds\n",
> +			__func__, timeout);
> +#endif

Can you put the debug in a separate patch? At first glance I was
confused about how it helps reboot recovery...
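
For reference, the arithmetic in the quoted debug hunk can be reproduced in userspace. This is only a sketch, not code from xprtrdma: the helper names ack_timeout_ns() and ack_timeout_secs() are hypothetical, and it assumes attr->timeout holds the InfiniBand local ACK timeout exponent, so that the interval is 4.096 microseconds times 2^exponent. Plain 64-bit division stands in for the kernel's do_div().

```c
#include <assert.h>
#include <stdint.h>

/* Same value as the kernel's NSEC_PER_SEC, redefined for userspace. */
#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: timeout in nanoseconds for a given exponent,
 * assuming the interval is 4.096 us * 2^exp (4096 ns * 2^exp).
 * The assert reflects the assumption that the exponent is a 5-bit field.
 */
static uint64_t ack_timeout_ns(unsigned int exp)
{
	assert(exp < 32);
	return 4096ULL * (1ULL << exp);
}

/*
 * Hypothetical helper: whole seconds, truncated the same way the
 * quotient left behind by do_div(timeout, NSEC_PER_SEC) would be.
 */
static uint64_t ack_timeout_secs(unsigned int exp)
{
	return ack_timeout_ns(exp) / NSEC_PER_SEC;
}
```

Under these assumptions an exponent of 18 comes out at about 1.07 seconds and 20 at about 4.29 seconds, while anything below 18 truncates to 0 in a message that prints whole seconds.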