From: Sagi Grimberg <sagi@grimberg.me>
To: Chuck Lever <chuck.lever@oracle.com>,
	linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2 18/18] xprtrdma: Faster server reboot recovery
Date: Tue, 26 Apr 2016 23:31:54 +0300
Message-ID: <571FD03A.2020100@grimberg.me>
In-Reply-To: <20160425192315.3566.4175.stgit@manet.1015granger.net>


> In a cluster failover scenario, it is desirable for the client to
> attempt to reconnect quickly, as an alternate NFS server is already
> waiting to take over for the down server. The client can't see that
> a server IP address has moved to a new server until the existing
> connection is gone.
>
> For fabrics and devices where it is meaningful, set an upper bound
> on the amount of time before it is determined that a connection is
> no longer valid. This allows the RPC client to detect connection
> loss in a timely manner, then perform a fresh resolution of the
> server GUID in case it has changed (cluster failover).
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   net/sunrpc/xprtrdma/verbs.c |   28 ++++++++++++++++++++++++----
>   1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index b7a5bc1..5cc57fb 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -211,9 +211,10 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
>   	struct rpcrdma_ep *ep = &xprt->rx_ep;
>   #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
>   	struct sockaddr *sap = (struct sockaddr *)&ep->rep_remote_addr;
> -#endif
> +	u64 timeout;
>   	struct ib_qp_attr *attr = &ia->ri_qp_attr;
>   	struct ib_qp_init_attr *iattr = &ia->ri_qp_init_attr;
> +#endif
>   	int connstate = 0;
>
>   	switch (event->event) {
> @@ -235,14 +236,23 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
>   		complete(&ia->ri_done);
>   		break;
>   	case RDMA_CM_EVENT_ESTABLISHED:
> -		connstate = 1;
> +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
> +		memset(attr, 0, sizeof(*attr));
>   		ib_query_qp(ia->ri_id->qp, attr,
> -			    IB_QP_MAX_QP_RD_ATOMIC | IB_QP_MAX_DEST_RD_ATOMIC,
> +			    IB_QP_MAX_QP_RD_ATOMIC |
> +			    IB_QP_MAX_DEST_RD_ATOMIC |
> +			    IB_QP_TIMEOUT,
>   			    iattr);
>   		dprintk("RPC:       %s: %d responder resources"
>   			" (%d initiator)\n",
>   			__func__, attr->max_dest_rd_atomic,
>   			attr->max_rd_atomic);
> +		timeout = 4096 * (1ULL << attr->timeout);
> +		do_div(timeout, NSEC_PER_SEC);
> +		dprintk("RPC:       %s: retry timeout: %llu seconds\n",
> +			__func__, timeout);
> +#endif

Can you put the debug output in a separate patch? At first glance I was
confused about how it helps reboot recovery...
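
For reference, the seconds value printed by the new dprintk follows from
the quoted hunk: attr->timeout is used as an exponent, the interval is
4096 ns * 2^timeout, and do_div() truncates the result to whole seconds.
Below is a minimal userspace sketch of that arithmetic. It assumes the
field is the 5-bit InfiniBand Local ACK Timeout exponent; the helper
names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: interval in nanoseconds for a given exponent,
 * mirroring "4096 * (1ULL << attr->timeout)" from the quoted hunk. */
static uint64_t retry_timeout_ns(unsigned int exponent)
{
	assert(exponent <= 31);	/* assumption: 5-bit exponent field */
	return 4096ULL * (1ULL << exponent);
}

/* Hypothetical helper: whole seconds, truncated the same way the
 * do_div(timeout, NSEC_PER_SEC) call in the hunk truncates. */
static uint64_t retry_timeout_secs(unsigned int exponent)
{
	return retry_timeout_ns(exponent) / NSEC_PER_SEC;
}
```

With this arithmetic an exponent of 14 gives 67,108,864 ns, which
truncates to 0 seconds. An exponent of 18 gives 1 second and 20 gives
4 seconds, so the debug line would read 0 for any exponent below 18.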

