linux-nfs.vger.kernel.org archive mirror
From: Anna Schumaker <Anna.Schumaker@netapp.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: <linux-rdma@vger.kernel.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1 01/14] xprtrdma: Transport fault injection
Date: Tue, 5 May 2015 11:16:28 -0400
Message-ID: <5548DECC.3080800@Netapp.com>
In-Reply-To: <2098B4A5-48C7-4458-BAC6-10F64359C405@oracle.com>

On 05/05/2015 11:15 AM, Chuck Lever wrote:
> 
> On May 5, 2015, at 10:44 AM, Anna Schumaker <Anna.Schumaker@netapp.com> wrote:
> 
>> On 05/05/2015 09:53 AM, Chuck Lever wrote:
>>>
>>> On May 5, 2015, at 9:49 AM, Anna Schumaker <Anna.Schumaker@netapp.com> wrote:
>>>
>>>> Hi Chuck,
>>>>
>>>> Neat idea!  Are servers able to handle client recovery without getting too confused?
>>>
>>> So far I have encountered only issues on the client side. I think this
>>> is because the client is the active part of re-establishing transport
>>> connections. In addition, RPC/RDMA clients have a bunch of resources
>>> that need to be reset after a transport disconnect.
>>>
>>> I think this idea can be translated into something that can be done
>>> in the generic layer (ie, xprt.c) if people think that would be of
>>> benefit for testing TCP also.
>>
>> It might, and now is the time to discuss it before we're stuck maintaining multiple interfaces to the same thing.
>>
>> Another thought:  can you move this under debugfs instead of proc?  That's where the other kernel fault injection controls are, and it might give us a little more flexibility if we need to change the interface later.
> 
> Something like /sys/kernel/debug/sunrpc/inject_transport_fault ?

That looks good to me! :)

Anna
> 
>>
>> Anna
>>>
>>>
>>>> Anna
>>>>
>>>> On 05/04/2015 01:56 PM, Chuck Lever wrote:
>>>>> It has been exceptionally useful to exercise the logic that handles
>>>>> local immediate errors and RDMA connection loss.  To enable
>>>>> developers to test this regularly and repeatably, add logic to
>>>>> simulate connection loss every so often.
>>>>>
>>>>> Fault injection is disabled by default. It is enabled with
>>>>>
>>>>> $ echo xxx | sudo tee /proc/sys/sunrpc/rdma_inject_transport_fault
>>>>>
>>>>> where "xxx" is a large positive number of transport method calls
>>>>> before a disconnect. A value of several thousand is usually a good
>>>>> number that allows reasonable forward progress while still causing a
>>>>> lot of connection drops.
>>>>>
>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>> ---
>>>>> net/sunrpc/Kconfig              |   12 ++++++++++++
>>>>> net/sunrpc/xprtrdma/transport.c |   34 ++++++++++++++++++++++++++++++++++
>>>>> net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
>>>>> 3 files changed, 47 insertions(+)
>>>>>
>>>>> diff --git a/net/sunrpc/Kconfig b/net/sunrpc/Kconfig
>>>>> index 9068e72..329f82c 100644
>>>>> --- a/net/sunrpc/Kconfig
>>>>> +++ b/net/sunrpc/Kconfig
>>>>> @@ -61,6 +61,18 @@ config SUNRPC_XPRT_RDMA_CLIENT
>>>>>
>>>>> 	  If unsure, say N.
>>>>>
>>>>> +config SUNRPC_XPRT_RDMA_FAULT_INJECTION
>>>>> +	bool "RPC over RDMA client fault injection"
>>>>> +	depends on SUNRPC_XPRT_RDMA_CLIENT
>>>>> +	default n
>>>>> +	help
>>>>> +	  This option enables fault injection in the xprtrdma module.
>>>>> +	  Fault injection is disabled by default. It is enabled with:
>>>>> +
>>>>> +	    $ echo xxx | sudo tee /proc/sys/sunrpc/rdma_inject_transport_fault
>>>>> +
>>>>> +	  If unsure, say N.
>>>>> +
>>>>> config SUNRPC_XPRT_RDMA_SERVER
>>>>> 	tristate "RPC over RDMA Server Support"
>>>>> 	depends on SUNRPC && INFINIBAND && INFINIBAND_ADDR_TRANS
>>>>> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
>>>>> index 54f23b1..fdcb2c7 100644
>>>>> --- a/net/sunrpc/xprtrdma/transport.c
>>>>> +++ b/net/sunrpc/xprtrdma/transport.c
>>>>> @@ -74,6 +74,7 @@ static unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
>>>>> static unsigned int xprt_rdma_inline_write_padding;
>>>>> static unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRMR;
>>>>> 		int xprt_rdma_pad_optimize = 1;
>>>>> +static unsigned int xprt_rdma_inject_transport_fault;
>>>>>
>>>>> #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
>>>>>
>>>>> @@ -135,6 +136,13 @@ static struct ctl_table xr_tunables_table[] = {
>>>>> 		.mode		= 0644,
>>>>> 		.proc_handler	= proc_dointvec,
>>>>> 	},
>>>>> +	{
>>>>> +		.procname	= "rdma_inject_transport_fault",
>>>>> +		.data		= &xprt_rdma_inject_transport_fault,
>>>>> +		.maxlen		= sizeof(unsigned int),
>>>>> +		.mode		= 0644,
>>>>> +		.proc_handler	= proc_dointvec,
>>>>> +	},
>>>>> 	{ },
>>>>> };
>>>>>
>>>>> @@ -246,6 +254,27 @@ xprt_rdma_connect_worker(struct work_struct *work)
>>>>> 	xprt_clear_connecting(xprt);
>>>>> }
>>>>>
>>>>> +#if defined CONFIG_SUNRPC_XPRT_RDMA_FAULT_INJECTION
>>>>> +static void
>>>>> +xprt_rdma_inject_disconnect(struct rpcrdma_xprt *r_xprt)
>>>>> +{
>>>>> +	if (!xprt_rdma_inject_transport_fault)
>>>>> +		return;
>>>>> +
>>>>> +	if (atomic_dec_return(&r_xprt->rx_inject_count) == 0) {
>>>>> +		atomic_set(&r_xprt->rx_inject_count,
>>>>> +			   xprt_rdma_inject_transport_fault);
>>>>> +		pr_info("rpcrdma: injecting transport disconnect\n");
>>>>> +		(void)rdma_disconnect(r_xprt->rx_ia.ri_id);
>>>>> +	}
>>>>> +}
>>>>> +#else
>>>>> +static void
>>>>> +xprt_rdma_inject_disconnect(struct rpcrdma_xprt *r_xprt)
>>>>> +{
>>>>> +}
>>>>> +#endif
>>>>> +
>>>>> /*
>>>>> * xprt_rdma_destroy
>>>>> *
>>>>> @@ -405,6 +434,8 @@ xprt_setup_rdma(struct xprt_create *args)
>>>>> 	INIT_DELAYED_WORK(&new_xprt->rx_connect_worker,
>>>>> 			  xprt_rdma_connect_worker);
>>>>>
>>>>> +	atomic_set(&new_xprt->rx_inject_count,
>>>>> +		   xprt_rdma_inject_transport_fault);
>>>>> 	xprt_rdma_format_addresses(xprt);
>>>>> 	xprt->max_payload = new_xprt->rx_ia.ri_ops->ro_maxpages(new_xprt);
>>>>> 	if (xprt->max_payload == 0)
>>>>> @@ -515,6 +546,7 @@ xprt_rdma_allocate(struct rpc_task *task, size_t size)
>>>>> out:
>>>>> 	dprintk("RPC:       %s: size %zd, request 0x%p\n", __func__, size, req);
>>>>> 	req->rl_connect_cookie = 0;	/* our reserved value */
>>>>> +	xprt_rdma_inject_disconnect(r_xprt);
>>>>> 	return req->rl_sendbuf->rg_base;
>>>>>
>>>>> out_rdmabuf:
>>>>> @@ -589,6 +621,7 @@ xprt_rdma_free(void *buffer)
>>>>> 	}
>>>>>
>>>>> 	rpcrdma_buffer_put(req);
>>>>> +	xprt_rdma_inject_disconnect(r_xprt);
>>>>> }
>>>>>
>>>>> /*
>>>>> @@ -634,6 +667,7 @@ xprt_rdma_send_request(struct rpc_task *task)
>>>>>
>>>>> 	rqst->rq_xmit_bytes_sent += rqst->rq_snd_buf.len;
>>>>> 	rqst->rq_bytes_sent = 0;
>>>>> +	xprt_rdma_inject_disconnect(r_xprt);
>>>>> 	return 0;
>>>>>
>>>>> failed_marshal:
>>>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>>> index 78e0b8b..08aee53 100644
>>>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>>> @@ -377,6 +377,7 @@ struct rpcrdma_xprt {
>>>>> 	struct rpcrdma_create_data_internal rx_data;
>>>>> 	struct delayed_work	rx_connect_worker;
>>>>> 	struct rpcrdma_stats	rx_stats;
>>>>> +	atomic_t		rx_inject_count;
>>>>> };
>>>>>
>>>>> #define rpcx_to_rdmax(x) container_of(x, struct rpcrdma_xprt, rx_xprt)
>>>>>
>>>
>>> --
>>> Chuck Lever
>>> chuck[dot]lever[at]oracle[dot]com
>>>
>>>
>>>
>>
> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 


Thread overview: 57+ messages
2015-05-04 17:56 [PATCH v1 00/14] client NFS/RDMA patches for 4.2 Chuck Lever
2015-05-04 17:56 ` [PATCH v1 01/14] xprtrdma: Transport fault injection Chuck Lever
2015-05-05 13:49   ` Anna Schumaker
2015-05-05 13:53     ` Chuck Lever
2015-05-05 14:44       ` Anna Schumaker
2015-05-05 15:15         ` Chuck Lever
2015-05-05 15:16           ` Anna Schumaker [this message]
2015-05-05 15:10   ` Steve Wise
2015-05-04 17:57 ` [PATCH v1 02/14] xprtrdma: Warn when there are orphaned IB objects Chuck Lever
2015-05-06 11:37   ` Devesh Sharma
2015-05-06 13:24     ` Chuck Lever
2015-05-06 14:05       ` Sagi Grimberg
2015-05-06 14:22       ` Devesh Sharma
2015-05-06 16:48         ` Jason Gunthorpe
2015-05-07  7:53           ` Devesh Sharma
2015-05-04 17:57 ` [PATCH v1 03/14] xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt Chuck Lever
2015-05-07  9:38   ` Sagi Grimberg
2015-05-07 13:25     ` Chuck Lever
2015-05-04 17:57 ` [PATCH v1 04/14] xprtrdma: Use ib_device pointer safely Chuck Lever
2015-05-07 10:00   ` Sagi Grimberg
2015-05-07 13:39     ` Chuck Lever
2015-05-07 13:56       ` Sagi Grimberg
2015-05-07 14:12         ` Chuck Lever
2015-05-07 15:11           ` Sagi Grimberg
2015-05-11 15:22             ` Chuck Lever
2015-05-11 18:26             ` Hefty, Sean
2015-05-11 18:57               ` Chuck Lever
2015-05-12 10:01               ` Sagi Grimberg
2015-05-04 17:57 ` [PATCH v1 05/14] xprtrdma: Introduce helpers for allocating MWs Chuck Lever
2015-05-07 10:16   ` Sagi Grimberg
2015-05-04 17:57 ` [PATCH v1 06/14] xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external() Chuck Lever
2015-05-07 10:15   ` Sagi Grimberg
2015-05-04 17:57 ` [PATCH v1 07/14] xprtrdma: Introduce an FRMR recovery workqueue Chuck Lever
2015-05-07 10:37   ` Devesh Sharma
2015-05-04 17:57 ` [PATCH v1 08/14] xprtrdma: Acquire MRs in rpcrdma_register_external() Chuck Lever
2015-05-07 10:31   ` Sagi Grimberg
2015-05-08 15:24     ` Devesh Sharma
2015-05-08 15:40       ` Chuck Lever
2015-05-10 10:17         ` Sagi Grimberg
2015-05-04 17:58 ` [PATCH v1 09/14] xprtrdma: Remove unused LOCAL_INV recovery logic Chuck Lever
2015-05-07 10:35   ` Sagi Grimberg
2015-05-08 15:31     ` Devesh Sharma
2015-05-04 17:58 ` [PATCH v1 10/14] xprtrdma: Remove ->ro_reset Chuck Lever
2015-05-07 10:36   ` Sagi Grimberg
2015-05-08 15:33     ` Devesh Sharma
2015-05-04 17:58 ` [PATCH v1 11/14] xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy Chuck Lever
2015-05-07 10:36   ` Sagi Grimberg
2015-05-08 15:34     ` Devesh Sharma
2015-05-04 17:58 ` [PATCH v1 12/14] xprtrdma: Split rb_lock Chuck Lever
2015-05-07 10:37   ` Sagi Grimberg
2015-05-04 17:58 ` [PATCH v1 13/14] xprtrdma: Stack relief in fmr_op_map() Chuck Lever
2015-05-07 10:50   ` Sagi Grimberg
2015-05-08 15:36     ` Devesh Sharma
2015-05-04 17:58 ` [PATCH v1 14/14] xprtrmda: Reduce per-transport MR allocation Chuck Lever
2015-05-07 11:00   ` Sagi Grimberg
2015-05-08 15:53     ` Devesh Sharma
2015-05-05 15:17 ` [PATCH v1 00/14] client NFS/RDMA patches for 4.2 Steve Wise
