* [PATCH v1 00/11] [RFC] NFS/RDMA client-side patches for 4.12
@ 2017-03-10 16:05 Chuck Lever
2017-03-10 16:05 ` [PATCH v1 01/11] xprtrdma: Annotate receive workqueue Chuck Lever
From: Chuck Lever @ 2017-03-10 16:05 UTC (permalink / raw)
To: linux-rdma, linux-nfs
These have seen some testing, but there remain some corner cases
that I need to chase down. Therefore this posting is for review
only.
The two main changes in this series are:
1. Break RPC-over-RDMA connections after an RPC timeout.
This gives the client's CM an opportunity to perform server and
network path rediscovery before retrying the timed-out RPC.
This design was selected because it is simple and does not change
the normal RPC Call send hot path. Note also that logic already in
xprt_rdma_send_request() breaks the connection just before sending
a retransmit.
2. Support unloading the driver of the underlying device.
Full support for the DEVICE_REMOVAL CM upcall is implemented in the
client-side RPC-over-RDMA consumer. Devesh's workaround is reverted,
since it is no longer necessary.
In addition, support is added for restoring transport operation when
a new driver is subsequently loaded or when another device is
already available with connectivity to the NFS server.
Hopefully this is the basis for device hotplug, suspend/resume with
NFS/RDMA mounts, and handling device failover.
Available in the "nfs-rdma-for-4.12" topic branch of this git repo:
git://git.linux-nfs.org/projects/cel/cel-2.6.git
Or for browsing:
http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfs-rdma-for-4.12
---
Chuck Lever (11):
xprtrdma: Annotate receive workqueue
xprtrdma: Cancel refresh worker during buffer shutdown
xprtrdma: Clean up rpcrdma_marshal_req()
sunrpc: Export xprt_force_disconnect()
xprtrdma: Detect unreachable NFS/RDMA servers more reliably
xprtrdma: Refactor rpcrdma_ia_open()
xprtrdma: Use same device when mapping or syncing DMA buffers
xprtrdma: Support unplugging an HCA from under an NFS mount
xprtrdma: Refactor rpcrdma_ep_connect
xprtrdma: Restore transport after device removal
xprtrdma: Revert commit d0f36c46deea
net/sunrpc/xprt.c | 1
net/sunrpc/xprtrdma/rpc_rdma.c | 67 +++++---
net/sunrpc/xprtrdma/transport.c | 59 ++++++-
net/sunrpc/xprtrdma/verbs.c | 323 ++++++++++++++++++++++++++-------------
net/sunrpc/xprtrdma/xprt_rdma.h | 23 ++-
5 files changed, 327 insertions(+), 146 deletions(-)
--
Chuck Lever
* [PATCH v1 01/11] xprtrdma: Annotate receive workqueue
2017-03-10 16:05 [PATCH v1 00/11] [RFC] NFS/RDMA client-side patches for 4.12 Chuck Lever
@ 2017-03-10 16:05 ` Chuck Lever
2017-03-10 16:06 ` [PATCH v1 02/11] xprtrdma: Cancel refresh worker during buffer shutdown Chuck Lever
From: Chuck Lever @ 2017-03-10 16:05 UTC (permalink / raw)
To: linux-rdma, linux-nfs
Micro-optimize the receive workqueue by marking its anchor
__read_mostly.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/verbs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 81cd31a..8448f89 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -70,7 +70,7 @@
* internal functions
*/
-static struct workqueue_struct *rpcrdma_receive_wq;
+static struct workqueue_struct *rpcrdma_receive_wq __read_mostly;
int
rpcrdma_alloc_wq(void)
* [PATCH v1 02/11] xprtrdma: Cancel refresh worker during buffer shutdown
2017-03-10 16:05 [PATCH v1 00/11] [RFC] NFS/RDMA client-side patches for 4.12 Chuck Lever
2017-03-10 16:05 ` [PATCH v1 01/11] xprtrdma: Annotate receive workqueue Chuck Lever
@ 2017-03-10 16:06 ` Chuck Lever
2017-03-10 16:06 ` [PATCH v1 03/11] xprtrdma: Clean up rpcrdma_marshal_req() Chuck Lever
From: Chuck Lever @ 2017-03-10 16:06 UTC (permalink / raw)
To: linux-rdma, linux-nfs
Trying to create MRs while the transport is being torn down can
cause a crash.
Fixes: e2ac236c0b65 ("xprtrdma: Allocate MRs on demand")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/verbs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 8448f89..6c99fc5 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1036,6 +1036,7 @@ struct rpcrdma_rep *
rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
{
cancel_delayed_work_sync(&buf->rb_recovery_worker);
+ cancel_delayed_work_sync(&buf->rb_refresh_worker);
while (!list_empty(&buf->rb_recv_bufs)) {
struct rpcrdma_rep *rep;
* [PATCH v1 03/11] xprtrdma: Clean up rpcrdma_marshal_req()
2017-03-10 16:05 [PATCH v1 00/11] [RFC] NFS/RDMA client-side patches for 4.12 Chuck Lever
2017-03-10 16:05 ` [PATCH v1 01/11] xprtrdma: Annotate receive workqueue Chuck Lever
2017-03-10 16:06 ` [PATCH v1 02/11] xprtrdma: Cancel refresh worker during buffer shutdown Chuck Lever
@ 2017-03-10 16:06 ` Chuck Lever
2017-03-10 16:06 ` [PATCH v1 04/11] sunrpc: Export xprt_force_disconnect() Chuck Lever
From: Chuck Lever @ 2017-03-10 16:06 UTC (permalink / raw)
To: linux-rdma, linux-nfs
Replace C-structure-based XDR encoding with pointer-based encoding,
which is more portable and more idiomatic.
Add an appropriate documenting comment.
rpc_xprt is used only to derive rpcrdma_xprt, which the
caller already has. Pass that directly instead.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 63 ++++++++++++++++++++++-----------------
net/sunrpc/xprtrdma/transport.c | 2 +
net/sunrpc/xprtrdma/xprt_rdma.h | 2 +
3 files changed, 38 insertions(+), 29 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index a044be2..103491e 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -651,37 +651,46 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
req->rl_mapped_sges = 0;
}
-/*
- * Marshal a request: the primary job of this routine is to choose
- * the transfer modes. See comments below.
+/**
+ * rpcrdma_marshal_req - Marshal and send one RPC request
+ * @r_xprt: controlling transport
+ * @rqst: RPC request to be marshaled
*
- * Returns zero on success, otherwise a negative errno.
+ * For the RPC in "rqst", this function:
+ * - Chooses the transfer mode (e.g., RDMA_MSG or RDMA_NOMSG)
+ * - Registers Read, Write, and Reply chunks
+ * - Constructs the transport header
+ * - Posts a Send WR to send the transport header and request
+ *
+ * Returns:
+ * 0: the RPC was sent successfully
+ * ENOTCONN: the connection was lost
+ * EAGAIN: no pages are available for on-demand reply buffer
+ * ENOBUFS: no MRs are available to register chunks
+ * EIO: a permanent problem occurred while marshaling
*/
-
int
-rpcrdma_marshal_req(struct rpc_rqst *rqst)
+rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst)
{
- struct rpc_xprt *xprt = rqst->rq_xprt;
- struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
+ struct rpcrdma_regbuf *rb = req->rl_rdmabuf;
enum rpcrdma_chunktype rtype, wtype;
- struct rpcrdma_msg *headerp;
bool ddp_allowed;
ssize_t hdrlen;
size_t rpclen;
- __be32 *iptr;
+ __be32 *p;
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
if (test_bit(RPC_BC_PA_IN_USE, &rqst->rq_bc_pa_state))
return rpcrdma_bc_marshal_reply(rqst);
#endif
- headerp = rdmab_to_msg(req->rl_rdmabuf);
+ p = rb->rg_base;
/* don't byte-swap XID, it's already done in request */
- headerp->rm_xid = rqst->rq_xid;
- headerp->rm_vers = rpcrdma_version;
- headerp->rm_credit = cpu_to_be32(r_xprt->rx_buf.rb_max_requests);
- headerp->rm_type = rdma_msg;
+ *p++ = rqst->rq_xid;
+ *p++ = rpcrdma_version;
+ *p++ = cpu_to_be32(r_xprt->rx_buf.rb_max_requests);
+ *p = rdma_msg;
/* When the ULP employs a GSS flavor that guarantees integrity
* or privacy, direct data placement of individual data items
@@ -729,7 +738,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
rqst->rq_snd_buf.tail[0].iov_len;
} else {
r_xprt->rx_stats.nomsg_call_count++;
- headerp->rm_type = htonl(RDMA_NOMSG);
+ *p = rdma_nomsg;
rtype = rpcrdma_areadch;
rpclen = 0;
}
@@ -756,17 +765,17 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
* send a Call message with a Position Zero Read chunk and a
* regular Read chunk at the same time.
*/
- iptr = headerp->rm_body.rm_chunks;
- iptr = rpcrdma_encode_read_list(r_xprt, req, rqst, iptr, rtype);
- if (IS_ERR(iptr))
+ p++;
+ p = rpcrdma_encode_read_list(r_xprt, req, rqst, p, rtype);
+ if (IS_ERR(p))
goto out_err;
- iptr = rpcrdma_encode_write_list(r_xprt, req, rqst, iptr, wtype);
- if (IS_ERR(iptr))
+ p = rpcrdma_encode_write_list(r_xprt, req, rqst, p, wtype);
+ if (IS_ERR(p))
goto out_err;
- iptr = rpcrdma_encode_reply_chunk(r_xprt, req, rqst, iptr, wtype);
- if (IS_ERR(iptr))
+ p = rpcrdma_encode_reply_chunk(r_xprt, req, rqst, p, wtype);
+ if (IS_ERR(p))
goto out_err;
- hdrlen = (unsigned char *)iptr - (unsigned char *)headerp;
+ hdrlen = (unsigned char *)p - (unsigned char *)rb->rg_base;
dprintk("RPC: %5u %s: %s/%s: hdrlen %zd rpclen %zd\n",
rqst->rq_task->tk_pid, __func__,
@@ -775,16 +784,16 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
if (!rpcrdma_prepare_send_sges(&r_xprt->rx_ia, req, hdrlen,
&rqst->rq_snd_buf, rtype)) {
- iptr = ERR_PTR(-EIO);
+ p = ERR_PTR(-EIO);
goto out_err;
}
return 0;
out_err:
pr_err("rpcrdma: rpcrdma_marshal_req failed, status %ld\n",
- PTR_ERR(iptr));
+ PTR_ERR(p));
r_xprt->rx_stats.failed_marshal_count++;
- return PTR_ERR(iptr);
+ return PTR_ERR(p);
}
/*
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index c717f54..26c9a19 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -689,7 +689,7 @@
if (unlikely(!list_empty(&req->rl_registered)))
r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req, false);
- rc = rpcrdma_marshal_req(rqst);
+ rc = rpcrdma_marshal_req(r_xprt, rqst);
if (rc < 0)
goto failed_marshal;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 171a351..e6d76a0 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -586,7 +586,7 @@ enum rpcrdma_chunktype {
bool rpcrdma_prepare_send_sges(struct rpcrdma_ia *, struct rpcrdma_req *,
u32, struct xdr_buf *, enum rpcrdma_chunktype);
void rpcrdma_unmap_sges(struct rpcrdma_ia *, struct rpcrdma_req *);
-int rpcrdma_marshal_req(struct rpc_rqst *);
+int rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst);
void rpcrdma_set_max_header_sizes(struct rpcrdma_xprt *);
void rpcrdma_reply_handler(struct work_struct *work);
* [PATCH v1 04/11] sunrpc: Export xprt_force_disconnect()
2017-03-10 16:05 [PATCH v1 00/11] [RFC] NFS/RDMA client-side patches for 4.12 Chuck Lever
2017-03-10 16:06 ` [PATCH v1 03/11] xprtrdma: Clean up rpcrdma_marshal_req() Chuck Lever
@ 2017-03-10 16:06 ` Chuck Lever
2017-03-10 16:06 ` [PATCH v1 05/11] xprtrdma: Detect unreachable NFS/RDMA servers more reliably Chuck Lever
From: Chuck Lever @ 2017-03-10 16:06 UTC (permalink / raw)
To: linux-rdma, linux-nfs
xprt_force_disconnect() is already invoked from the socket
transport. I want to invoke xprt_force_disconnect() from the
RPC-over-RDMA transport, which is a separate module from sunrpc.ko.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprt.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index b530a28..3e63c5e 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -651,6 +651,7 @@ void xprt_force_disconnect(struct rpc_xprt *xprt)
xprt_wake_pending_tasks(xprt, -EAGAIN);
spin_unlock_bh(&xprt->transport_lock);
}
+EXPORT_SYMBOL_GPL(xprt_force_disconnect);
/**
* xprt_conditional_disconnect - force a transport to disconnect
* [PATCH v1 05/11] xprtrdma: Detect unreachable NFS/RDMA servers more reliably
2017-03-10 16:05 [PATCH v1 00/11] [RFC] NFS/RDMA client-side patches for 4.12 Chuck Lever
2017-03-10 16:06 ` [PATCH v1 04/11] sunrpc: Export xprt_force_disconnect() Chuck Lever
@ 2017-03-10 16:06 ` Chuck Lever
2017-03-10 16:06 ` [PATCH v1 06/11] xprtrdma: Refactor rpcrdma_ia_open() Chuck Lever
From: Chuck Lever @ 2017-03-10 16:06 UTC (permalink / raw)
To: linux-rdma, linux-nfs
Current NFS clients rely on connection loss to determine when to
retransmit. In particular, for protocols like NFSv4, clients no
longer rely on RPC timeouts to drive retransmission: NFSv4 servers
are required to terminate a connection when they need a client to
retransmit pending RPCs.
When a server is no longer reachable, either because it has crashed
or because the network path has broken, the server cannot actively
terminate a connection. Thus NFS clients depend on transport-level
keepalive to determine when a connection must be replaced and
pending RPCs retransmitted.
However, RDMA RC connections do not have a native keepalive
mechanism. If an NFS/RDMA server crashes after a client has sent
RPCs successfully (an RC ACK has been received for all OTW RDMA
requests), there is no way for the client to know the connection is
moribund.
In addition, new RDMA requests are subject to the RPC-over-RDMA
credit limit. If the client has consumed all granted credits with
NFS traffic, it is not allowed to send another RDMA request until
the server replies. Thus it has no way to send a true keepalive when
the workload has already consumed all credits with pending RPCs.
To address this, forcibly disconnect a transport when an RPC times
out. This prevents a moribund connection from blocking detection of
failover or other configuration changes on the server.
Note that even if the connection is still good, retransmitting
any RPC will trigger a disconnect thanks to this logic in
xprt_rdma_send_request:
/* Must suppress retransmit to maintain credits */
if (req->rl_connect_cookie == xprt->connect_cookie)
goto drop_connection;
req->rl_connect_cookie = xprt->connect_cookie;
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/transport.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 26c9a19..240f0da 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -484,6 +484,27 @@
dprintk("RPC: %s: %u\n", __func__, port);
}
+/**
+ * xprt_rdma_timer - invoked when an RPC times out
+ * @xprt: controlling RPC transport
+ * @task: RPC task that timed out
+ *
+ * Invoked when the transport is still connected, but an RPC
+ * retransmit timeout occurs.
+ *
+ * Since RDMA connections don't have a keep-alive, forcibly
+ * disconnect and retry to connect. This drives full
+ * detection of the network path, and retransmissions of
+ * all pending RPCs.
+ */
+static void
+xprt_rdma_timer(struct rpc_xprt *xprt, struct rpc_task *task)
+{
+ dprintk("RPC: %5u %s: xprt = %p\n", task->tk_pid, __func__, xprt);
+
+ xprt_force_disconnect(xprt);
+}
+
static void
xprt_rdma_connect(struct rpc_xprt *xprt, struct rpc_task *task)
{
@@ -776,6 +797,7 @@ void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
.alloc_slot = xprt_alloc_slot,
.release_request = xprt_release_rqst_cong, /* ditto */
.set_retrans_timeout = xprt_set_retrans_timeout_def, /* ditto */
+ .timer = xprt_rdma_timer,
.rpcbind = rpcb_getport_async, /* sunrpc/rpcb_clnt.c */
.set_port = xprt_rdma_set_port,
.connect = xprt_rdma_connect,
* [PATCH v1 06/11] xprtrdma: Refactor rpcrdma_ia_open()
2017-03-10 16:05 [PATCH v1 00/11] [RFC] NFS/RDMA client-side patches for 4.12 Chuck Lever
2017-03-10 16:06 ` [PATCH v1 05/11] xprtrdma: Detect unreachable NFS/RDMA servers more reliably Chuck Lever
@ 2017-03-10 16:06 ` Chuck Lever
2017-03-10 16:06 ` [PATCH v1 07/11] xprtrdma: Use same device when mapping or syncing DMA buffers Chuck Lever
From: Chuck Lever @ 2017-03-10 16:06 UTC (permalink / raw)
To: linux-rdma, linux-nfs
To unload a device driver and reload it, xprtrdma needs to close a
transport's interface adapter and then call rpcrdma_ia_open again,
possibly finding a different interface adapter.
Make rpcrdma_ia_open safe to call on the same transport multiple
times.
This is a refactoring change only.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/transport.c | 6 +++--
net/sunrpc/xprtrdma/verbs.c | 46 ++++++++++++++++++++-------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 7 +++++-
3 files changed, 32 insertions(+), 27 deletions(-)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 240f0da..e27804c 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -66,8 +66,8 @@
unsigned int xprt_rdma_max_inline_read = RPCRDMA_DEF_INLINE;
static unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
static unsigned int xprt_rdma_inline_write_padding;
-static unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRMR;
- int xprt_rdma_pad_optimize = 0;
+unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRMR;
+int xprt_rdma_pad_optimize;
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
@@ -396,7 +396,7 @@
new_xprt = rpcx_to_rdmax(xprt);
- rc = rpcrdma_ia_open(new_xprt, sap, xprt_rdma_memreg_strategy);
+ rc = rpcrdma_ia_open(new_xprt, sap);
if (rc)
goto out1;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 6c99fc5..5e5f004 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -413,13 +413,16 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
* Exported functions.
*/
-/*
- * Open and initialize an Interface Adapter.
- * o initializes fields of struct rpcrdma_ia, including
- * interface and provider attributes and protection zone.
+/**
+ * rpcrdma_ia_open - Open and initialize an Interface Adapter.
+ * @xprt: controlling transport
+ * @addr: IP address of remote peer
+ *
+ * Returns 0 on success, negative errno if an appropriate
+ * Interface Adapter could not be found and opened.
*/
int
-rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
+rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr)
{
struct rpcrdma_ia *ia = &xprt->rx_ia;
int rc;
@@ -427,7 +430,7 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
ia->ri_id = rpcrdma_create_id(xprt, ia, addr);
if (IS_ERR(ia->ri_id)) {
rc = PTR_ERR(ia->ri_id);
- goto out1;
+ goto out_err;
}
ia->ri_device = ia->ri_id->device;
@@ -435,10 +438,10 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
if (IS_ERR(ia->ri_pd)) {
rc = PTR_ERR(ia->ri_pd);
pr_err("rpcrdma: ib_alloc_pd() returned %d\n", rc);
- goto out2;
+ goto out_err;
}
- switch (memreg) {
+ switch (xprt_rdma_memreg_strategy) {
case RPCRDMA_FRMR:
if (frwr_is_supported(ia)) {
ia->ri_ops = &rpcrdma_frwr_memreg_ops;
@@ -452,28 +455,23 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
}
/*FALLTHROUGH*/
default:
- pr_err("rpcrdma: Unsupported memory registration mode: %d\n",
- memreg);
+ pr_err("rpcrdma: Device %s does not support memreg mode %d\n",
+ ia->ri_device->name, xprt_rdma_memreg_strategy);
rc = -EINVAL;
- goto out3;
+ goto out_err;
}
return 0;
-out3:
- ib_dealloc_pd(ia->ri_pd);
- ia->ri_pd = NULL;
-out2:
- rpcrdma_destroy_id(ia->ri_id);
- ia->ri_id = NULL;
-out1:
+out_err:
+ rpcrdma_ia_close(ia);
return rc;
}
-/*
- * Clean up/close an IA.
- * o if event handles and PD have been initialized, free them.
- * o close the IA
+/**
+ * rpcrdma_ia_close - Clean up/close an IA.
+ * @ia: interface adapter to close
+ *
*/
void
rpcrdma_ia_close(struct rpcrdma_ia *ia)
@@ -483,12 +481,14 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
if (ia->ri_id->qp)
rdma_destroy_qp(ia->ri_id);
rpcrdma_destroy_id(ia->ri_id);
- ia->ri_id = NULL;
}
+ ia->ri_id = NULL;
+ ia->ri_device = NULL;
/* If the pd is still busy, xprtrdma missed freeing a resource */
if (ia->ri_pd && !IS_ERR(ia->ri_pd))
ib_dealloc_pd(ia->ri_pd);
+ ia->ri_pd = NULL;
}
/*
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index e6d76a0..775764c 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -497,10 +497,15 @@ struct rpcrdma_xprt {
* Default is 0, see sysctl entry and rpc_rdma.c rpcrdma_convert_iovs() */
extern int xprt_rdma_pad_optimize;
+/* This setting controls the hunt for a supported memory
+ * registration strategy.
+ */
+extern unsigned int xprt_rdma_memreg_strategy;
+
/*
* Interface Adapter calls - xprtrdma/verbs.c
*/
-int rpcrdma_ia_open(struct rpcrdma_xprt *, struct sockaddr *, int);
+int rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr);
void rpcrdma_ia_close(struct rpcrdma_ia *);
bool frwr_is_supported(struct rpcrdma_ia *);
bool fmr_is_supported(struct rpcrdma_ia *);
* [PATCH v1 07/11] xprtrdma: Use same device when mapping or syncing DMA buffers
2017-03-10 16:05 [PATCH v1 00/11] [RFC] NFS/RDMA client-side patches for 4.12 Chuck Lever
2017-03-10 16:06 ` [PATCH v1 06/11] xprtrdma: Refactor rpcrdma_ia_open() Chuck Lever
@ 2017-03-10 16:06 ` Chuck Lever
2017-03-10 16:06 ` [PATCH v1 08/11] xprtrdma: Support unplugging an HCA from under an NFS mount Chuck Lever
From: Chuck Lever @ 2017-03-10 16:06 UTC (permalink / raw)
To: linux-rdma, linux-nfs
When the underlying device driver is reloaded, ia->ri_device will be
replaced. All cached copies of that device pointer have to be
updated as well.
Commit 54cbd6b0c6b9 ("xprtrdma: Delay DMA mapping Send and Receive
buffers") added the rg_device field to each regbuf. As part of
handling a device removal, rpcrdma_dma_unmap_regbuf is invoked on
all regbufs for a transport.
Simply calling rpcrdma_dma_map_regbuf for each Receive buffer after
the driver has been reloaded should reinitialize rg_device correctly
for every case except rpcrdma_wc_receive, which still uses
rpcrdma_rep::rr_device.
Ensure the same device that was used to map a Receive buffer is also
used to sync it in rpcrdma_wc_receive by using rg_device there
instead of rr_device.
This is the only use of rr_device, so it can be removed.
The use of regbufs in the send path is also updated, for
completeness.
Fixes: 54cbd6b0c6b9 ("xprtrdma: Delay DMA mapping Send and ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 4 ++--
net/sunrpc/xprtrdma/verbs.c | 12 ++++++------
net/sunrpc/xprtrdma/xprt_rdma.h | 7 ++++++-
3 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 103491e..eac38f6 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -494,7 +494,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
}
sge->length = len;
- ib_dma_sync_single_for_device(ia->ri_device, sge->addr,
+ ib_dma_sync_single_for_device(rdmab_device(rb), sge->addr,
sge->length, DMA_TO_DEVICE);
req->rl_send_wr.num_sge++;
return true;
@@ -523,7 +523,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
sge[sge_no].addr = rdmab_addr(rb);
sge[sge_no].length = xdr->head[0].iov_len;
sge[sge_no].lkey = rdmab_lkey(rb);
- ib_dma_sync_single_for_device(device, sge[sge_no].addr,
+ ib_dma_sync_single_for_device(rdmab_device(rb), sge[sge_no].addr,
sge[sge_no].length, DMA_TO_DEVICE);
/* If there is a Read chunk, the page list is being handled
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 5e5f004..b68b204 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -180,7 +180,7 @@
rep->rr_wc_flags = wc->wc_flags;
rep->rr_inv_rkey = wc->ex.invalidate_rkey;
- ib_dma_sync_single_for_cpu(rep->rr_device,
+ ib_dma_sync_single_for_cpu(rdmab_device(rep->rr_rdmabuf),
rdmab_addr(rep->rr_rdmabuf),
rep->rr_len, DMA_FROM_DEVICE);
@@ -877,7 +877,6 @@ struct rpcrdma_rep *
rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
- struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_rep *rep;
int rc;
@@ -893,7 +892,6 @@ struct rpcrdma_rep *
goto out_free;
}
- rep->rr_device = ia->ri_device;
rep->rr_cqe.done = rpcrdma_wc_receive;
rep->rr_rxprt = r_xprt;
INIT_WORK(&rep->rr_work, rpcrdma_reply_handler);
@@ -1231,17 +1229,19 @@ struct rpcrdma_regbuf *
bool
__rpcrdma_dma_map_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
{
+ struct ib_device *device = ia->ri_device;
+
if (rb->rg_direction == DMA_NONE)
return false;
- rb->rg_iov.addr = ib_dma_map_single(ia->ri_device,
+ rb->rg_iov.addr = ib_dma_map_single(device,
(void *)rb->rg_base,
rdmab_length(rb),
rb->rg_direction);
- if (ib_dma_mapping_error(ia->ri_device, rdmab_addr(rb)))
+ if (ib_dma_mapping_error(device, rdmab_addr(rb)))
return false;
- rb->rg_device = ia->ri_device;
+ rb->rg_device = device;
rb->rg_iov.lkey = ia->ri_pd->local_dma_lkey;
return true;
}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 775764c..4c0fd4d 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -164,6 +164,12 @@ struct rpcrdma_regbuf {
return (struct rpcrdma_msg *)rb->rg_base;
}
+static inline struct ib_device *
+rdmab_device(struct rpcrdma_regbuf *rb)
+{
+ return rb->rg_device;
+}
+
#define RPCRDMA_DEF_GFP (GFP_NOIO | __GFP_NOWARN)
/* To ensure a transport can always make forward progress,
@@ -209,7 +215,6 @@ struct rpcrdma_rep {
unsigned int rr_len;
int rr_wc_flags;
u32 rr_inv_rkey;
- struct ib_device *rr_device;
struct rpcrdma_xprt *rr_rxprt;
struct work_struct rr_work;
struct list_head rr_list;
* [PATCH v1 08/11] xprtrdma: Support unplugging an HCA from under an NFS mount
2017-03-10 16:05 [PATCH v1 00/11] [RFC] NFS/RDMA client-side patches for 4.12 Chuck Lever
2017-03-10 16:06 ` [PATCH v1 07/11] xprtrdma: Use same device when mapping or syncing DMA buffers Chuck Lever
@ 2017-03-10 16:06 ` Chuck Lever
2017-03-10 16:07 ` [PATCH v1 09/11] xprtrdma: Refactor rpcrdma_ep_connect Chuck Lever
From: Chuck Lever @ 2017-03-10 16:06 UTC (permalink / raw)
To: linux-rdma, linux-nfs
The device driver for the underlying physical device associated
with an RPC-over-RDMA transport can be removed while RPC-over-RDMA
transports are still in use (ie, while NFS filesystems are still
mounted and active). The IB core performs a connection event upcall
to request that consumers free all RDMA resources associated with
a transport.
There may be pending RPCs when this occurs. Care must be taken to
release associated resources without leaving references that can
trigger a subsequent crash if a signal or soft timeout occurs. We
rely on the caller of the transport's ->close method to ensure that
the previous RPC task has invoked xprt_release but the transport
remains write-locked.
A DEVICE_REMOVAL upcall forces a disconnect, then sleeps. When
->close is invoked, it destroys the transport's H/W resources, then
wakes the upcall, which completes and allows the core driver unload
to continue.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/transport.c | 29 +++++++++++++--
net/sunrpc/xprtrdma/verbs.c | 74 +++++++++++++++++++++++++++++++++++++--
net/sunrpc/xprtrdma/xprt_rdma.h | 7 ++++
3 files changed, 101 insertions(+), 9 deletions(-)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index e27804c..94bb375 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -457,19 +457,33 @@
return ERR_PTR(rc);
}
-/*
- * Close a connection, during shutdown or timeout/reconnect
+/**
+ * xprt_rdma_close - Close down RDMA connection
+ * @xprt: generic transport to be closed
+ *
+ * Called during transport shutdown, reconnect, or device
+ * removal. Caller holds the transport's write lock.
*/
static void
xprt_rdma_close(struct rpc_xprt *xprt)
{
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
+ struct rpcrdma_ep *ep = &r_xprt->rx_ep;
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
+ dprintk("RPC: %s: closing xprt %p\n", __func__, xprt);
- dprintk("RPC: %s: closing\n", __func__);
- if (r_xprt->rx_ep.rep_connected > 0)
+ if (test_and_clear_bit(RPCRDMA_IAF_REMOVING, &ia->ri_flags)) {
+ xprt_clear_connected(xprt);
+ rpcrdma_ia_remove(ia);
+ return;
+ }
+ if (ep->rep_connected == -ENODEV)
+ return;
+ if (ep->rep_connected > 0)
xprt->reestablish_timeout = 0;
xprt_disconnect_done(xprt);
- rpcrdma_ep_disconnect(&r_xprt->rx_ep, &r_xprt->rx_ia);
+ rpcrdma_ep_disconnect(ep, ia);
}
static void
@@ -680,6 +694,8 @@
* xprt_rdma_send_request - marshal and send an RPC request
* @task: RPC task with an RPC message in rq_snd_buf
*
+ * Caller holds the transport's write lock.
+ *
* Return values:
* 0: The request has been sent
* ENOTCONN: Caller needs to invoke connect logic then call again
@@ -706,6 +722,9 @@
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
int rc = 0;
+ if (!xprt_connected(xprt))
+ goto drop_connection;
+
/* On retransmit, remove any previously registered chunks */
if (unlikely(!list_empty(&req->rl_registered)))
r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req, false);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b68b204..5bfefee 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -69,6 +69,8 @@
/*
* internal functions
*/
+static void rpcrdma_destroy_mrs(struct rpcrdma_buffer *buf);
+static void rpcrdma_dma_unmap_regbuf(struct rpcrdma_regbuf *rb);
static struct workqueue_struct *rpcrdma_receive_wq __read_mostly;
@@ -262,6 +264,21 @@
__func__, ep);
complete(&ia->ri_done);
break;
+ case RDMA_CM_EVENT_DEVICE_REMOVAL:
+#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
+ pr_info("rpcrdma: removing device for %pIS:%u\n",
+ sap, rpc_get_port(sap));
+#endif
+ set_bit(RPCRDMA_IAF_REMOVING, &ia->ri_flags);
+ ep->rep_connected = -ENODEV;
+ xprt_force_disconnect(&xprt->rx_xprt);
+ wait_for_completion(&ia->ri_remove_done);
+
+ ia->ri_id = NULL;
+ ia->ri_pd = NULL;
+ ia->ri_device = NULL;
+ /* Return 1 to ensure the core destroys the id. */
+ return 1;
case RDMA_CM_EVENT_ESTABLISHED:
connstate = 1;
ib_query_qp(ia->ri_id->qp, attr,
@@ -291,9 +308,6 @@
goto connected;
case RDMA_CM_EVENT_DISCONNECTED:
connstate = -ECONNABORTED;
- goto connected;
- case RDMA_CM_EVENT_DEVICE_REMOVAL:
- connstate = -ENODEV;
connected:
dprintk("RPC: %s: %sconnected\n",
__func__, connstate > 0 ? "" : "dis");
@@ -346,6 +360,7 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
int rc;
init_completion(&ia->ri_done);
+ init_completion(&ia->ri_remove_done);
id = rdma_create_id(&init_net, rpcrdma_conn_upcall, xprt, RDMA_PS_TCP,
IB_QPT_RC);
@@ -469,6 +484,56 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
}
/**
+ * rpcrdma_ia_remove - Handle device driver unload
+ * @ia: interface adapter being removed
+ *
+ * Divest transport H/W resources associated with this adapter,
+ * but allow it to be restored later.
+ */
+void
+rpcrdma_ia_remove(struct rpcrdma_ia *ia)
+{
+ struct rpcrdma_xprt *r_xprt = container_of(ia, struct rpcrdma_xprt,
+ rx_ia);
+ struct rpcrdma_ep *ep = &r_xprt->rx_ep;
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ struct rpcrdma_req *req;
+ struct rpcrdma_rep *rep;
+
+ cancel_delayed_work_sync(&buf->rb_refresh_worker);
+
+ /* This is similar to rpcrdma_ep_destroy, but:
+ * - Don't cancel the connect worker.
+ * - Don't call rpcrdma_ep_disconnect, which waits
+ * for another conn upcall, which will deadlock.
+ * - rdma_disconnect is unneeded, the underlying
+ * connection is already gone.
+ */
+ if (ia->ri_id->qp) {
+ ib_drain_qp(ia->ri_id->qp);
+ rdma_destroy_qp(ia->ri_id);
+ ia->ri_id->qp = NULL;
+ }
+ ib_free_cq(ep->rep_attr.recv_cq);
+ ib_free_cq(ep->rep_attr.send_cq);
+
+ /* The ULP is responsible for ensuring all DMA
+ * mappings and MRs are gone.
+ */
+ list_for_each_entry(rep, &buf->rb_recv_bufs, rr_list)
+ rpcrdma_dma_unmap_regbuf(rep->rr_rdmabuf);
+ list_for_each_entry(req, &buf->rb_allreqs, rl_all) {
+ rpcrdma_dma_unmap_regbuf(req->rl_rdmabuf);
+ rpcrdma_dma_unmap_regbuf(req->rl_sendbuf);
+ rpcrdma_dma_unmap_regbuf(req->rl_recvbuf);
+ }
+ rpcrdma_destroy_mrs(buf);
+
+ /* Allow waiters to continue */
+ complete(&ia->ri_remove_done);
+}
+
+/**
* rpcrdma_ia_close - Clean up/close an IA.
* @ia: interface adapter to close
*
@@ -1079,7 +1144,8 @@ struct rpcrdma_mw *
out_nomws:
dprintk("RPC: %s: no MWs available\n", __func__);
- schedule_delayed_work(&buf->rb_refresh_worker, 0);
+ if (r_xprt->rx_ep.rep_connected != -ENODEV)
+ schedule_delayed_work(&buf->rb_refresh_worker, 0);
/* Allow the reply handler and refresh worker to run */
cond_resched();
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 4c0fd4d..7b8701d 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -69,6 +69,7 @@ struct rpcrdma_ia {
struct rdma_cm_id *ri_id;
struct ib_pd *ri_pd;
struct completion ri_done;
+ struct completion ri_remove_done;
int ri_async_rc;
unsigned int ri_max_segs;
unsigned int ri_max_frmr_depth;
@@ -78,10 +79,15 @@ struct rpcrdma_ia {
bool ri_reminv_expected;
bool ri_implicit_roundup;
enum ib_mr_type ri_mrtype;
+ unsigned long ri_flags;
struct ib_qp_attr ri_qp_attr;
struct ib_qp_init_attr ri_qp_init_attr;
};
+enum {
+ RPCRDMA_IAF_REMOVING = 0,
+};
+
/*
* RDMA Endpoint -- one per transport instance
*/
@@ -511,6 +517,7 @@ struct rpcrdma_xprt {
* Interface Adapter calls - xprtrdma/verbs.c
*/
int rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr);
+void rpcrdma_ia_remove(struct rpcrdma_ia *ia);
void rpcrdma_ia_close(struct rpcrdma_ia *);
bool frwr_is_supported(struct rpcrdma_ia *);
bool fmr_is_supported(struct rpcrdma_ia *);
* [PATCH v1 09/11] xprtrdma: Refactor rpcrdma_ep_connect
From: Chuck Lever @ 2017-03-10 16:07 UTC
To: linux-rdma, linux-nfs
I'm about to add another arm to
if (ep->rep_connected != 0)
It will be cleaner to use a switch statement here. We'll be looking
for a couple of specific errnos, or "anything else," to sort out the
difference between a normal reconnect and recovery from device
removal.
Also, adjust the retry label to accommodate some locking I'm about
to introduce.
This is a refactoring change only.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/verbs.c | 109 +++++++++++++++++++++++++------------------
1 file changed, 63 insertions(+), 46 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 5bfefee..6a12386 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -710,6 +710,57 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
ib_free_cq(ep->rep_attr.send_cq);
}
+static int
+rpcrdma_ep_reconnect(struct rpcrdma_xprt *r_xprt, struct rpcrdma_ep *ep,
+ struct rpcrdma_ia *ia)
+{
+ struct sockaddr *sap = (struct sockaddr *)&r_xprt->rx_data.addr;
+ struct rdma_cm_id *id, *old;
+ int err, rc;
+
+ dprintk("RPC: %s: reconnecting...\n", __func__);
+
+ rpcrdma_ep_disconnect(ep, ia);
+
+ rc = -EHOSTUNREACH;
+ id = rpcrdma_create_id(r_xprt, ia, sap);
+ if (IS_ERR(id))
+ goto out;
+
+ /* As long as the new ID points to the same device as the
+ * old ID, we can reuse the transport's existing PD and all
+ * previously allocated MRs. Also, the same device means
+ * the transport's previous DMA mappings are still valid.
+ *
+ * This is a sanity check only. There should be no way these
+ * point to two different devices here.
+ */
+ old = id;
+ rc = -ENETUNREACH;
+ if (ia->ri_device != id->device) {
+ pr_err("rpcrdma: can't reconnect on different device!\n");
+ goto out_destroy;
+ }
+
+ err = rdma_create_qp(id, ia->ri_pd, &ep->rep_attr);
+ if (err) {
+ dprintk("RPC: %s: rdma_create_qp returned %d\n",
+ __func__, err);
+ goto out_destroy;
+ }
+
+ /* Atomically replace the transport's ID and QP. */
+ rc = 0;
+ old = ia->ri_id;
+ ia->ri_id = id;
+ rdma_destroy_qp(old);
+
+out_destroy:
+ rpcrdma_destroy_id(old);
+out:
+ return rc;
+}
+
/*
* Connect unconnected endpoint.
*/
@@ -718,61 +769,25 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
{
struct rpcrdma_xprt *r_xprt = container_of(ia, struct rpcrdma_xprt,
rx_ia);
- struct rdma_cm_id *id, *old;
- struct sockaddr *sap;
unsigned int extras;
- int rc = 0;
+ int rc;
- if (ep->rep_connected != 0) {
retry:
- dprintk("RPC: %s: reconnecting...\n", __func__);
-
- rpcrdma_ep_disconnect(ep, ia);
-
- sap = (struct sockaddr *)&r_xprt->rx_data.addr;
- id = rpcrdma_create_id(r_xprt, ia, sap);
- if (IS_ERR(id)) {
- rc = -EHOSTUNREACH;
- goto out;
- }
- /* TEMP TEMP TEMP - fail if new device:
- * Deregister/remarshal *all* requests!
- * Close and recreate adapter, pd, etc!
- * Re-determine all attributes still sane!
- * More stuff I haven't thought of!
- * Rrrgh!
- */
- if (ia->ri_device != id->device) {
- printk("RPC: %s: can't reconnect on "
- "different device!\n", __func__);
- rpcrdma_destroy_id(id);
- rc = -ENETUNREACH;
- goto out;
- }
- /* END TEMP */
- rc = rdma_create_qp(id, ia->ri_pd, &ep->rep_attr);
- if (rc) {
- dprintk("RPC: %s: rdma_create_qp failed %i\n",
- __func__, rc);
- rpcrdma_destroy_id(id);
- rc = -ENETUNREACH;
- goto out;
- }
-
- old = ia->ri_id;
- ia->ri_id = id;
-
- rdma_destroy_qp(old);
- rpcrdma_destroy_id(old);
- } else {
+ switch (ep->rep_connected) {
+ case 0:
dprintk("RPC: %s: connecting...\n", __func__);
rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
if (rc) {
dprintk("RPC: %s: rdma_create_qp failed %i\n",
__func__, rc);
- /* do not update ep->rep_connected */
- return -ENETUNREACH;
+ rc = -ENETUNREACH;
+ goto out_noupdate;
}
+ break;
+ default:
+ rc = rpcrdma_ep_reconnect(r_xprt, ep, ia);
+ if (rc)
+ goto out;
}
ep->rep_connected = 0;
@@ -800,6 +815,8 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
out:
if (rc)
ep->rep_connected = rc;
+
+out_noupdate:
return rc;
}
* [PATCH v1 10/11] xprtrdma: Restore transport after device removal
From: Chuck Lever @ 2017-03-10 16:07 UTC
To: linux-rdma, linux-nfs
After a device removal, enable the transport connect worker to
restore normal operation if there is another device with
connectivity to the server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/verbs.c | 48 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 6a12386..ef3ceec 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -69,6 +69,7 @@
/*
* internal functions
*/
+static void rpcrdma_create_mrs(struct rpcrdma_xprt *r_xprt);
static void rpcrdma_destroy_mrs(struct rpcrdma_buffer *buf);
static void rpcrdma_dma_unmap_regbuf(struct rpcrdma_regbuf *rb);
@@ -710,6 +711,48 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
ib_free_cq(ep->rep_attr.send_cq);
}
+/* Re-establish a connection after a device removal event.
+ * Unlike a normal reconnection, a fresh PD and a new set
+ * of MRs and buffers is needed.
+ */
+static int
+rpcrdma_ep_recreate_xprt(struct rpcrdma_xprt *r_xprt,
+ struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
+{
+ struct sockaddr *sap = (struct sockaddr *)&r_xprt->rx_data.addr;
+ int rc, err;
+
+ pr_info("%s: r_xprt = %p\n", __func__, r_xprt);
+
+ rc = -EHOSTUNREACH;
+ if (rpcrdma_ia_open(r_xprt, sap))
+ goto out1;
+
+ rc = -ENOMEM;
+ err = rpcrdma_ep_create(ep, ia, &r_xprt->rx_data);
+ if (err) {
+ pr_err("rpcrdma: rpcrdma_ep_create returned %d\n", err);
+ goto out2;
+ }
+
+ rc = -ENETUNREACH;
+ err = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
+ if (err) {
+ pr_err("rpcrdma: rdma_create_qp returned %d\n", err);
+ goto out3;
+ }
+
+ rpcrdma_create_mrs(r_xprt);
+ return 0;
+
+out3:
+ rpcrdma_ep_destroy(ep, ia);
+out2:
+ rpcrdma_ia_close(ia);
+out1:
+ return rc;
+}
+
static int
rpcrdma_ep_reconnect(struct rpcrdma_xprt *r_xprt, struct rpcrdma_ep *ep,
struct rpcrdma_ia *ia)
@@ -784,6 +827,11 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
goto out_noupdate;
}
break;
+ case -ENODEV:
+ rc = rpcrdma_ep_recreate_xprt(r_xprt, ep, ia);
+ if (rc)
+ goto out_noupdate;
+ break;
default:
rc = rpcrdma_ep_reconnect(r_xprt, ep, ia);
if (rc)
* [PATCH v1 11/11] xprtrdma: Revert commit d0f36c46deea
From: Chuck Lever @ 2017-03-10 16:07 UTC
To: linux-rdma, linux-nfs
Device removal is now adequately supported. Pinning the underlying
device driver to prevent removal while an NFS mount is active is no
longer necessary.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/verbs.c | 33 +++++++--------------------------
1 file changed, 7 insertions(+), 26 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index ef3ceec..41f00c9 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -53,7 +53,7 @@
#include <linux/sunrpc/addr.h>
#include <linux/sunrpc/svc_rdma.h>
#include <asm/bitops.h>
-#include <linux/module.h> /* try_module_get()/module_put() */
+
#include <rdma/ib_cm.h>
#include "xprt_rdma.h"
@@ -344,14 +344,6 @@
return 0;
}
-static void rpcrdma_destroy_id(struct rdma_cm_id *id)
-{
- if (id) {
- module_put(id->device->owner);
- rdma_destroy_id(id);
- }
-}
-
static struct rdma_cm_id *
rpcrdma_create_id(struct rpcrdma_xprt *xprt,
struct rpcrdma_ia *ia, struct sockaddr *addr)
@@ -386,16 +378,6 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
goto out;
}
- /* FIXME:
- * Until xprtrdma supports DEVICE_REMOVAL, the provider must
- * be pinned while there are active NFS/RDMA mounts to prevent
- * hangs and crashes at umount time.
- */
- if (!ia->ri_async_rc && !try_module_get(id->device->owner)) {
- dprintk("RPC: %s: Failed to get device module\n",
- __func__);
- ia->ri_async_rc = -ENODEV;
- }
rc = ia->ri_async_rc;
if (rc)
goto out;
@@ -405,21 +387,20 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
if (rc) {
dprintk("RPC: %s: rdma_resolve_route() failed %i\n",
__func__, rc);
- goto put;
+ goto out;
}
rc = wait_for_completion_interruptible_timeout(&ia->ri_done, wtimeout);
if (rc < 0) {
dprintk("RPC: %s: wait() exited: %i\n",
__func__, rc);
- goto put;
+ goto out;
}
rc = ia->ri_async_rc;
if (rc)
- goto put;
+ goto out;
return id;
-put:
- module_put(id->device->owner);
+
out:
rdma_destroy_id(id);
return ERR_PTR(rc);
@@ -546,7 +527,7 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
if (ia->ri_id->qp)
rdma_destroy_qp(ia->ri_id);
- rpcrdma_destroy_id(ia->ri_id);
+ rdma_destroy_id(ia->ri_id);
}
ia->ri_id = NULL;
ia->ri_device = NULL;
@@ -799,7 +780,7 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
rdma_destroy_qp(old);
out_destroy:
- rpcrdma_destroy_id(old);
+ rdma_destroy_id(old);
out:
return rc;
}